Length of Longest Substring
Find the longest non-repeating substring of an input string.
The problem may be found on leetcode.
Analysis
A basic example of a search with linear time complexity is
abcabcbb -> abcabcbb --> abcabcbb --> abcabcbb --> abcabcbb --> abcabcbb
^ ^^ ^^^ ^^^ ^^^ ^^^
k=0 k=1 k=2 k=3 k=4 k=5
start=0 start=0 start=0 start=1 start=2 start=3It would appear that
- the end of the range should be incremented on every loop,
- when a character in the selected range is encountered again, the start of the range should be incremented from its last occurrence.
However, with strings with many repeats, it is apparent that there are some subtleties to this approach:
foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii
^ ^^ ^ ^^^^ ^^^^
k=0 k=1 k=2 k=3 k=4
start=0 start=0 start=2 start=1 start=4
lastseen: lastseen: lastseen:
f: 0 f: 0 f: 0
o: 1 o: 2The step where k=3 makes it clear that backtracking start is not an option, and that
foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii
^ ^^ ^ ^^ ^
k=0 k=1 k=2 k=3 k=4
start=0 start=0 start=2 start=2 start=4A modification to this fixes the problem:
- when a character in the selected range is encoutered again, the start of the range should be incremented from its last occurrence unless this new value is less than the current value of
start, in which casestartshould be maintained.
Implementation
My initial solution was
class SolutionInitial:
def longestSubstring(self, s: str) -> tuple[str, int]:
seen: dict[str, int] = dict()
start = 0
best_start, best_stop = 0, 0
best_size = -1
for k, char in enumerate(s):
if (char_last_seen := seen.get(char)) is not None:
start = max(start, char_last_seen + 1)
if (diff := k - start) > best_size:
# print("-->", "new best")
best_start, best_stop = start, k
best_size = diff
seen[char] = k
# print(s)
# print(((start) * " ") + ((diff + 1) * "^"))
if best_size == -1:
return "", 0
return s[best_start : best_stop + 1], 1 + best_stop - best_start
def lengthOfLongestSubstring(self, s: str) -> int:
_, best_size = self.longestSubstring(s)
return best_sizewhich performed reasonably well in runtime (\(75\%\) quantile) and an abysmal \(8\%\) quantile in memory. Some small modifications made my solution much more performant, with \(93\%\) quantile runtime and \(39\%\) runtime:
class SolutionInitial:
def longestSubstring(self, s: str) -> tuple[str, int]:
seen: dict[str, int] = dict()
start = 0
best_start, best_stop = 0, 0
best_size = -1
for k, char in enumerate(s):
if (char_last_seen := seen.get(char)) is not None:
start = max(start, char_last_seen + 1)
if (diff := k - start) > best_size:
# print("-->", "new best")
best_start, best_stop = start, k
best_size = diff
seen[char] = k
# print(s)
# print(((start) * " ") + ((diff + 1) * "^"))
if best_size == -1:
return "", 0
return s[best_start : best_stop + 1], 1 + best_stop - best_start
def lengthOfLongestSubstring(self, s: str) -> int:
_, best_size = self.longestSubstring(s)
return best_sizeIt would appear that seen is the main issue with memory usage, however other options probably have performance costs for searching (for instance maintaining a copy of the longest nonrepeating substring instead of its bounds).