Length of Longest Substring
Find the longest non-repeating substring of an input string.
The problem may be found on leetcode.
Analysis
A basic example of a search with linear time complexity is
abcabcbb -> abcabcbb --> abcabcbb --> abcabcbb --> abcabcbb --> abcabcbb
^ ^^ ^^^ ^^^ ^^^ ^^^
k=0 k=1 k=2 k=3 k=4 k=5 start=0 start=0 start=0 start=1 start=2 start=3
It would appear that
- the end of the range should be incremented on every loop,
- when a character in the selected range is encountered again, the start of the range should be incremented from its last occurrence.
However, with strings with many repeats, it is apparent that there are some subtleties to this approach:
foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii
^ ^^ ^ ^^^^ ^^^^
k=0 k=1 k=2 k=3 k=4
start=0 start=0 start=2 start=1 start=4
lastseen: lastseen: lastseen:
f: 0 f: 0 f: 0 o: 1 o: 2
The step where k=3
makes it clear that backtracking start
is not an option, and that
foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii --> foofoofiifii
^ ^^ ^ ^^ ^
k=0 k=1 k=2 k=3 k=4 start=0 start=0 start=2 start=2 start=4
A modification to this fixes the problem:
- when a character in the selected range is encoutered again, the start of the range should be incremented from its last occurrence unless this new value is less than the current value of
start
, in which casestart
should be maintained.
Implementation
My initial solution was
class SolutionInitial:
def longestSubstring(self, s: str) -> tuple[str, int]:
dict[str, int] = dict()
seen:
= 0
start = 0, 0
best_start, best_stop = -1
best_size
for k, char in enumerate(s):
if (char_last_seen := seen.get(char)) is not None:
= max(start, char_last_seen + 1)
start
if (diff := k - start) > best_size:
# print("-->", "new best")
= start, k
best_start, best_stop = diff
best_size
= k
seen[char] # print(s)
# print(((start) * " ") + ((diff + 1) * "^"))
if best_size == -1:
return "", 0
return s[best_start : best_stop + 1], 1 + best_stop - best_start
def lengthOfLongestSubstring(self, s: str) -> int:
= self.longestSubstring(s)
_, best_size return best_size
which performed reasonably well in runtime (\(75\%\) quantile) and an abysmal \(8\%\) quantile in memory. Some small modifications made my solution much more performant, with \(93\%\) quantile runtime and \(39\%\) runtime:
class SolutionInitial:
def longestSubstring(self, s: str) -> tuple[str, int]:
dict[str, int] = dict()
seen:
= 0
start = 0, 0
best_start, best_stop = -1
best_size
for k, char in enumerate(s):
if (char_last_seen := seen.get(char)) is not None:
= max(start, char_last_seen + 1)
start
if (diff := k - start) > best_size:
# print("-->", "new best")
= start, k
best_start, best_stop = diff
best_size
= k
seen[char] # print(s)
# print(((start) * " ") + ((diff + 1) * "^"))
if best_size == -1:
return "", 0
return s[best_start : best_stop + 1], 1 + best_stop - best_start
def lengthOfLongestSubstring(self, s: str) -> int:
= self.longestSubstring(s)
_, best_size return best_size
It would appear that seen
is the main issue with memory usage, however other options probably have performance costs for searching (for instance maintaining a copy of the longest nonrepeating substring instead of its bounds).