Daily Question 09/25/2024: Sum of Prefix Scores of Strings

Use a trie to count the number of occurrences of a prefix in a list.

Author

Adrian Cederberg

Published

September 25, 2024

Modified

September 25, 2024

The problem statement may be found on leetcode.

Analysis

Dynamic Programming Solution

This is pretty clearly a situation where using a trie is very helpful. I already implemented a trie for yesterdays problem but decided to update the terminates attribute to keep count of repeated additions since repeats may show up in the list. More or less, the answer is the sum of terminates on the subtree for each prefix for each word. For instance

words: ["ab", "abc", "abf", "b", "bc"]
drawing: |
             a  b
            /   ^\
           b      c
          /^\     ^
         c   f
         ^   ^
solution: [6, 7, 1, 2, 3]
#[3 + 3, 3 + 3 + 1, 3 + 3 + 1, 2, 2 + 1]

Counting the Number of Entries for a Subtree.

Given a prefix, the tree should be traversed until this prefix is matched. If it is not, then return 0.
Count the number of terminating nodes on the subtrees.

Counting the Number of Entries for a Subtree for a Word

Start with the longest prefix. Count all entries bellow and memoize or use memoized result.
Truncate the prefix by one. Count all entries bellow that. This will involve calculating entries in any sibling trees, which should be remembered.
Repeat 2.

Repeated Words

When a word is repeated, it is useful to use the terminates attribute as an integer.

Improved Solution

Since the terminates value is incremented every time a value that has occurred is reinserted, then it makes since to make a trie of prefixes, e.g.

  a   ->       a   ->    a
 /^           /^        /^
b 1          b 2       b 3
^           /^        /^\
1          c 2       c 3 f
           ^         ^   ^
           1         1   1

and determine the sum over the path to each word.

Solution

The trie implementation is as follows:

class TrieNode:
    terminates: int
    children: dict[str, Self]

    def __init__(
        self,
        children: dict[str, Self],
        terminates: int = False,
    ):
        self.children = children
        self.terminates = terminates

The Top Down Dynamic Programming Solution

The following solution works, but can be optimized. Unfortunately, in one case the solution ran out of memory. This goes to show that dynamic programming is not always the solution.

class SolutionInitial:

    def sumPrefixScores(self, words: list[str]) -> list[int]:

        def countOne(pfx: str, *, node: TrieNode | None = None) -> int:
            """Count for a specific prefix."""

            if pfx in memo:
                return memo[pfx]

            # NOTE: Traverse until prefix is done. If the prefix cannot be
            #       traversed through, then no words start with the prefix.
            if node is None:
                node = root
                for char in pfx:
                    if char not in node.children:
                        memo[pfx] = 0
                        return 0

                    node = node.children[char]

            # NOTE: If the current node is a terminating node, then one word
            #       is matched.
            count = 0
            if node.terminates:
                count += node.terminates

            # NOTE: Count the number of terminating nodes bellow path. This
            #       should be the sum for each subtree.
            for char in node.children:
                count += countOne(pfx + char, node=node.children[char])

            memo[pfx] = count
            return count

        memo: dict[str, int] = dict()
        root = TrieNode(dict())
        for word in words:
            root.insert(word)

        out = []
        for word in words:
            count = 0
            for end in range(len(word), 0, -1):
                count += countOne(word[:end])

            out.append(count)

        return out

The Improved Solution

This solution placed in the \(50\%\) quantile of runtime and \(70\%\) quantile of runtime:

class Solution:

    def sumPrefixScores(self, words: list[str]) -> list[int]:
        root = TrieNode(dict())
        for word in words:

            node = root
            for char in word:

                if char not in node.children:
                    new_node = TrieNode(dict(), terminates=1)
                    node.children[char] = new_node
                    node = new_node
                else:
                    node = node.children[char]
                    node.terminates += 1

        def count_path(word: str):
            node = root
            count = 0
            for char in word:
                node = node.children[char]
                count += node.terminates

            return count

        return [count_path(word) for word in words]