Knuth–Morris–Pratt (KMP) for Typescript - Blog

Came on something lovely today that sent me down a rabbit hole. #LeetCode 28. Find the Index of the First Occurrence in a String.

Of course the naive solution is "Isn't... that just a method on the string...?" return haystack.indexOf(needle). Lo and behold, my solution beat 100% of submissions, time complexity and memory! 🔥

... Well that can't be right.. So I started looking over solutions. Most noted that browser's built in indexOf method uses brute force comparison. That should be O(n^2). Most instead used Knuth–Morris–Pratt (KMP) algorithm which should run in O(n+m). Since I hadn't learned that one yet, I took a couple hours to get my head around it and have my own copy to reach for again. A huge thanks goes out to the video walkthrough by NeetCode for the best explanation I found.

For myself later and happily you if needed:

/**
 * Parse the string looking for prefix suffix matches.
 * Time complexity: O(2*m) ~ O(m)
 **/
const getLongestPrefixSuffix = (needle: string): number[] => {
  if (!needle) return [0];

  const result = new Array(needle.length).fill(0);
  // First can never match as the suffix can't be the entire string
  result[0] = 0;
  let m = 0;
  let i = 1;
  while (i < needle.length) {
    if (needle[i] === needle[m]) {
      m++;
      result[i] = m;
      i++;
    } else {
      if (m === 0) {
        result[i] = 0;
        i++;
      } else {
        // We want the lps *length* at the previous index, not the length minus one.
        m = result[m - 1];
        result[i] = m;
      }
    }
  }
  return result;
};

export const indexOfLPS = (haystack: string, needle: string): number => {
  const lps = getLongestPrefixSuffix(needle);

  let i = 0; // Position in haystack
  let m = 0; // Position in needle
  while (i <= haystack.length) {
    // If we have a match increment both
    if (haystack[i] === needle[m]) {
      i++;
      m++;
    } else {
      // If we didn't match, we know we need to jump forward.
      // How far we move can be shortcut at times thanks to prefixes and suffixes.
      if (m === 0) {
        i++;
      } else {
        // Again, we use the *value* of previous lps.
        m = lps[m - 1];
      }
    }
    if (m === needle.length) {
      return i - needle.length;
    }
  }
  return -1;
};
/**
 * Parse the string looking for prefix suffix matches.
 * Time complexity: O(2*m) ~ O(m)
 **/
const getLongestPrefixSuffix = (needle: string): number[] => {
  if (!needle) return [0];

  const result = new Array(needle.length).fill(0);
  // First can never match as the suffix can't be the entire string
  result[0] = 0;
  let m = 0;
  let i = 1;
  while (i < needle.length) {
    if (needle[i] === needle[m]) {
      m++;
      result[i] = m;
      i++;
    } else {
      if (m === 0) {
        result[i] = 0;
        i++;
      } else {
        // We want the lps *length* at the previous index, not the length minus one.
        m = result[m - 1];
        result[i] = m;
      }
    }
  }
  return result;
};

export const indexOfLPS = (haystack: string, needle: string): number => {
  const lps = getLongestPrefixSuffix(needle);

  let i = 0; // Position in haystack
  let m = 0; // Position in needle
  while (i <= haystack.length) {
    // If we have a match increment both
    if (haystack[i] === needle[m]) {
      i++;
      m++;
    } else {
      // If we didn't match, we know we need to jump forward.
      // How far we move can be shortcut at times thanks to prefixes and suffixes.
      if (m === 0) {
        i++;
      } else {
        // Again, we use the *value* of previous lps.
        m = lps[m - 1];
      }
    }
    if (m === needle.length) {
      return i - needle.length;
    }
  }
  return -1;
};

But this got me started wondering. Why browsers wouldn't just implement KMP initially? You could make an argument for KMP, or Boyer-Moore (BM) which can reach O(n/m). That got me looking for heuristics. Something that I could do to scan the data, and decide the best to use intelligently. That search turned up A Hybrid Length-Based Pattern Matching Algorithm for Text Searching published April 2025. At a high level, it suggests that highly structurally text based off language could be more efficiently searched (for entire words only, a fatal flaw for now) by chunking it and seeing which words are length matches. The results look impressive on text like books, and horrible on things like genetic sequences.

And that's the answer. Why isn't indexOf more intelligent? It can't know the structure, repetition, length of what you're checking from one run to the next. The developer has to know that, and make the right decision.