kmp algorithm
The kmp algorithm is used for pattern matching of strings, that is, to find the first occurrence of the pattern string in the target string
for example
abababc
Then bab is at its position 1 and bc is at its position 5
The first and simplest thing that comes to mind is to brute force match character by character, but that would have a time complexity of O(m*n)
The kmp algorithm guarantees a time complexity of O(m+n)
rationale
An example:
Move after finding that x is different from c
a is different from x. Move again
comparing to c and y at this point.
So the next step moved to the following
This time the move is different from the previous two moves, before each time after comparing to the character position of the long string above, directly align the first character of the pattern string with it, this time it does not, the reason is that this time before the move, y and c alignment, but y before the ab is the same as their own prefix ab, so the ab and do not have to be compared again, directly from the third position of the comparison, as shown in the figure:
So the kmp algorithm for this case just uses the longest identical prefix and suffix before the current comparison character, and then aligns the prefix to the long string above and continues the comparison
later strings.
Here comes an important point in the kmp algorithm, how to find the longest identical prefix/suffix before each character in the pattern string?
Here's an example to continue with:
The following numbers record the length of the longest prefix and suffix of the same substring ending with the character, initialized to 0, and the first character under the number of confirmation to 0
Then start comparing:
a is different from b, then the number under b is also 0 Similarly a is different from c, the number under c is also 0. Next a is aligned with a as follows
At this point, a is the same as a, then the number below a is 1, which is the index of the currently compared character in the second row of strings + 1
Next b is the same as b, c is not the same as a, then at this time the upper string does not move, the following string moves to the current position of the comparison that is c's first digit below the number of positions, as shown in the figure:
premovement
The first digit of c is b, and the digit below b is 0. So the 0th digit of the following string is compared to the previous pair of positions against it, i.e.:
The current comparison position is shown by the arrow, and then the comparison continues backward, all the way to c vs. a:
At this point c is not the same as a, then compare the position of the digit below the previous character of the following character as shown
That is, the position 2 is aligned with the above comparison position:
At this point, c is the same as c. The entire string self-comparison is complete:
At this point there may be no understanding of why the match is unsuccessful when it is necessary to compare the position of its previous character under the number, that is because this is to find the previous character position under the longest identical prefix in the longest identical prefix, just to cite the example:
At this point a is preceded by abcab, so the longest identical prefix to find abcab is ab, which is
Then just move to the position of ab and ab against it to continue the comparison
time complexity
Simply put, to find the longest prefix and suffix before each character in the pattern string, if the length of the pattern string is m, then the point of the string above is always moving forward, and the whole of the string below is always moving forward, the final result of the move will be the length of 2m, so the number of comparisons is a maximum of 2m, the time complexity of O(m)
The kmp algorithm utilizes the result of this method of finding the longest identical prefix and suffix before each character in the pattern string, and then uses a similar method of comparing and shifting, similar to the explanation above, if the length of this string to be matched is n, then the longest shifting example is no more than 2n, so the total time complexity is O(m+n)
specific code
Validation of incoming parameters is not done here, and care should also be taken when using it.
def same_start_end(s): """Maximum number of character digits with the same prefix and suffix""" n = len(s) # The length of the entire string j = 0 # Prefix matches point to i = 1 # Suffix matches point to result_list=[0]*n while i < n: if j == 0 and s[j] != s[i]: # Comparisons are not equal and the comparison is already on the first character result_list[i] = 0 # The value is 0 i += 1 # Move backward elif s[j] != s[i] and j != 0: # Compare unequal, set the value of j to the value in result_list one before j, in order to find the longest identical prefix and suffix in the previously matched substring. j = result_list[j-1] elif s[j] == s[i]: # Equal then continue to compare result_list[i] = j+1 j = j+1 i = i+1 return result_list def kmp(s,p): """kmp algorithm, s is a string, p is a pattern string, return value is the index of the first character of the first string matched, no match returns -1""" s_length = len(s) p_length = len(p) i = 0 # Pointing to s j = 0 # Pointing to p next = same_start_end(p) while i < s_length: if s[i] == p[j]: # Corresponding characters are the same i += 1 j += 1 if j >= p_length: # Exact match return i-p_length elif s[i] != p[j]: # Not the same if j == 0: # The comparison with the pattern is the first character of the pattern i += 1 else: # Take the last character of the prefix with the longest identical prefix before the current character of the pattern to continue the comparison j = next[j] if i == s_length: # No exact match substring found return -1
summarize
The above is a small introduction to the python implementation of the kmp algorithm example code, I hope to help you, if you have any questions welcome to leave me a message, I will reply to you in a timely manner!