Key Idea
Whenever a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?
Answer: The largest prefix of P[0..j] that is a suffix of [1..j]
The KMP algorithm pre-processes the pattern string to find matches of the prefixes of the pattern with the pattern itself. The information thus calculated is used to shift the pattern appropriately whenever a mismatch occurs or a comparison fails. The compuatation is performed by the function called KMP prefix function
KMP Prefix Function
The prefix function F(j) is defined as the size of the largest prefix in P[0..j] that is also a suffix of P[1..j]
Code for computing the KMP Prefix Function
computeKmpPrefix(const std::string &pattern){ int patternSize = pattern.size(); vector<int> kmpPrefix(patternSize); size_t prefixPos = 0; size_t suffixPos = 1; while(suffixPos < patternSize){ if(pattern[prefixPos] == pattern[suffixPos]){ kmpPrefix[suffixPos] = prefixPos + 1; prefixPos++; suffixPos++; } else if(prefixPos > 0){//found some match prefixPos = kmpPrefix[prefixPos -1];//backtrack for matching prefix e.g. aaaaabaaaaaa } else{ kmpPrefix[suffixPos] = 0; suffixPos++; } } return kmpPrefix; }
The algorithm in picture
Example
The Complete Code
#ifndef _PatternMatcher_H_
#define _PatternMatcher_H_
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class PatternMatcher{
public:
static int kmpSearch(const string& text, const string& pattern);
private:
static vector<int> computeKmpPrefix(const string& pattern);
PatternMatcher();
PatternMatcher(const PatternMatcher&);
const PatternMatcher& operator=(const PatternMatcher&);
};
#endif //_PatternMatcher_H_
#include "PatternMatcher.h"
vector<int> PatternMatcher::computeKmpPrefix(const std::string &pattern){
int patternSize = pattern.size();
vector<int> kmpPrefix(patternSize);
size_t prefixPos = 0;
size_t suffixPos = 1;
while(suffixPos < patternSize){
if(pattern[prefixPos] == pattern[suffixPos]){
kmpPrefix[suffixPos] = prefixPos + 1;
prefixPos++;
suffixPos++;
}
else if(prefixPos > 0){//found some match
prefixPos = kmpPrefix[prefixPos -1];//backtrack for matching prefix e.g. aaaaabaaaaaa
}
else{
kmpPrefix[suffixPos] = 0;
suffixPos++;
}
}
return kmpPrefix;
}
int PatternMatcher::kmpSearch(const std::string &text, const std::string &pattern){
size_t textSize = text.size();
size_t patternSize = pattern.size();
if(patternSize > textSize)
return -1;
vector<int> kmpNext = computeKmpPrefix(pattern);
int tIdx = 0;
int pIdx = 0;
while(tIdx < textSize){
if(pattern[pIdx] == text[tIdx]){
if(pIdx == patternSize - 1)
return tIdx - (patternSize - 1);
tIdx++;
pIdx++;
}
else if(pIdx > 0){
pIdx = kmpNext[pIdx - 1];
}
else{
tIdx++;
}
}
return -1;
}
#include "PatternMatcher.h"
int main(){
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacab")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "baabb")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacad")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacaab")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacab")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "aabaccaba")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacaabaccabacabaabb")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "")
<< endl;
cout << PatternMatcher::kmpSearch
("", "abacaabaccabacabaabb")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "bacaabaccabacabaab")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "abacaabac")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "ccabacabaabb")
<< endl;
cout << PatternMatcher::kmpSearch
("abacaabaccabacabaabb", "bacaabaccabacabaabb")
<< endl;
return 0;
}
Absolutely amazing! Great explanation - great code.
ReplyDeleteExactly what I was looking for!
Thanks a ton!
(Though you could use a unsigned int patternSize)
Nice description and implementation.
ReplyDeleteHow can we reduce the code should also be mentioned.Above all, the technique and thinking plan is excellent.It is really to me as a naive programmer.
Regards
ProgrammingUnlimited
I was looking for a good explanation of KMP and by far you have given the best one. Awesome Job !!
ReplyDelete@Ritesh - Thank you!
ReplyDeleteThanks for the post!! Finally I understood the algo.
ReplyDeleteReally nice coding style, readable and understandable code. Keep on the good work!
ReplyDeletePS: what is the name of the font you use for the code?
Hi Αριστοφάνης Ροντογιἀννης,
ReplyDeleteThanks for your wonderful comments.
I use font-family: 'PT Mono' from Google Web Fonts
Pankaj