In computer science, the Rabin–Karp algorithm or Karp–Rabin algorithm is a string searching algorithm created by Richard M. Karp and Michael O. Rabin that uses hashing to find any one of a set of pattern strings in a text.
The Rabin-karp algorithm is a better way to solve string matching problem. The main feature of this algorithm which makes it best is that :
* It uses hashing function.
* It can deal with multiple pattern matching.
* Effective and easy to implement.
Description :-
* p denote the decimal value of P[1….m] where m is the length of the pattern .
* t(s) denote the decimal value of Text txt[s+1…s+m] where s is the shift value.
* We can compute p in time O(m) using HORNER’S RULE.
p = P[m]+ d(P[m-1] + d(P[m-2]+……d(P[2] + dP[1]….))) [ Where d is decimal value. If we take string as numbers from 0 to 9 d will be 10 so as for characters d will be 256]
* Similarly when shift value is s=0 we can calculate txt[1….m] as t(0) in O(m) time .
t(1) is txt[2….m+1] which can again be calculated in O(m) time .
To Compute remaining ti we can say that :
ts+1 = d[ts – dm-1 T[s+1] ] + T[s+m+1]
* Subtracting d(m-1) T[s+1] removes the higher order digit and adding T[s+m+1] brings in the appropriate lower digit.
- We compute p and t(s) values modulo a suitable prime number in O(n-m+1) to overcome the problem of overflow when decimal value is too large.
- We compute hash values for all the pattern If the two hash values are equal but the characters are not the same it is called a SPURIOUS HIT . As:
How do we calculate ts+1 %q from ts % q (q=prime number) ??? –
3 : old – high order value, 2 : new – low order value
14152 ≡ ((31415 – 3 * 10000)*10 +2) mod 13
≡ 8 ( mod 13 )
now c++ implementation of Rabin- Karp Algorithm :
#include<iostream> #include<cstring> #include<cstdio> #define d 256 // Number of characters in input alphabet using namespace std; void search(char *txt , char *pat , int q){ int m=strlen(pat); int n=strlen(txt); int i,j; int p=0; // hash value for pattern int t=0; // hash value for text int h=1; // d^(m-1) value // The value of h would be "pow(d, M-1)%q" for(i=0;i<m-1;i++){ h=(d*h)%q; } // Calculate the hash value of pattern and first window of text for(i=0;i<m;i++){ p=(d*p+pat[i])%q; t=(d*t+txt[i])%q; } for(i=0;i<=n-m;i++){ // Check if the current sliding window of text and pattern have same hash values if(t==p){ // Check if all characters are same or it's a SPURIOUS HIT ! for(j=0;j<m;j++){ if(txt[i+j]!=pat[j]) { break; } } if(j==m) cout<<endl<<"Pattern matched at index "<<i<<endl; } // Calculate hash value for next window of text: Removing Old-high oder digit // adding new-low order digit if(i<n-m){ t = (d*(t - txt[i]*h) + txt[i+m])%q; // We might get negative value of t, converting it to positive if(t < 0) t = (t + q); } } } int main(){ char a[50],b[50]; cout<<"Enter the txt string :"<<endl; gets(a); cout<<endl<<"Enter the sub-string :"<<endl; gets(b); int q=101; // A Prime Number search(a,b,q); return 0; }
Output :-