Edit Distance Problem

By | October 8, 2015

Edit Distance Problem Statement:

Given two stings, String A and String B, we need to convert A to B in minimum number of steps possible. The following 3 options are allowed:

  1. Insert a character.
  2. Delete a character.
  3. Replace a character.

All the three operations are assumed to equal amount of time ( say 1 unit).

Edit Distance Examples:

  1. Insertion: String A= “Hell” , String B =”Hello”
    Here we just need to add ‘o’ to the end of the string A.
    So minimum operations required are : 1.
  2. Deletion: String A=”swaggy”, String B=”swag”
    Here we just need to delete ‘g’ and ‘y’ from A to make it B.
    So, minimum operations required are : 2.
  3. Replacement:  String A=”Cat” ,String B=”Gap”
    Here we need to replace the ‘t’ with ‘p’ and ‘C’ with ‘G’.
    So, minimum operations required are : 2.

Edit Distance Solution:

Here, the answer to a bigger problem can be derived from smaller sub-problem using DP.

The idea is to start from right side of string A and String B (Let M and N be the respective string lengths) and then make required changes using the following two points :

  1. If the Last character are Same,  we ignore them and recur calculate the answer for strings of length M-1 and N-1.
  2. If they are different, then we consider the following 3 sub-problems, take their minimum and add 1 to get the answer:
    1. Insertion of last character: Recur for M and N-1.
    2. Deletion of last character: Recur for M-1 and N.
    3. Replacement of last character: Recur for M-1 and N-1.

Edit Distance State of DP: 

E[i,j] denotes the minimum moves required to convert A[1 to i] (means first i character of String A) to B[1 to j] .

Edit Distance Recursion:

E[i,j] = min{ E[i][j-1] , E[j-1][i] , E[i-1,j-1] +diff(i,j)} where

diff(i,j) = 1  if A[i]!=B[j]

= 0 if A[i]=B[j]

Here are the videos that explain by taking an example:

Here is the working code:

#include<bits/stdc++.h>
using namespace std;

int EditDistance( string A , string B , int n ,int m){
   
   /*An extra row and column has been 
   included to take care of the cases when
   either of the sritng is of length 0*/
   int E[n+1][m+1];
   int i,j;
   
   for( i=0;i<=n;i++){
      for(j=0;j<=m;j++){
         /*String A is empty,
         Only way is to Insert everything*/
         if( i==0)
            E[i][j]=j; 
         /*String B is empty,
         Only way is to Delete everything*/	
         else if(j==0)
            E[i][j]=i; 
         /*If the last character is same*/	
         else if( A[i-1]==B[j-1])
            E[i][j] = E[i-1][j-1];
         /*If the last character is different*/	
         else
         {
            E[i][j] = min( E[i][j-1],min( E[i-1][j], E[i-1][j-1]))+1;
         }	
      }
   }
   return E[n][m];
}

int main() {
   string A = "Cloud Kaksha";
   string B = "Code Kaksha";
   cout<<"The Answer for this conversion is\n";
   cout<<EditDistance( A , B , A.length(),B.length());
   return 0;
}

Output:

The Answer for this conversion is
3

Time Complexity: O(M*N)
Auxiliary Space: O(M*N)

An application of this algorithm is the Spoj EDIST question. To understand how solution works, read the question and then refer to Spoj EDIST solution.