Edit Distance

Problem

Given two strings str1 and str2 and below operations that can performed on str1. Find minimum number of edits (operations) required to convert ‘str1’ into ‘str2’.

  1. Insert
  2. Remove
  3. Replace

All of the above operations are of equal cost.

Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Input:   str1 = "geek", str2 = "gesek"
Output: 1
We can convert str1 into str2 by inserting a 's'.

Input: str1 = "cat", str2 = "cut"
Output: 1
We can convert str1 into str2 by replacing 'a' with 'u'.

Input: str1 = "sunday", str2 = "saturday"
Output: 3
Last three and first characters are same. We basically
need to convert "un" to "atur". This can be done using
below three operations.
Replace 'n' with 'r', insert t, insert a

Algorithm

Method: Dynamic Programming

Problem Structure:

  • Define OPT(i, j) = min cost of aligning prefix string $x_1, x_2, …, x_i$ and $y_1, y_2, …y_j$

  • Different cases:

    • Case 1: OPT(i, j) matches $x_i, y_j$

      Pay mismatch for $x_i$ and $y_j$ + OPT(i - 1, j - 1)

    • Case 2: OPT(i, j) leaves $x_i$ unmatched

      Pay gap for $x_i$ + OPT(i - 1, j)

    • Case 3: OPT(i, j) leaves $y_j$ unmatched

      Pay gap for $y_j$ + OPT(i, j - 1)

Time Complexity: O(mn), where m is the length of string x and n is the length of string y

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def edit_distance_dp(x, y):
opt = [[0] * (len(y) + 1)for _ in range(len(x) + 1)]
for idx_x in range(len(x) + 1):
for idx_y in range(len(y) + 1):
if idx_x == 0:
opt[idx_x][idx_y] = idx_y
elif idx_y == 0:
opt[idx_x][idx_y] = idx_x
else:
alpha = 1
if x[idx_x - 1] == y[idx_y - 1]:
alpha = 0
opt[idx_x][idx_y] = min(opt[idx_x - 1][idx_y - 1] + alpha,\
opt[idx_x - 1][idx_y] + 1,\
opt[idx_x][idx_y - 1] + 1)
return opt[len(x)][len(y)]

Weird Bug

The following two scripts will produce different answers.

1
opt = [[0] * (len(y) + 1)for _ in range(len(x) + 1)]
1
opt = [[0] * (len(y) + 1)] * (len(x) + 1)

Reference