{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/interactive-coding-challenges)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Solution Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem: Given two strings, find the longest common subsequence.\n", "\n", "* [Constraints](#Constraints)\n", "* [Test Cases](#Test-Cases)\n", "* [Algorithm](#Algorithm)\n", "* [Code](#Code)\n", "* [Unit Test](#Unit-Test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constraints\n", "\n", "* Can we assume the inputs are valid?\n", " * No\n", "* Can we assume the strings are ASCII?\n", " * Yes\n", "* Is this case sensitive?\n", " * Yes\n", "* Is a subsequence a non-contiguous block of chars?\n", " * Yes\n", "* Do we expect a string as a result?\n", " * Yes\n", "* Can we assume this fits memory?\n", " * Yes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test Cases\n", "\n", "* str0 or str1 is None -> Exception\n", "* str0 or str1 equals 0 -> ''\n", "* General case\n", "\n", "str0 = 'ABCDEFGHIJ'\n", "str1 = 'FOOBCDBCDE'\n", "\n", "result: 'BCDE'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithm\n", "\n", "We'll use bottom up dynamic programming to build a table. \n", "\n", "
\n",
"\n",
"The rows (i) represent str0.\n",
"The columns (j) represent str1.\n",
"\n",
" str1\n",
" -------------------------------------------------\n",
" | | | A | B | C | D | E | F | G | H | I | J |\n",
" -------------------------------------------------\n",
" | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
" | F | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |\n",
" | O | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |\n",
"s | O | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |\n",
"t | B | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |\n",
"r | C | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |\n",
"0 | D | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |\n",
" | B | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |\n",
" | C | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |\n",
" | D | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |\n",
" | E | 0 | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 |\n",
" -------------------------------------------------\n",
"\n",
"if str1[j] != str0[i]:\n",
" T[i][j] = max(\n",
" T[i][j - 1],\n",
" T[i - 1][j])\n",
"else:\n",
" T[i][j] = T[i - 1][j - 1] + 1\n",
"\n",
"\n",
"Complexity:\n",
"* Time: O(m * n), where m is the length of str0 and n is the length of str1\n",
"* Space: O(m * n), where m is the length of str0 and n is the length of str1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Code"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class StringCompare(object):\n",
"\n",
" def longest_common_subseq(self, str0, str1):\n",
" if str0 is None or str1 is None:\n",
" raise TypeError('str input cannot be None')\n",
" # Add one to number of rows and cols for the dp table's\n",
" # first row of 0's and first col of 0's\n",
" num_rows = len(str0) + 1\n",
" num_cols = len(str1) + 1\n",
" T = [[None] * num_cols for _ in range(num_rows)]\n",
" for i in range(num_rows):\n",
" for j in range(num_cols):\n",
" if i == 0 or j == 0:\n",
" T[i][j] = 0\n",
" elif str0[j - 1] != str1[i - 1]:\n",
" T[i][j] = max(T[i][j - 1],\n",
" T[i - 1][j])\n",
" else:\n",
" T[i][j] = T[i - 1][j - 1] + 1\n",
" results = ''\n",
" i = num_rows - 1\n",
" j = num_cols - 1\n",
" # Walk backwards to determine the subsequence\n",
" while T[i][j]:\n",
" if T[i][j] == T[i][j - 1]:\n",
" j -= 1\n",
" elif T[i][j] == T[i - 1][j]:\n",
" i -= 1\n",
" elif T[i][j] == T[i - 1][j - 1] + 1:\n",
" results += str1[i - 1]\n",
" i -= 1\n",
" j -= 1\n",
" else:\n",
" raise Exception('Error constructing table')\n",
" # Walking backwards results in a string in reverse order\n",
" return results[::-1] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unit Test"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting test_longest_common_subseq.py\n"
]
}
],
"source": [
"%%writefile test_longest_common_subseq.py\n",
"from nose.tools import assert_equal, assert_raises\n",
"\n",
"\n",
"class TestLongestCommonSubseq(object):\n",
"\n",
" def test_longest_common_subseq(self):\n",
" str_comp = StringCompare()\n",
" assert_raises(TypeError, str_comp.longest_common_subseq, None, None)\n",
" assert_equal(str_comp.longest_common_subseq('', ''), '')\n",
" str0 = 'ABCDEFGHIJ'\n",
" str1 = 'FOOBCDBCDE'\n",
" expected = 'BCDE'\n",
" assert_equal(str_comp.longest_common_subseq(str0, str1), expected)\n",
" print('Success: test_longest_common_subseq')\n",
"\n",
"\n",
"def main():\n",
" test = TestLongestCommonSubseq()\n",
" test.test_longest_common_subseq()\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" main()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Success: test_longest_common_subseq\n"
]
}
],
"source": [
"%run -i test_longest_common_subseq.py"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}