Algorithms/Dynamic Programming: Difference between revisions
From charlesreid1
(→Code) |
|||
| Line 162: | Line 162: | ||
=Code= | =Code= | ||
Link to practice problems [https://people.cs.clemson.edu/~bcdean/dp_practice/] | |||
==Racing Gems== | ==Racing Gems== | ||
| Line 184: | Line 186: | ||
(Note, this problem is only interesting if there are negative numbers.) | (Note, this problem is only interesting if there are negative numbers.) | ||
=Resources= | =Resources= | ||
Revision as of 07:05, 9 August 2017
Notes
Skiena Chapter 8
Fibonacci dynamic programming example
Skiena begins the chapter with the Fibonacci example.
Worst way to do Fibonacci calculation: using recursion
More efficient way to do Fibonacci calculation: cache the intermediate results
Most efficient way: specify the order of evaluation in such a way that the results that are needed are always available. No caching necessary. It's a bit trivial once you reach this point, but it is still dynamic programming - you're just specifying the order in which to evaluate the sub-problems so that you always have the information you need.
Binomial dynamic programming example
Skiena uses the binomial coefficient as an example of a problem with a single expression that can be evaluated, but that single expression will cause integer overflow.
Instead: use identity,
$ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} $
and the base cases:
$ \binom{p}{0} = 1 $
$ \binom{p}{1} = p $
$ \binom{p}{p} = 1 $
To implement this, we can build a table/array, which will contain Pascal's triangle...
Left column: m, top column: n
Now the order of operations is to fill in the edges of Pascal's triangle (the base cases) first
- Initialize column 0 = 1
- Initialize diagonal = 1
Now loop over each row and each column of the row to compute the binomial coefficient:
for i = 1 to n:
for j = 1 to i:
calculate binomial coefficient
Fuzzy string matching example
The next example Skiena covers is an important one: fuzzy string matching.
We are trying to pattern-match the string P in a larger text T.
To do approximate string matching, we must begin with a cost function. The cost function assigns a cost to each possible modification of our original string.
3 classes of edits:
- Substitution (turning one letter into another)
- Insertion
- Deletion
To find the minimum cost function to turn our pattern P into something in the original text T, we need to work backwards: from the ending point, work backwards 1 edit at a time.
Maintain a table:
- D(i,j) is the distance (the number of differences) between P1, P2, ..., Pi and the segment of T ending at j
- D(i,j) is the minimum of the three possible ways to extend the current string
- In other words, it is the minimum of the following:
Substitution: if $ P_i = T_j $ then $ D(i-1, j-1) $, otherwise $ D(i-1, j-1) + 1 $
Insert: $ D(i-1, j) + 1 $
Delete: $ D(i, j-1) + 1 $
Pseudocode for program:
# Non-Dynamic Programming Version of Fuzzy String Matching def string_compare(s, t, i, j): # Base case if i==0 or j==0: return # Option: Match match = string_compare(s, t, i-1, j-1) + match(s[i], t[j]) # Option: Insert insert = string_compare(s, t, i, j-1) + insertdelete(t[j]) # Option: Delete delete = string_compare(s, t, i-1, j) + insertdelete(s[i]) # Pick minimum of these return min(match, insert, delete)
But this is horribly inefficient. Why? Because it does not memoize. This means that it is repeating the same tasks over and over again.
Fuzzy string matching dynamic programming example
Define a C struct to hold information for cells. Each cell is a particular D(i,j) location.
Each cell holds cost, and parent cell. Now, in place of a recursive call, we have a table/cell lookup.
There is a fixed number of entries: size of P times size of T $ |T| \times |P| $. The table has one column for each character in T, and one row for each character in P.
There are two ways to approach this problem:
- We can get the edit distance, but not the actual sequence of edits. This is a bit easier.
- We can get the actual sequence of edits. This is a little harder, but more useful.
To reconstruct the tour, we can use the cost matrix D(i,j) and work backwards to reconstruct the tour from D(i,j).
Here are the functions to define:
- Initialization - this creates the "boundary conditions" or base cases for the dynamic program. This is the starting point that we build on
- Penalty cost - this is a function that determines the actual cost of each edit (modify, insert, delete). The simplest case assigns a cost of 1, but this can tackle more general problems by making it a function.
- Goal cell - this is a function that returns the indices of the endpoint cell. Again, the simplest case is the last cell in the table (which has a fixed size), but making this a function makes it more general.
- Traceback - defining a "trace" action allows printing information about where in the dynamic program we are (for example, printing out the name of each operation, so we can see MMDMDIIMMD to indicate the sequence of edits being made)
Dynamic programming algorithms are easy to design, but getting boundary conditions and index manipulations correct requires great care.
Why is there so much extra fluff?
- Fuzzy string matching is solved, but the extra fluff solves other problems too
- Fuzzy string matching in large text body (slight modifications required to row initialization and goal cell functions)
- Largest common subsequence - finding number of characters in common between two strings, also called the LCS (longest common subsequence) problem; requires modifying match cost function
- Maximum monotone subsequence - finding the fewest numbr of elements to delete to make a sequence monotonic - for example, 243517698 -> 23568 (this is identical to the LCS problem)
Longest Increasing Sequence Example
Three dynamic programming steps:
1. Formulate answer as recurrence relation/recursive algorithm
2. Show that the number of parameters taken on by the recurrence is bounded by a polynomial
3. Specify the order of evaluation for the recurrence so that partial results that are needed are always available
Longest increasing (non-contiguous) subsequence:
Step 1 - formulate answer as recurrence relation
- We want to know the longest subsequence in the preceding numbers - if we are considering $ S_n $, we want to know the longest increasing subsequence in $ S_1, S_2, \dots, S_{n-1} $
- However, we also want to know the length of the longest sequence that $ S_n $ will extend
Step 2 - To formulate the recurrence relation:
- Let L_i be length of the longest subsequence that ends with S_i
- Base case: L_0 = 0
- L_i given by:
$ L_i = \max_{0 < j < i} L_j + 1 \qquad S_j < S_i $
Skiena does not explain this very clearly, in particular the case where i = 1, so... not sure what's going on there.
Code
Link to practice problems [1]
Racing Gems
Link: https://charlesreid1.com:3000/cs/java/src/master/dynamic-programming/racing-gems
This problem is an ICPC programming competition problem.
In this problem, we have a race track of width x = 0 to x = w, height y = 0 to y = h. We have a character that begins at the x-axis (y=0) and moves toward the finish line (y=h) with a steady velocity v.
You control the character's horizontal velocity, -v/r to +v/r, and your goal is to obtain as many gems as possible.
The location of gems are given by (x,y) integer pairs on the grid. Determine the maximum number of gems that can be obtained by the character.
Git repo has solution explanation: https://charlesreid1.com:3000/cs/java/src/master/dynamic-programming/racing-gems
Maximum Value Contiguous Subsequence
Link: https://charlesreid1.com:3000/cs/java/src/master/dynamic-programming/max-val-seq
Given a sequence of integers, $ a_1, a_2, \dots, a_n $, determine the subsequence of integers with the maximum sum.
(Note, this problem is only interesting if there are negative numbers.)
Resources
MIT Open Courseware 6.006
Links to Videos
Dynamic programming 1: Fibonacci, shortest paths: https://www.youtube.com/watch?v=OQ5jsbhAv_M
Dynamic programming 2: Blackjack: https://www.youtube.com/watch?v=ENyox7kNKeY
Dynamic programming 3: Parenthesization, edit distance, knapsack: https://www.youtube.com/watch?v=ocZMDMZwhCY
Dynamic programming 4: guitars, tetris, super mario: https://www.youtube.com/watch?v=tp4_UXaVyx8
MIT Open Courseware 6.046
Links to Videos
The topic of dynamic programming is covered in the MIT 6.046 open courseware playlist: https://www.youtube.com/watch?v=2P-yW7LQr08&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp
The course begins with dynamic programming, by covering the divide and conquer approach for finding a convex hull: https://www.youtube.com/watch?v=EzeYI7p9MjU&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp&index=2
It continues with a divide and conquer approach for matrix algebra and Fast Fourier Transforms: https://www.youtube.com/watch?v=iTMn0Kt18tg&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp&index=4
It then covers divide and conquer for van Emde Boas trees: https://www.youtube.com/watch?v=hmReJCupbNU&index=6&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp
Recitation with more dynamic programming examples: https://www.youtube.com/watch?v=krZI60lKPek&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp&index=12
Advanced dynamic programming: https://www.youtube.com/watch?v=Tw1k46ywN6E&index=14&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp
Dynamic programming for all pairs, shortest paths: https://www.youtube.com/watch?v=NzgFUwOaoIw&index=15&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp