Professor Shai Simonson

CSC-311 Algorithms

Assignment 3 - 40 points

Due a week after Easter Break - April 17, Monday

1. Convex Hull Algorithms (10 points)

To visually test your data for 2-dimensional geometry algorithms, you can use https://www.geogebra.org/graphing?lang=en

Write code to define the point class, as discussed in class and in the notes.
Write code to implement the Graham Scan algorithm.
Write code to implement Jarvis’ algorithm.
Test your programs on the data: (A, 0, 0) (B, -5, -2) (C, -2, -1) (D, -6, 0) (E, -3.5, 1) (F, -4.5, 1.5) (G, -2.5, -5) (H, 1, -2.5) (I, 2.5, .5) (J, -2.2, 2.2).
Using test data of 1000 points (i, i²), for i = 1 to 1000, test both algorithms to see how fast each runs. Repeat for 10000 points (i, i²), for i = 1 to 10000. Report all results.
Add the three points: (0, 0), (1001, 0), (1001, 1002001), to the 1000 points in (e) and test again. Add the three points: (0, 0), (10001, 0), (10001, 100020001), to the 10000 points in (e) and test again. Report all results.
In (e) all the points are on the convex hull, and in (f) only the added three points are on the hull. Which algorithm (Jarvis or Graham) would you expect to run faster on (e) and on (f). Compare this to your empirical results.

2. A Recursive Convex Hull Algorithm Using Merging (8 points)

Write code to recursively compute a convex hull in O(n log n) time. The algorithm is described below and should remind you of Mergesort.

a. Explain why your method for merging is O(n).

b. Implement these methods and test the algorithm on the data from problem 1.

c. Compare results with the Graham and Jarvis algorithms.

Algorithm to Compute a Convex Hull Recursively Using Merging
    // base case
The base case will be either two or three points. If the number of points is three, return the points in counterclockwise order (you will need to use one call of "orient" to check this). If you think of your array as circular then for two points, it makes no difference what order you store the points.

else
{
    Sort the points by x-coordinate.
    Recursively compute the convex hull of the leftmost half of the points, and recursively compute the convex hull of the rightmost half of the points. Recall that a convex hull is returned as an array of points in counterclockwise order.
    For simplicity, you can assume that all the x-coordinates of the points are distinct.
    "Merge" the two convex hulls together into one convex hull.
}

Merging disjoint
convex hulls

Note that the two convex hulls recursively computed are non-intersecting. In order to merge two non-intersecting convex hulls, the trick is to identify the bridges between the two hulls, indicated by dotted blue lines in the picture above. These bridges will be identified by their endpoints, Top_left, Top_right, Bottom_left, and Bottom_right. Note that these points need not be the topmost and bottom-most points in each hull. Indeed, in the picture example above, the point clockwise of the Top_right bridge point is slightly higher than the Top_right bridge point. The new "merged" convex hull is the set of points in the left convex hull from Top_left through Bottom_left followed by the points in the right convex hull from Bottom_right to Top_right.

Note that there are some degenerate cases that might cause some bugs in your algorithm. To make things easier, you may assume that these cases never occur. Specifically, you may assume that there are no three points on the same line. For completeness, I describe the details of these cases below -- but you can skip the small print and continue with A Method to Determine... without missing anything.

1. Both hulls might be on the same straight line. When you merge these hulls, you need to make sure that the points are ordered correctly because the bridges may cause the points to permute strangely. To make sure of this, check every merged hull, and if the the points are all oriented 0 (straight line), then reorder them by their y-coordinate.

2. The hulls are two different straight lines. See the figure below.

In this case, the Bottom and Top bridge points on one hull may be identical. When merging the hulls in this case, make sure to include only one point (Bottom = Top) from the side where the Bottom and Top are identical.. In the example below, the Top-left and Bottom-left are distinct but the Top-right and Bottom-right are identical, so you include all the points from the left hull as usual but only one from the right. The lines indicate the top and bottom bridges. The merged hull in counterclockwise order is: Top-left, followed by the points on the left line in counterclockwise order through Bottom-left, followed by Bottom-right (same as Top-right).

A Method to Determine the Bottom_right and Bottom_left Points that Define the Lower Bridge

Let x be the rightmost point in the left convex hull, and y be the leftmost point in the right convex hull.
    // The values x and y can be determined by examining the x-coordinates of each point.

While the line x-y is not tangent to both hulls (on the bottom) do // It is tangent to the bottom when the next point in the left hull is a right turn and the next point in the right hull is a left turn.
{
   While (x-y is not tangent to the left hull) do
        {set x to the next clockwise point in the left hull}
   While (x-y is not tangent to the right hull) do
        {set y to the next counterclockwise point in the right hull}
}

    // When this loop is finished, x will be Bottom_left and y will be Bottom_right. A similar method can be used to determine Top_left and Top_right.

How to determine if a line is tangent to a hull:

A line is tangent to a hull if it intersects the hull in exactly one point. You check differently for being tangent to the top or bottom. In order to determine whether or not a line x-y is tangent to the bottom of the left hull, let z be the next clockwise point of the left hull. Check whether the sweeping movement from line y-x to line y-z, (i.e., y.orient(x,z) ) is straight or clockwise (tangent), or counterclockwise (not tangent). In order to determine whether or not a line x-y is tangent to the bottom of the right hull, let z be the next counterclockwise point of the right hull. Check whether the sweeping movement from line x-y to line x-z is clockwise (not tangent) or straight or counterclockwise (tangent). Similar if-statements work for checking whether a line is tangent to the top of the left and right hulls. Note that when the hull gets to be just two points, you can simply use the higher and lower points as the upper and lower bridge points. That special case helps avoid a lot of bugs.

You should be sure to implement the merge method in O(n) time, which gives a recurrence equation just like Mergesort, namely: T(n) = 2T(n/2) + n, whose solution is O(n log n). Note that if you implement the merge method carelessly, it can be O(n²) time which results in a recurrence of T(n) = 2T(n/2) + n², whose solution is O(n²).

3. A Line Segment Class (5 points)

A line segment is made up of two points.

a. Design a method (no code necessary) to determine if two line segments intersect.

b. Design a brute force algorithm to determine all pairs of intersecting line segments among n line segments. Analyze the complexity of your algorithm The "sweep-line" algorithm we mentioned in class runs in O(n log n + k) algorithm to find all the k intersection points among n line segments. Compare the time complexity of your result to the sweep-line algorithm.

c. Design a brute force algorithm that given a set of line segments, determines the largest number of segments that do not mutually intersect. This means that not one of the lines intersects with any other. What is the time complexity of your algorithm? Note that this problem in two dimensions is NP-Complete, see notes below.

(Notes of interest on the complexity of problem (c), thanks to Pankaj Kumar Agarwal, an expert in computational geometry at Duke University: If the line segments are in one dimension then you get an interval graph and the maximum independent set of nodes on an interval graph can be found in O(n log n), although the algorithm is not obvious. If the segments are in two dimensions, then the problem is NP-complete. Pankaj reports to me as of Winter 2003, that "recently one of my students has developed a couple of approximation algorithms for this problem.")

4. World Series Odds (2 points)

The chances of team A winning a World Series (a best 4 out of 7 competition for you non-baseball fans) in a match with an evenly matched team B, is denoted as P(a,b), where a is the number of games already won by team A, and b is the number of games already won by team B. Design a dynamic programming algorithm to calculate P(a,b). Your solution should work for any series of odd length 2n+1, not just for 7.

5. The Electoral College (5 points)

Modify the Knapsack algorithm and write code to solve the Partition problem. The Partition problem gives a set of integers and asks if the set can be partitioned into two parts so that the sums of the integers in each part are equal. (Hint: Let P(n,k) be True if the sum k can be achieved using a subset of the the first n integers. Describe a recurrence for P(n,k) and go from there.)

Write code for your algorithm and use it to check whether or not it was possible in 2010 to have a tie vote in our electoral college.

Enhance your code to backtrack and print out the actual values whose sums are equal. In this case, the numbers in each set sum to 269.

In 2010, here were the 51 electoral college values (DC has its own number):

3,3,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,8,8,9,9,9,10,10,10,10,11,11,11,11,12,13,15,15,15,17,20,21,21,27,31,34,55.

6. Cylinders (2 points)

Consider an n by n array of integers (a_ij) 1 <= i,j <= n rolled into a cylinder, so that the top and bottom rows are glued together.

A path is to be threaded from the left side of the cylinder to the right side, subject to the restriction that from any given square (i, j) it is possible to move to (i+1, j+1), (i, j+1) or (i-1, j+1). (The arithmetic is modulo the number of rows because of the gluing). That is, you can move directly to the right, up and to the right, or down and to the right. The path may begin at any position on the left side and end at any position on the right side. The cost of such a path is the sum of the integers in the squares through which it passes. You should figure out the minimum cost path. Note that the numbers can be negative.

a. Design a recursive algorithm to this problem and analyze its complexity.

b. Design a dynamic programming algorithm and show its complexity to be O(n²) (Note, this is really a special case of the shortest path problem on an acyclic graph, so it can be solved by conventional techniques in O(e) time but I don’t want you to do it this way).

7. Applications of the Knapsack Problem (2 points)

Write a recursive algorithm to determine how many different ways there are to make n cents in change using any coins from among pennies, nickels, dimes, quarters and half dollars. (e.g. there are 6 ways to make 17 cents in change).

Write a dynamic programming version of this algorithm and analyze its complexity.

Big hint: This is very similar to the Partition problem (number 5 in this homework). Let C(i,j) be the number of ways to make change for j cents using coins a₁ through a_i, where coin a₁ is a penny coin a₂ is a nickel etc. Note that this hint solves the problem independent of the actual denominations of the coins.

8. Liquid Knapsack (2 points)

Consider a version of the knapsack problem (liquid version) where you are allowed to place fractional amounts of each object into the knapsack.

a. Describe a greedy style algorithm that solves this problem.

b. Analyze how much time your algorithm requires and explain why it works.

9. Approximate String Matching (4 points)

String matching algorithms are an important application used in editors, word processors, and computational biology. Spell-checker programs must flag a word and give suggestions for its correct spelling; web searching tools need to help you correct your misspelled queries; hence these programs need to match strings approximately. That is, they must be able to check how "far apart" two strings are from each other. Approximate string matching algorithms are also prominent in computational biology where they are used to compare "similar" strands of DNA, each of which is represented as a sequence of characters from the set {A, C, T, G}.

We say that two strings are distance n apart if there is a way to change one into another by a combination of n insertions, deletions or changes of single letters.

Let s = s₁s₂…s_m be a search string of length m and t = t₁t₂…t_p be a text string of length p.
Let C(i, j) be the minimum distance between s₁…s_i and a segment of t ending at t_j. That is, without penalty, you can start the of match s₁...s_i at any symbol of t as long as it ends at t_j. You need not match s with all of t.
For example: if s = "testme" and t = "thisisateststring" then C(5, 11) = 1, because the string "testm" can be matched with the string "test", and only one deletion is needed.

    a. Explain why for i > 0 and j > 0:
        if s_i = t_j
                then C(i, j) = min {C(i-1, j)+1, C(i, j-1)+1, C(i-1, j-1)}
                else C(i, j) = min {C(i-1, j)+1, C(i, j-1)+1, C(i-1, j-1)+1}
    b. Explain why: C(0, j) = 0 and C(i, 0) = i. (Hint: If we insisted on matching s with all of t, rather than just a segment of t, then C(0, j) = j.)

    c. Design a dynamic programming algorithm that given a search string of length m and a text string of length p, finds j where C(m, j) is minimum.

    d. Explain how to backtrack and find out exactly which letters in the text and pattern match, and which are deleted or inserted.