CMSC 435 Algorithm
Analysis & Design
Greedy Algorithms
Greedy algorithms are often employed in optimization problems where it can be proved that a greedy choice selection results in an optimal solution to the given problem.
Let us assume that we have a set of objects X, where X = { x1, x2, x3,..., xn} and S is the solution set (or choice of the objects from X that result in an optimal solution to a given problem P). Initially S = F (the null set).
Let g(X) be a greedy-choice ordering of the set X, g(X) = { x1, x2, x3, ..., xn}, and let P(xi) be true if xi can be added to the solution set when it is selected from g(X) for the problem P. Then we can write the general form for (most) greedy algorithms as follows:
algorithm greedy(X) {
use an appropriate greedy choice property to form g(X);
S = F; //where S is the solution set
while (g(X) is not empty) {
x = deletefront(g(X));
//remove
the top object from g(X) and set it
equal to x
if (P(x))
//if selected item can be added to solution set of problem P
S = S È {x}; //add it to the solution set
}
Let us consider an example of where this algorithm may be applied.
The Activity Selection Problem --
Consider the problem of scheduling activities into a room or auditorium over some period of time. Let X = the set of activities that have requested time in the auditorium where xi denotes the ith such request and it is delineated by its start and end times:
xi = (si, fi)
Let S be the set of scheduled activities, and P the problem of scheduling the largest number of requests in the given time interval from T0 to Tf. P(xi) will be true if, after ordering (or selecting) according to a greedy choice property, the ith activity selected can be scheduled without causing a conflict with any previously scheduled activity.
The first requirement is to determine an appropriate greedy-choice property and then to show that selection based solely upon this property will always yield an optimal solution. There are three possible bases for the greedy-choice property:
1. select according to earliest start time,
2. select according to earliest finish time,
3. select according to shortest duration.
Although the third choice is intuitively appealing, the second choice -- select according to earliest finish time -- is the correct one. We need to prove that this is so.
Proof by contradiction --
Suppose S is a maximal solution set and it does not contain the object with the earliest finish time x1.
Then a maximal set S' can be formed by removing the first item from S and replacing it with x1.
(where by first we mean the object in S with the earliest
finish time.) No confilct with any of the other objects in S can occur since
they must all start later than the end time of the object that was removed.
Since nothing better can be done in scheduling the first event, the problem then becomes the task of scheduling activities in the reduced time interval from f1 to Tf (the interval from the end of the first activity to the end of the scheduling period) from among the remaining choices with start times later than f1. (Note that all of the activities that start before x1 finishes can be eliminated because at best all they can do is replace the activity x1 with one that ends later.) The problem now is the same one as before with a shorter time interval and a reduced set of requests, so we apply the same argument in showing that the non-overlapping activity with the earliest end time can be included in any maximal set for this sub-problem, and continue until the set of requests has been exhausted..
The algorithm for finding a maximal set is given below. It is a special case of the general greedy algorithm. (The maximal set is not necessarily unique, but the algorithm is guaranteed to find one such maximal set.)
algorithm ActivitySelection (tuple X[], double T0, double Tf) {
//X is an array of tuples -- pairs of double precisions reals -- start, finish
Heap H(X); //the heap is ordered in increasing order of the finish times
//the heap constructor heap orders an array that is passed as a parameter -- Q(n)
tuple S[H.size()]; //declare an array to hold up to the full set of requests
int length = 0; //the length of the selection set, and first avail. location in S
double last = T0;
while (!H.empty()) {
tuple z = H.deletemin();
if (z.start >= last) { //no overlap
//S = S È {z} -- add z to the solution set
S[length] = z;
length++;
last = z.finish; //end time of scheduled activities
}
}
}
Kruskal's Minimum
Spanning Tree Algorithm
Given an undirected, connected graph G = (V, E), where V is a set of vertices and E a set of edges (each edge consists of the two end vertices and a "cost" associated with that edge), find a spanning tree of minimum cost for this graph.
We will develop a greedy algorithm to find this minimum spanning tree. The "greedy" organizing property will be to order the edges according to increasing cost. This ordering will be achieved by placing the edges in a min-ordered heap by using the "build-heap" constructor that forms the initial heap in time Q(|E|), where |E| is the cardinality of the set of edges -- the number of edges in the set. An expanded discussion of binary heaps provides pseudo code implementations of the key methods and proofs of the run-time efficiency of these algorithms.
The edge will be represented by a structure that contains a listing of the two end vertices and the cost associated with the edge:
struct edge {
int vert1, vert2;
double cost;
}
Kruskal's minimum spanning tree algorithm is a particular case of the general greedy algorithm shown above.
void Kruskal (edge E[], vector<edge> T, double & S) {
Heap h(E);
//use the "build-heap" constructor to place edges in a min-cost ordered heap
S = 0;
while (!h.empty()) {
edge e = h.deletemin( );
if (find(e.vert1) != find(e.vert2)) {
//inclusion of edge does not create a cycle
S += e.cost; //accumulated cost of the spanning tree edges
T.push_back(e); //add edge to the spanning tree list
union(find(e.vert1), find(e.vert2));
}
}
}
Testing that a selected edge can be added to the spanning tree set without creating a cycle can be performed economically by using a pair of algorithms called find and union. Let the vertices of the graph form a set, U, and let R be the relation "can be reached from" that is defined on this set U. Then for any x, y Î U, x R y is true if there is a path in the spanning tree that connects vertex x to vertex y. Initially all of the vetices of U are disjoint, but as the spanning tree is developed, distinct subsets of U are formed in which all of the elements of the subset are related by R.
R is an equivalence relation:
Let x, y, z be any three elements of U, then
· R is reflexive. Every element of U "can be reached from" itself.
· R is symmetric If x R y is true (x can be reached from y), then y R x is also true.
· R is transitive If x R y and y R z, then x R z.
A relation that is reflexive, symmetric, and transitive is an equivalence relation, and an equivalence relation partitions a set into disjoint subsets whose union covers the entire set. After each addition of an edge to the spanning tree a union of two of the disjoint sets is performed, and when the tree has been completely formed, all of the vertices of U will belong to a single equivalence class.
To implement the union-find algorithm we construct two arrays -- one labeled parent and the other labeled rank, The rank array is initialized to 0 and the parent[i] = i for all i = 0 to |V| - 1.

For each equivalence class, one of the vertices serves as a label or a root for that class. The find function returns the label, and it reduces the height of the tree as it searches through the list of parents for the class label.
int find (int vertex) {
int above = parent[vertex];
while (above != vertex) {
int gparent =
parent[above];
parent
[vertex] = gparent;
vertex
= above;
above =gparent;
}
return above;
}
The union function combines two disjoint equivalence classes into one assigning the label to the one of higher rank.
void union (int i, int j) {
if (rank[i] >= rank[j]) {
parent[j] = i;
if (rank[i] == rank[j])
rank[i]++;
}
else
parent[i] = j;
}
Consider the example below where we are at a stage with four equivalence classes and we are performing a comparison of find(0) and find(6)

The trees represent the information contained in the two arrays. While doing a find(0), during the first iteration of the while loop, a compression is made and the vertices are re-labeled as shown below.

One more iteration of the loop is required for find (0) to return 5 and attach vertex 1 directly to the root. In this second iteration, the parent of 0 is made its grandparent (node 2), and the previous parent (node 1) becomes the new vertex. In find (6), the node labeled above (node 8) is its own parent, so the loop is terminated after one iteration, and 8 is returned. Since both ends of this edge belong to distinct equivalence classes, the edge is added to a minimum spanning tree, and a union of the two labels -- 5 and 8 is performed. The union of 5 and 8 will make 5 the parent of 8 since 5 is the label of a class with higher rank. When this stage of the algorithm is completed, the tables will convey the following picture.

Proof that Kruskal's algorithm produces a MST
Let G be a connected, weighted graph.
Let T be a minimum spanning tree (MST) of G. (G may have several spanning trees of minimal weight.)
Let G' be a subgraph of G that is also contained in a MST of G
Let C be a component of G'.
Let (v, w) be a minimum weight edge with v Î C and w Ï C.
We must show that the graph obtained by adding (v, w) to G' is contained in a MST of G
If (v, w) Î T then we have completed the proof.
If (v, w) Ï T, suppose we add this edge to T. Adding an edge to a MST will create a cycle, so we must remove an edge in T from this cycle and the resulting graph will also be a spanning tree of G. We need to show that the resulting spanning tree will also be an MST.
We need to decide which edge to remove from T. In the diagram below, edge (v', w') is an edge in S (hence also in T) that has one vertex in C and the other not in C. This is the edge that we will remove from T.

Let T' be the new spanning tree created by adding (v, w) and removing (v', w') from T. Since (v, w) is a minimum weight edge with one vertex in C and the other one not, we must have:
weight(v, w) £ weight (v', w')
Therefore
weight(T) ≥ weight (T')
but T is an MST, therefore weight(T) = weight(T') and T' is also an MST.