Chapter 5 Disjoint Set

Equivalence Relations

The Dynamic Equivalence Problem

  • Elements: 1, 2, …, N
  • Sets: SiSj=S_i\cup S_j=\empty\to disjoint

Operations

  1. Union(i, j)
  2. Find(i)

 

Basic Data Structure

Union

set the parent pointer of one of the roots to the other root

Implementation

// Linked List is bad!!!
typedef struct SetStruct* Set;
typedef struct SetNodeStruct* SetNode;
struct SetStruct{
SetNode array[MAXSIZE];
};
struct SetNodeStruct{
SetNode parent;
};

void Union(Set set, int i, int j){ // Union Set j to Set i
set->array[i]->parent = set->array[j];
}
typedef struct SetStruct* Set;
struct SetStruct{
int root[MAXSIZE];
};

void Union(Set set, int i, int j){ // Union Set j to Set i
set->root[j] = i;
}

T(N)=O(1)T(N)=O(1) always

 

Find

Implementation

Set Find(int key, Set sets[], int size);    // Linked List is bad!!!
int Find(int key, Set set){
while(set->root[key] > 0){
key = set->root[key];
}
return key;
}

 

Analysis

Union-Find Operation

try Union(2, 1) find(1);
Union(3, 2) find(2);

Union(N + 1, N) find(N);

then, the tree becomes a linked list while the time complexity becomes O(N2)O(N^2)

 

Smart Union Algorithms

Union by Size

Always union the smaller tree to the larger one

Define set->root[root] = -Size

Make the signal bit of the root ndoe to represent the size of the tree
Initialize all signal bit by -1

When trying to Union Set i and Set j, always let the Set with larger size unioned by the one with smaller size

Now the time complexity reduces to T(N)=O(N)T(N)=O(N)

Lemma Let T be a tree created by union-by-size with N nodes, then

height(T)int(log2N)+1height(T)\le\mathrm{int}(\log_2N)+1

the worst case is binary tree, while the best case is 1 root node with (N - 1) leaf nodes

time complexity of N union and M find: T(N,M)=O(N+Mlog2N)T(N,M)=O(N+M\log_2N)

 

Union by Height

Always union the shallower tree to the deeper one

 

Path Compression

int Find(int key, Set set){
if(set->root[key] <= 0){
return key;
}
else{
// Let key and its all ancestors node point to the root DIRECTLY
return set->root[key] = Find(set->root[key], set);
}
}
int Find(int key, Set set){
int root, trai, lead;
root = key;
while(set->root[root] > 0){
root = set->root[root]
}
for(trail = key; trail != root; trail = lead){
lead = set->root[trail];
set->root[trail] = root;
}
return root;
}

Only can be used with Union-by-Size, because path compression will change the height of the tree, we need to change the height frequantly

 

Worst Case For Union-by-Rank and Path Compression

Lemma(Tarjan) k1Mα(M,N)<T(M,N)<k2Mα(M,N)k_1M\alpha(M,N)<T(M,N)<k_2M\alpha(M,N)

Ackermann’s Function

A(i,j)={2j,i=1,j1A(i1,2),i2,j=1A(i1,A(i,j1)),i2,j2A(i,j)=\begin{cases} 2^j&,i=1,j\ge 1\\ A(i-1,2)&,i\ge 2,j=1\\ A(i-1,A(i,j-1))&,i\ge 2,j\ge 2 \end{cases}

A(2,4)=265536A(2,4)=2^{65536}

α(M,N)=min{i1  |  A(i,int(M/N))>logN}\alpha(M,N)=\min\Set{i\ge 1|A(i,\mathrm{int}(M/N))>\log N}

for common M and N, α(M,N)O(logN)4\alpha(M,N)\le O(\log^\star N)\le 4
logN\log^\star N: inverse Ackermann’s Function

So α(M,N)\alpha(M,N) can be taken as a const number

While k1,k2k_1,k_2 are const numbers, the final time complexity is T(M,N)=O(M)T(M,N)=O(M)