数据结构基础(六)

Chapter 5 Disjoint Set

Equivalence Relations

The Dynamic Equivalence Problem

Elements: 1, 2, …, N
Sets: $S_i\cup S_j=\empty\to$ disjoint

Operations

Union(i, j)
Find(i)

Basic Data Structure

Union

set the parent pointer of one of the roots to the other root

Implementation

// Linked List is bad!!!
typedef struct SetStruct* Set;
typedef struct SetNodeStruct* SetNode;
struct SetStruct{
    SetNode array[MAXSIZE];
};
struct SetNodeStruct{
    SetNode parent;
};

void Union(Set set, int i, int j){   // Union Set j to Set i
    set->array[i]->parent = set->array[j];
}

typedef struct SetStruct* Set;
struct SetStruct{
    int root[MAXSIZE];
};

void Union(Set set, int i, int j){  // Union Set j to Set i
    set->root[j] = i;
}

$T(N)=O(1)$ always

Find

Implementation

Set Find(int key, Set sets[], int size);    // Linked List is bad!!!

int Find(int key, Set set){
    while(set->root[key] > 0){
        key = set->root[key];
    }
    return key;
}

Analysis

Union-Find Operation

try Union(2, 1) find(1);
Union(3, 2) find(2);
…
Union(N + 1, N) find(N);

then, the tree becomes a linked list while the time complexity becomes $O(N^2)$

Smart Union Algorithms

Union by Size

Always union the smaller tree to the larger one

Define set->root[root] = -Size

Make the signal bit of the root ndoe to represent the size of the tree
Initialize all signal bit by -1

When trying to Union Set i and Set j, always let the Set with larger size unioned by the one with smaller size

Now the time complexity reduces to $T(N)=O(N)$

Lemma Let T be a tree created by union-by-size with N nodes, then

$height(T)\le\mathrm{int}(\log_2N)+1$

the worst case is binary tree, while the best case is 1 root node with (N - 1) leaf nodes

time complexity of N union and M find: $T(N,M)=O(N+M\log_2N)$

Union by Height

Always union the shallower tree to the deeper one

Path Compression

int Find(int key, Set set){
    if(set->root[key] <= 0){
        return key;
    }
    else{
        // Let key and its all ancestors node point to the root DIRECTLY
        return set->root[key] = Find(set->root[key], set);  
    }
}

int Find(int key, Set set){
    int root, trai, lead;
    root = key;
    while(set->root[root] > 0){
        root = set->root[root]
    }
    for(trail = key; trail != root; trail = lead){
        lead = set->root[trail];
        set->root[trail] = root;
    }
    return root;
}

Only can be used with Union-by-Size, because path compression will change the height of the tree, we need to change the height frequantly

Worst Case For Union-by-Rank and Path Compression

Lemma(Tarjan) $k_1M\alpha(M,N)<T(M,N)<k_2M\alpha(M,N)$

Ackermann’s Function

$A(i,j)=\begin{cases} 2^j&,i=1,j\ge 1\\ A(i-1,2)&,i\ge 2,j=1\\ A(i-1,A(i,j-1))&,i\ge 2,j\ge 2 \end{cases}$

$A(2,4)=2^{65536}$

$\alpha(M,N)=\min\Set{i\ge 1|A(i,\mathrm{int}(M/N))>\log N}$

for common M and N, $\alpha(M,N)\le O(\log^\star N)\le 4$
$\log^\star N$ : inverse Ackermann’s Function

So $\alpha(M,N)$ can be taken as a const number

While $k_1,k_2$ are const numbers, the final time complexity is $T(M,N)=O(M)$