TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap

Largest elements 0831

For Big O notation:

$\underbrace{O(n)}_{\le c n} \underbrace{=}_{\text{belong to}} \underbrace{O(n^2)}_{\le c n^2}$

Therefore, we know:

$O(n^2) \ne O(n)$

Algorithm

An algorithm should be presented with 4 components:

problem statelemt
algorithm design
algorthm analysis
discussion

Example

e.g. 1

find largest element in array $A[1,n]$
design
- a. max = A[1]
- b. FOR i = 2 to n DO
  - IF max < A[i]
  - THEN max = A[i]
- c. RETURN max
make sure it’s currect, time complexity is $O(n)$
discussion

e.g. 2

largest(i, j)

IF i == j
THEN return A[i]

k = (i + j) / 2

max1 = largest(i, k)
max2 = largest(k+1, j)

RETURN max(max1, max2)

main()

RETURN largest(1, n)

Analysis:

Let $T(h)$ be the time for largest when it’s applied on a subarray of length h.

We know $T(1) = O(1)$ .

When $h > 1$ :

$T(h) = \underbrace{O(1)}_{\text{step 1}} + \underbrace{O(1)}_{\text{step 2}} + \underbrace{2 T(\frac{h}{2})}_{\text{step 3}} + \underbrace{O(1)}_{\text{step 3 assignment}}$

Therefore,

$T(h) = 2 T(\frac{h}{2}) + O(1)$

Equations can be converted to:

$T(1) = O(1) \Rightarrow T(1) \le C_1$ $T(h) = 2 T(\frac{h}{2}) + O(1) \Rightarrow T(h) \le 2 T(\frac{h}{2}) + C_2$

By recursion:

$T(h) \le 2^2 T(\frac{h}{2^2}) + 2 C_2 + C_2$ $T(h) \le 2^{\log h} T(1) + (1 + 2 + 2^2 + ... + 2^{\log h - 1}) C_2$ $T(h) \le h C_1 + (h - 1) C_2$

In conclusion:

$T(h) = O(h)$

We can also do “guess and verify”.

0905

To organize the data to speed up searching, we can sort the data in $O(n \log n)$ . Then do binary search in $O(\log n)$ .

No searching algorithm can do better than $O(\log n)$ .

Binary search tree 0907

A set is a collection of elements. All elements of a set are different, which means no set can contain two copies of the same element.

For a node in a BST, all values in its left subtree is less than its value; all values in its right subtree is no less than its value.

Operations

Search(x, A)
Insert(x, A)
Delete(x, A)

All operations can not be done in time better than $O(\log n)$ .

Balance BST

Use rotation, AVL, Red-black tree
split & merge, B+ tree

Rotation

AVL tree: for each node, remember 2 heights of its two subtrees, so can know how to rotate
Red-black: add color to the node (1 bit). For every path root to leaf
- numbers of black are the same
- no consecutive two red

Split & merge

2-3 tree

Definition

tree

A tree is

undirected, no cycles, connected, has a root
recursive definition

2-3 tree

A tree is a 2-3 tree if

each non-leaf has either 2 or 3 children
all leaves are at the same level / every path from the root to a leaf is of the same length

An empty set or a single root can be regarded as 2-3 tree

Data are stored in leaves, sorted in increasing order.

With n leaves, the tree has at most n-1 non-leaves.

A 2-3 tree of n leaves has a height bounded by $\log n$ .

Prove: remove a branch if having 3 children for each node, get a complete binary tree with $n_1 \le n$ leaves. Meanwhile, height doesn’t change.

$h = \log_2(n_1) \le \log_2(n)$

A linearly ordered set of elements can be represented by a 2-3 tree by assigning the elements to the leaves of the tree in such a way that for any non-leaf node of the tree, all elements stored in its first child are less than any elements stored in its second child, and all elements stored in its second child are less than any elements stored in its third child (if it has a third child).

Each non-leaf node v of a 2-3 tree has two or three children, which will be named child1(v), child2(v), child3(v), resp.. The node v also keeps three values for the corresponding subtrees:

l(v): the largest element stored in the subtree rooted at child1(v).
m(v): the largest element stored in the subtree rooted at child2(v).
h(v): the largest element stored in the subtree rooted at its child3(v) (if child3(v) exists).

Strictly speaking, the third value h(v) is not needed. All algorithms can be implemented without the value h(v), and without increasing the time complexity. However, we suggest to keep the third value in an implementation, which will simplify certain implementation details.

Operations

Search

root: r
element to search: x

SEARCH(r, x)

If (r is empty) report failure;
If (r is a leaf node) report (value(r) == x);
If (l(r) >= x) SEARCH(child1(r), x);
Else If (m(r) >= x) SEARCH(child2(r), x);
Else If (r has a third child) SEARCH(child3(r), x);
Else report failure.

Since the height of a 2-3 tree is $O(\log n)$ , and the algorithm simply follows a path in the tree from the root to a leaf, and spends time $O(1)$ on each level, the time complexity of the algorithm SEARCH is $O(\log n)$ .

Insert

To insert a new element x into a 2-3 tree T rooted at r, we apply a recursive algorithm that dose two things:

insert x into the tree T rooted at r;
report whether this insertion splits the tree T rooted at r into two 2-3 trees.

If the 2-3 tree T has at most one leaf, then the job is easy:

if T has no leaf (i.e., T represents an empty set), then we simply make a 2-3 tree that consists of a single node, which is both the root and the leaf of the tree, with a value x.
if T has only one leaf of value y, then the tree T is a single-node tree, inserting x into T makes a two-leaf tree, whose values are x and y, respectively, and the leaves are ordered properly.

Now if the 2-3 tree T has a height at least 1 with at least two leaves, then we proceed at first as if we were searching x in the tree T. However, at the level just above the leaves, we start our insertion operation recursively. In general, suppose that we want to add a new child w to a node v in the 2-3 tree. If v has only two children, we simply make w a new child of v, placing the children in the proper order and updating the information of the node v.

Suppose, however, that v already has three children v1, v2, and v3. Then w would be the fourth child of v. We cannot have a node with four children in a 2-3 tree, so we split the node v into two nodes, which we call v and v’. With the new node v’, we can let the first two of (v1, v2, v3, w) (in terms of the linear order) be children of v, and let the rest two be children of v’. Now, the node v’ is the root of a subtree and should be added as a new child to the parent of v. Thus, the operation now can be recursively done at the level of the parent of v.

One special case occurs when we wind up splitting the root. In that case we create a new root, whose two children are the two nodes into which the old root was split. This is how the number of levels in (i.e., the height of) a 2-3 tree increases.

INSERT(r, x)

If (the tree rooted at r has < 2 leaves)

process directly; return;

Addson(r, x, r’);

If (r’ != NULL)

create a new node v; let r and r’ be children of v;

return v.

The procedure Addson(r,x,r’) above is implemented by the following recursive algorithm, which inserts a new element x to the 2-3 tree rooted at r. Moreover, if this insertion causes splitting the node r due to exceeding the number of children, then a new node r’ is created to take two children from r. Therefore, if r’ is not empty when the procedure returns, then r and r’, respectively, are the roots of two 2-3 trees of the same height.

Addson(r, x, r’)

r’ = NULL;

If (r is a parent of leaves)

If (r has 2 children) add x as a new child of r;
Else \\ r has 3 children

order x and the three children of r in the linear order;
let the first two be children of r; and the rest two be children of r’;

return;

If (l(r) >= x) v = child1(r);
Else If (m(r) >= x or child3(r)==NULL) v = child2(r);
Else v = child3(r);

Addson(v, x, v’);

If (v’ == NULL) return;

If (v’ != NULL and r has 2 children) add v’ as a new child of r;
Else \\ r has 3 children and v’ is not NULL

order v’ and the three children of r in the linear order;
let the first two be children of r; and the rest two be children of r’;

Minimum and maximum

Given a 2-3 tree T we want to find out the minimum element stored in the tree. Recall that in a 2-3 tree the numbers are stored in leaf nodes in ascending order from left to right. Therefore the problem is reduced to going down the tree, always selecting the left most link, until a leaf node is reached. This leaf node should contain the minimum element stored in the tree. Evidently, the time complexity of this algorithm is $O(\log n)$ for a 2-3 tree with n leaves.

MINIMUM(r)

If (r is empty) return failure;
If (r is a leaf) return value(r);
Else return MINIMUM(child1(r)).

The operation MAXIMUM can be similarly implemented to run in time $O(\log n)$ .

Delete

When we delete a leaf from a 2-3 tree, we may leave its parent v with only one child. If v is the root, delete v and let its lone child be the new root. Otherwise, let p be the parent of v. If p has another child, adjacent to v on either the right or the left, and that child of p has three children, we can transfer the proper one of those three to v. Then v has two children, and we are done.

If the children of p adjacent to v have only two children, transfer the lone child of v to an adjacent sibling of v, and delete v. Should p now have only one child, repeat all the above, recursively, with p in place of v.

Summarizing these discussions together, we get the algorithm DELETE, as shown below, where procedure DELETE() is merely a driver for sub-procedure DEL() in which the actual work is done.

The variables done and 1son in DEL() are boolean flags used to indicate successful deletion and to detect the case when a node in the tree has only one child, respectively.

In the worst case we need to traverse a path in the tree from root to a leaf to locate the node to be deleted, then from that leaf node to the root, in case that every non-leaf node on the path has only two children in the original 2-3 tree T. Thus the time complexity of DELETE algorithm for a 2-3 tree with n leaves is $O(\log n)$ .

Algorithm DELETE(r, x)

If (r == Null) return failure;

If (r is a leaf)

If (x == l(r)) return Null;
Else return failure;

DEL(r, x, done, 1son);

If (done == false) return failure;

If (1son == true) return child1(r);
ELSE return r.

Algorithm DEL(r, x, done, 1son)

done = true;

If (r is a parent of leaves) process properly and return;
\\ i.e., delete x if it is in the tree; update done and 1son;

If (x <= l(r)) r’ = child1(r);
Else if (x <= m(r)) or (child3(r) == Null) r’ = child2(r);
Else r’ = child3(r);

DEL(r’, x, done’, 1son’);

If (done’ == false) done = false; return;

If (1son’ == true)

If (r has at least 4 grandchildren)

reorganize the grandchildren of r so that each of r and its children has either 2 or 3 children
1son = false

Else make r a 1-child node (with 3 grandchildren);

1son = true; return.

Splice

Splice(T1, T2) will merge two trees into one. all values in T1 is no greater than those in T2. It can be done in $O(\log n)$ .

Suppose the number of nodes on T1 and T2 are n1 and n2, then the height of the two trees are h1 and h2, where the h1, h2 are in the order of $\log n1$ and $\log n2$ . There are three possible cases:

h1 = h2:
we can simply add one node as the root of the new tree, whose left child is T1, right child is T2. We choose the key values of root. The operation needed is O(1)
h1 < h2:
The basic idea is to insert T1 as the subtree of T2. We take the leftest path from the root of T2 to the node N1. The height of N1 in T2 is h1+1.
- If N1 has only two children, we can insert T1 as the leftest child of N1. Then done. It takes O(h2-h1) operation.
- If N1 has already three children, N1 will have four children after the insertion of T1. We need to split the N1 into two new nodes. If N1 has no parent, the tree grows in height by one, the height of new tree is h2+1 now; if N1’s parent becomes full, we split the parent; otherwise we are done. This split takes O(h2-h1). So totally it’s still O(h2-h1).
h1 > h2:
Similar to case 2.

We need first get the height of T1 and T2 in $MAX(O(\log n1), O(\log n2))$ . Then merging takes no more than $MAX(O(\log n1), O(\log n2))$ .

Split

Split(x, T) will split tree T into 2 trees according to x value. So that all values in left tree is less than or equal to x; all values in right tree is no less than x. It can be done in $O(\log n)$ . Basically, it’s search and split along path.

We march from the root T to the key x, the node we pass are $n_1$ , …, $n_m$ . We delete the edge bewtween $n_i$ and $n_{i+1}$ . So every time deleting an edge, we split one tree into 2 or 3 parts. One is a 2-3 tree, we keep marching on this part. The others are trees whose root’s degree is 1, with a 2-3 tree below the root. We throw away the root, keep the rest as $R[i]$ if that part is on the right of the original tree, keep the rest as $L[i]$ if that part is on the left of original tree. So finally we we come down to the x, we have a sequence of $\log n$ trees, $L[1], L[2],...,L[k]$ and $R[1],...,R[m]$ . L and R are in decreasing order of height. This marching takes $O(\log n)$ .

Furthermore, we can merge all trees in L and R in increasing order of height. Each merging/splicing takes $O(h2 - h1)$ , the difference of the height. Then merge two L and R trees. Total time is $O(\log n)$ .

Find first k largest

DFS, time is $O(\log n + k)$ .

Node data structure

key values: k1, k2, k3
pointers to subtree: p1, p2, p3
- if p1 is null, it means this is a leaf node.

Why do we prefer a 2-3 tree instead of 2-4, 2-5?

2-5 tree can waste space. More than half space can be empty. We want fill as many as possible in a node
- 2-3 tree: A little bit more than half
2-4 tree cannot hold a set with 3 or 5 … leaves

Heap 0919

A heap is a complete binary tree with possibly some rightmost leaves removed.

At most one with only one child
Father’s value cannot be larger than its children’s - min-heap
- The largest element must be a leaf

Operations

Insert

Insert to the right of the rightmost leaf, then swim it up (swap larger father with it). It takes $O(\log n)$ .

Delete

Give the pointer of the node to delete, put the rightmost leaf at that position, swim it up or sink it down.

Search

Heap cannot do search in $O(\log n)$ .

Implementation

Store the heap in an array; maintain the position of end, lastIndex.

If index starting with 1, two sons of father i are 2i and 2i+1.
If index starting with 0, two sons of father i are 2i+1 and 2i+2.

Reference

This is my class note of CSCE 629 in Texas A&M taught by Dr. Jianer Chen. Credit to him.

AI 2

Algorithm 17

Amazon 1

Authorization 1

Blog 3

Bootstrap 1

C++ 1

CCpp 5

CSS 2

Cloud 3

Code 1

Crawler 1

DNS 1

Database 17

DeepLearning 1

Design 17

Development 1

Docker 1

English 1

Express 1

GDB 1

Go 3

Google 4

HTML 3

IOS 1

Java 17

Javascript 4

Jekyll 1

Linux 4

MacOS 2

MachineLearning 17

Markdown 4

Mobile 1

MongoDB 2

Multi-threading 3

NAS 1

Network 11

NeuralNetwork 10

Node 1

OS 8

Public-speaking 1

Python 15

RESTful 1

Rails 9

React 1

Redis 1

Ruby 6

Shell 2

Spring 2

System 17

TCP 1

TDD 1

Thread 2

Vim 1

awk 1

git 1

jQuery 1

media 1

network 1

php 1