merge sort

divide array into two halves
recursively sort each half
merge two halves

mergesort

time: at most \(N \log N\) compares and \(6N \lg N\) array accesses to sort any array of size \(N\).
memory: use extra space proportional to N

merge

Goal: given two sorted subarrays a[lo] to a[mid] and a[mid+1] to a[hi], replace with sorted subarray a[lo] to a[hi]

Copy to auxiliary array aux[], use index i and j for each half, use index k for the output a[];
compare aux[] of each half with the minimum index, then copy the smaller one to a[];
increase both the corresponding index of the smaller one and k by one.

code for merge

// assertion  
assert isSorted(a,lo,mid);
assert isSorted(a, mid+1, hi);  

for (int k = lo; k<=hi; k++)
    aux[k] = a[k];
    
int i = lo, j = mid + 1;
for (int k = lo; k <= hi, k++)
{
    if      (i > mid)               a[k] = aux[j++];
    else if (j > hi)                a[k] = aux[i++];
    else if (less(aux[j], aux[i])   a[k] = aux[j++];
    else                            a[k] = aux[i++];
}

assert isSorted(a, lo, hi);

code for mergesort

create aux array outside the merge class

if (hi <= lo) return;  
int mid = lo + (hi - lo)/2;  
sort(a, aux, lo, mid);
sort(a, aux, mid+1, hi);
merge(a, aux, lo, mid, hi);

assertion

java -ea prog   // enable assertion  
java -da prog   // disable assertion  

improvement

Use insertion sort for small subarrays.
- mergesort has too much overhead for tiny subarrays
- cutoff to insertion sort for about 7 items.
```
if (hi <= lo +cutoff -1)
{
 Insertion.sort(a, lo, hi);
 return;
}
```
Stop if already sorted
- if the biggest in the first half is less than or equal to the smallest in second half.
- Helps for partially-ordered arrays
```
if (!less(a[mid+1], a[mid]) return;
```
eliminate the copy to the auxiliary array (save time but not space)

bottom-up mergesort

no recursion

pass through array, merging subarrays of size 1
repeat for subarray of size 2, 4, 8, 18, …

int N = a.length;
aux = new Comparable[N];
for (int sz = 1; sz < N; sz = sz + sz)
    for (int lo = 0; lo < N - sz; lo += sz+sz)
        merge(a, lo, lo+sz-1, Math.min(lo+sz+sz-1, N-1__;

sorting complexity

Computational complexity: to study efficiency of algorithms for solving a particular problem X.

Model of computation: allowable operations
Cost model: Operation count(s)
Upper bound: Cost guarantee provided by some algorithms for X
Lower bound: Proven limit on cost guarantee of all algorithms for X
Optimal algorithm: Algorithm with best possible cost guarantee for X

Example: sorting

Model of computation: decision tree
Cost model: # of compares
Upper bound: \(\sim N \lg N\) from mergesort
Lower bound: \(N \lg N\)
Optimal algorithm: mergesort (but not for space)

Any compare-based sorting algorithm must use at least \(\lg(N!) \sim N \lg N\) compares in the worst case.

worst case dictated by height h of decision tree.
binary tree of height h has at most \(2^h\) leaves
\(N!\) different orderings \(\rightarrow\) at least \(N!\) leaves

\(2^h \ge \text{# leaves} \ge N! \)

Lower bound may not hold if the algorithm has some information:

initial order of the input
distribution of key values
representation of the keys

comparators

Can sort using an alternate order or many different order
Must be a total order

Arrays.sort(a, String.CASE_INSENSITIVE_ORDER);  

public static void sort(Object[] a, Comparator comparator)
{   
    int N = a.length;
    for (int i = 0; i<N; i++)
        for (int j = i; j > 0 && less(comparator, a[j], a[j-1]); j--)
            exch(a,j,j-1);
            
    private static boolean less(Comparator c, Object v, Object w)
    { return c.compare(v,w) < 0;    }
    private static void exch(Object[] a, int i, int j)
    { Object swap = a[i]; a[i]=a[j]; a[j] = swap;   }
}

implement

public static final Comparator<Student> BY_NAME = new ByName();
private static class ByName implements Comparator<Student>
{
    public int compare(Student v, Student w)
    { return v.name.compareTo(w.name);  }
}

// use
Arrays.sort(a, Student.BY_NAME);

stability

Insertion sort and mergesort are stable (but not selection sort or shellsort)

For Insertion sort, equal items never move pass each other.
Selection sort is not stable since it have long-distance exchange might move an item pass some equal item.
The original order of equal items may be changed so unstable.

quicksort

shuffle the array
partition so that, for some j
- entry a[j] is in place
- no larger entry to the left of j
- no smaller entry to the right of j
sort each piece recursively.

Phase 1: repeat until i and j pointers cross.

scan i from the left to right so long as (a[i]<a[lo])
scan j from the right to left so long as (a[j]>a[lo])
exchange a[i] with a[j]

Phase 2: When pointer cross

exchange a[lo] with a[j]

private static int partition(Comparable[] a, int lo, int hi)
{
    int i = lo, j = hi + 1;
    while (true)
    {
        while (i < hi && less(a[++i], a[lo]);
        while (j > lo && less(a[lo], a[--j]);
        if (i >= j) break;
        exch(a, i, j);
    }
    
    // phase 2
    exch(a, lo, j);
    return j;
}

// quicksort, recursively

public class Quick
{
    private static int partition(Comparable[] a, int lo, int hi)
    {...}
    public static void sort(Comparable[] a)
    {
        StdRandom.shuffle(a);
        sort(a, 0, a.length-1);
    }
    private static void sort(Comparable[] a, int lo, int hi)
    {
        if (hi <= lo + CUTOFF - 1) 
        {
            Insertion.sort(a, lo, hi);
            return;
        }
        int j =  partition(a, lo, hi);
        sort(a, lo, j-1);
        sort(a, j+1, hi);
    }

details

partition in-place: does not use extra space
terminating the loop: testing whether the pointers cross is a bit trickier than it might seem.
staying in bounds. The (j == lo) test is redundant but the (i == hi) is not.
preserving randomness: shuffling is needed for performance guarantee.
Equal keys: when duplicates are presents, it is better to stop on keys equal to the partitioning item’s key.

best case: \(\sim N \log N\)
worst case: \(\sim 0.5 N^2\) (if already sorted, so shuffling is needed.)

The average number of compares to quicksort an array of N distinct keys is \(\sim 2N \ln N\) and the number of exchanges is \(\sim \frac{1}{3}N \lg N\)

average case:

number of compare is \(\sim 1.39 N \lg N\)
39% more compares than mergesort
but faster than mergesort because of less data movement

properties

partitioning: constant extra space
depth of recursion: logarithmic extra space
not stable

improvements

insertion sort small subarrays (around 10)
best choice of pivot item is the median

// median
int m = medianOf3(a, lo, lo + (hi - lo)/2, hi);
swap(a, lo, m);

selection

Goal: Given an array of N items, find a \(k^{th}\) smallest item.
Upper bound: \(N \log N\)
Lower bound: \(N\)

quick-select

entry a[j] is in place
no larger entry to the left of j
no smaller entry to the right of j

StdRandom.shuffle(a);
int lo = 0; hi = a.length - 1;
while (hi > lo)
{
    int j = partition(a, lo, hi);
    if          (j < k) lo = j + 1;
    else if     (j > k) hi = j - 1;
    else               return a[k];
}
return a[k];

Quick-select takes linear time on average.

There is a compare-based selection algorithm whose worst-case running time is linear. But constants are too high \(\rightarrow\) not used in practice.

duplicate keys

Often, purpose of sort is to bring items with equal keys together.

sort population by age
find collinear points
remove duplicates from mailing list
sort job applicants

mergesort with duplicate keys: between \(\frac{1}{2}N \lg N\) and \(N\lg N\) compares.

quicksort with duplicate keys: algorithm goes quadratic unless partitioning stops on equal keys.

Recommend: Stop scans on items equal to the partitioning item. \(\sim N \lg N\) compares when all keys equal.
Desirable: Put all items equal to the partitioning item in place.

3-way partitioning (Dijkstra)

entries between lt and gt equal to partition item v.
no larger entries to left of lt
no smaller entries to the right of lt

public static void sort(Comparable[] a, int lo, int hi)
{
    if hi <= lo return;
    int lt = lo; gt = hi;
    Comparable v = a[lo];
    int i = lo;
    while (i <= gt)
    {
        int cmp = a[i].compareTo(v);
        if cmp > 0      exch(a, i, gt--);
        else if cmp <0  exch(a, i++, lt++);
        else            i++;
    }
    sort(a, lo, lt - 1);
    sort(a, gt + 1, hi);
}

Quicksort with 3-way partitioning is entropy optimal.

system sorts

`Arrays.sort()`

has different method for each primitive type
has a method for data types that implement Comparable
has a method that uses a Comparator
use tuned quicksort for primitive types (performance is key factor); tuned mergesort for objects

import java.util.Arrays;

Arrays.sort(a)

AI 2

Algorithm 17

Amazon 1

Authorization 1

Blog 3

Bootstrap 1

C++ 1

CCpp 5

CSS 2

Cloud 3

Code 1

Crawler 1

DNS 1

Database 17

DeepLearning 1

Design 16

Development 1

Docker 1

Express 1

GDB 1

Go 3

Google 4

HTML 3

IOS 1

Java 17

Javascript 4

Jekyll 1

Linux 4

MacOS 2

MachineLearning 16

Markdown 4

Mobile 1

MongoDB 2

Multi-threading 3

NAS 1

Network 11

NeuralNetwork 10

Node 1

OS 8

Public-speaking 1

Python 15

RESTful 1

Rails 9

React 1

Redis 1

Ruby 6

Shell 2

Spring 2

System 16

TCP 1

TDD 1

Thread 2

Vim 1

awk 1

git 1

jQuery 1

media 1

network 1

php 1

Princeton Algorithms P1W3 Mergesort and Quicksort

merge sort

mergesort

merge

code for merge

code for mergesort

assertion

improvement

bottom-up mergesort

sorting complexity

comparators

implement

stability

quicksort

quicksort

details

properties

improvements

selection

quick-select

duplicate keys

`Arrays.sort()`