/  Back  /  Home  / 

The Algorithm

Bucket sort is a sorting algorithm that performs in linear time when the input is uniformly distributed over the interval \([0,1)\).

Description

The interval is divided into \(n\) buckets, and data is distributed to these buckets. Since data is uniformly distributed, we do not expect many numbers to fall into the same bucket. After distribution, each bucket is sorted and then the data is globally sorted by concatenating the buckets in order.


Note: If the numbers are integers uniformly distributed over some range, then the range can be reduced to \([0, 1)\) by dividing each number to the size of that range.

Implementation

void sort(vector<int>& a, int M) {
    int n = (int)a.size();
    vector<list<int>> bucket(n, list<int>());

    // distribute into buckets
    for (int i = 0; i < n; i++) {
        int b = (int)floor(((double)a[i] / (double)M) * (double) n);
        bucket[b].push_back(a[i]);
    }

    // local sort each bucket
    for (int i = 0; i < n; i++)
        if (!bucket[i].empty()) bucket[i].sort();

    // concatenate buckets in order
    int j = 0;
    for (int i = 0; i < n; i++) {
        while (!bucket[i].empty()) {
            a[j++] = bucket[i].front();
            bucket[i].pop_front();
        }
    }
}

Correctness

Entire set is traversed and all items are placed into buckets. Since the buckets themselves are ordered by construction, if two items are in two different buckets, then the items are ordered. If two items are in the same bucket then they are sorted because the buckets are sorted before concatenation. Since the buckets are concatenated in sorted order and each bucket contains sorted elements, then the algorithm produces a sorted sequence for any given input.

Complexity Analysis

Time Complexity

In the worst case, the numbers fall into the same bucket, and the complexity will be driven by the complexity of the sorting algorithm, so \(T(n) = O(n^2)\) if insertion is used or \(T(n) = O(n \cdot logn)\) if qsort is used.


In the average case and best case, the numbers are evenly distributed over the buckets and \(n \sim k\), so \(T(n) = O(n)\).


Note: In CLRS, section 8 can be found a detailed proof for average case using random variable indicators.

Space Complexity (auxiliary)

We need a list of buckets of size \(n + k\), so \(S(n) = O(n + k)\) where \(k\) is the number of buckets and \(n\) is the size of the original array.

Stability

Bucket sort is stable, if the underlying sort is also stable, as equal keys are inserted in order to each bucket. In the implementation above the buckets are sorted using qsort which is unstable, but if insertion sort is used instead, then the implementation above becomes stable.


/  Back  /  Home  /