Detailed explanation of how to sort sort in go language

sort package source code interpretation

Preface

In many places in our code business, we need to sort ourselves. The go standard library provides the sort package to implement the sort function. Here we see how the production-level sort function is implemented.

go version go1.16.13 darwin/amd64

How to use

Let's take a look at the main functions provided by sort

Sorting support for basic data type slices
Custom Less Sort Comparator
Customize the sorting of data structures
Determine whether the basic data type slices have been sorted
Basic data element search

Sort of basic data type slices

The sorting of []int, []float, []string has been implemented in the sort package.

func TestSort(t *) {
    s := []int{5, 2, 6, 3, 1, 4}
    ("Is it sorted out?", (s))
    (s)
    // Formal order    (s)
    // Inverse order    (((s)))
    (s)
    // Stable sorting    ((s))
    ("Is it sorted out?", (s))
    ("Find whether it exists", (s, 5))
    (s)

    str := []string{"s", "f", "d", "c", "r", "a"}
    (str)
    (str)

    flo := []float64{1.33, 4.78, 0.11, 6.77, 8.99, 4.22}
    sort.Float64s(flo)
    (flo)
}

Look at the output

Is it sorted false
[1 2 3 4 5 6]
[6 5 4 3 2 1]
Is it sorted true
Find if it exists 4
[1 2 3 4 5 6]
[a c d f r s]
[0.11 1.33 4.22 4.78 6.77 8.99]

sort itself is not a stable sort, it needs to be used in stable sorting. At the same time, the sorting is in ascending order by default, and can be used in descending order.

Custom Less Sort Comparator

If the sorting content we need is some complex structures, such as the chestnut below, is a structure, sorted according to a certain attribute in the structure, it can be implemented through a custom Less comparator

Using the less function, we can customize this function and then sort it, instead of stable sorting, stable sorting can be used

type Person struct {
    Name string
    Age  int
}

func TestSortSlice(t *) {
    people := []Person{
        {"Bob", 31},
        {"John", 42},
        {"Michael", 17},
        {"Jenny", 26},
    }

    (people, func(i, j int) bool {
        return people[i].Age &lt; people[j].Age
    })
    // Age positive order    (people)
    // Age reverse order    (people, func(i, j int) bool {
        return people[i].Age &gt; people[j].Age
    })
    (people)

    // Stable sorting    (people, func(i, j int) bool {
        return people[i].Age &gt; people[j].Age
    })
    (people)
}

Look at the output

[{Michael 17} {Jenny 26} {Bob 31} {John 42}]
[{John 42} {Bob 31} {Jenny 26} {Michael 17}]
[{John 42} {Bob 31} {Jenny 26} {Michael 17}]

Customize the sorting of data structures

In addition to customizing the Less sorting comparator, the sort package also provides an interface. As long as we implement the three methods provided in the sort package, we can complete sorting, searching and other operations through functions in the sort package.

// An implementation of Interface can be sorted by the routines in this package.
// The methods refer to elements of the underlying collection by integer index.
type Interface interface {
    // Len is the number of elements in the collection.
    Len() int

    // Less reports whether the element with index i
    // must sort before the element with index j.
    //
    // If both Less(i, j) and Less(j, i) are false,
    // then the elements at index i and j are considered equal.
    // Sort may place equal elements in any order in the final result,
    // while Stable preserves the original input order of equal elements.
    //
    // Less must describe a transitive ordering:
    //  - if both Less(i, j) and Less(j, k) are true, then Less(i, k) must be true as well.
    //  - if both Less(i, j) and Less(j, k) are false, then Less(i, k) must be false as well.
    //
    // Note that floating-point comparison (the < operator on float32 or float64 values)
    // is not a transitive ordering when not-a-number (NaN) values are involved.
    // See  for a correct implementation for floating-point values.
    Less(i, j int) bool

    // Swap swaps the elements with indexes i and j.
    Swap(i, j int)
}

See how to use it

type ByAge []Person

func (a ByAge) Len() int           { return len(a) }
func (a ByAge) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByAge) Less(i, j int) bool { return a[i].Age < a[j].Age }

func TestSortStruct(t *) {
    people := []Person{
        {"Bob", 31},
        {"John", 42},
        {"Michael", 17},
        {"Jenny", 26},
    }

    (ByAge(people))
    (people)
}

Output

[{Michael 17} {Jenny 26} {Bob 31} {John 42}]

Of course, the sorting of []int, []float, and []string that have been implemented in the sort package also implements the interface

For the above three sorts, the first and the second ones can basically meet our needs, but the third one is more flexible.

Analyze the source code

Let's first look at what stability sorting is

Li Ru: Sort an array. If there are duplicate data in it, and the relative index position of the same data does not change when sorting, then it is a stable sort.

That is, there are two 5s and 5s in it. After the order is completed, the first 5 is still in the front and there is no position exchange with the subsequent repeated data 5, so this is the stable sorting.

Unstable sorting

The sorting algorithm in sort is used, quickSort (fast sort), heapSort (heap sort), insertionSort (insert sort), shellSort (hill sort)

Let's analyze the use of these sorting algorithms

You can see the call to Sort for sorting, and quickSort will eventually be called

func Sort(data Interface) {
    n := ()
    quickSort(data, 0, n, maxDepth(n))
}

Let's take a look at the implementation of quickSort

func quickSort(data Interface, a, b, maxDepth int) {
    // Use quick row when the slice length is greater than 12    for b-a &gt; 12 { // Use ShellSort for slices &lt;= 12 elements
        // maxDepth returns the threshold value that should be switched in quick sort        // Do heap sort        // Heap sorting is performed when maxDepth is 0        if maxDepth == 0 {
            heapSort(data, a, b)
            return
        }
        maxDepth--
        // doPivot is the fast-row core algorithm. It takes a point as the axis, puts the element not larger than the axis to the left, and the element greater than the axis to the right, returning the last subscript of the data smaller than the axis and the first subscript of the data larger than the axis.        // Subscript position a...mlo,pivot,mhi...b        // data[a...mlo] &lt;= data[pivot]
        // data[mhi...b] &gt; data[pivot]
        // There is no need to exchange data with the same median. Maintaining this range value can reduce the number of data.        mlo, mhi := doPivot(data, a, b)
        // Avoid recursion too deep        // Looping saves time than recursion. If there are large-scale child nodes, let the smaller ones recurse first, and maxDepth is the condition that can trigger heap sorting, and then sort it using heap sorting        if mlo-a &lt; b-mhi {
            quickSort(data, a, mlo, maxDepth)
            a = mhi // ., quickSort(data, mhi, b)
        } else {
            quickSort(data, mhi, b, maxDepth)
            b = mlo // ., quickSort(data, a, mlo)
        }
    }
    // If the length of the slice is greater than 1 or less than 12, use shell sort    if b-a &gt; 1 {
        // Do ShellSort pass with gap 6
        // It could be written in this simplified form cause b-a &lt;= 12
        // Here we do a round of shell sorting        for i := a + 6; i &lt; b; i++ {
            if (i, i-6) {
                (i, i-6)
            }
        }
        // Sorting the insertion        insertionSort(data, a, b)
    }
}

// maxDepth returns the threshold value that should be switched in quick sort// Do heap sortfunc maxDepth(n int) int {
    var depth int
    for i := n; i &gt; 0; i &gt;&gt;= 1 {
        depth++
    }
    return depth * 2
}

// doPivot is the fast-row core algorithm. It takes a point as the axis, puts the element not larger than the axis to the left, and the element greater than the axis to the right, returning the last subscript of the data smaller than the axis and the first subscript of the data larger than the axis.// Subscript position lo...midlo,pivot,midhi...hi// data[lo...midlo] &lt;= data[pivot]
// data[midhi...hi] &gt; data[pivot]
func doPivot(data Interface, lo, hi int) (midlo, midhi int) {
    m := int(uint(lo+hi) &gt;&gt; 1) // Written like this to avoid integer overflow.
    // Tukey's nine algorithm is used here, article link /blog/2009/06/23/tukey-median-ninther/    // Find the median number through this algorithm    if hi-lo &gt; 40 {
        // Tukey's ``Ninther,'' median of three medians of three.
        s := (hi - lo) / 8
        medianOfThree(data, lo, lo+s, lo+2*s)
        medianOfThree(data, m, m-s, m+s)
        medianOfThree(data, hi-1, hi-1-s, hi-1-2*s)
    }

    // Find the median data[m] <= data[lo] <= data[hi-1]    medianOfThree(data, lo, m, hi-1)

    // Invariants are:
    //    data[lo] = pivot (set up by ChoosePivot)
    //    data[lo &lt; i &lt; a] &lt; pivot
    //    data[a &lt;= i &lt; b] &lt;= pivot
    //    data[b &lt;= i &lt; c] unexamined
    //    data[c &lt;= i &lt; hi-1] &gt; pivot
    //    data[hi-1] &gt;= pivot
    // Median    pivot := lo
    a, c := lo+1, hi-1

    // Processing to make data[lo < i < a] < pivot    for ; a &lt; c &amp;&amp; (a, pivot); a++ {
    }
    b := a
    for {
        // Processing to make data[a <= i < b] <= pivot        for ; b &lt; c &amp;&amp; !(pivot, b); b++ {
        }
        // Processing to make data[c <= i < hi-1] > pivot        for ; b &lt; c &amp;&amp; (pivot, c-1); c-- { // data[c-1] &gt; pivot
        }
        // The left and right overlap or are already on the right side        if b &gt;= c {
            break
        }
        // data[b] &gt; pivot; data[c-1] &lt;= pivot
        // The data on the left is larger than the right, exchange, and then sort        (b, c-1)
        b++
        c--
    }
    // If hi-c&lt;3 then there are duplicates (by property of median of nine).
    // Let's be a bit more conservative, and set border to 5.
    // If hi-c<3 then duplicates exist (by the attribute with median of 9).    // Let's be a little more conservative and set the border to 5.
    // Because c is the critical value for dividing the size of pivot, when dividing 9 values, it should be 4 on both sides.    // Since the left is <=, there is an extra equal situation, so there is no problem in the distribution of 5 and 3.    // If hi-c<3, the value of c is obviously biased to hi, it means that there are multiple and pivot repeated values    // To be more conservative, set to 5 (it's just a multiple verification anyway)    protect := hi-c &lt; 5
    // Even if it is greater than or equal to 5, it may be because the total value of the element is large, so compare whether hi-c is less than 1/4 of the total number    if !protect &amp;&amp; hi-c &lt; (hi-lo)/4 {
        // Compare some special points and intermediate numbers        dups := 0
        // Processing to make data[hi-1] = pivot        if !(pivot, hi-1) {
            (c, hi-1)
            c++
            dups++
        }
        // Processing to make data[b-1] = pivot        if !(b-1, pivot) {
            b--
            dups++
        }
        // m-lo = (hi-lo)/2 &gt; 6
        // b-lo &gt; (hi-lo)*3/4-1 &gt; 8
        // ==&gt; m &lt; b ==&gt; data[m] &lt;= pivot
        if !(m, pivot) { // data[m] = pivot
            (m, b-1)
            b--
            dups++
        }
        // If the above if enters twice, it proves that it is now a skewed distribution (that is, the left and right imbalance)        protect = dups &gt; 1
    }
    // Unbalanced, then process it    // Here are two groups of <pivot and =pivot    if protect {
        // Protect against a lot of duplicates
        // Add invariant:
        //    data[a &lt;= i &lt; b] unexamined
        //    data[b &lt;= i &lt; c] = pivot
        for {
            // Processing to make data[b] == pivot            for ; a &lt; b &amp;&amp; !(b-1, pivot); b-- {
            }
            // Processing to make data[a] < pivot            for ; a &lt; b &amp;&amp; (a, pivot); a++ {
            }
            if a &gt;= b {
                break
            }
            // data[a] == pivot; data[b-1] &lt; pivot
            (a, b-1)
            a++
            b--
        }
    }
    // Exchange median to middle    (pivot, b-1)
    return b - 1, c
}

For the use of these sorting algorithms, the sort package is used in a mixed manner

1. If the slice length is greater than 12, use fast order. If the conditions for using heap sort are met, if the following data are processed without this sorting, it will be converted into heap sorting;

2. If the slice length is less than 12, use shell sorting. Shell sorting only processes one round of data, and the subsequent data sorting is sorted using insertion;

Heap sorting and insert sorting are normal sorting processing

// insertionSort sorts data[a:b] using insertion sort.
// Insert sortfunc insertionSort(data Interface, a, b int) {
    for i := a + 1; i &lt; b; i++ {
        for j := i; j &gt; a &amp;&amp; (j, j-1); j-- {
            (j, j-1)
        }
    }
}

// Heap sortingfunc heapSort(data Interface, a, b int) {
    first := a
    lo := 0
    hi := b - a

    // Build heap with greatest element at top.
    for i := (hi - 1) / 2; i &gt;= 0; i-- {
        siftDown(data, i, hi, first)
    }

    // Pop elements, largest first, into end of data.
    for i := hi - 1; i &gt;= 0; i-- {
        (first, first+i)
        siftDown(data, lo, i, first)
    }
}

Stable sorting

The sort package also provides stable sorting, which is implemented by calling.

// It makes one call to  to determine n, O(n*log(n)) calls to
//  and O(n*log(n)*log(n)) calls to .
func Stable(data Interface) {
    stable(data, ())
}

func stable(data Interface, n int) {
    // Define the size of the slice block    blockSize := 20 // must be &gt; 0
    a, b := 0, blockSize
    // If the slice length is greater than the size of the block, sort each block in multiple times    for b &lt;= n {
        insertionSort(data, a, b)
        a = b
        b += blockSize
    }
    insertionSort(data, a, n)

    // If there are multiple blocks, merge the sorted blocks    for blockSize &lt; n {
        a, b = 0, 2*blockSize
        for b &lt;= n {
            symMerge(data, a, a+blockSize, b)
            a = b
            b += 2 * blockSize
        }
        if m := a + blockSize; m &lt; n {
            symMerge(data, a, m, n)
        }
        // block Expand twice each loop until it is larger than the total number of elements, and then ends        blockSize *= 2
    }
}

func symMerge(data Interface, a, m, b int) {
    // If there is only one element to avoid unnecessary recursion, insert it directly here    // Process the left part    if m-a == 1 {
        // Use binary search to find the lowest index i        // In this way data[i] >= data[a] for m <= i < b.        // If such an index does not exist, use i == b to exit the search loop.        i := m
        j := b
        for i &lt; j {
            h := int(uint(i+j) &gt;&gt; 1)
            if (h, a) {
                i = h + 1
            } else {
                j = h
            }
        }
        // Swap values until data[a] reaches the position before i.
        for k := a; k &lt; i-1; k++ {
            (k, k+1)
        }
        return
    }

    // Same as above    // Process the right part    if b-m == 1 {
        // Use binary search to find the lowest index i
        // such that data[i] &gt; data[m] for a &lt;= i &lt; m.
        // Exit the search loop with i == m in case no such index exists.
        i := a
        j := m
        for i &lt; j {
            h := int(uint(i+j) &gt;&gt; 1)
            if !(m, h) {
                i = h + 1
            } else {
                j = h
            }
        }
        // Swap values until data[m] reaches the position i.
        for k := m; k &gt; i; k-- {
            (k, k-1)
        }
        return
    }

    for start &lt; r {
        c := int(uint(start+r) &gt;&gt; 1)
        if !(p-c, c) {
            start = c + 1
        } else {
            r = c
        }
    }

    end := n - start
    if start &lt; m &amp;&amp; m &lt; end {
        rotate(data, start, m, end)
    }
    // Recursively merge    if a &lt; start &amp;&amp; start &lt; mid {
        symMerge(data, a, start, mid)
    }
    if mid &lt; end &amp;&amp; end &lt; b {
        symMerge(data, mid, end, b)
    }
}

For stable sorting, insertion sorting and merge sorting are used

1. First, the data will be divided into chunks in each 20 groups, and the data in each block will be sorted using insert sort;

2. Then, use merge sorting to sort the sorted data blocks in pairs, complete the sorting one time, expand the data blocks to twice the previous one until all sorting is completed.

Find

The search function in sort is ultimately implemented by calling the search function.

func SearchInts(a []int, x int) int {
    return Search(len(a), func(i int) bool { return a[i] &gt;= x })
}

// Use binary searchfunc Search(n int, f func(int) bool) int {
    // Define f(-1) == false and f(n) == true.
    // Invariant: f(i-1) == false, f(j) == true.
    i, j := 0, n
    for i &lt; j {
                // Binary search        h := int(uint(i+j) &gt;&gt; 1) // avoid overflow when computing h
        // i ≤ h &lt; j
        if !f(h) {
            i = h + 1 // preserves f(i-1) == false
        } else {
            j = h // preserves f(j) == true
        }
    }
    // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =&gt;  answer is i.
    return i
}

Searching in sort is relatively simple, using binary search

Interface

The sort package provides an Interface interface. We can customize the data structure and then implement the corresponding interface of the Interface, and we can use the methods in the sort package.

type Interface interface {
    Len() int

    Less(i, j int) bool

    Swap(i, j int)
}

Looking at the source code, you can see the existing sorting of data structures such as []int in the sort package, which also implements Interface

// Convenience types for common cases

// IntSlice attaches the methods of Interface to []int, sorting in increasing order.
type IntSlice []int

func (x IntSlice) Len() int           { return len(x) }
func (x IntSlice) Less(i, j int) bool { return x[i] < x[j] }
func (x IntSlice) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }

This idea is quite good. You can learn from it later to provide abstract interfaces for the variable parts so that users can implement them according to their own scenarios.

For basic sorting, as long as you implement the Interface method, you can have these basic abilities.

Summarize

sort For the implementation of sorting algorithm, a high-performance sorting algorithm is finally implemented.

The IntSlice interface has been abstracted, and users can implement the corresponding methods themselves, and then they can have the capabilities provided in sort

refer to

【Example code in the article】/boilingfrog/Go-POINT/blob/master/golang/sort/sort_test.go
【Golang sort sort】/K346K346/article/details/118314382
【John Tukey’s median of medians】/blog/2009/06/23/tukey-median-ninther/
【code_reading】/Junedayday/code_reading/blob/master/sort/
【sort package in go】/2022/03/06/go/ sort package/

This is the end of this article about how to sort sort in go language. For more related go languages, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!