First, let's take a look at some concepts related to sorting algorithms:
1. Stable and unstable sorting
Simply put, if equal numbers keep the same relative order after a sorting method is applied that they had before sorting, we say the sorting method is stable; otherwise it is unstable.
For example: before sorting, a set of numbers is a1, a2, a3, a4, a5, where a2 = a4. If after sorting the order is a1, a2, a4, a3, a5, we say the sort is stable, because a2 came before a4 originally and still comes before a4 afterwards. If the result is a1, a4, a2, a3, a5, the sort is not stable. A small code illustration follows.
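As an illustration (my own sketch, not part of the original article; the struct record and the function name are invented for this example), the following program sorts records by key with an insertion-style sort. Because the comparison uses a strict <, records with equal keys keep their original relative order, which is exactly what stability means.

#include <stdio.h>

struct record {
    int key;
    char label;   /* distinguishes records that share the same key */
};

/* insertion sort on records, comparing keys only; the strict "<"
   is what keeps equal keys in their original order */
void insert_sort_records(struct record a[], int n)
{
    int i, j;
    struct record tmp;
    for (j = 1; j < n; j++) {
        tmp = a[j];
        i = j - 1;
        while (i >= 0 && tmp.key < a[i].key) {
            a[i + 1] = a[i];
            i--;
        }
        a[i + 1] = tmp;
    }
}

int main(void)
{
    struct record a[] = {{2, 'a'}, {1, 'b'}, {2, 'c'}, {1, 'd'}};
    int i;
    insert_sort_records(a, 4);
    for (i = 0; i < 4; i++)
        printf("%d%c ", a[i].key, a[i].label);  /* prints: 1b 1d 2a 2c */
    printf("\n");
    return 0;
}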
2. Internal sorting and external sorting
If, during the sorting process, all the numbers to be sorted are held in memory and their order is adjusted entirely in memory, the method is called internal sorting.
If only part of the numbers can be held in memory at a time, and memory is used to adjust the order of the numbers kept in external storage, the method is called external sorting.
3. Time complexity and space complexity
The time complexity of an algorithm refers to the amount of computation needed to execute it.
The space complexity of an algorithm generally refers to the amount of memory needed to execute it. For example, the bubble sort below performs on the order of n^2 comparisons in the worst case but uses only a constant amount of extra memory.
Next, let's look at concrete C implementations of several major sorting algorithms:
Bubble Sort
If the sequence is to be arranged from small to large, then any two adjacent elements should satisfy a[i-1] <= a[i]. In bubble sort, we traverse the array from right to left, comparing each pair of adjacent elements. If the two elements are out of order, we swap them; if they are in order, we do nothing. After one pass, the smallest element (the bubble) is guaranteed to sit at the leftmost position.
One pass does not guarantee that all elements are arranged from small to large, so we traverse the array from right to left again and bubble once more. On this pass we no longer need to consider the leftmost element. We keep going for at most n-1 passes.
If no elements are exchanged during some pass, the array is already sorted and we can stop early. The worst case is when the largest element starts at the leftmost position: then the bubble sort needs all n-1 passes to sort the array and cannot finish early.
/* By Vamei */
/* swap the neighbors if out of order */
void bubble_sort(int a[], int ac)
{
    /* uses swap() */
    int i, j;
    int sign;
    for (j = 0; j < ac-1; j++) {
        sign = 0;
        for (i = ac-1; i > j; i--) {
            if (a[i-1] > a[i]) {
                sign = 1;
                swap(a+i, a+i-1);
            }
        }
        if (sign == 0) break;
    }
}
Insertion Sort
Suppose that when freshmen arrive, we line them up by height (that is, we sort them). When a new student joins, we add him to the end of the line. If he is shorter than the student in front of him, the two exchange positions, and he keeps moving forward in this way until he reaches the place where he belongs. This is the basic principle of insertion sort.
For the starting array, we regard the leftmost element (i=0) as a single student who already forms an ordered line by himself.
Then the second student (i=1) joins the line and exchanges his way to the position where he belongs; then the third student joins and does the same... When all n students have joined the line, the sorting is complete.
/* By Vamei */
/* insert the next element into the sorted part */
void insert_sort(int a[], int ac)
{
    /* uses swap() */
    int i, j;
    for (j = 1; j < ac; j++) {
        i = j-1;
        while ((i >= 0) && (a[i+1] < a[i])) {
            swap(a+i+1, a+i);
            i--;
        }
    }
}
Selection Sort
The final result of sorting is that no element is greater than any element to its right (a[i] <= a[j] if i <= j). So, in an ordered sequence, the smallest element sits at the leftmost position, the second smallest at position i=1, ... and the largest element at the end.
Selection sort first finds the smallest element of the starting array and swaps it to position i=0; then it finds the smallest of the remaining elements and swaps it to position i=1; ... until the second largest element is found and swapped to position n-2. At that point, the whole array is sorted.
/* By Vamei */
/* find the smallest of the rest, then append it to the sorted part */
void select_sort(int a[], int ac)
{
    /* uses swap() */
    int i, j;
    int min_idx;
    for (j = 0; j < ac-1; j++) {
        min_idx = j;
        for (i = j+1; i < ac; i++) {
            if (a[i] < a[min_idx]) {
                min_idx = i;
            }
        }
        swap(a+j, a+min_idx);
    }
}
Shell Sort
We mentioned in bubble sort that the worst case happens when large elements sit at the beginning of the array. These elements need many passes before they are exchanged all the way to the end. Such elements are called turtles.
Turtles arise because bubble sort always compares and exchanges adjacent elements, so on each right-to-left pass a large element can move only one position to the right. (Small elements sitting at the end of the array, called rabbits, can be moved to the head quickly.)
Shell sort compares and exchanges elements at larger intervals, so that a large element can move more than one position to the right in a single exchange, which moves the turtles along faster. For example, you can split the array into 4 sub-arrays (i=4k, i=4k+1, i=4k+2, i=4k+3) and bubble sort each sub-array, for example the sub-array i=0, 4, 8, 12, ... In that case each exchange covers an interval of 4.
After the four sub-arrays are sorted, the whole array may still not be in order. Shell sort therefore keeps reducing the interval, re-forms the sub-arrays, and bubble sorts them again... When the interval shrinks to 1, this is equivalent to one bubble sort over the entire array, and the array ends up sorted.
Shell sort does not have to be built on bubble sort; other sorting methods can be used for the sub-arrays as well.
/* By Vamei */
/* quickly move the turtles toward the tail of the array */
#include <stdlib.h>    /* malloc, free */

void shell_sort(int a[], int ac)
{
    int step;
    int i, j;
    int nsub;
    int *sub;

    /* initialize step */
    step = 1;
    while (step < ac) step = 3*step + 1;

    /* when step becomes 1, it is equivalent to a plain bubble sort */
    while (step > 1) {
        /* step eventually goes down to 1 */
        step = step/3 + 1;
        for (i = 0; i < step; i++) {
            /* pick an element every step positions, and combine them into a sub-array */
            nsub = (ac - i - 1)/step + 1;
            sub = (int *) malloc(sizeof(int)*nsub);
            for (j = 0; j < nsub; j++) {
                sub[j] = a[i + j*step];
            }
            /* sort the sub-array with bubble sort; other sorting methods also work */
            bubble_sort(sub, nsub);
            /* put the sub-array back */
            for (j = 0; j < nsub; j++) {
                a[i + j*step] = sub[j];
            }
            /* free the sub-array */
            free(sub);
        }
    }
}
Shell sort depends on the choice of intervals. A common choice is to make each interval roughly 1/1.3 of the previous one; see the reference books. A sketch of that variant is given below.
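As a hedged sketch of that variant (my own code, not Vamei's; the function name shell_sort_13 is made up), the version below shrinks the interval by a factor of about 1.3 each round and sorts each interleaved sub-array in place with a gapped insertion sort, instead of copying sub-arrays into temporary buffers. It reuses the swap() given at the end of the article.

void swap(int *pa, int *pb);   /* defined at the end of the article */

void shell_sort_13(int a[], int ac)
{
    int gap, i, j;

    gap = ac;
    do {
        /* shrink the interval by roughly a factor of 1.3 each round */
        gap = (int)(gap / 1.3);
        if (gap < 1) gap = 1;

        /* gapped insertion sort: compare and swap elements that are gap apart */
        for (j = gap; j < ac; j++) {
            i = j - gap;
            while (i >= 0 && a[i + gap] < a[i]) {
                swap(a + i + gap, a + i);
                i -= gap;
            }
        }
    } while (gap > 1);
}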
Merge Sort
Suppose we want to sort a deck of cards by number, and two people have each already arranged half of the deck in order. We can place the two sorted piles face up so that the smallest card of each pile is on top; at this point we can see the top card of each pile.
We take the smaller of the two top cards and put it in our hand. Two new cards are now exposed on top of the piles, and we again take the smaller one into our hand... When all the cards have been taken into our hand, the whole deck is in order. This is merge sort.
The following implementation uses recursion:
/* By Vamei */
/* recursively split the array, then merge the two sorted halves */
#include <stdlib.h>    /* malloc, free */

void merge_sort(int *a, int ac)
{
    int i, j, k;
    int ac1, ac2;
    int *ah1, *ah2;
    int *container;

    /* base case */
    if (ac <= 1) return;

    /* split the array into two halves */
    ac1 = ac/2;
    ac2 = ac - ac1;
    ah1 = a + 0;
    ah2 = a + ac1;

    /* recursion */
    merge_sort(ah1, ac1);
    merge_sort(ah2, ac2);

    /* merge */
    i = 0;
    j = 0;
    k = 0;
    container = (int *) malloc(sizeof(int)*ac);
    while (i < ac1 && j < ac2) {
        if (ah1[i] <= ah2[j]) {
            container[k++] = ah1[i++];
        }
        else {
            container[k++] = ah2[j++];
        }
    }
    while (i < ac1) {
        container[k++] = ah1[i++];
    }
    while (j < ac2) {
        container[k++] = ah2[j++];
    }

    /* copy the sorted result back */
    for (i = 0; i < ac; i++) {
        a[i] = container[i];
    }

    /* free the temporary space */
    free(container);
}
Quick Sort
We again consider sorting students by height. In quick sort, we randomly pick one student and use that student's height as the reference (the pivot). Then every student shorter than the pivot student stands on his left, and the rest stand on his right.
Clearly, all the students are now divided into two groups: every student in the group on the right is taller than every student in the group on the left.
We continue by picking one student out of the shorter group and dividing that group into two (very short and not so short). Similarly, the taller group is divided into two groups (not so tall and very tall).
We keep subdividing until each group contains only one student. When every group has only one student, the sorting is done.
The following implementation uses recursion:
/* By Vamei */
/* select a pivot value (pvalue), put the elements smaller than it to the left */
void quick_sort(int a[], int ac)
{
    /* uses swap() */
    /* pivot is a position; during the pass, the elements at positions 1..pivot-1
       are all smaller than pvalue */
    int pivot;
    /* the position of the element being tested against pvalue */
    int sample;

    /* nothing to sort */
    if (ac <= 1) return;

    /* select a pvalue. The median would be a good choice, but finding it takes
       time by itself. Here the pvalue is selected in a very simple way: a[ac/2] */
    /* store pvalue at a[0] */
    swap(a+0, a+ac/2);
    pivot = 1;

    /* test each element */
    for (sample = 1; sample < ac; sample++) {
        if (a[sample] < a[0]) {
            swap(a+pivot, a+sample);
            pivot++;
        }
    }

    /* swap a[0] (the pvalue) with the last element that was smaller than it */
    swap(a+0, a+pivot-1);

    /* base case: if there are only two elements in the array,
       the pass above has already sorted them */
    if (ac <= 2) return;
    else {
        /* recursion */
        quick_sort(a, pivot);
        quick_sort(a+pivot, ac-pivot);
    }
}
The ideal pivot would be the median of the elements in the group, but an algorithm to find the median would have to be implemented separately. You could also pick a random element as the pivot, which again needs its own implementation (a small sketch follows). For simplicity, I use the element in the middle position as the pivot every time.
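If you want to try a random pivot, a minimal sketch could look like the following (my addition, not part of the original code; it assumes srand() has been called once somewhere, for example srand(time(NULL)), and the helper name random_pivot_index is made up):

#include <stdlib.h>   /* rand */

/* hypothetical helper: pick a random position in [0, ac) to use as the pivot */
int random_pivot_index(int ac)
{
    return rand() % ac;    /* assumes ac > 0 */
}

/* inside quick_sort, the line
       swap(a+0, a+ac/2);
   would become
       swap(a + 0, a + random_pivot_index(ac));         */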
Heap Sort
The heap is a common data structure; it is a priority queue. The most common implementation of a heap is a complete binary tree with restricted operations. This complete binary tree maintains the heap property, namely that a parent node is never larger than its children, so the root of the heap is the smallest of all the heap's elements. The heap defines operations for inserting a node and for deleting the root node, and both operations maintain the heap property.
We can build a heap from an unordered array and then repeatedly take out the root node; the extracted values form a sorted array.
For a more detailed description of the heap, please see the reference books.
Below are the data structure of the heap and the operations for inserting a node and deleting the root node. With them you can easily build a heap and take the root nodes out one by one to form a sorted array; a small heap_sort sketch follows the code.
/* By Vamei
   Use a big array to implement the heap.
   DECLARE: int heap[MAXSIZE] in the calling function.
   heap[0]: total number of nodes in the heap.
   For a node i, its children are i*2 and i*2+1 (if they exist),
   and its parent is i/2. */

#include <stdio.h>     /* printf */
#include <stdlib.h>    /* exit */

static void percolate_up(int heap[]);
static void percolate_down(int heap[]);

void insert(int new, int heap[])
{
    heap[0] = heap[0] + 1;
    heap[heap[0]] = new;
    /* recover the heap property */
    percolate_up(heap);
}

static void percolate_up(int heap[])
{
    int lightIdx, parentIdx;
    lightIdx = heap[0];
    parentIdx = lightIdx/2;
    /* stop when lightIdx is the root, or when no swap is needed */
    while ((parentIdx > 0) && (heap[lightIdx] < heap[parentIdx])) {
        /* swap */
        swap(heap + lightIdx, heap + parentIdx);
        lightIdx = parentIdx;
        parentIdx = lightIdx/2;
    }
}

int delete_min(int heap[])
{
    int min;
    if (heap[0] < 1) {
        /* deleting an element from an empty heap */
        printf("Error: delete_min from an empty heap.");
        exit(1);
    }
    /* delete the root, then move the last leaf to the root */
    min = heap[1];
    swap(heap + 1, heap + heap[0]);
    heap[0] -= 1;
    /* recover the heap property */
    percolate_down(heap);
    return min;
}

static void percolate_down(int heap[])
{
    int heavyIdx;
    int childIdx1, childIdx2, minIdx;
    int sign;   /* state variable, 1: swap; 0: no swap */

    heavyIdx = 1;
    do {
        sign = 0;
        childIdx1 = heavyIdx*2;
        childIdx2 = childIdx1 + 1;
        if (childIdx1 > heap[0]) {
            /* both children are null */
            break;
        }
        else if (childIdx2 > heap[0]) {
            /* the right child is null */
            minIdx = childIdx1;
        }
        else {
            minIdx = (heap[childIdx1] < heap[childIdx2]) ? childIdx1 : childIdx2;
        }
        if (heap[heavyIdx] > heap[minIdx]) {
            /* swap with the smaller child */
            swap(heap + heavyIdx, heap + minIdx);
            heavyIdx = minIdx;
            sign = 1;
        }
    } while (sign == 1);
}
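Building on the operations above, a minimal heap_sort sketch could look like this (my addition, not part of the original code; it assumes ac < MAXSIZE, since heap[0] holds the element count and the elements occupy positions 1 through ac):

#define MAXSIZE 1024

void heap_sort(int a[], int ac)
{
    int heap[MAXSIZE];
    int i;

    heap[0] = 0;                    /* the heap starts empty */
    for (i = 0; i < ac; i++)
        insert(a[i], heap);         /* build the heap one element at a time */
    for (i = 0; i < ac; i++)
        a[i] = delete_min(heap);    /* the roots come out in ascending order */
}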
Summary
In addition to the algorithms above, there are also bucket sort and radix sort. I will add them to this article after I have implemented them. The time complexity analysis of these algorithms can be found in the reference books; I have also done a rough analysis myself. If the blog ever supports displaying mathematical formulas, I will post my own analysis as a guide.
I wrote the code above myself and have only run very simple tests on it. If there are any mistakes, thank you in advance for pointing them out.
Finally, here is the swap function used above:
/* By Vamei */
/* exchange the values pointed to by pa and pb */
void swap(int *pa, int *pb)
{
    int tmp;
    tmp = *pa;
    *pa = *pb;
    *pb = tmp;
}
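For completeness, here is a small test driver (my addition, not from the original post). It assumes all the functions above are compiled into the same file; any of the sorts could be substituted for quick_sort:

#include <stdio.h>

int main(void)
{
    int a[] = {9, 3, 7, 1, 5, 8, 2, 6, 4, 0};
    int n = sizeof(a) / sizeof(a[0]);
    int i;

    quick_sort(a, n);              /* or bubble_sort, insert_sort, ... */

    for (i = 0; i < n; i++)
        printf("%d ", a[i]);       /* expected output: 0 1 2 3 4 5 6 7 8 9 */
    printf("\n");
    return 0;
}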
Comparison and selection of several sorting algorithms
1. Factors to consider when choosing a sorting method:
(1) The number of elements to be sorted, n;
(2) The amount of information carried by each element;
(3) The structure and distribution of the keys;
(4) The constraints of the language and tools, the size of the available auxiliary space, and so on.
2. Some suggestions:
(1) If n is small (n <= 50), direct insertion sort or direct selection sort can be used. Since direct insertion sort moves records more often than direct selection sort, direct selection sort is preferable when each record carries a large amount of data.
(2) If the file is already basically ordered by key in its initial state, direct insertion sort or bubble sort is a good choice.
(3) If n is large, a sorting method with time complexity O(n log2 n) should be used: quick sort, heap sort, or merge sort. Quick sort is currently considered the best of the comparison-based internal sorting methods.
(4) In comparison-based sorting, comparing two keys has only two possible outcomes, so the comparison process can be described by a binary tree. It can be shown that when the n keys of the file are randomly distributed, any sorting algorithm based on comparisons needs on the order of n log2 n time, because the decision tree must distinguish all n! possible orderings; a short derivation is given after this list.
(5) When each record carries a large amount of data, a linked list can be used as the storage structure to avoid spending a lot of time moving records.
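As a hedged aside on point (4) above (this is the standard argument, written out by me rather than taken from the original article): a binary decision tree that distinguishes all n! possible input orderings needs at least n! leaves, so its height h, which is the worst-case number of comparisons, satisfies

2^h \ge n! \;\Longrightarrow\; h \ge \log_2(n!) \ge \log_2\!\Big(\big(\tfrac{n}{2}\big)^{n/2}\Big) = \tfrac{n}{2}\log_2\tfrac{n}{2},

which grows on the order of n log2 n.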