The idea of a binary search algorithm is very simple. The description in "Programming Zhuji": In an array containing t, binary search solves the problem by synthesising the range. At the beginning, the range is the entire array. By comparing the element in the middle of the range with t and discarding half of the range, the range is narrowed. This process continues until it is discovered at t, or that range that can contain t has become empty.
Donald Knuth pointed out in his book "Sorting and Searching" that although the first binary search algorithm was published as early as 1946, the first binary search algorithm without bugs was published 12 years later. One common bug is the calculation of the intermediate value subscript. If written as (low+high)/2, it may overflow when low+high is very large, resulting in an error in array access. The improved method is to write the calculation method into the following form: low+ ( (high-low) >>1). The modified algorithm code is given below:
int binarysearch1(int a[],int n,int x) { int l,u,m; l=0;u=n; while(l<u) { m=l+((u-l)>>1); if(x<a[m]) u=m; else if(x==a[m]) return m; else l=m+1; } return -1; }
Note here that since asymmetric intervals are used, the adjustment of the subscript looks a bit irregular. One is u=m, the other is l=m+1. In fact, it is easy to understand. The form of the interval before adjustment should be the form of [ ). If the intermediate value is smaller than the search value, then the adjustment is the left boundary, that is, the closed part, so add 1; otherwise, the adjustment is the right boundary, and the open part, so there is no need to reduce 1. After adjustment, it is still in the form of [ ). Of course, it can also be written in symmetrical form. The code is as follows:
int binarysearch1(int a[],int n,int x) { int l,u,m; l=0;u=n-1; while(l<=u) { m=l+((u-l)>>1); if(x<a[m]) u=m-1; else if(x==a[m]) return m; else l=m+1; } return -1; }
This also looks more regular, but there is a shortcoming. If you want to change the program to a "pure pointer", you will have trouble. The code to modify it into a pure pointer is as follows:
int binarysearch2(int *a,int n,int x) { int *l,*u,*m; l=a;u=a+n-1; while(l<=u) { m=l+((u-l)>>1); if(x<*m) u=m-1; else if(x==*m) return m-a; else l=m+1; } return -1; }
When n is 0, an invalid address will be referenced. However, using asymmetric intervals will not have this problem. The code is as follows:
int binarysearch2(int *a,int n,int x) { int *l,*u,*m; l=a;u=a+n; while(l<u) { m=l+((u-l)>>1); if(x<*m) u=m; else if(x==*m) return m-a; else l=m+1; } return -1; }
The binary search given above is implemented by iterative method, and of course it can also be implemented recursively. The code is as follows:
int binarysearch3(int a[],int l,int u,int x) int m=l+((u-l)>>1); if(l<=u) { if(x<a[m]) return binarysearch3(a,l,m-1,x); else if(x==a[m]) return m; else return binarysearch3(a,m+1,u,x); } return -1;
For the above binary algorithms, if the array elements are repeated, it returns an element of the repeated element. If you want to return the location where the first occurrence of the element being found, you need to modify the code. Here is a solution:
int binarysearch4(int a[],int n,int x) { int l,u,m; int flag=-1; l=0;u=n; while(l<u) { m=l+((u-l)>>1); if(x<a[m]) u=m; else if(x==a[m]) flag=u=m; else l=m+1; } return flag; }
The following is the solution in "Programming Zhuji":
int binarysearch4(int a[],int n,int x) { int l,u,m; l=-1;u=n; while(l+1<u) { m=l+((u-l)>>1); if(a[m]<x) l=m; else u=m; } return (u>=n||a[u]!=x)?-1:u; }
At this point, the code discussion of the binary algorithm has ended. Let’s discuss the program testing issues below. "The Beauty of Code" has a chapter that specifically introduces the testing of binary search algorithms, which is very beautiful. Here we are showing off our skills and give a few test cases. For binarysearch1. The test procedure is as follows:
#include <iostream> #include <cassert> #include <algorithm> #include <ctime> using namespace std; int calmid(int l,int u) { return l+((u-l)>>1); } int binarysearch1(int a[],int n,int x); #define bs1 binarysearch1 int main() { long start,end; start=clock(); int a[9]={-2147483648,-13,-10,-5,-3,0,1,400,2147483647}; // Test of median subscript calculation assert(calmid(0,1)==0); assert(calmid(0,2)==1); assert(calmid(1000000,2000000)==1500000); assert(calmid(2147483646,2147483647)==2147483646); assert(calmid(2147483645,2147483647)==2147483646); //Smoke test assert(bs1(a,9,0)==5); assert(bs1(a,9,1)==6); assert(bs1(a,9,2)==-1); //Border test assert(bs1(a,0,1)==-1); //0 elements assert(bs1(a,1,-2147483648)==0); //1 element Success assert(bs1(a,1,-2147483647)==-1); //1 element failed assert(bs1(a,9,-2147483648)==0); //First element assert(bs1(a,9,-3)==4); //Intermediate element assert(bs1(a,9,2147483647)==8); //The end element //Automated testing int b[10000]; int i,j; for(i=0;i<10000;i++) { b[i]=i*10; for(j=0;j<=i;j++) { assert(bs1(b,i+1,j*10)==j); assert(bs1(b,i+1,j*10-5)==-1); } } //Automated testing introduces random numbers srand(time(0)); for(i=0;i<10000;i++) { b[i]=rand()%1000000; sort(&b[0],&b[i]); for(j=0;j<=i;j++) { int x=rand(); int k=bs1(b,i+1,x); if(k!=-1) assert(b[k]==x); } } end=clock(); cout<<(end-start)/1000.0<<'s'<<endl; return 0; }
Note that the elements of the array have positive numbers, negative numbers, zeros, maximum values, and minimum values. Usually, negative numbers are forgotten, and maximum and minimum values are introduced, mainly for boundary testing.
First, the calculation of median subscript was tested. Another small function was written to test it separately. Considering that the memory may not be able to put such a large array, it is just a mock test and does not really apply for such a large space, but the test for median subscripts is enough.
Second, smoke test. That is, do some basic tests. After the test is passed, the boundary test is performed.
Third, boundary testing. There are three types here, one is for the number of array elements, which are 0 and 1. The second is to target the element position, namely the first element, the middle element, and the end element. The third is to test the element value, the maximum value, the minimum value, and the 0.
Fourth, automated testing. Here an array of tests is automatically generated, and a successful search test is performed for each element.
Fifth, automated testing, except that the elements of the array are random values.
Fifth, performance testing. The relevant code is not listed here. When all the above tests are passed, you can modify the search algorithm and add the code for performance tests. In fact, you can simply add a comparison counter. The return value can be changed from the original search result to the comparison counter value. The code is relatively simple, so I won't list it.
Note: A bug that is easily overlooked by binary search
I believe everyone is familiar with the binary search algorithm. The algorithm finds the specified element from a sorted array. If it is found, it returns the index of the element in the array, otherwise it returns -1. The following is a way to understand.
//a is an array in order, n is the size of the array, and x is the specified elementint binarySearch(int a[], int n, int x) { int left = 0, right = n-1, middle = 0; int tmp = 0; while(left <= right) { middle = (left + right)/2; tmp = a[middle]; if(x < tmp) right = middle - 1; else if(x > tmp) left = middle + 1; else return middle; } return -1; }
At first glance, there is no error, but unfortunately, there is a bug in the program. When the array is extremely large, (left+right) may be a negative number, the array subscript overflows and the program crashes.
Solution: Change middle=(left+right)/2 to middle=left+(right-left)/2. That is, subtraction is used instead of addition to eliminate overflow.
Reference from "The Beauty of Code"