Before you begin
Assume that the length of the longest string is L, taking L as the length of the input, and then assuming that all strings are "completed" to this length, this complement is only logical. We can imagine that there is a "empty character" that is smaller than any other character, and use this character to fill all strings with insufficient length. For example: the longest string has a length of 9, and there is a string A with a length of 6. So when comparing the 7th character, we let A[7] be "empty character".
If it seems not easy to include all characters, we first define a character set, and all characters in the string to be sorted are included in this character set.
//Character Setprivate string _myCharSet = "0123456789qwertyuiopasdfghjklzxcvbnm";
Let’s take another method to generate random strings (C# implementation):
private Random _random = new Random(); string[] GetRandStrings(int size, int minLength, int maxLength) { string[] strs = new string[size]; int len = 0; StringBuilder sb = new StringBuilder(maxLength); for (int i = 0; i < ; i++) { //A random length is determined first len = _random.Next(minLength, maxLength); for (int j = 0; j < len; j++) { //Select a character randomly (_myCharSet[_random.Next(_myCharSet.Length)]); } strs[i] = (); (); } return strs; }
Here we determine the range of the bucket according to the integer representation of the characters, and then prepare a bucket for the "empty character". To represent the special case of "empty character", default(char), i.e. '\0' is used to represent it, because when the (int) method is called, '\0' will be returned if the index is exceeded.
Primary version (C#)
void StringRadixSort(string[] strArray) { if (strArray == null || == 0 || (null)) { return; } //Get the maximum length of the string int maxLength = 0; foreach (string s in strArray) { if ( > maxLength) { maxLength = ; } } //Determine the integer range of characters int rangeStart = _myCharSet[0]; int rangeEnd = _myCharSet[0]; foreach (char ch in _myCharSet) { if (ch < rangeStart) rangeStart = ch; if (ch >= rangeEnd) rangeEnd = ch + 1; } //Also allocate a bucket for "empty characters" with an index of 0 int bucketCount = rangeEnd - rangeStart + 1; LinkedList<string>[] buckets = new LinkedList<string>[bucketCount]; //Initialize all buckets for (int i = 0; i < ; i++) { buckets[i] = new LinkedList<string>(); } //Sorting starting from the last character int currentIndex = maxLength - 1; while (currentIndex >= 0) { foreach (string theString in strArray) { //If the index is exceeded, return the '\0' character (default(char)) char ch = (currentIndex); if (ch == default(char)) { // Handling of "empty characters" buckets[0].AddLast(theString); } else { // Map characters to bucket int index = ch - rangeStart + 1; buckets[index].AddLast(theString); } } //Retrieve the strings from the bucket in turn to complete the ordering int i = 0; foreach (LinkedList<string> bucket in buckets) { while ( > 0) { strArray[i++] = (); (); } } currentIndex--; } }
A little "improved"
The code used to determine the integer range of characters is a bit painful, and according to the character set, not all characters corresponding to integers in the interval may appear, so there will be a situation where we allocate buckets to certain characters that will not appear at all, which is a waste. We can use a dictionary (hash) to record the mapping between a character and its bucket. So the following code is found.
private Dictionary<char, int> _charOrderDict = new Dictionary<char, int>(_myCharSet.Length); void BuildCharOrderDict() { char[] sortedCharSet = _myCharSet.ToArray(); //Sorting with default comparator (sortedCharSet); // Create a map for "empty characters" separately _charOrderDict.Add(default(char), 0); for (int i = 0; i < ; i++) { // What is saved is the index of the characters and their corresponding bucket _charOrderDict.Add(sortedCharSet[i], i + 1); } }
You can also define the size relationship between characters without using the default character sorting as a mapping. Here is the adjusted code:
void StringRadixSort(string[] strArray) { if (strArray == null || == 0 || (null)) { return; } //Get the maximum length of the string int maxLength = 0; foreach (string s in strArray) { if ( > maxLength) { maxLength = ; } } // Assign a bucket for each character (including the empty character '\0') //The index of "empty character" should be 0 int bucketCount = _myCharSet.Length + 1; LinkedList<string>[] buckets = new LinkedList<string>[bucketCount]; //Initialize all buckets for (int i = 0; i < ; i++) { buckets[i] = new LinkedList<string>(); } //Sorting starting from the last character int currentIndex = maxLength - 1; while (currentIndex >= 0) { foreach (string theString in strArray) { //If the index is exceeded, return the '\0' character (default(char)) char ch = (currentIndex); //Query characters according to character order int index = _charOrderDict[ch]; buckets[index].AddLast(theString); } //Retrieve the strings from the bucket in turn to complete the ordering int i = 0; foreach (LinkedList<string> bucket in buckets) { while ( > 0) { strArray[i++] = (); (); } } currentIndex--; } }
Now, it works! If you use quick sorting to do it, its time complexity is O(n∗logn)O(n∗logn). On the surface, the cardinality sort is better, but strictly speaking, the time complexity of cardinality sort should be O(k∗n)O(k∗n), where k and the string length are positively correlated. At this time, the comparison between the two algorithms can be approximately obtained by comparing the comparison results of k and lognlogn. If the length of the string is very long, that is, k is very large, and the input scale n is not large, there will be k>lognlogn. At this time, quick sorting is more advantageous. Conversely, the cardinality sort may be better.
at last...
What's good is that when I expanded the character set and added all the characters on the keyboard, I found that the results of cardinal sorting are different from those of the (string[] method. After careful observation of the resource manager's sorting file names, I found that the rules for string sorting are much more complicated, not simply comparing characters. After querying relevant information, I found that the sorting of strings even takes into account the influence of regional culture. Even if they are all Latin letters, the sorting rules in different regions may be different. Therefore, the string sorting algorithm implemented using cardinal sorting does not seem to be very practical <T-T>.