SoFunction
Updated on 2025-04-07

Perl sort function: usage summary and examples

1) Use of sort function

sort LIST
sort BLOCK LIST
sort SUBNAME LIST

sort takes the three forms shown above. It sorts LIST and returns the sorted list. If SUBNAME or BLOCK is omitted, sort uses the standard string comparison order (for example, ASCII order). If SUBNAME is specified, it names a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the first element should sort before, the same as, or after the second. A BLOCK can be supplied in place of SUBNAME as an anonymous, inline subroutine, with the same effect.

The two elements being compared are made available in the variables $a and $b. These are aliases to the original list elements rather than copies, so do not modify $a or $b. If a comparison subroutine is used, it should not be recursive (older versions of Perl do not allow this).
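As a minimal sketch of the three forms described above, the following compares them side by side. The word list and the name by_length are made up for illustration and are not part of the original examples.

```perl
use strict;
use warnings;

# Illustrative data (an assumption, not from the original article)
my @words = qw(pear apple Fig banana);

# Form 1: sort LIST, default string comparison
my @by_default = sort @words;

# Form 2: sort BLOCK LIST, here a case-insensitive comparison
my @by_block = sort { lc($a) cmp lc($b) } @words;

# Form 3: sort SUBNAME LIST, here by length and then by string order
sub by_length { length($a) <=> length($b) or $a cmp $b }
my @by_sub = sort by_length @words;

print "@by_default\n";   # Fig apple banana pear
print "@by_block\n";     # apple banana Fig pear
print "@by_sub\n";       # Fig pear apple banana
```

The comparison code only reads $a and $b; it never assigns to them.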

2) Usage examples

1. sort in numerical order    


@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort {$a <=> $b} @array), "\n";

The print result is:
1 2 4 8 16 32

The same goes for:

sub numerically { $a <=> $b };
print join(' ', sort numerically @array), "\n";

This is easy to understand. It is just sorted in the order of natural numbers, so I won’t go into details.

2.1 sort in ASCII order (not dictionary order)


@languages = qw(fortran lisp c c++ Perl python java);
print join(' ', sort @languages), "\n";

Print result:

Perl c c++ fortran java lisp python

This is equivalent to:

print join(' ', sort { $a cmp $b } @languages), "\n";

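With no comparison block, sort compares strings by character code, which is why the capitalized Perl comes before all the lowercase words.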

Note that if you sort the numbers in ASCII order, the results may be different from what you think:

print join(' ', sort 1 .. 11), "\n";

The print result is:

1 10 11 2 3 4 5 6 7 8 9

2.2 sort in dictionary order


use locale;
@array = qw(ASCII ascap at_large atlarge A ARP arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
          ($db = lc $b) =~ s/[\W_]+//g;
          $da cmp $db;
          } @array;
print "@sorted\n";

The print result is:

A ARP arp ascap ASCII atlarge at_large

use locale is optional; it makes the code behave better when the data contains international characters. use locale changes the behavior of cmp, lt, le, ge, gt and some other functions; see the perllocale man page for details.

Note that atlarge and at_large appear in the output in the reverse of their input order, even though their sort keys are equal (the comparison block strips the underscore from at_large). This happens because the example was run on perl 5.005_02, whose sort does not preserve the relative order of elements with equal keys. Perl 5.8 and later use a merge sort by default, which in practice does preserve that order.

Note that in map, grep, and sort alike, you must leave the temporary variable alone ($_ for map and grep, $a and $b for sort) and not modify it.
That is why this code first copies the lowercased values of $a and $b into $da and $db and only then applies s/[\W_]+//g, so the substitution never touches the original elements.
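One way to make the order of equal keys independent of the Perl version is to add an explicit tie-breaker. The sketch below is only an illustration: dict_key is a hypothetical helper name, and the tie-break on the original string is a design choice not taken from the original example. It also leaves out use locale for simplicity.

```perl
use strict;
use warnings;

my @array = qw(ASCII ascap at_large atlarge A ARP arp);

# Hypothetical helper: lowercase a copy and strip non-word characters and underscores
sub dict_key {
    my ($s) = @_;
    (my $key = lc $s) =~ s/[\W_]+//g;
    return $key;
}

# Compare the dictionary keys first; on a tie, fall back to the original strings
my @sorted = sort { dict_key($a) cmp dict_key($b) or $a cmp $b } @array;

print "@sorted\n";   # A ARP arp ascap ASCII at_large atlarge
```

Because dict_key works on a copy of its argument, $a and $b are never modified.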

3. sort in descending order

Sorting in descending order is simple: just swap the operands on either side of cmp or <=>.

sort { $b <=> $a } @array;

Or change the return value of the intermediate block or subfunction:
sort { -($a <=> $b) } @array;

Or use the reverse function (possibly a little less efficient, but perhaps easier to read):
reverse sort { $a <=> $b } @array;
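As a quick check, the three variants can be run on the array from the first example to confirm that they produce the same order. The variable names here are chosen only for illustration.

```perl
use strict;
use warnings;

my @array = (8, 2, 32, 1, 4, 16);

my @swapped  = sort { $b <=> $a } @array;           # operands swapped
my @negated  = sort { -($a <=> $b) } @array;        # comparison result negated
my @reversed = reverse sort { $a <=> $b } @array;   # ascending sort, then reversed

print "@swapped\n";    # 32 16 8 4 2 1
print "@negated\n";    # 32 16 8 4 2 1
print "@reversed\n";   # 32 16 8 4 2 1
```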

4. Use multiple keys to sort

To sort on multiple keys, chain the comparisons with or inside a single subroutine. Put the primary comparison first and the secondary ones after it. Because each comparison returns 0 when its keys are equal, or falls through to the next comparison only on a tie.


# An array of references to anonymous hashes
@employees = (
  { FIRST => 'Bill',   LAST => 'Gates',
    SALARY => 600000, AGE => 45 },
  { FIRST => 'George', LAST => 'Tester',
    SALARY => 55000, AGE => 29 },
  { FIRST => 'Steve', LAST => 'Ballmer',
    SALARY => 600000, AGE => 41 },
  { FIRST => 'Sally', LAST => 'Developer',
    SALARY => 55000, AGE => 29 },
  { FIRST => 'Joe',   LAST => 'Tester',
    SALARY => 55000, AGE => 29 },
);
sub seniority {
  $b->{SALARY}   <=> $a->{SALARY}
  or $b->{AGE}   <=> $a->{AGE}
  or $a->{LAST}   cmp $b->{LAST}
  or $a->{FIRST}   cmp $b->{FIRST}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
  print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST} $emp->{LAST}\n";
}

The print result is:

600000 45     Bill Gates
600000 41     Steve Ballmer
55000   29     Sally Developer
55000   29     George Tester
55000   29     Joe Tester

The code above looks complicated, but it is actually easy to follow. Each element of the @employees array is an anonymous hash. An anonymous hash is really a reference, so you use the -> operator to reach its values. For example, $employees[0]->{SALARY} is the value stored under SALARY in the first anonymous hash. With that in mind, the comparison is clear: compare SALARY first, then AGE, then LAST, and finally FIRST. Note that the first two keys are sorted in descending order and the last two in ascending order, so be careful not to mix them up.

5. Sort to generate a rank array


@x = qw(matt elroy jane sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";

The print result is:

2 0 1 3

A little confusing? Look closely and it becomes clear. 0 .. $#x is the list of subscripts of the @x array, here 0 1 2 3. $x[$a] cmp $x[$b] compares the elements of @x in ASCII order. So sort returns the subscripts of @x, ordered by the ASCII order of the elements they point to.
Still not sure what sort returns? Let's first print the elements of @x in ASCII order:


@x = qw(matt elroy jane sally);
print join ' ',sort { $a cmp $b } @x;

The print result is:

elroy jane matt sally

The corresponding subscripts in @x are 1 2 0 3, so the sort above returns the list (1, 2, 0, 3). @rank[1, 2, 0, 3] = 0 .. $#x is then an ordinary array slice assignment,
so @rank ends up as (2 0 1 3).
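The same computation can be split into two steps to make the intermediate list visible. This is only a sketch; the @order variable is introduced here for illustration.

```perl
use strict;
use warnings;

my @x = qw(matt elroy jane sally);

# Step 1: subscripts of @x, ordered by the elements they refer to
my @order = sort { $x[$a] cmp $x[$b] } 0 .. $#x;   # (1, 2, 0, 3)

# Step 2: slice assignment; the element at $order[$i] gets rank $i
my @rank;
@rank[@order] = 0 .. $#x;                          # (2, 0, 1, 3)

print "$x[$_] has rank $rank[$_]\n" for 0 .. $#x;
# matt has rank 2
# elroy has rank 0
# jane has rank 1
# sally has rank 3
```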

6. Sort the hash by keys


%hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');
# The unary + makes Perl parse the inner braces as an anonymous hash, not a block
@sorted = map { +{ $_ => $hash{$_} } } sort keys %hash;
foreach $hashref (@sorted) {
  ($key, $value) = each %$hashref;
  print "$key => $value\n";
}

The print result is:

Alan => Turing
Donald => Knuth
John => Neumann

The code above is not hard to follow. sort keys %hash returns the keys of %hash in ASCII order, and map then processes that list. Note the two pairs of braces in the map:
the outer pair is the map block, and the inner pair constructs an anonymous hash, so map returns a list of anonymous hashes.
Each element of @sorted is therefore a reference to a one-pair hash; dereferencing it with %$hashref gives access to its key and value.
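If the list of anonymous hashes is not needed afterwards, a simpler alternative is to loop over the sorted keys directly. This is a sketch of that alternative, not the approach used above; the @lines array exists only to collect the output.

```perl
use strict;
use warnings;

my %hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');

# Walk the keys in sorted order and look each value up in the hash
my @lines;
for my $key (sort keys %hash) {
    push @lines, "$key => $hash{$key}";
}

print "$_\n" for @lines;
# Alan => Turing
# Donald => Knuth
# John => Neumann
```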

7. Sort the hash by values


%hash = ( Elliot => 'Babbage',
      Charles => 'Babbage',
      Grace => 'Hopper',
      Herman => 'Hollerith'
    );
# The unary + makes Perl parse the inner braces as an anonymous hash, not a block
@sorted = map { +{ $_ => $hash{$_} } }
        sort { $hash{$a} cmp $hash{$b}
              or $a cmp $b
            } keys %hash;
foreach $hashref (@sorted) {
  ($key, $value) = each %$hashref;
  print "$key => $value\n";
}

The print result is:

Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper

Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash by value alone, the relative order of two elements with the same value may change when other entries are added or removed. To get a stable result, sort primarily by value and secondarily by key.

Here { $hash{$a} cmp $hash{$b} or $a cmp $b } compares by value first and falls back to the key when the values are equal. sort returns the ordered list of keys, which is then passed to map to build a list of anonymous hashes. They are accessed the same way as in the previous example, so I won't describe it again.
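The same value-then-key pattern applies to numeric values. The sketch below uses a made-up %count hash (an assumption for illustration) and sorts its keys by count in descending order, with the key as tie-breaker.

```perl
use strict;
use warnings;

# Hypothetical word counts, not taken from the article
my %count = (perl => 3, sort => 5, hash => 3, map => 1);

# Highest count first; equal counts are ordered by key
my @by_count = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_ => $count{$_}\n" for @by_count;
# sort => 5
# hash => 3
# perl => 3
# map => 1
```

Without the or $a cmp $b part, the relative order of hash and perl would depend on the order in which keys returns them.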

8. Sort the words in the file and remove duplicates


perl -0777ane '$, = "\n"; @uniq{@F} = (); print sort keys %uniq' file

Give this one a try; I do not fully understand it myself.
@uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file.
It is semantically equivalent to @uniq{ $F[0], $F[1], ... $F[$#F] } = ()

The description of each option is as follows:

-0777   -   Read the entire file at once, not line by line
-a      -   Autosplit mode: split the input into the @F array
-e      -   Read and run the script from the command line
-n      -   Loop over the input: while (<>) { ... }
$,      -   The output field separator of the print function
file    -   File name
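Written out as a script, the one-liner does roughly the following. This is a sketch under one assumption: the $text string stands in for the contents of the file so that the example is self-contained.

```perl
use strict;
use warnings;

# Stand-in for the slurped file contents (the one-liner reads them from "file")
my $text = "the quick brown fox\njumps over the lazy dog\nthe end\n";

# What -a does after -0777 has read everything: split on whitespace into @F
my @F = split ' ', $text;

# Hash slice: every word becomes a key, so duplicates collapse
my %uniq;
@uniq{@F} = ();

my @words = sort keys %uniq;

{
    # $, is printed between the items of the list passed to print
    local $, = "\n";
    print @words;
}
print "\n";
```

With this input, the script prints brown, dog, end, fox, jumps, lazy, over, quick and the, one word per line.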

9. Efficient sorting: the Orcish maneuver and the Schwartzian transform

The sort comparison subroutine is usually called several times for each element. If sort run time matters a lot, you can use the Orcish maneuver or the Schwartzian transform so that each key is computed only once.
Consider the following example, which sorts a list of files by modification date.

# Brute force: goes to the disk several times for each file
@sorted = sort { -M $a <=> -M $b } @filenames;

# Orcish maneuver: cache the keys in a hash
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
          ($modtimes{$b} ||= -M $b)
          } @filenames;


A clever trick, isn't it? Because a file's modification date essentially does not change while the script runs, it is enough to compute -M once and save the result.
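To see the saving without touching the file system, the sketch below replaces -M with a hypothetical expensive_key function that counts how often it is called. The function, the counter, and the word list are all assumptions made for illustration.

```perl
use strict;
use warnings;

my @items = qw(pear fig banana kiwi apple plum);

my $calls = 0;

# Hypothetical stand-in for an expensive key computation such as -M
sub expensive_key {
    my ($s) = @_;
    $calls++;
    return length $s;
}

# Brute force: the key is recomputed on every comparison
my @naive = sort { expensive_key($a) <=> expensive_key($b) or $a cmp $b } @items;
my $naive_calls = $calls;

# Cached version: each key is computed once and kept in %cache
$calls = 0;
my %cache;
my @orcish = sort {
    ($cache{$a} ||= expensive_key($a)) <=> ($cache{$b} ||= expensive_key($b))
        or $a cmp $b
} @items;
my $orcish_calls = $calls;

print "@orcish\n";   # fig kiwi pear plum apple banana
print "brute force: $naive_calls calls, cached: $orcish_calls calls\n";
```

One caveat of ||=: a key that is false (such as 0) is recomputed every time. On Perl 5.10 and later, //= avoids this for defined values.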
Here is how to use the Schwartzian transform:

@sorted = map( { $_->[0] }
          sort( { $a->[1] <=> $b->[1] }
              map({ [$_, -M] } @filenames)
            )
        );

This code nests map and sort in several layers. Read it from the inside out, starting at the end. map({ [$_, -M] } @filenames) returns a list of anonymous arrays; the first value of each anonymous array is the file name, and the second is the file's modification age as returned by -M.

sort( { $a->[1] <=> $b->[1] }... then sorts that list of anonymous arrays by the modification value.
The result returned by sort is the same list of anonymous arrays, now in sorted order.

The outermost map( { $_->[0] }... is simple. It extracts the file name from each anonymous array returned by sort. The file names come out ordered by modification date, and -M is run only once per file.
This is the famous Schwartzian transform, an idiom that is very popular among Perl users abroad.
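The same three-layer pattern works for any key that is costly to compute. As a sketch, the following uses a hypothetical expensive_key function in place of -M and counts its calls; the names and data are assumptions for illustration.

```perl
use strict;
use warnings;

my @items = qw(pear fig banana kiwi apple plum);

my $calls = 0;

# Hypothetical stand-in for an expensive key computation such as -M
sub expensive_key {
    my ($s) = @_;
    $calls++;
    return length $s;
}

my @sorted =
    map  { $_->[0] }                                      # 3. unwrap the original item
    sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] }   # 2. sort on the precomputed key
    map  { [ $_, expensive_key($_) ] }                    # 1. pair each item with its key
    @items;

print "@sorted\n";                  # fig kiwi pear plum apple banana
print "key computed $calls times\n";   # once per item
```

Compared with the hash cache of the Orcish maneuver, the keys here live only in the temporary anonymous arrays, so no extra hash remains afterwards.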