1) Use of sort function
sort LIST
sort BLOCK LIST
sort SUBNAME LIST
sort takes the three forms shown above. It sorts LIST and returns the sorted list. If SUBNAME or BLOCK is omitted, the sort uses standard string comparison order (for example, ASCII order). If SUBNAME is given, it is the name of a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the first element should sort before, the same as, or after the second. A BLOCK can be supplied as an anonymous, inline subroutine in place of SUBNAME, with the same effect.
The two elements being compared are temporarily made available in the variables $a and $b. These are aliases to the original list elements rather than copies, so do not modify $a or $b. If a subroutine is used, it cannot be recursive in the older Perl versions this article was written against.
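The three calling forms can be compared side by side. The following is a minimal sketch; the word list and the subroutine name by_length are made up for illustration.

```perl
use strict;
use warnings;

my @words = qw(pear apple Fig);

# Form 1: no comparator, so plain string (ASCII) comparison is used.
my @plain = sort @words;

# Form 2: a BLOCK comparator, here a case-insensitive comparison.
my @by_block = sort { lc($a) cmp lc($b) } @words;

# Form 3: a named comparator (by_length is a hypothetical name).
sub by_length { length($a) <=> length($b) }
my @by_name = sort by_length @words;

print "@plain\n";      # Fig apple pear
print "@by_block\n";   # apple Fig pear
print "@by_name\n";    # Fig pear apple
```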
2) Usage examples
1. sort in numerical order
@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort {$a <=> $b} @array), "\n";
The print result is:
1 2 4 8 16 32
The same goes for:
sub numerically { $a <=> $b }
print join(' ', sort numerically @array), "\n";
This is easy to understand: the list is simply sorted in numeric order, so I won't go into details.
2.1 sort in ASCII order (not dictionary order)
@languages = qw(fortran lisp c c++ Perl python java);
print join(' ', sort @languages), "\n";
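The print result is:
Perl c c++ fortran java lisp python
This is equivalent to:
print join(' ', sort { $a cmp $b } @languages), "\n";
The list is sorted in ASCII order, so the capitalized Perl comes before every lowercase word.
Note that if you sort numbers in ASCII order, the result may not be what you expect: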
print join(' ', sort 1 .. 11), "\n";
1 10 11 2 3 4 5 6 7 8 9
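To make the difference concrete, here is a small sketch that sorts the same numbers once as strings and once with the numeric comparison operator. The variable names are chosen only for this example.

```perl
use strict;
use warnings;

my @nums = (1 .. 11);

# Default sort compares the values as strings.
my @as_strings = sort @nums;

# An explicit <=> comparator compares them as numbers.
my @as_numbers = sort { $a <=> $b } @nums;

print "@as_strings\n";   # 1 10 11 2 3 4 5 6 7 8 9
print "@as_numbers\n";   # 1 2 3 4 5 6 7 8 9 10 11
```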
2.2 sort in dictionary order
use locale;
@array = qw(ASCII ascap at_large atlarge A ARP arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
($db = lc $b) =~ s/[\W_]+//g;
$da cmp $db;
} @array;
print "@sorted\n";
The printed list is ordered by the lowercased words with punctuation and underscores stripped: A comes first, followed by ARP and arp, ascap, ASCII, and finally atlarge and at_large.
use locale is optional; it makes the code behave better when the data contains international characters. use locale affects the behavior of cmp, lt, le, ge, gt and some other functions; see the perllocale man page for details.
Note that atlarge and at_large come out in the reverse of their input order, even though their sort keys are identical (the sort block deletes the underscore in at_large). This happens because the example was run on perl 5.005_02. Perl 5.6 and earlier use a quicksort that does not preserve the input order of elements with equal keys; from Perl 5.8 on, the default merge sort does preserve it.
Note that with map, grep, or sort alike, you must protect the temporary variable ($_ for map and grep, $a and $b for sort) and never modify it.
That is why this code first assigns the lowercased $a and $b to $da and $db and only then applies s/[\W_]+//g to the copies, so the substitution does not modify the original elements.
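The same idea can be written with a helper that works on a private copy. This is only a sketch: fold_key is a hypothetical name, use locale is left out, and a tie-breaker on the raw strings is added so the output does not depend on whether the sort is stable.

```perl
use strict;
use warnings;

# fold_key (hypothetical helper) returns a lowercased copy of its argument
# with non-word characters and underscores removed.
sub fold_key {
    my ($word) = @_;                      # copy, so the caller's value is untouched
    (my $key = lc $word) =~ s/[\W_]+//g;
    return $key;
}

my @array  = qw(ASCII ascap at_large atlarge A ARP arp);
my @sorted = sort { fold_key($a) cmp fold_key($b) or $a cmp $b } @array;

print "@sorted\n";   # A ARP arp ascap ASCII at_large atlarge
print "@array\n";    # the original list is unchanged
```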
3. sort in descending order
Sorting in descending order is simple: swap the operands on either side of cmp or <=>, for example sort { $b <=> $a } @array.
Or negate the return value of the block or subroutine, for example sort { -($a <=> $b) } @array.
Or use the reverse function (slightly less efficient, but perhaps easier to read), for example reverse sort { $a <=> $b } @array.
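A quick way to convince yourself that the three approaches agree is to run them on the same data. The sketch below reuses the numbers from the first example; the variable names are arbitrary.

```perl
use strict;
use warnings;

my @array = (8, 2, 32, 1, 4, 16);

my @swapped  = sort { $b <=> $a } @array;           # operands swapped
my @negated  = sort { -($a <=> $b) } @array;        # return value negated
my @reversed = reverse sort { $a <=> $b } @array;   # ascending, then reversed

print "@swapped\n";    # 32 16 8 4 2 1
print "@negated\n";    # 32 16 8 4 2 1
print "@reversed\n";   # 32 16 8 4 2 1
```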
4. Use multiple keys to sort
To sort on multiple keys, chain all the comparisons with or inside one subroutine. Put the primary comparison first and the secondary ones after it.
# An array of references to anonymous hashes
@employees = (
{ FIRST => 'Bill', LAST => 'Gates',
SALARY => 600000, AGE => 45 },
{ FIRST => 'George', LAST => 'Tester',
SALARY => 55000, AGE => 29 },
{ FIRST => 'Steve', LAST => 'Ballmer',
SALARY => 600000, AGE => 41 },
{ FIRST => 'Sally', LAST => 'Developer',
SALARY => 55000, AGE => 29 },
{ FIRST => 'Joe', LAST => 'Tester',
SALARY => 55000, AGE => 29 },
);
sub seniority {
$b->{SALARY} <=> $a->{SALARY}
or $b->{AGE} <=> $a->{AGE}
or $a->{LAST} cmp $b->{LAST}
or $a->{FIRST} cmp $b->{FIRST}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST} $emp->{LAST}\n";
}
The print result is:
600000 45 Bill Gates
600000 41 Steve Ballmer
55000 29 Sally Developer
55000 29 George Tester
55000 29 Joe Tester
The code above looks complicated, but it is actually easy to follow. The elements of the @employees array are references to anonymous hashes, and the -> operator reaches the values inside them. For example, $employees[0]->{SALARY} is the SALARY value in the first anonymous hash. With that, the comparison is clear: compare SALARY first, then AGE, then LAST, and finally FIRST. Note that the first two keys are sorted in descending order ($b before $a) and the last two in ascending order, so don't mix them up.
5. Sort to generate a new array
@x = qw(matt elroy jane sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";
The print result is:
2 0 1 3
Confused? Look carefully and it becomes clear. 0 .. $#x is a list of the subscripts of @x, here 0 1 2 3. $x[$a] cmp $x[$b] compares the corresponding elements of @x in ASCII order. So sort returns the subscripts of @x, ordered by the ASCII order of the elements they point to.
Still not sure what sort returns? Let's first print the elements of @x in ASCII order:
@x = qw(matt elroy jane sally);
print join ' ',sort { $a cmp $b } @x;
The print result is:
elroy jane matt sally
Their subscripts in @x are 1 2 0 3, so the sort above returns the list (1, 2, 0, 3). @rank[1, 2, 0, 3] = 0 .. $#x is then just a simple array slice assignment.
So the result of @rank is (2 0 1 3).
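Splitting the one-liner into two steps may make it easier to follow. In this sketch the intermediate array @order is a name introduced only for illustration.

```perl
use strict;
use warnings;

my @x = qw(matt elroy jane sally);

# Step 1: sort the subscripts of @x by the strings they refer to.
my @order = sort { $x[$a] cmp $x[$b] } 0 .. $#x;

# Step 2: the element at subscript $order[$i] receives rank $i.
my @rank;
@rank[@order] = 0 .. $#x;

print "@order\n";       # 1 2 0 3
print "@x[@order]\n";   # elroy jane matt sally
print "@rank\n";        # 2 0 1 3
```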
6. Sort the hash by keys
%hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');
@sorted = map { +{ ($_ => $hash{$_}) } } sort keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}
The print result is:
Alan => Turing
Donald => Knuth
John => Neumann
The code above is not hard to follow. sort keys %hash returns the keys of %hash in ASCII order, and map then processes that list. Note that map uses two levels of braces.
The outer {} is map's block and the inner {} builds an anonymous hash, so the result of map is a list of anonymous hashes. Writing the inner brace as +{ ... } makes sure Perl parses it as a hash constructor rather than as a nested block.
So each element of the @sorted array is a reference to an anonymous hash. Dereferencing it with %$hashref gives access to its key/value pair.
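To see what ends up in the sorted array, the sketch below builds the same list of one-pair hash references and reads each pair back with keys instead of each. The @lines array is introduced only so the result can be inspected.

```perl
use strict;
use warnings;

my %hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');

# One anonymous hash per key; the leading + forces a hash constructor.
my @sorted = map { +{ $_ => $hash{$_} } } sort keys %hash;

my @lines;
for my $hashref (@sorted) {
    my ($key) = keys %$hashref;     # each hash holds exactly one pair
    push @lines, "$key => $hashref->{$key}";
}

print "$_\n" for @lines;
# Alan => Turing
# Donald => Knuth
# John => Neumann
```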
7. Sort the hash by values
%hash = ( Elliot => 'Babbage',
Charles => 'Babbage',
Grace => 'Hopper',
Herman => 'Hollerith'
);
@sorted = map { +{ ($_ => $hash{$_}) } }
sort { $hash{$a} cmp $hash{$b}
or $a cmp $b
} keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}
The print result is:
Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper
Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash by value alone, the relative order of two elements with the same value may change when other entries are added or deleted. To get a stable result, sort on the value as the primary key and on the hash key as the secondary key.
Here { $hash{$a} cmp $hash{$b} or $a cmp $b } compares by value first and falls back to the key when the values are equal. sort returns the sorted list of keys, which is then handed to map to build a list of anonymous hashes. They are accessed the same way as in the previous example, so I won't describe it again.
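If the anonymous hashes are not needed, the sorted key list can be used directly. The following is a minimal sketch of that variant with the same data; @keys is a name chosen for this example.

```perl
use strict;
use warnings;

my %hash = (
    Elliot  => 'Babbage',
    Charles => 'Babbage',
    Grace   => 'Hopper',
    Herman  => 'Hollerith',
);

# Primary key: the value. Secondary key: the hash key, to break ties.
my @keys = sort { $hash{$a} cmp $hash{$b} or $a cmp $b } keys %hash;

print "$_ => $hash{$_}\n" for @keys;
# Charles => Babbage
# Elliot => Babbage
# Herman => Hollerith
# Grace => Hopper
```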
8. Sort the words in the file and remove duplicates
perl -0777ane '$, = "\n"; @uniq{@F} = (); print sort keys %uniq' file
Try this usage yourself; I don't fully understand it myself.
@uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file;
it is semantically equivalent to @uniq{ $F[0], $F[1], ... $F[$#F] } = ()
The options are as follows:
-0777 - Slurp mode, reading the whole file as a single record
-a - Autosplit mode, splitting each input record into the @F array
-e - Run the script given on the command line
-n - Loop over the input record by record: while (<>) { ... }
$, - The output field separator of the print function
file - File name
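Written out as a script, the one-liner does roughly the following. This is a sketch under assumptions: the sample text is invented, and it is read through an in-memory filehandle so the example runs without an external file.

```perl
use strict;
use warnings;

# Sample input (made up); an in-memory filehandle stands in for the file.
my $text = "the quick brown fox\njumps over the lazy dog\nthe end\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

# Like -0777: undefine the record separator and read everything at once.
my $content = do { local $/; <$fh> };
close $fh;

# Like -a: split the record on whitespace into @F.
my @F = split ' ', $content;

# Hash slice: every word becomes a key, so duplicates collapse.
my %uniq;
@uniq{@F} = ();

my @words = sort keys %uniq;
print join("\n", @words), "\n";   # same effect as setting $, to "\n"
```

The duplicate occurrences of "the" collapse into a single key, so it is printed once.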
9. Efficient sorting: the Orcish Maneuver and the Schwartzian Transform
The sort subroutine is usually called several times for each key. If sort run time matters a lot, you can use the Orcish Maneuver or the Schwartzian Transform so that each key is computed only once.
Consider the following example, which sorts a list of files by modification date.
@sorted = sort { -M $a <=> -M $b } @filenames;
# Orcish Maneuver: cache the keys in a hash
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
($modtimes{$b} ||= -M $b)
} @filenames;
A very clever technique, isn't it? Because a file's modification date is basically unchanged while the script runs, it is enough to compute it with -M once and save it.
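The effect of the cache can be observed by counting how often the key is computed. In this sketch, slow_key is a hypothetical stand-in for an expensive operation such as -M, and the word list is made up.

```perl
use strict;
use warnings;

my $calls = 0;

# slow_key (hypothetical) stands in for an expensive key computation.
sub slow_key {
    my ($item) = @_;
    $calls++;
    return length $item;
}

my @items = qw(pear fig banana kiwi apple plum);
my %cache;

# ||= fills the cache on first use. A key that is false (0 or '') would be
# recomputed every time; that cannot happen here because lengths are >= 1.
my @sorted = sort {
    ($cache{$a} ||= slow_key($a)) <=> ($cache{$b} ||= slow_key($b))
        or $a cmp $b
} @items;

print "@sorted\n";             # fig kiwi pear plum apple banana
print "key calls: $calls\n";   # key calls: 6
```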
Here is how to use the Schwartzian Transform:
@sorted = map( { $_->[0] }
sort( { $a->[1] <=> $b->[1] }
map({ [$_, -M] } @filenames)
)
);
This code nests map and sort in several layers. Use the method mentioned earlier and read it from the bottom up. map({ [$_, -M] } @filenames) returns a list of anonymous arrays; in each one, the first value is the file name and the second is the file's modification date.
sort( { $a->[1] <=> $b->[1] }... then sorts that list of anonymous arrays by the modification date.
The result returned by sort is the list of anonymous arrays in sorted order.
The outermost map( { $_->[0] }... is simple. It extracts the file name from each anonymous array returned by sort. The file names come out sorted by modification date, and -M is run only once per file.
This is the famous Schwartzian Transform, a usage that is very popular among Perl programmers.
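The same three-layer pattern works for any key that is costly to compute. The sketch below sorts made-up file names by the number embedded in them; numeric_part is a hypothetical key function, and the counter only shows that it runs once per element.

```perl
use strict;
use warnings;

my $calls = 0;

# numeric_part (hypothetical) extracts the first run of digits from a name.
sub numeric_part {
    my ($name) = @_;
    $calls++;
    my ($n) = $name =~ /(\d+)/;
    return $n;
}

my @files = qw(file10.txt file2.txt file1.txt file33.txt);

my @sorted =
    map  { $_->[0] }                        # 3. unwrap the original name
    sort { $a->[1] <=> $b->[1] }            # 2. sort on the precomputed key
    map  { [ $_, numeric_part($_) ] }       # 1. pair each name with its key
    @files;

print "@sorted\n";             # file1.txt file2.txt file10.txt file33.txt
print "key calls: $calls\n";   # key calls: 4
```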