SoFunction
Updated on 2025-04-07

Classic usage of Perl

Opening files with the open() function

Common ways to open files are:


open(FH, "< $filename")
    or die "Couldn't open $filename for reading: $!";

The open() function usually takes two parameters. The first is a filehandle, which refers to the opened file; the second combines the mode (how the file is opened) with the filename. open() returns true if the file was opened successfully and false otherwise. We use "or" to test this condition.
The mode in the code above is the less-than character (<), which opens the file for reading. If the file does not exist, open() returns false. You can read from the filehandle, but you cannot write to it.
The greater-than character (>) opens the file for writing. If the file does not exist, it is created. If it exists, it is truncated and the previous data is lost. You can write to the filehandle, but not read from it.


# If the file does not exist, create it
open(FH, "> $filename")
    or die "Couldn't open $filename for writing: $!";

Append mode (indicated by two greater-than characters, >>) also creates the file if it does not exist. If the file exists, append mode does not clear the original data.
As with ">" (write) mode, you can only write to the filehandle, and all writes are added to the end of the file. An attempt to read from it will fail.


open(FH, ">> $filename")
    or die "Couldn't open $filename for appending: $!";
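
The examples above use the two-argument form with a bareword filehandle. As a minimal sketch, assuming Perl 5.6 or later, the same three modes can also be written with the three-argument form of open() and a lexical filehandle, which keeps the mode separate from the filename. The temporary file and the variable names are illustrative assumptions, not part of the original examples.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this demonstration.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Couldn't close $filename: $!";

# ">" creates the file or truncates an existing one.
open(my $out, '>', $filename)
    or die "Couldn't open $filename for writing: $!";
print {$out} "first line\n";
close($out) or die "Couldn't close $filename: $!";

# ">>" keeps the existing data and adds to the end.
open(my $app, '>>', $filename)
    or die "Couldn't open $filename for appending: $!";
print {$app} "second line\n";
close($app) or die "Couldn't close $filename: $!";

# "<" reads the file back.
open(my $in, '<', $filename)
    or die "Couldn't open $filename for reading: $!";
my @lines = <$in>;
close($in);

print @lines;
```

Because the mode is a separate argument, a filename that happens to begin with ">" or "<" cannot be mistaken for a mode.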

With "+<" mode you can both read and write the file. You can move around inside the file with seek() and find your current position with tell(). If the file does not exist, open() fails rather than creating it. If the file already exists, the original data is not cleared.
If you want to clear the original contents, either call truncate() yourself or use "+>" mode.


open(FH, "+> $filename")
    or die "Couldn't open $filename for reading and writing: $!";

Note the difference between "+<" and "+>": both allow reading and writing, but the former is non-destructive while the latter truncates the file first.
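
The difference between the two read/write modes can be seen in a short sketch. It assumes Perl 5.6 or later (lexical filehandles, three-argument open()) and uses a hypothetical temporary file; the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file with some existing data.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "old data\n";
close($tmp_fh) or die "Couldn't close $filename: $!";

# "+<" keeps the existing contents.
open(my $rw, '+<', $filename)
    or die "Couldn't open $filename for reading and writing: $!";
my $kept = <$rw>;                        # the original line is still there
seek($rw, 0, 0) or die "seek failed: $!";
print {$rw} "NEW";                       # overwrite the first three bytes
seek($rw, 0, 0) or die "seek failed: $!";
my $rewritten = <$rw>;                   # read the modified line back
close($rw) or die "Couldn't close $filename: $!";

# "+>" truncates the file before anything can be read.
open(my $clobber, '+>', $filename)
    or die "Couldn't open $filename for reading and writing: $!";
my $after = <$clobber>;                  # undef: the file is now empty
close($clobber);

print "kept: $kept";
print "rewritten: $rewritten";
print "after +>: ", (defined $after ? $after : "(nothing left)"), "\n";
```

Note the seek() between writing and reading on the same handle: switching direction on a read/write handle without repositioning is not reliable.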
Errors
How do errors occur? In many ways: the directory does not exist, the file is not writable, your program runs out of filehandles, and so on.
You should check the result of system calls (such as open() and sysopen()) to see whether they succeeded.
The usual idiom is "or die()", and a good message follows four rules. First, say which system call failed ("open"). Second, include the filename so the problem is easier to locate. Third, say how the file was being opened ("for writing", "for appending"). Fourth, include the operating system's error message (contained in $!). That way, when a file cannot be opened, users of your program will generally know why. Sometimes the first and third are combined:
  or die "unable to append to $filename: $!";

If you write out the filename in full in both the open() call and the error message, you risk changing one without the other, leaving the error message misleading or simply wrong.


# The error message below is misleading: it no longer matches the open() call
open(FH, "</var/run/")
    or die "Can't open /var/log/ for writing : $!";
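
One way to avoid this mismatch is to build the message from the same variables that are passed to open(). The following is only a sketch: open_or_die() is a hypothetical helper, not a built-in, and the path is assumed not to exist so that the failure can be shown.

```perl
use strict;
use warnings;

# Hypothetical helper: the path and purpose appear once, so the
# message cannot drift out of sync with the open() call.
sub open_or_die {
    my ($mode, $path, $purpose) = @_;
    open(my $fh, $mode, $path)
        or die "Couldn't open $path for $purpose: $!\n";
    return $fh;
}

# Assumed not to exist, so the call fails and we can inspect the message.
my $missing = "/nonexistent-dir-for-demo/file.txt";
my $fh      = eval { open_or_die('<', $missing, 'reading') };
my $error   = $@;

print "Caught: $error" unless $fh;
```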

Use sysopen() for more control
In order to better control the way files are opened, you can use the sysopen() function:
 

use Fcntl;
sysopen(FH, $filename, O_RDWR|O_CREAT, 0666)
    or die "Can't open $filename for reading/writing/creating : $!";

The sysopen() function takes four parameters. The first is a filehandle, as with open(). The second is a filename without any mode information. The third is the mode, built by combining constants from the Fcntl module with the bitwise OR operator. The fourth (optional) parameter is an octal permissions value (0666 for data files, 0777 for programs). sysopen() returns true if the file could be opened and false if it could not.
Unlike open(), sysopen() offers no shorthand for the mode. Instead, each constant has one specific meaning, and you combine them with bitwise OR to request several behaviors at once.
  O_RDONLY     Read-only
  O_WRONLY     Write-only
  O_RDWR       Reading and writing
  O_APPEND     Writes go to the end of the file
  O_TRUNC      Truncate the file if it existed
  O_CREAT      Create the file if it didn't exist
  O_EXCL       Error if the file already existed (used with O_CREAT)

Use sysopen() when you need to be careful. For example, to append to a file only if it already exists, without creating it, you can write:
  sysopen(LOG, "/var/log/", O_WRONLY|O_APPEND, 0666)
    or die "Can't open /var/log/ for appending: $!";
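
Another case where the flag combinations matter is creating a file only if it does not already exist. The sketch below combines O_CREAT with O_EXCL as described in the table above; the temporary directory and the file name demo.lock are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_WRONLY, O_CREAT, O_EXCL, ...
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.lock";     # hypothetical file name

# First attempt: the file does not exist yet, so it is created.
sysopen(my $first, $path, O_WRONLY|O_CREAT|O_EXCL, 0666)
    or die "Can't create $path: $!";
print {$first} "created\n";
close($first) or die "Can't close $path: $!";

# Second attempt: O_EXCL makes sysopen() fail because the file now exists.
my $second_ok  = sysopen(my $second, $path, O_WRONLY|O_CREAT|O_EXCL, 0666);
my $errno_text = "$!";
print "Second sysopen failed as expected: $errno_text\n" unless $second_ok;
```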

Read a single record
There is an easy way to read from a filehandle: the <FH> operator. In scalar context it returns the next record in the file, or undef at end of file or on error. We can use it to read a line into a variable:
  $line = <FH>;
  die "Unexpected end-of-file" unless defined $line;
In a loop, we can write:


  while (defined ($record = <FH>)) {     # long-winded
    # $record is set to each record in the file, one at a time
  }

Because this is done so often, it is usually abbreviated. This form puts each record in $_ instead of $record:

  while (<FH>) {
    # $_ is set to each record in the file, one at a time
  }

As of Perl 5.004_04, we can also write:

  while ($record = <FH>) {
    # $record is set to each record in the file, one at a time
  }

The defined() test is added automatically. In versions before Perl 5.004_04 this form produces a warning. To find out which version of Perl you are running, type on the command line:
  perl -v
Once we have read a record, we usually want to remove the record separator (by default, the newline character):
  chomp($record);
Perl 4.0 had only chop(), which removes the last character of the string whatever it is. chomp() is less destructive: it removes the line separator only if one is present. To remove the line separator, use chomp() rather than chop().
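
A small sketch makes the difference concrete. The strings are made up for the demonstration, and $/ is assumed to hold its default value.

```perl
use strict;
use warnings;

my $chomp_with    = "record\n";  chomp($chomp_with);
my $chomp_without = "record";    chomp($chomp_without);
my $chop_with     = "record\n";  chop($chop_with);
my $chop_without  = "record";    chop($chop_without);

print "chomp: '$chomp_with' '$chomp_without'\n";   # chomp: 'record' 'record'
print "chop:  '$chop_with' '$chop_without'\n";     # chop:  'record' 'recor'
```

chop() damages the record that had no trailing newline, which is typically the last line of a file that does not end with one.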
Read multiple records
In list context, <FH> returns all the remaining records in the file. At end of file it returns an empty list:

@records = <FH>;
if (@records) {
  print "There were ", scalar(@records), " records read.\n";
}

You can do the assignment and the test in one step:

if (@records = <FH>) {
  print "There were ", scalar(@records), " records read.\n";
}

chomp() also works on arrays:
  @records = <FH>;
  chomp(@records);
chomp() can be applied to any assignment expression, so the two steps can be combined into one:
  chomp(@records = <FH>);

What is a record?
By default, a record is a line.
The definition of a record is controlled by the $/ variable, which holds the input record separator. Because lines are (by definition!) separated by newline characters, its default value is the string "\n".
You can replace "\n" with any string you like. For example:
  $/ = ";";
  $record = <FH>;  # Read the next semicolon-terminated record
$/ can take two other interesting values: the empty string ("") and undef.
Reading paragraphs
Setting $/ = "" tells Perl to read paragraphs. A paragraph is a block of text terminated by two or more newlines. This is different from setting $/ to "\n\n", which reads blocks of text terminated by exactly two newlines. With "\n\n" a problem arises: if there are consecutive blank lines, as in "text\n\n\n\n", you can interpret the input as one paragraph ("text") or as two ("text" followed by two newlines, then an empty paragraph followed by two newlines).
When reading text, the second interpretation is of little use. If you read paragraphs with "\n\n", you have to filter out the "empty" paragraphs yourself:


$/ = "\n\n";
while (<FH>) {
  chomp;
  next unless length;     # Skip the empty paragraphs
  # ...
}

Setting $/ to "" instead reads paragraphs terminated by two or more newlines, so no empty paragraphs are returned:
  $/ = "";
  while (<FH>) {
    chomp;
    # ...
  }
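
To compare a "\n\n" separator with paragraph mode side by side, here is a minimal sketch. read_records() is a hypothetical helper and the sample text is made up; lexical filehandles and three-argument open() (Perl 5.6 or later) are assumed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: two paragraphs separated by several blank lines.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "first\n\n\n\nsecond\n";
close($tmp) or die "Couldn't close $filename: $!";

# Hypothetical helper: read every record using the given separator.
sub read_records {
    my ($path, $separator) = @_;
    local $/ = $separator;        # restored when the sub returns
    open(my $fh, '<', $path)
        or die "Couldn't open $path for reading: $!";
    my @records = <$fh>;
    close($fh);
    chomp(@records);              # chomp honors the current $/
    return @records;
}

my @fixed     = read_records($filename, "\n\n");   # includes an empty record
my @paragraph = read_records($filename, "");       # blank lines collapsed

print scalar(@fixed), " records with \"\\n\\n\"\n";
print scalar(@paragraph), " records in paragraph mode\n";
```

With the sample above, the "\n\n" separator yields an empty record in the middle, while paragraph mode returns only the two paragraphs.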

Reading the entire file
The other interesting value of $/ is undef. With this value, a read returns the rest of the file as a single string:


undef $/;
  $file = <FH>;

Changing $/ affects every subsequent read, not just the next one, so you usually want to localize the change. The following example reads the contents of a filehandle into a string:

{
    local $/ = undef;
    $file = <FH>;
  }
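
The same idea can be wrapped in a function. This is only a sketch: slurp() is a hypothetical helper name, and the temporary file exists purely for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "Couldn't open $path for reading: $!";
    local $/;                     # undef only until the sub returns
    my $content = <$fh>;
    close($fh);
    return $content;
}

my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\n";
close($tmp) or die "Couldn't close $filename: $!";

my $content = slurp($filename);
print length($content), " characters read\n";
print "separator restored\n" if $/ eq "\n";
```

Because the change is made with local, code that runs after slurp() still reads line by line.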

Remember: Perl scalars can hold very long strings. As long as the file fits within your virtual memory, you can read as much data as you like.
Operate files with regular expressions
Once you have a variable holding the entire file, you can apply regular expressions to the whole file rather than to one line at a time. Two regular expression modifiers are useful here: /s and /m. Normally, Perl's regular expressions expect a single line. Suppose you write:

undef $/;
$line = <FH>;
if ($line =~ /(b.*grass)$/) {
  print "found $1\n";
}

If our file contains:
  browngrass
  bluegrass
then the output is:
  found bluegrass
It does not find "browngrass", because $ matches only at the end of the string (or just before a newline at the end of the string). To make "^" and "$" match at line boundaries inside a string containing many lines, use the /m ("multiline") option:
  if ($line =~ /(b.*grass)$/m) { }
Now the program outputs:
  found browngrass
Similarly, by default a period matches every character except a newline:

while (<FH>) {
  if (/19(.*)$/) {
    if ($1 < 20) {
      $year = 2000+$1;
    } else {
      $year = 1900+$1;
    }
  }
}

If we read "1981\n" from the file, $_ will contain "1981\n". The period in the regular expression matches "8" and "1" but not "\n". That is what we want here, because the newline is not part of the date.
With a string containing many lines, we may want to extract chunks that span line separators. In that case we can use the /s option, which makes the period match any character, including a newline:

if (m{<B>(.*?)</B>}s) {
  print "Found bold text: $1\n";
}

Here I used {} instead of slashes to delimit the regular expression, so I had to tell Perl that I was matching by writing the leading "m"; the trailing "s" is the /s option. You can combine the /s and /m options:

if (m{^<FONT COLOR="red">(.*?)</FONT>}sm) {
    # ...
  }
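
The effect of each modifier can be checked on a plain string without any file. The following sketch reuses the two-line grass example; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "browngrass\nbluegrass\n";

# No modifier: $ matches only at the very end (or before a final newline).
my ($default) = $text =~ /(b.*grass)$/;

# /m: $ also matches before every embedded newline.
my ($multiline) = $text =~ /(b.*grass)$/m;

# /s: the period also matches newlines, so the match spans both lines.
my ($singleline) = $text =~ /(b.*grass)$/s;

print "default: $default\n";                               # default: bluegrass
print "/m:      $multiline\n";                             # /m:      browngrass
print "/s spans ", length($singleline), " characters\n";   # /s spans 20 characters
```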

Summary
There are two ways to open a file: open() is quick and simple, while sysopen() is more powerful and more complex. The <FH> operator reads a record, and the $/ variable controls what a record is. If you read many lines into one string, don't forget the /s and /m regular expression modifiers.