1. Introduction to the split command
split is a command-line tool on Unix and Unix-like systems such as Linux that splits a large file into smaller pieces. This is especially useful for handling large log files, transferring data, or working within storage limits.
2. Help for the split command
2.1 split command help information
In a terminal, run split with --help to display its basic help information.
root@jeven01:~# split --help
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -x                      use hex suffixes starting at 0, not alphabetic
      --hex-suffixes[=FROM]  same as -x, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help        display this help and exit
      --version     output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
Binary prefixes can be used, too: KiB=K, MiB=M, and so on.

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: </software/coreutils/>
Full documentation </software/coreutils/split>
or available locally via: info '(coreutils) split invocation'
2.2 Interpretation of split command options
The help information for the split command is organized below as a Markdown table:
Option | Description |
---|---|
-a, --suffix-length=N | Generate a suffix with length N (default is 2) |
--additional-suffix=SUFFIX | Append additional SUFFIX after the file name |
-b, --bytes=SIZE | Each output file size is SIZE bytes |
-C, --line-bytes=SIZE | Put at most SIZE bytes of whole records (lines) in each output file |
-d | Use a numeric suffix starting from 0, instead of a letter suffix |
--numeric-suffixes[=FROM] | Same as -d, but allows setting of the starting value |
-x | Use a hexadecimal suffix starting from 0, instead of a letter suffix |
--hex-suffixes[=FROM] | Same as -x, but allows setting of the starting value |
-e, --elide-empty-files | When using '-n', no empty output file is generated |
--filter=COMMAND | Write content to shell command COMMAND; file name is $FILE |
-l, --lines=NUMBER | Each output file contains NUMBER lines/records |
-n, --number=CHUNKS | Generate CHUNKS output files; see below for details |
-t, --separator=SEP | Use SEP as record separator, not line breaks; '\0' specifies NUL characters |
-u, --unbuffered | Copy input to output immediately when using '-n r/…' |
--verbose | Print diagnostic information before opening each output file |
--help | Show help information and exit |
--version | Output version information and exit |
SIZE Parameters
- The SIZE parameter is an integer and optional unit (for example: 10K means 10*1024).
- The units can be K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, … (powers of 1000).
- Binary prefixes can also be used: KiB=K, MiB=M, etc.
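The difference between the two unit families is easy to check by hand. The following is a minimal sketch, assuming GNU coreutils; the file name sample.bin and the prefixes k_ and kb_ are made up for illustration.

```shell
# Sketch: compare -b 10K (10*1024 bytes) with -b 10KB (10*1000 bytes).
# sample.bin, k_ and kb_ are hypothetical names chosen for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 30720 zero bytes, i.e. exactly 3 * 10240.
head -c 30720 /dev/zero > sample.bin

split -b 10K  sample.bin k_     # pieces of 10240 bytes
split -b 10KB sample.bin kb_    # pieces of 10000 bytes plus a remainder

k_first=$(wc -c < k_aa)
kb_first=$(wc -c < kb_aa)
k_count=$(ls k_* | wc -l)
kb_count=$(ls kb_* | wc -l)

echo "10K:  first piece $k_first bytes, $k_count pieces"
echo "10KB: first piece $kb_first bytes, $kb_count pieces"
```

With 30720 bytes of input, 10K divides evenly into three pieces, while 10KB leaves a 720-byte remainder in a fourth piece.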
CHUNKS Parameters
- N: Split into N files based on the size of the input
- K/N: Output only the Kth of N chunks to standard output
- l/N: Split into N files without splitting lines/records
- l/K/N: Output only the Kth of N chunks to standard output, without splitting lines/records
- r/N: Like 'l', but distribute lines/records round robin
- r/K/N: Same as r/N, but output only the Kth of N chunks to standard output
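The CHUNKS forms can be tried on a tiny file. This is a sketch under the assumption of GNU coreutils; numbers.txt and the prefixes chunk_ and rr_ are hypothetical names, and the exact line boundaries chosen by l/N depend on byte positions, so they are not spelled out here.

```shell
# Sketch of the -n CHUNKS forms on a 10-line file (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 10 > numbers.txt

# l/3: three output files, no line is cut in half.
split -n l/3 numbers.txt chunk_
chunk_count=$(ls chunk_* | wc -l)

# Concatenating the chunks in name order should reproduce the input.
if cat chunk_* | cmp -s - numbers.txt; then rejoin=ok; else rejoin=mismatch; fi

# l/2/3: write nothing to disk, print only the second of three chunks.
split -n l/2/3 numbers.txt

# r/3: deal lines out round robin, so rr_aa receives lines 1, 4, 7, 10.
split -n r/3 numbers.txt rr_
rr_first=$(tr '\n' ' ' < rr_aa)

echo "l/3 produced $chunk_count files (rejoin: $rejoin); rr_aa holds: $rr_first"
```

The round-robin form is convenient when each piece should get a similar number of records and their original order within a piece does not need to be contiguous.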
3. Basic use of the split command
3.1 Generate test files
Generate a 2 MiB test file:
root@jeven01:/test# dd if=/dev/zero bs=1M count=2 of=
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00158099 s, 1.3 GB/s
root@jeven01:/test# ll -h
-rw-r--r-- 1 root root 2.0M Oct 3 20:35
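3.2 Split into 200 KB pieces
Use the -b option to split the file you just created into pieces of 200 KB each. Here 200k means 200*1024 = 204800 bytes, so the 2097152-byte file yields ten full pieces plus a final piece of 49152 bytes: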
root@jeven01:/test# split -b 200k
root@jeven01:/test# ls
xaa xab xac xad xae xaf xag xah xai xaj xak
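Pieces produced this way can be joined back together with cat, and cmp can confirm that nothing was lost. The sketch below assumes GNU coreutils and uses a hypothetical file name, bigfile.bin, filled with random data so that a wrong piece order would be detected.

```shell
# Sketch: split a 2 MiB file into 200 KiB pieces, rejoin them, and compare.
# bigfile.bin and rejoined.bin are hypothetical names for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

head -c 2097152 /dev/urandom > bigfile.bin

split -b 200k bigfile.bin          # default prefix 'x': xaa, xab, ...
piece_count=$(ls x?? | wc -l)

# The shell expands x?? in sorted order, which is the order split wrote them.
cat x?? > rejoined.bin

if cmp -s bigfile.bin rejoined.bin; then
    verify=identical
else
    verify=different
fi
echo "$piece_count pieces, rejoined copy is $verify"
```

Running a check like this before deleting or moving the original is a cheap way to confirm that the pieces are complete.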
3.3 Split into files with numeric suffixes
Use -d to get numeric suffixes and -a 3 to make each suffix three digits long:
root@jeven01:/test# split -b 200k -d -a 3
root@jeven01:/test# ll
total 4104
drwxr-xr-x  2 root root    4096 Oct 3 20:42 ./
drwxr-xr-x 22 root root    4096 Sep 24 22:37 ../
-rw-r--r--  1 root root 2097152 Oct 3 20:35
-rw-r--r--  1 root root  204800 Oct 3 20:42 x000
-rw-r--r--  1 root root  204800 Oct 3 20:42 x001
-rw-r--r--  1 root root  204800 Oct 3 20:42 x002
-rw-r--r--  1 root root  204800 Oct 3 20:42 x003
-rw-r--r--  1 root root  204800 Oct 3 20:42 x004
-rw-r--r--  1 root root  204800 Oct 3 20:42 x005
-rw-r--r--  1 root root  204800 Oct 3 20:42 x006
-rw-r--r--  1 root root  204800 Oct 3 20:42 x007
-rw-r--r--  1 root root  204800 Oct 3 20:42 x008
-rw-r--r--  1 root root  204800 Oct 3 20:42 x009
-rw-r--r--  1 root root   49152 Oct 3 20:42 x010
3.4 Split files by number of lines
Use the -l option to start a new output file every 1000 lines. The prefix logs_part_ is given after the input file name, so the output files are named logs_part_aa, logs_part_ab, and so on.
split -l 1000 logs_part_
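To see the line-based split end to end, here is a small sketch assuming GNU coreutils. The input file app.log and its contents are invented for the example; note that the input file comes first and the prefix second.

```shell
# Sketch: split a 2500-line sample log into pieces of 1000 lines each.
# app.log is a hypothetical file generated only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 2500 | sed 's/^/log entry /' > app.log

# Argument order is: split [OPTION]... FILE PREFIX
split -l 1000 app.log logs_part_

part_count=$(ls logs_part_* | wc -l)
last_lines=$(wc -l < logs_part_ac)

wc -l logs_part_*
```

The last piece simply holds whatever is left over, here the final 500 lines.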
3.5 Set a prefix for the output file names
Here the output files use the prefix split_file followed by a three-digit numeric suffix (000, 001, and so on).
root@jeven01:/test# split -b 200k -d -a 3 split_file
root@jeven01:/test# ll -h
total 4.1M
drwxr-xr-x  2 root root 4.0K Oct 3 20:57 ./
drwxr-xr-x 22 root root 4.0K Sep 24 22:37 ../
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file000
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file001
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file002
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file003
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file004
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file005
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file006
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file007
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file008
-rw-r--r--  1 root root 200K Oct 3 20:57 split_file009
-rw-r--r--  1 root root  48K Oct 3 20:57 split_file010
-rw-r--r--  1 root root 2.0M Oct 3 20:35
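The prefix can be combined with two other options from the help output: --numeric-suffixes=FROM to choose the first number and --additional-suffix to append an extension. The following sketch assumes GNU coreutils; app.log and the prefix part_ are hypothetical.

```shell
# Sketch: numbered pieces that start at 001 and keep a .log extension.
# app.log and part_ are illustrative names, not taken from the article.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 250 > app.log

split -l 100 -a 3 --numeric-suffixes=1 --additional-suffix=.log app.log part_

names=$(ls part_* | tr '\n' ' ')
echo "$names"
```

Keeping the extension on each piece can help tools that pick a handler based on the file name.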
4. Things to note
1. Keep log records intact: When splitting a log file by lines or bytes, avoid cutting a single record across two files, since that can mislead later log analysis. The -C option caps the number of bytes per output file while keeping lines whole where possible.
2. Choose a sensible piece size: Set the size of each piece according to your storage needs and log processing strategy. Pieces that are too large are awkward to process, while pieces that are too small add management overhead. For example, if about 50 MB of logs is generated per day, consider pieces of about 10 MB.
3. Use clear naming rules: Give the pieces a meaningful prefix and a predictable suffix so they are easy to manage and identify. Set the suffix length with the -a option and use -d or --numeric-suffixes for numeric suffixes, which makes it easier to process the pieces in order.
4. Consider timestamp information: split copies data without altering it, so the timestamps inside each record are preserved as long as no record is cut in two. This keeps later time-based searches reliable. If records are delimited by something other than a newline, set the record separator with the -t option.
5. Test and verify the results: Before running a command on the full log, try it on a small sample and check that the output files are what you expect. This exposes problems early and lets you adjust your plan in time.
6. Back up the original log file: Back up the original log before any splitting operation. The split command itself does not modify the source file, but a backup protects against accidental deletion and other human error.
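The first point above, keeping records intact, can be illustrated by comparing -b with -C on the same input. This is a minimal sketch assuming GNU coreutils; the three-line app.log and the prefixes b_ and c_ are invented for the example.

```shell
# Sketch: -b cuts at an exact byte count, -C keeps whole lines per piece.
# app.log, b_ and c_ are hypothetical names used only for this comparison.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three records of 13, 14 and 13 bytes (newline included).
printf 'first record\nsecond record\nthird record\n' > app.log

split -b 20 app.log b_    # byte 20 falls in the middle of the second record
split -C 20 app.log c_    # at most 20 bytes of complete lines per piece

b_tail=$(tail -n 1 b_aa)  # the last line of b_aa is a fragment
c_first=$(cat c_aa)       # c_aa holds one complete record
c_count=$(ls c_* | wc -l)

echo "-b: b_aa ends with the fragment '$b_tail'"
echo "-C: $c_count pieces, c_aa contains '$c_first'"
```

In this sample -C produces more pieces than -b because no two records fit within 20 bytes together. A line longer than SIZE would still have to be split, so SIZE should comfortably exceed the longest expected record.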