1. Reading and writing data in text format
read_csv()
: Reads delimited data from a file, URL, or file-like object, with a comma as the default delimiter.
read_table()
: Reads delimited data from a file, URL, or file-like object, with a tab ('\t') as the default delimiter.
Windows users can print the raw contents of the file with the type command (the equivalent of cat on Unix-like systems).
Since this file is comma-delimited, we can read it into a DataFrame using read_csv.
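The original listing is not shown here, so the following is only a minimal sketch. The CSV content is a hypothetical stand-in for the file (the column names a, b, c, d, and message are assumed), and io.StringIO replaces a real path so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for the comma-delimited file discussed above.
csv_text = (
    "a,b,c,d,message\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file you would pass its path instead of the StringIO object.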
You can also use read_table and specify the separator character explicitly.
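A minimal sketch of the read_table variant, again using hypothetical in-memory data in place of the file. Passing sep="," overrides the tab default, so the result should match what read_csv produces.

```python
import io

import pandas as pd

# Hypothetical comma-delimited content standing in for the file.
csv_text = (
    "a,b,c,d,message\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# read_table defaults to tab-separated input, so the comma is given explicitly.
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

# For comparison: the same data read with read_csv.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_table.equals(df_csv))
```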
The file above contains a header line, but some files do not.
If you read such a file directly, pandas treats the first line as the header by default, which is equivalent to header=0.
There are two ways to change that.
One is to allow pandas to automatically assign default column names.
The second is to specify the column names yourself.
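Both options can be sketched as follows. The headerless data and the column names are hypothetical; header=None and names are the read_csv parameters that correspond to the two approaches.

```python
import io

import pandas as pd

# Hypothetical file content without a header line.
csv_text = (
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# Option 1: let pandas assign default integer column names.
auto = pd.read_csv(io.StringIO(csv_text), header=None)

# Option 2: supply the column names yourself.
named = pd.read_csv(
    io.StringIO(csv_text),
    names=["a", "b", "c", "d", "message"],
)

print(auto.columns.tolist())
print(named.columns.tolist())
```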
If you want the message column to be the index of the returned DataFrame, you can either specify the column at position 4 as the index or pass 'message' to the index_col parameter.
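A short sketch of both forms of index_col, assuming the same hypothetical headerless data and column names as before.

```python
import io

import pandas as pd

# Hypothetical headerless data; 'message' is the fifth column (position 4).
csv_text = (
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)
names = ["a", "b", "c", "d", "message"]

# Select the index column by name ...
by_name = pd.read_csv(io.StringIO(csv_text), names=names, index_col="message")

# ... or by zero-based position.
by_pos = pd.read_csv(io.StringIO(csv_text), names=names, index_col=4)

print(by_name)
```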
To form a hierarchical index from multiple columns, pass a list of column numbers or names to index_col.
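As an illustration, here is a minimal sketch with hypothetical data in which two key columns (key1 and key2, names assumed) become the levels of a MultiIndex.

```python
import io

import pandas as pd

# Hypothetical data with two key columns and two value columns.
csv_text = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "two,a,5,6\n"
    "two,b,7,8\n"
)

# A list passed to index_col produces a hierarchical (MultiIndex) index.
parsed = pd.read_csv(io.StringIO(csv_text), index_col=["key1", "key2"])
print(parsed)
```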
The parser functions have many additional arguments for handling the irregularities that occur in various file formats; for example, skiprows=[0, 2, 3] skips the first, third, and fourth lines of a file.
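A minimal sketch of skiprows, assuming a hypothetical file in which the first, third, and fourth lines are comments rather than data.

```python
import io

import pandas as pd

# Hypothetical file: lines 1, 3, and 4 are comments that should be ignored.
csv_text = (
    "# a comment line\n"
    "a,b,c,d,message\n"
    "# another comment\n"
    "# and one more\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
)

# skiprows takes zero-based line numbers, so [0, 2, 3] drops lines 1, 3, and 4.
cleaned = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2, 3])
print(cleaned)
```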
Handling missing values
Typically, missing values are either not present (an empty string) or marked by some sentinel value.
By default, pandas recognizes a set of common sentinels such as NaN and NULL.
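The default behavior can be sketched like this. The data is hypothetical; it contains one empty field, one NaN string, and one NULL string, all of which pandas should treat as missing without any extra options.

```python
import io

import pandas as pd

# Hypothetical data with an empty field and two default sentinels.
csv_text = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NaN\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,NULL\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# isna() marks every cell that was parsed as a missing value.
missing = df.isna()
print(missing)
```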
The na_values option accepts a list or set of strings that should be treated as missing values.
If a dictionary is passed instead, each column can have its own set of missing-value sentinels.
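A sketch of both forms of na_values. The data and the sentinel strings ("missing", "foo", "two") are hypothetical choices made for this example.

```python
import io

import pandas as pd

# Hypothetical data in which 'missing' marks an absent value.
csv_text = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,foo\n"
    "two,5,6,7,8,world\n"
    "three,9,10,11,12,missing\n"
)

# A list applies the same extra sentinels to every column.
with_list = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

# A dict applies different sentinels to different columns.
sentinels = {"message": ["foo", "missing"], "something": ["two"]}
with_dict = pd.read_csv(io.StringIO(csv_text), na_values=sentinels)

print(with_list)
print(with_dict)
```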
1.1 Reading text files in chunks
If you want to read only a small number of rows (to avoid reading the whole file), you can specify nrows.
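A minimal sketch of nrows, using a hypothetical 100-row table generated in memory in place of a large file.

```python
import io

import pandas as pd

# Hypothetical 100-row CSV built in memory.
rows = "".join(f"{i},{i * 2}\n" for i in range(100))
csv_text = "a,b\n" + rows

# Only the first five data rows are parsed; the rest of the input is not read.
head = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(head)
```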
To read a file in chunks, specify chunksize as the number of rows per chunk.
The object returned by read_csv (a TextFileReader in current pandas versions) lets you iterate over the file in pieces of chunksize rows; for example, you can aggregate the value counts of the 'a' column across all chunks.
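The chunked aggregation might look like the sketch below. The data is hypothetical (a 30-row table whose 'a' column holds the letters x, y, and z), and accumulating with Series.add and fill_value=0 is one possible approach, not necessarily the one used in the original listing.

```python
import io

import pandas as pd

# Hypothetical 30-row table; column 'a' holds repeated letters.
letters = "xyzxxyzxyx" * 3
csv_text = "a,b\n" + "".join(f"{ch},{i}\n" for i, ch in enumerate(letters))

# With chunksize set, read_csv returns an iterator over DataFrame pieces.
chunker = pd.read_csv(io.StringIO(csv_text), chunksize=10)

# Accumulate the per-chunk value counts of column 'a'.
totals = pd.Series([], dtype="int64")
n_chunks = 0
for piece in chunker:
    totals = totals.add(piece["a"].value_counts(), fill_value=0)
    n_chunks += 1

# add() with fill_value yields floats, so convert back to integers.
totals = totals.astype(int).sort_values(ascending=False)
print(totals)
```

Each piece is an ordinary DataFrame, so any per-chunk computation can be used in place of value_counts.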
1.2 Writing data into text format
Use the DataFrame's to_csv method to export data to a comma-delimited file.
By default, both row and column labels are written if no other options are specified, but both can be disabled.
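A small sketch of the default output and of disabling both kinds of labels. The DataFrame is hypothetical, and to_csv is called without a path so that it returns the text as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical DataFrame to export.
df = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# Default: the row index and the column header are both written.
with_labels = df.to_csv()

# index=False drops the row labels, header=False drops the column labels.
without_labels = df.to_csv(index=False, header=False)

print(with_labels)
print(without_labels)
```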
It is also possible to write only a subset of the columns, in an order of your choosing.
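A minimal sketch using the columns parameter of to_csv, with a hypothetical three-column DataFrame; only c and a are written, in that order.

```python
import pandas as pd

# Hypothetical DataFrame with three columns.
df = pd.DataFrame({"a": [1, 5], "b": [2, 6], "c": [3, 7]})

# columns selects which columns are written and in what order.
subset = df.to_csv(index=False, columns=["c", "a"])
print(subset)
```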
By default, missing values appear as empty strings in the output; the na_rep option replaces them with another sentinel value.
(Writing to sys.stdout instead of a file prints the text result to the console.)
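A sketch of the missing-value handling, assuming a hypothetical DataFrame with two NaN cells and "NULL" as the chosen sentinel. The first call prints to the console through sys.stdout; the other two capture the text as strings for comparison.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical DataFrame containing missing values.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Passing sys.stdout prints the CSV text instead of writing a file.
df.to_csv(sys.stdout, na_rep="NULL")

# Without a target, to_csv returns the text as a string.
default_text = df.to_csv()
labeled_text = df.to_csv(na_rep="NULL")
```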
The default separator is a comma; a different one can be chosen with the sep option.
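For instance, a pipe character could be used as the separator. The DataFrame and the choice of "|" are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical DataFrame to export.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# sep replaces the default comma with another delimiter.
piped = df.to_csv(sep="|", index=False)
print(piped)
```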
Series also has a to_csv method.
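I'm not sure why there is a line with a 0 at the end. It is probably the header row: recent pandas versions write a header by default, and an unnamed Series gets the column label 0.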
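One way to check where a stray 0 can come from is the sketch below. It assumes a recent pandas version in which Series.to_csv writes a header by default, and it uses a hypothetical unnamed Series.

```python
import pandas as pd

# Hypothetical unnamed Series.
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

# With the default header, an unnamed Series is labeled 0 in the first line.
default_out = s.to_csv()

# header=False writes only the index and the values.
no_header = s.to_csv(header=False)

print(default_out)
print(no_header)
```

Giving the Series a name, or passing header=False, avoids the 0 label.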
Summary
That's all for this post. I hope it was helpful, and I hope you'll check back for more from me!