SoFunction
Updated on 2024-10-28

pandas data input and output methods in detail

1. Reading and writing data in text format

read_csv(): Reads delimited data from a file, URL, or file-like object, with a comma as the default delimiter.

read_table(): Reads delimited data from a file, URL, or file-like object, with a tab ('\t') as the default delimiter.

On Windows, you can first print the raw contents of the file (for example with the type command).

Since this file is comma-delimited, we can read it into a DataFrame using read_csv.
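As a minimal sketch of this step: the file contents below are hypothetical and are fed through io.StringIO so the example runs without a file on disk. A real call would pass a path or URL instead.

```python
import io

import pandas as pd

# Hypothetical comma-delimited contents with a header line.
csv_text = "a,b,c,d,message\n1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# read_csv uses "," as the delimiter and treats the first line as the header.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```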

You can also use read_table and specify the separator with the sep argument.
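A short sketch of the read_table variant, again with hypothetical in-memory contents. Since read_table defaults to a tab separator, the comma has to be passed explicitly.

```python
import io

import pandas as pd

# Hypothetical comma-delimited contents with a header line.
csv_text = "a,b,c,d,message\n1,2,3,4,hello\n5,6,7,8,world\n"

# read_table defaults to sep="\t", so the comma is given explicitly.
df = pd.read_table(io.StringIO(csv_text), sep=",")
print(df)
```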

The file above contains a header line, but some files do not.

If you read such a file directly, pandas treats the first line as the header by default (header=0).

There are two ways to change that.

One is to let pandas assign default column names automatically (header=None).

The second is to specify the column names yourself (names).
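Both options can be sketched as follows. The data and the column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical contents without a header line.
no_header = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# Option 1: header=None makes pandas number the columns 0, 1, 2, ...
auto = pd.read_csv(io.StringIO(no_header), header=None)

# Option 2: names supplies the column labels explicitly.
named = pd.read_csv(io.StringIO(no_header), names=["a", "b", "c", "d", "message"])

print(auto)
print(named)
```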

If you want the message column to be the index of the returned DataFrame, you can either specify the column at position 4 as the index or pass 'message' to the index_col parameter.
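A sketch of the two equivalent calls, assuming the same hypothetical headerless data and column names as before:

```python
import io

import pandas as pd

# Hypothetical contents without a header line.
no_header = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ["a", "b", "c", "d", "message"]

# Select the index column by position (0-based) ...
by_pos = pd.read_csv(io.StringIO(no_header), names=names, index_col=4)

# ... or by name.
by_name = pd.read_csv(io.StringIO(no_header), names=names, index_col="message")

print(by_name)
```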

To form a hierarchical index from multiple columns, pass a list of column numbers or names to index_col.
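For illustration, a small sketch with made-up key1/key2 columns. Passing both names to index_col produces a MultiIndex.

```python
import io

import pandas as pd

# Hypothetical contents; key1 and key2 together identify each row.
text = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "two,a,5,6\n"
    "two,b,7,8\n"
)

# A list of columns in index_col builds a hierarchical (Multi)Index.
parsed = pd.read_csv(io.StringIO(text), index_col=["key1", "key2"])
print(parsed)
```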

The parser functions have many additional arguments for handling irregular file formats; for example, skiprows can be used to skip the first, third, and fourth lines.
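A sketch of skiprows on hypothetical contents where comment lines surround the header. Note that skiprows takes 0-based line numbers, so the first, third, and fourth lines are 0, 2, and 3.

```python
import io

import pandas as pd

# Hypothetical contents: lines 0, 2 and 3 are comments to be skipped.
text = (
    "# a comment line\n"
    "a,b,c,d,message\n"
    "# another comment\n"
    "# yet another comment\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
)

# skiprows uses 0-based line numbers.
df = pd.read_csv(io.StringIO(text), skiprows=[0, 2, 3])
print(df)
```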

Handling of missing values

Typically, missing values are either not present (an empty string) or marked with some sentinel value.

By default, pandas recognizes a set of common sentinels such as NaN and NULL.

The na_values option accepts a list or a set of strings to treat as missing values.

When na_values is given as a dictionary, each column can specify different missing-value sentinels.
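To illustrate the default behavior, here is a sketch with hypothetical data containing an empty field, a NaN token, and a NULL token. All three should come back as missing values.

```python
import io

import pandas as pd

# Hypothetical contents with three kinds of missing markers:
# the token "NaN", an empty field, and the token "NULL".
text = (
    "something,a,b,c,message\n"
    "one,1,2,3,NaN\n"
    "two,5,6,,world\n"
    "three,9,NULL,11,foo\n"
)

result = pd.read_csv(io.StringIO(text))
print(result)
print(result.isna())
```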
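Both forms of na_values can be sketched as below. The sentinel strings "missing", "foo", and "-1" are arbitrary choices for this example, not values taken from the original article.

```python
import io

import pandas as pd

# Hypothetical contents using custom sentinels.
text = (
    "something,a,b,c,message\n"
    "one,1,2,3,foo\n"
    "two,5,6,-1,world\n"
    "three,9,-1,11,missing\n"
)

# A list applies the extra sentinels to every column.
as_list = pd.read_csv(io.StringIO(text), na_values=["missing"])

# A dict applies different sentinels per column: here "-1" is missing
# only in column c, while column b keeps its -1 as a real value.
as_dict = pd.read_csv(
    io.StringIO(text),
    na_values={"message": ["foo", "missing"], "c": ["-1"]},
)

print(as_list)
print(as_dict)
```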

1.1 Reading text files in chunks

If you want to read only a small portion of the file (to avoid reading the whole file), you can specify nrows.
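A small sketch, assuming a hypothetical 100-row file generated in memory:

```python
import io

import pandas as pd

# Build hypothetical contents: a header plus 100 data rows.
lines = ["a,b"] + [f"{i},{i * 2}" for i in range(100)]
text = "\n".join(lines)

# nrows limits how many data rows are parsed.
head = pd.read_csv(io.StringIO(text), nrows=5)
print(head)
```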

To read a file in chunks, you can specify chunksize as the number of lines per chunk.

The TextFileReader object returned by read_csv lets you iterate over the file in pieces of chunksize rows; for example, you can traverse the file and aggregate the value counts of the 'a' column.
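One way to sketch this aggregation is shown below. The data is made up, and combining the per-chunk counts with concat and groupby is just one possible approach, not necessarily the one the original article used.

```python
import io

import pandas as pd

# Hypothetical contents: column "a" holds a repeating category.
letters = ["x", "y", "z", "x", "x", "y", "z", "x", "y", "x"]
text = "a,b\n" + "".join(f"{k},{i}\n" for i, k in enumerate(letters))

# With chunksize set, read_csv returns an iterator of DataFrames.
chunker = pd.read_csv(io.StringIO(text), chunksize=3)

# Count values of "a" per chunk, then sum the partial counts per label.
partial_counts = [piece["a"].value_counts() for piece in chunker]
counts = pd.concat(partial_counts).groupby(level=0).sum()
print(counts)
```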

1.2 Writing data into text format

Use DataFrame's to_csv method to export data to a comma-delimited file

By default, both row and column labels are written if no other options are specified, but both can be disabled.
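A sketch of the difference, using a small made-up DataFrame. Calling to_csv without a path returns the CSV text as a string, which keeps the example free of file I/O.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# Default: row labels (index) and column labels (header) are both written.
with_labels = df.to_csv()

# Both can be turned off.
without_labels = df.to_csv(index=False, header=False)

print(with_labels)
print(without_labels)
```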

It is also possible to write only a subset of the columns, in an order of your choosing, with the columns option.
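For example, with the same kind of hypothetical DataFrame, the columns argument selects and orders the output columns:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# Only "message" and "a" are written, in that order.
subset = df.to_csv(index=False, columns=["message", "a"])
print(subset)
```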

By default, missing values appear as empty strings in the output; they can be represented by another sentinel value with the na_rep option.

(When the output is written to sys.stdout, the text result is printed in the console.)
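A sketch of both behaviors with made-up data containing NaN values. Writing to sys.stdout here is an assumption about how the console output was produced.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical data with missing values.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", np.nan]})

# Default: missing values are written as empty strings.
default_text = df.to_csv()

# na_rep replaces them with a marker of your choice.
marked_text = df.to_csv(na_rep="NULL")

# Passing sys.stdout prints the CSV text to the console instead.
df.to_csv(sys.stdout, na_rep="NULL")
```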

The default separator is a comma; a different one can be chosen with the sep option.
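A short sketch with a pipe character as the separator; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# sep changes the field delimiter used in the output.
piped = df.to_csv(sep="|", index=False)
print(piped)
```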

Series also has the to_csv method

I don't know why there is a line with 0 at the end.
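One possible explanation for the stray 0, offered as an assumption rather than a diagnosis of the original output: when a header is written for a Series that has no name, pandas labels the value column 0. The sketch below uses a made-up date-indexed Series to show the output with and without the header.

```python
import pandas as pd

# Hypothetical unnamed Series with a date index.
dates = pd.date_range("2000-01-01", periods=3)
ts = pd.Series([0, 1, 2], index=dates)

# With a header, the unnamed value column is labeled "0".
with_header = ts.to_csv(header=True)

# header=False writes only the index/value pairs.
without_header = ts.to_csv(header=False)

print(with_header)
print(without_header)
```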

Summary

That's all for this post, I hope it was helpful and I hope you'll check back for more from me!