existPrevious article.In the article, I compiled a description of the use of the concat method, which is commonly used in pandas for data merging and reshaping. Here, I'll move on to the join and merge methods, which are also often used in pandas.
merge
The merge method of pandas provides an SQL-like memory linking operation, and the official website documentation mentions that its performance will be more efficient than data manipulation in other open source languages (e.g. R).
A comparison with SQL statements can be seenhere are
Parameters of merge
on: column name, join used to align the name of the column, use this parameter must ensure that the left table and the right table used to align the column have the same column name.
left_on: left table aligned columns, either column names or arrays of the same length as the dataframe.
right_on: right table aligned columns, either column names or arrays of the same length as the dataframe.
left_index/ right_index: if True haunted with index as alignment key
HOW: Methods of data fusion.
sort: according to the dataframe merge keys in dictionary order, default is yes, if set false can improve the performance.
merge's default merge method:
merge is used for internal table merging based on index-on-index and index-on-column(s), but merging based on index is the default.
1.1 Methods for merging composite keys
When using merge, you can select multiple keys as a composite that can be aligned and merged.
1.1.1 Specifying data merge aligned columns by on
In [41]: left = ({'key1': ['K0', 'K0', 'K1', 'K2'], ....: 'key2': ['K0', 'K1', 'K0', 'K1'], ....: 'A': ['A0', 'A1', 'A2', 'A3'], ....: 'B': ['B0', 'B1', 'B2', 'B3']}) ....: In [42]: right = ({'key1': ['K0', 'K1', 'K1', 'K2'], ....: 'key2': ['K0', 'K0', 'K0', 'K0'], ....: 'C': ['C0', 'C1', 'C2', 'C3'], ....: 'D': ['D0', 'D1', 'D2', 'D3']}) ....: In [43]: result = (left, right, on=['key1', 'key2'])
The inner method is used by default if you don't specify how.
HOW methods are available:
left
Keep all data in the left table only
In [44]: result = (left, right, how='left', on=['key1', 'key2'])
right
Keep all data in the right table only
In [45]: result = (left, right, how='right', on=['key1', 'key2'])
outer
Retain all information from both tables
In [46]: result = (left, right, how='outer', on=['key1', 'key2'])
inner
Keep only the information in the common parts of the two tables
In [47]: result = (left, right, how='inner', on=['key1', 'key2'])
1.2 indicator
The v0.17.0 version of pandas also supports an indicator parameter, which, if set to True, adds a column called '_merge' to the output. The _merge column can take three values
- left_only Only in the left table
- right_only In the right table only
- Both. In both tables.
1.3 The join method
The dataframe's built-in join method is a fast way to merge. It defaults to index as the aligned column.
1.3.1 how parameters
The how parameter in join, like the how parameter in merge, is used to specify the rules for retaining data for table merges.
See the previous how section for details.
1.3.2 on parameter
In practice, if the index value of the right table is the value of a column in the left table, then you can merge the index of the right table with the column of the left table in a flexible way by aligning the index of the right table with the column of the left table.
ex 1
In [59]: left = ({'A': ['A0', 'A1', 'A2', 'A3'], ....: 'B': ['B0', 'B1', 'B2', 'B3'], ....: 'key': ['K0', 'K1', 'K0', 'K1']}) ....: In [60]: right = ({'C': ['C0', 'C1'], ....: 'D': ['D0', 'D1']}, ....: index=['K0', 'K1']) ....: In [61]: result = (right, on='key')
1.3.3 suffix suffix parameters
If the process of merging with a table encounters a column with the same name in both tables, but the value is different, and both want to keep it when merging, you can use suffixes to add suffixes to the duplicate column names in each table.
In [79]: result = (left, right, on='k', suffixes=['_l', '_r'])
* There are also lsuffix and rsuffix which specify the suffix of the left table and the suffix of the right table, respectively.
1.4 Combining multiple dataframes
When combining multiple dataframes at a time, you can pass in a list or tuple of dataframes, joining multiple at a time, solving multiple problems at once~.
In [83]: right2 = ({'v': [7, 8, 9]}, index=['K1', 'K1', 'K2']) In [84]: result = ([right, right2])
1.5 Updating the table's nan value
1.5.1 combine_first
If a table has a nan value that can be found in the same location (same index and same column) in another table, the data can be updated with combine_first
1.5.2 update
If you want to use the data in one table to update the data in another table, you can use update to do so.
1.5.3 Difference between combine_first and update
Using combine_first will update only the nan values in the left table. Update, on the other hand, will update all values in the left table that can be found in the right table (the two tables are positioned relative to each other).
Example code reference source -official website
This is the whole content of this article.