Merging and associating data comes up constantly in data processing. In SQL or HQL we would reach for join, union all, and so on; pandas offers equivalent functions. I find pandas very convenient for this kind of work: it processes data efficiently and covers a wide range of business requirements.
Data splicing:
concat is a top-level pandas function that splices (concatenates) data either row-wise or column-wise, depending on the axis argument.
Function signature
pd.concat(objs: 'Iterable[NDFrame] | Mapping[Hashable, NDFrame]', axis=0, join='outer', ignore_index: 'bool' = False, keys=None, levels=None, names=None, verify_integrity: 'bool' = False, sort: 'bool' = False, copy: 'bool' = True) -> 'FrameOrSeriesUnion'
- objs: the datasets to combine, usually passed in as a list, e.g. [df1, df2, df3].
- axis: the axis along which the data is spliced. 0 splices along the rows (the frames are stacked on top of each other); 1 splices along the columns (the frames are placed side by side).
- join: how the other axis is handled, either 'inner' or 'outer', with the same meaning as in SQL.
These three parameters are the ones most often used in practice; the others are not covered here.
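To make the join parameter concrete, here is a minimal sketch. The frames df1 and df2 and their column names are hypothetical and are not the article's original data; they share only the "math" column.

```python
import pandas as pd

# Hypothetical frames: only the "math" column is common to both.
df1 = pd.DataFrame({"math": [90, 85], "english": [80, 75]})
df2 = pd.DataFrame({"math": [60, 88], "physics": [77, 91]})

# join='outer' (the default) keeps the union of the columns;
# cells with no source value become NaN.
outer = pd.concat([df1, df2], axis=0, join="outer")

# join='inner' keeps only the columns present in every frame.
inner = pd.concat([df1, df2], axis=0, join="inner")

print(outer)
print(inner)
```

With axis=0 the join setting applies to the columns; with axis=1 it would apply to the index instead.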
Examples:
Simulated data
Row-wise splicing (axis=0)
Case 1: the frames have different fields
Columns with the same name are stacked, columns that exist in only one frame are kept as separate columns, and the missing values are filled with NaN.
Case 2: the simulated data is changed so that both frames have the same fields
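The two axis=0 cases can be sketched as follows. The original simulated data is not available, so the grade tables below (indexed by student name) are an assumption made purely for illustration.

```python
import pandas as pd

# Hypothetical simulated data: grades indexed by student name.
df1 = pd.DataFrame({"math": [90, 85, 70], "english": [80, 75, 95]},
                   index=["Alice", "Bob", "Carol"])
df2 = pd.DataFrame({"math": [60, 88], "physics": [77, 91]},
                   index=["Bob", "Dave"])

# Case 1: different fields. The rows of df2 are appended below df1;
# "math" is shared, while "english" and "physics" are padded with NaN.
case1 = pd.concat([df1, df2], axis=0)
print(case1)

# Case 2: rename "physics" so that both frames have the same fields.
df2_same = df2.rename(columns={"physics": "english"})
case2 = pd.concat([df1, df2_same], axis=0)
print(case2)
```

Note that the index label "Bob" appears twice in both results: axis=0 does not match rows by index, it only appends them. Passing ignore_index=True would replace the labels with a fresh 0..n-1 index.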
Column-wise splicing (axis=1)
When splicing with axis=1, the grades are aligned by index, so grades that belong to the same name end up in the same row rather than being simply stacked.
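A minimal sketch of this index alignment, again using hypothetical grade tables indexed by student name:

```python
import pandas as pd

# Hypothetical simulated data: one subject per frame, indexed by name.
df1 = pd.DataFrame({"math": [90, 85, 70]}, index=["Alice", "Bob", "Carol"])
df2 = pd.DataFrame({"english": [75, 88]}, index=["Bob", "Dave"])

# axis=1 places the frames side by side and aligns the rows by index.
# With the default join='outer', names found in only one frame are
# kept and the missing grade is filled with NaN.
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)

# join='inner' keeps only the names present in both frames.
both = pd.concat([df1, df2], axis=1, join="inner")
print(both)
```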
Data association:
Data association with merge is basically the same as a join in SQL: two tables are associated at a time, there is a distinction between the left and the right table, and the fields to associate on can be specified explicitly.
Function signature
pd.merge(left: 'DataFrame | Series', right: 'DataFrame | Series', how: 'str' = 'inner', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, sort: 'bool' = False, suffixes: 'Suffixes' = ('_x', '_y'), copy: 'bool' = True, indicator: 'bool' = False, validate: 'str | None' = None) -> 'DataFrame'
- left: the left table.
- right: the right table.
- how: the type of association, one of {'left', 'right', 'outer', 'inner', 'cross'}; defaults to 'inner'.
- on: the field(s) to associate on; they must exist in both tables.
- left_on: the associating field(s) in the left table, used when the two tables do not share a field name.
- right_on: the associating field(s) in the right table, used when the two tables do not share a field name.
These parameters are the ones most often used in practice; the others are not covered here.
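As a minimal sketch of left_on and right_on, assume two hypothetical tables whose key columns have different names ("student" on the left, "name" on the right):

```python
import pandas as pd

# Hypothetical tables: the key is "student" on the left, "name" on the right.
left = pd.DataFrame({"student": ["Alice", "Bob", "Carol"],
                     "math": [90, 85, 70]})
right = pd.DataFrame({"name": ["Bob", "Carol", "Dave"],
                      "english": [75, 95, 88]})

# The key columns differ in name, so on= cannot be used;
# left_on/right_on name the key on each side instead.
merged = pd.merge(left, right, how="inner",
                  left_on="student", right_on="name")
print(merged)
```

Both key columns are kept in the result. If the two tables used the same key name, on="name" alone would be enough.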
Example:
Data association
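The original example is not available, so the following is only a sketch with hypothetical grade tables, showing how the how parameter changes which rows survive the association:

```python
import pandas as pd

# Hypothetical tables sharing the key column "name".
left = pd.DataFrame({"name": ["Alice", "Bob", "Carol"],
                     "math": [90, 85, 70]})
right = pd.DataFrame({"name": ["Bob", "Carol", "Dave"],
                      "english": [75, 95, 88]})

# Inner join: only the names present in both tables.
inner = left.merge(right, on="name", how="inner")

# Left join: every row of the left table; unmatched grades become NaN.
left_join = left.merge(right, on="name", how="left")

# Outer join: every name from either table.
outer = left.merge(right, on="name", how="outer")

print(inner)
print(left_join)
print(outer)
```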
merge is used in much the same way as join in SQL, with both inner and outer joins available, so it should present little difficulty.
Differences between the two:
- concat exists only as a top-level pandas function (pd.concat), whereas merge is both a top-level function (pd.merge) and a DataFrame method (DataFrame.merge).
- concat can splice both row-wise (axis=0) and column-wise (axis=1), and through index alignment it can also serve as a simple association.
- merge can only associate tables, i.e. it always combines them column-wise by matching rows on the key fields.
- concat can process many DataFrames in one call, whereas merge handles exactly two DataFrames at a time.
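These differences can be sketched with three hypothetical tables. Chaining merge through functools.reduce is one common way to associate more than two tables; it is shown here as an illustration, not as the article's own method.

```python
from functools import reduce

import pandas as pd

# Three hypothetical tables sharing the key column "name".
a = pd.DataFrame({"name": ["Alice", "Bob"], "math": [90, 85]})
b = pd.DataFrame({"name": ["Alice", "Bob"], "english": [80, 75]})
c = pd.DataFrame({"name": ["Alice", "Bob"], "physics": [77, 91]})

# concat takes any number of frames in a single call.
stacked = pd.concat([a, b, c], axis=0, ignore_index=True)

# merge joins two frames per call, so three frames need chained calls.
merged = reduce(lambda l, r: pd.merge(l, r, on="name"), [a, b, c])

# merge is also a DataFrame method; concat is not.
same = a.merge(b, on="name").merge(c, on="name")

print(stacked)
print(merged)
```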