Plotting bivariate joint distributions
Sometimes we need to look not only at the distribution of a single variable but also at how two variables are associated, often as a first step toward making predictions. This is where the bivariate joint distribution comes in.
Let's look at ways to visualize the joint distribution of two continuous numeric variables.
To plot two continuous numeric variables in Seaborn we use the jointplot() function:
Function documentation: /generated/
sns.jointplot(x, y, data=None, kind='scatter')
- x, y: the names of the columns to plot on the x-axis and y-axis respectively.
- data: the data set, given as a DataFrame.
- kind: sets the type of plot. The available values are 'scatter' | 'reg' | 'resid' | 'kde' | 'hex', which produce a scatter plot, regression plot, residual plot, kernel density plot, and hexbin plot respectively.
Scatter plot
When we want to see how two variables relate to each other on a two-dimensional plane, we can use a scatter plot, which makes it easy to spot patterns in how the data is distributed.
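    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Two samples of 500 random values each
    # (np.random.normal is assumed here; the original sampling call was lost)
    df = pd.DataFrame({'x': np.random.normal(size=500), 'y': np.random.normal(size=500)})
    # Joint scatter plot with a fitted regression line
    sns.jointplot(x='x', y='y', data=df, kind='reg')
    plt.show()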
Bivariate scatter plot:
- The result shows that jointplot() displays both the joint relationship between the two variables and the distribution of each variable on its own.
- Setting the kind parameter to 'reg' adds a simple linear model fit.
- The marginal axes above and to the right of the main plot show a histogram and a kernel density curve for each of the two variables.
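The regression plot draws a fitted line but does not report its coefficients. One way to see the numbers behind such a line is to fit a straight line yourself. The sketch below uses np.polyfit for an ordinary least-squares fit, which is assumed here to correspond to the default straight-line fit; the data and seed are made up for illustration.

```python
import numpy as np

# Made-up data: two independent normal samples
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)  # generated independently of x

# Degree-1 least-squares fit: y is approximated by slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Because the two samples are generated independently, both the slope and the correlation should come out close to zero, which matches a nearly flat regression line in the plot.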
Hexbin plot
Above we drew a joint scatter plot of the data, and the two variables show no clear linear relationship. Scatter plots also have a drawback: overlapping points hide one another, so we cannot tell where the data is dense and where it is sparse. A hexbin plot shows the distribution of the data more clearly.
The hexbin plot is still drawn with the jointplot() function; just change the kind parameter to 'hex'.
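    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Two samples of 500 random values each
    # (np.random.normal is assumed here; the original sampling call was lost)
    df = pd.DataFrame({'x': np.random.normal(size=500), 'y': np.random.normal(size=500)})
    # Joint hexbin plot: points are counted per hexagonal cell
    sns.jointplot(x='x', y='y', data=df, kind='hex')
    plt.show()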
Each hexagon in the hexbin plot represents a range of values, and its color indicates how many data points fall within that range: the lighter the color, the fewer the points; the darker the color, the more points. With large data sets, this makes the distribution of the data much easier to see.
Density plot
In univariate analysis we plotted probability density curves for a single variable, and in bivariate analysis we can likewise use a density plot to analyze the distribution of the data. Density plots are still drawn with the jointplot() function, except that the kind parameter is changed to 'kde'.
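    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Two samples of 500 random values each
    # (np.random.normal is assumed here; the original sampling call was lost)
    df = pd.DataFrame({'x': np.random.normal(size=500), 'y': np.random.normal(size=500)})
    # Joint kernel density plot drawn as contour lines
    sns.jointplot(x='x', y='y', data=df, kind='kde')
    plt.show()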
As the graph shows, the bivariate density plot consists of a number of closed, irregular contour curves; regions of higher data density are darker and regions of lower density are lighter.
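    # x_data is a DataFrame and x, y are column names (placeholders; define them for your own data).
    # The function names sns.kdeplot and sns.rugplot are inferred from their arguments.
    # Overlay density contours on the joint scatter plot and add rug marks to the margins
    g = sns.jointplot(data=x_data, x=x, y=y)
    g.plot_joint(sns.kdeplot, color="r", zorder=0, levels=6)
    g.plot_marginals(sns.rugplot, color="r", height=-.15, clip_on=False)

    # Customize the scatter markers and the marginal histograms
    sns.jointplot(
        data=x_data, x=x, y=y,
        marker="+", s=100, marginal_kws=dict(bins=25, fill=False),
    )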