1. About box plots and the boxplot() method
Box plots are also known as box-and-whisker plots. Their main benefits are that they describe the spread (dispersion) of a data set in a way that is robust to extreme values, and that they make outliers easy to identify.
In Python's matplotlib library, box plots are drawn with the boxplot() method.
The main parameters of the method are as follows:
Parameter | Description |
---|---|
x | Data to be plotted as a box plot |
notch | Whether to draw the box with a notch around the median; the default is no notch |
sym | Marker used for outliers (fliers). The docstring gives a plus sign (+) as the default, though recent matplotlib versions draw circles under the default style |
vert | Whether to draw the boxes vertically; the default is vertical |
whis | How far the whiskers may extend beyond the lower and upper quartiles, as a multiple of the interquartile range (IQR); the default is 1.5 |
positions | Positions of the boxes along the axis; defaults to [1, 2, ..., N] for N data sets |
widths | Width of each box; the default is 0.5 |
patch_artist | Whether to fill the box with color; the default is not to fill |
meanline | Whether to draw the mean as a line; the default is to draw it as a point. Only takes effect when showmeans is True |
showmeans | Whether to display the mean; not displayed by default |
showcaps | Whether to display the caps, the short lines at the ends of the whiskers; displayed by default |
showbox | Whether to display the box; displayed by default |
showfliers | Whether to display outliers; displayed by default |
boxprops | Properties of the box, such as border color and fill color. The fill color (facecolor key) only takes effect when patch_artist is True |
medianprops | Properties of the median line, such as line style and thickness |
meanprops | Properties of the mean, such as marker size and color |
capprops | Properties of the caps, such as color and thickness |
whiskerprops | Properties of the whiskers, such as color, thickness and line style |
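As a quick way to see how these parameters map onto what gets drawn, the following minimal sketch inspects the value returned by boxplot(). The sample data and the choice of the non-interactive Agg backend are assumptions made only so the snippet runs anywhere; they are not part of the original examples.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: three normally distributed data sets
rng = np.random.default_rng(0)
samples = [rng.normal(50, 10, 100) for _ in range(3)]

fig, ax = plt.subplots()
result = ax.boxplot(samples, showmeans=True, patch_artist=True)

# boxplot() returns a dict that maps component names to lists of artists
print(sorted(result))

# One box per data set, and two whiskers and two caps per box
print(len(result["boxes"]), len(result["whiskers"]), len(result["caps"]))

# With positions left unset, the boxes sit at x = 1, 2, ..., N
print(ax.get_xticks())
```

The artists in this dict can be styled after the fact (for example by calling set_facecolor() on each entry of result["boxes"] when patch_artist=True), which is an alternative to passing boxprops up front.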
2. Drawing a simple box plot
Three data sets are generated with a fixed random seed, so the "random" values are the same on every run. They are drawn as three boxes in a single figure.
The global font is set to STKaiti (a KaiTi typeface). The font must be installed on the system for this setting to take effect.
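import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
# Global font and axes background color
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['axes.facecolor'] = '#cc00ff'

# Fixed seed so the same data is generated on every run
np.random.seed(30)
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)

plt.boxplot([data1, data2, data3])
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()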
The image effect is as follows:
3. Creating a more refined image
The data used below has been modified slightly. The randomly generated data in the previous example is spread fairly evenly, so it rarely produces outliers and does not show what a box plot is meant to show.
Outliers are marked with the * symbol, and the mean of each data set is drawn as a line.
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['axes.facecolor'] = '#cc00ff'

np.random.seed(110)
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so the plot shows outliers
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]

plt.boxplot([data1, data2, data3],
            notch=True,
            sym='*',
            patch_artist=True,
            boxprops={'color': '#ffff00', 'facecolor': '#0066ff'},
            capprops={'color': '#ff3333', 'linewidth': 2},
            showmeans=True,
            meanline=True)
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()
The effect of code execution is as follows:
4. Criteria for outliers
The criterion for flagging outliers can be changed with the whis parameter. By default, a value is treated as an outlier when it falls outside the range [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR], where Q1 and Q3 are the lower and upper quartiles and IQR = Q3 - Q1 is the interquartile range.
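To make the rule concrete, here is a minimal NumPy sketch of the same criterion. The helper names whisker_limits and count_outliers and the ten-value sample are hypothetical, chosen only for illustration.

```python
import numpy as np

def whisker_limits(values, whis=1.5):
    """Return the (lower, upper) limits Q1 - whis*IQR and Q3 + whis*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

def count_outliers(values, whis=1.5):
    """Count the values that fall outside the limits for the given whis."""
    values = np.asarray(values)
    low, high = whisker_limits(values, whis)
    return int(np.sum((values < low) | (values > high)))

# Hypothetical sample: 1..9 plus one larger value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 16]

print(whisker_limits(sample))         # limits with the default whis=1.5
print(count_outliers(sample))         # 16 lies above the upper limit
print(count_outliers(sample, whis=2)) # a wider range absorbs it
```

Note that in the plot itself each whisker ends at the most extreme data point inside these limits, not at the limits themselves.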
The code below is the example from section 3 with one change: whis=2.
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['axes.facecolor'] = '#cc00ff'

np.random.seed(110)
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so the plot shows outliers
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]

plt.boxplot([data1, data2, data3],
            whis=2,
            notch=True,
            sym='*',
            patch_artist=True,
            boxprops={'color': '#ffff00', 'facecolor': '#0066ff'},
            capprops={'color': '#ff3333', 'linewidth': 2},
            showmeans=True,
            meanline=True)
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()
With whis=2, the result no longer shows any outliers.
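The effect of whis can also be checked without drawing anything. The sketch below uses matplotlib's cbook.boxplot_stats helper, which computes box plot statistics for a data set. The small sample array is hypothetical and is not the data used in the plots above.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical sample: 1..9 plus one larger value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 16])

fliers_by_whis = {}
for whis in (1.5, 2):
    # boxplot_stats returns one dict of statistics per data set
    stats = cbook.boxplot_stats(sample, whis=whis)[0]
    fliers_by_whis[whis] = stats["fliers"]
    print(f"whis={whis}: whiskers end at {stats['whislo']} and {stats['whishi']}, "
          f"fliers = {stats['fliers']}")
```

Here the value 16 is reported as a flier with whis=1.5 but falls inside the whisker range once whis is raised to 2.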
5. Printing the outliers
The plots above only visualize the outliers. In data analysis that is rarely enough: the outliers usually need to be handled as well, for example removed.
The Python code below prints the outliers in data3:
import numpy as np

np.random.seed(110)
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so there are outliers to find
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]

# Lower and upper quartiles of data3
Q1 = np.quantile(a=data3, q=0.25)
Q3 = np.quantile(a=data3, q=0.75)
# Interquartile range
IQR = Q3 - Q1
# Lower and upper limits
low_limit = Q1 - 1.5 * IQR
up_limit = Q3 + 1.5 * IQR

print('The lower limit is:', low_limit)
print('The upper limit is:', up_limit)
print('The outliers are:')
# A value is an outlier if it lies below the lower limit or above the upper limit
print(data3[(data3 < low_limit) | (data3 > up_limit)])
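Removal, mentioned above as a typical next step, could be sketched as follows. The function name remove_outliers and the sample values are hypothetical. Whether dropping values is appropriate depends on the analysis, so this is one option, not a recommendation.

```python
import numpy as np

def remove_outliers(values, whis=1.5):
    """Return only the values inside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    values = np.asarray(values)
    q1, q3 = np.quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    keep = (values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)
    return values[keep]

# Hypothetical sample: 1..9 plus one larger value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 16])
cleaned = remove_outliers(sample)
print(cleaned)
```

Because the quartiles change once values are dropped, running the function again on the cleaned data may flag further points. The sketch applies the rule only once.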