SoFunction
Updated on 2024-10-29

Plotting box plots with Python's matplotlib library

1. About box plots and the boxplot() method

Box plots are also called box-and-whisker plots. Their advantage is that they describe the spread of the data in a relatively robust way, because they are built from quartiles rather than from the mean, and they make outliers in the data easy to spot.
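As a rough illustration of what a box plot summarizes, the sketch below computes the quartiles, the whisker ends, and the outliers by hand with NumPy. The sample values are hypothetical, and the 1.5 × IQR fence is the default rule described for the whis parameter in this article.

```python
import numpy as np

# Hypothetical sample: mostly moderate values plus one extreme value
sample = np.array([12, 15, 17, 18, 20, 21, 23, 24, 26, 95])

# The box spans Q1..Q3 and the line inside it is the median
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles separate normal points from outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences
inside = sample[(sample >= lower_fence) & (sample <= upper_fence)]
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print('Q1, median, Q3:', q1, median, q3)
print('Whisker ends:', inside.min(), inside.max())
print('Outliers:', outliers)
```

The single extreme value barely moves the quartiles, which is why the box stays stable while the extreme point is flagged separately.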

In Python's matplotlib library, box plots are drawn with the boxplot() method.

The main parameters of the method are as follows:

Parameter      Description
x              Data to be drawn as box plots
notch          Whether to draw the box with a notch; by default the box is not notched
sym            Marker used for outliers. The documented fallback is a plus sign (+); recent matplotlib versions draw circles under the default style
vert           Whether the box plot is drawn vertically; vertical by default
whis           Distance between the whisker limits and the upper and lower quartiles, as a multiple of the interquartile range. The default is 1.5
positions      Positions of the boxes. Defaults to [1, 2, ..., n] for n boxes
widths         Width of each box; the default is 0.5
patch_artist   Whether to fill the box with color
meanline       Whether to draw the mean as a line; by default it is drawn as a point. Only takes effect when showmeans is True
showmeans      Whether to show the mean; not shown by default
showcaps       Whether to show the caps, the two short lines at the ends of the whiskers; shown by default
showbox        Whether to show the box; shown by default
showfliers     Whether to show outliers; shown by default
boxprops       Properties of the box, such as border color and fill color. The fill color (facecolor key) only applies when patch_artist is True
medianprops    Properties of the median line, such as line style and thickness
meanprops      Properties of the mean marker, such as size and color
capprops       Properties of the caps, such as color and thickness
whiskerprops   Properties of the whiskers, such as color, thickness, and line style
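To see a few of these parameters working together, here is a minimal sketch. The data, the colors, and the choice of the non-interactive Agg backend are assumptions made for the example, not part of the original article. It also shows that boxplot() returns a dictionary of the drawn artists, which can be used to restyle individual parts afterwards.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical demo data: two normally distributed groups
rng = np.random.default_rng(0)
groups = [rng.normal(50, 10, 100), rng.normal(60, 15, 100)]

fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.boxplot(
    groups,
    positions=[1, 3],          # x positions of the two boxes
    widths=0.6,                # box width
    patch_artist=True,         # required for facecolor to take effect
    showmeans=True,            # mean drawn as a point because meanline is False
    boxprops={'facecolor': '#0066ff'},
    medianprops={'color': 'black', 'linewidth': 2},
)

# The return value maps part names to lists of matplotlib artists
print(sorted(parts.keys()))

# Individual artists can still be changed after drawing
parts['boxes'][1].set_facecolor('#ff9933')
```

In an interactive session, drop the matplotlib.use('Agg') line and call plt.show() to view the figure.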

2. Drawing a simple box plot

Three sets of random data are generated with a fixed random seed, so the values are the same on every run. They are drawn as three boxes in a single figure.

The global font is set to KaiTi (STKAITI).

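import matplotlib.pyplot as plt
import numpy as np
# Create the figure with a background color and size
fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
# Global font, and correct rendering of the minus sign with that font
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
# Axes background color (the key was lost in the source; 'axes.facecolor' is assumed)
plt.rcParams['axes.facecolor'] = '#cc00ff'
# Fix the seed so the random data is reproducible
np.random.seed(30)
# Three integer samples of 200 values each (np.random.randint is assumed; the function name was lost in the source)
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# One box per data set
plt.boxplot([data1, data2, data3])
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()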

The image effect is as follows:

3. Drawing a more refined plot

The data below has been modified slightly. The randomly generated data above is fairly uniform, so it rarely produces outliers and does not show what a box plot is really for.

Outliers are marked with the * symbol, and the mean of each data set is drawn as a line.

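import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
# Axes background color (the key was lost in the source; 'axes.facecolor' is assumed)
plt.rcParams['axes.facecolor'] = '#cc00ff'
np.random.seed(110)
# np.random.randint is assumed; the function name was lost in the source
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so that outliers appear
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]
plt.boxplot([data1, data2, data3],
            notch=True,           # notched boxes
            sym='*',              # outlier marker
            patch_artist=True,    # allow the boxes to be filled
            boxprops={'color': '#ffff00', 'facecolor': '#0066ff'},
            capprops={'color': '#ff3333', 'linewidth': 2},
            showmeans=True,       # show the mean ...
            meanline=True         # ... as a line instead of a point
            )
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()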

The effect of code execution is as follows:

4. Criteria for outliers

The criterion for judging outliers can be changed with the whis parameter. By default, any value outside the range [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR] is treated as an outlier, where Q1 and Q3 are the lower and upper quartiles and IQR = Q3 - Q1 is the interquartile range.

The code from section 3, slightly modified:

Setting whis=2

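import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1, facecolor='#33ff99', figsize=(10, 6))
plt.rcParams['font.sans-serif'] = ['STKAITI']
plt.rcParams['axes.unicode_minus'] = False
# Axes background color (the key was lost in the source; 'axes.facecolor' is assumed)
plt.rcParams['axes.facecolor'] = '#cc00ff'
np.random.seed(110)
# np.random.randint is assumed; the function name was lost in the source
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so that outliers appear
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]
plt.boxplot([data1, data2, data3],
            whis=2,               # whiskers reach up to 2 x IQR beyond the quartiles
            notch=True,
            sym='*',
            patch_artist=True,
            boxprops={'color': '#ffff00', 'facecolor': '#0066ff'},
            capprops={'color': '#ff3333', 'linewidth': 2},
            showmeans=True,
            meanline=True
            )
plt.xticks(range(1, 4), ['Type A', 'Type B', 'Type C'], fontsize=20)
plt.yticks(fontsize=20)
plt.title('Box-Line Chart', fontsize=25, color='#0033cc')
plt.show()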

With whis=2, the result no longer contains any outliers.
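The effect of whis can also be checked numerically, without drawing anything. The sketch below uses matplotlib.cbook.boxplot_stats, the helper that computes box plot statistics, and assumes the data was generated with np.random.randint as in the reconstructed code above. The exact numbers depend on the random data, so no output is shown here.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(110)
# Assumption: same generator and modification as in the plotting code
data1 = np.random.randint(20, 100, 200)
data1[100:102] = [142, 150]

results = {}
for whis in (1.5, 2):
    # boxplot_stats returns one dict per data set; only one set is passed here
    stats = cbook.boxplot_stats(data1, whis=whis)[0]
    results[whis] = stats
    print('whis =', whis,
          '| whisker ends:', stats['whislo'], stats['whishi'],
          '| outliers:', stats['fliers'])
```

A larger whis widens the fences, so the set of outliers can only stay the same or shrink.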

5. Printing the outliers

The steps above only visualize the outliers. In real data analysis this is not enough: the outliers usually have to be processed in some way, for example removed.

The Python code below prints the outliers:

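import numpy as np
np.random.seed(110)
# np.random.randint is assumed; the function name was lost in the source
data1 = np.random.randint(20, 100, 200)
data2 = np.random.randint(30, 120, 200)
data3 = np.random.randint(40, 110, 200)
# Overwrite a few values with extreme ones so that outliers appear
data1[100:102] = [142, 150]
data3[100:103] = [1, 5, 154]

# Lower and upper quartiles (np.quantile is assumed because q is given as a fraction)
Q1 = np.quantile(a=data3, q=0.25)
Q3 = np.quantile(a=data3, q=0.75)
# Calculate the interquartile range
QR = Q3 - Q1
# Lower and upper limits
low_limit = Q1 - 1.5 * QR
up_limit = Q3 + 1.5 * QR
print('The lower limit is:', low_limit)
print('The upper limit is:', up_limit)
print('The outliers are:')
# Select the values below the lower limit or above the upper limit
print(data3[(data3 < low_limit) | (data3 > up_limit)])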
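Since removal is mentioned as a typical next step, here is a minimal sketch of it. The helper name remove_outliers and its parameter k are hypothetical, not part of matplotlib or NumPy, and the data generation again assumes np.random.randint.

```python
import numpy as np

def remove_outliers(values, k=1.5):
    """Return only the values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    values = np.asarray(values)
    q1, q3 = np.quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    # Boolean mask of the values that lie within both limits
    keep = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[keep]

np.random.seed(110)
data3 = np.random.randint(40, 110, 200)
data3[100:103] = [1, 5, 154]

cleaned = remove_outliers(data3)
print('Before:', len(data3), 'values; after:', len(cleaned), 'values')
```

Passing k=2 applies the same limits as whis=2 in the plotting code. Note that removing values changes the quartiles, so running the filter a second time on the cleaned data may flag new points.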
