Introduction
Python is well suited to working with data. Data sets typically include multiple variables and many instances, which makes it hard to see what is going on in the data. Data visualization is a useful way to identify patterns in it.
For example, suppose you are a real estate agent and you want to understand the relationship between the age of a house and its selling price. If your data covered only 5 homes, the relationship would not be too difficult to see. But if you want to use data from 500 houses across the entire town, it becomes very difficult to understand how age affects price. Plotting selling price against age makes the relationship between the two much easier to see.
Visualization is a quick, widely understood way to communicate concepts, especially to people who are not familiar with your data. Whenever we work with data, visualization is often a necessary part of the analysis.
We will use matplotlib, a 2D plotting library originally written by John D. Hunter that has since become a very active, community-driven open source project. It allows you to generate high-quality line charts, scatter plots, histograms, bar charts, and more. Each chart type presents data in a different way, and it is usually useful to try several types before deciding which one is most informative for your data. Remember that visualization is a combination of art and science.
Given the importance of visualization, this tutorial describes how to plot data in Python using matplotlib. We will use a small data set to generate a scatter plot, add a title and axis labels to the chart, and customize the chart by changing the appearance of the plot points.
After completing this tutorial, you will be able to plot data in Python!
Prerequisites
Before you proceed with this tutorial, you should have Python 3 installed and have a local programming environment set up on your computer. If this is not the case, you can follow the appropriate installation and setup guide for your operating system.
Step 1 — Import matplotlib
Before we start working in Python, let's check that the matplotlib module is installed by running the following command on the command line:
python -c "import matplotlib"
If matplotlib is installed, this command will complete without an error and we are ready to start. If it is not installed, you will receive an error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'matplotlib'
If you receive an error message, use pip to download and install the library:
pip install matplotlib
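The same check can also be done from inside Python. The following is a minimal sketch that uses the standard library's importlib.util.find_spec; the helper name is_installed is hypothetical and chosen only for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("matplotlib"):
    print("matplotlib is available")
else:
    print("matplotlib is missing; install it with: pip install matplotlib")
```

Unlike a bare import, this approach lets a script print a friendly hint instead of stopping with a traceback.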
Now that matplotlib is installed, we can import it in Python. First, let's create the script we will use in this tutorial. Then, in our script, let's import matplotlib. Since we will only use the plotting module (pyplot), let's specify it when importing.
import matplotlib.pyplot as plt
We specify the module we want to import by appending .pyplot to the end of matplotlib. To make the module easier to reference in our script, we abbreviate it as plt. Now we can continue on to creating and plotting our data.
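To confirm what the import gives us, a short sketch like the following can be run once. It only prints the installed version and shows that plt is nothing more than a shorter name for the matplotlib.pyplot module.

```python
import matplotlib
import matplotlib.pyplot as plt

# The version string is useful when comparing behavior against the documentation.
print("matplotlib version:", matplotlib.__version__)

# "as plt" binds the same module object to a shorter name.
print(plt is matplotlib.pyplot)  # True
```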
Step 2 — Create the Data Points to Plot
In our Python script, let's create some data to process. We will use 2D data, so we will need the X and Y coordinates of each data point.
To better understand how matplotlib works, we will tie our data to a possible real-life scenario. Suppose we own a coffee shop and we are interested in the relationship between the average temperature over the year and the total sales of iced coffee. Our X variable will be the total number of iced coffees sold per month, and our Y variable will be the average monthly temperature in degrees Fahrenheit.
In our Python script, we will create two list variables: X (total sales of iced coffee) and Y (average temperature). Each item in the respective list represents the data for one month (January to December). For example, in January, the average temperature was 32 degrees Fahrenheit, and the coffee shop sold 590 iced coffees.
import matplotlib.pyplot as plt
X = [590,540,740,130,810,300,320,230,470,620,770,250]
Y = [32,36,39,52,61,72,77,75,68,57,48,48]
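Before plotting, it can help to confirm that the two lists line up month by month. The sketch below uses only the standard library. The MONTHS list and the pearson helper are additions made for illustration and are not part of the tutorial script.

```python
from math import sqrt

# Assumption: abbreviated month names, in the same January-to-December order as the data.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

# Both lists must have exactly one entry per month.
assert len(X) == len(Y) == len(MONTHS)

# Pair each month with its sales figure and temperature.
records = list(zip(MONTHS, X, Y))
for month, sold, temp in records:
    print(f"{month}: {sold} iced coffees at {temp} F")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(f"Pearson r = {pearson(X, Y):.2f}")
```

A single correlation number is a useful companion to the chart, but it does not replace it: the scatter plot shows the shape of the relationship, which one number cannot.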
Now that we have the data, we can start drawing.
Step 3 — Plot the data
Scatter plots are ideal for examining the relationship between two variables, so we will use this chart type as our example. To create a scatter plot using matplotlib, we will use the scatter() function. This function requires two arguments, the X and Y coordinate values, respectively.
import matplotlib.pyplot as plt
X = [590,540,740,130,810,300,320,230,470,620,770,250]
Y = [32,36,39,52,61,72,77,75,68,57,48,48]
plt.scatter(X,Y)
plt.show()
Each time we create a chart, we have to call plt.show() to specify that the chart should be displayed.
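The pyplot functions draw on an implicit "current" figure. matplotlib also offers an explicit interface in which we hold the figure and axes objects ourselves. The following is a minimal sketch of that style, assuming the non-interactive Agg backend so that it runs without a display. For the interactive window, drop the matplotlib.use line and call plt.show() at the end.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

# plt.subplots() returns a Figure and a single Axes to draw on.
fig, ax = plt.subplots()

# Axes.scatter takes the same X and Y arguments as plt.scatter and
# returns a collection object describing the plotted points.
points = ax.scatter(X, Y)

# Each row of get_offsets() is the (x, y) position of one point.
print(len(points.get_offsets()), "points plotted")
```

Both styles produce the same chart. The explicit style tends to be easier to follow once a script contains more than one figure.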
Before proceeding, let's check if our script works properly. Save the script and run it from the command line:
python
If everything goes well, a window should be launched to display the chart as follows:
[Figure: Scatter plot]
This window is great for viewing data; it is interactive and includes several features such as hovering to display labels and coordinates, zooming in or out, and saving.
Step 4 — Add a Title and Labels
Now that we know our script is working properly, we can start adding information to the chart. To clearly show what our data represents, let's include a title in the chart along with the labels for each axis.
We will start by adding a title. In the script, add the title before the plt.show() line.
import matplotlib.pyplot as plt
X = [590,540,740,130,810,300,320,230,470,620,770,250]
Y = [32,36,39,52,61,72,77,75,68,57,48,48]
plt.scatter(X,Y)
plt.title('The relationship between temperature and iced coffee sales')
plt.show()
Next, add the axis labels below the plt.title() line:
...
plt.xlabel('Number of iced coffee cups sold')
plt.ylabel('Fahrenheit')
...
If we save the script and run it again, we should now get an updated, more informative chart. It should look like this:
[Figure: Scatter plot with title and X/Y labels]
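The title and axis labels can also be set, and read back, through the axes object. The following is a sketch using the explicit interface, again assuming the Agg backend so that no window is needed. The label text matches the tutorial script.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

fig, ax = plt.subplots()
ax.scatter(X, Y)

# Axes.set() applies several properties in a single call.
ax.set(
    title="The relationship between temperature and iced coffee sales",
    xlabel="Number of iced coffee cups sold",
    ylabel="Fahrenheit",
)

# The getters return the text that is currently set.
print(ax.get_title())
print(ax.get_xlabel(), "/", ax.get_ylabel())
```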
Step 5 — Custom Charts
Each dataset we work with is unique, so it is important to be able to customize how the information is displayed. Remember, visualization is also an art, so feel free to get creative! matplotlib includes many customization options, such as different colors, point symbols, and sizes. Depending on our needs, we may also want to try different scales by using different ranges for our axes. We can override the default ranges by specifying a new range for each axis, as follows:
import matplotlib.pyplot as plt
X = [590,540,740,130,810,300,320,230,470,620,770,250]
Y = [32,36,39,52,61,72,77,75,68,57,48,48]
plt.scatter(X,Y)
plt.xlim(0,1000)
plt.ylim(0,100)
plt.title('The relationship between temperature and iced coffee sales')
plt.show()
...
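When called with no arguments, plt.xlim() and plt.ylim() report the current range instead of changing it. That makes it easy to see what matplotlib chose automatically before overriding it. The following is a small sketch, assuming the Agg backend so that it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

plt.figure()
plt.scatter(X, Y)

# With no arguments, xlim() and ylim() return the current (min, max) range.
auto_xlim = plt.xlim()
auto_ylim = plt.ylim()
print("automatic ranges:", auto_xlim, auto_ylim)

# Passing two values sets the range explicitly.
plt.xlim(0, 1000)
plt.ylim(0, 100)
print("explicit ranges:", plt.xlim(), plt.ylim())
```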
The dots in the original chart look a little small, and the blue may not be the color we want. Maybe we also want triangles instead of circles as the symbols for our points. If we want to change the color, size, or shape of the points, we have to make these changes in the initial plt.scatter() call. We will change the following parameters:
- s: the size of the points; the default value is 20 in older matplotlib releases (recent releases default to 36)
- c: a color, a sequence of values, or a sequence of colors; the default value is 'b' in older releases (recent releases take the color from the current color cycle, which starts with a blue)
- marker: the point symbol; the default value is 'o'
Possible markers include many different shapes, such as diamonds, hexagons, and stars. Color choices include blue, green, red, and magenta, and HTML hexadecimal strings can also be provided as colors. See matplotlib's documentation for a comprehensive list of possible markers and colors.
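Marker codes and color specifications can also be inspected from Python. The sketch below looks up a few marker codes in MarkerStyle.markers and converts several color specifications to hex strings with matplotlib.colors.to_hex. The particular codes and colors listed are arbitrary picks for illustration.

```python
from matplotlib.colors import is_color_like, to_hex
from matplotlib.markers import MarkerStyle

# MarkerStyle.markers maps each marker code to a short description.
for code in ["o", "^", "D", "h", "*"]:
    print(repr(code), "->", MarkerStyle.markers[code])

# Named colors, single-letter codes, and hex strings are all accepted.
for color in ["blue", "g", "red", "m", "#ff7f0e"]:
    print(color, "->", to_hex(color))

# is_color_like() reports whether a value is a valid color specification.
print(is_color_like("#ff7f0e"), is_color_like("not-a-color"))
```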
To make our chart easier to read, let's enlarge the points (s=60), change the color to red (c='red'), and change the symbol to a triangle (marker='^'). We will modify the plt.scatter() call:
plt.scatter(X, Y, s=60, c='red', marker='^')
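Because c also accepts a sequence of values, each point can be colored by a number rather than by one fixed color. The following sketch colors the points by temperature. The 'coolwarm' colormap and the colorbar are illustrative additions that are not part of the tutorial script, and the Agg backend is assumed so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

fig, ax = plt.subplots()

# Passing the temperatures as c maps each value to a color in the colormap.
points = ax.scatter(X, Y, s=60, c=Y, cmap="coolwarm", marker="^")

# The colorbar shows which color corresponds to which temperature.
fig.colorbar(points, ax=ax, label="Fahrenheit")

print(len(points.get_array()), "values mapped to colors")
```

Whether this reads better than a single color depends on the data. Here the color repeats what the Y axis already shows, so it serves mainly as a demonstration of the parameter.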
Before running the updated script, let's double-check that our code is correct. The updated script for the customized chart should look like this:
import matplotlib.pyplot as plt
X = [590,540,740,130,810,300,320,230,470,620,770,250]
Y = [32,36,39,52,61,72,77,75,68,57,48,48]
# Scatter plot
plt.scatter(X, Y, s=60, c='red', marker='^')
# Change the axis ranges
plt.xlim(0,1000)
plt.ylim(0,100)
# Add a title
plt.title('The relationship between temperature and iced coffee sales')
# Add x- and y-axis labels
plt.xlabel('Number of iced coffee cups sold')
plt.ylabel('Fahrenheit')
# Show the chart
plt.show()
Don't forget to save your script before proceeding to step 6.
Step 6 — Save the chart
Now that we have finished the code, let's run it to view our newly customized chart.
python
A window should now open to display our chart:
[Figure: Final scatter plot with title and X/Y labels, customized with larger, red, triangular points]
Next, save the chart by clicking the Save button (the disk icon located on the bottom toolbar). Keep in mind that the chart is saved as a static image (PNG by default), not as an interactive chart. Congratulations, you now have your own customized scatter plot!
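The chart can also be saved from code rather than through the toolbar, which is convenient when a script runs without a display. The following is a minimal sketch using Figure.savefig. The output file name and the temporary directory are hypothetical, and the Agg backend is assumed.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

X = [590, 540, 740, 130, 810, 300, 320, 230, 470, 620, 770, 250]
Y = [32, 36, 39, 52, 61, 72, 77, 75, 68, 57, 48, 48]

fig, ax = plt.subplots()
ax.scatter(X, Y, s=60, c="red", marker="^")
ax.set_xlim(0, 1000)
ax.set_ylim(0, 100)
ax.set_title("The relationship between temperature and iced coffee sales")
ax.set_xlabel("Number of iced coffee cups sold")
ax.set_ylabel("Fahrenheit")

# Hypothetical output location: a fresh temporary directory and file name.
output_path = os.path.join(tempfile.mkdtemp(), "iced_coffee_scatter.png")

# The file extension selects the image format; dpi controls the resolution.
fig.savefig(output_path, dpi=150)
print("saved to", output_path)
```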
Conclusion
In this tutorial, you learned how to plot data using matplotlib in Python. Now you can visualize your data and customize your charts.