Updated on 2025-04-14

Detailed tutorial on drawing word cloud diagrams using Python

Introduction

A word cloud is a data visualization technique that shows which words occur most often in a body of text. Words with higher frequency are drawn in larger fonts and words with lower frequency in smaller fonts, so the key terms of the text stand out at a glance.

In this tutorial, we use Python's wordcloud library, together with matplotlib and jieba, to show how to generate word clouds from text data.
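To make the frequency-to-size idea concrete, here is a minimal sketch that uses only the standard library. The helper name `font_sizes` and the 10 to 60 point range are illustrative assumptions, and the linear scaling is a simplification, not the algorithm the wordcloud library uses internally.

```python
from collections import Counter
import re


def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size proportional to its frequency.

    Hypothetical helper for illustration only; the size range and the
    linear scaling are assumptions, not what wordcloud does internally.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


sizes = font_sizes("python is great and python is popular")
# "python" and "is" occur twice, so they get the largest size;
# words that occur once land halfway between min_size and max_size.
print(sizes["python"], sizes["great"])  # 60.0 35.0
```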

1. Install the required libraries

Before you start, you need to make sure that the following Python libraries are installed:

pip install wordcloud matplotlib jieba numpy pillow
  • wordcloud: Used to generate word cloud maps.
  • matplotlib: Used to display the generated word cloud map.
  • jieba: Used for Chinese word segmentation.
  • numpy: Used to process array data.
  • pillow: Used for image processing, such as loading the mask image for a shaped word cloud.

2. Basic word cloud map generation

We start with simple English text and show how to generate word cloud maps.

2.1 Basic word cloud generation

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# 1. Prepare text data
text = "Python is a great programming language. Python is widely used for data science, web development, and automation."

# 2. Create a word cloud object
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

# 3. Show the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")  # Do not display the coordinate axes
plt.show()

Explanation:

  • WordCloud is the class used to generate word clouds. width and height set the size of the image, and background_color="white" sets the background color to white.
  • The .generate(text) method builds the word cloud from the text.
  • imshow() displays the generated word cloud; interpolation='bilinear' makes the image look smoother.

2.2 Customize the appearance of word cloud map

You can customize the appearance of the word cloud map according to your needs, such as setting different colors, fonts, etc.

Example: Setting custom colors

wordcloud = WordCloud(width=800, height=400, background_color="black", colormap="coolwarm").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

  • colormap="coolwarm": Sets the color map. You can choose other color schemes, such as coolwarm, inferno, or plasma.
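Besides a named colormap, WordCloud also accepts a `color_func` callable that returns a color for each word. The following is a minimal sketch; the function name `grey_by_size` and the lightness formula are illustrative assumptions, and the commented-out usage lines assume a `text` variable like the one in the basic example.

```python
def grey_by_size(word, font_size, position=None, orientation=None,
                 random_state=None, **kwargs):
    """Return a grey tone that gets darker as the font size grows.

    Hypothetical color function for illustration; wordcloud passes
    font_size, position, orientation, etc. as keyword arguments.
    """
    # Clamp lightness to 20-80% so words stay readable on a white background.
    lightness = max(20, min(80, 100 - int(font_size)))
    return f"hsl(0, 0%, {lightness}%)"


print(grey_by_size("Python", font_size=60))  # hsl(0, 0%, 40%)
print(grey_by_size("Ruby", font_size=10))    # hsl(0, 0%, 80%)

# Usage sketch (assumes `text` is defined):
# from wordcloud import WordCloud
# wordcloud = WordCloud(background_color="white", color_func=grey_by_size).generate(text)
```

When `color_func` is given, it takes the place of the `colormap` setting.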

Example: Setting custom fonts

wordcloud = WordCloud(font_path="/path/to/your/", width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

  • font_path sets a custom font, which is especially useful when generating Chinese word clouds.

3. Generation of Chinese word cloud map

The main difficulty in generating a Chinese word cloud is word segmentation, because Chinese text has no spaces between words. We can use the jieba library to split Chinese text into words and then generate the word cloud.

3.1 Chinese word segmentation and word cloud generation

import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# 1. Prepare Chinese text (the sample is shown in English here; use your own Chinese text)
text = "Python is a very powerful programming language. Python is widely used in fields such as data science, machine learning and artificial intelligence."

# 2. Use jieba for Chinese word segmentation
seg_list = jieba.cut(text)
word_list = " ".join(seg_list)  # Join the segmentation result into a single string

# 3. Create a word cloud object and generate a word cloud
# font_path must point to a font file that supports Chinese
wordcloud = WordCloud(font_path="", width=800, height=400, background_color="white").generate(word_list)

# 4. Show the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Explanation:

  • jieba.cut(text) segments the Chinese text into words.
  • " ".join(seg_list) joins the segmentation result into a single space-separated string, which is then used to build the word cloud.
  • font_path specifies the path to a font that supports Chinese (the example uses a Microsoft Hei (black) typeface; you can choose any other font file you like).
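A common refinement at the joining step is to drop very short tokens before building the string. The sketch below is self-contained: the `tokens` list is hand-written to stand in for what `jieba.cut()` might yield, not actual jieba output, and the one-character cutoff is an assumption rather than part of the original example.

```python
# Hand-written stand-in for a jieba segmentation result (illustrative only).
tokens = ["Python", "是", "一种", "强大", "的", "编程", "语言"]

# Keep tokens longer than one character; single characters such as
# "是" and "的" are mostly function words and add little to a word cloud.
kept = [tok for tok in tokens if len(tok.strip()) > 1]

# Join with spaces so that each token is seen as a separate word.
word_list = " ".join(kept)
print(word_list)  # Python 一种 强大 编程 语言
```

The resulting `word_list` string can then be passed to `.generate()` in the same way as in the example above.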

4. Use a mask to generate a word cloud with a custom shape

A mask constrains the shape of the word cloud with an image, so that the generated word cloud follows the outline of the picture.

4.1 Using an image mask

import numpy as np
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image

# 1. Prepare text data
text = "Python is a versatile programming language. It is popular among data scientists, developers, and AI researchers."

# 2. Read the mask image and convert it to a NumPy array
mask_image = np.array(Image.open("cloud_shape.png"))

# 3. Create a word cloud object and generate a word cloud
wordcloud = WordCloud(width=800, height=400, background_color="white", mask=mask_image).generate(text)

# 4. Show the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

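Explanation:

  • mask_image = np.array(Image.open("cloud_shape.png")): We load an image (such as a heart or a cloud) and convert it into a NumPy array to use as the mask.
  • The mask parameter specifies the shape of the word cloud. When a mask is given, width and height are ignored and the size of the mask image is used instead.

Note: wordcloud treats pure white pixels (value 255) in the mask as excluded and draws words everywhere else, so the mask image should be black and white, with a white background and the shape in black (black marks the word cloud area, white the empty area). Transparency is not taken into account, so an image with a transparent background should be converted to a white background first.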
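If no suitable image is at hand, a mask can also be built directly as a NumPy array. The sketch below creates a circular mask in which white (255) marks the excluded area and black (0) marks the drawing area; the 300-pixel size and 130-pixel radius are arbitrary choices, and the commented-out usage lines assume a `text` variable.

```python
import numpy as np

size, radius = 300, 130
center = size // 2

# Open grids of row and column indices that broadcast to a size x size array.
rows, cols = np.ogrid[:size, :size]

# True for pixels outside the circle.
outside = (rows - center) ** 2 + (cols - center) ** 2 > radius ** 2

# 255 (white) where words must not be drawn, 0 (black) inside the circle.
mask = (255 * outside).astype(np.uint8)

print(mask.shape, mask[center, center], mask[0, 0])  # (300, 300) 0 255

# Usage sketch (assumes `text` is defined):
# from wordcloud import WordCloud
# wordcloud = WordCloud(background_color="white", mask=mask).generate(text)
```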

5. Adjust word frequency and generate custom word cloud

You can control the size of certain words in the word cloud diagram by manually adjusting the frequency of words.

5.1 Setting the frequency dictionary

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# 1. Create a word frequency dictionary
word_frequencies = {
    "Python": 100,
    "Java": 50,
    "C++": 30,
    "JavaScript": 70,
    "Ruby": 10
}

# 2. Create a word cloud object from the frequencies
wordcloud = WordCloud(width=800, height=400, background_color="white").generate_from_frequencies(word_frequencies)

# 3. Show the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Explanation:

  • With the generate_from_frequencies() method, we can generate a word cloud from a frequency dictionary, where the keys are words and the values are the corresponding frequencies.
  • The word cloud sizes each word according to the frequency given in the dictionary, so words with higher frequency are displayed larger.
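In practice the frequency dictionary is often computed from data and then adjusted by hand. A minimal sketch with `collections.Counter` follows; the token list and the boost factor are made up for illustration.

```python
from collections import Counter

# Illustrative tokens; in a real project these would come from your text.
tokens = ["Python", "Java", "Python", "Ruby", "Python", "Java", "C++"]

# Counter is a dict subclass, so it has the word -> frequency mapping
# shape that generate_from_frequencies() expects.
word_frequencies = Counter(tokens)

# Manually emphasize one word by multiplying its weight (arbitrary factor).
word_frequencies["Ruby"] *= 5

print(word_frequencies.most_common(2))  # [('Ruby', 5), ('Python', 3)]

# Usage sketch:
# from wordcloud import WordCloud
# wordcloud = WordCloud(background_color="white").generate_from_frequencies(word_frequencies)
```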

6. Save the word cloud image as a file

The generated word cloud map can not only be displayed on the screen, but can also be saved as an image file (such as PNG or JPG format) for subsequent use.

6.1 Save word cloud map

wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

# Save as a file
wordcloud.to_file("wordcloud_output.png")

Explanation:

  • The to_file() method saves the generated word cloud as a file. You can specify the output path and format (such as PNG or JPG).

7. Summary

In this article, we covered in detail how to generate word clouds with Python and the wordcloud library. We learned the following:

  1. Basic word cloud generation: Generate word clouds from simple text.
  2. Chinese word cloud generation: Use the jieba library to segment Chinese text into words and generate a word cloud.
  3. Custom-shaped word clouds with a mask: Create word clouds with specific shapes through an image mask.
  4. Custom word cloud: Generate a custom word cloud based on a word frequency dictionary.
  5. Save the word cloud: Save the generated word cloud as an image file.

A word cloud is a useful visualization tool that helps you surface the most representative vocabulary in large amounts of text. In practice, word clouds are widely used to analyze text such as social media posts and user comments.
