
Example of Generating Word Clouds in Python with wordcloud

wordcloud is a Python library that renders text as a picture in which each word's size reflects how often it appears. From the generated image, you can see at a glance what a particular article is about.
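Before diving into the full workflow, here is a minimal sketch of the library in action (English text, default settings; the word counts and file name are illustrative only):

from wordcloud import WordCloud

# Minimal end-to-end example: build a cloud from raw text and save it to disk
text = "harry potter wizard magic hogwarts wand spell potter harry magic"
WordCloud().generate(text).to_file("demo.png")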

Let's start with a finished word cloud (using the Harry Potter novels as the example):

Before generating a word cloud, there is some preparation to do.

1. Install the jieba word segmentation library

pip install jieba

Python has quite a few word segmentation modules, and they all work in much the same way; jieba, which we just installed, is the most commonly used.

Let me briefly describe how jieba's segmentation is used.

jieba offers three segmentation modes:

(1) Full mode: scans every word in the sentence; fast, but cannot resolve ambiguity.

(2) Precise mode: cuts the sentence as accurately as possible; suitable for text analysis.

(3) Search engine mode: on top of precise mode, further splits long words to improve recall; suitable for search engine indexing.

Here's a simple example showing the difference between the three modes:

import jieba

# (jieba is designed for Chinese text; the original example sentence is the
# Chinese equivalent of the string below)
text = "Harry Potter is an excellent piece of literature."

# Full mode: scans all the words in a sentence; fast, but doesn't resolve ambiguities
seg_list = jieba.cut(text, cut_all=True)
print(u"[Full mode]:", "/ ".join(seg_list))

# Precise mode: cuts sentences most accurately, suitable for text analysis
seg_list = jieba.cut(text, cut_all=False)
print(u"[Precise mode]:", "/ ".join(seg_list))

# The default is precise mode
seg_list = jieba.cut(text)
print(u"[Default mode]:", "/ ".join(seg_list))

# Search engine mode: further splits long words on top of precise mode to
# improve recall; suitable for search engines
seg_list = jieba.cut_for_search(text)
print(u"[Search engine mode]:", "/ ".join(seg_list))

Run it to see how each mode splits the sentence.

Looking at the output of all three modes, none of them segments the proper noun "Harry Potter" as a single word. That's because jieba's built-in dictionary has no entry for it, so we need to add a custom dictionary by hand.

Add a custom dictionary: pick a convenient location (the path below is where jieba is installed on my machine), create a new text file with a .txt extension, and enter the words you want to add, one per line (each line is the word, optionally followed by a frequency and a part-of-speech tag, separated by spaces). Save and exit.
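As a sketch, the dictionary file might contain a single entry for the Chinese name "Harry Potter", with an optional frequency and part-of-speech tag:

哈利波特 5 nr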

Then add the path of the custom dictionary to the code above and run it:

# "userdict.txt" is a placeholder; use the name of the .txt file you created
jieba.load_userdict("/home/jmhao/anaconda3/lib/python3.7/site-packages/jieba/userdict.txt")

As you can see, "Harry Potter" is now recognized as a single word.
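If you only have one or two entries to add, jieba can also register words at runtime instead of loading a dictionary file (a minimal sketch using jieba.add_word):

import jieba

# Register a single custom word at runtime (equivalent to a one-line user dictionary)
jieba.add_word("哈利波特")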

jieba can also filter stopwords out of its output:

# Define the stopwords to filter out
stopwords = {}.fromkeys(['excellent', 'literature'])

# Segment again, dropping the stopwords
seg_list = jieba.cut(text)
final = ''
for seg in seg_list:
    if seg not in stopwords:
        final += seg
seg_list_new = jieba.cut(final)
print(u"[After filtering]:", "/ ".join(seg_list_new))

You can see that "excellent" and "literature" no longer appear in the output.

jieba supports many more advanced operations; see the official documentation for details, which I won't go into here.
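As one example of those more advanced operations, jieba ships with a TF-IDF keyword extractor (a minimal sketch; topK is the number of keywords to return):

import jieba.analyse

text = "Harry Potter is an excellent piece of literature."
# Extract the 5 highest-weighted keywords by TF-IDF
keywords = jieba.analyse.extract_tags(text, topK=5)
print(keywords)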

Now let's get started on the word cloud itself.

First, install the module. My environment is Anaconda, which already bundles many common packages, so only wordcloud needs to be installed here; if you are not using Anaconda, you will also need to install the numpy and PIL (Pillow) modules.

pip install wordcloud

Next we need an article, which we'll break into space-separated words with jieba:

# Segmentation module
def cut(text):
    # Choose the segmentation mode (full mode here)
    word_list = jieba.cut(text, cut_all=True)
    # Join the segmented words with spaces
    result = " ".join(word_list)
    # Return the segmented result
    return result

Here I created a text file in the current folder ("text.txt" below is a placeholder for whatever you name it) and copied a chapter of the novel into it as the source text for the word cloud.

Open the file in code and read in the novel's contents:

# Import the text file and segment it
with open("text.txt") as fp:
    text = fp.read()
    # Segment the Chinese document we just read
    text = cut(text)

Find an image with a white background online and download it to the current folder to use as the word cloud's shape (if no image is specified, a rectangular word cloud is generated by default).

# Set the shape of the word cloud; if a mask is set, the generated cloud matches
# the image, and any width and height set later are ignored by default.
mask = np.array(image.open("cloud.jpg"))  # "cloud.jpg" is a placeholder for your mask image

Next, you can set the word cloud's colors, outline, and other parameters to taste. The most commonly used parameters are listed below, and a combined example follows the list:

font_path = "path"          Font used in the cloud; to output Chinese, point this at a Chinese font
width = n                   Canvas width (default 400 pixels)
height = n                  Canvas height (default 200 pixels)
scale = n                   Scale the canvas up or down
min_font_size = n           Minimum font size
max_font_size = n           Maximum font size
stopwords = set             Words to exclude from the cloud
background_color = 'color'  Background color
relative_scaling = n        How strongly font size tracks word frequency
contour_width = n           Mask outline width
contour_color = 'color'     Mask outline color
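As a sketch of how these fit together (the values here are illustrative, not prescriptive):

from wordcloud import WordCloud

# Illustrative parameter values only; adjust to taste
wc = WordCloud(
    width=800,               # canvas width in pixels
    height=600,              # canvas height in pixels
    background_color='white',
    max_font_size=120,
    min_font_size=4,
    contour_width=2,         # draw a 2-pixel outline around the mask shape
    contour_color='steelblue',
    stopwords={'excellent', 'literature'},  # words to leave out
)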

Full Code

# Import the word cloud library
from wordcloud import WordCloud
# Import the image processing library
import PIL.Image as image
# Import the data processing library
import numpy as np
# Import the jieba segmentation library
import jieba

# Segmentation module
def cut(text):
    # Choose the segmentation mode (full mode here)
    word_list = jieba.cut(text, cut_all=True)
    # Join the segmented words with spaces
    result = " ".join(word_list)
    return result

# Import the text file, segment it, and create the word cloud
# ("text.txt" and "cloud.jpg" are placeholder names for your own files)
with open("text.txt") as fp:
    text = fp.read()
    # Segment the Chinese document we just read
    text = cut(text)
    # Set the word cloud shape
    mask = np.array(image.open("cloud.jpg"))
    # Customize the word cloud
    wordcloud = WordCloud(
        # Mask layer: words are drawn everywhere except the white background
        # (the previously set width and height are ignored)
        mask=mask,
        # The default background is black; change it to white
        background_color='#FFFFFF',
        # Scale the canvas up or down (the value was omitted in the original; 1 is the library default)
        scale=1,
        # To render Chinese, font_path must point at a Chinese font
        font_path="/usr/share/fonts/bb5828/wave-by-wave-art-song.otf"
    ).generate(text)
    # Get the rendered image object
    image_produce = wordcloud.to_image()
    # Save the picture
    wordcloud.to_file("new_wordcloud.jpg")
    # Display the image
    image_produce.show()

Note: if you want the word cloud to take on the image's shape, the image you find must have a white background; otherwise, use Photoshop to cut out the subject and place it on a white background, or the generated word cloud will just be a rectangle.
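If your image has a transparent background instead of a white one, a quick alternative to Photoshop is to flatten it onto white with PIL (a hedged sketch; "cloud.png" and "cloud_white.jpg" are hypothetical file names):

from PIL import Image

# Composite a transparent PNG onto an opaque white background,
# then save it as a JPEG usable as a word cloud mask
img = Image.open("cloud.png").convert("RGBA")
white = Image.new("RGBA", img.size, (255, 255, 255, 255))
Image.alpha_composite(white, img).convert("RGB").save("cloud_white.jpg")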

The original mask image I used:

The generated word cloud:


That's all for this article on generating word clouds with Python's wordcloud library. For more on the topic, please search my earlier posts or continue browsing the related articles below. I hope you'll continue to support me!