wordcloud is a Python extension library that renders text as a picture of words; from the generated word cloud image we can see, at a glance, the gist of a given article.
Let's start with a finished word cloud (using the Harry Potter novels as an example):
Before generating a word cloud, a few preparations are needed.
1. Install the jieba segmentation library
pip install jieba
Python has quite a few segmentation modules, and they all work in much the same way; jieba, the one we just installed, is the most commonly used.
Let me briefly describe how jieba segmentation is used.
jieba offers three segmentation modes:
(1) Full mode: scans out every word in the sentence; fast, but cannot resolve ambiguity.
(2) Precise mode: cuts the sentence as accurately as possible; suitable for text analysis.
(3) Search engine mode: on top of precise mode, long words are cut again to improve recall; suitable for search engine indexing.
Here's a simple example showing the difference between the three segmentation modes:
import jieba

text = "Harry Potter is an excellent piece of literature."

# Full mode: scans out every word in the sentence; fast, but cannot resolve ambiguity
seg_list = jieba.cut(text, cut_all=True)
print(u"[Full mode]:", "/ ".join(seg_list))

# Precise mode: cuts the sentence as accurately as possible; suitable for text analysis
seg_list = jieba.cut(text, cut_all=False)
print(u"[Precise mode]:", "/ ".join(seg_list))

# The default is precise mode
seg_list = jieba.cut(text)
print(u"[Default mode]:", "/ ".join(seg_list))

# Search engine mode: on top of precise mode, long words are cut again
# to improve recall; suitable for search engine indexing
seg_list = jieba.cut_for_search(text)
print(u"[Search engine mode]:", "/ ".join(seg_list))
The segmentation results look like this:
As you can see, none of the three modes segments the proper noun "Harry Potter" correctly. That is because this noun is not in jieba's dictionary, so we need to add a custom dictionary manually.
Add a custom dictionary: pick a convenient location (the path below is where my copy of jieba is installed), create a new text file (with a .txt extension), enter the words you want to add (pay attention to the input format, shown below), then save and exit.
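For reference, jieba's custom dictionary format is one word per line, optionally followed by a frequency and a part-of-speech tag, all separated by spaces; the entries below are illustrative examples (哈利波特 is "Harry Potter" in Chinese):

哈利波特 5 nz
霍格沃茨 nz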
Add the path to the custom dictionary in the code above, then run it again:
# Load the custom dictionary created above (append your .txt file's name to this path)
jieba.load_userdict("/home/jmhao/anaconda3/lib/python3.7/site-packages/jieba/")
As you can see, the word "Harry Potter" is now recognized as a whole.
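Incidentally, if you only need one or two extra words, jieba's add_word lets you extend the dictionary at runtime without creating a dictionary file (a minimal sketch; the word here is just an example):

import jieba

# Add a single proper noun to jieba's dictionary at runtime
jieba.add_word("哈利波特")  # "Harry Potter"; a frequency and POS tag can optionally be passed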
jieba can also filter stop words out of its output:
# Words to filter out
stopwords = {}.fromkeys(['excellent', 'literature'])

# Segment the text, dropping any stop words
seg_list = jieba.cut(text)
final = ''
for seg in seg_list:
    if seg not in stopwords:
        final += seg

# Segment the filtered text again
seg_list_new = jieba.cut(final)
print(u"[After filtering]:", "/ ".join(seg_list_new))
You can see that the words "excellent" and "literature" are not in the output.
jieba supports many more advanced operations; see the official documentation for details, which I will not cover here.
Now let's officially start on the word cloud.
First, install the module. My environment is Anaconda, which already bundles many common packages, so I only need to install wordcloud. If your environment is not Anaconda, you also need to install the numpy and PIL modules.
pip install wordcloud
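If you are not using Anaconda, numpy and Pillow (the maintained package that provides the PIL module) can be installed the same way:

pip install numpy pillow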
Then we need to find an article and use jieba to split it into space-separated words:
# Segmentation function
def cut(text):
    # Segment in full mode
    word_list = jieba.cut(text, cut_all=True)
    # Put spaces between the individual words, as wordcloud expects
    result = " ".join(word_list)
    # Return the segmented result
    return result
Here I created a text file in the current folder (the code below calls it "text.txt") and copied a chapter of the novel into it as the source text for the word cloud.
Open and read the novel's contents in code:
# Import the text file, read it, and segment it
# ("text.txt" is a placeholder; use your own file's name)
with open("text.txt") as fp:
    text = fp.read()
# Segment the Chinese document that was read in
text = cut(text)
Find an image with a white background on the Internet and download it to the current folder; it will serve as the shape of the word cloud (if no image is specified, a rectangular word cloud is generated by default).
# Set the word cloud's shape; when a mask image is given, the generated cloud
# follows the picture and any width/height set later are ignored
# ("background.jpg" is a placeholder; use your own image's name)
mask = np.array(image.open("background.jpg"))
Next, you can set the word cloud's color, outline, and other parameters to taste. The commonly used parameters are listed below.
font_path = "path" | Font used in the word cloud; to output Chinese, point this at a Chinese font
width = n | Canvas width, default 400 pixels
height = n | Canvas height, default 200 pixels
scale = n | Factor by which to scale the canvas up or down
min_font_size = n | Smallest font size used
max_font_size = n | Largest font size used
stopwords = {'word', ...} | Words to exclude from the cloud
background_color = 'color' | Background panel color
relative_scaling = n | How strongly font size tracks word frequency (0 to 1)
contour_width = n | Width of the outline around the mask shape
contour_color = 'color' | Color of the outline
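As a quick illustration of the options above, here is a minimal sketch of a WordCloud built with several of them; every value below is an arbitrary example, not a recommendation:

from wordcloud import WordCloud

# A sketch combining the parameters listed above; all values are arbitrary examples
wc = WordCloud(
    font_path=None,              # set to a font file's path to render Chinese
    width=800,                   # canvas width in pixels
    height=600,                  # canvas height in pixels
    scale=2,                     # render at twice the canvas size
    min_font_size=10,            # smallest font size used
    max_font_size=120,           # largest font size used
    stopwords={'the', 'of'},     # words to exclude from the cloud
    background_color='#FFFFFF',  # background panel color
    relative_scaling=0.5,        # how strongly size tracks frequency (0 to 1)
    contour_width=2,             # outline width (only visible when a mask is set)
    contour_color='steelblue',   # outline color
)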
Full Code
# Import the word cloud library
from wordcloud import WordCloud
# Import the image processing library
import PIL.Image as image
# Import the numerical processing library
import numpy as np
# Import the jieba segmentation library
import jieba

# Segmentation function
def cut(text):
    # Segment in full mode
    word_list = jieba.cut(text, cut_all=True)
    # Put spaces between the individual words, as wordcloud expects
    result = " ".join(word_list)
    return result

# Import the text file, read it, and segment it
# ("text.txt" is a placeholder; use your own file's name)
with open("text.txt") as fp:
    text = fp.read()
# Segment the Chinese document that was read in
text = cut(text)

# Set the word cloud's shape ("background.jpg" is a placeholder)
mask = np.array(image.open("background.jpg"))

# Customize the word cloud
wordcloud = WordCloud(
    # Mask layer: everything except the white background is drawn on
    # (the width and height set earlier become invalid)
    mask=mask,
    # The background is black by default; change it to white
    background_color='#FFFFFF',
    # Scale the canvas up or down (the original value was lost; 1 keeps the original size)
    scale=1,
    # To render Chinese text, the path to a Chinese font must be given
    font_path="/usr/share/fonts/bb5828/wave-by-wave-art-song.otf"
).generate(text)

# Get the rendered image object
image_produce = wordcloud.to_image()
# Save the picture
wordcloud.to_file("new_wordcloud.jpg")
# Display the image
image_produce.show()
Note: to generate a picture-shaped word cloud, the image you use must have a white background (use Photoshop to cut out the subject and fill the background with white if necessary); otherwise the generated word cloud will be a plain rectangle.
The original image I used as the mask:
The generated word cloud:
That concludes this article's example of generating word clouds with Python's wordcloud library. For more on wordcloud, see my earlier posts or the related articles that follow; I hope you find them useful!