Python natural language processing: introduction, installation and common operations of the SnowNLP module

1. Introduction to the SnowNLP module

SnowNLP is a Python library designed specifically for Chinese text. It provides word segmentation, part-of-speech tagging, sentiment analysis, text classification, conversion to pinyin, traditional-to-simplified Chinese conversion, keyword extraction, summary extraction, sentence splitting, TF-IDF computation and text similarity. Its main strength is processing Chinese text, sentiment analysis in particular.

SnowNLP was inspired by TextBlob, but unlike TextBlob it does not use NLTK: all of its algorithms are implemented from scratch, and it ships with some pre-trained dictionaries. It works on Unicode text, so input must be decoded to Unicode before use. In Python 3, str objects are already Unicode, so this only matters for bytes, for example data read from a file opened in binary mode.
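As a minimal sketch of the decoding step, the helper below turns bytes into str before the text is handed to SnowNLP. The name to_unicode is hypothetical (it is not part of SnowNLP), and the encoding must match how the data was actually written; UTF-8 and GBK are only examples.

```python
def to_unicode(data, encoding="utf-8"):
    """Return data as a str, decoding bytes with the given encoding."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    # Already a str (Unicode in Python 3): return it unchanged
    return data

raw = "中文文本".encode("utf-8")  # bytes, e.g. read from a file in binary mode
text = to_unicode(raw)
print(text)  # 中文文本

# GBK-encoded input needs the matching codec
print(to_unicode("中文".encode("gbk"), "gbk"))  # 中文
```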

2. SnowNLP installation

SnowNLP can be installed with pip. The steps are:

Open a terminal or command prompt.
Enter the following command to install SnowNLP:

pip install snownlp

If the download is slow or fails, you can use a PyPI mirror in mainland China, such as the Tsinghua University mirror:

pip install snownlp -i https://pypi.tuna.tsinghua.edu.cn/simple
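To confirm that the installation worked, a short check like the following can be run. It is a sketch that relies only on the standard library (importlib.util.find_spec and importlib.metadata.version, Python 3.8+); the function name snownlp_version is hypothetical.

```python
import importlib.util
from importlib import metadata

def snownlp_version():
    """Return the installed SnowNLP version string, or None if it is not installed."""
    if importlib.util.find_spec("snownlp") is None:
        return None
    try:
        return metadata.version("snownlp")
    except metadata.PackageNotFoundError:
        # Importable, but without package metadata (e.g. copied in by hand)
        return "unknown"

version = snownlp_version()
if version is None:
    print("SnowNLP is not installed; run: pip install snownlp")
else:
    print("SnowNLP version:", version)
```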

3. Common operations with code examples

The following examples show common SnowNLP operations and their output:

Word segmentation

from snownlp import SnowNLP

# The sample is a Chinese news sentence, shown here in English translation.
# SnowNLP is built for Chinese, so pass the original Chinese text in practice.
text = """China News Service, Beijing, December 29, 2023 (Reporter Liu Yuying) The "Guiding Opinions of Eight Departments Including the Ministry of Industry and Information Technology on Accelerating the Transformation and Upgrading of Traditional Manufacturing" issued by the Ministry of Industry and Information Technology on December 29 proposes that by 2027, the position and competitiveness of China's traditional manufacturing industry in the global industrial division of labor will be further consolidated and enhanced."""

s = SnowNLP(text)
print(s.words)  # list of segmented words

Output:

['China News Service', 'Beijing', 'December 29, 2023', 'Dispatch', '(', 'Reporter', 'Liu Yuying', ')', 'China', 'Ministry of Industry and Information Technology', 'December 29', 'Published', ..., 'and', 'competitive', 'further', 'consolidation', 'enhance', '.']
(Tokens are shown in English translation; the middle of the list is omitted.)

Note: Segmentation results may vary with the model version and training corpus.

Part-of-speech tagging

tags = [tag for word, tag in SnowNLP(text).tags]  # .tags yields (word, tag) pairs
print(tags)

Output:

The result is a list of part-of-speech tags, such as n (noun) and v (verb). The output is long, so it is not shown here.
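Because SnowNLP's tags property yields (word, tag) pairs, a common next step is to keep only the words with a given tag. The sketch below does this with a hypothetical helper, words_with_tag; the sample pairs and tag names are illustrative assumptions, not actual SnowNLP output, since the real tags depend on the bundled model.

```python
def words_with_tag(pairs, prefix):
    """Return the words whose POS tag starts with prefix (e.g. 'n' for nouns)."""
    return [word for word, tag in pairs if tag.startswith(prefix)]

# Hypothetical (word, tag) pairs in the shape SnowNLP's .tags yields;
# with SnowNLP installed you would pass SnowNLP(text).tags instead.
pairs = [("中国", "ns"), ("制造业", "n"), ("转型", "v"), ("升级", "v"), ("。", "w")]

print(words_with_tag(pairs, "n"))  # ['中国', '制造业']
print(words_with_tag(pairs, "v"))  # ['转型', '升级']
```

Matching on a prefix rather than an exact tag keeps sub-types such as a place-name tag ("ns" here) together with plain nouns.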

Sentiment Analysis

sentiment = SnowNLP(text).sentiments
print(sentiment)

if sentiment > 0.5:
    print('Positive sentiment')
else:
    print('Negative sentiment')

Output:

(sentiment score, for example: 0.95)
Positive sentiment

The sentiment score is a floating-point number between 0 (negative) and 1 (positive): the closer it is to 1, the more positive the text; the closer it is to 0, the more negative. The bundled model was trained mainly on product reviews, so scores for other kinds of text, such as news, may be less reliable.
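The single 0.5 cut in the code above forces every text into one of two classes, even when the score sits close to 0.5 and says little either way. One option is a neutral band, sketched below. The function label_sentiment and the 0.4/0.6 thresholds are hypothetical choices to tune on your own data, not SnowNLP defaults.

```python
def label_sentiment(score, low=0.4, high=0.6):
    """Map a score in [0, 1] (e.g. SnowNLP(text).sentiments) to a label with a neutral band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"

for score in (0.95, 0.52, 0.1):
    print(score, label_sentiment(score))
# 0.95 positive
# 0.52 neutral
# 0.1 negative
```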

Traditional-to-simplified Chinese conversion

simplified = SnowNLP(text).han  # converts traditional characters to simplified
print(simplified)

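Output:

The han property converts traditional Chinese characters to simplified ones; SnowNLP does not provide the reverse (simplified-to-traditional) direction. Text that is already written in simplified characters, such as the mainland news sample above, comes back essentially unchanged, which is why the conversion can appear not to take effect.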
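The SnowNLP README describes its traditional-to-simplified conversion as maximum matching over a trie. The sketch below illustrates the same forward-maximum-matching idea with a plain dict in place of the trie and a tiny hypothetical table; it is not SnowNLP's code or dictionary.

```python
def convert_fmm(text, table):
    """Forward maximum matching: at each position, replace the longest key found in table."""
    max_len = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in table:
                out.append(table[piece])
                i += size
                break
        else:
            # No entry matched: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)

# Tiny hypothetical table: single characters plus one phrase-level entry
table = {"國": "国", "語": "语", "電": "电", "腦": "脑", "軟體": "软件"}
print(convert_fmm("電腦軟體", table))  # 电脑软件
print(convert_fmm("中國語", table))    # 中国语
```

Trying the longest candidate first lets a phrase-level entry win over character-by-character replacement; a trie makes that lookup cheaper for large dictionaries.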

Keyword extraction

keywords = SnowNLP(text).keywords(limit=5)
print(keywords)

Output:

['Traditional manufacturing industry', 'Transformation and upgrading', 'Guiding opinions', 'Ministry of Industry and Information Technology', 'Competitiveness']

keywords returns a list of the highest-ranked keywords (SnowNLP ranks them with the TextRank algorithm); the limit parameter sets how many are returned.
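To show roughly how TextRank ranks keywords, here is a minimal sketch: words that appear near each other are linked in a graph, and PageRank-style updates score each word by the scores of its neighbors. It is an illustration, not SnowNLP's implementation; the window of 2, damping factor 0.85 and 50 iterations are assumptions, and a real pipeline would first remove stop words and punctuation.

```python
from collections import defaultdict

def textrank_keywords(words, limit=5, window=2, d=0.85, iterations=50):
    """Rank words with a minimal TextRank: co-occurrence graph + PageRank-style updates."""
    # Link each word to the words that follow it within the window (undirected)
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window + 1]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Each word receives a share of every neighbor's score
        scores = {
            w: (1 - d) + d * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:limit]

words = ["制造业", "转型", "升级", "制造业", "数字化", "制造业", "竞争力"]
print(textrank_keywords(words, limit=3))  # '制造业', linked to every other word, ranks first
```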

Summary generation

summary = SnowNLP(text).summary(3)
print(summary)

Output:

['The "Guiding Opinions of Eight Departments Including the Ministry of Industry and Information Technology on Accelerating the Transformation and Upgrading of Traditional Manufacturing" issued by the Ministry of Industry and Information Technology on December 29 proposes that by 2027, the position and competitiveness of China\'s traditional manufacturing industry in the global industrial division of labor will be further consolidated and enhanced.', 'The guiding opinions propose that by 2027, the level of high-end, intelligent, green and integrated development of traditional manufacturing will be significantly improved.', 'The penetration rate of digital R&D design tools and the CNC rate of key processes in industrial enterprises will exceed 90% and 70% respectively.']

summary returns a list of key sentences (SnowNLP also ranks these with TextRank); the argument, here 3, sets how many. The second and third sentences do not appear in the one-sentence sample assigned to text above, so this output appears to come from the full news article.
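Before ranking, the text has to be split into sentences (SnowNLP exposes its split as the sentences property), and the number of pieces caps how many sentences summary can return. The sketch below approximates that splitting step with a regular expression; the delimiter set (Chinese full-width punctuation plus line breaks) is an assumption and may differ from SnowNLP's.

```python
import re

# Assumed delimiters: full-width 。！？；， and line breaks
DELIMITERS = re.compile(r"[。！？；，\r\n]")

def split_sentences(doc):
    """Split doc at the delimiters, dropping empty, whitespace-only pieces."""
    return [s.strip() for s in DELIMITERS.split(doc) if s.strip()]

doc = "传统制造业转型升级。数字化研发设计工具普及率超过90%！\n竞争力进一步增强"
print(split_sentences(doc))
# ['传统制造业转型升级', '数字化研发设计工具普及率超过90%', '竞争力进一步增强']
```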

4. Summary

SnowNLP is a Python natural language processing library aimed at Chinese text. It provides word segmentation, part-of-speech tagging, sentiment analysis, traditional-to-simplified conversion, keyword extraction, summary extraction and more. After a single pip install, each of these tasks takes only a line or two of code.
