1. Introduction to SnowNLP module
SnowNLP is a Python library designed specifically for Chinese text. It provides word segmentation, part-of-speech tagging, sentiment analysis, traditional-to-simplified Chinese conversion, keyword extraction, summary extraction, and text similarity. Its main strength is its handling of Chinese text, sentiment analysis in particular.
SnowNLP was inspired by TextBlob, but unlike TextBlob it does not use NLTK: all algorithms are implemented in the library itself, and it ships with some pre-trained dictionaries. It works on unicode strings, so byte data must be decoded to unicode before it is passed in. In Python 3, str is already unicode, so this only matters for bytes read from files or the network.
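The decoding step mentioned above can be sketched as follows. This is a minimal illustration using only the standard library; the helper name to_unicode and the default encoding are assumptions for this example, not part of SnowNLP.

```python
def to_unicode(data, encoding="utf-8"):
    """Return a unicode str, decoding bytes with the given encoding.

    Hypothetical helper: the name and the utf-8 default are assumptions.
    """
    if isinstance(data, bytes):
        # Bytes read from a file or socket must be decoded first.
        return data.decode(encoding)
    # A Python 3 str is already unicode and is returned unchanged.
    return data


raw = "中文文本".encode("gbk")   # simulate bytes read from a GBK-encoded file
text = to_unicode(raw, "gbk")    # now a str that can be passed to SnowNLP
print(text)
```

The resulting str can then be passed directly to SnowNLP(text).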
2. SnowNLP installation
SnowNLP can be installed with pip. The steps are:
- Open a terminal or command prompt.
- Enter the following command to install SnowNLP:
pip install snownlp
If the download is slow or fails because of network problems, try a mirror in mainland China, such as the Tsinghua University PyPI mirror:
pip install snownlp -i https://pypi.tuna.tsinghua.edu.cn/simple
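After installing, it can be useful to confirm that Python can actually find the package. The following is a small sketch using the standard library; the helper name is_installed is an assumption for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when the module cannot be found on sys.path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("snownlp installed:", is_installed("snownlp"))
```

If this prints False, the pip command most likely ran against a different Python environment than the one running the script.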
3. Common operations with code examples
Here are code examples and their output for common SnowNLP operations:
- Word segmentation
from snownlp import SnowNLP

# SnowNLP expects Chinese input; the sample text is shown here in English translation.
text = (
    "China News Service, Beijing, December 29, 2023 (Reporter Liu Yuying) "
    "The \"Guiding Opinions of the Ministry of Industry and Information Technology "
    "and Eight Departments on Accelerating the Transformation and Upgrading of "
    "Traditional Manufacturing\" issued by the Ministry of Industry and Information "
    "Technology on December 29, proposed that by 2027, China's traditional "
    "manufacturing industry's position and competitiveness in the global industrial "
    "division of labor will be further consolidated and enhanced."
)
s = SnowNLP(text)
print(s.words)  # list of segmented words
Output result:
['China News Service', 'Beijing', 'December 29, 2023', 'Electronic', '(', 'Reporter', 'Liu Yuying', ')', 'China', 'Ministry of Industry and Information Technology', 'December 29', 'Published', ..., 'and', 'competitive', 'further', 'consolidation', 'enhance', '.']
Note: Word segmentation results may vary depending on the algorithm and the corpus the model was trained on.
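A segmented word list usually needs a little post-processing before it is useful. As a rough sketch, the snippet below counts word frequencies while skipping punctuation and whitespace tokens. The function names and the sample token list are hypothetical; in practice the list would come from SnowNLP(text).words.

```python
from collections import Counter
import unicodedata


def is_punct_or_space(token):
    """True if every character is whitespace or Unicode punctuation."""
    return all(
        ch.isspace() or unicodedata.category(ch).startswith("P")
        for ch in token
    )


def word_frequencies(words):
    """Count tokens, ignoring empty, whitespace-only and punctuation tokens."""
    return Counter(w for w in words if w and not is_punct_or_space(w))


# Hypothetical segmentation result, standing in for SnowNLP(text).words
sample_words = ["中国", "，", "制造业", "中国", "。", " ", ""]
print(word_frequencies(sample_words).most_common(2))
```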
- Part-of-speech tagging
# .tags yields (word, tag) pairs; keep only the tags
tags = [tag for word, tag in SnowNLP(text).tags]
print(tags)
Output result:
The result is a list of part-of-speech tags, such as n (noun) and v (verb). The output is long, so it is not shown here.
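Because the full tag list is long, grouping words by tag gives a more readable overview. The sketch below assumes the input is an iterable of (word, tag) pairs, which is what SnowNLP(text).tags yields; the helper name group_by_tag and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def group_by_tag(pairs):
    """Group words by their part-of-speech tag, preserving word order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


# Hypothetical (word, tag) pairs standing in for SnowNLP(text).tags
sample_pairs = [("中国", "ns"), ("发布", "v"), ("意见", "n"), ("提出", "v")]
for tag, words in group_by_tag(sample_pairs).items():
    print(tag, words)
```

With the library installed, the same helper can be called as group_by_tag(SnowNLP(text).tags).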
- Sentiment analysis
sentiment = SnowNLP(text).sentiments  # float between 0 and 1
print(sentiment)
if sentiment > 0.5:
    print('Positive emotion')
else:
    print('Negative emotion')
Output result:
(Sentiment score, for example: 0.95)
Positive emotion
The result of sentiment analysis is a floating point number between 0 (negative) and 1 (positive). The closer the score is to 1, the more positive the emotional tendency of the text; the closer the score is to 0, the more negative the emotional tendency of the text.
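A single 0.5 cutoff treats a score of 0.51 the same as 0.99. One possible refinement, sketched below, adds a neutral band around the midpoint. The function name label_sentiment and the 0.4 and 0.6 thresholds are assumptions chosen for illustration, not values recommended by SnowNLP.

```python
def label_sentiment(score, low=0.4, high=0.6):
    """Map a score in [0, 1] to 'negative', 'neutral' or 'positive'.

    The thresholds are illustrative assumptions and should be tuned on real data.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"


for s in (0.95, 0.5, 0.1):
    print(s, label_sentiment(s))
```

With the library installed, the score would come from SnowNLP(text).sentiments.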
- Text conversion (traditional to simplified Chinese)
# .han returns the text with traditional characters converted to simplified
simplified = SnowNLP(text).han
print(simplified)
Output result:
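The han property converts traditional Chinese to simplified Chinese, and the result may vary with the SnowNLP version and its dictionary. If the input is already simplified, or contains characters missing from the dictionary, the output is the same as the input, so the conversion appears not to take effect.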
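To make the "no effect" case concrete, here is a toy character-level converter. It is not SnowNLP's implementation: the four-entry mapping table and the function names are assumptions for illustration, and a real converter needs a full table.

```python
# Tiny illustrative mapping; a real conversion table is far larger.
TRAD_TO_SIMP = str.maketrans({"體": "体", "國": "国", "學": "学", "語": "语"})


def to_simplified(text):
    """Replace the traditional characters found in the toy table."""
    return text.translate(TRAD_TO_SIMP)


def conversion_took_effect(original, converted):
    """True if the conversion changed at least one character."""
    return original != converted


for sample in ("中國語", "中国语"):
    converted = to_simplified(sample)
    print(sample, "->", converted, conversion_took_effect(sample, converted))
```

The second sample is already simplified, so the converted text equals the input.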
- Keyword extraction
keywords = SnowNLP(text).keywords(limit=5)
print(keywords)
Output result:
['Traditional manufacturing industry', 'Transformation and upgrading', 'Guiding opinions', 'Ministry of Industry and Information Technology', 'Competitiveness']
The result of keyword extraction is a list of keywords; the limit parameter sets the maximum number returned.
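SnowNLP's documentation describes its keyword extraction as TextRank-based. To give a feel for the idea, the sketch below ranks words by running a PageRank-style iteration over a word co-occurrence graph. It is a simplified stand-alone illustration, not the library's code; the function name, window size, damping factor, and iteration count are all assumptions.

```python
from collections import defaultdict


def textrank_keywords(words, limit=5, window=2, damping=0.85, iterations=50):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Link each word to the different words within `window` positions after it.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    # Every word starts with the same score, then scores are refined iteratively.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbors:
            # Each neighbor shares its score equally among its own links.
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:limit]


# Hypothetical segmented words; "制造业" co-occurs with every other word
sample = ["制造业", "转型", "制造业", "升级", "制造业", "竞争力"]
print(textrank_keywords(sample, limit=2, window=1))
```

Words that co-occur with many different words collect score from all of them, which is why they rise to the top.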
- Summary generation
summary = SnowNLP(text).summary(3)
print(summary)
Output result:
['The "Guiding Opinions of the Ministry of Industry and Information Technology and Eight Departments on Accelerating the Transformation and Upgrading of Traditional Manufacturing Industry" issued by the Ministry of Industry and Information Technology on December 29 proposes that by 2027, China\'s traditional manufacturing industry\'s position and competitiveness in the global industrial division of labor will be further consolidated and enhanced.',
 'The guiding opinions propose that by 2027, the level of high-end, intelligent, green and integrated development of traditional manufacturing will be significantly improved.',
 'The penetration rate of digital R&D design tools and the CNC rate of key processes in industrial enterprises exceed 90% and 70% respectively.']
The result of summary() is a list of key sentences from the text; the argument (3 here) sets the maximum number of sentences returned.
4. Summary
SnowNLP is a Python natural language processing library built for Chinese text. It provides word segmentation, part-of-speech tagging, sentiment analysis, traditional-to-simplified conversion, keyword extraction, and summary extraction. After a one-line pip install, each of these tasks takes only a few lines of code.