This article describes how to count word frequencies in an English article using Python. It is shared here for your reference.
Application introduction:
Counting the word frequencies of English text is a very common requirement. This article implements it in Python.
Idea analysis:
1. Put every word of the English article into a list and record the length of the list;
2. Iterate through the list, count how many times each word appears, and store the counts in a dictionary;
3. Divide each count by the list length from step 1 to get the frequency of each word, and store the results in a frequency dictionary;
4. Sort the dictionary by the value of each key-value pair and output the results. Because sorted() returns a list of (word, frequency) tuples, you can also slice that list to output only the several words with the largest or smallest frequencies.
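The four steps above can be sketched on a short in-memory string before applying them to a whole file. This is only an illustration under simple assumptions: the function name word_frequencies and the sample sentence are hypothetical, and the text is split on whitespace without any punctuation handling.

```python
def word_frequencies(text):
    """Return (word, frequency) tuples sorted by ascending frequency."""
    words = text.split()            # step 1: build the word list
    total = len(words)              # step 1: record its length
    counts = {}
    for word in words:              # step 2: count each word
        counts[word] = counts.get(word, 0) + 1
    # step 3: turn counts into frequencies
    freqs = {word: count / total for word, count in counts.items()}
    # step 4: sort by value; the result is a list of tuples
    return sorted(freqs.items(), key=lambda pair: pair[1])


# Hypothetical sample text, used only for illustration.
sample = "the cat saw the dog and the dog saw the cat"
ranked = word_frequencies(sample)
print(ranked[-2:])  # slice: the two most frequent words
print(ranked[:1])   # slice: the least frequent word
```

Because the sort is ascending, the most frequent words sit at the end of the list, which is why a negative slice is used to pick them out.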
Code implementation:
# The text file is up to you; replace the name with your own file.
fin = open('The_Magic_Skin _Honore_de_Balzac.txt')
lines = fin.readlines()
fin.close()

def words_list():
    '''Transform the article into a list of words.'''
    chardigit = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '
    all_lines = ''
    for line in lines:
        one_line = ''
        for ch in line:
            # Keep only letters, digits and spaces; everything else is dropped.
            if ch in chardigit:
                one_line = one_line + ch
        # Add a space so the last word of one line does not merge
        # with the first word of the next line.
        all_lines = all_lines + one_line + ' '
    return all_lines.split()

def total_num(s):
    '''Return the total number of words; s is the word list.'''
    return len(s)

def word_dic(t):
    '''Count the occurrences of every word; t is the word list.'''
    fre_dic = dict()
    for i in range(len(t)):
        fre_dic[t[i]] = fre_dic.get(t[i], 0) + 1
    return fre_dic

def word_fre(w):
    '''Convert counts to frequencies; w is the dictionary of word counts.'''
    for key in w:
        w[key] = w[key] / total
    return w

def word_sort(v):
    '''Sort by frequency in ascending order; v is the frequency dictionary.'''
    sort_dic = sorted(v.items(), key=lambda e: e[1])
    return sort_dic

# Entry point: print the ten words with the largest frequency.
total = total_num(words_list())
print(word_sort(word_fre(word_dic(words_list())))[-10:])
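As an alternative to the listing above, the same result can be reached more compactly with the standard library's re module and collections.Counter. This is a swapped-in technique, not the article's own method, and it rests on a few assumptions: the function name top_words is hypothetical, words are lowercased before counting (the code above is case-sensitive), and punctuation splits words instead of being deleted, so "don't" becomes "don" and "t" rather than "dont".

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words as (word, frequency) tuples."""
    # Lowercase first so "The" and "the" are counted together
    # (an assumption; the original listing does not do this).
    words = re.findall(r"[a-z0-9]+", text.lower())
    total = len(words)
    counts = Counter(words)
    # most_common(n) returns (word, count) pairs, highest count first.
    return [(word, count / total) for word, count in counts.most_common(n)]


# Hypothetical sample text, used only for illustration.
sample = "The cat saw the dog. The dog saw the cat!"
print(top_words(sample, 3))
```

To use it on a file, read the whole text with open(...).read() and pass the resulting string to top_words. Unlike the sorted() slice above, the result here is ordered from the most frequent word downward.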
I hope this article will be helpful to everyone's Python programming.