Preface
NLP string similarity computation: set similarity metrics
What is the Jaccard distance?
The Jaccard distance measures the dissimilarity of two sets. It is the complement of the Jaccard similarity coefficient, that is, 1 minus that coefficient. The Jaccard similarity coefficient, also known as the Jaccard index, measures the similarity of two sets.
Definition
The Jaccard similarity coefficient measures the similarity between two sets. It is defined as the number of elements in the intersection of the two sets divided by the number of elements in their union.
The Jaccard distance measures the dissimilarity between two sets. It is the complement of the Jaccard similarity coefficient and is defined as 1 minus that coefficient.
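Written as formulas, the two definitions look like this. The symbols A and B (the two sets), J (the similarity coefficient), and d_J (the distance) are notation chosen here for illustration, not taken from the post:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad d_J(A, B) = 1 - J(A, B)
```

For example, with A = {a, b, c} and B = {b, c, d}, the intersection {b, c} has 2 elements and the union {a, b, c, d} has 4, so J(A, B) = 2/4 = 0.5 and d_J(A, B) = 1 - 0.5 = 0.5.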
Python implementation
The code is as follows:
# -*- encoding:utf-8 -*-
import jieba


def Jaccard(model, reference):
    # model is the candidate sentence, reference is the source sentence
    terms_reference = jieba.cut(reference)  # jieba's default (accurate) mode
    terms_model = jieba.cut(model)
    grams_reference = set(terms_reference)  # deduplicate; use a list if that is not wanted
    grams_model = set(terms_model)
    temp = 0  # size of the intersection
    for i in grams_reference:
        if i in grams_model:
            temp = temp + 1
    fenmu = len(grams_model) + len(grams_reference) - temp  # size of the union
    try:
        jaccard_coefficient = float(temp / fenmu)  # intersection / union
    except ZeroDivisionError:
        # Both sentences produced no tokens
        print(model, reference)
        return 0
    else:
        return jaccard_coefficient
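To try the idea without installing a tokenizer, here is a minimal sketch that works on tokens you have already split yourself. The names `jaccard_similarity` and `jaccard_distance` are hypothetical helpers made up for this example, and returning 0.0 for two empty inputs is an assumption that mirrors the fallback in the code above:

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity of two token collections (duplicates are ignored)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        # Both inputs are empty, so the ratio is undefined.
        # Assumption: return 0.0, like the ZeroDivisionError fallback above.
        return 0.0
    return len(set_a & set_b) / len(union)


def jaccard_distance(tokens_a, tokens_b):
    """Jaccard distance: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(tokens_a, tokens_b)


# Whitespace tokenization is enough for a quick English example
a = "the cat sat".split()
b = "the cat ran".split()
print(jaccard_similarity(a, b))  # 2 shared tokens out of 4 distinct ones
print(jaccard_distance(a, b))
```

Using the built-in set operators `&` and `|` avoids the manual counting loop; the result is the same intersection-over-union ratio.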
What is the chain (period-on-period) ratio?
The chain rate of development is the ratio of the level in the reporting period to the level in the immediately preceding period; it shows how a phenomenon develops from one period to the next. For example, comparing each month of a year with the month before it (February with January, March with February, April with March, ..., December with November) shows the degree of development month by month. When analyzing the trend of certain economic phenomena during the fight against SARS, the chain ratio is more illustrative than the year-on-year ratio.
Anyone who has studied statistics or economics knows that statistical indicators can be divided into aggregate, relative, and average indicators according to their content, role, and form of expression. Depending on the base period used, the rate of development can be divided into the year-on-year rate, the chain rate, and the fixed-base rate. Put simply, all three can be expressed as percentages or multiples.
The fixed-base rate of development, also called the overall rate, is the ratio of the level in the reporting period to the level in a fixed base period; it shows the overall development of a phenomenon over a longer span of time. The year-on-year rate of development compares the level in the current period with the level in the same period of the previous year. The chain rate of development is the ratio of the level in the reporting period to the level in the previous period; it shows development from one period to the next.
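The three rates differ only in which earlier value serves as the base. A minimal sketch, assuming the data is a chronological list of levels: the function names and the `period` and `base_index` parameters are hypothetical, introduced only for this illustration:

```python
def fixed_base_rates(levels, base_index=0):
    """Each level divided by the level of one fixed base period."""
    base = levels[base_index]
    return [x / base for x in levels]


def chain_rates(levels):
    """Each level divided by the level of the immediately preceding period."""
    return [cur / prev for prev, cur in zip(levels, levels[1:])]


def year_on_year_rates(levels, period=12):
    """Each level divided by the level `period` steps earlier
    (period=12 for monthly data, period=4 for quarterly data)."""
    return [levels[i] / levels[i - period] for i in range(period, len(levels))]


# Toy series that doubles every step; period=2 keeps the example short
levels = [100, 200, 400, 800]
print(fixed_base_rates(levels))       # [1.0, 2.0, 4.0, 8.0]
print(chain_rates(levels))            # [2.0, 2.0, 2.0]
print(year_on_year_rates(levels, 2))  # [4.0, 4.0]
```

Multiplying a result by 100 gives the percentage form; leaving it as is gives the multiple.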
Year-on-year and chain ratios both reflect a rate of change, but because they use different base periods they mean quite different things. In general, a chain ratio can be compared with another chain ratio, but a year-on-year ratio should not be compared with a chain ratio. For a single place, however, understanding the development trend over time often requires looking at the year-on-year and chain ratios side by side. [1]
Python implementation
The code is as follows:
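def month_on_month_ratio(data_list):
    mid = 0
    length = len(data_list)
    res = []
    while mid < length - 1:
        # a is the previous period, b is the current period
        a, b = data_list[mid:mid + 2]
        # Growth relative to the previous period, i.e. the chain rate b / a minus 1
        res.append((b - a) / a)
        mid += 1
    return res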
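One thing to watch: the division by the previous period's value raises ZeroDivisionError when that value is 0. Below is a possible variant that pairs neighbours with `zip` and marks such periods as undefined. The name `period_on_period_growth` and the choice of `None` as the marker are assumptions made for this sketch:

```python
def period_on_period_growth(values):
    """Growth of each value relative to the one before it.

    Returns None for a period whose predecessor is 0, because the
    ratio is undefined there (an assumption made for this sketch).
    """
    growth = []
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            growth.append(None)
        else:
            growth.append((cur - prev) / prev)
    return growth


print(period_on_period_growth([100, 150, 75]))  # [0.5, -0.5]
print(period_on_period_growth([0, 5, 10]))      # [None, 1.0]
```

A list with fewer than two values simply yields an empty result, since there is no previous period to compare with.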
That is all for today. This article gave a brief introduction to the Jaccard distance and the chain ratio, each with a Python implementation. I hope it helps, and thank you for your support!