SoFunction
Updated on 2024-10-30

Python Implementation of Jaccard Distance as well as Ring Algorithm Explained

preamble

NLP-string similarity computation, set similarity metrics

Tip: Below is the body of this post, with the following examples for reference

What is the Jaccard distance?

Jaccard Distance (Jaccard Distance) is a measure of the dissimilarity of two sets, which is the complement of the Jaccard similarity coefficient, defined as 1 minus the Jaccard similarity coefficient. And Jaccard similarity coefficient (Jaccard similarity coefficient), also known as Jaccard Index (Jaccard Index), is a measure of the similarity of two sets.

define

The Jaccard similarity index is used to measure the similarity between two sets, which is defined as the number of elements in the intersection of two sets divided by the number of elements in the concurrent set.

在这里插入图片描述

The Jaccard distance is used to measure the dissimilarity between two sets, which is the complement of Jaccard's similarity coefficient and is defined as 1 minus the Jaccard similarity coefficient.

在这里插入图片描述

Python implementation

The code is as follows:

# -*- encoding:utf-8 -*-
import jieba
def Jaccard(model, reference):  # terms_reference is the source sentence, terms_model is the candidate sentence
    terms_reference = (reference)  # Default precision mode
    terms_model = (model)
    grams_reference = set(terms_reference)  # De-emphasize; change to list if not needed
    grams_model = set(terms_model)
    temp = 0
    for i in grams_reference:
        if i in grams_model:
            temp = temp + 1
    fenmu = len(grams_model) + len(grams_reference) - temp  # And the concatenation
    try:
        jaccard_coefficient = float(temp / fenmu)  # Intersection
    except ZeroDivisionError:
        print(model, reference)
        return 0
    else:
        return jaccard_coefficient

What's the ring?

The rate of development of the ring is the ratio of the level of the reporting period to the level of the previous period, indicating the rate of development of the phenomenon from period to period. If we calculate the comparison between each month of the year and the previous month, i.e. February compared to January, March compared to February, April compared to March ...... December compared to November, it shows the degree of development month by month. If you analyze the development trend of certain economic phenomena during the fight against SARS, the chain ratio is more illustrative than the year-on-year ratio.

Anyone who has studied statistics or economics knows that statistical indicators can be categorized into aggregate, relative and average indicators according to their specific content, actual role and form of expression. Due to the different base period, the development speed can be divided into year-on-year development speed, chain development speed and fixed base development speed. To put it simply, year-on-year, chain-on-chain and base-on-base ratios can all be expressed as percentages or multiples.
The base rate of development, also referred to as the overall rate, generally refers to the ratio of the level in the reporting period to the level in a fixed period of time, indicating the overall rate of development of the phenomenon over a longer period of time. The year-on-year rate of development, in general, refers to the relative rate of development achieved by comparing the level of development in the current period with the level of development in the same period of the previous year. The cyclical rate of development, which generally refers to the ratio of the level of the reporting period to the level of the previous period, indicates the rate of development of the phenomenon from period to period.
Year-on-year and ring ratio, both of which are reflected in the rate of change, but due to the use of the base period of the different connotations of its reflection is completely different; in general, the ring ratio can be compared with the ring ratio, and can not be taken year-on-year comparison with the ring ratio; and for the same place, to consider the reflection of the development trend of the longitudinal direction of time, it is often necessary to year-on-year with the ring ratio to be placed in a comparison together. [1]

Python implementation

The code is as follows:

def month_on_month_ratio(data_list):
    mid = 0
    length = len(data_list)
    res = []
    while mid < length-1:
        a, b = data_list[mid:mid+2]
        ((b-a)/a)
        mid += 1
    return res

The above is the content of today's share, this article is just a brief introduction to the Jekyll and Hyde distance and the ring of the Python version of the implementation, I hope to help you, please support me more in the future!