This article describes a way to calculate string similarity in C#, shared here for your reference. The details are as follows:
There are many ways to calculate string similarity; even the crudest approach, comparing the strings character by character, works to a degree. What we want to talk about here is using the Levenshtein distance to calculate string similarity.
Levenshtein distance concept: assume the function is named LD.
It is used to measure the similarity between two strings. For example, take two strings A and B. With A as the benchmark, the algorithm calculates the minimum number of single-character steps (replace, delete, or insert a character) needed to turn B into A.
For example:
A="abcd", B="abc": LD(A,B)=1, because inserting one character into B makes it equal to A.
A="abcd", B="abcd": LD(A,B)=0, because the two strings are exactly the same.
A="abcd", B="abdc": LD(A,B)=2, because plain Levenshtein distance has no swap operation, so the "dc" in B has to be fixed with two replacements. Counting a swap of adjacent characters as a single step is the Damerau-Levenshtein variant, which is not what this article implements.
A="fwegwegweg@#2", B="dd*&&%^&": LD(A,B) is not obvious by inspection, so it has to be computed by a program.
The larger the Levenshtein distance, the more steps are needed, which means the lower the similarity between the two strings.
For example, if you want to build a simple "copied article" check, the Levenshtein distance is enough for a first-pass method.
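To compare strings of different lengths, the raw distance is often turned into a score between 0 and 1. The following is a minimal sketch of one common way to do that, not part of the original implementation: the class name `SimilarityDemo`, the helper `Similarity`, and the choice to divide by the longer string's length are all assumptions made for illustration.

```csharp
using System;

public static class SimilarityDemo
{
    // Plain Levenshtein distance: insert, delete, and replace each cost 1.
    public static int Distance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Hypothetical helper: maps the distance to a score in [0, 1],
    // where 1 means identical. Normalizing by the longer length is an assumption.
    public static double Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0; // two empty strings are identical
        return 1.0 - (double)Distance(a, b) / maxLen;
    }

    public static void Main()
    {
        Console.WriteLine(Similarity("abcd", "abc"));  // distance 1 over length 4
        Console.WriteLine(Similarity("abcd", "abcd")); // identical strings
    }
}
```

A copy check could then flag pairs whose score is above some threshold; a suitable threshold depends on the texts being compared and would need to be tuned.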
Algorithm notes:
1. Suppose that the length of the string str1 is n and the length of str2 is m.
If n = 0, return m and exit. (This is the trivial case.)
2. If m = 0, return n and exit. (Also trivial.)
3. If neither of the above applies, the calculation must be started.
Construct an array d[0..n, 0..m].
Initialize column 0 to 0..n (d[i,0] = i) and row 0 to 0..m (d[0,j] = j).
Check each character of str1 in turn (i=1..n).
For each of them, check each character of str2 in turn (j=1..m).
If str1[i]=str2[j], then sign=0. (sign is just a marker that records whether the two characters are equal.)
If str1[i]!=str2[j], then sign=1.
Set d[i,j] to the minimum of the following three values:
The value of the cell immediately above the current cell plus one, that is, d[i-1,j]+1
The value of the cell immediately to the left of the current cell plus one, that is, d[i,j-1]+1
The value of the upper-left cell of the current cell plus sign, that is, d[i-1,j-1]+sign
Repeat the above steps until both loops are over. d[n,m] is the final value.
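The steps above can be traced with a small program that fills the table and prints it. This is only a sketch: the class name `LevenshteinMatrixDemo` and the method `BuildMatrix` are made up for illustration. The steps index the strings from 1, while C# strings are indexed from 0, so the code reads str1[i - 1] and str2[j - 1].

```csharp
using System;

public static class LevenshteinMatrixDemo
{
    // Fills d[0..n, 0..m], where n = str1.Length and m = str2.Length.
    public static int[,] BuildMatrix(string str1, string str2)
    {
        int n = str1.Length;
        int m = str2.Length;
        int[,] d = new int[n + 1, m + 1];

        for (int i = 0; i <= n; i++) d[i, 0] = i; // column 0 holds 0..n
        for (int j = 0; j <= m; j++) d[0, j] = j; // row 0 holds 0..m

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                // sign is 0 when the two characters match, 1 otherwise
                int sign = str1[i - 1] == str2[j - 1] ? 0 : 1;
                int above = d[i - 1, j] + 1;
                int left = d[i, j - 1] + 1;
                int upperLeft = d[i - 1, j - 1] + sign;
                d[i, j] = Math.Min(Math.Min(above, left), upperLeft);
            }
        }
        return d;
    }

    public static void Main()
    {
        int[,] d = BuildMatrix("abcd", "abdc");
        for (int i = 0; i < d.GetLength(0); i++)
        {
            for (int j = 0; j < d.GetLength(1); j++)
                Console.Write(d[i, j] + " ");
            Console.WriteLine();
        }
        // The bottom-right cell d[n, m] is the distance.
        Console.WriteLine("LD = " + d[4, 4]);
    }
}
```

For "abcd" and "abdc" the bottom-right cell is 2, since the plain algorithm has no swap step and needs two replacements.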
Next is a Levenshtein distance implementation written in C#.
using System;
using System.Text.RegularExpressions;

// Uses the Levenshtein distance algorithm to calculate the similarity between strings.
// Wrapping it in a class looks tidy, but it is not strictly necessary.
public class LDMaker
{
    char[] str1;
    char[] str2;

    public LDMaker(string s1, string s2)
    {
        // Replace every run of digits with one fixed digit.
        // Whether to do this varies from case to case. When matching Chinese articles,
        // digits are a serious source of interference: some articles are actually very
        // similar, but numbers are deliberately added so that the machine sees the two
        // articles as very different. Generally this step is not needed.
        s1 = Regex.Replace(s1, @"(\d+)", "1");
        s2 = Regex.Replace(s2, @"(\d+)", "1");
        str1 = s1.ToCharArray();
        str2 = s2.ToCharArray();
    }

    // Levenshtein distance implementation.
    // Note: here m is the length of str1 and n the length of str2.
    public int GetLD()
    {
        try
        {
            int m = str1.Length;
            int n = str2.Length;
            int[,] d = new int[m + 1, n + 1];
            for (int i = 0; i <= m; i++)
                d[i, 0] = i;
            for (int i = 0; i <= n; i++)
                d[0, i] = i;
            for (int i = 1; i <= m; i++)
            {
                for (int j = 1; j <= n; j++)
                {
                    d[i, j] = d[i - 1, j - 1] + (str1[i - 1] == str2[j - 1] ? 0 : 1); // Replace a character
                    d[i, j] = Math.Min(d[i, j], d[i - 1, j] + 1); // Insert a character
                    d[i, j] = Math.Min(d[i, j], d[i, j - 1] + 1); // Delete a character
                }
            }
            return d[m, n];
        }
        catch
        {
            // On error, return a large value.
            return 10000;
        }
    }
}
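The digit replacement described in the constructor comments can be tried on its own. The sketch below assumes the replacement is done with `Regex.Replace` from `System.Text.RegularExpressions`; the class name `DigitNormalizationDemo` and the method `Normalize` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitNormalizationDemo
{
    // Collapses every run of digits into the single character "1",
    // so that inserted numbers no longer inflate the distance.
    public static string Normalize(string text)
    {
        return Regex.Replace(text, @"\d+", "1");
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("Order 12345 shipped in 2024")); // Order 1 shipped in 1
        Console.WriteLine(Normalize("no digits here"));              // unchanged
    }
}
```

Two texts that differ only in their numbers become identical after this step, so their distance drops to 0. Whether that is desirable depends on the application.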
I hope this article will be helpful to everyone's C# programming.