Detailed explanation of the method of implementing fuzzy search for big data before and after JavaScript

1. Background

In some specific scenarios, the server may return a large amount of our data and hope to process fuzzy queries on the front end. We can refer to the solution, if there is a better implementation solution, you can directly add it.

2. Solution

1、indexedDB

Currently mainstream front-end storage solutions:

characteristic	cookie	localStorage	sessionStorage	indexedDB
Data life cycle	Generally generated by the server, the expiration time can be set; the front-end can also be generated using components such as js-cookies.	Unless it is cleaned, it will always exist; the browser is closed and will be saved locally, but it does not support cross-browser	Cleaning and refreshing when the page is closed still exists, cross-page interaction is not supported	Limit to 50% of available disk space.
Data storage size	4K	5M	5M	No size limit
Communication with the server	It will be carried in the requested header every time, which will affect the request performance; at the same time, since it is carried in the request, security problems are also prone to occur.	Not participating	Not participating	Not participating
Features	String key-value pairs store data locally	String key-value pairs store data locally	String key-value pairs store data locally	It can store a large amount of data, provide interfaces for querying, and create indexes

IndexedDB is a non-relational database, so we can also search for query. The search speed may not be the optimal solution, but we need to consider business implementation. For example: we have an interface, which does not update the interface data so frequently and the amount of data is very large, and the time to call the interface may be long, and searching for a certain item or a certain starting point may be better.

function init() {
  // Open IndexedDB database  const openDB = ("myDatabase", 1);
  // It is called only when the new database and database version number changes   = function (event) {
    const db = ;
    // Create object storage space    const objectStore = ("myObjectStore", {
      keyPath: "code",
    });
    // Add index    // ("code", "code");
    ("pCode", "pCode", { unique: false });
    ("name", "name", { unique: false });
  };

   = async function (event) {
    const db = ;
    const dataArray = data;
    // Get transaction object    const transaction = (["myObjectStore"], "readwrite");
    // Get object storage space    const objectStore = ("myObjectStore");

    // Add objects to object storage    (function (data) {
      const addRequest = (data);

       = function () {
        ("Data has been successfully added to object storage");
      };

       = function (error) {
        ("An error occurred while adding data", error);
      };
    });
  };
}

function search() {
   = [];
  // Open the database  const openRequest = ("myDatabase", 1);
  // Open the database successfully callback   = function (event) {
    // Database object    const db = ;
    // Open read-only transaction and get object storage readwrite read-write read-only read-only    const transaction = (["myObjectStore"], "readonly");
    // Get a reference to myObjectStore object storage space    const objectStore = ("myObjectStore");
    // Get name index    const index = ("name");
    // Open cursor    const request = (); // All data    // const request = (searchKey); // Accurate matching    // const request = (
    //   (searchKey, searchKey + "\uffff", false, false)
    // ); // Post blur    const startTime = new Date().getTime();
     = function (event) {
      const cursor = ;
      if (cursor) {
        const name = ;
        // Determine whether the name contains search keywords. If it is blurred afterwards, just take the name directly.        if ((name)) {
          ();
        }
        ();
      } else {
        const endTime = new Date().getTime();
        (`Time-consuming search：${endTime - startTime}`);
        (result);
      }
    };

     = function () {
      ("An error occurred while performing search");
    };
  };
  // The callback failed to open the database   = function () {
    ("An error occurred while opening the database");
  };
}

2. Trie tree (Dictionary tree)

The principle is to split all the characters and split all the following characters into your own child node. In fact, it is more suitable for post-fuzzy matching. If it is required for front-and-back fuzzy matching, all traversals are required, and the performance is not the best.

class TrieNode {
  constructor() {
     = new Map(); // Children used to store the current node     = false; // Indicates whether the current node is the end of a word     = []; // Used to store the value associated with the current node  }
}

class Trie {
  constructor() {
     = new TrieNode();
  }
  // If the current character does not exist in the child node of the current node, create a new child node.  // and add it to the child node map.  Then, update the current node to a new child node and continue to traverse the next character.  // After traversing all characters, set the isEndOfWord property of the current node to true, indicating that the node is the end of a word.  // At the same time, add the value associated with the word to the values array of the current node.  insert(word, value) {
    let node = ;
    for (let i = 0; i &lt; ; i++) {
      const char = word[i];

      if (!(char)) {
        (char, new TrieNode());
      }
      node = (char);
    }
     = true;
    (value);
  }

  // Recursively collects the associated values of all words in a subtree rooted by a given node.  If the current node is the end of a word  // Add all associated values of the node to the result set.  Then, for each child node of the current node,  // Recursively call collectValues method.  collectValues(node, results) {
    if () {
      for (const value of ) {
        (value);
      }
    }
    for (const childNode of ()) {
      (childNode, results);
    }
  }

  // Fuzzy searches for words that match the given keyword and returns all associated values that match.  This method first creates an empty  // Set object results, used to store results.  Then, call the collectSuffixValues method,  // Recursively search for words matching the keyword from the root node and add the matching associated value to the results.  // Finally, convert results to array and return.  fuzzySearch(keyword) {
    const results = new Set();
    (, keyword, results);
    return (results);
  }

  // Recursively search for words matching the given keyword suffix and add the matching associated value to the result set.  // If the length of the keyword is 0, it means that the end of the keyword has been matched. At this time, the collectValues method is called.  // Add all associated values from the current node and its subtrees to the result set.  // Otherwise, take the first character char of the keyword and get the child node of the current node.  // If childNode exists, collectSuffixValues method is called recursively.  // Pass the remainder of the keyword and childNode as parameters.  // Then, recursively call the collectSuffixValues method for each child node of the current node.  // Pass keywords and result sets as parameters  collectSuffixValues(node, keyword, results) {
    if ( === 0) {
      (node, results);
    } else {
      const char = keyword[0];
      const childNode = (char);
      if (childNode) {
        (childNode, (1), results);
      }
    }
    for (const childNode of ()) {
      (childNode, keyword, results);
    }
  }
}

async function search() {
  // Example usage  const trie = new Trie();
  const dataArray = await getJSON();
  ((item) =&gt; {
    (, item);
  });
  const startTime = new Date().getTime();
  const results = ("Headquarters");
  const endTime = new Date().getTime();
  (`Time-consuming search：${endTime - startTime}`);
  (trie, "results", results);
}
search();

3. Fuse (third-party component library)

The Bitap algorithm isOne of the core algorithms used to implement fuzzy search is to use bit operations to calculate the similarity between the pattern string and the target string. Specifically, the Bitap algorithm first converts the pattern string into a binary mask, and uses bit operations to calculate the similarity between the pattern string and the target string, and then uses some heuristic strategies to improve the accuracy and efficiency of the algorithm.Other explanations

function search() {
  const startTime = new Date().getTime();
  const fuseOptions = {
    // isCaseSensitive: false, // Indicates whether the comparison should be case sensitive.    // includeScore: true, // Whether the score should be included in the result set.  A score of 0 means an exact match, and a score of 1 means an entire mismatch.    // shouldSort: true, // Whether to sort the result list by score.    // includeMatches: true, // Whether the match should be included in the result set.  When true, each record in the result set will contain an index of the matching characters.  Therefore, these can be used for highlighting purposes.    // findAllMatches: true, // When true, the match function will continue to the end of the search pattern, even if a perfect match has been found in the string.    // minMatchCharLength: 1, // Return only matches with length exceeding this value.  (For example, if you want to ignore a single character match in the result, set it to 2).    // location: 0, // Determines the approximate location of the pattern expected to be found in the text.    // threshold: 0.6, // When will the matching algorithm be given up?  Threshold 0.0 requires a perfect match (letter and position), and threshold 1.0 can match anything    // distance: 100, // Determines the proximity of the match to the fuzzy position (specified by location).  The exact letter match of the characters away from the blurred position will be scored as a complete mismatch.  A distanceof0 requires matching the exact specified location.  The distance 1000 requires a perfect match within 800 characters found using of.  locationthreshold0.8    // ignoreLocation: true, // true, search will ignore location and distance, so the location where the pattern appears in the string does not matter.    // ignoreFieldNorm: true, //true, the calculation of the correlation score (for sorting) will ignore the field length norm.    // fieldNormWeight: 1, // Determine the degree of influence of the field length norm on the score.  The value 0 is equivalent to ignoring the field length norm.  A value of 0.5 will greatly reduce the impact of the field length norm, while a value of 2.0 will greatly increase the impact of the field length norm.    keys: ["name"],
  };
  const fuse = new Fuse(data, fuseOptions);
  const searchPattern = "Human Resources Department, Meat Headquarters";
  result = (searchPattern);
  const endTime = new Date().getTime();
  (`Time-consuming search：${endTime - startTime}`);
  (result);
}

Configuration Items	describe	default value	illustrate
keys	List of keys to search for.		It supports nested paths, weighted searches, searches in strings and object arrays.
isCaseSensitive	Whether it is case sensitive	false
includeScore	Should scores be included in the result set	false	A score of 0 means an exact match, and a score of 1 means an entire mismatch.
shouldSort	Whether to sort the result list by score.	true
includeMatches	Whether the match should be included in the result set.
findAllMatches	Whether the matches should all be in the result set.	false	When true, the match function continues to the end of the search pattern, even if a perfect match has been found in the string.
minMatchCharLength	Returns only matches whose length exceeds this value.	1	If you want to ignore single character matches in the result, set it to 2
location	Determines the approximate location of the pattern you expect to find in the text.	0
threshold	When will the matching algorithm be abandoned?	0.6	Threshold 0.0 requires a perfect match (letter and position), and threshold 1.0 can match anything
distance	Determines the proximity of the match to the fuzzy position (specified by location)	100	The exact letter match of the characters away from the blurred position will be scored as a complete mismatch. A distanceof0 requires matching the exact specified location. The distance 1000 requires a perfect match within 800 characters found using of. locationthreshold0.8
ignoreLocation	Search will ignore location and distance	false
ignoreFieldNorm	Calculation of correlation score (for sorting) ignores field length norms	false
fieldNormWeight	Determine the degree of influence of the field length norm on the score.	1	The value 0 is equivalent to ignoring the field length norm. A value of 0.5 will greatly reduce the impact of the field length norm, while a value of 2.0 will greatly increase the impact of the field length norm.

4. Prefix (Trie) tree + intersection realizes fuzzy search

The second solution we mentioned before is more suitable for prefix query, but if we loop the entire tree, we can implement front-and-back fuzzy query, but it is not the best solution. If it is front-and-back fuzzy query, the following solution is more recommended.

We used to disassemble the first text of the array. In fact, we can change a way of thinking by disassembling all the words, and then there is a field in each tree node to store all unique indexes containing this word.

let array = [{id:1,name:"Headquarters"},{id:2,name:"total"},{id:3,name:"department"}]

let result = {"total":{ids:[1,2]},"department":{ids:[1,3]}}

When we enter the word "total", the result should be 1, 2;

When inputting the unit, the result should be 1, 3;

When entering the headquarters, the result should be 1;

In fact, when entering the headquarters, it is to take the intersection of two words, and they contain 1 together.

The same is true if you are searching for multiple characters. Take the index of each character and find their intersection, which is their search results. It can also be implemented.Department, GeneralThis kind of situation is included.

The above is a detailed explanation of the method of front-end JavaScript to implement fuzzy search before and after big data. For more information about fuzzy search in JavaScript, please pay attention to my other related articles!