When it comes to keyword search, the first thing that comes to mind is string functions such as indexOf and replace, or at most a few regular expressions. They are simple to use, but has their efficiency ever been carefully considered? For keyword filtering on a forum, for example, the keyword list is short and the posts being checked are not long, so the check finishes almost instantly and there is little to worry about. But if there are thousands of keywords rather than a handful, and the text being checked is a long discussion, the picture is far less rosy: every additional keyword means another scan of the full text, and the total time soon grows well beyond what is acceptable.
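To make that cost concrete, here is a minimal sketch of the straightforward approach: one indexOf scan of the whole text per keyword. The function name naiveFindAll and the { index, word } result shape are illustrative assumptions, not part of the original article.

```javascript
// Naive approach (hypothetical helper): scan the whole text once per keyword.
// With k keywords the text is read roughly k times, so the cost grows with k.
function naiveFindAll(text, keywords) {
  const hits = [];
  for (const word of keywords) {
    if (word.length === 0) continue; // an empty keyword would match everywhere
    let pos = text.indexOf(word);
    while (pos !== -1) {
      hits.push({ index: pos, word });
      // continue searching after this occurrence
      pos = text.indexOf(word, pos + word.length);
    }
  }
  // Sort by position so results read left to right.
  return hits.sort((a, b) => a.index - b.index);
}

// Usage: finds foo1 at 0, bar2 at 9 and foo2 at 15.
console.log(naiveFindAll('foo1 and bar2, foo2', ['foo1', 'foo2', 'bar1', 'bar2']));
```

Doubling the keyword list roughly doubles the number of passes over the text, which is exactly the scaling problem described above.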
For keyword searches on that extreme scale, the usual approach of scanning the text once per keyword is clearly not feasible. Since we are working in JavaScript, it would be a shame not to use hash tables: JavaScript objects already work as hash tables with direct language support, so we might as well spend a little space to save a lot of time.
Let's start with an example. Suppose the keywords are foo1, foo2, bar1 and bar2. Because we are trading space for time, we preprocess them before searching. Since JavaScript objects make flexible, efficient hash tables, a tree structure is the obvious choice. Even if that is unfamiliar, don't worry: the final structure looks like the following code, which will be easy to read for anyone familiar with JSON:
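// Tree for the keywords foo1, foo2, bar1, bar2:
// one nested object per character, true marks the end of a keyword.
var Root =
{
    f:
    {
        o:
        {
            o:
            {
                1: true,
                2: true
            }
        }
    },
    b:
    {
        a:
        {
            r:
            {
                1: true,
                2: true
            }
        }
    }
};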
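The preprocessing step that turns a keyword list into the Root structure above could look like the following minimal sketch. It assumes the same convention (one nested object per character, true at the end of a keyword) and a prefix-free keyword list, since in this representation a node cannot be both true and an object. The name buildTrie is hypothetical.

```javascript
// Hypothetical builder: produces the same shape as Root.
// Assumes no keyword is a prefix of another; conflicts throw instead of
// silently dropping a keyword.
function buildTrie(keywords) {
  const root = {};
  for (const word of keywords) {
    if (word.length === 0) continue; // skip empty keywords
    let node = root;
    // Create (or reuse) one branch per character except the last.
    for (let i = 0; i < word.length - 1; i++) {
      const ch = word[i];
      if (node[ch] === true) {
        throw new Error(`"${word}" extends a shorter keyword; list must be prefix-free`);
      }
      if (node[ch] === undefined) node[ch] = {};
      node = node[ch];
    }
    // The last character becomes a leaf.
    const last = word[word.length - 1];
    if (typeof node[last] === 'object') {
      throw new Error(`"${word}" is a prefix of a longer keyword; list must be prefix-free`);
    }
    node[last] = true;
  }
  return root;
}

console.log(JSON.stringify(buildTrie(['foo1', 'foo2', 'bar1', 'bar2'])));
// {"f":{"o":{"o":{"1":true,"2":true}}},"b":{"a":{"r":{"1":true,"2":true}}}}
```

Throwing on a prefix conflict is a design choice for the sketch; a real keyword list with overlapping entries would need a separate end-of-keyword marker instead of using true as the leaf.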
These nested layers form a tree: each character is a branch, and the last character of a keyword is a leaf, marked with true, with no further nodes below it.
At this point the idea should be clear: starting at each character of the article, walk down the tree one character at a time. If you reach a leaf, the characters from the starting position onward form one of the keywords; if at some step there is no branch for the next character, no keyword starts at that position.
For example, for foo1 the search walks down from Root and finally reaches Root['f']['o']['o']['1'], which completes a match. It then skips over the four characters of foo1 and continues searching from there.
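The walk just described can be sketched as follows. The helper name trieFindAll and the { index, word } result shape are assumptions for illustration; Root is the example tree for foo1, foo2, bar1 and bar2.

```javascript
// The example tree: one nested object per character, true marks a keyword end.
const Root = {
  f: { o: { o: { 1: true, 2: true } } },
  b: { a: { r: { 1: true, 2: true } } }
};

// Hypothetical search: at each position, walk down the tree.
// Reaching true means a keyword ends here: record it and skip past it.
// Falling off the tree means no keyword starts here: advance by one character.
function trieFindAll(text, root) {
  const hits = [];
  let i = 0;
  while (i < text.length) {
    let node = root;
    let j = i;
    // Stop when node is true (match), undefined (no branch) or the text ends.
    while (j < text.length && typeof node === 'object') {
      node = node[text[j]]; // one hash lookup per character
      j++;
    }
    if (node === true) {
      hits.push({ index: i, word: text.slice(i, j) });
      i = j; // skip the matched keyword
    } else {
      i++;
    }
  }
  return hits;
}

// Usage: logs three matches, foo1 at 0, bar2 at 9 and foo2 at 15.
console.log(trieFindAll('foo1 and bar2, foo2', Root));
```

Because a match is skipped as a whole, overlapping occurrences are not reported, and each starting position costs at most as many lookups as the longest keyword has characters.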
Therefore, the article only needs to be scanned once, from beginning to end, to find the location of every keyword (at each position the walk goes no deeper than the longest keyword).
Since JavaScript's hash-table lookups are very fast, finding each branch is cheap, and thanks to the language's flexibility the code that does this is also very short.
In fact, the number of keywords has little effect on search time: more keywords only make the tree wider. Search time is determined essentially by the length of the article.
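One rough way to check this claim is to count hash lookups instead of timing them. The sketch below uses hypothetical helpers buildTrie and countLookups, assumes a prefix-free keyword list, and scans the same text with a 4-keyword tree and with a tree that has 1,000 extra keywords added. A lookup in a wider node is still a single hash lookup, so the count stays the same here; it would only grow where new keywords share a prefix with the text, and it can never exceed the text length times the longest keyword length.

```javascript
// Hypothetical builder in the article's shape (true marks a keyword end).
// Assumes the keyword list is prefix-free.
function buildTrie(keywords) {
  const root = {};
  for (const word of keywords) {
    let node = root;
    for (let i = 0; i < word.length - 1; i++) {
      if (!node[word[i]]) node[word[i]] = {};
      node = node[word[i]];
    }
    node[word[word.length - 1]] = true;
  }
  return root;
}

// Same single-pass scan as the search, but it counts hash lookups
// instead of collecting matches.
function countLookups(text, root) {
  let lookups = 0;
  let i = 0;
  while (i < text.length) {
    let node = root;
    let j = i;
    while (j < text.length && typeof node === 'object') {
      node = node[text[j]]; // one hash lookup
      lookups++;
      j++;
    }
    i = node === true ? j : i + 1; // skip a match, otherwise advance by one
  }
  return lookups;
}

const text = 'foo1 and bar2, foo2';
const base = ['foo1', 'foo2', 'bar1', 'bar2'];

// 1,000 extra keywords q0000..q0999: they widen the tree
// but share no prefix with the text.
const extra = [];
for (let n = 0; n < 1000; n++) extra.push('q' + String(n).padStart(4, '0'));

const small = countLookups(text, buildTrie(base));
const large = countLookups(text, buildTrie(base.concat(extra)));
console.log(small, large); // 19 19
```

The restart-by-one rule means the worst case is still bounded by text length times keyword depth rather than being strictly linear; Aho-Corasick, a different technique that adds failure links to the tree, removes that factor.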
Here is an extreme test:
Keywords: complete collection of idioms (19,830 entries)
Content: Complete Works of Zhu Xian.txt (1,659,219 characters)
Time: 935 ms
(Chrome 26, Intel i3-2312 CPU)
Matching roughly 20,000 keywords against a text of 1.6 million characters took under one second, which shows that making full use of JavaScript's flexibility can still unlock a great deal of performance.