
Theoretical analysis and ten methods to prevent your website from being collected (page 2/2)

9. Serve the website through randomly rotated templates
Analysis: Because a collector locates the content it needs based on the page structure, every time the template changes, its collection rules become invalid. This works well, and it has no effect on search engine crawlers.
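A minimal sketch of the idea, assuming a Flask app and hypothetical template file names; any server-side template engine works the same way:

```python
import random
from flask import Flask, render_template

app = Flask(__name__)

# Hypothetical template variants that all render the same content,
# but with different DOM structures.
TEMPLATES = ["article_a.html", "article_b.html", "article_c.html"]

@app.route("/article/<int:article_id>")
def article(article_id):
    # Pick a template at random on every request, so the same URL
    # may come back with a different page structure each fetch.
    template = random.choice(TEMPLATES)
    return render_template(template, article_id=article_id)
```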

Suitable for: dynamic websites where user experience is not a priority.

What will the collector do: a website rarely has more than ten templates, so the collector can simply write one rule per template and apply a different collection rule to each, as sketched after this paragraph. If there really are more than ten templates, then given how much trouble the target website has gone to, the collector may as well give up on it.
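A sketch of that per-template workaround from the collector's side, assuming hypothetical page signatures and extraction patterns that would be discovered by inspecting each template variant:

```python
import re

# One extraction rule per known template variant (patterns are
# hypothetical; each would be written against the real markup).
RULES = {
    "template_a": re.compile(r'<div class="content-a">(.*?)</div>', re.S),
    "template_b": re.compile(r'<td id="main-b">(.*?)</td>', re.S),
}

# A distinctive marker string identifying which template served the page.
SIGNATURES = {
    "template_a": '<body class="layout-a">',
    "template_b": '<body class="layout-b">',
}

def extract(html):
    # Identify the template first, then dispatch to its matching rule.
    for name, marker in SIGNATURES.items():
        if marker in html:
            match = RULES[name].search(html)
            return match.group(1) if match else None
    return None  # unknown template: a new rule must be written
```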

10. Use dynamic, irregular HTML tags
Analysis: This one is rather extreme. An HTML tag renders the same whether or not it contains extra spaces, so <div> and <div > are identical as far as page display goes, but to a collector's rules they are two different strings. If the number of spaces inside the HTML tags is randomized each time a page is generated, the collection rules become invalid. However, this has little impact on search engine crawlers.
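A minimal sketch of injecting that noise at render time (regex-based, for illustration only; a production implementation would hook into the template engine or an HTML serializer):

```python
import random
import re

def randomize_tag_spaces(html):
    # Insert a random number of spaces (0-3) before the closing '>'
    # of each opening tag. Browsers render the result identically,
    # but the raw markup differs on every generation.
    return re.sub(
        r"<(\w+)([^>]*)>",
        lambda m: "<%s%s%s>" % (m.group(1), m.group(2), " " * random.randint(0, 3)),
        html,
    )

print(randomize_tag_spaces('<div class="content"><p>hello</p></div>'))
# e.g. '<div class="content" ><p  >hello</p></div>'
```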

Suitable for: dynamic websites whose owners do not mind violating web design standards.

What will the collector do: there are still countermeasures. Plenty of HTML cleaners are available, so the collector can normalize the HTML tags first and only then write the collection rules; by cleaning the tags before applying the rules, the required data can still be extracted.
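On the collector's side, that normalization step might be as simple as the following sketch (regex-based; a real HTML parser would be more robust), run on the raw page before any rule is applied:

```python
import re

def clean_html(html):
    # Drop any whitespace just before '>', and collapse runs of
    # whitespace after a tag name to a single space, so randomized
    # spacing no longer affects string-based collection rules.
    html = re.sub(r"\s+>", ">", html)           # '<div  >' -> '<div>'
    html = re.sub(r"<(\w+)\s+", r"<\1 ", html)  # '<div   class' -> '<div class'
    return html

raw = '<div   class="content"  ><p >hello</p></div>'
print(clean_html(raw))
# '<div class="content"><p>hello</p></div>'
```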

Summary: