
Theoretical analysis: ten methods to prevent your website's content from being collected (scraped), and the collectors' countermeasures


7. Anti-hotlinking (Referer-checking) measures
Analysis: ASP and PHP can read the request's HTTP_REFERER header to determine whether the request comes from this website, and thereby block collectors. But the same check also blocks search engine crawlers, seriously hurting the indexing of any Referer-protected content on the site.
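
A minimal sketch of such a check in PHP (the allowed host www.example.com is a placeholder):

<?php
// Minimal sketch of a Referer check; www.example.com stands in for this site.
// Requests whose Referer does not point back here are rejected.
$referer = $_SERVER['HTTP_REFERER'] ?? '';
$host    = parse_url($referer, PHP_URL_HOST);

if ($host !== 'www.example.com') {
    header('HTTP/1.1 403 Forbidden');
    exit('External requests are not allowed.');
}
// ...serve the protected content here...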

Applicable websites: websites that do not care about being indexed by search engines.

What will the collector do: forging HTTP_REFERER is not difficult.
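
For instance, with PHP's cURL extension a collector can send any Referer it likes (both URLs below are placeholders):

<?php
// Sketch: a collector forging the Referer header with PHP cURL.
$ch = curl_init('http://www.example.com/protected-page.html');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/'); // pretend the request came from the site itself
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);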

8. Presenting website content entirely as Flash, images, or PDF
Analysis: search engine crawlers and collectors both handle these formats poorly; anyone who knows a little SEO is aware of this.

Applicable websites: media and design sites that do not care about search engine indexing.

What will the collector do: give up collecting and leave.

9. Randomly serving different page templates
Analysis: because a collector locates the target content by the page's HTML structure, its collection rules break as soon as the template is swapped, which is exactly the point. And this has no effect on search engine crawlers.
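
A minimal PHP sketch of the idea, assuming the site keeps several interchangeable template files (the file names are made up for illustration):

<?php
// Sketch: render the same content through a randomly chosen template.
// template1.php ... template3.php are assumed to produce structurally
// different HTML from the same $title and $body variables.
$templates = ['template1.php', 'template2.php', 'template3.php'];
$chosen    = $templates[array_rand($templates)];

$title = 'Article title';  // fetched from the database on a real site
$body  = 'Article body...';

include $chosen;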

Applicable websites: dynamic websites whose operators do not mind the inconsistent user experience.

What will the collector do: a single website rarely has more than ten templates, so the collector simply writes one collection rule per template and applies the matching rule to each page, as sketched below. If there really are more than ten templates, then given how much trouble the target site has gone to, the collector may well give up and move on.
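
A sketch of that per-template dispatch; the marker strings and regex patterns are assumptions for illustration:

<?php
// Sketch: detect which template a fetched page uses, then apply
// the matching collection rule. Markers and patterns are made up.
$html  = file_get_contents('http://www.example.com/article.html');
$rules = [
    'tpl-a' => '/<div class="tpl-a-body">(.*?)<\/div>/s',
    'tpl-b' => '/<td id="tpl-b-content">(.*?)<\/td>/s',
];

foreach ($rules as $marker => $pattern) {
    if (strpos($html, $marker) !== false && preg_match($pattern, $html, $m)) {
        $body = $m[1]; // extracted content
        break;
    }
}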

10. Using dynamic, irregular HTML tags
Analysis: this one is quite devious. Because HTML tags render the same with or without extra whitespace, <div> and <div  > (with trailing spaces) display identically, yet to a collector's pattern matching they are two different tags. If every page randomizes the number of spaces inside its HTML tags, the collection rules become invalid. And this has little impact on search engine crawlers.
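
A minimal PHP sketch of the trick, padding each opening tag with a random number of spaces before the closing bracket:

<?php
// Sketch: pad every opening tag with 0-3 random spaces, so <div>
// may be served as "<div >", "<div   >", etc. Browsers render all
// variants identically, but naive collection rules stop matching.
function randomizeTags(string $html): string {
    return preg_replace_callback('/<([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/',
        function ($m) {
            return '<' . $m[1] . $m[2] . str_repeat(' ', rand(0, 3)) . '>';
        },
        $html);
}

echo randomizeTags('<div><p>Hello</p></div>');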

Applicable websites: dynamic websites that do not mind violating web design standards.

What will the collector do: there is still a countermeasure. Plenty of HTML cleaners exist, so the collector simply normalizes the HTML tags first and then writes collection rules against the cleaned markup; running the cleanup before the rules still yields the required data (see the sketch below).
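
The cleanup itself is nearly a one-liner; a sketch that collapses the random whitespace before the rules run:

<?php
// Sketch: strip the random whitespace before ">" so that
// "<div   >" becomes "<div>" again before collection rules run.
function cleanTags(string $html): string {
    return preg_replace('/\s+>/', '>', $html);
}

echo cleanTags('<div ><p  >Hello</p ></div>'); // <div><p>Hello</p></div>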

Summary:
Once you have to serve search engine crawlers and fend off collectors at the same time, you are in a bind, because a search engine's first step is also to fetch the target page's content, which is exactly what a collector does. As a result, many anti-collection methods also hinder search engine indexing. Frustrating, isn't it? Although the above ten suggestions cannot block collection 100%, several of them applied together will already turn away a large number of collectors.