Scraper sites


A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data. Scraper sites come in various forms, ranging from spammy content sites to price aggregation and shopping sites, web search engines such as Yahoo, and online maps such as Google Maps.

Search engines such as Google could be considered a type of scraper site. Search engines gather content from other websites, save it in their own databases, index it, and present the scraped content to their own users. The majority of content scraped by search engines is copyrighted.
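The gather, store, index, and present steps can be illustrated with a toy inverted index. This is only a minimal sketch, not how any real search engine works: the `PAGES` dictionary stands in for content a crawler has already fetched, and the URLs, page text, and function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for pages a crawler has already fetched and stored.
PAGES = {
    "https://example.com/a": "Cheap flights to Lisbon and Porto",
    "https://example.com/b": "Lisbon travel guide with maps",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose stored copy contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids creating empty entries for unknown words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


INDEX = build_index(PAGES)
```

Calling `search(INDEX, "lisbon maps")` returns only the second URL, because it is the only stored copy containing both words. The point of the sketch is that every answer is served from the engine's own copy of someone else's content.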

Some scraper sites are created to make money through advertising programs. In such cases, they are called Made for AdSense (MFA) sites. This derogatory term refers to websites that have no redeeming value except to lure visitors to the website for the sole purpose of clicking on advertisements.

Made for AdSense sites are considered search engine spam that dilutes the search results with less-than-satisfactory listings. The scraped content duplicates what the search engine would have shown under normal circumstances, had no MFA website appeared in the listings.

Some scraper sites link to other sites to improve their search engine ranking through a private blog network. Prior to Google's update to its search algorithm known as Panda, a type of scraper site known as an auto blog was quite common among black hat marketers who used a method known as spamdexing.

The way websites are targeted depends on the objective of the scraper. For example, sites with large amounts of content, such as those of airlines, consumer electronics retailers, and department stores, might be routinely targeted by their competitors just to stay abreast of pricing information.

Another type of scraper pulls snippets and text from websites that rank high for the keywords it has targeted. Its operators hope to rank highly in the search engine results pages (SERPs) this way, piggybacking on the original page's page rank. RSS feeds are vulnerable to scrapers.
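One reason feeds are easy to copy is that they are already structured for machines to read. The sketch below shows this with Python's standard `xml.etree.ElementTree` module and an inline sample feed. The feed contents and the `extract_items` helper are hypothetical, and the code makes no network requests.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example needs no network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml):
    """Return one dict per <item>, holding its title, link, and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


ITEMS = extract_items(SAMPLE_FEED)
```

No HTML parsing or layout guessing is needed. A feed that publishes the full text of each article hands over the whole article along with its title and link.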


Wikipedia
