Saturday 27 April 2013

What's the difference between Data Mining and Screen Scraping?

Data mining isn’t screen-scraping. They’re actually two almost completely different concepts.

In a nutshell, screen-scraping allows you to get information, whereas data mining allows you to analyze information.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Today, "screen-scraping" most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
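To make the "pulling out data" part concrete, here is a minimal sketch in Python using only the standard library. The sample HTML, the class names "name" and "price", and the helper names ProductScraper and scrape_products are all made up for illustration; a real scraper would download the page first and would have to match whatever markup the site actually uses.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span> <span class="price">$24.99</span></li>
  <li class="product"><span class="name">Toaster</span> <span class="price">$31.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, price) tuples
        self._field = None   # which field the current <span> holds, if any
        self._current = {}   # fields gathered for the product in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a span we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one product: convert the price text to a number and store the row.
            price = float(self._current.get("price", "0").lstrip("$"))
            self.rows.append((self._current.get("name", ""), price))
            self._current = {}


def scrape_products(html):
    """Return a list of (name, price) tuples found in the given HTML text."""
    parser = ProductScraper()
    parser.feed(html)
    return parser.rows


print(scrape_products(SAMPLE_HTML))
```

Notice that nothing here analyzes the prices. The program only turns text meant for a screen into rows you could drop into a spreadsheet.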

Data mining, on the other hand, is defined in Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place.
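As a rough illustration of "searching for patterns," the sketch below counts which pairs of items tend to show up together in a set of shopping baskets. The baskets, the function name frequent_pairs, and the min_support threshold are all invented for the example, and simple pair counting is only one very small technique among the many that fall under data mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; assume they were collected earlier by any means.
BASKETS = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support of the baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    # Support = fraction of baskets that contain both items of the pair.
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


print(frequent_pairs(BASKETS, min_support=0.4))
```

The code never asks where the baskets came from. They could have been scraped, exported from a database, or typed in by hand, and the analysis would be the same.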

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks. For example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping” :) ). So it presents a bit of a problem: we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually search for.

Source: http://it.toolbox.com/wiki/index.php/What%27s_the_difference_between_Data_Mining_and_Screen_Scraping%3F

Note:

Delta Ray is an experienced web scraping consultant and writes articles on Hotelscombined Data Scraping, Hotelclub Data Scraping, Amazon Product Scraping, Linkedin Email Scraping, Screen Scraping Services, Yelp Review Scraping, Yellowpages Data Scraping, etc.
