Why is Web Crawling important?
Written by Subrat on May 15, 2012
Data lies at the heart of any business, even more so if it’s tech-related. With today’s open standards such as RSS feeds and APIs, sharing data across systems has become relatively easy.
For example, if you want to read today’s financial news directly from your email inbox, you could simply subscribe to the RSS feed of a provider such as Google News or the BBC. Similarly, your system or application could use a provider’s API to get up-to-date stock market prices. Feeds and XML make sharing data very easy, which is the whole reason they exist in the first place.
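To make the feed idea concrete, here is a minimal sketch in Python of how an application might read headlines out of an RSS feed. The feed text, its titles and the example.com links are made up for illustration, and `parse_feed` is a hypothetical helper name. A real application would download the XML from the provider’s feed URL instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet standing in for a real provider's feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Financial News</title>
    <item>
      <title>Markets close higher</title>
      <link>https://example.com/news/1</link>
    </item>
    <item>
      <title>Central bank holds rates</title>
      <link>https://example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


headlines = parse_feed(SAMPLE_FEED)
for title, link in headlines:
    print(f"{title}: {link}")
```

Because the feed is structured, a few lines of standard-library code are enough to pull out exactly the fields you need.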
But what about data that is unstructured or does not have an RSS feed for you to consume? How will you go about fetching it? You could always hire people to manually log on and save the info into an Excel sheet, but the process quickly becomes tedious and impractical.
Let’s take a simple example. You have a shopping site with 1,000 products, and you want to make sure your prices are competitive. To do that, you will need to monitor your competitors’ sites and their prices for the same products. If there are a lot of products and a lot of competitors, it is going to be very difficult to do this without some automated process.
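As a rough illustration of what such an automated check might look like, the Python sketch below pulls prices out of a competitor’s page and flags the products where they undercut you. Everything specific here is an assumption made for the example: the markup (`<span class="price" data-sku="...">`), the SKUs, the prices, and the names `PriceParser` and `find_undercut`. Real sites will use different markup, and fetching the page over HTTP is left out.

```python
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect {sku: price} from <span class="price" data-sku="...">$9.99</span>.

    The markup convention is hypothetical; adapt it to the real page.
    """

    def __init__(self):
        super().__init__()
        self.prices = {}
        self._current_sku = None  # SKU of the price span we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "price":
            self._current_sku = attrs.get("data-sku")

    def handle_data(self, data):
        if self._current_sku is not None:
            text = data.strip().lstrip("$")
            if text:
                self.prices[self._current_sku] = Decimal(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_sku = None


def find_undercut(our_prices, competitor_html):
    """Return {sku: (our_price, their_price)} where the competitor is cheaper."""
    parser = PriceParser()
    parser.feed(competitor_html)
    parser.close()
    return {
        sku: (ours, parser.prices[sku])
        for sku, ours in our_prices.items()
        if sku in parser.prices and parser.prices[sku] < ours
    }


# Made-up data: our catalogue and a competitor's product listing.
OUR_PRICES = {"A100": Decimal("19.99"), "B200": Decimal("5.49")}
COMPETITOR_PAGE = """
<ul>
  <li>Widget <span class="price" data-sku="A100">$18.50</span></li>
  <li>Gadget <span class="price" data-sku="B200">$5.99</span></li>
</ul>
"""

undercut = find_undercut(OUR_PRICES, COMPETITOR_PAGE)
for sku, (ours, theirs) in undercut.items():
    print(f"{sku}: we charge {ours}, competitor charges {theirs}")
```

Run on a schedule against every competitor page, a check like this replaces the person with the Excel sheet.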
This is where Web Crawling comes into the picture. There is a good chance you or your business will have a need for automated web crawling to gather data, which can then be processed to make business decisions. Web Crawling technology was made popular by Google through its use in their search engine. They were among the first to see the importance of the immense amount of data on the web, much of which had not yet been crawled and indexed. They capitalized on that: they sent out thousands of crawlers across the web and indexed everything they could possibly find!
Let’s scale down a bit and think just about your business. What would web crawling do for you? Here are a few things that come to mind:
- Gather data for business intelligence
- Market research about the product or service you are offering
- Monitor competitors’ products or solutions 24/7
- Gather user behavior data to make your product perform better
- Simply make your product more relevant with more content
- … and many more!
Can you think of a few?