Written by Subrat on May 26, 2017
A beginner’s guide to your favorite DIY web scraping tool
Just over a year ago, we introduced the all-new Grepsr along with a beta launch of our Chrome extension to fill the gap that Kimono Labs, a widely popular scraping tool, left after its closure. Now, after a year of iterating on both the UI and UX and shipping a couple of the most requested features, we think many of you will be as impressed with Grepsr for Chrome as our users have been.
But we still have some more work to do. Thanks David for the great feedback!
If you don’t know about Grepsr for Chrome, here’s how it works.
You have probably found yourself copying and pasting stuff (like contact info, emails, etc.) from the web and hating every minute of it. That’s the idea behind Grepsr for Chrome.
Grepsr takes an old-school concept and brings it into the digital age.
With Grepsr for Chrome, you can select data elements from the website you’re viewing using an intuitive point-and-click toolkit, and turn them into tables, spreadsheets, RSS feeds or JSON APIs in seconds.
People use it to scour the web (and other data sources) for publicly available information on people, products and businesses.
While there are certainly tools worth paying for, you don’t need expensive or complex software simply to organize your data ingestion process.
Grepsr for Chrome is one of those multi-purpose tools that helps you to do a lot for your business with very little time and work. And did I mention there’s a free plan?
Just by using Grepsr for Chrome, you’ll probably find that the way you move and consume data will become more streamlined. Everything will be in one place, easy to find and easier to manage.
This means no more manual anything: no writing messy code, no learning and configuring complex software, and no chasing overseas programmers to fix things when something breaks.
Sounds pretty amazing, right?
Let’s explore how to use Grepsr for Chrome and all of its cool features. I’ll show you, step by step, how to use it to make your data ingestion process so much better.
How to get started with Grepsr for Chrome
Before we get into some of the more advanced uses of Grepsr’s features, I’ll give you a crash course on how to use Grepsr for Chrome.
If you haven’t installed Grepsr for Chrome yet, here’s how to do it:
- Head to Browser Extensions and click the “Add to Chrome” button in the middle of the page.
- Once you’ve installed the extension, you’ll see this blue ‘g’ logo next to the address bar in your Chrome browser. Think of it as a Swiss Army knife that you carry with you while you surf the web, ready to convert unstructured data into a structured format.
As a demo, I’ll create a scraper to extract Amazon data.
We’ll drill down to an Amazon product category and then extract all the results, including some product-related data points. Once we’ve collected this data, we’ll look into how to make the best use of the Grepsr platform and its features.
Extracting Amazon Data
Simply navigate your browser to the product category of your choice on amazon.com. In this example I’m going to scrape the bestsellers from the “Video Games” product category.
Once the results page is displayed, click the Grepsr icon in your browser and you should see the Grepsr for Chrome toolbar at the top of the page, like the one below. With the toolbar active, everything on the page becomes selectable.
When you make your first selection, Grepsr for Chrome recognizes other elements that are structurally similar to what you clicked and will suggest them to you. In some cases, you might need to click on the second related item for the tool to suggest similar elements.
A great way to make sure you have the correct elements is by looking at the count. For example, I know that the product results page I’m viewing has 20 product items on it, so the toolbar should read “20 items selected”.
Now go ahead and name your new item group. In this case, it would be “image”.
Now it’s time to confirm the data. You can see the preview of the actual data your selection would collect under ‘SAMPLE DATA’. We selected the product image earlier, so the sample data is showing the image URL.
If you wish to collect additional data points on the Image element, just click on the down-pointing arrow under ‘EXTRACT’ and you’ll see a list of additional data points.
Make sure you click on the ⨁ icon to add each additional data element from the list (as shown in the GIF above). If you don’t click on the ⨁ icon, your current selection will replace the primary data point suggested by the tool.
Once you’re done saving the data field, pick the other ones as you did earlier and tag the elements. In this case I’ll tag:
- # of Reviews
- Seller Name
- Supported Platforms
Once you have tagged those, go ahead and click the ‘Next’ button.
The toolbar will then ask you to define the pagination type, as shown below.
Grepsr for Chrome supports all types of pagination, including the enhanced type. In this case, there is a numbered pagination, which is equivalent to a ‘Next’ pagination link. So, go ahead and select ‘Yes, has a “next” link’ and then tag the numbered pagination at the bottom of the page.
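If you’re wondering what a ‘next’ link pagination means in practice, the idea is simply to keep following one link until there is none left. Here is a minimal Python sketch of that idea on a made-up, in-memory set of pages. It is only an illustration, not how Grepsr is implemented; the `PAGES` structure and the `crawl` function are hypothetical.

```python
# Hypothetical site: each page holds some items and the key of the next page.
PAGES = {
    "page-1": {"items": ["game A", "game B"], "next": "page-2"},
    "page-2": {"items": ["game C", "game D"], "next": "page-3"},
    "page-3": {"items": ["game E"], "next": None},
}


def crawl(start, pages, max_pages=100):
    """Collect items by following 'next' links until none remains."""
    collected = []
    visited = set()
    current = start
    while current is not None and len(visited) < max_pages:
        if current in visited:  # guard against pagination loops
            break
        visited.add(current)
        page = pages[current]
        collected.extend(page["items"])
        current = page["next"]
    return collected


print(crawl("page-1", PAGES))
```

The `max_pages` cap and the `visited` set are there so a misbehaving site can’t keep the loop running forever.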
That will bring you to the review screen (as shown below) where you’ll see all the elements you’ve selected. Here you can edit the field names and/or add additional data points to pull from the EXTRACT dropdown list like we did before.
If everything seems all right, just click the ‘Continue’ button or you can ‘Go Back’ to extraction mode in case you missed anything.
Once you click ‘Continue’, you’ll be asked if you want to extract additional data fields from the detail pages as well. In the screenshot below, the extractor is asking me to choose from the product detail page or the reviews page.
In this case, we want to go to each product’s page to extract the two remaining data points, i.e. the seller’s name and the supported platforms. Go ahead and select the “$20 PlayStation Store…” link and click ‘Continue to Detail Page’.
On the product detail page, select and save the data fields like you did earlier and click Export.
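One way to picture this two-level setup is as a merge: the fields tagged on the results page and the fields tagged on the detail page end up side by side for each product. The Python sketch below shows that merge on made-up data. The field names, the values and the assumption that rows are matched by the product link are all mine, for illustration only, and may not reflect how Grepsr combines the data internally.

```python
# Made-up listing rows; "url" stands in for the link to each detail page.
listing = [
    {"url": "/p/1", "image": "https://example.com/1.jpg"},
    {"url": "/p/2", "image": "https://example.com/2.jpg"},
]

# Made-up detail-page fields, keyed by the same link.
details = {
    "/p/1": {"seller": "Seller One", "platforms": "PlayStation 4"},
    "/p/2": {"seller": "Seller Two", "platforms": "Xbox One"},
}


def merge_rows(listing_rows, detail_by_url):
    """Return one combined row per product; missing details add nothing."""
    return [{**row, **detail_by_url.get(row["url"], {})} for row in listing_rows]


combined = merge_rows(listing, details)
print(combined[0])
```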
As a final step, the extension will ask you if the page you’re viewing is behind a login. In our case it’s not, so just select ‘No’ and click ‘Continue’.
To collect the listings, you’ll need a Grepsr account.
Go ahead and create your account (or log in, if you’re already registered), and you’ll be brought to the project setup screen like the one below.
Once you’re done naming your project and report, click ‘Start Crawling’ and Grepsr will process your data.
And in a few seconds… Voila!
Grepsr App Interface
It usually takes anywhere from 10-15 seconds to a couple of minutes (depending on the queue and the volume) for the data to start streaming in under your report’s Data Preview tab. If you see any issues, you can always click ‘Edit in Chrome’ and go back to the extraction screen.
Once the crawl is complete, go to the API tab to test the endpoint or activate the JSON feeds. But if, like me, you prefer to download your data as CSV, head over to the Download tab and click the CSV download URL.
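Once you have the CSV, you can open it in any spreadsheet tool or process it with a few lines of code. Below is a minimal Python sketch using only the standard library. The column names (`image`, `reviews`) and the sample rows are assumptions I made up for illustration, so substitute the field names you entered in the extension.

```python
import csv
import io

# Hypothetical sample of an exported report; real column names depend on
# the field names you entered in the extension.
SAMPLE_CSV = """image,reviews
https://example.com/a.jpg,"1,204"
https://example.com/b.jpg,87
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per scraped item."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def review_count(row):
    """Convert a review count such as '1,204' into an integer."""
    return int(row["reviews"].replace(",", ""))


rows = load_rows(SAMPLE_CSV)
most_reviewed = max(rows, key=review_count)
print(len(rows), "rows; most reviewed:", most_reviewed["image"])
```

For a file on disk, you would pass the contents of the downloaded file to `load_rows` instead of `SAMPLE_CSV`.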
A possible next step here would be to schedule your crawls ahead of time. This basically instructs Grepsr to look for fresh data at the right time so you don’t have to.
For the free plan, scheduling is limited to once per month. So if you want to schedule your crawls daily or weekly, you would need to upgrade your plan.
Using our secure data connectors, you can send data to a variety of destinations for analysis or to build powerful apps. You can sync data to popular apps you already use, like Dropbox, Google Drive, Amazon S3, Box and more.
Better yet, you can plug Grepsr for Chrome into your app using a self-consuming JSON API and simply call it each time you need the data. As an example, if I call the Amazon API again a few days or a week later, the data will be different. So you have basically “APIfied” Amazon.
Aside from the export, the way to pull data from Grepsr is through the Report API. In this case, you call the default URL.
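Consuming the report from your own code could look roughly like the Python sketch below. Everything specific in it is an assumption: `REPORT_URL` is a placeholder rather than a real Grepsr endpoint (copy the actual URL from your report’s API tab), and the payload is assumed to be a JSON array of row objects, which may not match what the API actually returns.

```python
import json
from urllib.request import urlopen

# Placeholder only: replace with the URL shown in your report's API tab.
REPORT_URL = "https://example.com/your-report-endpoint"


def parse_report(payload):
    """Decode a JSON payload, assumed to be a list of row objects."""
    rows = json.loads(payload)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


def fetch_report(url=REPORT_URL):
    """Download the report and return its rows (performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return parse_report(response.read().decode("utf-8"))


# Offline demonstration with a made-up payload instead of a live request.
sample = '[{"image": "https://example.com/a.jpg", "reviews": "87"}]'
rows = parse_report(sample)
print(rows[0]["reviews"])
```

Keeping the parsing separate from the download makes it easy to adjust `parse_report` once you have seen the real response, without touching the network code.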
Lastly, we’re always there to help. We know web scraping projects are often complicated with various layers of details and requirements, and a software-only solution isn’t always enough. So we built a communication doorway under the Support tab for each of your projects.
Use Messages to raise support tickets, discuss requirements, upload files, and ask any questions about a project, all in one place rather than all over the place.