Web Scraping

Frequently Asked Questions

Let us answer the most asked questions about Grepsr

Do I need to install anything on my computer to get my data via Grepsr?

Only our Google Chrome extension, Grepsr for Chrome, and only if you want to do everything (data field tagging, project setup, etc.) yourself.
The Grepsr app platform, on the other hand, is entirely web-based, and works on any OS and browser — although we recommend Google Chrome or Mozilla Firefox. So the only thing you’ll need is a web browser and a working internet connection.

How much data can I collect?

We have no hard limits, but for a non-enterprise plan, the typical limit is 50,000 records per month. If you have a comprehensive crawl requirement that contains millions of records, contact us for a discounted bulk price.

How will I receive my data once it’s scraped?

An email will be sent to you as soon as the crawl run is complete and your files are exported.
Alternatively, you can manually download your data in your preferred format from the Download tab.
Furthermore, you can set up automated data delivery to sync your Grepsr files to your preferred storage locations (Dropbox, Google Drive, Amazon S3, Box, FTP) by authorizing the extractor with the respective filesystems on the Data Delivery tab. More information here.

What file formats is the data available in?

CSV, XLSX, JSON, XML and YAML. Users can customize the format(s) on the Data Delivery tab.
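
Once a file is downloaded, it can be read with standard tooling. The snippet below is a minimal sketch in Python, not an official Grepsr client: the `load_records` helper, the sample column names, and the assumption that a JSON export is an array of objects (one per record) are all hypothetical and may not match your project's actual output.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse the text of an exported file into a list of row dictionaries."""
    if fmt == "csv":
        # DictReader uses the header row as the keys for every record.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        # Assumption: the JSON export is an array of objects, one per record.
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample exports; real column names depend on your project.
csv_text = "name,price\nWidget,9.99\nGadget,19.50\n"
json_text = '[{"name": "Widget", "price": "9.99"}, {"name": "Gadget", "price": "19.50"}]'

csv_records = load_records(csv_text, "csv")
json_records = load_records(json_text, "json")
print(len(csv_records), csv_records[0]["name"])
```

Both calls produce the same list of dictionaries here, so downstream code does not need to care which of the two formats was chosen. Note that CSV values always arrive as strings and need converting if you want numbers.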

What is the lowest frequency you can scrape data at?

For our Concierge service, once a month is the lowest frequency we offer. Any interval longer than a month incurs a new setup fee, since the crawl will be treated as a new project.
For Chrome projects, this only applies if or when your project goes beyond your plan’s monthly limits.

Are you able to extract data from sites that require a login?

Yes, but we require the login credentials from our clients. However, we may not be able to help if the site uses a CAPTCHA or blocks automated logins.

Do you have any referral program?

No, not at the moment. But we’ll definitely post a policy change announcement on our blog page if anything changes in this regard. So stay tuned!

Grepsr for Chrome isn’t scraping the data fields I need accurately. What should I do?

Grepsr for Chrome is a simple tool built to work on simple, well-structured, single-page websites that don’t have a multi-layer structure. That’s why we’ve built Grepsr Concierge for complex or poorly designed websites and multi-level scraping.
On Concierge, users tell us their project requirements, and they don’t have to be involved in any of the setup or monitoring processes, while we take care of the end-to-end data delivery!
Find out more about Grepsr Concierge here.

[Grepsr Concierge] Why do I suddenly see no data even though the crawl has already completed?

Web scraping, or data scraping as it is commonly referred to, is a popular data extraction technique employed by multiple organisations to gather information from various sources. A web crawler or scraper is an automated software script that scours webpages to gather relevant information for the company. Think of it as a stealth ninja that is going around a neighbourhood gathering clues to a case.
This ninja faces roadblocks in the form of website security, especially in the case of those web pages that churn out multiple data requests. Our data crawlers face the following challenges from time to time:

  • Websites may change algorithms to ban crawlers,
  • Websites may be down or facing issues,
  • Content may be location-specific, etc.
Any of these can leave a completed crawl run with no data. If you’re facing issues while web scraping, please let us know and we’ll get it sorted as soon as possible. Who knows, we may be able to add a permanent fix for an issue thanks to your error report. Everybody wins!

[Grepsr Concierge] How long does an extraction take to complete?

There isn’t an exact timeframe that we can put on any extraction, as it depends on the requirements of the project. A project that gathers data from fewer sources can typically be finished much faster than one that requires scraping more websites. Other factors include the complexity of the website and crawler setup, the use of proxies (to bypass location restrictions), etc.
At Grepsr, once we have a clear understanding of the project requirements, we can get a sample ready within a couple of days and mention the estimated timeline of delivery before we start. After that, data is delivered as per the specified frequency.

[Grepsr Concierge] Can you scrape images as files?

Yes! Our web crawlers can scrape images in the form of either URLs or files. Scraping as files requires extra effort and, as a result, will incur an additional charge. The image files will be zipped and emailed/synced with the rest of your data.
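
If you want to check which image files arrived in the zip, a short script can list them. This is only a sketch: the archive layout, the file names, and the `list_images` helper are hypothetical, and the example builds a small stand-in archive in memory instead of reading a real delivery.

```python
import io
import zipfile

# Extensions treated as images in this sketch; adjust to your data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def list_images(archive_bytes):
    """Return the sorted names of image files inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )


# Build a tiny stand-in archive in memory (hypothetical layout and names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/product-1.jpg", b"fake image bytes")
    archive.writestr("images/product-2.PNG", b"fake image bytes")
    archive.writestr("data.csv", "name,image\n")

image_names = list_images(buffer.getvalue())
print(image_names)
```

For a real delivery, you would read the downloaded file's bytes (or pass its path straight to `zipfile.ZipFile`) instead of building the archive in code.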

[Grepsr Concierge] Can we see a proof of concept before we commit to a payment plan?

In order to provide a PoC or sample data, we need to set up our crawlers as if for a fully-fledged scraping project. Because of the time and effort this entails, we only take a project on once payment is received.
That said, we do provide a PoC before moving on to the full crawl run. If you’re not satisfied with the quality of that sample, then we can make the requested modifications or even offer a full refund.

[Grepsr Concierge] Once my project is set up, can I start a crawl myself?

Once the project has been initiated, you can set up the crawl to suit your project specifications in the following ways:

  • Manual Runs: Hover over the settings icon (the cog icon next to search) on your project’s Data Preview tab and click an option labelled “Report Re-run”. Clicking this will start a new crawl right away.
  • Scheduled Runs: If you want to automate your crawl runs for specific intervals, you can customize a schedule from the Schedule Crawl tab on your report. More information on the Scheduler here.

[Grepsr Concierge] I want my colleagues to also have access to some of the projects. Is this possible?

Yes! Grepsr is an out-and-out team tool. You can always invite team members to collaborate on Grepsr, and decide the degree of involvement you are comfortable with on a per-project basis.

  • If you only want them to access specific projects, just head over to the Team tab on the respective project page and invite them via email or share the Team Invite Link.
  • If you want to grant access to all projects, you can do so by visiting the Team page under your account (either via the Team link on the top menu or Your Name > Account > Team) and following the same steps as mentioned above.