How much data can I collect?
There is no limit to how much data you can collect. Data projects are priced based on scale and complexity.
How does the data subscription work and how is it priced?
Customers with recurring data needs are billed monthly in arrears, with an initial one-time setup fee. Customers are billed either a flat monthly fee or based on metered usage; the latter is reserved for high-volume projects. Other billable fees for consulting and technical support are agreed in advance before they’re added to your invoice.
Do you have any referral program?
Yes, we have a Referral Partner Program where our partners are rewarded handsomely for bringing us qualified leads.
For more information about this and our other partnership models, please visit our partnership page.
How long does it take to extract data once the requirements are clear?
It’s hard to put an exact timeframe on our lead time, as it depends entirely on the data requirements, such as the number of sources and their complexity. Our customers value us for quick turnaround, and on average a typical project is completed in days, not weeks.
We set a clear expectation of timeline beforehand and aim to get the initial sample ready within a couple of days.
Is web scraping legal?
Scraping publicly available data is perfectly legal so long as 1) it does not violate the source site’s terms of service, 2) the data is not copyrighted, and 3) the data does not contain Personally Identifiable Information (PII). It’s fair to say this is a contested and often misunderstood topic. You can read more about the legalities of web scraping in our blog here.
Can you scrape images as files?
Yes! Our web crawlers can scrape images in the form of either URLs or files. Scraping as files requires extra effort and, as a result, will incur an additional charge. The image files will be zipped and emailed/synced with the rest of your data.
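If it helps to picture the difference between the two delivery modes, here is a minimal Python sketch of the file option, where each image URL in the extracted data is downloaded and the files are bundled into a single archive. The URLs and file names are placeholders, not project data.

```python
# A minimal sketch of "images as files": each image URL in the dataset is
# downloaded and the files are bundled into one archive for delivery.
# URLs and file names below are placeholders.
import zipfile
from pathlib import Path

import requests

image_urls = [
    "https://example.com/products/101.jpg",
    "https://example.com/products/102.jpg",
]

out_dir = Path("images")
out_dir.mkdir(exist_ok=True)

for url in image_urls:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    # Name each file after the last path segment of its URL
    (out_dir / url.rsplit("/", 1)[-1]).write_bytes(resp.content)

# Bundle the downloaded files into a single zip
with zipfile.ZipFile("images.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in out_dir.iterdir():
        zf.write(path, arcname=path.name)
```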
Can I get the raw HTML along with structured data?
Certainly! We can pull the underlying HTML along with structured data. We can also have the HTML output automatically deposited in your cloud storage platform.
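As a rough illustration of what that pairing looks like (the URL and CSS selectors below are hypothetical, not tied to any real project), the raw page source is stored as-is while selected fields are parsed into a structured record:

```python
# A rough illustration of "raw HTML alongside structured data": the page
# source is kept untouched while selected fields are parsed into a record.
# The URL and selectors are hypothetical.
import json

import requests
from bs4 import BeautifulSoup

url = "https://example.com/product/widget"
html = requests.get(url, timeout=30).text

# Keep the untouched page source
with open("widget.html", "w", encoding="utf-8") as f:
    f.write(html)

# Extract a structured record from the same source
soup = BeautifulSoup(html, "html.parser")
title = soup.select_one("h1")
price = soup.select_one(".price")
record = {
    "url": url,
    "title": title.get_text(strip=True) if title else None,
    "price": price.get_text(strip=True) if price else None,
}

with open("widget.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2)
```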
How does Grepsr ensure quality data?
We’ve built several quality controls, both platform-based and with humans in the loop, to meet quality standards.
Platform-based controls
- Notification triggers that execute at run-time to identify chokes and failures during crawler execution, plus system monitors to catch system-wide errors
- A defined data schema that sets acceptable formats, and anomaly detection against historical data (see the sketch after this list)
- Quality and operational dashboards to monitor project health, with custom reporting for key accounts to analyze key metrics
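To make the schema and anomaly checks above a little more concrete, here is a simplified sketch assuming Python and the jsonschema package; the fields and thresholds are illustrative only, not Grepsr’s actual rules:

```python
# A simplified sketch of two of the controls above: a data schema that rejects
# records in unacceptable formats, and a basic anomaly check that compares the
# latest record count against recent history. Fields and thresholds are
# illustrative only.
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "price": {"type": "number", "minimum": 0},
        "url": {"type": "string"},
    },
    "required": ["name", "price", "url"],
}

def is_valid(record: dict) -> bool:
    try:
        validate(instance=record, schema=schema)
        return True
    except ValidationError:
        return False

def looks_anomalous(latest_count: int, history: list[int], tolerance: float = 0.5) -> bool:
    # Flag the run if the record count drops far below the historical average
    baseline = sum(history) / len(history)
    return latest_count < baseline * tolerance
```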
Quality experts
- Validate the initial setup in consultation with the customer to ensure quality compliance
- Manually QA a randomized sample set per SLA terms
- Proactive communication and resolution (within 24 hours, unless the source undergoes wholesale changes)
Can we see a proof of concept before we commit to a payment plan?
In order to pull data, we need to set up crawlers no differently than we would for a full-fledged project. Because of the time and effort this entails, we only take on a project once payment is received.
That said, for every project, we provide a sample dataset before moving on to full production. This ensures the data matches the agreed scope and meets the quality criteria. If you’re not satisfied with the sample, we’re happy to make modifications or even offer a full refund.
Why do I suddenly see no data even though the crawl has already completed?
A crawler may return no data due to 1) technical failures on our end, 2) roadblocks encountered in transit such as CAPTCHAs and IP bans, or 3) changes in the source system.
Our advanced data infrastructure allows us to work around complex security controls. Our technology platform has system and data quality monitoring built in to proactively handle outages, failures and data quality issues.
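As a rough illustration of the second case, a crawl can finish without data when the source serves a block page instead of content. A check along the following lines (with hypothetical markers, not our actual monitoring code) is the kind of signal run-time monitoring looks for:

```python
# A simplified illustration of why a "successful" crawl can still yield no
# data: the source may answer with a block page instead of content. The
# markers below are common signals, not an exhaustive or Grepsr-specific list.
import requests

def classify_response(resp: requests.Response) -> str:
    if resp.status_code in (403, 429):
        return "blocked"   # likely IP ban or rate limiting
    if "captcha" in resp.text.lower():
        return "captcha"   # challenge page served instead of content
    if not resp.text.strip():
        return "empty"     # source returned nothing to parse
    return "ok"

resp = requests.get("https://example.com/listings", timeout=30)
print(classify_response(resp))
```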
Can I schedule crawlers to automate data collection? Or run them manually when needed?
Absolutely! You can run crawlers manually on an ad-hoc basis or create recurring schedules to automate your crawl runs. Scheduled runs work like clockwork, simplifying your data acquisition workflow.
Read more about scheduling crawlers in our platform documentation here.
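Schedules are configured in the platform, but if it helps to reason about a recurring schedule as a cron-style expression, here is a small sketch using the croniter package; the expression is only an example, not a required format:

```python
# A small sketch of reasoning about a recurring schedule as a cron expression.
# "0 6 * * 1" means every Monday at 06:00; the expression is only an example.
from datetime import datetime

from croniter import croniter

schedule = croniter("0 6 * * 1", datetime(2025, 1, 1))
for _ in range(3):
    print(schedule.get_next(datetime))  # next three run times
```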
How will I receive my data once it’s scraped?
For large-scale data collection, we automatically deliver the output to your preferred cloud storage location. We support Amazon S3, Google Cloud, Azure Cloud, Dropbox, Box, FTP and more. You must authorize the respective storage destination before we can deliver the output there.
Output can also be manually exported from the platform. Learn more about how you can integrate with Grepsr in our platform documentation here.
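For readers who want to picture the Amazon S3 case, automated delivery boils down to something like the following sketch using boto3, with placeholder bucket and file names; the other destinations work analogously through their own SDKs:

```python
# A minimal sketch of the S3 leg of automated delivery: once a run completes,
# the export file is pushed to a bucket you have authorized. Bucket name, key,
# and file name are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="run-2025-01-01.csv",        # local export produced by the crawl
    Bucket="your-company-data-lake",      # bucket you authorize for delivery
    Key="grepsr/exports/run-2025-01-01.csv",
)
```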
What file formats is the data available in?
We support common formats such as CSV, XLSX, JSON, XML and YAML. Contact us if you need a custom format that is not supported out-of-the-box.
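To illustrate, the same extracted records can be serialized into several of these formats with a few lines of Python; the records below are dummy data:

```python
# A quick sketch of the same extracted records serialized into three of the
# supported formats. The records are dummy data for illustration.
import json

import pandas as pd
import yaml

records = [
    {"name": "Widget A", "price": 9.99},
    {"name": "Widget B", "price": 14.50},
]

pd.DataFrame(records).to_csv("data.csv", index=False)

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

with open("data.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(records, f)
```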
Can I add my colleagues to work on my data projects?
Do you still have a question?
You can always contact us. We'll try and get back to you as soon as possible!