Introducing Grepsr’s Data Quality Report
Written by Asmit Joshi on June 18, 2019
Quality assured data to help you make the best business decisions
Data acquisition is expensive and time-consuming if you try to manage every aspect on your own.
You may get by this way when you are just starting your business, but over time you’ll need a more efficient, cost-effective solution that can handle exponentially larger volumes of data and automate both extraction and delivery.
Your Data Quality Matters, More Than You Think
An important factor to consider is the quality of the data that enters your system. With data now the backbone of every modern industry, it’s imperative that businesses have access to the most reliable data. Only the highest quality data ensures you make the best data-driven decisions.
“Good quality data empowers business insights and starts new business models in every industry. It allows enterprises to generate revenue by trading data as a valuable asset.” – Mei Yang Selvage, research director, Gartner
The financial stakes are clear in figures from research firm Gartner’s survey, which estimated that poor data quality cost businesses around $15 million on average in 2017. Gartner’s research director adds, “Not only are organizations taking a financial hit, poor data quality practices undermine digital initiatives, weaken their competitive standing and sow customer distrust.”
Grepsr — the Perfect Solution
If you have Grepsr as your data provider, then you already know that you’re always getting the most accurate, up-to-date and complete data rapidly delivered to your system. Within a short space of time, we’ve become one of the leading DaaS solutions for thousands of individuals and businesses, big and small.
To add to the top-of-the-line service that we already provide, we’re now introducing a major new addition to enhance your Grepsr experience — the Data Quality Report.
Users will now be able to assess the quality of the data being collected and easily identify any issues. Since the report makes everything transparent, users will also be able to view the completeness of crawl runs and compare the latest or average fill rates against previous runs of the same report.
This QA report, titled ‘Data Quality’, is placed between the Activity and Team tabs, and is available to all users on both the Grepsr Concierge and Grepsr for Chrome apps.
The first thing users will notice is an interactive graph that shows the number of records and requests, along with the report’s run time (in minutes). The exact statistics can be viewed on mouse-over. The time range can be set via the dropdown at the top right. The current options are ‘Last 10 Runs’, ‘Last 7 Days’, ‘This Month’ and ‘Last 30 Days’.
For more detail, the report also lists the exact record and request counts, crawl run times and fill rates.
Individual data field fill rates (%) and sizes (character lengths) for each data sheet are also listed. Both are compared to their respective average values, and the difference is colour-coded: green means the latest value is unchanged or higher than the average, while red means it has decreased.
Users can toggle the view to show these details either on a single row or as grids.
Here’s what some of the key terms actually mean:
- Record Count: Each row of data in a sheet is a record. The record count is the total number of rows the crawler is able to extract.
- Request Count: To fetch records, the crawler sends requests to the website. Each request returns a certain amount of data, or record(s).
- Fill Rate (data field): The percentage of records for which data was available and extracted for a particular data field (column).
- Fill Rate (report): The overall fill rate across all data fields in the report, expressed as a percentage.
- Size (data field): The average character length of the values in the data field.
- Time Taken:
  - Total: The sum of all crawler run-times of the report.
  - Average: The average time it takes a crawl run to complete.
  - Latest: The run-time of the most recent crawl run.
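To make the definitions above concrete, here is a minimal Python sketch of how these metrics could be computed from a sheet of records. It is an illustration only, not Grepsr’s implementation. The function names are hypothetical, and the sketch makes three assumptions: a value counts as missing if it is `None` or blank, the field size is averaged over non-empty values only, and the report-level fill rate is the plain average of the field fill rates.

```python
from statistics import mean


def is_filled(value):
    """Assumption: None and blank/whitespace-only strings count as missing."""
    return value is not None and str(value).strip() != ""


def field_fill_rate(records, field):
    """Percentage of records that have a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if is_filled(record.get(field)))
    return 100.0 * filled / len(records)


def field_size(records, field):
    """Average character length of the non-empty values in `field`."""
    lengths = [len(str(r[field])) for r in records if is_filled(r.get(field))]
    return mean(lengths) if lengths else 0.0


def report_fill_rate(records, fields):
    """Assumption: the report fill rate is the mean of the field fill rates."""
    if not fields:
        return 0.0
    return mean([field_fill_rate(records, f) for f in fields])


# Hypothetical sample sheet: each dict is one record (row).
records = [
    {"name": "Acme Ltd", "phone": "555-0100"},
    {"name": "Globex", "phone": ""},
    {"name": "Initech", "phone": None},
    {"name": "Umbrella", "phone": "555-0199"},
]

record_count = len(records)
name_fill = field_fill_rate(records, "name")
phone_fill = field_fill_rate(records, "phone")
overall_fill = report_fill_rate(records, ["name", "phone"])
name_size = field_size(records, "name")

print(record_count, name_fill, phone_fill, overall_fill, name_size)
```

In this sample, every record has a name but only half have a phone number, so the two field fill rates differ and the report-level figure sits between them.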
Possible Inconsistency Scenarios
Occasionally, the acquired data may differ noticeably from previous runs. This is almost always caused by changes made to the source websites, which are beyond Grepsr’s control. Our Data Quality Report automatically identifies such instances and highlights in red the data fields that require immediate attention.
Let’s take a look at some of these scenarios.
Drastically Reduced Data Field Fill Rate
There could be an instance when our crawler fails to extract data that it would normally fetch without any hassle. In such cases, the crawler searches for that particular field, and having failed to find it, leaves the field value blank (or null) and moves on to the next data field.
This mainly happens when a website changes its layout, meaning the data field is moved to a different location within the website, so the crawler isn’t able to find it where it normally would.
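A drop like this can be spotted by comparing the latest fill rate of each field with its historical average. The Python sketch below illustrates the idea under stated assumptions: the function name, the data layout (one dictionary of field fill rates per run) and the 20-point threshold are all hypothetical, and this is not how the Data Quality Report itself is implemented.

```python
from statistics import mean


def flag_fill_rate_drops(history, latest, max_drop=20.0):
    """Return fields whose latest fill rate fell more than `max_drop`
    percentage points below their historical average.

    `history` is a list of {field: fill_rate} dicts, one per past run.
    `max_drop` is an arbitrary illustrative threshold.
    """
    flagged = {}
    for field, latest_rate in latest.items():
        past = [run[field] for run in history if field in run]
        if not past:
            continue  # no baseline to compare against
        drop = mean(past) - latest_rate
        if drop > max_drop:
            flagged[field] = round(drop, 1)
    return flagged


# Hypothetical fill rates (%) from three earlier runs and the latest run.
history = [
    {"title": 100.0, "price": 98.0},
    {"title": 100.0, "price": 97.0},
    {"title": 99.0, "price": 99.0},
]
latest = {"title": 100.0, "price": 0.0}

flagged = flag_fill_rate_drops(history, latest)
print(flagged)
```

Here the `price` field goes from almost fully populated to empty, which is the pattern a layout change on the source website would typically produce.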
Sudden Change in Field Size
Another possible scenario is when a particular data field’s size (character length) is suddenly much lower than average despite the fill rate remaining relatively constant.
For example, say you’re fetching product descriptions from a retailer’s website and you notice that the size of this field on the latest run has dropped to 10-15 characters, from an average of 1000+ on previous runs.
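This scenario combines two signals: a stable fill rate and a sharp fall in field size. A rough Python sketch of such a check follows. The function name and both thresholds (`min_ratio` and `fill_tolerance`) are hypothetical values chosen for illustration, not settings used by Grepsr.

```python
def size_collapsed(avg_size, latest_size, avg_fill, latest_fill,
                   min_ratio=0.5, fill_tolerance=5.0):
    """True when the fill rate is roughly unchanged but the field size
    has shrunk to less than `min_ratio` of its average.

    Sizes are average character lengths; fill rates are percentages.
    Both thresholds are illustrative assumptions.
    """
    fill_stable = abs(latest_fill - avg_fill) <= fill_tolerance
    shrank = avg_size > 0 and latest_size / avg_size < min_ratio
    return fill_stable and shrank


# Product descriptions: about 1000 characters on average, 12 on the latest run.
description_alert = size_collapsed(avg_size=1000, latest_size=12,
                                   avg_fill=99.0, latest_fill=98.5)
print(description_alert)
```

A field whose fill rate also fell would not be flagged by this check, since that case belongs to the reduced fill rate scenario described earlier.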
Increase Isn’t Always Good
Say you have a field containing dates in the format MM-DD-YYYY (10 characters). In your latest run, you notice that the character length has increased to 11. Ideally, the size of this field should remain constant regardless of the fill rate, so even a single extra character is not a good sign.
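For a fixed-format field like this, the individual values can also be validated directly. The sketch below is one hypothetical way to do so in Python. It assumes the MM-DD-YYYY format from the example above, and the function names are illustrative.

```python
import re
from datetime import datetime

# Exactly two digits, two digits and four digits, separated by hyphens.
DATE_PATTERN = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{4}")


def is_valid_date(value):
    """True only for real calendar dates written as MM-DD-YYYY."""
    if not DATE_PATTERN.fullmatch(value):
        return False  # wrong length, stray characters or missing padding
    try:
        datetime.strptime(value, "%m-%d-%Y")
    except ValueError:
        return False  # right shape, but not a real date (e.g. month 13)
    return True


def malformed_dates(values):
    """Return the values that do not follow the expected date format."""
    return [v for v in values if not is_valid_date(v)]


# Hypothetical values from the latest run; note the trailing space in the second.
values = ["06-18-2019", "06-18-2019 ", "13-01-2019", "06-18-20199"]
bad = malformed_dates(values)
print(bad)
```

The pattern check runs first because `strptime` on its own also accepts unpadded values such as `6-8-2019`, which would not be 10 characters long.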
A similar example is shown on the screenshot below:
Grepsr’s robust engineering ensures that our QA team is alerted as soon as such issues arise. So you can rest assured that we have it covered, and we’ll get things back up and running in next to no time.
An Asset to Business Growth
As mentioned earlier, the quality of the data entering the system is key to business growth.
Here’s how the Grepsr app’s latest feature aids users and businesses:
- It provides key data points and metrics, giving users an accurate understanding of their data pipeline’s health.
- Since the report also gives a historical overview, users are able to quickly identify any irregularities in the data before it is integrated into their system.
- By analyzing fill rates of critical data fields (names and addresses of leads, for example), users can quickly assess the situation and apply remedial actions (restructure parameters, restart crawl runs, notify Grepsr experts, etc.).
Here at Grepsr we’re as committed as ever to offering the best products and services to our customers. We’re constantly testing and introducing new features, so look forward to more exciting updates in the future.
Not a Grepsr user yet? It’s easy to sign up and get started!