It’s been a busy year for all of us at Grepsr! We may not have kept everyone up to date with what’s happening, but we’ve quietly been working on a few things behind the scenes :-) We are a bootstrapped startup with a small but efficient team. Sometimes we get so consumed with daily operations that we don’t find the time to update our blog. Bad habit, I know!

Things will change as we grow into a bigger company. We will soon be processing our 50 millionth data stream! We are thrilled with this milestone, especially because this traction has come mostly from users who’ve referred other users to our service. We haven’t done any serious marketing for our product yet; it’s almost been like a “closed beta” for the last year!

We’re also very happy to be releasing a completely new and simplified version of Grepsr in May 2013. We launched Grepsr back in 2012, and after a year of heavy use and tons of suggestions from our valued customers, we’ve incorporated lots of changes into the product. I am sure these changes will go a long way toward improving the overall user experience. Here are some of the things we’ve worked on:

  1. User Experience – We decided to completely redo the user experience so that everything from project creation to data delivery is painless. You will now find everything in one place. And not to forget: support for mobile devices.
  2. Speed! Speed! Speed! – We’ve had some growing pains, as users with lots of data on Grepsr will know. We’ve completely optimized the way we store data, so load times are now virtually unnoticeable, even for millions of records.
  3. Search – Search is not our core business, but we’ve spent some time optimizing our backend so that you can search your data quickly. The last time we benchmarked, we could search 80 million records in less than a second!
  4. Browser Plugin – Our previous browser plugins were not well integrated, and users had a hard time using them. This time we’ve fixed all of that, and you can now easily take web page snapshots from anywhere!

We will upload some screenshots of the new version soon. If you are an existing customer and would like to try the new version out, please drop us an email.

Grepsr at Startup Asia event in Jakarta

Posted on 11 Jun 2012, by Subrat

We are just back from an awesome startup event in Jakarta, Indonesia, organized by TechInAsia. Big investors and experts from the Asian tech industry were at the event. We shared the stage with 15 other startups who pitched their products in front of a big crowd – it was an amazing experience!

The reception of our service was great! People were enthusiastic about what we offered, and they could clearly see how our service could help. We collected feedback and suggestions from various experts in the tech industry on what Grepsr needs to do better, and we have already started acting on them. Our focus now is not only on crawling the web and fetching data, but also on processing that data so it makes more sense to our users. Making sense of the data you have collected is a tricky problem for any organization.

We want to add value to our service by doing all the processing on our end, so that users don’t need to spend their own time and resources making sense of the data! That’s going to be our goal for the next 6 months. It’s safe to say that we are slowly moving into the realm of Big Data processing – stay tuned!

Here’s Amit pitching at the event, to an enthusiastic crowd!

Amit pitching at StartupAsia

Web Crawling Software or Web Crawling Service

Posted on 30 May 2012, by Subrat

Some people ask us whether we are a “service” or a “software”. We simply tell them: we are a service, with killer software running behind the scenes! :) Many of our customers also ask us why they should choose a Web Crawling Service over Web Crawling Software. The answer is pretty straightforward. Both solutions have their own advantages, and the most important factors in deciding between them are:

  1. Time and Resources
    This is probably the most important factor. Running your own simple web crawler may not be that difficult. You can find web crawling software that runs right off your desktop. But you will have to do some configuration, and you will need to be well versed in technical concepts such as XPath and regular expressions. Also, do you have the time to step away from your company’s or product’s end goal and learn how to write or work with crawlers? Probably not.
  2. Scalability and Maintenance
    Running your own crawler, or web crawling software from your desktop, may not scale. What if you need to crawl 10,000 different pages or links? What about technical problems? Do you have the network bandwidth to do this on a bigger scale? There are many hidden hassles when it comes to crawling the web. Web crawlers are also known to break when the site being crawled changes, so you will need to take maintenance into consideration.
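To make the configuration and maintenance points concrete, here is a minimal Python sketch of a single regular-expression extraction rule. The HTML snippets and the `price` class name are invented for illustration; many crawlers use XPath or an HTML parser instead, but they fail the same way: when the site changes its markup, the rule quietly stops matching.

```python
import re

# Hypothetical extraction rule: pull product prices out of a page's HTML.
# The class name "price" is an assumption about the target site's markup.
PRICE_RULE = re.compile(r'<span class="price">\$([0-9]+(?:\.[0-9]{2})?)</span>')


def extract_prices(html: str) -> list[float]:
    """Return every price matched by the extraction rule, as floats."""
    return [float(value) for value in PRICE_RULE.findall(html)]


# Yesterday's markup: the rule matches as expected.
old_page = ('<li><span class="price">$19.99</span></li>'
            '<li><span class="price">$5.00</span></li>')

# Today the site renamed its CSS class; the same rule silently finds nothing.
new_page = '<li><span class="product-price">$19.99</span></li>'

print(extract_prices(old_page))  # [19.99, 5.0]
print(extract_prices(new_page))  # []
```

Note that nothing raises an error in the second case; the crawler simply returns no data, which is why someone has to notice the breakage and update the rule.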

If the above matters to you or your business, then a service is definitely the wiser choice. At Grepsr, we have designed our system to handle both of the scenarios above. Our core focus is providing our customers with just the processed final data while hiding all the technical details involved. We run our crawlers in the cloud, so they scale to your requirements.

Synchronizing data extracted with Grepsr

Posted on 23 May 2012, by Subrat

One of Grepsr’s most powerful features is the ability to synchronize data in a variety of ways whenever new data is available. The whole idea behind this feature is to automate our data delivery process. It would be tedious to log in to your Grepsr account every day to check for new data. If you are comfortable doing that, no problem :-) but as users ourselves, we’d rather have it sync automatically.

I’ve already touched on FTP and Dropbox in an earlier post. Here I’ll try to explain all our sync options and how you could use them in real-world scenarios.

Grepsr Data Synchronize Options

There are currently 5 data sync options. They are:

  • Call Back URL
  • DropBox
  • Email
  • FTP
  • Google Docs

Scenario #1: You run a shopping website and you need prices compared with your competitors once every day – Callback URL

You would want to use the Callback URL option for this. Grepsr posts the latest data to your system via HTTP POST. You need to set up a callback URL, typically something like http://www.yourdomain.com/callback.php, with a script that reads the data posted to it. Your callback script would then import this data into your database (MySQL, MongoDB, MS SQL, etc.) and perform whatever operations you need on it. Perhaps match the product SKUs against your own database and reduce your price by 1% to stay competitive? It’s your call!

Callback URLs are ideal when there is a lot of data.
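The callback flow described above can be sketched in Python with only the standard library. This is a minimal illustration, not Grepsr’s documented payload: it assumes the POST body is a JSON list of records with hypothetical `sku` and `price` fields, and the SKUs and prices are made up. It applies the 1% price cut from the scenario whenever a competitor is cheaper.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-house catalog (SKU -> your current price). In practice this
# would live in MySQL, MongoDB, MS SQL, etc.
MY_PRICES = {"SKU-100": 25.00, "SKU-200": 12.50}


def reprice(payload: bytes, catalog: dict[str, float]) -> dict[str, float]:
    """Return new prices for SKUs where a competitor is cheaper than you.

    ASSUMPTION: the POST body is a JSON list of {"sku": ..., "price": ...}
    objects. Adapt this to whatever format your Grepsr project actually posts.
    """
    updates = {}
    for record in json.loads(payload):
        sku = record["sku"]
        competitor_price = float(record["price"])
        ours = catalog.get(sku)
        if ours is not None and competitor_price < ours:
            updates[sku] = round(ours * 0.99, 2)  # cut our price by 1%
    return updates


class CallbackHandler(BaseHTTPRequestHandler):
    """Receives the data posted to your callback URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        updates = reprice(self.rfile.read(length), MY_PRICES)
        MY_PRICES.update(updates)  # stand-in for a real database write
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")


def serve(port: int = 8000) -> None:
    """Start the callback endpoint; blocks until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `serve()` listens on port 8000; the handler plays the same role as the `callback.php` script in the example URL. Keeping the repricing logic in a plain function makes it easy to test without running a server.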

Scenario #2: You are a news reporter and you need the latest news from various sources when you wake up every morning – DropBox or Email or Google Docs

So you are not a technical guy, but you use Dropbox to keep your files in sync? That’s all you need. You can have us gather news from various sources once a day (or less often). Then just connect your Dropbox account to our service, and that’s it! Grepsr will send you the news as spreadsheets, which you can then process, rewrite or blog about. We even deliver the news as PDFs if that helps.

If you’d rather use email, that’s fine too. Grepsr will email you the same files, which you can download and view. The files are securely hosted in our cloud, so you need not worry about data security.

If you are a big fan (who isn’t?) and heavy user of Google Docs, you can have Grepsr automatically sync the latest data to your Google Docs account, so that all your data is in one place. This method works best only when you have fewer than 5,000 records, because of Google Docs’ own limits.

Scenario #3: You are a business owner and you need to extract large amounts of data and archive it for processing – FTP

FTP is the right choice for this case. Grepsr can send all the extracted data directly to your FTP server, neatly organized in folders. You can then process it either manually or with your in-house technology. This method is ideal because it lets you keep archived data for future reference.

Some of our monthly customers are in the habit of doing things manually. We encourage you to make full use of all the sync options we have in place! If you need help, you can always buzz us; we are ever ready to offer tips and advice on any of our features!

Why is Web Crawling important?

Posted on 15 May 2012, by Subrat

Data lies at the heart of any business, even more so if it’s tech-related. With today’s open standards, such as RSS feeds and APIs, sharing data across systems has become relatively easy.

For example, if you want to read today’s financial news directly from your email inbox, you could simply subscribe to a provider’s RSS feed (such as Google News or the BBC). Similarly, your system or application could use a provider’s API to get up-to-date stock market prices. Feeds and XML make sharing data very easy; that is the whole reason they exist in the first place.
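Because a feed is structured XML, consuming it takes only a few lines of code. Here is a minimal Python sketch using the standard library’s `xml.etree.ElementTree`; the feed content is invented, and a real feed would first be downloaded from the provider’s URL.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a news provider's feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Financial News</title>
    <item><title>Markets open higher</title><link>http://example.com/a</link></item>
    <item><title>Rates unchanged</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""


def headlines(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in headlines(SAMPLE_FEED):
    print(title, "->", link)
```

The feed does the structuring for you: every headline sits in a known `<item>` element, which is exactly what unstructured web pages lack.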

But what about data that is unstructured or has no RSS feed for you to consume? How will you go about fetching it? You could always hire people to manually log on and copy the information into an Excel sheet, but that process quickly becomes tedious and impractical.

Let’s take a simple example. You have a shopping site with 1,000 products, and you want to make sure your prices are competitive. To do that, you need to monitor your competitors’ sites and their prices for the same products. With many products and many competitors, this is very difficult to do without some automated process.
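Once a crawler has collected competitor prices, the comparison itself is simple; gathering the data is the hard part. Here is a minimal Python sketch of that comparison step. The SKUs, shop names and prices are hypothetical, and the data is assumed to be already extracted into dictionaries.

```python
# Hypothetical data: your prices, and prices a crawler found on competitor sites.
my_prices = {"SKU-1": 19.99, "SKU-2": 5.49, "SKU-3": 120.00}
competitor_prices = {
    "shop-a.example": {"SKU-1": 18.50, "SKU-2": 5.99},
    "shop-b.example": {"SKU-1": 21.00, "SKU-3": 115.00},
}


def undercut_report(
    mine: dict[str, float],
    competitors: dict[str, dict[str, float]],
) -> dict[str, tuple[str, float]]:
    """Map each SKU where someone is cheaper to (cheapest shop, its price)."""
    report: dict[str, tuple[str, float]] = {}
    for shop, prices in competitors.items():
        for sku, price in prices.items():
            if sku in mine and price < mine[sku]:
                best = report.get(sku)
                if best is None or price < best[1]:
                    report[sku] = (shop, price)
    return report


print(undercut_report(my_prices, competitor_prices))
# {'SKU-1': ('shop-a.example', 18.5), 'SKU-3': ('shop-b.example', 115.0)}
```

With 1,000 products and a handful of competitors, this loop is trivial for a computer; keeping `competitor_prices` fresh every day is what calls for automated crawling.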

This is where web crawling comes into the picture. There is a good chance you or your business will need automated web crawling to gather data for making business decisions. Google popularized web crawling technology through its search engine. They saw early on the importance of the immense amount of data on the web that was not yet crawled and indexed, and they capitalized on that, sending out thousands of crawlers to the web and indexing everything they could possibly find!

Let’s scale down a bit and think just about your business. What would web crawling do for you? Here are a few things that come to mind:

  • Gather data for business intelligence
  • Market research about the product or service you are offering
  • Monitor competitors’ products or solutions 24/7
  • Gather user behavior data to make your product perform better
  • Simply make your product more relevant with more content
  • … and many more!

Can you think of a few?

Grepsr is now SSL Enabled

Posted on 14 Apr 2012, by Amit

At Grepsr, we are committed to the security and privacy of your data. We enabled SSL on our application today. What does this mean for you as a user?

  • All communication between you and our system is encrypted from the moment you log in.
  • This protects that communication from eavesdropping.
  • All your requirements, file uploads, messages, data and other information you provide us for data delivery are now transmitted securely.
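From the client’s side, SSL (today’s TLS) gives two guarantees: the connection is encrypted, and the server’s certificate and hostname are checked so you know you are talking to the real site. Here is a minimal, generic Python sketch of a client that insists on both; the `fetch_securely` helper is hypothetical and says nothing about how Grepsr’s own application is configured.

```python
import ssl
import urllib.request

# The default context verifies the server's certificate chain and hostname,
# which is what protects the connection against eavesdropping and impostors.
context = ssl.create_default_context()


def fetch_securely(url: str) -> bytes:
    """Fetch an https:// URL with certificate checks; refuse plain HTTP."""
    if not url.startswith("https://"):
        raise ValueError("refusing to send data over an unencrypted connection")
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.read()
```

If the server presents an invalid certificate, `urlopen` raises an error instead of silently continuing, which is the behavior you want for anything carrying login details or data.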

We will continue to improve our services and help our customers become more efficient!

Grepsr and FTP – permissions and settings

Posted on 26 Nov 2011, by Subrat

When you set up FTP inside Grepsr so that the system can automatically upload extracted data, please make sure of the following:

  1. Create a new FTP account for Grepsr. We highly encourage you NOT to use the same FTP account that you use for your website, etc.
  2. Preferably, make sure Grepsr can write to the root “/” folder of your FTP account (not the same as the “/” root of your file system).
  3. Or, create a folder called “/GrepsrData” at the root and make sure it is writable. Please note that the folder name is case-sensitive and contains no spaces, i.e. “/grepsrdata”, “/Grepsrdata” or “/Grepsr Data” will NOT work.
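Before entering the details in Grepsr, you can check the setup yourself. Here is a minimal Python sketch using the standard library’s `ftplib`: it tries to write a scratch file in the root folder first and then in “/GrepsrData”, mirroring the preference order in the list above. The scratch file name and the `check_server` helper are hypothetical, and this is not how Grepsr itself tests your server.

```python
import io
from ftplib import FTP, error_perm
from typing import Optional

TEST_FILE = "grepsr_write_test.txt"  # hypothetical scratch file name


def can_write(ftp: FTP, folder: str) -> bool:
    """Try to upload and then delete a small file in `folder`."""
    try:
        ftp.cwd(folder)
        ftp.storbinary(f"STOR {TEST_FILE}", io.BytesIO(b"write test"))
        ftp.delete(TEST_FILE)
        return True
    except error_perm:  # 5xx reply: missing folder or no write permission
        return False


def grepsr_upload_folder(ftp: FTP) -> Optional[str]:
    """Return the first writable folder: "/" preferred, then "/GrepsrData".

    Only the exact, case-sensitive name "/GrepsrData" is tried.
    """
    for folder in ("/", "/GrepsrData"):
        if can_write(ftp, folder):
            return folder
    return None


def check_server(host: str, user: str, password: str) -> Optional[str]:
    """Log in with the dedicated Grepsr FTP account and run the check."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return grepsr_upload_folder(ftp)
```

For example, `check_server("ftp.yourdomain.com", "grepsr", "secret")` returns "/", "/GrepsrData" or None; the host and credentials are placeholders.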

If you follow these guidelines, Grepsr should upload your data to your FTP server without issues.

Grepsr now supports DropBox (and FTP too)!

Posted on 10 Oct 2011, by Subrat

We have some news!

We are very happy to announce that Grepsr will now add content to your DropBox and FTP accounts!

So, from now on whenever Grepsr extracts your data from the source, it will automatically upload the data in PDF, XLS and XML formats to your DropBox (and/or FTP) account.

The data comes straight to your computer, so you always get the latest data without even having to check your Grepsr account!

All you need to do is attach your Dropbox to your Grepsr account as shown in the screenshot below and you are good to go :-)

Grepsr integrates with Dropbox

Dropbox after Grepsr uploads the extracted files

Managed Data Extraction Service

Posted on 5 Oct 2011, by Subrat

Grepsr is what we like to call a “Managed Data Extraction Service”. Here are some of the reasons why we call it “managed”:

  1. We let you focus on your business and use the data; the technical details of extraction are our job, and we will handle them for you.
  2. We let you describe your requirements visually. Goodbye to long, boring descriptions of what you want extracted!
  3. We have communication “built in” so that you can communicate with us from within the system and keep track of each and every step.
  4. We believe that once we take ownership of a project, we must stick with it till the end – we will manage all your extraction projects (maintenance and small tweaks) so that you keep getting your data.
  5. We send the extracted data automatically to you (or your application) via XML feeds or notifications.
  6. We will manage all the other resources required for extraction, such as bandwidth and servers.
  7. Finally, once a project is completed and working, we hand the control back to you so that you can schedule and run your extractions according to your needs.

Official launch of Grepsr (beta)

Posted on 2 Oct 2011, by Subrat

We are immensely proud to be launching Grepsr today. Grepsr is probably one of the first Web 2.0 Software as a Service (SaaS) products for website data extraction.

So what does this mean for the customers?

  1. Cheaper costs – you pay a flat monthly fee no matter how big or small your extraction needs are.
  2. Fully automated service – from preparing your requirements, to communication, to the actual extraction, everything is automated and built into the system.
  3. Better reliability – we will make sure your extraction works smoothly even after your project has completed (i.e. if the source website changes etc).
  4. Better customer support – we have a customer support system built in; when in doubt, just open a support ticket from within the system.
  5. Better integration with your system – we deliver the extracted data to you in various ways, such as email notifications, URL callbacks, feeds, etc.

Grepsr is currently in trial, and all our services are free for a limited time. We welcome you to try our service and our system and give us valuable feedback!