Web Scraping FAQs

WHAT IS A WEB-CRAWLER?

A web-crawler is an application that extracts data from a web page and converts it into a “more usable” format. Crawlers are typically built for a specific web-site and purpose. Properly done, a crawler will emulate a user’s behaviour whilst shielding the scraper’s true identity through proxy-servers. Crawlers are also known as data collectors, data extractors, web scrapers and web-site rippers.
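
As a rough illustration of "extracting data and converting it into a more usable format", the sketch below pulls every link out of an HTML snippet and returns it as a list of records. It uses only the Python standard library. The names LinkExtractor and extract_links and the sample markup are hypothetical, chosen for this example; a production crawler would also handle fetching pages, rate limits and proxies, none of which is shown here.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the visible text and href of every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # extracted records, in document order
        self._href = None    # href of the <a> currently being read, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(
                {"text": "".join(self._text).strip(), "url": self._href}
            )
            self._href = None


def extract_links(html):
    """Return a list of {"text": ..., "url": ...} dicts found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<ul><li><a href="/p/1">Widget</a></li><li><a href="/p/2"> Gadget </a></li></ul>'
print(extract_links(sample))
```

The same pattern (parse, pick out the fields of interest, emit structured records) applies whatever the target fields are; only the tag-matching logic changes per site.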

HOW MUCH DOES IT COST TO WRITE AND RUN A CRAWLER?

It depends on numerous factors, including:

  • Number of fields to be scraped
  • Number of pages, and different page formats
  • The complexity of the data manipulation and degree of manual intervention required
  • Existence of anti-crawler protection mechanisms, or limits on the rate at which the site can be scraped
  • Whether the project is one-off or ongoing

As a ballpark indication, a very simple scrape costs around $300-400, whilst a complicated, protected site with millions of pages can cost around $2,000.

IS BILLING DONE ON A FIXED-PRICE OR PER-HOUR BASIS?

We offer both types of billing structures:

  • Fixed-price billing is typically used for simple, accurately specified projects where the job requirements are not expected to change over the life of the project.
  • Per-hour billing is preferred for complex projects, or those on an agile development path where requirements adapt in response to changed circumstances.

DO YOU OFFER VOLUME DISCOUNTS?

Yes. Multi-crawler projects and long-term ongoing projects qualify for volume discounts.

IS DATA SCRAPING LEGAL?

In certain situations data scraping is considered unethical or even illegal. Much depends on how it is done, the type of data extracted and for what purpose the extracted data will be used. Each data scraping project thus needs to be assessed on its own merits. If in doubt, please obtain your own legal advice.
We have been doing this since 2005 and have not heard of any legal problems from our clients arising from their use of the scraped data.
We value client confidentiality and discretion. We also scrape websites in a highly anonymous manner that is impossible to trace back to us (and you). We can covertly get the data for you, but what happens thereafter depends on what you do with it.

DO YOU PROVIDE ADDITIONAL DATA SERVICES?

Yes, we are a data-focussed company that provides a full suite of data analysis, data cleansing, data mining and data warehousing services.

CAN I RUN THE CRAWLERS FROM MY OWN SERVER?

Yes, you can; however, you may have to install additional software and obtain access to proxy-servers.

WHAT IS THE TYPICAL WORKFLOW FOR A DATA SCRAPING PROJECT?

Please contact us with an overview of your project requirements. We will set up an introductory call with you to discuss your project and better understand your requirements. We will then review the sites with your intentions in mind and provide an initial assessment of expected costs. We offer a free proof of concept for 1-2 websites with a one-time data extraction.
Once you are satisfied with the proof of concept results, we will prepare a final quote, which takes approximately 3-4 days. Upon final agreement on the quote, we sign a mutual non-disclosure agreement, if required. Once we receive your signed Letter of Engagement, we commence development of the crawler(s). Once completed, the crawlers are tested and reviewed. We also provide you with an initial data extract for your review and approval.
Once approved, we move the crawler into production mode, sharing data in the agreed format and gathering feedback at pre-agreed intervals. We also take care of any post-production data processing, including parsing, standardisation, normalisation and de-duplication. Once the deliverables are approved, we submit an invoice, which is due for payment within 15 days.
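
To give a feel for what normalisation and de-duplication mean in practice, here is a minimal sketch in Python. It is an illustration only, not our production pipeline: the function names, the cleaning rules (trimmed and lower-cased field names, collapsed whitespace, case-insensitive comparison) and the sample rows are all assumptions made for this example. Real projects agree the matching rules with the client.

```python
def normalise(record):
    """Lower-case and trim field names; collapse runs of whitespace in values."""
    return {
        key.strip().lower(): " ".join(value.split())
        for key, value in record.items()
    }


def deduplicate(records):
    """Return normalised records, keeping the first of each duplicate group.

    Two records count as duplicates when every field matches after
    normalisation, ignoring letter case (an assumption for this sketch).
    """
    seen = set()
    unique = []
    for record in records:
        cleaned = normalise(record)
        fingerprint = tuple(
            sorted((key, value.casefold()) for key, value in cleaned.items())
        )
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(cleaned)
    return unique


rows = [
    {"Name": "  Acme  Ltd ", "City": "Sydney"},
    {"name": "acme ltd", "city": "SYDNEY"},   # same company, messier formatting
    {"name": "Other Co", "city": "Perth"},
]
for row in deduplicate(rows):
    print(row)
```

Keeping the first occurrence is a design choice for the sketch; a project might instead prefer the most recently scraped or most complete record.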

PAYMENT MECHANISMS?

Payment is by direct deposit into our bank account. We do accept PayPal; however, we add the (somewhat expensive) PayPal fees to your invoice. Please note that we do not have credit card facilities.

WHAT ARE THE PAYMENT TERMS?

The invoice for each month is raised on the 15th day of that month. Payment is expected within 15 days of the invoice date.
If payment has not been received within 30 days, a fee of 2% of the value of the invoice will be charged. This fee will be charged every 30 days until the invoice is paid in full.
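
As a worked example of the late fee, the sketch below computes the total fees accrued on an unpaid invoice. It rests on assumptions that the terms above do not spell out: the 30 days are counted from the invoice date, the fee is a flat 2% of the original invoice value (not compounding), and a fee is added once each full 30-day period has passed without payment. The function name late_fees is hypothetical.

```python
from decimal import Decimal

LATE_FEE_RATE = Decimal("0.02")   # 2% of the invoice value, per the terms above
FEE_PERIOD_DAYS = 30              # a fee accrues for each 30-day period unpaid


def late_fees(invoice_value, days_since_invoice):
    """Total late fees on an invoice still unpaid after the given number of days.

    Assumes simple (non-compounding) fees counted from the invoice date.
    """
    if days_since_invoice <= FEE_PERIOD_DAYS:
        return Decimal("0.00")
    # Day 31 starts the first charged period, day 61 the second, and so on.
    periods = (days_since_invoice - 1) // FEE_PERIOD_DAYS
    return (invoice_value * LATE_FEE_RATE * periods).quantize(Decimal("0.01"))


invoice = Decimal("1000")
for days in (30, 31, 61):
    print(days, late_fees(invoice, days))
```

Under these assumptions, a $1,000 invoice accrues no fee if paid by day 30, $20 if still unpaid on day 31, and $40 if still unpaid on day 61.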

CAN I OBTAIN THE CRAWLER(S) SOURCE CODE?

Yes, we can provide you with the crawler source code at no extra charge. However, you may need to install additional software and/or obtain access to proxy-servers before it can be used on your system.

HOW DO I GET THE SCRAPED DATA?

Data is provided in the agreed format by email, via a shared Dropbox folder, or uploaded directly to your Amazon S3 account. This includes any extracted images, PDFs and documents where required.

DO YOU PROVIDE A FULLY MANAGED WEB HARVESTING SOLUTION?

Yes. We provide a fully managed solution where we take care of the entire scraping process, so that you receive freshly harvested data without the fuss and hassle.

We'd like to help and talk with you

Contact Us