Real estate construction company based out of USA
The client wanted to create a database of real estate mailing lists for locating sales prospects in the US and Canada and capturing highly targeted leads.
The client wanted to build a database of realtors and real estate agents in the US and Canada who were listed on various real estate franchise websites. The challenge was to collect more than 1 million records of real estate agent profile information while ensuring a low turnaround time, minimal duplication and the highest possible fill rate of contact information. The data had to be gathered from 8 websites, each with a different structure, and some of them required the user to perform a search before any agents were listed. The critical part of developing the database was to compare and identify the same agent across multiple websites and triangulate contact information.
RE/MAX (US & Canada), Corcoran, Royal LePage, Sotheby’s Realty, BHGRE, Century 21, Keller Williams, ERA Real Estate
The key contact fields for the agents were identified across the websites, including Agency Name, Contact Name, Email, Address, City, State, Zip, Phone, Fax, Website, Specialization, Languages Known, Properties Price Range, Agent/Agency Summary and Listed From. Eight data crawlers were set up for mass crawling to collect the information for these fields in a structured format. The websites were crawled in parallel, and 1M agent records were collected within a week. These records were then deduplicated using agency and contact details, and the final dataset was made available via an API.
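To give a sense of how the cross-site agent matching and contact triangulation can work, here is a minimal Python sketch. It assumes each crawled record carries source, contact_name, email and phone fields; the field names, the blocking key and the merge rule are illustrative assumptions, not the exact production logic.

```python
import re
from collections import defaultdict

def normalize_phone(phone):
    """Keep digits only so formatting differences don't block a match."""
    return re.sub(r"\D", "", phone or "")

def normalize_name(name):
    """Lowercase and collapse whitespace for a crude name comparison."""
    return " ".join((name or "").lower().split())

def agent_key(record):
    """Blocking key: agents sharing an email, phone or name are candidate duplicates."""
    email = (record.get("email") or "").lower().strip()
    phone = normalize_phone(record.get("phone"))
    return email or phone or normalize_name(record.get("contact_name"))

def merge(records):
    """Triangulate contact info: take the first non-empty value for each field."""
    merged = {}
    for rec in records:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    merged["sources"] = sorted({r.get("source", "") for r in records})
    return merged

def deduplicate(all_records):
    buckets = defaultdict(list)
    for rec in all_records:
        buckets[agent_key(rec)].append(rec)
    return [merge(recs) for recs in buckets.values()]

# Example: the same agent scraped from two franchise sites.
crawled = [
    {"source": "remax.com", "contact_name": "Jane Doe",
     "email": "jane@example.com", "phone": "(555) 010-2000", "fax": ""},
    {"source": "century21.com", "contact_name": "Jane Doe",
     "email": "jane@example.com", "phone": "", "fax": "555-010-2001"},
]
print(deduplicate(crawled))
```

In this sketch, records from different sites collapse into a single agent profile whose contact fields are filled from whichever source had them.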
Home electronics price-aggregator startup
Collection of selling price data for electronics and appliance categories from major US ecommerce websites and collation of prices in a cloud-hosted database
The client was looking for a data partner to collate consumer electronics and home appliances price data from multiple US retailers so that their analytics team could build price-comparison analytics for consumers. The existing data extraction process was unstructured, with prices being gathered manually using browser-based scraping tools from various sources. This required significant effort and caused data inconsistency issues, since the scraping tools could not ensure a consistent record count and missed key high-selling models. Even with the manual effort, structuring the data so it could be imported into their database was a challenge. The client needed clean data that could be uploaded into the database directly and used for building a price comparison and offer tool.
Kohl's, Staples, Office Depot, Best Buy, BJ's, Costco, Amazon, Dell, Target
The client provided the list of source websites to be crawled. The data had to be extracted on a daily basis, meaning fresh information had to be supplied every day. Our team set up crawlers to fetch the data points from the sources, covering fields including model, manufacturer, SKU, price, store, date, UPC, EAN and description. Since every website in the source list had a different structure and design, site-specific customized crawlers were developed. Once crawler development was complete, the results of the first crawl were shared with the client, and after we received approval on the data format and fill rate, the crawlers started uploading data daily to the database, ensuring a consistent record count. Each crawled record was automatically mapped against other records based on its UPC. We delivered about 200K records on a daily basis.
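The sketch below illustrates the kind of UPC-based mapping described above: the day's crawled records are grouped by UPC so the same model can be compared across stores. The record fields (upc, store, model, price) and the comparison output are illustrative assumptions, not the exact production schema.

```python
from collections import defaultdict
from datetime import date

def map_by_upc(daily_records):
    """Group the day's crawled records by UPC so the same model can be
    compared across stores. Records without a UPC are kept aside."""
    by_upc = defaultdict(list)
    unmatched = []
    for rec in daily_records:
        upc = (rec.get("upc") or "").strip()
        if upc:
            by_upc[upc].append(rec)
        else:
            unmatched.append(rec)
    return by_upc, unmatched

def price_comparison(by_upc):
    """One row per product with the lowest offer of the day across stores."""
    rows = []
    for upc, offers in by_upc.items():
        cheapest = min(offers, key=lambda o: o["price"])
        rows.append({
            "date": date.today().isoformat(),
            "upc": upc,
            "model": cheapest.get("model"),
            "best_price": cheapest["price"],
            "best_store": cheapest["store"],
            "stores_compared": len(offers),
        })
    return rows

# Example: the same laptop model crawled from two retailers on one day.
crawl = [
    {"store": "bestbuy.com", "model": "XPS 13", "upc": "884116391234", "price": 999.00},
    {"store": "amazon.com", "model": "XPS 13", "upc": "884116391234", "price": 949.99},
]
by_upc, _ = map_by_upc(crawl)
print(price_comparison(by_upc))
```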
Popular Job portal in India
The client wanted job listings to be extracted from leading job sites such as Monster Jobs, Indeed and Naukri.com. The data points the client needed were job postings including job titles, locations, wages, company profiles, job descriptions and candidate resumes.
The list of source websites and the data points were provided by the client. They wanted this data to be extracted on a daily basis, which meant fresh data had to be provided every day. We set up crawlers to extract the required data fields from the list of websites provided by the client. This requirement falls under our site-specific crawl offering, since the crawlers have to be set up separately for each site in the list. The client wanted the data in CSV format, uploaded to their Dropbox account. Once the initial setup was done, our crawlers started delivering the data, which was fed directly into the client’s Dropbox. We delivered close to 2 million job listings during the first crawl and about 200K records of clean and structured data on a daily basis thereafter.
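As a rough illustration of the daily delivery step, the sketch below writes a day's listings to CSV and pushes the file to Dropbox using the official dropbox Python package. The access token, folder path and field list are placeholders, not the actual pipeline configuration.

```python
import csv
import io
from datetime import date

import dropbox  # pip install dropbox

# Illustrative field list; the real schema followed the client's data points.
FIELDS = ["job_title", "location", "wages", "company_profile", "job_description"]

def to_csv(listings):
    """Serialize structured job listings into an in-memory CSV payload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(listings)
    return buf.getvalue().encode("utf-8")

def deliver(listings, access_token, folder="/job-listings"):
    """Upload today's CSV to Dropbox, overwriting any earlier run for the same date."""
    dbx = dropbox.Dropbox(access_token)
    path = f"{folder}/listings-{date.today().isoformat()}.csv"
    dbx.files_upload(to_csv(listings), path,
                     mode=dropbox.files.WriteMode.overwrite)
    return path

# deliver(todays_listings, access_token="DROPBOX_TOKEN")  # token is a placeholder
```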
Event Marketing Consultancy Firm based out of New Zealand
Crawl event and ticket price data from leading live event ticket booking websites
Ticketfly, Ticketek, Moshtix
To be updated soon