Connotate Web Scraping for Data Monitoring, Extraction & Collection

3 Key Questions about Big Data for Your Business: Part 3 – How do you rapidly gain advantages from Big Data?

May 10, 2012 - Matt Jacobson

As you move to harness Big Data, its volume, velocity, and variety make automation a necessity. As Connotate’s recent survey found, too many companies are ignoring the very real staffing and talent challenges associated with Big Data: 45% of respondents indicated that human effort was the primary deterrent to enterprises leveraging Big Data.

What we see is that many companies are relatively low on the maturity scale for dealing with web data, a key source of Big Data. In our experience, the majority of companies, even in the information provider industry, are in the lower half of the maturity model. In increasing order of maturity, the stages of the maturity model are:

  • Chaotic – efforts are ad-hoc and haphazard and deliver little business value.
  • Manual – regular data collection and monitoring are performed manually. The process is error-prone, and time intended for business analysis is often cannibalized by tedious collection and monitoring. Some business value is provided.
  • Programming – initial automation is in place, but it relies on relatively expensive programmers, and high maintenance costs detract from the business value.
  • Scalable – industrial-strength automation is in place that supports a replicable workflow run by non-technical personnel. Costs drop dramatically compared with lower maturity levels, and business value jumps.
  • Agile – unstructured data from web sources flows dynamically, at high volume and velocity, into solutions that integrate it with internal data. Sophisticated software, ranging from price management to text analytics and predictive analytics, is used. High business value is realized.

While an automated approach is essential to address the Big Data challenge in a mature way, the key to automation success is to maximize the leverage of human judgment in your Big Data stream. The variety of data sources and formats is overwhelming once you multiply it by the volume and velocity components. Our information provider customers tell us that applying a text analytics solution to their massive compilations yields decent but unsatisfactory accuracy rates (e.g., 80%, meaning one in every five results is wrong). However, an automation approach that efficiently records human judgments at the upstream sources, before volume and velocity complicate the picture, produces high accuracy at an attractive price. That captured human judgment provides the foundation for the perspective needed to keep pace with the velocity of the data. Such shrinking of Big Data sources is discussed in Bill Franks’ recent blog entry for the International Institute for Analytics.

The era of Big Data has arrived with the potential to add tangible business value to your company if you can address the Big Data challenge with a mature approach.