A few weeks ago, one of our LinkedIn posts sparked a heated debate among insurance professionals. All that the post said was, ‘Yelp will soon be used as one of the major sources of data for underwriting’.
Insurance experts chipped in to point out that bots have taken over the reviews on Yelp, and that fake reviews and ratings run rife across most social sites. They had a point: how can big data sources be credible?
The other day, we even chanced upon this video that shows you five different ways in which you can manipulate wearables data! An insurer’s nightmare.
Small wonder, then, that insurers remain wary of big data sources even as the number of such sources continues to grow phenomenally.
What if all of big data is wrong?
Fake reviews on social media sites and jailbroken devices are no reason to avoid using big data in insurance. Fraudsters will slither into any system.
Data scientists and big data engineers are known to swear by the ‘Veracity’ of big data, and for this reason insist on cleaning data sets. However, streaming big data decays quickly, so the longer you spend cleaning it, the staler it is by the time you use it.
Instead of donning gloves and masks to clean and sanitize data, a better way to ensure its credibility is to assume that all of it is wrong! Yes, that’s right, assume that all of the big data you are working with is infected and work backwards from there.
Essentially, your focus is NOT on establishing a single source of truth, but rather on identifying the strains of truth that are likely present in the data, using a method called triangulation. To illustrate: just because you trust revenue from source A does not mean you should trust employee count from the same source.
As a term, triangulation has its origins in qualitative research. In the context of big data, it is used to verify the accuracy of a data source by corroborating it with two other elements, whether similar or disparate.
And of course, at big data scale, triangulation is practical only with machine learning algorithms. Machines alone can handle the volume and complexity of the data, and they become smarter over time.
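To make the idea concrete, here is a minimal Python sketch of field-level triangulation. It is a deliberately simple, rule-based illustration rather than a machine learning model, and everything in it is an assumption made for the example: the source names, the figures, the 10% tolerance and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    field: str
    value: float
    agreeing_sources: tuple
    corroborated: bool


def within(a: float, b: float, rel_tol: float) -> bool:
    """True when a and b differ by at most rel_tol of the larger magnitude."""
    scale = max(abs(a), abs(b))
    return scale == 0 or abs(a - b) <= rel_tol * scale


def triangulate(field: str, primary: float, others: dict,
                rel_tol: float = 0.10, needed: int = 2) -> Verdict:
    """Check one field of the primary source against the other sources.

    The field counts as corroborated only when at least `needed`
    other sources report a value close to the primary one.
    """
    agreeing = tuple(
        name for name, value in sorted(others.items())
        if value is not None and within(primary, value, rel_tol)
    )
    return Verdict(field, primary, agreeing, len(agreeing) >= needed)


# Hypothetical figures for one business, as reported by three sources.
source_a = {"revenue": 5_200_000, "employees": 40}
source_b = {"revenue": 5_000_000, "employees": 120}
source_c = {"revenue": 5_350_000, "employees": 115}

verdicts = {
    field: triangulate(
        field,
        source_a[field],
        {"source_b": source_b.get(field), "source_c": source_c.get(field)},
    )
    for field in source_a
}

for verdict in verdicts.values():
    status = "corroborated" if verdict.corroborated else "NOT corroborated"
    print(f"{verdict.field}: {status} by {list(verdict.agreeing_sources)}")
```

With these made-up numbers, source A's revenue is backed by both other sources while its employee count is backed by neither, which is exactly the point above: trust is earned per field, not per source.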
Examples of data triangulation include:
Fixing a bad address by checking the address given on static forms against addresses found on social media, news websites and GPS data.
Revealing the safety practices of a manufacturing company using data sources such as Enforcement and Compliance History Online (ECHO), news mentions in Google, and independent third-party data sources such as HazardHub.
Detecting red flags in the leadership of businesses by using a combination of sources: social media such as Glassdoor, news websites and paid sources such as D&B.
Uncovering financial irregularities of a risk with the help of Paydex sources, review websites and news mentions.
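The first example, fixing a bad address, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the addresses are invented, the abbreviation table is a tiny hypothetical stand-in for real address standardization, and the GPS entry assumes the coordinates have already been reverse-geocoded to a street address by some other service.

```python
import re
from collections import Counter

# Hypothetical abbreviation table; a real system would need a far richer one.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and collapse common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def triangulate_address(form_address: str, corroborating: dict) -> tuple:
    """Return (best_address, form_is_confirmed).

    The form address is confirmed when at least two other sources match it.
    Otherwise, if at least two other sources agree with each other, their
    address is proposed as the fix. Failing both, the form address is
    returned unconfirmed.
    """
    form_norm = normalize(form_address)
    counts = Counter(normalize(a) for a in corroborating.values())
    if not counts:
        return form_norm, False
    best, votes = counts.most_common(1)[0]
    if counts[form_norm] >= 2:
        return form_norm, True
    if votes >= 2:
        return best, False
    return form_norm, False


# Invented data: the form has a typo in the street number.
others = {
    "social_media": "120 Main St., Springfield",
    "news": "120 Main Street Springfield",
    "gps": "120 Main St, Springfield",
}

fixed, confirmed = triangulate_address("12 Main Street, Springfield", others)
print(fixed, confirmed)
```

Here the address from the form is treated as just another claim. Because the three other sources agree with each other and not with the form, the sketch proposes their version as the corrected address instead of trusting the form by default.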
How can you rest assured (well, relatively) with triangulated big data?
To truly experience triangulation, try Intellect Risk Analyst today.