The Challenge of Dirty Data

A recent post on using artificial intelligence (AI) to clean data drew a tremendous response, and several CTOs and CEOs wrote in asking for a more detailed post on dirty data and its consequences.

To be candid, the response took us aback, and we read it as evidence of the size of the problem. Data quality is, after all, widely cited as one of the top three challenges enterprises face in their business intelligence programs.

As you know, to put Big Data to use, you need to view it within a business context. The bridge between the two is master data management (MDM). An MDM program brings together data from disparate sources and verifies its accuracy so that the data stays consistent.
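To make the idea more concrete, here is a minimal Python sketch of one common MDM step: merging records about the same customer from several source systems into a single "golden record". The `SourceRecord` shape, the `customer_id` key, and the survivorship rule (the most recently updated non-empty value wins) are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    # Hypothetical shape of a record arriving from one source system.
    source: str
    customer_id: str
    updated: date
    fields: dict


def build_golden_records(records):
    """Merge records per customer_id. For each field, the most recently
    updated non-empty value wins (an assumed survivorship rule)."""
    golden = {}
    # Process oldest first so that newer values overwrite older ones.
    for rec in sorted(records, key=lambda r: r.updated):
        merged = golden.setdefault(rec.customer_id, {})
        for name, value in rec.fields.items():
            # Empty values never overwrite a value we already have.
            if value not in (None, ""):
                merged[name] = value
    return golden


records = [
    SourceRecord("crm", "C1", date(2023, 1, 5), {"email": "a@x.com", "phone": "555-0100"}),
    SourceRecord("billing", "C1", date(2023, 6, 1), {"email": "a@y.com", "phone": ""}),
]
print(build_golden_records(records))  # {'C1': {'email': 'a@y.com', 'phone': '555-0100'}}
```

Real MDM tools typically weigh source reliability as well as recency; the single rule here just keeps the example short.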

Off the starting blocks, businesses usually perform a data asset inventory. This establishes a baseline for the relative value, uniqueness, and validity of incoming data, and these baseline ratings are then used to measure all data that follows.
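As a rough sketch of what such a baseline might look like, the Python below profiles one field for uniqueness and validity, then flags a later batch that scores noticeably worse. The email pattern, the `tolerance` of 0.05, and the function names are hypothetical choices for illustration; "relative value" is a business judgment and is not modeled here.

```python
import re

# Deliberately simple email pattern; an illustrative assumption, not a full RFC 5322 check.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email(value):
    return bool(EMAIL.match(value))


def profile_field(values, is_valid):
    """Baseline uniqueness and validity for one field, ignoring empty values."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return {"uniqueness": 0.0, "validity": 0.0}
    return {
        # Share of present values that are distinct.
        "uniqueness": len(set(present)) / len(present),
        # Share of present values that pass the validity rule.
        "validity": sum(1 for v in present if is_valid(v)) / len(present),
    }


def below_baseline(current, baseline, tolerance=0.05):
    """Metrics where a new batch scores more than `tolerance` below the baseline."""
    return [m for m in baseline if current[m] < baseline[m] - tolerance]


baseline = profile_field(["a@x.com", "b@x.com", "a@x.com", "not-an-email", ""], is_email)
print(baseline)  # {'uniqueness': 0.75, 'validity': 0.75}

new_batch = profile_field(["c@x.com", "bad", "bad2", "d@x.com"], is_email)
print(below_baseline(new_batch, baseline))  # ['validity']
```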

Sounds good, right? Unfortunately, data asset inventories alone are not enough. Organizations run into hurdles as they grow, and the faster they grow, the greater the chance that data quality will be compromised.

There are many causes of dirty data, and human error tops the list. From typos to incorrect values to duplicate entries, the list is long.
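Duplicate entries are among the easiest of these errors to catch automatically. The Python sketch below groups records whose name and email match once case, accents, and extra whitespace are ignored. The `(name, email)` row shape is a hypothetical assumption, and genuine typos ("Jsoe" for "Jose") would need fuzzy matching, which this sketch does not attempt.

```python
import unicodedata
from collections import defaultdict


def clean(text):
    """Lower-case, strip accents, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())


def find_duplicates(rows):
    """Group row indices whose normalized (name, email) keys collide."""
    groups = defaultdict(list)
    for i, (name, email) in enumerate(rows):
        groups[(clean(name), clean(email))].append(i)
    return [indices for indices in groups.values() if len(indices) > 1]


rows = [
    ("José Pérez", "jose@example.com"),
    ("jose  perez", "JOSE@example.com "),
    ("Ana Li", "ana@example.com"),
]
print(find_duplicates(rows))  # [[0, 1]]
```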

Just below human error are IT architecture challenges. IT relies on multiple hardware and software platforms, and if they are not properly integrated, data problems follow. Failing to update systems as the enterprise grows can also introduce consistency errors.
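One way to surface such consistency errors is to reconcile the same entities across two systems. The Python sketch below compares two hypothetical systems (a CRM and a billing system, each a dict keyed by customer ID) and reports records missing from either side plus field-level mismatches. The system names and fields are assumptions made for illustration.

```python
def reconcile(system_a, system_b, fields):
    """Compare records shared by two systems (keyed by ID). Return IDs found
    only in A, IDs found only in B, and (id, field, a_value, b_value) mismatches."""
    only_a = sorted(system_a.keys() - system_b.keys())
    only_b = sorted(system_b.keys() - system_a.keys())
    mismatches = []
    for key in sorted(system_a.keys() & system_b.keys()):
        for field in fields:
            a, b = system_a[key].get(field), system_b[key].get(field)
            if a != b:
                mismatches.append((key, field, a, b))
    return only_a, only_b, mismatches


# Hypothetical data: the two systems encode the same country differently.
crm = {"C1": {"country": "US", "tier": "gold"}, "C2": {"country": "UK", "tier": "silver"}}
billing = {"C1": {"country": "USA", "tier": "gold"}, "C3": {"country": "IN", "tier": "bronze"}}
print(reconcile(crm, billing, ["country", "tier"]))
# (['C2'], ['C3'], [('C1', 'country', 'US', 'USA')])
```

The "US" versus "USA" mismatch is the kind of drift that appears when two platforms are stitched together without a shared convention.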

Think of it as a pipeline carrying crude oil to a refinery. You need to monitor it for leaks, spillage, corrosion, joint failure, metal fatigue, and human error, any of which can lead to contamination or leakage at the ingest stage itself.
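Carrying the analogy over to data, "monitoring at the ingest stage" can be as simple as checking every incoming record against a set of rules before it is loaded. The Python sketch below quarantines failing records together with the reasons they failed, rather than silently dropping them. The order feed and its two rules are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, failed rule names)


def ingest(records, rules):
    """Check each incoming record against named rules before loading it.
    Records failing any rule are quarantined with the reasons, not dropped."""
    result = IngestResult()
    for rec in records:
        reasons = [name for name, check in rules.items() if not check(rec)]
        if reasons:
            result.quarantined.append((rec, reasons))
        else:
            result.accepted.append(rec)
    return result


# Hypothetical rules for an order feed.
rules = {
    "has_id": lambda r: bool(r.get("order_id")),
    "positive_qty": lambda r: isinstance(r.get("qty"), int) and r["qty"] > 0,
}
batch = [{"order_id": "A1", "qty": 2}, {"order_id": "", "qty": -1}]
out = ingest(batch, rules)
print(len(out.accepted), out.quarantined)
# 1 [({'order_id': '', 'qty': -1}, ['has_id', 'positive_qty'])]
```

Keeping quarantined records, rather than discarding them, lets someone trace a problem back to the source that produced it.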

Then there is the problem of data decay, which not many businesses factor in.

Here's a simple example: a client has moved house, and the address on file is no longer valid. Someone on the team needs to be responsible for keeping data current. Multiply such errors by thousands, and suddenly you have a massive dirty data problem on your hands.
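A simple way to help whoever is keeping data current is to flag records that have not been verified recently. The Python sketch below assumes each client record carries a hypothetical `address_verified` date and uses an arbitrary one-year threshold. Both assumptions should be adjusted to how quickly your own data actually goes stale.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=365):
    """Return IDs of records whose address was last verified more than
    max_age_days before `today` and should be re-confirmed."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["address_verified"] < cutoff]


clients = [
    {"id": "C1", "address_verified": date(2022, 3, 1)},
    {"id": "C2", "address_verified": date(2024, 1, 15)},
]
print(stale_records(clients, today=date(2024, 6, 1)))  # ['C1']
```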

Areas Affected by Dirty Data

Dirty data entering your system has both overt and covert consequences. Its costs are tangible and intangible, affecting your finances as well as your reputation.

The most obvious area of business intelligence to be affected is strategy, followed by decision making. A strategy built on dirty data is very likely to fail, and it doesn't take rocket science to see that poor data leads to poor decisions.

Dirty data can also create system inefficiencies and hurt productivity, all of which eventually strain your enterprise's operating costs.

Here's an example of what untrustworthy data can do: the number of cars on the road in a particular region as of a particular date was mistakenly entered as X instead of the correct value, Y. An automobile manufacturer that bases its plan to introduce electric cars as replacements in that market on this wrong input will end up with a flawed strategy.

Another example: facing a deadline, a research executive at a detergent company keys in a rough estimate of the male-to-female ratio in a particular suburb instead of doing the legwork. What follows when the company launches a new product based on that figure is anybody's guess.
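Errors like these are hard to catch by eye, but a basic plausibility check can catch many of them: compare each reported figure against an independent reference and flag large deviations for review. In the Python sketch below, the reference source, the region names, and the 10% tolerance are all hypothetical.

```python
def flag_outliers(reported, reference, tolerance=0.10):
    """Flag keys where a reported figure deviates from an independent
    reference by more than `tolerance` (relative), or has no usable reference."""
    flags = {}
    for key, value in reported.items():
        ref = reference.get(key)
        if ref is None or ref == 0:
            flags[key] = "no usable reference"
        elif abs(value - ref) / abs(ref) > tolerance:
            flags[key] = f"off by {abs(value - ref) / abs(ref):.0%}"
    return flags


# Hypothetical figures: registered cars per region.
reported = {"region_a": 120_000, "region_b": 80_000}
reference = {"region_a": 118_000, "region_b": 100_000}
print(flag_outliers(reported, reference))  # {'region_b': 'off by 20%'}
```

A flag is a prompt to go back and verify the number, not proof that it is wrong; the reference source can be out of date too.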

The intangibles are harder to measure but just as important. Consistently bad data in an organization's reporting can hurt employee morale; having to work with inaccurate data most of the time is frustrating.

What your organization ultimately needs is a single source of truth on which all internal and external decisions are based, so that everyone works toward shared objectives.

In part 2 of this post, we will examine how data quality can be maintained and how AI can help preserve the integrity of large data sets.