What if I told you that even the best technology in the world couldn’t save a company with erroneous, disorganized customer data?

“Dirty data,” if you will, is one of the most widely misunderstood and disregarded issues plaguing companies today. Flawed data costs the U.S. roughly $3 trillion per year, according to a 2016 Harvard Business Review article.

But what are the root causes of it? How would this kind of data come to exist in a company’s database? And what can you do to make sure it doesn’t happen — or start taking steps to fix it if it already has? Consider the following figures:

Whew! So, before jumping off into any technology initiatives — especially implementing AI-enabled software — your immediate priority must be cleaning up how you’re obtaining customer data, where your data lives, and its upkeep.

I sat down with Daniel Eisenhut, Vice President of Services and Support at Emarsys, to learn more.


(1) In your opinion, why is data so crucial to AI success? Why is clean data such a prerequisite to AI success?

Daniel: AI interprets data points that are collected across the consumer purchase journey. It is not generating new data; it only enriches attributes based on initial captured behavior. Therefore, the underlying captured data needs to be as accurate as possible.

(2) Why would a company’s customer data not be clean?

Daniel: The most common root cause is migration between different ERP systems, e-commerce frontends, or CRM software over time. The data structures of the different systems don’t match, so the data has to be transformed, and that transformation can introduce incorrect data sets. Other reasons may include:

  • Data captured offline in-store or by the call center is faulty because it was entered manually
  • Middleware that moves data from one system to another can introduce errors during data transformation
  • Customer error — the customer makes a mistake or intentionally enters incorrect information (via forms, sign-ups, etc.)
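To make these failure modes concrete, here is a minimal sketch of the kind of check that can flag suspicious records after capture. The field names (`email`, `first_name`) and the loose email pattern are assumptions for illustration, not part of any Emarsys tooling.

```python
import re

# Deliberately loose pattern (an assumption): it catches obvious typos,
# not every address that a mail server would reject.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def find_capture_issues(records):
    """Return (record_index, issue) tuples for records that look faulty."""
    issues = []
    seen_emails = {}  # normalized email -> index of first record using it
    for i, rec in enumerate(records):
        email = (rec.get("email") or "").strip().lower()
        if not email:
            issues.append((i, "missing email"))
        elif not EMAIL_RE.match(email):
            issues.append((i, "malformed email"))
        elif email in seen_emails:
            issues.append((i, f"duplicate of record {seen_emails[email]}"))
        else:
            seen_emails[email] = i
        if not (rec.get("first_name") or "").strip():
            issues.append((i, "missing first name"))
    return issues

# Hypothetical sample data
sample = [
    {"email": "ana@example.com", "first_name": "Ana"},
    {"email": "ANA@example.com ", "first_name": "Ana"},  # same person, re-keyed
    {"email": "bob@example", "first_name": "Bob"},       # typo in the domain
    {"email": "", "first_name": ""},                     # empty form submission
]
issues = find_capture_issues(sample)
for index, issue in issues:
    print(index, issue)
```

Running a check like this per capture channel (store, call center, web form) gives a rough picture of which channel contributes the most errors.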

(3) How would a company know if they needed to clean up their data? Are there any obvious warning signs?

Daniel: One common warning sign is seeing very different results than expected after a campaign or over a specific period of time. Let’s say marketing is trying to execute a routine campaign and build meaningful segments (such as ‘return all my defecting female buyers over the last 30 days’). If the campaign result deviates drastically from what’s expected, and especially if the pattern persists, then you probably have a data problem.
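One rough way to put a number on “deviates drastically” is to compare the latest segment size against its own history. The sketch below assumes you log the size of the same segment on each run; the three-standard-deviation threshold and the sample figures are illustrative assumptions, not a recommendation from the interview.

```python
from statistics import mean, stdev

def deviates(history, latest, threshold=3.0):
    """True if `latest` lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation at all: any change counts as a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical sizes of the same segment over the last five runs
history = [1180, 1220, 1205, 1195, 1210]

print(deviates(history, 1215))  # within the usual range
print(deviates(history, 310))   # far outside the usual range
```

A single flagged run may just be noise or seasonality; repeated flags on the same segment are the stronger hint that the underlying data needs a closer look.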

(4) What’s one practical way an e-commerce marketing team would get started in cleaning up its data?

Daniel: The most important thing is to find the source of the error. Is it tied to legacy data, and the initial issue no longer exists? If the original error is tied to a legacy system or old data capture methods, only the legacy data will be affected. If there’s a problem with an existing data entry point, figure out the root cause, and determine the steps needed to correct it. Prioritize existing issues, then focus on fixing historical data.
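As a sketch of how to tell a legacy problem from a live one, you can split the error rate by capture source and by whether the record predates a known system cutover. The field names (`source`, `captured_on`, `is_valid`) and the cutover date are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def error_rate_by_source(records, cutover):
    """Return {(source, era): fraction of invalid records}, where era is
    'legacy' for records captured before `cutover` and 'current' otherwise."""
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for rec in records:
        era = "legacy" if rec["captured_on"] < cutover else "current"
        key = (rec["source"], era)
        totals[key] += 1
        if not rec["is_valid"]:
            invalid[key] += 1
    return {key: invalid[key] / totals[key] for key in totals}

# Hypothetical records; `is_valid` would come from your own validation step
records = [
    {"source": "web_form", "captured_on": date(2015, 3, 1), "is_valid": False},
    {"source": "web_form", "captured_on": date(2015, 6, 1), "is_valid": False},
    {"source": "web_form", "captured_on": date(2017, 2, 1), "is_valid": True},
    {"source": "call_center", "captured_on": date(2017, 2, 5), "is_valid": False},
    {"source": "call_center", "captured_on": date(2017, 3, 5), "is_valid": True},
]
rates = error_rate_by_source(records, cutover=date(2016, 1, 1))
for key, rate in sorted(rates.items()):
    print(key, rate)
```

In this made-up sample, the web form errors are confined to legacy records, while the call center still produces errors today, so the call center would be the entry point to fix first.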

(5) How can a brand avoid messy databases in the first place?

Daniel: Our recommendation is to attack the problem where the root cause typically happens: migration. I recommend following a 7-phase data migration protocol:

  • Phase 1: Pre-Migration Planning. Perform a thorough analysis of cost and resource requirements for migration.
  • Phase 2: Project Initiation. Plan your project by identifying stakeholders, procedures, policies, and data requirements.
  • Phase 3: Landscape Analysis. Look at all your data and create conceptual, logical, and physical models.
  • Phase 4: Solution Design. Create an interface design specification and data quality management specification.
  • Phase 5: Build & Test. Test the migration with a mirror of the live environment and develop an independent migration validation engine.
  • Phase 6: Execute & Validate. Independently validate the migration and keep an accurate log of SLA progress.
  • Phase 7: Decommission & Monitor. Hand over ownership of the data quality monitoring environment.
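The independent validation mentioned in Phases 5 and 6 can be as simple as comparing record fingerprints between the old and new systems. The following is a minimal sketch under stated assumptions: both systems can export records as dictionaries, they share a stable key (here `id`), and the function and field names are hypothetical.

```python
import hashlib

def fingerprint(record, fields):
    """Hash the trimmed, lowercased values of the chosen fields.
    Note: this normalization means case-only changes are NOT reported."""
    joined = "|".join(str(record.get(f, "")).strip().lower() for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def validate_migration(source, target, key, fields):
    """Compare two record sets and report missing, unexpected, and changed keys."""
    src = {r[key]: fingerprint(r, fields) for r in source}
    tgt = {r[key]: fingerprint(r, fields) for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical exports from the old and new systems
source = [
    {"id": 1, "email": "ana@example.com", "country": "AT"},
    {"id": 2, "email": "bob@example.com", "country": "DE"},
    {"id": 3, "email": "cy@example.com", "country": "US"},
]
target = [
    {"id": 1, "email": "Ana@example.com", "country": "AT"},  # case differs only
    {"id": 2, "email": "bob@example.co", "country": "DE"},   # truncated in transit
    {"id": 4, "email": "dee@example.com", "country": "UK"},  # not in the source
]
report = validate_migration(source, target, key="id", fields=["email", "country"])
print(report)
```

Keeping this check separate from the migration code itself is the point: a bug in the transformation logic should not be able to hide itself in the validation step.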

(6) What’s the best way to scale a data cleansing project? Could this be done in-house?

Daniel: There are several third-party solutions for cleaning up data. We usually recommend TowerData for email cleansing, Twilio for mobile data cleansing, or SmartyStreets for postal address validation.

(7) What are the best data-related reasons a company should adopt AI?

Daniel: If you have healthy baseline data, the data generated by AI will, without a doubt, enrich your existing data with valuable and actionable insights.

(8) How does a marketing automation platform help clients get the most out of their data?

Daniel: The best AI-enabled marketing automation software can revolutionize what a company can do with their data. In general, it helps enrich existing data with several AI-driven elements, like:

  • Incentive usage prediction
  • Defecting buyer detection
  • Next-best product recommendation (highest propensity to be bought by an individual)
  • Automatic replenishment reminders

At Emarsys, we are not building simple segments based on the data clients are sharing with us. We calculate and extend segmentation to be meaningful, precise, and fully automated.

Trying to run your e-commerce program with dirty data is like trying to run an automobile on water. You won’t get very far.

The keys to maintaining a clean database? Take extra special care when migrating from one system to another, reduce manual data capture wherever possible, and use consistent, clear form fields/capture mechanisms at every customer touchpoint.

While data and AI are not dependent on one another, they both work best when they can coexist. In other words, AI works best when it has clean, quality, correct customer data to run on.

Emarsys offers several data validation tools. Our recommendation engine, Predict, validates consistency and correctness of product catalog structures. Emarsys’ customer intelligence tool, Smart Insight, provides deep insights into order data and helps uncover data inconsistencies for any marketing related segmentation.
