Redpoint Logo

Jul 24, 2020

What is Data Hygiene?

Imagine you’re a passenger in a Boeing 767 at an altitude of 41,000 feet when the aircraft runs out of fuel, in large part because of a data hygiene error. If you suddenly lost engine power in the clouds midway between Montreal and Edmonton and you knew it was because a refueling calculation incorrectly used pounds instead of kilograms, proper data hygiene would become very important to you. I’m referring, of course, to the infamous “Gimli Glider” incident of 1983, where this very situation unfolded before fortunately ending without a single fatality as the pilot glided the aircraft to a safe landing.

The tired “garbage in, garbage out” saying is usually met with a roll of the eyes, but the Gimli example shows that poor data hygiene can have catastrophic consequences. As marketers, we’re fortunate that negative outcomes fall short of life-and-death implications, but revenue is certainly at stake when poor data hygiene results in missed opportunities, customer dissatisfaction or regulatory fines.

Data Hygiene: A Breakdown

What, then, is the superior data hygiene that ultimately supports a superior customer experience? It’s ensuring that there are rules for converting measurements and transaction amounts (US and Canadian dollars, for example) into an apples-to-apples comparison. It’s having rules for standardizing addresses, emails, names and phone numbers, and any other data that makes up a unique customer record. And it’s having rules that prioritize customer data: rules, in other words, that stipulate how “noise” (the volume of customer data from every system) becomes “signals” (the data identified as relevant, and why).
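As a rough illustration of an apples-to-apples rule, here is a minimal Python sketch that converts transaction amounts into a single currency. The function name `to_usd`, the `RATES_TO_USD` table and the exchange rate in it are assumptions made up for this example, not real rates or part of any product.

```python
from decimal import Decimal

# Hypothetical, fixed rates for illustration only. A real system would
# look up a dated rate from a trusted source.
RATES_TO_USD = {"USD": Decimal("1.00"), "CAD": Decimal("0.75")}


def to_usd(amount, currency):
    """Convert an amount tagged with its currency code into US dollars."""
    code = currency.strip().upper()
    if code not in RATES_TO_USD:
        # Refuse to guess: an unlabeled or unknown unit is exactly the
        # kind of error a hygiene rule should catch.
        raise ValueError(f"unknown currency: {currency!r}")
    return (Decimal(amount) * RATES_TO_USD[code]).quantize(Decimal("0.01"))


# Two transactions that look identical until the currency is considered.
transactions = [("100.00", "usd"), ("100.00", "CAD")]
normalized = [to_usd(amount, currency) for amount, currency in transactions]
```

The point of the sketch is that every amount carries its unit and that an unknown unit raises an error instead of being silently treated as the default.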

Exceptional data hygiene also recognizes that customer data is used for a wide range of purposes for the benefit of both a business and the customer, and that those purposes may sometimes be in conflict. A business aims to use customer data to build accurate, relevant and reliable models and use those models to intelligently orchestrate an extraordinary omnichannel customer experience. A customer, meanwhile, wants to own, protect and manage their identity, and know that their customer record with a brand accurately reflects who they are, and puts them in the best light possible.

How to Get Data Hygiene Right

Within these occasionally dueling purposes, there are synergies. Both the business and the customer want the data to be right, even if it is for different reasons. To ensure that it is right (to check all the data hygiene boxes, if you will), three things have to happen.

The first is to separate the noise from the signals. Deciding which data is relevant is part of it, but this step also entails accurately attaching relevant data to the right customer. The second is to clean up existing data. This includes correcting obvious errors and making sure that data is formatted correctly so marketers can use it for its intended purpose. Matching identities, ensuring time stamps are accurate, and ensuring apples-to-apples comparisons are all part of this important data hygiene operation. Third, superior data hygiene requires solving for identity resolution beyond a simple rule for matching (or not matching) a Robert Smith with a Bob Smith at the same canonical address. Rather, it entails making sure that the data collected is matched to the correct aspects of a person, such as an accurate determination of who is using a shared device, which address is outdated due to a life change, or how frequently someone visits a brand’s sister site.
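For reference, the "simple rule" baseline mentioned above might look something like the following Python sketch. The `NICKNAMES` table and the record fields (`first`, `last`, `address`) are hypothetical, and the sketch assumes addresses have already been reduced to a canonical form. It deliberately does not handle the harder cases, such as shared devices or outdated addresses.

```python
# Hypothetical nickname table. A production system would rely on a much
# larger, curated list.
NICKNAMES = {"bob": "robert", "rob": "robert", "liz": "elizabeth"}


def canonical_first_name(name):
    """Lowercase and trim a first name, then map known nicknames."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def same_person(a, b):
    """Simple rule: same canonical first name, last name and address."""
    return (
        canonical_first_name(a["first"]) == canonical_first_name(b["first"])
        and a["last"].strip().lower() == b["last"].strip().lower()
        and a["address"] == b["address"]
    )


robert = {"first": "Robert", "last": "Smith", "address": "120 main st"}
bob = {"first": " Bob", "last": "smith ", "address": "120 main st"}
is_match = same_person(robert, bob)
```

A rule like this is only a starting point: it says nothing about which of two matched records holds the current address, or who is actually behind a shared device.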

Data Hygiene: Beyond the Basics

In addition to the obvious data hygiene exercises, such as fixing misspellings, matching street names to a canonical address (Avenue of the Americas vs. Sixth Avenue or Suite 120 vs. Apt. 120, etc.), and standardizing each element of an identity (phone number, IP address, email address, postal address, etc.), there are also more complicated rules in play. These include parsing and processing accurate signals from something as complex as the body of an email, the body of a social post or a web form. Simple or complex, rules must be in place for determining what’s a match or not a match under any number of circumstances, with a prioritization mechanism tuned for accuracy.
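The basic standardization rules described above could be sketched in Python along these lines. The function names, the `STREET_ALIASES` table and the assumption of 10-digit North American phone numbers are all illustrative choices, not a reference implementation. Real address standardization typically relies on postal reference data.

```python
import re

# Hypothetical alias table mapping street-name variants to one canonical form.
STREET_ALIASES = {
    "sixth avenue": "avenue of the americas",
    "6th avenue": "avenue of the americas",
}


def standardize_email(email):
    """Trim whitespace and lowercase the address."""
    return email.strip().lower()


def standardize_phone(phone):
    """Keep digits only; assumes 10-digit North American numbers."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError(f"unexpected phone number: {phone!r}")
    return digits


def standardize_street(street):
    """Lowercase, collapse whitespace and apply the alias table."""
    cleaned = " ".join(street.lower().split())
    for alias, canonical in STREET_ALIASES.items():
        # Word boundaries keep "16th avenue" from matching "6th avenue".
        cleaned = re.sub(r"\b" + re.escape(alias) + r"\b", canonical, cleaned)
    return cleaned
```

With these rules, "1211 Sixth  Avenue" and "1211 Avenue of the Americas" standardize to the same string, which is what makes a later match rule possible.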

Data hygiene rules may also differ by industry, as they relate to a customer or segment of customers, or even as they relate to a specific business use case. As an example, cleansing a name and matching an identity for the purposes of billing may be vastly different from what you’re allowed to do for the purposes of marketing. Similarly, the healthcare industry has more stringent data hygiene requirements for collecting, storing and sharing medical records than it does for sending a welcome email in an onboarding campaign.

Data Hygiene is Not a One and Done Deal

Lastly, data hygiene is a 24/7 process. Filtering, cleansing and matching all have to happen simultaneously and in concert as customer data is collected in real time from various sources. Consider, for example, a web behavior signal that alerts a customer data platform (CDP) that a customer has been on a web page for an hour. Length of time on page may be an important signal that triggers a certain offer vs. a different offer for a different time threshold. A marketer must trust the accuracy of the information, which is why applying rules for matching, understanding and correlating billions of bytes of data into meaningful signals is a continuous process.
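To make the time-on-page example concrete, here is a small Python sketch of a threshold rule that also checks whether the signal is plausible before acting on it. The thresholds, the offer names and the `MAX_PLAUSIBLE_SECONDS` cutoff are invented for illustration and do not describe how any particular CDP behaves.

```python
# Hypothetical thresholds (in seconds) and offer names, highest first.
OFFER_THRESHOLDS = [
    (3600, "concierge_callback"),
    (300, "discount_offer"),
]

# Assumed sanity limit: longer sessions are treated as untrustworthy,
# for example a tab left open overnight.
MAX_PLAUSIBLE_SECONDS = 4 * 3600


def choose_offer(seconds_on_page):
    """Return the offer for the highest threshold met, or None."""
    if seconds_on_page < 0 or seconds_on_page > MAX_PLAUSIBLE_SECONDS:
        return None  # do not act on a signal that fails the hygiene check
    for threshold, offer in OFFER_THRESHOLDS:
        if seconds_on_page >= threshold:
            return offer
    return None
```

The hygiene step here is the plausibility check: an offer is only as good as the timing data that triggered it.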

Because data hygiene is continuous, it also must operate within the parameters of imperfect human behaviors. The classic example of this is a call center operator whose job performance is measured in part by handle time, and who might therefore want to keep calls short. This can interfere with good data hygiene, such as ensuring the proper spelling of a name or email address, or that a form is filled out correctly.

Human error, miscommunication and mistakes – although hopefully less severe than the Gimli Glider incident – have real-world consequences for marketers striving to deliver a superior customer experience. Conversely, data hygiene done well results in confidence in the relevance and accuracy of data signals that deliver better customer engagement, better permissions-based marketing and more accurate and timely interactions.

