Redpoint Logo

Apr 3, 2018

4 Steps to More Efficient Data Preparation

Data preparation has long been one of the most time-consuming tasks in data management. Today, 80 percent of data scientists spend more time preparing data than analyzing it. But you can make data preparation far simpler and less painful than it has traditionally been. This post presents four steps to streamline data preparation, so you can spend more time using your data to engage with customers based on their true needs, behaviors, and desires.

  1. Fix the Data in Your Source System

Imagine that you capture customer and prospect data from a website where consumers voluntarily register. Or perhaps your call center agents capture information from inbound callers who are responding to advertising or trying to solve problems. Even if this information is validated on the front end (for example, by a web form script that checks that a U.S. ZIP Code consists of five or nine digits), it’s surprisingly common for data fields to be truncated before they reach your database. This is especially likely when the data source is external and built with code you don’t control.
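As a rough illustration of the kind of checks involved, here is a minimal Python sketch that validates the ZIP Code format described above and flags values that completely fill their destination field, a common sign of truncation. The field names (`zip`, `last_name`) and column widths are hypothetical, and the "fills the whole column" rule is a heuristic assumption, not a feature of any particular product.

```python
import re

# A U.S. ZIP Code is five digits, optionally followed by four more (ZIP+4).
# Allowing an optional hyphen before the extra four digits is an assumption
# about how the form stores ZIP+4 values.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-?[0-9]{4})?")


def is_valid_zip(value: str) -> bool:
    """Return True if value is a 5- or 9-digit U.S. ZIP Code."""
    return ZIP_PATTERN.fullmatch(value.strip()) is not None


def looks_truncated(value: str, max_length: int) -> bool:
    """Heuristic: a value that completely fills its field may have been cut off."""
    return len(value) >= max_length


def check_record(record: dict, field_widths: dict) -> list:
    """Return a list of human-readable issues found in one record.

    field_widths maps (hypothetical) field names to the width of the
    receiving database column.
    """
    issues = []
    zip_code = record.get("zip", "")
    if not is_valid_zip(zip_code):
        issues.append(f"invalid ZIP Code: {zip_code!r}")
    for field, width in field_widths.items():
        value = record.get(field, "")
        if value and looks_truncated(value, width):
            issues.append(f"{field} may be truncated: {value!r}")
    return issues


if __name__ == "__main__":
    # "Christopherson" cut down to fit a 10-character last_name column.
    record = {"zip": "0213", "last_name": "Christophe"}
    print(check_record(record, {"last_name": 10}))
```

A flagged value is a candidate for review, not proof of truncation: a ten-letter surname stored in a ten-character field would be flagged too.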

Obviously, truncated data fields such as last names can wreak havoc with marketing. Fortunately, advanced customer data platform technologies can quickly recognize and fix most problems like these, before bad data feeds into your CRM system or enterprise data warehouse.

  2. Fix Your Source System to Correct Data Issues Going Forward

Some data problems are easier to fix than others. For example, missing ZIP Codes can usually be added through familiar address standardization tools. However, prevention is always better than cure, and even fixable data quality problems may signal deeper issues that require your attention.

For example, while it’s perfectly acceptable for a data source to report middle initials for only 20 percent of its records, missing last names in more than five percent of them is a problem. A record with gaps might contain enough information for a mailing but not enough for accurate identity matching, which degrades your ability to avoid duplicate offers and to engage with customers as individuals as they move across channels.

With the right tools, you can profile any data source to quickly understand key aspects of its quality. You then have detailed information you can use to address the problems with your internal or external data provider.
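A basic profile can be as simple as measuring how often each field is filled in. The sketch below assumes records arrive as Python dictionaries and uses per-field minimum fill rates that mirror the example above (middle initials optional, last names present in at least 95 percent of records); the field names and thresholds are illustrative assumptions.

```python
def fill_rates(records: list, fields: list) -> dict:
    """Fraction of records in which each field is present and not blank."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        rates[field] = filled / total if total else 0.0
    return rates


def profile(records: list, min_fill: dict) -> dict:
    """Return the fields whose fill rate falls below their minimum, with the rate."""
    rates = fill_rates(records, list(min_fill))
    return {field: rate for field, rate in rates.items() if rate < min_fill[field]}


# Illustrative expectations: middle initials are optional, but last names
# should be present in at least 95 percent of records.
MIN_FILL = {"middle_initial": 0.0, "last_name": 0.95}

if __name__ == "__main__":
    sample = (
        [{"last_name": "Lee", "middle_initial": "A"}] * 4
        + [{"last_name": "Lee", "middle_initial": ""}] * 14
        + [{"last_name": "", "middle_initial": ""}] * 2
    )
    print(fill_rates(sample, list(MIN_FILL)))
    print(profile(sample, MIN_FILL))
```

In this sample, middle initials are filled in 20 percent of records, which passes, while last names are filled in only 90 percent, which is the kind of finding worth taking back to the data provider.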

Profiling is, of course, equally valuable when you’re bringing a new data source on board: for example, social media posts or other unstructured data that may have unique attributes and flaws. It can also help you determine which external data you should (and shouldn’t) purchase.

  3. Apply Precision Identity/Entity Resolution

To gain an actionable 360-degree view of your customers, you must ensure that one and only one record is associated with each customer or household. Otherwise, you’ll send duplicate mailings that alienate customers instead of engaging them, and your “next best offers” won’t be “best” at all, because they won’t fully reflect your customers’ latest behaviors. Identity matching techniques such as fuzzy matching are well known, but they still miss many duplicates.

Of course, sometimes you know exactly who visited you, because they made a purchase with a credit card, authenticated themselves by logging in, or used a mobile app tied to their identity. But many customers and prospects visit you anonymously. Fortunately, you can now use a wide variety of technical attributes, including IP addresses, cookie data, and device identifiers, to tie anonymous web visitors back to specific customer records. Handled carefully, this can significantly increase the number of accurate matches you generate.
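One common way to tie anonymous visits back to known customers is deterministic stitching on a shared identifier. The sketch below is a minimal example that assumes each web session record carries a hypothetical `cookie_id` and, for logged-in visits, a `customer_id`. It learns cookie-to-customer links from authenticated sessions and applies them to anonymous ones, skipping any cookie linked to more than one customer (for example, a shared household computer) as one way of handling the data carefully.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_cookie_map(sessions: List[dict]) -> Dict[str, str]:
    """Map each cookie ID to a single customer ID, using authenticated sessions only.

    Cookies seen with more than one customer are dropped as ambiguous.
    """
    seen = defaultdict(set)
    for session in sessions:
        if session.get("customer_id") and session.get("cookie_id"):
            seen[session["cookie_id"]].add(session["customer_id"])
    return {cookie: next(iter(ids)) for cookie, ids in seen.items() if len(ids) == 1}


def resolve_session(session: dict, cookie_map: Dict[str, str]) -> Optional[str]:
    """Return the known customer ID, infer it from the cookie, or None if unknown."""
    if session.get("customer_id"):
        return session["customer_id"]
    return cookie_map.get(session.get("cookie_id"))


if __name__ == "__main__":
    sessions = [
        {"cookie_id": "c1", "customer_id": "cust-42"},  # logged-in visit
        {"cookie_id": "c2", "customer_id": "cust-7"},   # shared device...
        {"cookie_id": "c2", "customer_id": "cust-8"},   # ...used by two people
        {"cookie_id": "c1", "customer_id": None},       # anonymous, same cookie
        {"cookie_id": "c2", "customer_id": None},       # anonymous, ambiguous
    ]
    cookie_map = build_cookie_map(sessions)
    print([resolve_session(s, cookie_map) for s in sessions])
```

IP addresses are usually weaker evidence than cookies, because many people can share one address behind a corporate or carrier gateway, so a production system would more likely weight them as one signal among several than match on them directly.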

There are industry best practices for identity matching, but sometimes the rules need tweaking to reflect the characteristics of your own data sources. An advanced system such as a customer data platform can let you easily adjust matching thresholds to maximize true matches while keeping false positives to a minimum, so you can get the most value from your data.
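To make the threshold idea concrete, here is a minimal fuzzy-matching sketch built on Python’s standard-library `difflib.SequenceMatcher`. It is a stand-in chosen for illustration, not the matching algorithm of any particular customer data platform. It compares only records that share a ZIP Code (a simple “blocking” step that limits the number of comparisons), averages name and street similarity, and reports pairs at or above an adjustable `threshold`. The record layout is hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a: dict, b: dict) -> float:
    """Average similarity (0.0 to 1.0) of the name and street fields."""
    name = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    street = SequenceMatcher(None, normalize(a["street"]), normalize(b["street"])).ratio()
    return (name + street) / 2


def find_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return sorted (i, j, score) tuples for likely duplicate record pairs.

    Records are first grouped ("blocked") by ZIP Code, so only records in the
    same ZIP Code are compared. Pairs scoring at or above threshold are kept.
    """
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[record["zip"]].append(index)

    matches = []
    for indices in blocks.values():
        for i, j in combinations(indices, 2):
            score = similarity(records[i], records[j])
            if score >= threshold:
                matches.append((i, j, score))
    return sorted(matches)


if __name__ == "__main__":
    people = [
        {"name": "Jonathan Smith", "street": "12 Oak Street", "zip": "02134"},
        {"name": "Jonathon Smith", "street": "12 Oak St.", "zip": "02134"},
        {"name": "Maria Garcia", "street": "9 Elm Avenue", "zip": "02134"},
        {"name": "Jonathan Smith", "street": "12 Oak Street", "zip": "90210"},
    ]
    for threshold in (0.85, 0.90):
        print(threshold, find_duplicates(people, threshold))
```

With these sample records, “Jonathan Smith, 12 Oak Street” and “Jonathon Smith, 12 Oak St.” score about 0.86, so they are reported at a threshold of 0.85 but not at 0.90. Raising the threshold trades missed duplicates for fewer false positives, which is the tuning decision described above.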

  4. Automate, Automate, Automate

Today, there’s far too much data for any individual to manage alone. Automation can give you data preparation “superpowers” and help you focus your data stewards’ limited time where it’s most valuable.

Once you understand a data source’s characteristics and know how to “fix” its flaws, you can use tools like a customer data platform to run those fixes automatically on every new batch of data from that source.
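Once the fixes for a source are known, automating them can be as simple as keeping a per-source list of cleanup functions and applying it to every incoming batch. The sketch below assumes two hypothetical sources and two example fixes: trimming stray whitespace, and restoring leading zeros that spreadsheet software often strips from ZIP Codes such as 02134. The source names, field names, and the assumption that a three- or four-digit ZIP Code from one source has lost its leading zeros are illustrative only.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Fix = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Trim leading and trailing spaces from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def restore_zip_leading_zeros(record: Record) -> Record:
    """Pad 3- or 4-digit ZIP Codes back to 5 digits (e.g., "2134" -> "02134").

    Assumes the source is known to drop leading zeros; for other sources a
    short ZIP Code may instead be truncated and should be flagged, not padded.
    """
    zip_code = record.get("zip")
    if isinstance(zip_code, str) and zip_code.isdigit() and len(zip_code) in (3, 4):
        return {**record, "zip": zip_code.zfill(5)}
    return record


# Hypothetical registry: the fixes that profiling showed each source needs.
FIXES_BY_SOURCE: Dict[str, List[Fix]] = {
    "web_registration": [strip_whitespace],
    "partner_spreadsheet": [strip_whitespace, restore_zip_leading_zeros],
}


def prepare_batch(source: str, batch: List[Record]) -> List[Record]:
    """Apply the registered fixes, in order, to every record in a new batch."""
    fixes = FIXES_BY_SOURCE.get(source, [])
    cleaned = []
    for record in batch:
        for fix in fixes:
            record = fix(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    batch = [{"name": " Ana Lopez ", "zip": "2134 "}]
    print(prepare_batch("partner_spreadsheet", batch))
```

Keeping fixes as small, ordered functions per source makes it easy for a data steward to add a fix once and have it run on every future batch from that source.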

By automatically running new datasets through a series of validation rules, you can surface problems as soon as they emerge. For example, a call center might have consistently delivered high-quality data for years, but it has just introduced a new data entry system, and error rates are spiking. Problems like this need to be fixed quickly, and a customer data platform enables your data steward to step in with incontrovertible evidence to back them up.
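The validation-and-alerting idea can be sketched in a few lines: run each batch through a set of rules, compute the share of records that fail, and compare that rate with the source’s recent history. This minimal sketch assumes hypothetical field names and uses a simple rule of thumb (flag a batch whose error rate exceeds the historical mean by three standard deviations, with a small minimum margin); real systems may use different statistics.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]  # True means the record passes the rule


def _zip_ok(record: dict) -> bool:
    """True if the ZIP Code has 5 or 9 digits (a hyphen is ignored)."""
    digits = str(record.get("zip") or "").replace("-", "")
    return digits.isdigit() and len(digits) in (5, 9)


# Hypothetical validation rules for one source.
RULES: Dict[str, Rule] = {
    "last name present": lambda r: bool(str(r.get("last_name") or "").strip()),
    "ZIP Code is 5 or 9 digits": _zip_ok,
}


def error_rate(batch: List[dict]) -> float:
    """Fraction of records that fail at least one rule."""
    if not batch:
        return 0.0
    failed = sum(1 for r in batch if not all(rule(r) for rule in RULES.values()))
    return failed / len(batch)


def is_spike(current: float, history: List[float], sigmas: float = 3.0,
             min_margin: float = 0.01) -> bool:
    """Flag a batch whose error rate is well above this source's recent history.

    "Well above" means more than `sigmas` sample standard deviations over the
    historical mean, but never less than `min_margin`, so a perfectly steady
    history does not trigger alerts on tiny changes.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    margin = max(sigmas * stdev(history), min_margin)
    return current > mean(history) + margin


if __name__ == "__main__":
    history = [0.02, 0.03, 0.02, 0.025]  # error rates of past call center batches
    good = {"last_name": "Lee", "zip": "02134"}
    ok_batch = [good] * 98 + [{"last_name": "", "zip": "02134"}] * 2
    new_system_batch = [good] * 80 + [{"last_name": "Lee", "zip": "2134"}] * 20
    for batch in (ok_batch, new_system_batch):
        rate = error_rate(batch)
        print(f"error rate {rate:.0%}, spike: {is_spike(rate, history)}")
```

Here a call center that normally runs at 2 to 3 percent errors passes a 2 percent batch but triggers an alert on a 20 percent batch, the kind of evidence a data steward can take to the team that owns the new data entry system.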

Refocus on What Matters Most: Using Your Data to Engage and Delight

Data isn’t an end in itself. It’s a means to the end of delighting customers, increasing loyalty, and growing profitable sales. Use technology to help you prepare your data more efficiently and painlessly, so you can focus on the end, not the means. A customer data platform can empower you to do that by making it easier to fix data problems upfront, quickly identify their root causes, match identities more reliably, and automate the majority of your data preparation tasks.