April 8, 2024

Don’t Be Hit by the Bad Data Snowball Effect

The snowball effect, as it pertains to the escalating consequences of bad data, is often understood in terms of a starting point and an end point. Just as a tiny snowball quickly grows when given a push downhill, a small data problem becomes a big problem by the end of its run, which in the customer experience realm translates to negative outcomes.

While that is true, another way to think about the snowball effect is more analogous to the work of an avalanche forecaster, who studies a snowpack to identify weak layers and assess their propensity for failure. Bad data, in other words, can occur at every layer: the data layer, the decisioning layer (where analytics and machine learning operate) and the orchestration layer. In this context, the snowball effect does not necessarily have a set starting point; failure at any layer puts the entire structure at risk.

Stopping bad data in its tracks, then, requires superior data quality at every layer. A closed loop of perfected data makes the entire cycle resilient in the face of data problems. Rather than bad data feeding off itself, the opposite occurs; the consistent introduction of higher quality data leads to continually improving outcomes, taking away the runway for inferior data to progress.

Data Layer

Bad data at the data layer generally means that data is missing, inaccurate, out of date, or tied to the wrong customer. The root of these problems is human imperfection: poor data entry, misunderstandings in defining or assembling data, and inconsistency or inattention in human interactions.

Missing data happens for a variety of reasons. A brand may ask for data that seems irrelevant or too personal, or ask customers to fill out a form that is too long or doesn’t have the right input fields, resulting in customers entering incorrect information or leaving a field blank. Commonly affected fields include birthdates, zip codes and household income. Similarly, a customer service rep or other agent can omit or forget to enter customer data during a hurried or noisy interaction.

Inaccurate data can be the result of similar imperfect processes, with people mistyping, filling in false or inaccurate information, or accepting default values in forms. It can also be the result of mishearing or misunderstanding in a face-to-face or phone interaction. Inaccuracies are often compounded by use of external reference data that are low quality or out of date.

Data that is not timely, or that is attached to the wrong customer because of poor matching, points to the consequences of failing to develop a Golden Record: a persistently updated, unified customer record that a brand trusts to include everything there is to know about a customer. Depending on the use case, updates and aggregates may be needed in milliseconds. A lack of real-time updates or advanced identity resolution capabilities is just as consequential as missing or inaccurate data; the business can’t trust the record, and its ability to make bold marketing decisions that depend on perfect data suffers.
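Three of the failure modes described above (missing, inaccurate and out-of-date data) lend themselves to simple record-level checks. The following is a minimal sketch, not a description of any particular product. The field names (zip_code, birthdate, updated_at), the placeholder birthdates and the 365-day freshness threshold are all assumptions made for illustration. Matching a record to the right customer is an identity resolution problem and is deliberately left out.

```python
from datetime import date, timedelta

# Assumed placeholder dates that a form might submit when a field is skipped.
DEFAULT_BIRTHDATES = {date(1900, 1, 1), date(1970, 1, 1)}
# Assumed freshness threshold; a real use case may need a far shorter window.
MAX_AGE = timedelta(days=365)
REQUIRED_FIELDS = ("zip_code", "birthdate", "updated_at")


def audit_record(record, today):
    """Return a list of data-quality issues found in one customer record."""
    issues = []

    # Missing: required fields that are absent or left blank.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")

    # Inaccurate: a zip code that is not five digits, or a default birthdate.
    zip_code = record.get("zip_code")
    if zip_code and not (len(zip_code) == 5 and zip_code.isdigit()):
        issues.append("inaccurate:zip_code")
    if record.get("birthdate") in DEFAULT_BIRTHDATES:
        issues.append("inaccurate:birthdate")

    # Out of date: the record has not been updated within the threshold.
    updated_at = record.get("updated_at")
    if updated_at and (today - updated_at) > MAX_AGE:
        issues.append("stale:updated_at")

    return issues


if __name__ == "__main__":
    suspect = {
        "zip_code": "1234",
        "birthdate": date(1900, 1, 1),
        "updated_at": date(2020, 1, 1),
    }
    print(audit_record(suspect, today=date(2024, 4, 8)))
```

Checks like these only flag records for review. They cannot say what the correct value is, which is one reason asking the customer directly can be more reliable than inferring it.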

Brands can do a host of things to ameliorate bad data issues at the data layer. Perfecting the Golden Record is one, which would include steps to eliminate missing and inaccurate data. It might also include turning the traditional notion of data collection on its head by introducing zero-party data, meaning data a customer willingly provides. Rather than observing a customer and deducing what’s important, a brand might instead have a direct conversation with the customer, making clear why certain data is being requested, how it will be used, and what the customer might expect in return. This conversation can be an explicit value exchange, where a customer volunteers data about themselves in return for a personalized experience that improves over time.

When a customer trusts that a brand will use personal data in accordance with the customer’s stated preference as well as provide a superior customer experience, the customer provides information that a brand might otherwise get wrong. Any interaction point – transactions, returns, social footprint, website behaviors, abandoned cart activities, etc. – has the potential to introduce missing or inaccurate data. By having a direct conversation, a brand takes steps to close the door to bad data.

Decisioning Layer

On the surface, it’s a little harder to recognize the cause and effect of bad data at the decisioning layer, where analytics and machine learning play. That’s because even with some incorrect data, it may still be possible to build reasonably accurate predictive models. The predictive model’s intended use case will determine to a large degree the level of incorrect data that is tolerable. At the same time, if data is skewed because of how you’re collecting it (for example, customers who balk at a retail associate asking for a zip code and provide random digits instead), then a model built on that skewed data will not fix the underlying issue. This is where the old “garbage in, garbage out” saying applies. Customer ages, income brackets, recency of last transaction: any grouping that rests on assumptions can introduce an inaccurate understanding of the data, and models built on it will produce similarly inaccurate results.

The other part of the equation at the decisioning layer is that if a model, or a decision, depends on a particular piece of data from a specific customer, then even an “accurate” predictive model does not necessarily make the best decision for an individual customer. A simple example: collecting birthdates to understand the age ranges of your customers may be suitable for broad demographic models and segmentation purposes, but if 20 percent of your customers entered the wrong date, those customers are at risk of a poor customer experience when the brand sends a birthday greeting on the wrong day.
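The birthdate example can be made concrete with a toy data set. The sketch below uses five invented customers, one of whom (20 percent) accepted a form default instead of entering a real birthdate. The data, the decade-wide age brackets and the reference date are all assumptions chosen for illustration. In this constructed case the aggregate age segmentation happens to be unaffected by the error, while the individual-level decision (which day to send a greeting) is wrong for that customer.

```python
from collections import Counter
from datetime import date

TODAY = date(2024, 4, 8)  # assumed reference date

# Invented data: (true birthdate, birthdate as entered in the form).
customers = [
    (date(1985, 3, 14), date(1985, 3, 14)),
    (date(1990, 7, 2), date(1990, 1, 1)),  # accepted the form default
    (date(1978, 11, 30), date(1978, 11, 30)),
    (date(2001, 5, 9), date(2001, 5, 9)),
    (date(1995, 8, 21), date(1995, 8, 21)),
]


def age_bracket(birthdate, today=TODAY):
    """Map a birthdate to a decade-wide age bracket such as '30s'."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    return f"{age // 10 * 10}s"


# Aggregate view: segment sizes computed from true vs. entered birthdates.
true_segments = Counter(age_bracket(true) for true, _ in customers)
entered_segments = Counter(age_bracket(entered) for _, entered in customers)

# Individual view: share of customers who would get a greeting on the wrong day.
wrong_day = sum(
    (true.month, true.day) != (entered.month, entered.day)
    for true, entered in customers
)
wrong_greeting_rate = wrong_day / len(customers)

if __name__ == "__main__":
    print("segments match:", true_segments == entered_segments)
    print("wrong greeting rate:", wrong_greeting_rate)
```

Real errors will not always leave the aggregate untouched; the point is only that a tolerance acceptable for segmentation can still be unacceptable for a decision aimed at one person.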

Orchestration Layer

Already we see the beginning of the snowball effect from the first layer to the second layer, where a model may be incorrect because it’s based on data that has not been accurately collected and/or correlated with the right customer.

The orchestration layer, though, is really where the figurative snowball either gains momentum or is stopped in its tracks, depending largely on how well a brand understands the customer. This layer is where the value exchange is proposed, through offers and actions that either demonstrate a deep, personal understanding or, conversely, reveal an unfamiliarity expressed as irrelevance. The latter, of course, is when customers who expect a personalized experience instead choose to opt out, unsubscribe, or give their business to a brand that understands them.

Irrelevance in interactions can occur because of bad data at either the data or decisioning layers, but also during orchestrated interactions. Consider, for example, a brand that truly knows everything there is to know about a customer and queues up a perfect offer for a product that precisely matches a customer’s size, color, and style preferences. If inventory is locked in a separate channel, the brand will introduce friction into the customer journey if the product that’s offered isn’t available. The same dynamic applies to product and/or service data, such as a financial services organization making an email offer for a credit card and then denying the card when a prospect fills out the application.

A Closed Data Feedback Loop

Beyond the orchestration layer’s impact on an individual customer journey, this layer is also where metrics such as customer lifetime value, churn, or predictions of customer sentiment may go off track, depending on the accuracy of the data layer. The upshot is that campaigns will not yield the expected results.

To avoid this pitfall, and also to avoid incorrect attribution and reporting (e.g., mistakenly attributing an uptick in sales to a specific campaign), it is important to accurately measure interaction results. A/B testing, correlation analysis, testing and optimization, basic attribution and other instrumentation tools are vital for an accurate understanding of interactions. Correlating results back to the data collected in earlier layers then closes the loop, showing which campaigns or interactions are succeeding and which are not.
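As one illustration of measuring interaction results, an A/B test on conversion rates can be evaluated with a standard two-proportion z-test. This is a minimal sketch using only the Python standard library. The function name and the sample counts are hypothetical, and the test assumes large, independent samples; it is one common technique rather than a prescribed method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided and based on the
    normal approximation with a pooled conversion rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control converts 100 of 1,000; variant 130 of 1,000.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone, which helps guard against crediting a campaign for an uptick it did not cause. The result is still only as reliable as the conversion data feeding it.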

Bi-directional feedback loops exist from each layer to the next. Setting up attribution within the orchestration layer feeds data back to the data layer. A/B testing is a feedback loop into the decisioning layer for understanding a model’s accuracy, and the decisioning layer can be used as feedback into the data layer for understanding how quality, trust, completeness, and timeliness vary over time.

By thinking of the snowball effect as it impacts data at the data, decisioning and orchestration layers rather than in a linear sense, we see how bad data affects not only the customer but also the marketers who are asked to trust the data. By knowing the traditional points of failure at each layer, marketers can more easily gauge data’s trustworthiness at each stage and over time, playing the figurative role of the avalanche forecaster by continually assessing data and shoring up any points of weakness.

A recognition of where and how bad data appears is a marketer’s best defense in preventing bad data from careening out of control. By stopping the snowball effect, marketers can trust and rely on perfect data, which is necessary to provide customers with the personalized experiences that they have come to expect.

For more on the role of data quality in the Redpoint CDP, click here to join Redpoint VP of Product Management and Redpoint VP of Engineering Kris Tomes in the “CDP Back to Basics” webinar series.