April 6, 2023

How Does Your CDP Handle Data Transformations?

Everyone agrees that the secret to great customer experience is great customer data. There is less agreement, however, on the precise steps needed to ensure customer data is fit-for-purpose. Poll a few multichannel marketing hub (MMH) or customer data platform (CDP) vendors about how they handle data transformations, and you will likely get a few different answers.

This article will explore how the various approaches to data transformations ultimately impact customer experience.

Control, Visibility and In-System Data Transformations

The first and most obvious form of customer data that every CDP handles is Martech touchpoint data, i.e. data that flows between the CDP and the various channels. In the case of an MMH, this may also include orchestrating offers and pushing them to the channel.

One thing to be aware of with Martech data is how quickly or frequently a data source returning data to the CDP is updated. In this pre-transformation phase, data may not be usable right away, because background activities such as rebuilding data aggregates or recalculating segments must run and their changes must propagate throughout the system. You don’t want to perform data transformations on data that is out of date. If control over how and when customer data is updated is important, for example to preserve the option for real-time interactions, then performing data transformations online, in real time, removes the risk of working from an outdated source.
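As a rough illustration of guarding against stale inputs, the sketch below refuses to transform a record when its source has not been updated recently enough. The function names, the 15-minute threshold used in the usage note and the record layout are all assumptions made for this example, not part of any particular CDP.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_updated, max_age, now=None):
    """Return True if the source was updated within max_age of now."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age

def transform_if_fresh(record, last_updated, max_age, now=None):
    """Apply a trivial cleanup only when the source is fresh enough.

    Returns None for a stale source so the caller can refresh it first.
    """
    if not is_fresh(last_updated, max_age, now):
        return None
    # Hypothetical transformation: trim and lowercase the email field.
    return {**record, "email": record["email"].strip().lower()}
```

Called with a source updated five minutes ago and a 15-minute limit, the record is cleaned and returned; called with a source last updated two hours ago, the function returns None and the transformation is skipped.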

And that’s really the crux of any discussion about data transformation: does data management belong inside the platform, or is it permissible to have it happen upstream or even downstream? This becomes clear when looking at a data API as a second source of customer data. Different vendors will have different methods for loading data and for mapping or matching that data to existing records.

An advantage of data transformations occurring upstream is that data is then presumably ready for use from a marketing standpoint when it reaches the system. The major caveat is that, again, you’re ceding control, not just over the timing but over the specificity of the data transformation. Take matching as an example: if data cleansing happens upstream, what happens when records for a jon.smith email address and a jonathan.smith email address are returned from two different sources? How will you know whether those should be matched, particularly if neither one aligns with an existing signal for a customer, such as a physical address?

Unpacking one more layer of this conundrum, even if the match itself can be trusted, that is not the same as compiling an updated, accurate and precise Golden Record, which might also need to determine (in the jon.smith vs. jonathan.smith example) which email to actually use when engaging with said customer. The second advantage of performing data transformation internal to the platform, then, is that when matching new data with existing records, you have visibility into and control over how the data is matched and merged, yielding confidence in the validity of a Golden Record.
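To make the jon.smith vs. jonathan.smith question concrete, here is a minimal sketch of an in-platform match rule followed by a survivorship rule. Everything in it is an assumption for illustration: the example.com addresses, the field names, the rule that two different emails merge only when a corroborating signal agrees, and the rule that the most recently verified email wins.

```python
from datetime import date

# Hypothetical corroborating fields; a real platform would define its own.
CORROBORATING_FIELDS = ("postal_address", "phone")

def should_merge(a, b):
    """Merge on an exact email match or on an agreeing corroborating signal."""
    if a["email"].lower() == b["email"].lower():
        return True
    return any(a.get(f) and a.get(f) == b.get(f) for f in CORROBORATING_FIELDS)

def preferred_email(records):
    """Assumed survivorship rule: keep the most recently verified email."""
    return max(records, key=lambda r: r["verified_on"])["email"]

# Placeholder records from three hypothetical sources.
crm = {"email": "jon.smith@example.com", "phone": "+15550100",
       "verified_on": date(2022, 11, 3)}
web = {"email": "jonathan.smith@example.com", "phone": "+15550100",
       "verified_on": date(2023, 2, 14)}
ads = {"email": "jonathan.smith@example.com", "verified_on": date(2023, 1, 9)}
```

Here crm and web merge because the phone number agrees, and the jonathan.smith address survives because it was verified later. crm and ads do not merge, since nothing corroborates that the two addresses belong to the same person. The point is that both rules are visible and adjustable inside the platform rather than applied somewhere upstream.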

Identity Graph vs. Golden Record

If less control is a result of data transformation happening upstream when a Golden Record is created in the system, the same drawback holds true for yet another way in which data enters a CDP or MMH: files delivered via FTP or any number of cloud storage buckets. As with an API, if you’re picking up the data expecting (or hoping) that data transformations have already occurred, there is going to be some uncertainty over whether the data is fit-for-purpose.

One end-around to this problem is to simply hang on to all the raw data and do some piecemeal matching to associate it with a customer. Some vendors take this approach and deem a resulting record an “identity graph,” referring to a collection of every customer signal and its attendant attributes, and then attaching all identifiers within a signal to a customer. Once again, though, that definition of an identity graph is not on equal footing with a Golden Record. Taking a signal coming in, matching it using a known identifier and attaching it to an identity graph does not represent a data transformation per se because really all you’re doing is keeping raw data and pushing the problem off to someone downstream who must then figure out what to do with it.

This is a key distinction when identity resolution comes up as a core capability. There are vendors who embrace a purely identity graph-based approach, claiming they will figure out which signals are attached to which real-world customer, so you can throw in as much raw data as you want and it will all be resolved. Yet the same issue crops up where identity resolution alone does not tell you how to accurately reach out to a customer according to the customer’s own preferences or per compliance rules.

If you have five different phone numbers, do they all belong to the same master record? Which should you use? Keeping raw data is fine, but there is an important distinction between an identity graph and a Golden Record that cannot be overlooked. The latter may accept raw data but ensures that a corrected, cleansed and normalized version is created and persisted for later use, with identity resolution a key component – but certainly not the only component.
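The distinction can be sketched in a few lines. In the example below, the raw list of signals plays the role of the identity graph, while golden_phone derives a single cleansed, contactable value from it. The signal keys ('raw', 'opted_in', 'last_seen'), the simplified normalization and the selection rules are assumptions for illustration only; real phone handling needs a proper numbering-plan library.

```python
import re

def normalize_phone(raw, default_country_code="1"):
    """Reduce a raw phone string to '+' plus digits (deliberately simplified)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:  # assume a national number missing its country code
        digits = default_country_code + digits
    return "+" + digits

def golden_phone(signals):
    """Derive one contactable number from raw signals.

    last_seen values are ISO date strings, so string comparison orders them.
    """
    latest = {}
    for s in signals:
        number = normalize_phone(s["raw"])
        seen = latest.get(number)
        # Keep only the newest signal per normalized number, so the latest
        # opt-in status for that number is the one that counts.
        if seen is None or s["last_seen"] > seen["last_seen"]:
            latest[number] = {**s, "phone": number}
    allowed = [s for s in latest.values() if s["opted_in"]]
    if not allowed:
        return None  # no number may be used under the assumed consent rule
    return max(allowed, key=lambda s: s["last_seen"])["phone"]

signals = [
    {"raw": "(555) 010-0199", "opted_in": True, "last_seen": "2022-06-01"},
    {"raw": "555.010.0199", "opted_in": False, "last_seen": "2023-03-10"},
    {"raw": "+1 555 010 0142", "opted_in": True, "last_seen": "2023-01-20"},
]
```

The first two signals normalize to the same number, and the newer one records an opt-out, so that number is excluded even though an older signal allowed it. The third number is the one persisted for outreach. The raw signals are all still available, but the decision about which value to use has been made and stored rather than left to someone downstream.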

Native connectors to various enterprise data sources, in addition to Martech touchpoints, represent a final common source of customer data. An advantage is control from within the platform over the cadence, the choice of touchpoint and the depth and breadth of data coming in. One consequence is a requirement to transform that native data into the format that’s needed, which is where an ETL tool comes into play. As with sourcing data from an API or a file, the ideal place for data transformations is within the customer data layer internal to the platform.

Many CDP and MMH vendors slide past the problem of making data fit-for-purpose and, in doing so, also slide past the problem of ensuring that data arrives at the cadence needed to drive the desired results. If a solution separates the data cleansing that happens upstream from the cleansing and creation of the Golden Record somewhere downstream, problems are offloaded to someone else somewhere else. The result is data that is not fit-for-purpose, which translates into an inferior customer experience.