Identity resolution is the process of finding, cleansing, matching, merging and relating every disparate signal about a customer from MarTech touchpoints, enterprise systems and databases/lakes to produce an accurate, complete, up-to-date view of a customer. Depending on the use case or business purpose, “customer” in the context of identity resolution may refer to any party to an interaction, such as a household, a prospect, a consumer of goods or services, a patient, a member, an employee, a B2B client, a buyer or even a product.
Identity resolution is a key function of a composable CDP.
Why is Identity Resolution Important?
Identity resolution allows companies to differentiate one customer from another and link signals to a unique customer ID even with conflicting or overlapping signals and identifiers such as multiple device IDs, shared devices, different physical and email addresses, various nicknames, heads of household, shared accounts and other various forms of “messy” customer data.
Identity resolution is an important process in the creation of a single customer view, which is also known as a Golden Record or Customer 360. Part of a Golden Record is a customer’s full identity graph, which contains everything there is to know about a customer (devices, names, IDs, addresses, email, nicknames, etc.), and robust identity resolution generates a more accurate and trustworthy identity graph.
What is a Golden Record?
The Redpoint Golden Record is a single customer view that combines data from any source (website, mobile app, eCommerce platform, POS, social media, CRM, etc.) to form a holistic unified record of a customer and the customer’s engagement with a brand across every touchpoint.
The Golden Record captures everything that is knowable about every customer, from every available source. A full identity graph is just part of a Golden Record. It also includes full contact history, transactional, demographic and preference data, and all attributes and data aggregations, all processed and updated in real time. With data quality processes completed at data ingestion, and using persistent key maintenance and tunable matching and merging as part of advanced identity resolution, a Golden Record is the foundation for powering personalized customer experiences that require differentiating one customer from another.
What is Deterministic vs. Probabilistic Matching in Identity Resolution?
In resolving an identity to build or enhance a Golden Record, deterministic and probabilistic matching are the two main ways to determine whether certain signals or interactions are related to the same customer. These two methods provide different degrees of confidence in the match.
Basic identity resolution takes a deterministic, exact match approach that only matches identical phone numbers, physical addresses, names or other exact identifiers. A broader type of deterministic matching may identify the same user across different devices by matching the same user profiles with a common identifier, such as an email address.
Probabilistic matching uses advanced analytics to link customer records, such as identifying two disparate customer records that represent the same individual using multiple identifiers and “close enough” matches.
The goal of both deterministic and probabilistic matching is to discern with a high level of certainty that different data fragments or records belong to the same identity. Both are important when the use case for identity resolution is to build a Golden Record because they both contribute to compiling a complete and accurate identity graph that represents a detailed collection of all connected experiences, interactions and facets of a customer, person, household, organization or even product.
A combination of deterministic and probabilistic matching is important for accurately reconciling multiple signals. An online browsing session that ends with a shopping cart fulfillment will be considered a single interaction yet contain multiple signals about the customer’s ID that may be at odds with one another. The name provided during the online checkout process might not be associated with the credit card number and shipping address, for instance, even if it is associated with the device IP and web behavior. Advanced identity resolution using both deterministic and probabilistic matching will determine with a high degree of accuracy which customer ID should be linked with the interaction.
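As a rough illustration of the difference between the two approaches, the following Python sketch compares two records both ways. It is a simplified example, not a description of how any particular CDP implements matching: the field names, the weights and the use of the standard library's `difflib.SequenceMatcher` as the "close enough" measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def deterministic_match(a: dict, b: dict, key: str = "email") -> bool:
    """Exact match on one shared identifier (case-insensitive)."""
    va, vb = a.get(key), b.get(key)
    return bool(va) and bool(vb) and va.strip().lower() == vb.strip().lower()

def similarity(x: str, y: str) -> float:
    """String similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def probabilistic_score(a: dict, b: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (weights are hypothetical)."""
    total = sum(weights.values())
    return sum(
        w * similarity(a.get(field, ""), b.get(field, ""))
        for field, w in weights.items()
    ) / total

# Hypothetical weights and records, for illustration only.
WEIGHTS = {"name": 0.5, "address": 0.5}
rec_a = {"name": "David Smith", "address": "111 Main Street",
         "email": "dsmith@example.com"}
rec_b = {"name": "Dave Smith", "address": "111 Main St.",
         "email": "dave.smith@example.org"}

exact = deterministic_match(rec_a, rec_b)          # emails differ
score = probabilistic_score(rec_a, rec_b, WEIGHTS)  # names and addresses are close
```

Here the two records fail the exact email comparison, yet the name and address are similar enough to produce a high score, which is the kind of evidence a probabilistic matcher would weigh before linking them to the same customer ID.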
What Level of Identity Resolution Do I Need?
There are three identity resolution categories, each with its own level of trust and confidence in the record produced and based on the underlying purpose or use case. Broadly speaking, the categories are broken out into identity resolution in AdTech, in MarTech and in the highly regulated industries of healthcare and financial services. Tunable identity resolution capabilities allow companies to adjust match accuracy levels according to the desired use case.
- In the AdTech space, identity resolution is often limited to the use of third-party data in an anonymous capacity and is therefore the least stringent of the three. In this use case, an advertiser might use minimal segmentation rules to target a broad audience, where an exact match is unimportant and any PII is encrypted.
- In the MarTech space, where marketing and other business functions are intent on providing a consistently personalized customer experience, identity resolution is an important process in generating an accurate Golden Record. First-party data compiled and stored in a customer data platform (CDP) is used to link, analyze and deduplicate customer records to provide a consistent CX, among many other use cases.
- For regulated businesses, including healthcare and financial services, that are subject to customer identity and access management (CIAM) protections and guidelines, identity resolution will be tuned to produce an exact match for many business functions, including a provider engaging with a patient about a treatment plan or any other sharing of PHI.
For each identity resolution category, the ultimate function is to produce outcomes that meet a business’s specific needs. A highly accurate and trustworthy match, for example, is required to communicate with a person about a transaction, perhaps deeper into the customer journey, whereas a looser match may be sufficient for earlier stages of a journey, e.g. communicating marketing information or educational content to an individual or household.
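One way to picture tunable match accuracy is as a per-use-case threshold applied to a match score. The sketch below is only an illustration: the category names mirror the three categories above, but the numeric thresholds are invented for the example and are not recommended values.

```python
# Hypothetical minimum match scores per use case (illustrative values only).
MATCH_THRESHOLDS = {
    "adtech": 0.60,     # broad audience targeting tolerates loose matches
    "martech": 0.85,    # personalization needs a trustworthy match
    "regulated": 1.00,  # e.g. sharing PHI requires an exact match
}

def is_match(score: float, use_case: str) -> bool:
    """Accept a candidate match only if it meets the use case's threshold."""
    return score >= MATCH_THRESHOLDS[use_case]
```

With these assumed values, a candidate pair scoring 0.9 would be accepted for a marketing use case but rejected for a regulated one.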
Is Identity Resolution a Function of Every Customer Data Platform?
It should be. The creation of a unified profile is listed by the CDP Institute as one of the five required functionalities to be considered a customer data platform. For that reason alone, every customer data platform will claim some form of identity resolution. However, it’s important to look under the hood because not every CDP treats identity resolution with the same level of importance.
CDPs with basic capabilities limit matching to a deterministic, exact match approach. For example, two records are considered a match only when the phone number, physical address, or first and last name are identical.
An exact match approach is insufficient when the goal is to create a differentiated customer experience. One issue is that customers have multiple identifiers. Matching on an exact address fails to account for situations where a customer might be using an alternative address, phone number, email address, loyalty account number, etc. Another is that an exact match does not allow for human error, such as a data entry clerk mistyping a street address on a form. A customer might make a similar error, or simply write the same physical address or even their name differently from one form to the next (Street vs. St., Dave vs. David, etc.).
A CDP with advanced identity resolution capabilities, on the other hand, will incorporate both deterministic and probabilistic matching techniques to account for all variances in how customers identify themselves.
What is Householding in the Context of Identity Resolution?
Advanced identity resolution capabilities will also accommodate householding, or identifying the members of a household or business. By including all disparate signals about a customer (prospect, household, business, B2B client, etc.), advanced identity resolution provides a contextual customer understanding, and identifies relationships (person to household, person to organization, organization to organization) as well as matches.
Another distinction between a CDP that performs advanced identity resolution and one that performs basic identity resolution is that the latter often ignores data cleansing steps or relies on third-party reference files with the assumption that the files are updated and accurate. In reality, reference files might be updated infrequently using non-persistent keys, meaning that every time there is new information about a customer there is no ability to make a longitudinal match based on previous keys.
If one of your use cases for implementing a composable CDP is to develop a single view of the customer for segment-of-one marketing, it is important to know how the CDP approaches identity resolution, and whether there is an assumption that the data used to create a unified profile has already been cleansed and made ready for business use.
What Are the Components of Identity Resolution?
Here are some important capabilities to consider when selecting an identity resolution solution, with details on several components: Data Quality, Data Governance, Data Ingestion, Data Matching, Data Merging, Persistent Key Management, and Data Stewardship.
- Data Quality
High-quality data that is ready for business use is crucial for identity resolution to produce accurate, trustworthy results. Data cleansing, normalization and validation reduce errors in the identity resolution process, build trust in the resulting Golden Record and support the creation and delivery of a consistently personalized customer experience.
Cleansing and normalizing data at the point of entry solves for the traditional “garbage-in, garbage-out” problem, avoiding the downstream problem of inaccurate matching, overmatching or undermatching that render identity resolution ineffective or incomplete. Cleansing at the point of data entry also avoids the need to repeatedly reformat or clean up data when exporting it, as is often done in so-called “Reverse ETL” solutions. A complete, composable CDP’s approach to data quality will take care of all data hygiene and data transformation tasks at the point raw data is ingested.
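To make the idea of normalizing at the point of entry concrete, here is a minimal Python sketch that standardizes addresses and first names before any matching takes place. The lookup tables are tiny, hypothetical stand-ins; a production data quality process would rely on far more complete reference data and validation.

```python
import re

# Hypothetical lookup tables, deliberately tiny for illustration.
# Note: a real process must disambiguate cases such as "St" meaning "Saint".
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}
NICKNAMES = {"dave": "david", "bob": "robert", "liz": "elizabeth"}

def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, expand abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", raw.lower()).split()
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)

def normalize_first_name(raw: str) -> str:
    """Map a known nickname to its formal form; otherwise just clean it up."""
    name = raw.strip().lower()
    return NICKNAMES.get(name, name)
```

After this step, "111 Main St." and "111 Main Street" normalize to the same string, as do "Dave" and "David", so even a simple downstream comparison is less likely to undermatch.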
- Data Governance
Distinct from data quality, data governance creates the framework and rules by which organizations will use customer data. Its main role is to ensure the necessary data informs critical business functions. It rests on a steady supply of high-quality data, putting in place frameworks for security, privacy, permissions, access and other operational concerns.
- Data Ingestion
Data ingestion is the act of importing data and mapping it to known customer attributes for use in data quality and identity processes. With an increasing number of data sources and types, businesses are challenged with ingesting and processing data fast enough to support business goals.
When the purpose of identity resolution is to provide a personalized customer experience, data ingestion must obtain data from every source that could conceivably provide signals about a customer, business, household or other entity. A partial list of data points might include CRM records, IoT data from a connected device, mobile data, web streaming data, social media data, any behavioral data and transactional data. Data can be either streaming or batch, structured or unstructured; first-party data, second-party data and third-party data from marketing technology and enterprise systems are all potential sources of customer signals. Speed and context are important components of data ingestion. Reconciling identifiers and signals at the moment of data ingestion is a key step in resolving an identity in the timeframe needed to deliver a relevant, hyper-personalized omnichannel CX.
When the creation of an identity graph with accurate mapping of incoming data is the goal of data ingestion, the collection of signals that constitute the framework of the customer or entity must be reconciled to provide marketers and business users a structured view of attributes and identifiers. It is possible to ingest data without mapping and parsing it right away, which may produce the benefit of lightning-fast data ingestion with the tradeoff that a resulting identity graph will lack a contextual understanding. Recency is an important component in making sense of incoming signals to construct a consolidated view of the customer or other identity being resolved.
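The mapping step described above can be sketched as follows. This is an illustrative example only: the source names, raw field names and canonical attribute names are hypothetical, and the function simply maps whatever known fields are present while recording the source and arrival time so that recency can be considered later.

```python
# Hypothetical per-source mappings from raw field names to canonical attributes.
FIELD_MAPS = {
    "crm": {"EmailAddress": "email", "FullName": "name", "Phone": "phone"},
    "web": {"user_email": "email", "device_id": "device_id"},
}

def ingest(source: str, payload: dict, received_at: str) -> dict:
    """Map a raw payload to canonical attributes, keeping source and recency."""
    mapping = FIELD_MAPS[source]
    record = {
        canonical: payload[raw]
        for raw, canonical in mapping.items()
        if raw in payload
    }
    record["_source"] = source            # provenance for later merge decisions
    record["_received_at"] = received_at  # recency signal (ISO 8601 string)
    return record
```

Fields that are not in the mapping are dropped in this sketch; a real pipeline might instead retain them in raw form for later parsing.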
- Data Matching
Once data quality processes have occurred, the next step in identity resolution is the matching of disparate records. Data matching refers to reconciling different records for a customer (household, entity, etc.) across different engagement systems and data sources, such as an email address and a digital certificate on a smartphone, or reconciling, with a high degree of certainty, that separate identifiers belong to the same record (111 Main St. and 111 Main Street).
The distinction between data matching and data merging is that the latter refers to a set of business rules governing how to consolidate a set of matched signals into a single record. In a common householding use case for identity resolution, for example, data matching is the process of matching customers that belong to the same household through sharing common identifiers, typically physical address, shared device, IP address, or phone number. The customers’ names, however, would not be merged, even though they are linked to the same household record.
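The householding example can be sketched in Python as a grouping problem: customers who share any common identifier are linked to the same household, but their individual records stay separate. The record layout and the choice of `address` and `phone` as shared identifiers are assumptions for the example, and the household ID here is simply the ID of one member.

```python
def assign_households(records: list[dict], keys=("address", "phone")) -> dict:
    """Return {customer_id: household_id}, linking customers that share
    any of the given identifiers. Individual records are not merged."""
    parent = {r["customer_id"]: r["customer_id"] for r in records}

    def find(x):
        # Follow parent links to the group's representative (union-find).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (identifier type, value) -> first customer seen with it
    for r in records:
        for k in keys:
            value = r.get(k)
            if not value:
                continue
            ident = (k, value)
            if ident in first_seen:
                # Shared identifier: link the two customers' groups.
                parent[find(r["customer_id"])] = find(first_seen[ident])
            else:
                first_seen[ident] = r["customer_id"]
    return {cid: find(cid) for cid in parent}

# Hypothetical records: c1 and c2 share an address, c2 and c3 share a phone.
customers = [
    {"customer_id": "c1", "address": "111 main street", "phone": "555-0100"},
    {"customer_id": "c2", "address": "111 main street", "phone": "555-0199"},
    {"customer_id": "c3", "address": "9 oak road", "phone": "555-0199"},
    {"customer_id": "c4", "address": "42 elm avenue", "phone": "555-0142"},
]
households = assign_households(customers)
```

In this sketch c1, c2 and c3 end up linked to one household through chained shared identifiers, while c4 remains in a household of its own. Whether a chained link like c1 to c3 should count is itself a business rule that would be tuned per use case.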
Data matching is an essential component of advanced identity resolution. Reconciling multiple identity proxies from across the enterprise helps build an anonymous-to-known unified record, providing business users with key context about a customer journey, revealing important clues about customer intent and helping end users better predict a customer’s behaviors.
- Data Merging
Distinct from data matching, data merging is a set of business rules that govern how to consolidate a set of matched signals into a single record.
Householding is one example of when matched records might not be merged, keeping separate customer ID records for each individual household member while also maintaining a separate customer ID for the household. The same concept applies to identity resolution in a B2B environment, where an organization might maintain separate customer IDs for multiple employees and set up algorithms for when to merge different facets or IDs into a corporate ID. The reasons for a merge and/or split must always be clear and documented. When merge fields are mapped to a Golden Record, users have visibility into why a merge was made and can trust its accuracy.
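A merge rule and its documentation can be illustrated with a short Python sketch. It assumes one simple, hypothetical business rule, "most recent non-empty value wins", and records which source supplied each surviving field so the reason for the result stays visible. Real merge rules are typically richer and configurable per field.

```python
def merge_records(records: list[dict], fields: list[str]) -> tuple[dict, dict]:
    """Consolidate matched records using a 'most recent non-empty value wins'
    rule. Returns (merged_record, provenance), where provenance maps each
    field to the source that supplied the surviving value."""
    merged, provenance = {}, {}
    # ISO 8601 date strings sort chronologically, so newest comes first.
    newest_first = sorted(records, key=lambda r: r["updated_at"], reverse=True)
    for field in fields:
        for r in newest_first:
            if r.get(field):
                merged[field] = r[field]
                provenance[field] = r["source"]
                break
    return merged, provenance

# Hypothetical matched records for one customer.
matched = [
    {"source": "crm", "updated_at": "2023-01-10",
     "email": "old@example.com", "phone": "555-0100"},
    {"source": "web", "updated_at": "2024-03-02",
     "email": "new@example.com", "phone": ""},
]
golden, why = merge_records(matched, ["email", "phone"])
```

Here the newer web record supplies the email, the older CRM record supplies the phone because the newer value is empty, and the provenance dictionary documents both decisions.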
- Persistent Key Management
Persistent key management in an enterprise CDP is a necessary capability for providing a contextual understanding of a customer journey over time. Persistent keys attach identifiers across multiple data sources from various signals to a unique master record in the CDP. Probabilistic matching depends on persistent key management; if a new unique ID were to be created every time a new data element was introduced, or with every operational update such as nightly batch processing, there would be no way to reconcile various signals across a multitude of data sources and data fields.
Advanced identity resolution using persistent keys allows data to be sourced from any conceivable source of first-party or third-party data, structured or unstructured, batch or streaming, and reconciled to a master record, providing marketers and business users with a longitudinal view of a customer over time. The one constant in every unique customer journey is change. Whether a customer refers to an individual, a household, business or another entity, customers and relationships are always in flux. Marriages, separations, graduations, relocations, births, job changes and other events all contribute to a new customer understanding. Customers change email and physical addresses, use new and different devices, or have multiple email addresses or physical addresses. Persistent key management allows brands to maintain a consistent view throughout a continual journey while accounting for the vagaries of change, making it easy to create, update, merge or split records when information changes, adjusting their data to inevitable life events.
An alternative to persistent key management, such as a tag management system, does not provide a longitudinal view of a customer; when new keys are created with every iteration of new data and a new match is made, it is simply a match-in-time, with no historical context from a previous match. When an organization attempts to enrich its first-party customer data using a third-party reference file that changes keys with every iteration, all context from a previous match is lost. Furthermore, providing first-party data to a third party, such as an AdTech firm, is not analogous to performing identity resolution within an organization’s own CDP because it does not solve the data quality issue; by entrusting a third party to perform data quality, an organization will not be able to fully trust the resulting customer record. Outsourcing identity resolution can also undermine user trust and make identity resolution a slow, inefficient process. If the intended identity resolution use case is to produce an accurate and completely trustworthy Golden Record, persistent key management is a foundational requirement.
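The contrast between a persistent key and a match-in-time can be sketched with a small in-memory store. The class name, method name and identifier format below are hypothetical, and the sketch deliberately leaves out the merge and split handling a real system would need when identifiers point to different existing keys.

```python
import uuid

class PersistentKeyStore:
    """Minimal sketch: identifiers seen before always resolve to the same key."""

    def __init__(self):
        self._key_by_identifier = {}  # (type, value) -> persistent key

    def resolve(self, identifiers: list[tuple[str, str]]) -> str:
        # Reuse the key of the first identifier already known, if any.
        key = next(
            (self._key_by_identifier[i] for i in identifiers
             if i in self._key_by_identifier),
            None,
        )
        if key is None:
            key = str(uuid.uuid4())  # mint a key only for a truly new identity
        # Attach any new identifiers to the same key for future lookups.
        # Simplification: identifiers already tied to another key are left
        # alone; a real system would trigger a merge review here.
        for i in identifiers:
            self._key_by_identifier.setdefault(i, key)
        return key

store = PersistentKeyStore()
k1 = store.resolve([("email", "a@example.com")])
k2 = store.resolve([("email", "a@example.com"), ("device", "d-1")])
k3 = store.resolve([("device", "d-1")])  # later batch, new device signal only
```

Because the key survives across calls, the device-only signal in the last call still resolves to the same master record, which is what makes a longitudinal view possible. Regenerating keys on every run would lose that link.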
- Data Stewardship
Data stewardship functionality is an important component of advanced identity resolution, both to oversee sourcing, cleansing, mastering and auditing of data entities that include customer/party, product or site, and to validate that data representing an entity are fit for purpose and available to the people and applications that need them.
A data steward is responsible for overseeing the quality, accuracy and completeness of records, as well as the process for correcting errors and making changes. Ultimately, all these processes are automated, but it is important that a data steward be able to deal with exceptions, and that those exceptions be documented and auditable. As the role concerns data matching, a data steward may be responsible for overseeing discrepancies, deciding whether a record that fails to meet a certain threshold is or is not valid, or determining whether the thresholds themselves are tuned correctly to avoid overmatching and undermatching of records. A data steward may also be responsible for prioritizing what type of customer data is collected based on usage, or similarly deciding how long a customer record may be kept. A steward is a line of defense against mishandling customer data or failing to honor customer preferences and regulatory requirements, all of which have the potential for serious consequences. By assigning these and other tasks related to collecting, keeping, managing and protecting data on behalf of a customer, an organization with a data steward in place recognizes that identity resolution processes are not done in a vacuum; there are human beings behind the data, and there are legal and governance requirements to meet.
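The exception handling described above can be pictured as a triage step: clear matches and clear non-matches are handled automatically, and borderline cases are routed to a steward with an audit trail. The thresholds, function name and log structure in this Python sketch are hypothetical and would be tuned by the steward in practice.

```python
def triage(pair_id: str, score: float, audit_log: list,
           auto_match: float = 0.90, auto_reject: float = 0.60) -> str:
    """Route a candidate match by score and record the decision for audit."""
    if score >= auto_match:
        decision = "match"
    elif score < auto_reject:
        decision = "no_match"
    else:
        decision = "review"  # exception: queue for a data steward
    audit_log.append({"pair": pair_id, "score": score, "decision": decision})
    return decision

audit_log = []
decisions = [
    triage("a-b", 0.95, audit_log),
    triage("a-c", 0.72, audit_log),
    triage("a-d", 0.30, audit_log),
]
```

Raising `auto_match` reduces overmatching at the cost of more manual review, while lowering `auto_reject` reduces undermatching. Keeping every decision in the log is what makes the exceptions auditable.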
Identity Resolution: A Recap
What is identity resolution? In short, it is an indispensable core function of a complete, composable CDP. Advanced identity resolution is essential for producing an accurate, complete, up-to-date view of a customer required to deliver a hyper-personalized CX.