In a previous blog post, we explored the purpose of augmenting data quality and outlined how augmentation benefits several parts of the data quality process, with a primary focus on delivering a personalized customer experience.
To recap, augmenting data quality helps assess the scope and quality of available data across sources to learn what’s needed to transform isolated customer “signals” into a coherent customer understanding. The cleansing, parsing and normalization of data is another data quality area that can be improved by augmentation. With a firm understanding of customer signals and how those signals map to various targets, the next goal of data quality augmentation is to perform identity resolution processes – not just to match records, but to develop a complete, accurate golden record.
Once identity resolution is performed successfully – accurate matching and merging, mapping of relationships and entities, de-duplication, etc. – another task for intelligent models is to augment the data itself. This means using augmentation to help calculate metrics such as customer lifetime value (CLV) or likelihood to churn – or to augment the metrics themselves.
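As a concrete illustration, a baseline historical CLV can be computed directly from the transactions attached to a golden record; an augmented model would then refine or predict this figure. The schema and the formula used here (average order value × purchase frequency × an assumed expected lifespan) are illustrative assumptions, not a prescribed method:

```python
from datetime import date

def historical_clv(transactions, expected_lifespan_years=3):
    """Estimate CLV from the past transactions of one resolved customer.

    transactions: list of (date, amount) tuples already merged under a
    single golden record by identity resolution.
    """
    if not transactions:
        return 0.0
    total_spend = sum(amount for _, amount in transactions)
    avg_order_value = total_spend / len(transactions)
    first = min(d for d, _ in transactions)
    last = max(d for d, _ in transactions)
    # Guard against a zero-length active window for single-day histories.
    years_active = max((last - first).days / 365.25, 1 / 365.25)
    purchases_per_year = len(transactions) / years_active
    return avg_order_value * purchases_per_year * expected_lifespan_years

# Hypothetical transaction history for one customer.
txns = [(date(2023, 1, 10), 40.0), (date(2023, 7, 1), 60.0), (date(2024, 1, 5), 50.0)]
print(round(historical_clv(txns), 2))  # 456.56
```

A model-based augmentation would replace the fixed `expected_lifespan_years` assumption with a learned, per-customer prediction.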
Augmentation vs. Human-Based Rules
It’s worth noting that CLV, churn rate and other metrics likely exist with some element of human-driven rules for discerning the propensity of a customer to churn, to sign up for a loyalty program, to increase their average spend, etc. Augmenting this area of data quality means setting up predictive models and then measuring the quality of the machine learning algorithms against the quality of the human-driven rules. The goal, then, is to understand the precise effect of augmentation – is it really obtaining insights beyond the realm of human-driven rules?
This calculation also applies to areas within identity resolution such as match rules – is augmentation producing a better set of matches than otherwise? In any event, the way to put augmentation through its paces is to run the augmented workload, run the human-driven workload, and compare the outcomes – the matches, the lifetime value aggregates, or whatever it is you’re testing – to see which produced the better result.
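Putting the two workloads through their paces can be as simple as scoring each candidate set of matched record pairs against a verified truth set. The record IDs and sample data below are hypothetical; the precision/recall arithmetic is the standard one:

```python
def score_matches(candidate_pairs, true_pairs):
    """Compare a set of proposed record-pair matches against verified truth.

    Pairs are stored as frozensets so (a, b) and (b, a) count as the same match.
    """
    candidates = {frozenset(p) for p in candidate_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_positives = len(candidates & truth)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall}

# Hypothetical outputs: verified matches vs. the two workloads.
verified = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
human_rules = [("r1", "r2"), ("r7", "r8")]           # 1 hit, 1 false match
model = [("r1", "r2"), ("r3", "r4"), ("r9", "r10")]  # 2 hits, 1 false match

print(score_matches(human_rules, verified))  # precision 0.5, recall ~0.33
print(score_matches(model, verified))        # precision ~0.67, recall ~0.67
```

The same scoring structure applies to non-matching outputs such as CLV, by swapping precision/recall for an error metric against known values.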
It may be that each proves valuable, and thus worthwhile to use both. One interesting example of using a combination is to use human oversight to improve on probabilistic matching analysis, with a data steward responsible for examining unresolved matches. The data steward can even be a test bed for additional learning through augmentation, with the decisions the steward makes on grouping together individual records used to train a model in a supervised learning workflow.
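A minimal sketch of that feedback loop: each steward decision on an unresolved pair becomes a labeled training example, here fed to a tiny from-scratch logistic-regression classifier. The feature names and sample decisions are assumptions for illustration; in practice you would use a proper ML library and far richer features:

```python
import math

def train_match_model(examples, epochs=500, lr=0.5):
    """Learn match weights from steward-labeled pairs via logistic regression.

    examples: list of (features, label), where features might be
    [name_similarity, email_exact_match] and label is 1 (steward merged
    the records) or 0 (steward kept them separate).
    """
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias

def match_probability(weights, bias, features):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical steward decisions: [name_similarity, email_exact_match] -> merged?
decisions = [
    ([0.95, 1.0], 1),
    ([0.90, 0.0], 1),
    ([0.40, 0.0], 0),
    ([0.20, 0.0], 0),
]
w, b = train_match_model(decisions)
print(match_probability(w, b, [0.92, 1.0]) > 0.5)  # likely a match
print(match_probability(w, b, [0.25, 0.0]) < 0.5)  # likely distinct
```

As the steward resolves more borderline pairs, the retrained model handles a growing share of them automatically, shrinking the manual queue.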
The caveat to employing a hybrid approach is that there must be open, available mechanisms to define your augmentation – which is true of augmentation in general. In other words, augmentation works best when the business knows what needs to be trained and what doesn’t, and when the training set includes modeled output such as matches or CLV. The trained model then becomes smarter by learning from that output.
Align Augmentation with Specific Business Outcomes
Any consideration of when or whether to augment data quality should include a firm understanding of how any deployed model is operating, as well as a firm understanding of its results. This condition ties back to the first principle of augmenting data quality: it should only be done with a clear purpose, i.e., a defined business outcome.
The danger of proceeding without such an understanding is modeling or augmenting areas of data quality without knowing why, which runs the risk of augmenting the wrong things, or augmenting areas that are not closely tied to an outcome the business is pursuing. Measuring whether a model is making the expected decisions – and how it makes them – provides the transparency needed to ensure that augmenting any area of data quality produces the desired outcomes.
Lastly, there is a connecting thread between all areas of augmented data quality that might itself benefit from augmentation. The individual components – knowing what’s needed to transform signals into a customer understanding; pulling together accurate, cleansed and normalized signals to create an accurately matched, clean golden record; and the processes to supplement a golden record with enhanced data – combine to form an overall process where augmentation may help discover data quality trends. Exploring those trends, such as what’s happening from a data quality standpoint in individual sources, can help measure whether data is fit for purpose and identify areas for corrective action.
Following the proper augmentation guidelines and processes can provide a business with a trust index offering an overall data perspective – essentially a “credit score” for the quality of the data and its processes. A trust index, compiled from individual elements of data and individual sources, can itself be augmented with models that convey how ready and trustworthy the data is, where to bolster weak points, etc. Augmenting data quality is a closed-loop process that, at its core, asks whether data is fit for a specific business purpose as the ultimate measurement.
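One way to picture such a trust index is as a weighted roll-up of per-source quality metrics into a single 0–100 score. The metric names and weights below are illustrative assumptions; the point is that the “credit score” is a transparent aggregate that both exposes weak points per source and gives a model something concrete to refine:

```python
def trust_index(source_metrics, weights=None):
    """Roll per-source quality metrics (each 0.0-1.0) into 0-100 trust scores.

    source_metrics: {source_name: {"completeness": x, "validity": y,
                                   "match_confidence": z}}
    Returns (per-source scores, overall score averaged across sources).
    """
    weights = weights or {"completeness": 0.4, "validity": 0.3, "match_confidence": 0.3}
    per_source = {}
    for source, metrics in source_metrics.items():
        score = sum(weights[m] * metrics[m] for m in weights)
        per_source[source] = round(100 * score, 1)
    overall = sum(per_source.values()) / len(per_source)
    return per_source, overall

# Hypothetical per-source quality measurements.
sources = {
    "crm":       {"completeness": 0.95, "validity": 0.90, "match_confidence": 0.85},
    "web_forms": {"completeness": 0.60, "validity": 0.70, "match_confidence": 0.50},
}
per_source, overall = trust_index(sources)
print(per_source)  # the weak source stands out immediately
print(round(overall, 1))
```

An augmented version would learn the weights themselves, or predict where a source’s score is heading, rather than relying on hand-set coefficients.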