Albert Einstein once said that if he had an hour to solve a problem, he’d spend 55 minutes thinking about the problem and five minutes thinking about the solution. If we hold the research and implementation of a customer data platform (CDP) to that same standard, determining data architecture requirements will take up the lion’s share of our allotted time.
Data architecture decisions stem from a business problem: the business needs customer data for normal activities such as analytics, decisioning and marketing. In ruminating on that problem, data architecture needs crystallize, forming a clear picture of which solution will best satisfy business requirements.
Customer Data Cadence
The first architecture question to ask is what data cadence the CDP needs to support. Will data have to be updated daily, several times each day or in real time, with SLA response times measured in milliseconds? This question gets at the fundamental business requirement: what purpose will the data serve? Is the fastest cadence in support of a nightly batch feed for in-store transactions, for example, or of real-time website personalization?
Digging deeper, cadence is also a function of customer data source systems: how quickly can those systems provide the data? A CDP should be architected around its data constraints; if no source system returns data in real time, it makes little sense to deploy a CDP that processes real-time updates. Unless, of course, additional source systems are brought to bear, which brings us to the third and final function of cadence – the service cost model, with the understanding that handling a nightly batch feed is far less expensive than architecting for real-time capabilities.
Customer Data Scope
Once a determination is made about the functionality and cadence of customer data, the next consideration is its scope. Will the CDP cover data for known customers only, for unknown customers (anonymous website visitors and/or prospects), or, as is more common, will it support unknown-to-known customer journeys and management of the entire customer lifecycle? Most organizations contemplating a CDP for the ultimate purpose of competing on customer experience would, I imagine, be interested in the latter, which encompasses managing the lifecycle from discovery (unknown to known) through ongoing customer management to offboarding.
Next, data architecture entails a host of considerations surrounding the customer data integration lifecycle. Data capture, data ingestion (supporting all required sources, cadences, and formats) and hygiene and cleansing capabilities must all be managed correctly according to business requirements. Data hygiene and cleansing is often an underappreciated part of the data integration lifecycle, but it is a core requirement for accurate customer data integration and formulating a valid customer golden record. These, in turn, are the foundation for delivering a hyper-personalized, omnichannel customer experience. Additionally, the CDP must be designed to manage the transformation of source system data resolution levels (typically the account, order or transaction level) to the customer level.
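To make the hygiene and cleansing step concrete, here is a minimal sketch of the kind of normalization rules involved. The field names (`email`, `phone`, `name`) and rules are illustrative assumptions, not taken from any particular CDP:

```python
import re

def cleanse_record(raw: dict) -> dict:
    """Apply basic hygiene rules to a raw customer record.
    Field names and rules are illustrative, not from any specific CDP."""
    record = dict(raw)
    # Normalize email: trim whitespace, lowercase, validate shape
    email = record.get("email", "").strip().lower()
    record["email"] = email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None
    # Normalize phone: strip formatting, keep digits only, require at least 10
    digits = re.sub(r"\D", "", record.get("phone", ""))
    record["phone"] = digits if len(digits) >= 10 else None
    # Standardize name casing
    record["name"] = record.get("name", "").strip().title() or None
    return record

print(cleanse_record({"name": "  jane DOE ",
                      "email": " Jane.Doe@Example.COM ",
                      "phone": "(555) 123-4567"}))
# → {'name': 'Jane Doe', 'email': 'jane.doe@example.com', 'phone': '5551234567'}
```

Without this kind of standardization, the same customer arriving from two source systems with differently formatted emails or phone numbers would never match downstream.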
When data hygiene, resolution-level rectification, matching and keying are done in one coherent step rather than piecemeal, the unified customer profile and resulting golden record take shape. These are essential if the CDP’s business goal is to deliver personalized, relevant experiences at the cadence of the customer.
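The matching-and-keying step above can be sketched as follows. This is a deliberately simplified assumption: records are grouped on a single exact match key (email), and survivorship is "first non-null value wins"; production identity resolution typically uses weighted or fuzzy matching across many attributes:

```python
from collections import defaultdict

def golden_records(records: list[dict]) -> dict:
    """Group cleansed records by a match key and merge each group into
    one golden record. The match key (email) and survivorship rule
    (first non-null wins) are illustrative simplifications."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["email"]].append(rec)
    golden = {}
    for key, group in groups.items():
        merged = {}
        for rec in group:  # survivorship: first non-null value wins
            for field, value in rec.items():
                if value is not None and field not in merged:
                    merged[field] = value
        golden[key] = merged
    return golden

crm = {"email": "jane@example.com", "name": "Jane Doe", "phone": None}
web = {"email": "jane@example.com", "name": None, "phone": "5551234567"}
print(golden_records([crm, web]))
# → {'jane@example.com': {'email': 'jane@example.com',
#                         'name': 'Jane Doe', 'phone': '5551234567'}}
```

Note how the merged profile is only as good as the hygiene step before it: if the two emails had differed in casing, the CRM and web records would have landed in separate groups and produced two partial profiles instead of one golden record.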
Customer Data Format & Stability
Organizations must also understand customer data format and stability, which will determine whether a SQL or NoSQL database is more suitable. Relatively stable and well-structured customer data sources may tilt the scale in favor of SQL, which will provide advantages in terms of skillset, cost and performance optimization. Conversely, if data are variable over time, a NoSQL database will provide advantages in terms of flexibility.
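The trade-off can be illustrated in a few lines. The sketch below uses an in-memory SQLite database purely as a stand-in: stable attributes live in fixed columns (the SQL-style choice), while variable attributes are stored as a schemaless JSON document (approximating the NoSQL-style choice). Table and field names are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Stable, well-structured attributes fit a fixed relational schema (SQL-style).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'jane@example.com', 'Jane Doe')")

# Attributes that vary over time or per customer fit a schemaless
# document (NoSQL-style), approximated here as a JSON blob.
conn.execute("CREATE TABLE customer_attrs (customer_id INTEGER, doc TEXT)")
doc = {"loyalty_tier": "gold", "app_push_opt_in": True}  # fields vary freely
conn.execute("INSERT INTO customer_attrs VALUES (1, ?)", (json.dumps(doc),))

row = conn.execute("SELECT email FROM customer WHERE id = 1").fetchone()
attrs = json.loads(conn.execute(
    "SELECT doc FROM customer_attrs WHERE customer_id = 1").fetchone()[0])
print(row[0], attrs["loyalty_tier"])  # → jane@example.com gold
```

The fixed columns can be indexed, typed and optimized cheaply; the JSON document absorbs new attributes without schema migrations. Which side of that trade dominates is exactly the stability question posed above.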
The final data architecture consideration is the technology choice for the database itself: size, complexity, query performance, cost constraints and the like determine whether a straightforward SQL database will suffice, or whether BigQuery, Snowflake or another modern cloud data warehouse platform is better suited to the organization’s data infrastructure.
But as we learned from Einstein’s less-famous “theory,” that’s a decision that should take up far less time than the up-front decisions having to do with cadence, scope, flexibility and data integration. Because at the end of the day, a CDP is a technical solution to a business problem. A precise documentation of business requirements will shed light on the data architecture questions, and once those decisions are made everything else will fall into place. Arriving at a solution will certainly take longer than five minutes, but as we also learned from Einstein – everything is relative.