June 3, 2022

What to Know About Metadata at the Data Layer

We can make a distinction between data and metadata as they relate to delivering a personalized customer experience (CX) by thinking about putting together a jigsaw puzzle. If data elements are the collective pieces that create a finished work when put together with precision, metadata are implicit or explicit attributes of the pieces. The shadings, the shapes, the connections, the number of border pieces – anything that helps form an understanding of the bigger picture. Understanding and using those individual attributes is indispensable to organizing and completing the finished product which, if the analogy holds, is a personalized CX.

In a broad sense, metadata is just data about data. Understanding the various categories of metadata, particularly with the collection of customer and product data, may help explain the role of metadata in creating and delivering a personalized customer experience.

The first realm of metadata pertains to the semantic meaning of an individual piece of data: the attributes that determine how we use and relate the data. This category controls naming, matching, and parsing protocols, as well as default values and limitations on values – anything that helps a user of the data understand the meaning of the underlying data and that contributes to automating the data quality process.

Understanding that a five-digit number is a ZIP code rather than a quantity is one example; that knowledge might allow a system to apply default rules for collecting the number as five digits or with a ZIP+4 delivery route code. How a currency is entered is another example, with rules or defaults for how many digits follow the decimal point, which currency symbol is allowed, or how thousands separators are parsed and displayed. Each data element will have its own rules for how values are handled individually or collectively; currencies may be added together, for instance, whereas ZIP codes may not.

Semantic metadata supports tasks and rules for the collection and analysis of incoming data. When an individual piece of data is collected, this metadata allows software or people to analyze the individual attributes to provide an understanding of what the data means.
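To make the idea concrete, here is a minimal Python sketch of semantic metadata attached to two data elements. The `FieldSemantics` class, the field names, and the specific patterns are assumptions made up for illustration, not a description of any particular product; the point is only that the rules travel with the element rather than living in the data itself.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class FieldSemantics:
    """Hypothetical semantic metadata for one data element."""
    name: str
    pattern: str                   # regex a raw value must fully match
    summable: bool                 # may values be added together?
    default: Optional[str] = None  # value to use when none is supplied


# A ZIP code: five digits, optionally followed by a ZIP+4 suffix.
ZIP_CODE = FieldSemantics(
    name="zip_code",
    pattern=r"\d{5}(-\d{4})?",
    summable=False,
)

# A US dollar amount: optional "$", optional thousands separators,
# and either no decimals or exactly two.
AMOUNT_USD = FieldSemantics(
    name="amount_usd",
    pattern=r"\$?\d{1,3}(,\d{3})*(\.\d{2})?|\$?\d+(\.\d{2})?",
    summable=True,
    default="0.00",
)


def is_valid(field: FieldSemantics, raw: str) -> bool:
    """Check a raw value against the element's parsing rule."""
    return re.fullmatch(field.pattern, raw) is not None


def parse_amount(raw: str) -> Decimal:
    """Strip the currency symbol and thousands separators."""
    return Decimal(raw.replace("$", "").replace(",", ""))


def total(field: FieldSemantics, raw_values: list) -> Decimal:
    """Add values together only if the metadata says that is meaningful."""
    if not field.summable:
        raise TypeError(f"{field.name} values cannot be added together")
    return sum((parse_amount(v) for v in raw_values), Decimal("0"))
```

With these definitions, `total(AMOUNT_USD, ["$1,234.50", "10.25"])` yields `Decimal("1244.75")`, while asking for the total of a list of ZIP codes raises an error, because the metadata marks that operation as meaningless.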

Operational Statistics

A second realm of metadata as it relates to the collection of data from various sources is an operational understanding, both of the data itself and the characteristics of the data source.

The operational characteristics of the data include statistics such as the percentage of values in a particular column that had errors. How many ZIP code entries in a nine-digit field had just five digits? Were street addresses uniform? What percentage of emails show as undeliverable? Operational characteristics of the data source itself pertain more to questions such as: How often are you reading from a particular data source? When did you last query it? What is the average delay between making a call out to a system and the system responding? In general, is the system operationally capable of handling what the user wants to accomplish? An understanding of this realm of metadata contributes to the transparency of overall system performance and the quality of information being put into the system. Essentially, is the data fit for purpose, and is it usable?
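As a rough illustration, the two kinds of operational statistics described above might be computed along these lines. The function names, the dictionary keys, and the shape of the call log (pairs of request and response times in seconds) are assumptions chosen for this sketch.

```python
import re
from statistics import mean


def column_stats(zip_values):
    """Operational statistics about the data in one ZIP code column."""
    total = len(zip_values)
    five = sum(1 for v in zip_values if re.fullmatch(r"\d{5}", v))
    nine = sum(1 for v in zip_values if re.fullmatch(r"\d{5}-?\d{4}", v))
    errors = total - five - nine  # anything that is neither form

    def pct(count):
        return 100.0 * count / total if total else 0.0

    return {
        "five_digit_pct": pct(five),
        "nine_digit_pct": pct(nine),
        "error_pct": pct(errors),
    }


def source_stats(call_log):
    """Operational statistics about the data source itself.

    call_log is a list of (request_time, response_time) pairs in seconds.
    """
    if not call_log:
        return {"reads": 0, "last_queried": None, "avg_latency": None}
    latencies = [response - request for request, response in call_log]
    return {
        "reads": len(call_log),
        "last_queried": max(request for request, _ in call_log),
        "avg_latency": mean(latencies),
    }


print(column_stats(["12345", "12345-6789", "1234", "123456789"]))
print(source_stats([(0.0, 0.5), (10.0, 11.5)]))
```

In this small sample, a quarter of the ZIP entries have only five digits and a quarter are errors, and the source responded in one second on average across two reads.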

Metadata and Trust

The first two categories of metadata speak to the level of trust that a user, a marketing department, or an organization has in the data being collected. Semantic attributes and operational statistics about incoming data provide core metrics that represent the meaning and quality of the data itself and the quality of the data source in terms of its availability, recency, and performance.

Trust, in this context, is derived from an aggregate calculation over time across multiple attributes, based on a set of rules and models that determine data quality. Trust stems from knowing the sources of metadata and understanding how you arrive at a particular piece of metadata. Metadata ranges from the very simple, such as a database providing column names for a table, to the much more complex, such as parsing through all the values of a particular column to derive a statistic about the data itself.
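The two ends of that range can be shown side by side with Python's built-in `sqlite3` module. The `customers` table, its columns, and the sample rows are invented for this sketch; the column names come straight from the database, while the missing-value percentage has to be derived by reading every value in the column.

```python
import sqlite3


def simple_and_derived_metadata(conn, table, column):
    """Return (column names, percentage of missing values in one column)."""
    # The table name is interpolated directly, so this sketch assumes it
    # comes from trusted code, never from user input.
    cur = conn.execute(f"SELECT * FROM {table}")

    # Simple metadata: the database already knows its column names.
    column_names = [d[0] for d in cur.description]

    # Derived metadata: parse every value in the column to get a statistic.
    idx = column_names.index(column)
    values = [row[idx] for row in cur.fetchall()]
    missing = sum(1 for v in values if v is None or v == "")
    missing_pct = 100.0 * missing / len(values) if values else 0.0
    return column_names, missing_pct


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "12345"),
        (2, None, "12345-6789"),
        (3, "", "1234"),
        (4, "d@example.com", "54321"),
    ],
)

names, missing_pct = simple_and_derived_metadata(conn, "customers", "email")
print(names)        # ['id', 'email', 'zip']
print(missing_pct)  # 50.0
```

Knowing which of these two paths produced a given piece of metadata is part of what lets a user decide how much to trust it.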

In addition to metadata describing incoming data from various data sources and providing meaning to the data, there are additional categories of metadata more closely associated with the use of the data in building or mediating customer experience, such as a relevancy/value index and various permissions.

In a follow-up blog, we’ll delve into those metadata categories and how they help not only with an understanding of the underlying data, but also their importance in creating and refining a personalized customer experience.