Sunday, October 29, 2023

Does CDP Need a New Definition?

The earliest Customer Data Platform systems were introduced before 2010; the term CDP was coined to describe this emerging class in 2013. My definition had changed very little when we launched the CDP Institute in 2016, and has been the same ever since: “packaged software that builds a unified, persistent customer database accessible to other systems”. The Institute added the RealCDP checklist* in 2019 to attach more specifics to the definition in the hopes of helping buyers ensure a system that called itself a CDP could actually support the use cases they expected a CDP to support. By then, industry analysts were beginning to offer their own definitions which, while worded differently, were broadly consistent with the Institute definition. Even the major marketing suite vendors, who initially argued a separate (“persistent”) database wasn’t necessary, eventually discarded that position and introduced products that matched our criteria.

A successful concept like CDP quickly takes on a life of its own. It soon became apparent that many people were using CDP in a much looser sense to mean any system that built and shared customer profiles. This extended past packaged software to include custom-built systems and included systems whose scope was more limited than a true CDP. This expansion skewed some survey results but otherwise seemed relatively harmless; in any case, resistance seemed both pedantic and futile. What really mattered was that these systems still gave CDP users access to the unified profiles.

Unfortunately, the evolution of the term didn’t stop there. As CDP became popular, many vendors adopted the label whether or not they actually met even the looser definition. At the same time, legitimate CDP vendors offered additional capabilities to analyze and deploy the data in the profiles. The resulting confusion ultimately led some vendors to avoid the CDP label entirely because it no longer provided a useful way to differentiate their products. Today, vendors seeking the latest label are more likely to call themselves digital experience managers than CDPs, even if their products meet the CDP requirements.

But the greatest challenge to the utility of the CDP label arose in the past few years when a number of vendors chose to claim that the core feature of the CDP – a dedicated database built by importing data from other systems, a.k.a. “persistence” – could be abandoned while still calling the result a CDP. Their argument was that the customer profiles could reside in a general purpose data warehouse, which most companies already had in place.

The claim gained some plausibility from the fact that modern cloud data warehouse technologies, such as Snowflake, Google BigQuery, and Amazon Redshift, are in fact used by some conventional CDP vendors. The problem was that the vendors making this claim implicitly assumed that every company’s data warehouse had already organized the data into usable customer profiles. In fact, very few data warehouses perform the specific tasks, most notably identity resolution, customer-level aggregation, and real-time response, needed to support CDP use cases. While it’s technically possible to add those features to an existing data warehouse, it’s usually a major project that often costs more, takes longer, and delivers less useful results than installing a separate, conventional CDP. (As always, the details depend on the situation.)
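To make two of those tasks less abstract, here is a minimal Python sketch of identity resolution and customer-level aggregation. It is only an illustration under stated assumptions: the record fields (`email`, `phone`, `spend`), the sample data, and the function names are all hypothetical, and the matching rule (exact match on any shared identifier) is far simpler than what a production CDP or warehouse project would need.

```python
from collections import defaultdict

# Hypothetical source records; the field names are illustrative only.
RECORDS = [
    {"source": "web", "email": "ann@example.com", "phone": None, "spend": 20.0},
    {"source": "store", "email": None, "phone": "555-0100", "spend": 35.0},
    {"source": "crm", "email": "ann@example.com", "phone": "555-0100", "spend": 0.0},
    {"source": "web", "email": "bob@example.com", "phone": None, "spend": 12.5},
]


def resolve_identities(records, keys=("email", "phone")):
    """Group record indexes that share any identifier value (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value is None:
                continue
            ident = (key, value)
            if ident in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(i)] = find(first_seen[ident])
            else:
                first_seen[ident] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return list(groups.values())


def build_profiles(records):
    """Aggregate each resolved group of records into one customer profile."""
    profiles = []
    for members in resolve_identities(records):
        rows = [records[i] for i in members]
        profiles.append({
            "emails": sorted({r["email"] for r in rows if r["email"]}),
            "phones": sorted({r["phone"] for r in rows if r["phone"]}),
            "sources": sorted({r["source"] for r in rows}),
            "total_spend": sum(r["spend"] for r in rows),
        })
    return profiles


if __name__ == "__main__":
    for profile in build_profiles(RECORDS):
        print(profile)
```

In this toy data, the web and store records share no identifier with each other, but the CRM record links them, so all three collapse into one profile. A warehouse that simply stores the three rows side by side has not done that work.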

One positive result of the interest in warehouse-based profiles has been the decision of some CDP vendors to break their systems into modules that let users buy the data preparation functions separately from the rest of the CDP. This lets companies that want a warehouse-based system to still benefit from the mature capabilities those CDP vendors have developed over many years. These vendors have also often added the ability to combine data from a warehouse or other external system with data stored in a conventional, persistent CDP database, without actually loading that external data into the CDP. This gains some advantages of the warehouse-based approach, such as reduced data movement and storage costs, while retaining the benefits of the persistent CDP, such as greater control and flexibility.

You’ll note that all these changes affect the input side of the CDP data flow. It was once a very simple process: data from other systems was copied into the CDP, where it was formatted into profiles and shared with other systems. Now, the input process may be some mix of copying data into the CDP, reading fully-formed profiles from a warehouse, or combining internal and external data. By contrast, the delivery side of the process has remained the same: the CDP shares its profiles with other systems. 

In some cases, this has led to a subtle shift in perception of the CDP’s purpose: from a system that builds customer profiles, to a system that delivers those profiles to other systems. In this view, the core function of the CDP is to convert general purpose profiles into the specific formats needed by analytical, orchestration, and delivery systems (which we can call “activation systems” if you’ll trade some jargon for simplicity). This may still involve some data processing, such as advance calculation of model scores and aggregates to make them available in real time, and new data structures to hold the results of that processing. So there’s more happening here than a simple data transfer or API call.
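A rough Python sketch of that delivery-side work might look like the following. Everything in it is an assumption made for illustration: the profile layout, the `precompute` step, the two target formats, and the 100-unit threshold for the `high_value` segment are hypothetical and do not describe any particular vendor's system.

```python
import hashlib

# Hypothetical general-purpose profile, as it might be read from a warehouse.
PROFILE = {
    "customer_id": "c-1",
    "email": "Ann@Example.com",
    "orders": [20.0, 35.0, 45.0],
}


def precompute(profile):
    """Calculate aggregates in advance so they can be served without delay."""
    orders = profile["orders"]
    return {
        "customer_id": profile["customer_id"],
        "email": profile["email"].strip().lower(),
        "order_count": len(orders),
        "lifetime_value": sum(orders),
        "avg_order": sum(orders) / len(orders) if orders else 0.0,
    }


def to_email_tool(row):
    """Shape the row for a hypothetical email system that wants a segment name."""
    segment = "high_value" if row["lifetime_value"] >= 100 else "standard"
    return {"address": row["email"], "segment": segment}


def to_ad_audience(row):
    """Shape the row for a hypothetical ad platform that wants a hashed email."""
    digest = hashlib.sha256(row["email"].encode("utf-8")).hexdigest()
    return {"hashed_email": digest, "orders": row["order_count"]}


if __name__ == "__main__":
    row = precompute(PROFILE)
    print(to_email_tool(row))
    print(to_ad_audience(row))
```

The point of the sketch is that the same profile yields a different structure for each activation system, and that the aggregates are stored in a new, delivery-oriented row rather than recalculated on every request.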

Some people carry this shift even further and argue the CDP should really be defined as an activation system that reads profiles from somewhere else. Since three-quarters of the CDPs we track actually do have at least some activation capabilities, this isn’t quite as crazy as it may sound.

All that said, I’m not yet ready to redefine CDP. “Packaged software” and “persistent customer database” are meaningful terms that distinguish one configuration from other approaches (“custom software” and “external profiles”) that can also make profiles available to other systems. More important, the packaged/persistent configuration has significant cost and execution advantages over the custom/external alternative. So it’s important to avoid blurring the distinction between the two approaches. 

At most, I’d offer retronyms that distinguish the “packaged CDP” (packaged/persistent) from the “warehouse CDP” (custom/external). By definition, the “warehouse CDP” doesn’t build its own profiles, so the term seems to ignore the fundamental function of the CDP. You might call that oxymoronic, if not just plain moronic. But if we slightly redefine the core function of the CDP to be delivering profiles rather than creating them, we can overcome that objection and, perhaps, give the market terms it intuitively understands.

You’ll also note the shift from profile creation to delivery puts the emphasis on the delivery side of the CDP flow, which we’ve noted has been the most stable and applies to both the packaged and warehouse approaches. This lets us join the trend toward redefining CDP as a profile sharing system.

In short, the key distinction is whether the primary customer profiles are built and stored in the company data warehouse or in a separate CDP database. It’s worth maintaining that distinction because one approach relies on the company’s technical staff to assemble the functions needed to build and store the profiles, while the other relies on a packaged CDP to provide those functions. There are variations within each theme, including whether the warehouse uses modules from a CDP vendor to assemble its profiles or to help deliver them, whether real-time data is posted to the warehouse or held separately, and whether the CDP enriches its profiles with external data without importing that data. These are important from a practical standpoint, but do not affect the fundamental architecture of the system. Focusing first on where the profiles reside should help buyers understand the most important choice they have to make. This choice will in turn determine the other decisions they have to consider. 

 

_________________________________________________________

* ingest data from all sources, retain all details, keep the data as long as desired, build unified profiles, share the profiles with any other system, and enable real-time event-triggers and profile access.