all the time. Here are my current answers.
It’s eleven years since the term “customer data platform” appeared in 2013. What stages has it gone through over that time?
The first stage was just to recognize that CDP was a separate category of software. What was new about CDP was that it was packaged software that was building a customer database. Before then, customer databases were custom projects, like a data warehouse, and the only packaged marketing software was applications like campaign management systems or predictive modeling tools. The earliest CDPs actually bundled the database building capability with an application. But, fairly soon, vendors realized that there was more value in the database building features, which were rare, than in the applications themselves, which were fairly common.
Once we had identified the CDP category, the next stage was convincing people that it was important. The concept itself was easy to grasp (“all customer data in one place”). But there was initial skepticism about whether it was a legitimately separate category or just another name for existing technologies such as CRM, data warehouses or DMPs. So we spent most of our time explaining the difference between CDP and other systems that also built customer databases. In fact, the formal CDP Institute definition – "packaged software that builds a persistent, unified customer database accessible to other systems" – is carefully crafted so that each term identifies a differentiator between CDP and some other type of system. “Packaged software”, for example, distinguishes CDP from data warehouses and data lakes, which are custom built. I won’t go through the other elements point-by-point.
Once the concept was reasonably well defined, we faced skepticism about the need for a separate CDP database, compared to just reading existing source data in real time to build profiles. That’s what “persistent” refers to in the CDP definition. It took the big martech vendors, including Salesforce and Adobe, several years to accept that a persistent database is necessary. The reason is that building profiles on demand by assembling data in real time just takes too long.
Even after the category was established, there was great confusion about what really qualified as a CDP. That’s why the CDP Institute launched its RealCDP certification program, which expanded on our definition by defining five key requirements: assemble all data types, keep all details, retain the data indefinitely (subject to regulatory constraints), build unified profiles, and share the data with other systems. We later added a sixth point relating to two real time capabilities: access to individual customer profiles, for things like call centers and website personalization, and real time event triggers, for things like responding to dropped shopping carts. Of course, we are just one voice among many, and are easily drowned out by vendor promotions. So, unfortunately, some of that confusion persists to this day.
Part of the reason for the continuing confusion is a debate over whether CDP just builds profiles or also should include data activation capabilities such as analytics and personalization. Our view is they’re not essential, but over time, we’ve seen the industry split between CDPs that only build profiles and CDPs that include activation functions. The majority of CDP vendors provide activation, so that’s clearly what most buyers want.
More recently, we’ve seen interest in using customer data beyond marketing, which makes IT and data teams more interested in CDP projects. That’s a big change because when CDP was just a marketing tool, IT teams were very happy to let marketers buy the CDP because it kept the marketers happy without consuming IT resources. Now, CDP is too important to be left to marketers. This also makes the CDPs that just build profiles more appealing in some situations – and we have in fact seen slightly faster growth in that group most recently.
IT involvement in turn brings more interest in companies building their own CDP, and leveraging their existing data lakes and warehouses as the foundation. This has been called ‘composable CDP’ although it’s not really the right use of the term ‘composable’. Most people now agree that 'warehouse-based' is a better label. Semantics aside, the problem is that IT teams often underestimate the requirements for a proper CDP and, thus, the work involved in creating one. So it’s important to ensure they do a thorough job of scoping the project in advance, so they make the best choice.
The other thing we’ve seen, more or less continuously, is expansion of CDP into new industries. Originally they were used primarily in retail and media. Then, they grew more common in financial services, hospitality, and telecom, which are all industries that traditionally had pretty good customer data systems. Most recently, we’re seeing CDPs in education, healthcare, and government applications.
We’re also seeing CDP used more for advertising applications, as companies lean more heavily on first-party data to compensate for the loss of third-party cookies and to replace other targeting methods that become harder as privacy rules become more stringent.
In fact, CDP also supports other privacy-related applications, such as closer control over ensuring that contacts are authorized by consumer consent, and providing data clean rooms for privacy-safe data sharing. Funneling all customer list creation through a CDP is one way to avoid breaking privacy rules.
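To make the idea of funneling list creation through one place concrete, here is a minimal sketch in Python. It assumes each profile stores the set of purposes the customer has consented to; the `Profile` class, the `build_contact_list` function, and the purpose names are all hypothetical, invented for illustration, and not taken from any particular CDP product.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Profile:
    """Hypothetical unified customer profile with stored consent."""
    customer_id: str
    email: str
    # Purposes the customer has agreed to, e.g. {"email_marketing", "sms"}
    consents: Set[str] = field(default_factory=set)


def build_contact_list(profiles: List[Profile], purpose: str) -> List[Profile]:
    """Return only the profiles whose consent covers the given purpose."""
    return [p for p in profiles if purpose in p.consents]


profiles = [
    Profile("c1", "ana@example.com", {"email_marketing", "sms"}),
    Profile("c2", "ben@example.com", {"sms"}),
    Profile("c3", "cy@example.com"),  # no consent recorded
]

email_list = build_contact_list(profiles, "email_marketing")
print([p.customer_id for p in email_list])
```

Because every list passes through the same function, the consent check cannot be skipped by an individual campaign tool. A real system would also need to handle consent expiry, regional rules, and audit logging, which this sketch leaves out.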
Where do you see the industry headed next?
The biggest issue right now is the movement towards ‘composability’. We can’t control whether IT and data departments try to build their own CDP-equivalent. But we can educate them about the actual requirements and we can give them tools that make it easier to meet those requirements. Many CDP vendors are now breaking apart their systems to provide modules that companies can use as components in building an internal system. Of course, that helps those vendors to survive a transition to a composable CDP world, but it also helps to maintain the reputation of the industry as a whole. What we don’t want to see is composable CDP projects fail because that makes people question the value of the CDP concept itself.
Closely related to the composable trend is a trend for ‘no copy’ access to external data by a CDP. This means the CDP can read data from other systems without copying it into the CDP data store. That used to be more or less impossible at scale, because finding, reading, and integrating masses of external data took too long. With today’s technologies, that process can be much faster, so it becomes a more practical alternative. Some common use cases are reading things that change quickly and are only relevant at the moment you need them, like inventory levels or local weather conditions. Of course, this ability also blurs the distinction between a traditional CDP, which imported all its data, and a warehouse-based CDP, which works with data stored externally. That’s okay but it does add still more confusion to the discussion.
I’ll also make a side note that the original CDP skeptics argued for reading source system data on demand, so it may seem this proves they were right. But so far I don’t think you can have a CDP that only works by reading external data on demand, because some processes like identity resolution and data aggregation still take too long to do purely on the fly. So I see the future CDP as having a core of data that it does copy and store in its own database, for things like the identity graph that tells how to combine data from different sources into each customer’s profile. Whether that database is a CDP or data warehouse doesn't matter: either way, the data will have been copied from the original source system and preprocessed for use by the CDP. The core data could well be supplemented with data read on-demand from external systems, which could be the original source or a data warehouse or data lake. At least in theory, this would give you the best of both worlds: minimal data copying but maximum performance.
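As a rough illustration of that hybrid design, here is a small Python sketch. It assumes the identity graph is precomputed and stored (modeled here as a simple union-find structure), while fast-changing attributes are fetched at request time. The names `IdentityGraph`, `build_profile`, and `fetch_live`, and the sample identifiers, are all hypothetical; a production identity graph would be far more involved.

```python
class IdentityGraph:
    """Stored mapping from source identifiers to one customer (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unknown identifiers start out as their own customer.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups fast.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a, b):
        """Record that two identifiers belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def customer_key(self, ident):
        return self._find(ident)


def build_profile(ident, graph, stored_attributes, fetch_live):
    """Combine copied, preprocessed data with data read on demand."""
    key = graph.customer_key(ident)
    profile = dict(stored_attributes.get(key, {}))  # core data held by the CDP
    profile.update(fetch_live(key))                 # no-copy data, read when needed
    return profile


graph = IdentityGraph()
graph.link("email:ana@example.com", "cookie:abc123")
graph.link("cookie:abc123", "loyalty:9001")

key = graph.customer_key("email:ana@example.com")
stored = {key: {"lifetime_orders": 12}}


def fetch_live(customer_key):
    # Stand-in for a real-time call to an external system.
    return {"local_weather": "rain"}


print(build_profile("loyalty:9001", graph, stored, fetch_live))
```

The point of the split is that the expensive work (linking identifiers, aggregating history) happens ahead of time, so the on-demand step only has to fetch a few values for a customer who is already resolved.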
Putting aside composability, CDPs will need to adjust to new privacy rules by adding features like encryption, advanced privacy policy management, and data clean rooms. They’ll adjust to new media and new data types, like audio and video, which need to be not just stored but analyzed in ways that are currently close to impossible. And, of course, they’ll adjust to growing AI capabilities, which will make it easier to perform some functions like adding new data sources, matching identifiers that belong to the same person, building predictive models, and analyzing business results. AI will also add to the volume and complexity of data that CDPs have to handle, which itself may require new technology to support. For example, if AI starts to create a huge variety of content personalized for each individual, that makes result analysis vastly more complicated. Somehow the CDP will need to deal with that.
On a more prosaic level, I expect to see more CDPs that are tailored to the needs of particular industries. That’s a typical development for a mature software category. Specialist systems can be more cost-effective to build and deploy, because they use special data structures and features tailored to a particular industry’s needs. This will bring down the cost of CDPs, which has been an obstacle to expanding the base of CDP purchasers.
And, as I’ve already mentioned, I expect to see CDPs used more widely beyond marketing. One thing we’ve learned in recent years is that customers expect a personalized experience every time they interact with a company, whether it’s before they buy a product or after they start using it. That requires making customer data available at every interaction point. Beyond direct customer interactions, teams like product development and operations planning also can benefit from using customer data.
Are there any particular obstacles or threats to the CDP’s continued success?
Certainly CDPs will have to adapt to the changes we’ve already discussed in data types and volumes, in the users for customer data, and in whatever technical developments change the best way for CDPs to be built. Individual vendors may struggle to keep up. But I think the need for complete, sharable customer profiles is here to stay. So to the extent that that’s the core definition of a CDP, the future of the industry is secure.
That said, I do see two major threats to the CDP industry as we know it today.
The first is technical: the cloud databases from specialists like Snowflake and Databricks and general cloud platforms like Google Cloud, AWS, and Microsoft Azure. All those vendors are increasingly eyeing the giant pile of customer data held in a CDP and wanting it for themselves. To get it, they’re adding applications to build profiles, such as data quality and identity resolution, and to do analytics and marketing. Sometimes they build the applications as features of their own database systems; sometimes they build direct integrations with specialist applications through a marketplace of some sort. Either way, this cuts out the CDP, which has traditionally been an intermediary between enterprise data stores and business applications. Again, this comes back to the notion of the warehouse as the primary customer data store, which makes the CDP database unnecessary. I think the threat is limited today because most IT departments don’t have the resources to build those systems for themselves. But as new tools make it easier to build those systems, the threat becomes more important. There are really just two ways for CDPs to respond to this threat:
- One is to become platforms themselves, which is to say, to replace the data warehouse itself. That’s not as crazy as it sounds, since the CDP is really a set of tools that builds the customer database, so it can work on top of a Snowflake or Google BigQuery or whatever database the client wants. In this world, the CDP evolves from a tool to build customer profiles to a tool to build data warehouses in general. Difficult but not impossible, at least for the largest CDP vendors who have the resources to compete.
- The other option is to become applications on top of the warehouse, offering specialized capabilities like cross-channel journey orchestration. This would build on CDPs’ existing capabilities to share their profiles with activation systems, and to themselves do activation functions that apply across all channels, such as predictive models and best offer selection. It’s actually an easier path for most CDP vendors, since it continues their role as intermediaries between data sources and business applications. But it’s also a fairly narrow role and there will be lots of competition. Specialization by industry, company size, and other variables will be the key to avoiding that competition by becoming the best choice in a particular niche.
The second threat isn’t technical: it’s the limited ability of organizations to actually make use of a CDP once it’s built. We run into this all the time, when companies tell us their staff doesn’t know what to do with a CDP or doesn’t have the skills to do what they’d like. It’s a problem that will only get worse as customer data grows more complicated and there are more possibilities for using customer data. Maybe AI will solve the problem, and that’s something CDP vendors need to invest in to make happen. But there’s no guarantee that AI will be the answer, and my guess is that even AI will need skilled users to take advantage of the possibilities it creates.
Of course, if the CDP turns out not to be useful, the staff members will blame the CDP, not their own lack of skill. But it doesn’t really matter who takes the blame: if CDPs don’t create value, companies won’t be willing to invest in them. So it’s really important for the CDP industry to help train CDP users, not just in the technical details of how to use a CDP, but in the business programs that a CDP can support.