Back in February, I mentioned that CDP Institute had published a Composable CDP Self-Assessment Tool that asks people what gaps they must fill to convert their current systems into the functional equivalent of a CDP. I recently checked how many responses we’ve received, and was disappointed that there were just fourteen. Obviously this isn’t enough to draw statistically meaningful conclusions, especially when you bear in mind that the audience of CDP Institute members can’t be considered representative of the industry as a whole. But I summarized the answers anyway to see what rough patterns might emerge. They were much more intriguing than I expected.
To set the context: the survey asks the status of 102 customer data management functions in the respondent’s current systems. These are grouped into eleven categories: data capture, data sources, ingestion, data preparation, data storage, identity linking, customer profiles, data sharing, process integration, segment creation, and segment (i.e., audience) output. Answer options are: not needed; needed and available; needed and not available; and don’t know. The most important of these is “needed and not available,” since that’s a gap to be filled. All questions were required.
My first, and critically important, observation on reviewing the data was that the answers did seem to form significant clusters. That mattered because it suggested that people were giving accurate answers (at least, to the best of their knowledge) rather than randomly filling in a response. Had the answers been random, I would have expected to see roughly the same distribution of responses to each question, and that was definitely not the case. I’ll guess that requiring answers to all 102 questions filtered out the people who didn’t take the survey seriously.
The next observation was that gaps were more common in some areas than others. The gap percentages by category (i.e., the percentage of “needed and not available” replies) ranged from 46% to 29%. The pattern is clear: the areas with the fewest gaps (data sources, data storage, data capture, and data sharing) are needed for pretty much any data warehouse, while the most gaps were tied to customer data management (data preparation, identity linking, ingestion, segment output, customer profiles). This adds to the credibility of the data, even allowing for some confirmation bias. More important, it’s a reminder that you can’t assume a data warehouse built for other purposes will have all the features needed to support CDP applications.
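For readers who want to run the same tally on their own assessment, here is a minimal Python sketch of the category gap percentage. The sample rows, the function names in them, and the `gap_percent_by_category` helper are all hypothetical illustrations, not the survey data or the Institute's method. It assumes the percentage is taken over all replies in a category, which is how the figures above are described.

```python
from collections import defaultdict

GAP = "needed and not available"

# Hypothetical toy replies: (respondent, category, function, answer).
# These are made-up rows for illustration, not the actual survey responses.
responses = [
    (1, "data storage", "store structured data", "needed and available"),
    (1, "identity linking", "match on email", "needed and not available"),
    (2, "data storage", "store structured data", "needed and available"),
    (2, "identity linking", "match on email", "needed and not available"),
    (3, "data storage", "store structured data", "needed and not available"),
    (3, "identity linking", "match on email", "not needed"),
    (4, "data storage", "store structured data", "needed and available"),
    (4, "identity linking", "match on email", "don't know"),
]


def gap_percent_by_category(rows):
    """Return {category: percent of replies that are gaps}.

    Assumption: the denominator is every reply in the category,
    including "not needed" and "don't know".
    """
    totals = defaultdict(int)
    gaps = defaultdict(int)
    for _respondent, category, _function, answer in rows:
        totals[category] += 1
        if answer == GAP:
            gaps[category] += 1
    return {c: 100.0 * gaps[c] / totals[c] for c in totals}


gap_pct = gap_percent_by_category(responses)

# List categories from most gaps to fewest.
for category, pct in sorted(gap_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {pct:.0f}%")
```

Whether “not needed” and “don't know” replies belong in the denominator is a judgment call; excluding them would measure gaps only among functions a company actually needs.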
A look at individual functions shows something hidden by the category averages: even categories closely related to data warehouses will still have functional gaps for supporting a CDP. For example, while data warehouses are generally good at storing data, they often lack third party data and privacy management. Similarly, although capturing and sharing data are core capabilities for a data warehouse, warehouses often lack real-time connectivity. This reinforces the need to look at requirements in detail when assessing the suitability of your existing warehouse as the base for a CDP.
It's also worth looking at the functions without the category filters. This shows more clearly that common data warehouse functions (such as loading structured data) are rarely gaps while CDP-specific functions (such as connections to advertising media, anonymous-to-known profile conversions, and end-user data access) are often gaps. There are too few responses to make these rankings anything approaching precise. But, even if they were precise, what any individual company needs would still depend on its own situation.
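The function-level view is just a ranking of functions by how often they were reported as gaps, ignoring categories. A rough sketch, again using made-up replies and a hypothetical `rank_functions_by_gaps` helper rather than the real survey data:

```python
from collections import Counter

GAP = "needed and not available"

# Hypothetical (function, answer) replies; not the actual survey data.
answers = [
    ("load structured data", "needed and available"),
    ("load structured data", "needed and available"),
    ("load structured data", "needed and not available"),
    ("connect to advertising media", "needed and not available"),
    ("connect to advertising media", "needed and not available"),
    ("connect to advertising media", "not needed"),
    ("end-user data access", "needed and not available"),
    ("end-user data access", "needed and not available"),
    ("end-user data access", "needed and not available"),
]


def rank_functions_by_gaps(rows):
    """Return [(function, gap_count), ...] sorted from most gaps to fewest."""
    # Start every function at zero so functions with no gaps still appear.
    counts = Counter({function: 0 for function, _answer in rows})
    for function, answer in rows:
        if answer == GAP:
            counts[function] += 1
    return counts.most_common()


ranking = rank_functions_by_gaps(answers)
for function, gap_count in ranking:
    print(f"{gap_count} gap replies: {function}")
```

With only a handful of respondents, a difference of one or two replies can reorder such a list, so it is better read as a checklist than as a precise ranking.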
Given the small sample size of this data set, it’s best to view these results more as anecdotes than an industry survey. But I still believe they offer some useful guidance to CDP vendors and users:
- For vendors: it’s worth noting which categories show the most gaps. The “composable CDP” discussion was initially driven by companies offering reverse ETL features, which correspond roughly to the data sharing, segment creation, and segment output categories. Ironically, these fall in the middle of category gap rankings. The biggest gaps, and probably the greatest opportunity for CDP components, relate to specialized data preparation, identity linking, ingestion and segment output. I'd guess that the reason the "composable" discussion started elsewhere is that tools for specialized data prep, identity linking, etc. are already widely available, so there was less room for new entrants. By contrast, reverse ETL is a relatively new category where it's easier for a new firm to establish itself. CDP is actually just one application of reverse ETL; in "crossing the chasm" terms, it's a beachhead where new vendors can establish themselves and then grow beyond.
- For users: the list of common gaps is a helpful reference for ensuring that problematic functions are all considered in a requirements analysis. This is especially important at companies with strong existing data warehouse teams, whose members may not be familiar with CDP requirements. To the extent that users' own systems match the category gap averages, it may make the most sense to consider custom enhancements to existing capabilities for categories with relatively few gaps, while buying new packaged components for categories with many gaps. Of course, if a company has many gaps across all categories, it probably makes more sense to buy a traditional CDP than to buy, integrate, and maintain a large number of separate components.