Thursday, April 25, 2013
Given how much vendors and analysts love to create new categories, I’m genuinely perplexed that no one has yet named this one. I’ll step in myself, and hereby christen the concept “Customer Data Platform”. Aside from having a relatively available three-letter abbreviation (see Acronym Finder for other uses of CDP), the merits of this name include:
- “Customer” shows the scope extends to all customer-related functions, not just marketing;
- “Data” shows the primary focus is on data, not execution; and
- “Platform” shows it does more than data management while supporting other systems.
But, you may ask, is this really new? Certainly systems for Customer Data Integration (CDI) have been around for decades: these include specialized products like Harte-Hanks Trillium and SAS DataFlux, CDI features within general data management suites like Informatica and Pentaho, and integration within cloud-based business intelligence products like GoodData and Birst. Many of those products have limited capabilities for working with newer data sources like Web sites and social networks, but the real distinction between them and CDPs is that the older systems are mainly designed to assemble data. Some also provide analytics, but they don't extend to real-time decisions based on predictive models.
Similarly, there have long been specialized systems for real-time interaction management (such as Infor Interaction Advisor and Oracle Real Time Decisions) and for predictive modeling (SAS, IBM SPSS, KXEN). Some interaction managers do create predictive models, and the really big vendors (IBM, SAS, Oracle) have all three key components (CDI, real-time decisions, and predictive models) somewhere in their stables. But systems that closely couple just those features with the goal of feeding data as well as recommendations to execution systems? Those are something new.
By now, you’re probably wondering if I’ll ever get around to actually naming the vendors I have in mind. I’ve recently written about some of them, including Reachforce/SetLogik and Lattice Engines. I also include RedPoint in the mix, because it has all the key capabilities (database development, predictive models, and real-time decisions) even though it also offers conventional campaign management. Others I haven’t yet written about include Mintigo and Gainsight. Of course, each has a different mix of features and its own market position. Indeed, several have specifically told me they do not compete with the others. Fair enough, but I still see enough similarity to group them together.
All this is a very long-winded introduction to Causata, yet another member of this new class. By now, you can probably guess Causata’s main functions: assemble customer data from multiple sources, consolidate it by customer, place it in an analytics-friendly format, run predictive models against it, and respond in real time to recommendation requests from other systems including Web sites, email, banner ads, and call centers. And you’d be right.
But that’s not the end of the story. With any product, it’s the details that matter. Causata is particularly strong in the data management department, accepting both batch and real-time data feeds and storing data as different types of events (email sent, Web site visit, call center interaction, etc.), each having predefined attributes. The system also has a particularly sophisticated “identity association” service, which looks for simultaneous events involving different identifiers as a way to link them, and can chain identifiers that were linked at different times. When I spoke with Causata about two months ago, the association rules were pretty much the same for all clients, but they promised users would get more control in the future. Users could already choose which types of associations to use in specific queries.
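To make the chaining idea concrete, here is a minimal sketch of how identifiers seen together in one event might be linked and then chained across events. It uses a plain union-find structure, which is my own choice for illustration; I don't know how Causata actually implements its identity association service. The class name, method names, and identifier formats below are all hypothetical.

```python
class IdentityGraph:
    """Union-find over identifiers. Illustrative only, not Causata's implementation."""

    def __init__(self):
        self.parent = {}  # identifier -> parent identifier

    def find(self, ident):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(ident, ident)
        # Walk up to the root, shortening the path as we go.
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def observe(self, identifiers):
        """Link every identifier that appears together in a single event."""
        identifiers = list(identifiers)
        for other in identifiers[1:]:
            self.link(identifiers[0], other)

    def same_customer(self, a, b):
        return self.find(a) == self.find(b)


graph = IdentityGraph()
# Hypothetical events: a Web login, then a call center contact days later.
graph.observe(["cookie:abc", "email:pat@example.com"])
graph.observe(["email:pat@example.com", "phone:555-0100"])

chained = graph.same_customer("cookie:abc", "phone:555-0100")
unrelated = graph.same_customer("cookie:abc", "cookie:zzz")
```

The cookie and the phone number never appear in the same event, but both were linked to the same email address at different times, so they end up attached to the same customer.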
Causata stores the assembled data in HBase, a Hadoop-based database management system that is particularly well suited to large data volumes, many different data types, and ad hoc queries. In addition to the raw data, the system can store derived values such as aggregations (e.g., number of Web page views in the past 24 hours) and model scores. Users can run SQL queries to extract data for analysis and predictive modeling in third-party software including QlikView, Tableau, SAS, and R. Prebuilt QlikView reports show the predictive power of different variables for user-specified events. The lack of native analysis and modeling tools creates some friction for users, but also lets them stick with familiar products. So the pros and cons probably cancel each other out.
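As an illustration of a derived value, the sketch below computes the "Web page views in the past 24 hours" aggregation over a list of events. The event layout (a dict with `customer_id`, `type`, and `timestamp`) and the function name are assumptions made for this example; they are not Causata's schema or API.

```python
from datetime import datetime, timedelta


def page_views_last_24h(events, customer_id, now):
    """Count one customer's page_view events in the 24 hours ending at `now`."""
    cutoff = now - timedelta(hours=24)
    return sum(
        1
        for e in events
        if e["customer_id"] == customer_id
        and e["type"] == "page_view"
        and cutoff < e["timestamp"] <= now
    )


now = datetime(2013, 4, 25, 12, 0)
# Hypothetical events of several types for two customers.
events = [
    {"customer_id": "c1", "type": "page_view", "timestamp": now - timedelta(hours=1)},
    {"customer_id": "c1", "type": "page_view", "timestamp": now - timedelta(hours=30)},
    {"customer_id": "c1", "type": "email_sent", "timestamp": now - timedelta(hours=2)},
    {"customer_id": "c2", "type": "page_view", "timestamp": now - timedelta(hours=3)},
]

views = page_views_last_24h(events, "c1", now)
```

Only one of customer c1's events is a page view inside the window, so that is the only one counted.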
The system’s decision tools are straightforward. For each situation, users define a “decision engine” that can select among multiple options, such as campaigns, products, or marketing content. These options can have qualification rules. To make a decision, the system can test the options in sequence and pick the first one for which a customer is qualified, or pick the option with the highest predictive model score. Users can also specify a percentage of customers to receive a random option, to gather data for future decisions. An engine can return multiple decisions for situations that require more than one option, such as a Web page with several offers. Causata has some machine learning algorithms to help with the decision process. It plans to expand these to automatically select the best option in a given situation.
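The selection logic described above can be sketched roughly as follows. This is my own reading, not Causata's code: the function name, the option layout, and the parameter names are hypothetical, and I'm assuming that qualification rules apply in every mode, including the random one.

```python
import random


def decide(options, customer, mode="first_qualified", explore_rate=0.0, count=1, rng=random):
    """Return up to `count` option names for a customer.

    mode="first_qualified": keep the options in their listed order.
    mode="highest_score":   rank qualified options by model score.
    explore_rate:           share of calls answered with a random qualified
                            option, to gather data for future decisions.
    """
    qualified = [o for o in options if o["qualifies"](customer)]
    if not qualified:
        return []
    if rng.random() < explore_rate:
        picks = rng.sample(qualified, min(count, len(qualified)))
        return [o["name"] for o in picks]
    if mode == "highest_score":
        qualified = sorted(qualified, key=lambda o: o["score"](customer), reverse=True)
    return [o["name"] for o in qualified[:count]]


# Hypothetical options; the scores stand in for predictive model output.
options = [
    {"name": "gold_card", "qualifies": lambda c: c["income"] >= 100000, "score": lambda c: 0.2},
    {"name": "savings", "qualifies": lambda c: True, "score": lambda c: 0.4},
    {"name": "loan", "qualifies": lambda c: c["age"] >= 21, "score": lambda c: 0.6},
]
customer = {"income": 50000, "age": 30}

first = decide(options, customer)                               # sequence mode
best = decide(options, customer, mode="highest_score")          # score mode
two = decide(options, customer, mode="highest_score", count=2)  # page with two offers
```

For this customer, sequence mode skips the gold card and returns the savings offer, while score mode returns the loan because it has the higher score.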
Decision engines are called by external systems through a Web services API that can respond in under 50 milliseconds. This is fast enough to manage Web banner ads – something not all interaction managers can achieve. Model scores and other data are updated in real time during an interaction.
Causata can be deployed on-premise by a client or as a cloud-based service. The vendor says a typical implementation starts with three or four data sources and is deployed in about 30 days – very fast for this type of system. In February, Causata introduced prebuilt applications for cross-sell, acquisition, and return programs in financial services, communications, and digital media. These will further speed deployment.
Pricing is based on the number of data sources and touchpoints, with additional charges based on data storage. Cost begins around $150,000 per year.