Monday, October 08, 2007

Proprietary Databases Rise Again

I’ve been noticing for some time that “proprietary” databases are making a come-back in the world of marketing systems. “Proprietary” is a loaded term that generally refers to anything other than the major relational databases: Oracle, SQL Server and DB2, plus some of the open source products like MySQL. In the marketing database world, proprietary systems have a long history tracing back to the mid-1980’s MCIF products from Customer Insight, OKRA Marketing, Harte-Hanks and others. These originally used specialized structures to get adequate performance from the limited PC hardware available in the mid-1980’s. Their spiritual descendants today are Alterian and SmartFocus, both with roots in the mid-1990’s Brann Viper system and both having reinvented themselves in the past few years as low cost / high performance options for service bureaus to offer their clients.

Nearly all the proprietary marketing databases used some version of an inverted (now more commonly called “columnar”) database structure. In such a structure, data for each field (e.g., Customer Name) is physically stored in adjacent blocks on the hard drive, so it can be accessed with a single read. This makes sense for marketing systems, and analytical queries in general, which typically scan all contents of a few fields. By contrast, most transaction processes use a key to find a particular record (row) and read all its elements. Standard relational databases are optimized for such transaction processing and thus store entire rows together on the hard drive, making it easy to retrieve their contents.

Columnar databases themselves date back at least to mid-1970’s products including Computer Corporation of America Model 204, Software AG ADABAS, and Applied Data Research (now CA) Datacom/DB. All of these are still available, incidentally. In an era when hardware was vastly more expensive, the great efficiency of these systems at analytical queries made them highly attractive. But as hardware costs fell and relational databases became increasingly dominant, they fell by the wayside except for a special situations. Their sweet spot of high-volume analytical applications was further invaded by massively parallel systems (Teradata and more recently Netezza) and multi-dimensional data cubes (Cognos Powerplay, Oracle/Hyperion EssBase, etc.). These had different strengths and weaknesses but still competed for some of the same business.

What’s interesting today is that a new generation of proprietary systems is appearing. Vertica has recently gained a great deal of attention due to the involvement of database pioneer Michael Stonebraker, architect of INGRES and POSTGRES. (Click here for an excellent technical analysis by the Winter Corporation; registration required.) QD Technology, launched last year (see my review), isn’t precisely a columnar structure, but uses indexes and compression in a similar fashion. I can’t prove it, but suspect the new interest in alternative approaches is because analytical databases are now getting so large—tens and hundreds of gigabytes—that the efficiency advantages of non-relational systems (which translate into cost savings) are now too great to ignore.

We’ll see where all this leads. One of the few columnar systems introduced in the 1990’s was Expressway (technically, a bit map index—not unlike Model 204), which was purchased by Sybase and is now moderately successful as Sybase IQ. I think Oracle also added some bit-map capabilities during this period, and suspect the other relational database vendors have their versions as well. If columnar approaches continue to gain strength, we can certainly expect the major database vendors to add them as options, even though they are literally orthogonal to standard relational database design. In the meantime, it’s fun to see some new options become available and to hope that costs will come down as new competitors enter the domain of very large analytical databases.

No comments: