Thursday, July 24, 2008

Sybase IQ vs. Vertica: Comparisons are Misleading, But Fun

I received the “Vertica Fast Lane” e-newsletter yesterday, which I am amused to note from its URL is generated by Eloqua. (This is only amusing because I’m researching Eloqua for unrelated reasons these days. Still, if I can offer some advice to the Vertica Marketing Department, it’s best to hide that sort of thing.)

The newsletter contained a link to a post on Vertica’s blog entitled “Debunking a Myth: Column-Stores vs. Indexes”. Naturally caught my attention, given my own recent post suggesting that use indexes is a critical difference between SybaseIQ and the new columnar databases, of which Vertica is the most prominent.

As it turned out, the Vertica post addressed a very different issue: why putting a conventional B-tree index on every column in a traditional relational database is nowhere near as efficient as using a columnar database. This is worth knowing, but doesn’t apply to Sybase IQ because IQ’s primary indexes are not B-trees. Instead, most of them are highly compressed versions of the data itself.

If anything, the article reinforced my feeling that what Sybase calls an index and what Vertica calls a compressed column are almost the same thing. The major difference seems to be that Vertica sorts its columns before storing them. This will sometimes allow greater compression and more efficient searches, although it also implies more processing during the data load. Sybase hasn’t mentioned sorting its indexes, although I suppose they might. Vertica also sometimes improves performance by storing the same data in different sort sequences.

Although Vertica’s use of sorting is an advantage, Sybase has tricks of its own. So it’s impossible to simply look at the features and say one system is “better” than the other, either in general or for specific applications. There's no alternative to live testing on actual tasks.

The Vertica newsletter also announced a preview release of the system’s next version, somewhat archly codenamed “Corinthian” (an order of Greek columns—get it?. And, yes, “archly” is a pun.) To quote Vertica, “The focus of the Corinthian release is to deliver a high degree of ANSI SQL-92 compatibility and set the stage for SQL-99 enhancements in follow-on releases.”

This raises an issue that hadn’t occurred to me, since I had assumed that Vertica and other columnar databases already were compliant with major SQL standards. But apparently the missing capabilities were fairly substantial, since “Corinthian” adds nested sub-queries; outer-, cross- and self-joins; union and union-all set operations; and VarChar long string support. These cannot be critical features, since people have been using Vertica without them. But they do represent the sort of limitations that sometimes pop up only after someone has purchased a system and tried to deploy it. Once more, there's no substitute for doing your homework.

1 comment:

Mark said...

Glad you mentioned doing your homework. It's surprising what you find missing when you start throwing BI-tool generated SQL and hand-written SQL at these databases.

Limitations are only apparently with some real testing, e.g. until Vertica v4 you had to colocate data for joins, which means you couldn't join two tables together if they were distributed differently, such as two fact tables in a dimensional model.

SQL compatibility and optimization are still big problems because it's hard to do those things right and simple brute force parallelism or column storage won't solve them.