Thursday, July 09, 2009

ParAccel Toots Its Horn and Revs Its Database Engine

Summary: Over the past year, columnar analytical database vendor ParAccel has methodically proven its claims about speed, scalability and easy deployment. Now it's looking to grow fast.

When I first wrote about analytical database vendor ParAccel in a February 2008 post, it was one of several barely distinguishable vendors offering massively parallel, SQL-compatible columnar databases. Its main claim to fame was a record-setting performance on the TPC-H benchmark, but even the significance of that was unclear, since few vendors bother with the TPC process.

Since then, ParAccel has delivered an impressive string of accomplishments, including deals with demanding customers (Merkle, PriceChopper, Autometrics, TRX) and an important alliance with EMC to create a “scalable analytic appliance”. To top it off, the company recently announced its 2.0 release, a new TPC-H record, and $22 million in Series C funding. (Full disclosure: ParAccel also hired me to write a white paper.)

Of all these, perhaps the most significant news is that the new TPC-H record was set at the 30 terabyte level.* ParAccel’s previous TPC-H championships were at the 100 GB to 1 TB levels.

The change reflects a general growth in the scale of systems supported by MPP columnar databases. ParAccel reports that its largest production installation holds 18 TB of compressed data, which probably translates to more than 50 TB of input. Segment leader Vertica reports several production installations larger than 100 TB. Neither had more than 10 TB in production a year ago.

These figures still don’t put the columnar systems in the same ballpark as petabyte-scale systems like Netezza, Greenplum and Aster Data, but they do open up some major new possibilities. In case you’re wondering, ParAccel’s TPC-H results were seven times faster and had 16 times better price/performance than the previous record, held by Oracle.

But pure scalability isn’t the key selling point for ParAccel. More than anything, the company stresses its ability to handle complex queries without specialized data schemas or indexes. This means that existing data structures can be loaded as is and queried immediately. The net result is a much faster “time to answer” than with competing systems, which tailor schemas and/or indexes to specific questions. It also means that new queries can be answered right away, without waiting for schema modifications or new indexes.

The 2.0 release extends these advantages with a new query optimizer that handles very complex joins and correlated subqueries; parallel data loading (nearly 9 TB per hour in the TPC-H benchmark); User Defined Functions; enhanced compression; and “blended scans” that avoid Storage Area Network (SAN) controller bottlenecks by loading SAN data onto compute nodes and querying it directly. It also adds some special features, such as Oracle SQL support and column encryption for financial data. Another set of enhancements is designed to provide enterprise-class reliability, availability and manageability, such as backup and failover. Several of these features are already in production, although the official 2.0 release date is August.

The new release and added funding mark ParAccel’s transition from quiet introduction to full-throated selling. Over the past year, the company has carefully limited its participation in Proof of Concept (POC) competitions, the key selection tool in this segment. This gave it time to refine its POC processes, add system features, and build initial client references. It says it can now complete a typical POC in three days, often finishing while other vendors are still getting started. The company is now ramping up its lead generation and inside sales operations, aiming to grow quickly beyond its dozen-plus existing installations. (For context: Vertica reports more than 100 clients.) We'll see what comes next.

* For some serious doubt-sowing about the new benchmark, see Daniel Abadi's post (be sure to read the comments) and ParAccel's response. What really matters, as ParAccel points out, is performance in customer POCs. The company says its performance has never been beaten, although there was one tie. (For sheer entertainment, check out the related thread on Curt Monash's blog.)
