Comments on Customer Experience Matrix: Looking for Differences in MPP Analytical Databases

Hi Andy,Thanks for your question. I've addressed ...

2008-10-21T13:45:00.000-04:00

Hi Andy,

Thanks for your question. I've addressed that topic in some detail in a recent DM Review column, which you can find in my archive at
http://archive.raabassociatesinc.com/2008/07/analytical-database-options/

Basically, the MPP systems use a row-oriented structure and rely on partitioning to spread the work of querying across multiple processors and memory stores. The columnar systems use a column-oriented structure that lets them reduce the work of querying by only reading the columns of data needed for a particular query. The one other salient point is that the columnar systems generally have not been deployed at the 100TB scales common to the MPP systems. (Sybase IQ is a major exception--it has proven itself at that level.) Of course, the vendors swear scalability is no problem, but I always want to see some production installations first.
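To make the row-versus-column distinction concrete, here is a toy Python sketch. It is only an illustration: the table, the column names, and the idea of counting "values read" as a stand-in for disk I/O are all assumptions made for the example, and no real product works on in-memory lists like this.

```python
# Toy illustration only: the table and its column names are made up.
rows = [
    {"customer_id": 1, "region": "east", "amount": 10.0},
    {"customer_id": 2, "region": "west", "amount": 25.0},
    {"customer_id": 3, "region": "east", "amount": 5.0},
]

# Column-oriented layout of the same data: one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """Row layout: whole rows are fetched, so every field is read."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # all fields come along with the row
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the one column the query needs is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same total, but the row layout touches every field of every row while the column layout touches only the `amount` values. The wider the table and the narrower the query, the larger that gap becomes.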

This is a really useful posting. How are these too...

2008-10-21T13:17:00.000-04:00

This is a really useful posting.
How are these tools different from vendors like Vertica and ParAccel, which boast the use of column-based DBMSs?

http://www.asterdata.com/blog/index.php/2008/10/06...

2008-10-07T10:57:00.000-04:00

http://www.asterdata.com/blog/index.php/2008/10/06/aster-ncluster-30-aligning-product-with-vision/

http://www.asterdata.com/blog/index.php/2008/10/06/growing-your-business-with-frontline-data-warehouses/

http://www.asterdata.com/blog/index.php/2008/10/06/conversation-with-lenin-gali-director-of-bi-for-sharethis/

David:Two other points (apologies for multiple pos...

2008-08-29T15:49:00.000-04:00

David:
Two other points (apologies for multiple posts): Kognitio's database is WX2, not WS2...and to clarify, WX2 has been repeatedly and significantly upgraded during its time on the market. Yes, the database is the "most mature" on the market today (to use your words), but it's also in its sixth iteration, with significant investment in time and money being made to get WX2 to the point where it is today. It's so stable, in fact, that several companies are using WX2 in a BC/DR implementation. You may be less likely to do that with databases where the code is not as proven over time.

Thanks, again...and have a great weekend.

I did receive a clarification from Kognitio about ...

2008-08-29T12:33:00.000-04:00

I did receive a clarification from Kognitio about how WX2 avoids excessive inter-node traffic. The one-sentence answer is that when data is loaded from disk to memory, the system redistributes it so related data is in the same node’s memory.

The long answer explains how they do this efficiently. One critical point is that WX2 converts multi-step SQL statements into a single stream of machine language. This extracts only the necessary data and determines how it should be organized, so the volume of data is minimized and the data makes just one trip through the network before reaching the correct node. It also lets the system process all intermediate results in memory, rather than writing the results of each step to disk as a conventional SQL engine might. Another critical point is that the system can store some data permanently in memory, so it does not need to reload it from disk for each new query. This data is selected by the database administrator, not automatically, so there is some user skill involved. The system does provide some help via reports on past results.
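The general idea of "round-robin on disk, regroup by key when loading into memory" can be sketched in a few lines of Python. This is a hypothetical illustration, not Kognitio's implementation: the node count, the `customer_id` key, and the modulo rule for choosing an owning node are all assumptions made for the example.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def round_robin_load(rows, num_nodes):
    """Place rows on disk in arrival order, ignoring their content."""
    disks = defaultdict(list)
    for i, row in enumerate(rows):
        disks[i % num_nodes].append(row)
    return disks


def redistribute_to_memory(disks, num_nodes, key):
    """Send each row at most once to the node assumed to own its key."""
    memory = defaultdict(list)
    moved = 0
    for source_node, node_rows in disks.items():
        for row in node_rows:
            target_node = row[key] % num_nodes  # assumed ownership rule
            if target_node != source_node:
                moved += 1  # one network hop for this row
            memory[target_node].append(row)
    return memory, moved


pairs = [(7, 10), (3, 5), (7, 2), (3, 8), (7, 1), (2, 4)]
rows = [{"customer_id": c, "amount": a} for c, a in pairs]

disks = round_robin_load(rows, NUM_NODES)
memory, moved = redistribute_to_memory(disks, NUM_NODES, "customer_id")
print(moved, {node: len(rs) for node, rs in memory.items()})
```

After the redistribution step, all rows for a given customer sit in one node's memory, and no row has crossed the network more than once. Any later per-customer work can then run locally on that node.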

The Kognitio Web site offers several white papers with some additional information. Start with the one written by Bloor Research, whose work I greatly respect. But even that paper doesn’t make the critical point about data being redistributed when loaded into memory--you read it here first.

David - Thanks for writing about Aster. From our e...

2008-08-29T11:05:00.000-04:00

David - Thanks for writing about Aster. From our experience, intelligent partitioning is actually far superior to round-robin (also known as 'random') scattering of data across nodes. With random scattering, the system cannot exploit the benefit of optimal organization of data on different nodes and needs to pull data from all nodes. This actually forces systems that do random scattering to keep data in memory, because pulling all data from disks is infeasible. Since memory per node is limited, in-memory systems require a lot of hardware (e.g. 100 nodes for 5 TB), increasing total cost significantly. Aster goes beyond simple hash partitioning and has patent-pending partitioning algorithms for optimally placing data across nodes (POD Partitioning). In addition, Aster nodes optimize query performance by combining compression and aggressive buffering to optimally shuffle data segments at query run-time (POD Transport). With Aster, one could set up a 5 TB system with 3-4 nodes. With 100 nodes, the capacity would be 150-200 TB, giving far superior price/performance results.

I shall ask them to clarify...

2008-08-29T07:31:00.000-04:00

I shall ask them to clarify...

Agreed -- I don't see the connection between loadi...

2008-08-28T23:53:00.000-04:00

Agreed -- I don't see the connection between loading data into memory and reducing inter-node traffic either :)

Sorry to have been obscure. Most MPP databases pl...

2008-08-28T22:11:00.000-04:00

Sorry to have been obscure. Most MPP databases place related data on the same node. For example, if most of your queries look at data by customer, you might put all data for the same customer on the same node. This means that most query processing can occur without moving data from one node to another, which improves performance because such movement could easily become a bottleneck. This allocation process is what I referred to as "intelligent partitioning", and, as I said, is a very common strategy. Kognitio doesn't bother with this, apparently because loading data into memory before the query is executed allows it to reduce the volume of inter-node traffic. Quite honestly, I don't see why that is so, but I'll take their word for it.
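A small Python sketch may help show the difference between round-robin placement and placing rows by customer. It is illustrative only: the three-node cluster, the customer IDs, and the modulo placement rule are assumptions, and real MPP products use their own partitioning schemes.

```python
def place_round_robin(customer_ids, num_nodes):
    """Assign each row to a node by arrival order alone."""
    return [i % num_nodes for i, _ in enumerate(customer_ids)]


def place_by_customer(customer_ids, num_nodes):
    """Assign each row to a node derived from its customer ID."""
    return [cid % num_nodes for cid in customer_ids]


def nodes_holding(customer_id, customer_ids, placement):
    """Return the set of nodes that hold rows for one customer."""
    return {
        node
        for cid, node in zip(customer_ids, placement)
        if cid == customer_id
    }


# Hypothetical order rows, represented only by their customer ID.
orders = [101, 102, 101, 103, 101, 102]
num_nodes = 3

round_robin = place_round_robin(orders, num_nodes)
by_customer = place_by_customer(orders, num_nodes)

print(nodes_holding(101, orders, round_robin))
print(nodes_holding(101, orders, by_customer))
```

With round-robin placement, customer 101's three rows end up on three different nodes, so a per-customer query has to gather data across the network. With placement by customer, they all land on a single node and the query can be answered there.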

David,You made a statement I didn't quite understa...

2008-08-28T21:22:00.000-04:00

David,

You made a statement I didn't quite understand:

"It can do this [distribute incoming data in a round-robin fashion] without creating excessive inter-node traffic because it loads data into memory during query execution"

Can you explain this in a little more detail?

David:One quick point, if I may: Kognitio's offer...

2008-08-28T16:25:00.000-04:00

David:
One quick point, if I may: Kognitio's offering is called Data Warehousing as a Service (DaaS). In addition, please understand that Kognitio allows its WX2 database to be deployed across multiple locations at no additional charge, since its licensing is based on the amount of data being analyzed, not seats or sites.

Thanks.