These are exciting times in the world of analytical systems. The Web has created new demands to handle unprecedented data volumes and semi-structured data. Cloud-based deployment offers near-infinite hardware scalability and flexibility. Acquisitions by enterprise software giants have opened opportunities for smaller, more nimble alternatives. The result has been an explosion of companies using new techniques for managing and analyzing huge data volumes. Many, including Vertica, Aster Data, Greenplum, and Netezza, have also been assimilated by enterprise vendors. Others, including Paraccel, Kognitio, SAND Technology, and 1010data, have grown steadily while remaining independent.
Quantivo is part of this latest technical flowering. Its core is a data structure that sits somewhere between columnar – now the most common approach for analytical processing – and hierarchical indexes. Specifically, Quantivo identifies unique pairs of values within incoming records and stores each pair only once. It then builds new pairs by identifying unique combinations of each pair with another element (which might itself be a previously identified pair). For example, the first pass might find all transactions that involve a specific product on a single date. The second pass might find all transactions with that product/date pair that were made by a given customer. The process repeats, working up a hierarchy of combinations. The system also tracks the number of times each pair occurs and the specific records involved, so the full detail of the original data can always be reconstructed. The system doesn’t build pairs for all possible combinations of data elements, although users can define several hierarchies if they want the same element to be part of several pairs. Indexes also allow analysis across pairs that are not directly built into the data.
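To make the pairing idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept as described above, not Quantivo's actual storage format or code: the function name `build_hierarchy`, the field names, and the sample records are all hypothetical. Each level stores every unique pair once, along with the ids of the records that contain it; the occurrence count is simply the length of that list.

```python
from collections import defaultdict

def build_hierarchy(records, fields):
    """Build one level of unique pairs per field after the first.

    Level 0 pairs fields[0] with fields[1]; each later level pairs the
    previous level's pair with the next field. Every unique pair is stored
    once, mapped to the ids of the records that contain it.
    (Hypothetical helper for illustration only.)
    """
    levels = []
    # Key carried by each record into the next level; starts as the first field.
    current = [rec[fields[0]] for rec in records]
    for field in fields[1:]:
        level = defaultdict(list)
        for rec_id, rec in enumerate(records):
            level[(current[rec_id], rec[field])].append(rec_id)
        levels.append(dict(level))
        current = [(current[i], rec[field]) for i, rec in enumerate(records)]
    return levels

# Hypothetical sample data: four transactions.
records = [
    {"product": "A", "date": "2012-02-01", "customer": "c1"},
    {"product": "A", "date": "2012-02-01", "customer": "c2"},
    {"product": "A", "date": "2012-02-01", "customer": "c1"},
    {"product": "B", "date": "2012-02-02", "customer": "c1"},
]

levels = build_hierarchy(records, ["product", "date", "customer"])

# First pass: unique product/date pairs and the records involved.
product_date = levels[0]
# Second pass: each product/date pair combined with a customer.
product_date_customer = levels[1]

for pair, rec_ids in product_date_customer.items():
    print(pair, "count:", len(rec_ids), "records:", rec_ids)
```

Because each level keeps the record ids, the original rows can be recovered from the pairs, which is the property the description above relies on. A real system would presumably use far more compact encodings than Python dictionaries.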
Quantivo doesn’t hide the details of its approach, but it doesn’t talk much about them, either. This likely reflects a (correct) judgment that users will care more about the system’s benefits than how it works. Those benefits include data compression (typically to 10% of the original volume), fast response, high scalability, flexible schemas, handling unstructured data, and efficient processing of queries that cause problems in standard SQL. The choice of hierarchy also inherently organizes data around “concepts”, which can be different from the physical structures of the inputs. Queries are relatively simple because users see only a flat list of data elements; relationships are managed automatically, behind the scenes.
Yes, there are trade-offs. The data must be processed during the load – a computation-intensive task that takes longer than creating a simple columnar database. The schema must be designed, which requires some technical expertise and can lead to slower response for queries outside the expected paths. Non-SQL queries require a Quantivo-built user interface, meaning users cannot stick with their familiar SQL-based business intelligence tools.
Some of these issues are mitigated by Quantivo’s other major differentiator: it was designed from the start as a true multi-tenant cloud-based solution. This means it can easily spread its workload across multiple servers and replicate data as needed to support higher data volumes (billions of rows), reduce load times (typically one to two hours per day for daily updates), and improve response time. Support for multiple hierarchies and queries across hierarchies also minimizes the price of making a bad decision during initial schema design, since even unplanned queries will execute reasonably efficiently. Designing a schema is relatively simple: in fact, Quantivo plans to add self-service data provisioning, including schema design, in the near future.
The Quantivo user interface is purposely designed to look like other business intelligence tools: users get a list of measures and dimensions, which they drag into place to create pivot tables. They can create filters, define calculated values, add summary levels, and drill down to details. There are simple graphics such as bar and pie charts, but no advanced visualization.
The unique power of the system lies within the filters, which can select data that’s outside the standard dimensions. For example, a filter could select all transactions for customers who purchased a specific product – the type of market basket analysis that’s hard with traditional SQL queries. The system also calculates “association metrics”, which compare the data across two groups, such as products in baskets that include product A vs. products in baskets that include product B.
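The two ideas in this paragraph can be sketched in a few lines of Python. This is an assumption-laden illustration, not Quantivo's query engine or its actual metric definitions: the row layout, the function names `rows_for_buyers_of` and `companion_share`, and the choice of "share of baskets" as the metric are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction rows: (customer, basket id, product).
rows = [
    ("c1", 1, "A"), ("c1", 1, "milk"), ("c1", 2, "bread"),
    ("c2", 3, "B"), ("c2", 3, "milk"), ("c2", 3, "eggs"),
    ("c3", 4, "A"), ("c3", 4, "milk"), ("c3", 4, "bread"),
    ("c4", 5, "eggs"),
]

def rows_for_buyers_of(rows, product):
    """Filter: all transactions for customers who bought `product` at least once."""
    buyers = {cust for cust, _, prod in rows if prod == product}
    return [row for row in rows if row[0] in buyers]

def companion_share(rows, anchor):
    """For baskets containing `anchor`, the share that also contain each other product."""
    baskets = defaultdict(set)
    for _, basket, prod in rows:
        baskets[basket].add(prod)
    selected = [items for items in baskets.values() if anchor in items]
    counts = defaultdict(int)
    for items in selected:
        for prod in items - {anchor}:
            counts[prod] += 1
    return {prod: n / len(selected) for prod, n in counts.items()}

# Every transaction by customers who ever bought product A,
# including baskets that do not contain A themselves.
a_buyer_rows = rows_for_buyers_of(rows, "A")

# A simple stand-in for an association metric: compare what else
# appears in baskets with A against what appears in baskets with B.
share_a = companion_share(rows, "A")
share_b = companion_share(rows, "B")
products = sorted(set(share_a) | set(share_b))
difference = {p: share_a.get(p, 0.0) - share_b.get(p, 0.0) for p in products}
print(difference)
```

In SQL, the first filter typically needs a self-join or subquery against the same transaction table, which is part of why this kind of question is awkward for conventional tools.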
Quantivo pricing is based on data volume and complexity. A typical client pays $40,000 to $50,000 per year for 100 million to one billion rows of data, making the system remarkably affordable for a product of its type. Most clients are using the system for customer data analytics, and the vendor has created connectors for SAP, Salesforce.com, Marketo, Responsys, ExactTarget, Omniture, Google Analytics, IBM, Microsoft, and NCR systems. Quantivo launched as a cloud-based service in 2008 and has an undisclosed number of paying clients.
Tuesday, February 21, 2012