Friday, August 03, 2007

What Makes QlikTech So Good?

To carry on a bit with yesterday’s topic—QlikTech fascinates me on two levels: first, because it is such a powerful technology, and second because it’s a real-time case study in how a superior technology penetrates an established market. The general topic of diffusion of innovation has always intrigued me, and it would be fun to map QlikView against the usual models (hype curve, chasm crossing, tipping point, etc.) in a future post. Perhaps I shall.

But I think it’s important to first explain exactly what makes QlikView so good. General statements about speed and ease of development are discounted by most IT professionals because they’ve heard them all before. Benchmark tests, while slightly more concrete, are also suspect because they can be designed to favor whoever sponsors them. User case studies may be the most convincing evidence, but they resemble the testimonials for weight-loss programs: they are obviously selected by the vendor and may represent atypical cases. Plus, you don’t know what else was going on that contributed to the results.

QlikTech itself has recognized all this and adopted “seeing is believing” as their strategy: rather than try to convince people how good they are, they show them with Webinars, pre-built demonstrations, detailed tutorials, documentation, and, most important, a fully-functional trial version. What they barely do is discuss the technology itself.

This is an effective strategy with early adopters, who like to get their hands dirty and are seeking a “game changing” improvement in capabilities. But while it creates evangelists, it doesn’t give them anything beyond their own personal experience to testify to the product’s value. So most QlikTech users find themselves making exactly the sort of generic claims about speed and ease of use that are so easily discounted by those unfamiliar with the product. If the individual making the claims has personal credibility, or better still independent decision-making authority, this is good enough to sell the product. But if QlikTech is competing against other solutions that are better known and perhaps more compatible with existing staff skills, a single enthusiastic advocate may not win out, even though they happen to be backed by the truth.

What they need is a story: a convincing explanation of WHY QlikTech is better. Maybe this is only important for certain types of decision-makers—call them skeptics or analytical or rationalists or whatever. But this is a pretty common sort of person in IT departments. Some of them are almost physically uncomfortable with the raving enthusiasm that QlikView can produce.

So let me try to articulate exactly what makes QlikView so good. The underlying technology is what QlikTech calls an “associative” database, meaning data values are directly linked with related values, rather than using the traditional table-and-row organization of a relational database. (Yes, that’s pretty vague; as I say, the company doesn’t explain it in detail. Perhaps their U.S. Patent [number 6,236,986 B1, issued in 2001] would help but I haven’t looked. I don’t think QlikTech uses “associative” in the same way as Simon Williams of LazySoft, which is where Google and Wikipedia point when you query the term.)

Whatever the technical details, the result of QlikTech’s method is that users can select any value of any data element and get a list of all other values on records associated with that element. So, to take a trivial example, selecting a date could give a list of products ordered on that date. You could do that in SQL too, but let’s say the date is on a header record while the product ID is in a detail record. You’d have to set up a join between the two—easy if you know SQL, but otherwise inaccessible. And if you had a longer trail of relations the SQL gets uglier: let’s say the order headers were linked to customer IDs which were linked to customer accounts which were linked to addresses, and you wanted to find products sold in New Jersey. That’s a whole lot of joining going on. Or if you wanted to go the other way: find people in New Jersey who bought a particular product. In QlikTech, you simply select the state or the product ID, and that’s that.
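To make "a whole lot of joining" concrete, here is a minimal sketch of the New Jersey query in SQL, run through Python's built-in sqlite3 module. Every table and column name is invented for illustration; the post describes only the chain of relations (details to headers to customers to accounts to addresses), not an actual schema.

```python
import sqlite3

# Hypothetical schema and data: all names below are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses     (address_id INTEGER, state TEXT);
CREATE TABLE accounts      (account_id INTEGER, address_id INTEGER);
CREATE TABLE customers     (customer_id INTEGER, account_id INTEGER);
CREATE TABLE order_headers (order_id INTEGER, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id TEXT);

INSERT INTO addresses     VALUES (1, 'NJ'), (2, 'NY');
INSERT INTO accounts      VALUES (10, 1), (20, 2);
INSERT INTO customers     VALUES (100, 10), (200, 20);
INSERT INTO order_headers VALUES (1000, 100, '2007-08-01'), (2000, 200, '2007-08-02');
INSERT INTO order_details VALUES (1000, 'WIDGET'), (1000, 'GADGET'), (2000, 'GIZMO');
""")


def products_sold_in(state):
    """Walk detail -> header -> customer -> account -> address with four joins."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.product_id
        FROM order_details d
        JOIN order_headers h  ON h.order_id = d.order_id
        JOIN customers c      ON c.customer_id = h.customer_id
        JOIN accounts a       ON a.account_id = c.account_id
        JOIN addresses ad     ON ad.address_id = a.address_id
        WHERE ad.state = ?
        ORDER BY d.product_id
        """,
        (state,),
    ).fetchall()
    return [row[0] for row in rows]


print(products_sold_in("NJ"))
```

Going the other way (people in New Jersey who bought a particular product) needs the same four joins with a different SELECT list and WHERE clause, which is the work a non-SQL user cannot do unaided.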

Why is this a big deal? After all, plenty of SQL-based tools can generate that query for non-technical users who don’t know SQL. But those tools have to be set up by somebody, who has to design the database tables, define the joins, and very likely specify which data elements are available and how they’re presented. That somebody is a skilled technician, or probably several technicians (data architects, database administrators, query builders, etc.). QlikTech needs none of that because it’s not generating SQL code to begin with. Instead, users just load the data and the system automatically (and immediately) makes it available. Where multiple tables are involved, the system automatically joins them on fields with matching names. So, okay, somebody does need to know enough to name the fields correctly, but that’s all the skill required.
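The idea of linking tables on matching field names can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not QlikView's actual mechanism, which the company does not document; the table and field names are made up.

```python
from itertools import combinations

# Hypothetical tables, each a non-empty list of dict rows (names are assumptions).
tables = {
    "order_headers": [{"order_id": 1, "customer_id": 100, "order_date": "2007-08-01"}],
    "order_details": [{"order_id": 1, "product_id": "WIDGET"}],
    "customers": [{"customer_id": 100, "state": "NJ"}],
}


def shared_fields(tables):
    """Return {(table_a, table_b): [field names both tables have]} for linked pairs."""
    links = {}
    for a, b in combinations(sorted(tables), 2):
        # Field names are read from the first row of each table.
        common = sorted(set(tables[a][0]) & set(tables[b][0]))
        if common:
            links[(a, b)] = common
    return links


for pair, fields in shared_fields(tables).items():
    print(pair, "linked on", fields)
```

Nobody declared a join here: naming the key `order_id` in both order tables was enough to link them, and `customers` and `order_details` stay unlinked because they share no field name.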

The advantages really become apparent when you think about the work needed to set up a serious business intelligence system. The real work in deploying a Cognos or BusinessObjects is defining the dimensions, measures, drill paths, and so on, so the system can generate SQL queries or the prebuilt cubes needed to avoid those queries. Even minor changes like adding a new dimension are a big deal. All that effort simply goes away in QlikTech. Basically, you load the raw data and start building reports, drawing graphs, or doing whatever you need to extract the information you want. This is why development time is cut so dramatically and why developers need so little training.

Of course, QlikView’s tools for building reports and charts are important, and they’re very easy to use as well (basically all point-and-click). But that’s just icing on the cake—they’re not really so different from similar tools that sit on top of SQL or multi-dimensional databases.

The other advantages cited by QlikTech users are speed and scalability. These are simpler to explain: the database sits in memory. The associative approach provides some help here, too, since it reduces storage requirements by removing redundant occurrences of each data value and by storing the data as binary codes. But the main reason QlikView is incredibly fast is that the data is held in memory. The scalability part comes in with 64-bit processors, which can address pretty much any amount of memory. It’s still necessary to stress that QlikView isn’t just putting SQL tables into memory: it’s storing the associative structures, with all their ease of use advantages. This is an important distinction between QlikTech and other in-memory systems.
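For readers who want a picture of "removing redundant occurrences" and "binary codes," here is a minimal sketch of generic dictionary encoding: each distinct value is stored once, and each occurrence becomes a small integer. I am assuming this general technique purely for illustration; QlikView's real storage format is not public, and the function names are my own.

```python
def encode_column(values):
    """Store each distinct value once; replace every occurrence with an integer code."""
    symbols = []   # distinct values, in first-seen order
    code_of = {}   # value -> position in symbols
    codes = []     # one small integer per original row
    for value in values:
        if value not in code_of:
            code_of[value] = len(symbols)
            symbols.append(value)
        codes.append(code_of[value])
    return symbols, codes


def decode_column(symbols, codes):
    """Rebuild the original column from the symbol table and the codes."""
    return [symbols[code] for code in codes]


states = ["New Jersey", "New York", "New Jersey", "New Jersey", "New York"]
symbols, codes = encode_column(states)
print(symbols)  # the two distinct values, stored once each
print(codes)    # five integers in place of five strings
```

With two distinct values, each row needs only one bit rather than a ten-character string, so the saving grows with the number of rows per distinct value.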

I’ve skipped over other benefits of QlikView; it really is a very rich and well thought out system. Perhaps I’ll write about them some other time. The key point for now is that people need to understand that QlikView uses a fundamentally different database technology, one that hugely simplifies application development by making the normal database design tasks unnecessary. The fantastic claims for QlikTech only become plausible once you recognize that this difference is what makes them possible.

(disclaimer: although Client X Client is a QlikTech reseller, they have no responsibility for the contents of this blog.)

13 comments:

Unknown said...

Excellent article on Qliktech. I am interested to know what you think are Qlikview's limitations. I can't imagine this software performing well in a complex terabyte environment.

David Raab said...

It’s important to recognize that QlikTech is a reporting system, not a data warehouse. You’ll still need your multi-terabyte warehouse to assemble the data. QlikTech then will work with extracts. QlikView scripting does provide some data load and transformation capabilities, but it will not replace enterprise ETL tools.

As to scalability – QlikTech has installations with hundreds of millions of rows of data, and I have personally worked with tens of millions of rows with no problems. Many billions of rows is another matter, but results would depend on the details.

tonyyang said...

Qliktech markets themselves by saying that a data warehouse is not required.

You stated that it does not replace an enterprise ETL tool and a multi-terabyte data warehouse.

I just wanted some clarification on this.

I am at a startup that wants to build good analytics reporting that will scale quickly with the site's growth and will not require us to invest money in dedicated resources to maintain a large data warehouse. Does Qliktech sound right for us? We don't want to build this, and grow out of it within a year and need to take another route.

Thanks for your time. Really appreciate your informative blog post on this topic.

David Raab said...

First of all, let me repeat that although my firm is a QlikView reseller, comments in this blog are strictly my own.

QlikView can join separate tables without bringing them into the same database; that's really the sense in which a warehouse is not required. It also is relatively insensitive to the structures of the tables it works with, and typically goes against detail rather than summary data. This means you don't need the careful database design and data cubes required by most business intelligence tools. That's a huge benefit in terms of design cost, flexibility and deployment time.

As to ETL: QlikView connects to major data sources (SQL, XML, text files) and has a powerful scripting language. So it can do quite a bit. But if you have complicated data integration requirements, like 'fuzzy' matching on names and addresses or Web log analysis, you'll still need an enterprise ETL tool like Informatica or a specialized Web analysis system like Omniture.

I was told two years ago that the largest QlikView installation is about two billion rows. There might be a bigger one by now. I don't know the size in storage but maybe it's terabytes. In general, I wouldn't expect many people to have a machine with that much memory, even allowing for the compression that QlikView creates. But I'd also expect someone with that much data to do some reduction before loading it into QlikView or any other analysis tool.

I can't judge your business situation without knowing the details. But one of the nicest things about QlikTech is its very low cost in all dimensions: software, hardware, setup, and report development. So it's actually a very smart strategy to start with QlikView even if you just think of it as a data exploration and prototyping tool. It will almost certainly meet your initial needs. Then, if you need something else after a year, you'll still have gotten a year's worth of data access, learned a lot on a small investment, and avoided a big investment that might have been a mistake. In practice, I'm pretty sure you'd end up keeping QlikView as a reporting tool even if you added other tools to do data preparation and maintain a big repository.

I hope this helps. Feel free to contact me directly at draab@clientxclient.com if you want to go into more detail.

David Raab said...

Following up - there is a case at http://www.qlikview.com/upload/Case_Studies_eng/Case_QlikView_on_Itanium_KBV_eng_051108.pdf describing a QlikView installation working against 15-20 terabytes of data. But it also mentions the server has 8 GB of memory, so what's happening is they are loading extracts from the 15-20 TB into QlikView. Nothing wrong with that but I don't want to leave the false impression that they are actually loading multiple terabytes into memory directly.

Incidentally, 8 GB is definitely not the largest QlikView server in operation - I have personally worked on a bigger one, and am sure that was not the world champion.

I've asked QlikTech for more information on this topic. Once I learn more, I think I'll make a separate post rather than talking to myself here in comments. (Blogger doesn't let me edit comments, which is why I keep adding new ones.)

GangaV said...

A great assessment of how prospects will perceive claims made by Qlikview.

When positioning QlikView it is worth pointing out that data warehouses need to serve two key functions (amongst others):

(1) map source data via metadata that make sense to human consumers - e.g., map raw column names into their business names
(2) represent the data in structures/schema that supports reporting tools - for performance as well as analytical needs. This includes star-schemas, multi-dimensional cubes as well as aggregations, period-to-date calculations, balances, etc.

Simple column name mapping can be done within Qlikview queries, but it is best left to specialist tools and a data warehouse in an enterprise situation.

Where Qlikview can come into play is to effectively eliminate the need for the second and the most challenging of the above. Those who have spent months and years building and maintaining data warehouses and semantic layers will understand the compromises that need to be made between performance and analytical needs. Data warehouses evolve over years as business users learn about the BI tool and IT users understand business requirements. Qlikview's promise is to give end users access to all data with the flexibility and speed needed to discover and analyse it from day one, without having to go through this prolonged evolution of their data warehouse. Proving how scalable the associative relationship model is would be the challenge in convincing the IT community.

Juan Martin said...

David, great article about QlikView.

I also work in a QlikView reseller, and I would like to share how we solve some of the problems we face in large enterprise BI situations:

QlikView provides QVD files to address reading, writing and storage of huge amounts of data. Basically, the QVD file stores a table in QlikView associative compressed format. The speed for retrieving rows from a QVD file is amazing, millions of rows per second.

Taking this into account, the architectural approach we use for managing enterprise data with QlikView is the following:

1) We build a first layer in which we load information from OLTP systems and store it in QVD files. In an initial data migration, all historical information is stored into these QVD files. New OLTP records are loaded into the existing QVDs periodically, in increments.

2) A second (optional) layer includes merging the raw QVDs from the 1st layer to create intermediate QVD structures ready to be used by end user QVW applications.

3) The third layer is comprised of the QVW end user applications that read the intermediate QVDs as well as the 1st layer QVDs as needed.

I like to think of this architecture of QVD files as a "QlikView datawarehouse".

I hope this helps to address the datawarehouse topic.

David Raab said...

Thanks Juan. We've taken a similar approach. Moving the data from its original source to the QVD format can be quite time-consuming for large volumes, so it is best to do it just once. Similarly, doing calculations and other data preparation in scripts and saving the results is often critical to producing quick response for end-users, since it minimizes the amount of time spent waiting for on-the-fly calculations. I haven't personally needed to build applications that reload from QVDs on-demand, but can easily envision situations where this would also be an important strategy to ensure adequate performance.

Handsome Massage said...

We recently brought QlikView in to do some very heavy lifting for us on the BI front. Their tool connects directly to our 40 TB worth of data stored across multiple databases and technologies (including a Netezza box) with excellent results. And the front end interface makes reports so remarkably easy to construct that we have almost everyone around here trained on how to use them.

To David's point, it was such a new way of seeing and doing things that it quite literally changed everything.

Harris Reynolds said...

Hi David. A very interesting post, and I'll definitely agree that QlikTech has some compelling technology. A couple counter points to consider though:

1) Without knowing the details of QlikTech's underlying technology there is no way to claim that they do not in fact use a relational database under the covers. It would actually make great sense from a technical perspective as an "implementation detail" to use this approach despite not publicizing it. It would be a bit crazy to implement their own in-memory DB, but I'd bet they do something similar regardless of what they publicly call it.

2) At the end of the day the only way enterprises store most data is in a relational DB; using QlikView does not obviate the need for well-organized data (tables, relationships, etc). I would posit a guess that most data extracts imported into their product ultimately reside in a relational DB... so the claim that an enterprise does not need these skills to use QView is superficial. Even QView needs good data to generate good results.

David Raab said...

Hi Harris,

I fully agree with your second point. QlikView is a reporting system that has to import data from somewhere, and that somewhere will indeed almost always be a relational database. In fact, I'd go further and say it will often be a data warehouse built on a relational system. So, yes, the enterprise does need those skills. What makes QlikView special is the people who write the reports don't need them.

As to your first point: QlikView doesn't work like a relational database, specifically in the way it handles joins. I discussed this at some length in comments on Curt Monash's blog http://www.dbms2.com/2008/08/04/qliktech-qlikview-update/. The salient points were:

"If establishing relations based on shared keys is a join, then, yes, that’s what QlikView does. So does any other database that links related records. But, to me, the term “SQL join” implies a process executed during a query that compares the key fields on two records and links the records where those fields match. As best I can tell, QlikView doesn’t do this, presumably because those relationships were already stored in pointers when the database is built. Thus, query-time processing is nil. I am quite certain that QlikView is not a conventional SQL database engine—whatever that might be—wrapped in a clever package. It just doesn’t perform that way."

And, more specifically:

"The other point I had intended to add to yesterday’s comment was a clarification about how QlikView joins differ from SQL joins. QlikView reads the data “in place”, rather than creating a result set like SQL. The specific advantage comes with many-to-many joins. Imagine two tables with three rows, each having the same key value. A traditional SQL join would create nine rows in the result set: that is, you get three records from matching the first record in table A with all three records in table B; another three from matching the second record in table A against all of table B, and yet another three from matching the third record in table A against table B. This means that if you, say, added the values from table B records, they would be triple-counted. QlikView would recognize that the records all match, but still reads table B directly. Therefore no redundant records are created, and a sum of fields from table B would be correct. Better still, a report that showed the sum data from table A and the sum of data from table B would give the correct answer regardless of whether you had selected one, two or three rows from table A.

"(Apologies if this is too abstract. One practical example is calculating response rates to a promotion. Table A has each response, and table B has a single record with the audience quantity. If you do a SQL join of table A to table B, all the records in table A match table B, so the result set has one record for each response, and every record has the audience quantity on it. Calculating response rate by dividing the sum of the responses into the sum of the audience quantities would therefore give the wrong result. This doesn’t happen in QlikView.)"

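The response-rate example in the quoted comment is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with made-up table names and numbers (three responses, one audience record of 100). It shows only the SQL side of the story: the join repeats the audience quantity once per response, while reading each table on its own gives the figure the comment says QlikView reports. It does not model how QlikView arrives at that figure.

```python
import sqlite3

# Hypothetical data: table names, column names, and quantities are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (promo_id INTEGER, responses INTEGER);
CREATE TABLE audience  (promo_id INTEGER, audience_qty INTEGER);
INSERT INTO responses VALUES (1, 1), (1, 1), (1, 1);
INSERT INTO audience  VALUES (1, 100);
""")

# Summing audience_qty over the join counts the single audience record three times.
joined_audience = conn.execute(
    """
    SELECT SUM(a.audience_qty)
    FROM responses r
    JOIN audience a ON a.promo_id = r.promo_id
    """
).fetchone()[0]

# Summing each table separately avoids the fan-out.
total_responses = conn.execute("SELECT SUM(responses) FROM responses").fetchone()[0]
total_audience = conn.execute("SELECT SUM(audience_qty) FROM audience").fetchone()[0]

wrong_rate = total_responses / joined_audience
right_rate = total_responses / total_audience

print("audience via join:", joined_audience)
print("audience read directly:", total_audience)
print("rate via join:", wrong_rate, "rate read directly:", right_rate)
```

The rate computed from the joined rows is understated by a factor equal to the number of responses, which is exactly the triple-counting described in the quote.
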
AJ said...

Hi David, Can you please explain how QlikView handles situations where, for example, a customer in two source systems is stored differently (GE vs. General Electric)? How can QlikView show all transactions for this customer under one name. In a traditional data warehousing setting, this is accomplished by using separately prepared conformed dimensions where one of the two names is retained.

David Raab said...

Hi AJ. There's no magic here: the QlikView file would need some sort of key that links the two sets of transactions. QlikView scripts would allow you to create this, or you could draw the data from an existing warehouse.
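As a rough sketch of what "some sort of key" means, here is the idea in plain Python rather than QlikView script. The lookup table, source names, and amounts are all hypothetical; in practice the mapping would come from a script or from an existing warehouse, as noted above.

```python
from collections import defaultdict

# Hypothetical lookup: the name used in each source system -> one shared key.
CUSTOMER_KEY = {
    "GE": "General Electric",
    "General Electric": "General Electric",
}

# Hypothetical transactions drawn from two source systems.
transactions = [
    {"source": "billing", "customer": "GE", "amount": 100},
    {"source": "crm", "customer": "General Electric", "amount": 250},
    {"source": "crm", "customer": "Acme", "amount": 40},
]


def totals_by_customer(rows, key_map):
    """Sum amounts per customer after mapping each source name to its shared key."""
    totals = defaultdict(int)
    for row in rows:
        # Names missing from the lookup are left unchanged.
        key = key_map.get(row["customer"], row["customer"])
        totals[key] += row["amount"]
    return dict(totals)


print(totals_by_customer(transactions, CUSTOMER_KEY))
```

The matching itself (deciding that GE and General Electric are the same customer) still has to be done by someone or something outside this lookup.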