Thursday, July 31, 2008

How to Report on Ease of Use?

Yesterday’s post on classifying demand generation systems prompted some strong reactions. The basic issue is how to treat ease of use when describing vendors.

It’s hard to even define the issue without prejudicing the discussion. Are we talking about vendor rankings, vendor comparisons, or vendor analyses?

- Ranking implies a single score for each product. The approach is popular but it leads people to avoid evaluating systems against their own requirements. So I reject it.

- Vendor comparisons give each several scores to each vendor, for multiple categories. I have no problem with this, although it still leaves the question of what the categories should be.

- Vendor analyses attempt to describe what it's like to use a product. This is ultimately what buyers need to know, but it doesn’t lead directly to deciding which product is best for a given company.

Ultimately, then, a vendor comparison is what’s needed. Scoring vendors on several categories will highlight their strengths and weaknesses. Buyers then match these scores against their own requirements, focusing on the areas that are important to them. The mathematically inclined can assign formal weights to the different categories and generate a combined score if they wish. In fact, I do this regularly as a consultant. But the combined scores themselves are actually much less important than the understanding gained of trade-offs between products. Do we prefer a product that is better at function A than function B, or vice versa? Do we accept less functionality in return for lower cost or higher ease of use? Decisions are really made on that basis. The final ranking is just a byproduct.

The question, then, is whether ease of use should be one of the categories in this analysis. In theory I have no problem with including it. Ease of use does, however, pose some practical problems.

- It’s hard to measure. Ease of use is somewhat subjective. Things that are obvious to one person may not be obvious to someone else. Even a concrete measure like the time to set up a program or the number of keystrokes to accomplish a given task often depends on how familiar users are with a given system. This is not to say that usability differences don’t exist or are unmeasurable. But it does mean they are difficult to present accurately.

- ease depends on the situation. The interface that makes it easy to set up a simple project may make it difficult or impossible to handle a more complicated one. Conversely, features that support complex tasks often get in the way when you just want to do something simple. If one system does simple things easily and another does complicated things easily, which gets the better score?

I think this second item suggests that ease of use should be judged in conjunction with individual functions, rather than in general. In fact, it’s already part of a good functional assessment: the real question is usually not whether a system can do something, but how it does it. If the “how” is awkward, this lowers the score. This is precisely why I gather so much detail about the systems I evaluate, because I need to understand that “how”.

This leads me pretty much back to where I started, which is opposed to breaking out ease of use as a separate element in a system comparison. But I do recognize that people care deeply about it, so perhaps it would make sense to assess each function separately in terms of power and ease of use. Or, maybe some functions should be split into things like “simple email campaigns” and “complex email campaigns”. Ease of use would then be built into the score for each of them.

I’m still open to suggestion on this matter. Let me know what you think.

Tuesday, July 29, 2008

How Do You Classify Demand Generation Systems?

I’ve been pondering recently how to classify demand generation systems. Since my ultimate goal is to help potential buyers decide which product to purchase, the obvious approach is to first classify the buyers themselves and then determine which systems best fit which group. Note that while this seems obvious, it’s quite different from how analyst firms like Gartner and Forrester set up their classifications. Their ratings are based on market positions, with categories such as “leaders”, “visionaries”, and “contenders”.

This approach has always bothered me. Even though the analysts explicitly state that buyers should not simply limit their consideration to market “leaders”, that is exactly what many people do. The underlying psychology is simple: people (especially Americans, perhaps) love a contest, and everyone wants to work with a “leader”. Oh, and it’s less work than trying to understand your actual requirements and how well different systems match them.

Did you detect a note of hostility? Indeed. Anointing leaders is popular but it encourages buyers to make bad decisions. This is not quite up there with giving a toddler a gun, since the buyers are responsible adults. But it could, and should, be handled more carefully.

Now I feel better. What was I writing about? Right--classifying demand generation systems.

Clearly one way to classify buyers is based on the size of their company. Like the rich, really big firms are different from you and I. In particular, really big companies are likely to have separate marketing operations in different regions and perhaps for different product lines and customer segments. These offices must work on their own projects but still share plans and materials to coordinate across hundreds of marketing campaigns. They need fine-grained security so the groups don't accidentally change each other's work. Large firms may also demand an on-premise rather than externally-hosted solution, although this is becoming less of an issue.

So far so good. But that's just one dimension, and Consultant Union rules clearly state that all topics must be analyzed in a two-dimensional matrix.

It’s tempting to make the second dimension something to do with user skills or ease of use, which are pretty much two sides of the same coin. But everyone wants their system to be as easy to use as possible, and what’s possible depends largely on the complexity of the marketing programs being built. Since the first dimension already relates to program complexity, having ease of use as a second dimension would be largely redundant. Plus, what looks hard to me may seem simple to you, so this is something that’s very hard to measure objectively.

I think a more useful second dimension is the scope of functions supported. This relates to the number of channels and business activities.

- As to channels: any demand generation system will generate outbound emails and Web landing pages, and send leads them to a sales automation system. For many marketing departments, that’s plenty. But some systems also outbound call centers, mobile (SMS) messaging, direct mail, online chat, and RSS feeds. Potential buyers vary considerably in which of these channels they want their system to support, depending on whether they use them and how happy they are with their current solution.

- Business activities can extend beyond the core demand generation functions (basically, campaign planning, content management and lead scoring) to the rest of marketing management: planning, promotion calendars, Web analytics, performance measurement, financial reporting, predictive modeling, and integration of external data. Again, needs depend on both user activities and satisfaction with existing systems.

Scope is a bit tricky as a dimension because systems will have different combinations of functions, and users will have different needs. But it’s easy enough to generate a specific checklist of items for users to consult. A simple count of the functions supported will give a nice axis for a two-dimensional chart.

So that’s my current thinking on the subject: one dimension measures the ability to coordinate distributed marketing programs, and the other measures the scope of functions provided. Let me know if you agree or what you'd propose as alternatives.

Thursday, July 24, 2008

Two Acquisitions Extend SQL Server

I don't usually bother to post "breaking news" here, but I've recently seen two acquisitions by Microsoft that seem worth noting. On July 14, the company announced purchase of data quality software vendor Zoomix, and just today it announced purchase of data appliance vendor DATAllegro. Both deals seem to represent an attempt to make SQL Server a more complete solution--in terms of data preparation in the Zoomix case, and high-end scalability with DATAllegro.

Of the two deals, the DATAllegro one seems more intriguing, only because DATAllegro was so obviously not built around SQL Server to begin with. The whole point of the product was to use open source software (the Ingres database in this case) and commodity components. Switching to the proprietary Microsoft world just seems so, well, different. The FAQ accompanying the announcement makes clear that the DATAllegro technology will only be available in the future in combination with SQL Server. So anyone looking for evidence of a more open-systems-friendly Microsoft will have to point elsewhere.

The Zoomix acquisition seems more straightforward. Microsoft has been extending the data prepartion capabilities of SQL Server for quite some time now, and already had a pretty impressive set of tools. My concern here is that Zoomix actually had some extremely flexible matching and extraction capabilities. These overlap with other SQL Server components, so they are likely to get lost when Zoomix is assimilated into the product. That would be a pity.

Sybase IQ vs. Vertica: Comparisons are Misleading, But Fun

I received the “Vertica Fast Lane” e-newsletter yesterday, which I am amused to note from its URL is generated by Eloqua. (This is only amusing because I’m researching Eloqua for unrelated reasons these days. Still, if I can offer some advice to the Vertica Marketing Department, it’s best to hide that sort of thing.)

The newsletter contained a link to a post on Vertica’s blog entitled “Debunking a Myth: Column-Stores vs. Indexes”. Naturally caught my attention, given my own recent post suggesting that use indexes is a critical difference between SybaseIQ and the new columnar databases, of which Vertica is the most prominent.

As it turned out, the Vertica post addressed a very different issue: why putting a conventional B-tree index on every column in a traditional relational database is nowhere near as efficient as using a columnar database. This is worth knowing, but doesn’t apply to Sybase IQ because IQ’s primary indexes are not B-trees. Instead, most of them are highly compressed versions of the data itself.

If anything, the article reinforced my feeling that what Sybase calls an index and what Vertica calls a compressed column are almost the same thing. The major difference seems to be that Vertica sorts its columns before storing them. This will sometimes allow greater compression and more efficient searches, although it also implies more processing during the data load. Sybase hasn’t mentioned sorting its indexes, although I suppose they might. Vertica also sometimes improves performance by storing the same data in different sort sequences.

Although Vertica’s use of sorting is an advantage, Sybase has tricks of its own. So it’s impossible to simply look at the features and say one system is “better” than the other, either in general or for specific applications. There's no alternative to live testing on actual tasks.

The Vertica newsletter also announced a preview release of the system’s next version, somewhat archly codenamed “Corinthian” (an order of Greek columns—get it?. And, yes, “archly” is a pun.) To quote Vertica, “The focus of the Corinthian release is to deliver a high degree of ANSI SQL-92 compatibility and set the stage for SQL-99 enhancements in follow-on releases.”

This raises an issue that hadn’t occurred to me, since I had assumed that Vertica and other columnar databases already were compliant with major SQL standards. But apparently the missing capabilities were fairly substantial, since “Corinthian” adds nested sub-queries; outer-, cross- and self-joins; union and union-all set operations; and VarChar long string support. These cannot be critical features, since people have been using Vertica without them. But they do represent the sort of limitations that sometimes pop up only after someone has purchased a system and tried to deploy it. Once more, there's no substitute for doing your homework.

Thursday, July 17, 2008

QlikView 8.5 Does More, Costs Less

I haven’t been working much with QlikView recently, which is why I haven’t been writing about it. But I did receive news of their latest release, 8.5, which was noteworthy for at least two reasons.

The first is new pricing. Without going into the details, I can say that QlikView significantly lowered the cost of an entry level system, while also making that system more scalable. This should make it much easier for organizations that find QlikView intriguing to actually give it a try.

The second bit of news was an enhancement that allows comparisons of different selections within the same report. This is admittedly esoteric, but it does address an issue that came up fairly often.

To backtrack a bit: the fundamental operation of QlikView is that users select sets of records by clicking (or ‘qliking’, if you insist) on lists of values. For example, the interface for an application might have lists of regions, years and product, plus a chart showing revenues and costs. Without any selections, the chart would show data for all regions, years and products combined. To drill into the details, users would click on a particular combination of regions, years and products. The system would then show the data for the selected items only. (I know this doesn’t sound revolutionary, and as a functionality, it isn’t. What makes QlikView great is how easily you, or I, or a clever eight-year-old, could set up that application. But that’s not the point just now.)

The problem was that sometimes people wanted to compare different selections. If these could be treated as dimensions, it was not a problem: a few clicks could add a ‘year’ dimension to the report I just described, and year-to-year comparisons would appear automatically. What was happening technically was the records within a single selection were being separated for reporting.

But sometimes things are more complicated. If you wanted to compare this year’s results for Product A against last year’s results for Product B, it took some fairly fancy coding. (Not all that fancy, actually, but more work than QlikView usually requires.) The new set features let users simply create and save one selection, then create another, totally independent selection, and compare them directly. In fact, you can bookmark as many selections as you like, and compare any pair you wish. This will be very helpful in many situations.

But wait: there’s more. The new version also supports set operations, which can find records that belong to both, either or only one of the pair of sets. So you could easily find customers who bought last year but not this year, or people who bought either of two products but not both. (Again, this was possible before, but is now much simpler.) You can also do still more elaborate selections, but it gives me a headache to even think about describing them.

Now, I’m quite certain that no one is going to buy or not buy QlikView because of these particular features. In fact, the new pricing makes it even more likely that the product will be purchased by business users outside of IT departments, who are unlikely to drill into this level of technical detail. Those users see QlikView as essentially as a productivity tool—Excel on steroids. This greatly understates what QlikView can actually do, but it doesn’t matter: the users will discover its real capabilities once they get started. What’s important is getting QlikView into companies despite the usual resistance from IT organizations, who often (and correctly, from the IT perspective) don’t see much benefit.

Saturday, July 12, 2008

Sybase IQ: A Different Kind of Columnar Database (Or Is It The Other Way Around?)

I spent a fair amount of time this past week getting ready for my part in the July 10 DM Radio Webcast on columnar databases. Much of this was spent updating my information on SybaseIQ, whose CTO Irfan Khan was a co-panelist.

Sybase was particularly eager to educate me because I apparently ruffled a few feathers when my July DM Review column described SybaseIQ as a “variation on a columnar database” and listed it separately from other columnar systems. Since IQ has been around for much longer than the other columnar systems and has a vastly larger installed base—over 1,300 customers, as they reminded me several times—the Sybase position seems to be that they should be considered the standard, and everyone else as the variation. (Not that they put it that way.) I can certainly see why it would be frustrating to be set apart from other columnar systems at exactly the moment when columnar technology is finally at the center of attention.

The irony is that I’ve long been fond of SybaseIQ, precisely because I felt its unconventional approach offered advantages that few people recognized. I also feel good about IQ because I wrote about its technology back in 1994, before Sybase purchased it from Expressway Technologies—as I reminded Sybase several times.

In truth, though, that original article was part of the problem. Expressway was an indexing system that used a very clever, and patented, variation on bitmap indexes that allowed calculations within the index itself. Although that technology is still an important feature within SybaseIQ, it now supplements a true column-based data store. Thus, while Expressway was not a columnar database, SybaseIQ is.

I was aware that Sybase had extended Expressway substantially, which is why my DM Review article did refer to them as a type of columnar database. So there was no error in what I wrote. But I’ll admit that until this week’s briefings I didn’t realize just how far SybaseIQ has moved from its bitmapped roots. It now uses seven or nine types of indexes (depending on which document you read), including traditional b-tree indexes and word indexes. Many of its other indexes do use some form of bitmaps, often in conjunction with tokenization (i.e., replacing an actual value with a key that points to a look-up table of actual values. Tokenization saves space when the same value occurs repeatedly, because the key is much smaller than the value itself. Think how much smaller a database is if it stores “MA” instead of “Massachusetts” in its addresses. )

Of course, tokenization is really a data compression technique, so I have a hard time considering a column of tokenized data to be an index. To me, an index is an access mechanism, not the data itself, regardless of how well it’s compressed. Sybase serenely glides over the distinction with the Zen-like aphorism that “the index is the column” (or maybe it was the other way around). I’m not sure I agree, but the point doesn’t seem worth debating

Yet, semantics aside, SybaseIQ’s heavy reliance on “indexes” is a major difference between it and the raft of other systems currently gaining attention as columnar databases: Vertica, ParAccel, Exasol and Calpont among them. These systems do rely heavily on compression of their data columns, but don’t describe (or, presumably, use) these as indexes. In particular, so far as I know, they don’t build different kinds of indexes on the same column, which IQ treats as a main selling point. Some of the other systems store several versions of the same column in different sort sequences, but that’s quite different.

The other very clear distinction between IQ and the other columnar systems is that IQ uses Symmetrical Multi-Processing (SMP) servers to process queries against a unified data store, while the others rely on shared nothing or Massively Multi-Processor (MMP) servers. This reflects a fundamentally different approach to scalability. Sybase scales by having different servers execute different queries simultaneously, relying on its indexes to minimize the amount of data that must be read from the disk. The MPP-based systems scale by partitioning the data so that many servers can work in parallel to scan it quickly. (Naturally, the MPP systems do more than a brute-force column scan; for example, those sorted columns can substantially reduce read volumes.)

It’s possible that understanding these differences would allow someone to judge which type of columnar system works better for a particular application. But I am not that someone. Sybase makes a plausible case that its approach is inherently better for a wider range of ad hoc queries, because it doesn’t depend on how the data is partitioned or sorted. However, I haven’t heard the other vendors’ side of that argument. In any event, actual performance will depend on how the architecture has been implemented. So even a theoretically superior approach will not necessarily deliver better results in real life. Until the industry has a great deal more experience with the MPP systems in particular, the only way to know which database is better for a particular application will be to test them.

The SMP/MPP distinction does raise a question about SybaseIQ’s uniqueness. My original DM Review article actually listed two classes of columnar systems: SMP-based and MPP-based. Other SMP-based systems include Alterian, SmartFocus, Infobright, 1010Data and open-source LucidDB. (The LucidDB site contains some good technical explanations of columnar techniques, incidentally.)

I chose not to list SybaseIQ in the SMP category because I thought its reliance on bitmap techniques makes it significantly different from the others, and in particular because I believed it made IQ substantially more scalable. I’m not so sure about the bitmap part anymore, now that realize SybaseIQ makes less use of bitmaps than I thought, and have found that some of the other vendors use them too. On the other hand, IQ’s proven scalability is still much greater than any of these other systems—Sybase cites installations over 100 TB, while none of the others (possibly excepting Infobright) has an installation over 10 TB.

So where does all this leave us? Regarding SybaseIQ, not so far from where we started: I still say it’s an excellent columnar database that is significantly different from the (MPP-based) columnar databases that are the focus of recent attention. But, to me, the really important word in the preceding sentence is “excellent”, not “columnar”. The point of the original DM Review article was that there are many kinds of analytical databases available, and you should consider them all when assessing which might fit your needs. It would be plain silly to finally look for alternatives to conventional relational databases and immediately restrict yourself to just one other approach.

Thursday, July 03, 2008

LucidEra Takes a Shot at On-Demand Analytics

Back in March, I wrote a fairly dismissive post about on-demand business intelligence systems. My basic objection was that the hardest part of building a business intelligence system is integrating the source data, and being on-demand doesn’t make that any easier. I still think that’s the case, but did revisit the topic recently in a conversation with Ken Rudin, CEO of on-demand business analytics vendor Lucid Era.

Rudin, who has plenty of experience with both on-demand and analytics from working at, Siebel, and Oracle, saw not one but two obstacles to business intelligence: integration and customization. He described LucidEra’s approach as not so much solving those problems as side-stepping them.

The key to this approach is (drum roll…) applications. Although LucidEra has built a platform that supports generic on-demand business intelligence, it doesn’t sell the platform. Rather, it sells preconfigured applications that use the platform for specific purposes including sales pipeline analysis, order analysis, and (just released) sales lead analysis. These are supported by standard connectors to, NetSuite (where Rudin was an advisory board member) and Oracle Order Management.

Problem(s) solved, eh? Standard applications meet customer needs without custom development (at least initially). Standard connectors integrate source data without any effort at all. Add the quick deployment and scalability inherent in the on-demand approach, and, presto, instant business value.

There’s really nothing to argue with here, except to point out that applications based on ‘integrating’ data from a single source system can easily be replaced by improvements to the source system itself. LucidEra fully recognizes this risk, and has actually built its platform to import and consolidate data from multiple sources. In fact, the preconfigured applications are just a stepping stone. The company’s long-term strategy is to expose its platform so that other people can build their own applications with it. This would certainly give it a more defensible business position. Of course, it also resurrects the customization and integration issues that the application-based strategy was intended to avoid.

LucidEra would probably argue that its technology makes this customization and integration easier than with alternative solutions. My inner database geek was excited to learn that the company uses a version of the columnar database originally developed by Broadbase (later merged with Kana), which is now open source LucidDB. An open source columnar database—how cool is that?

LucidEra also uses the open source Mondrian OLAP server (part of Pentaho) and a powerful matching engine for identity resolution. These all run on a Linux grid. There is also some technology—which Rudin said was patented, although I couldn’t find any details—that allows applications to incorporate new data without customization, through propagation of metadata changes. I don’t have much of an inner metadata geek, but if I did, he would probably find that exciting too.

This all sounds technically most excellent and highly economical. Whether it significantly reduces the cost of customization and integration is another question. If it allows non-IT people to do the work, it just might. Otherwise, it’s the same old development cycle, which is no fun at all.

So, as I said at the start of all this, I’m still skeptical of on-demand business intelligence. But LucidEra itself does seem to offer good value.

My discussion with LucidEra also touched on a couple of other topics that have been on my mind for some time. I might as well put them into writing so I can freely enjoy the weekend.

- Standard vs. custom selection of marketing metrics. The question here is simply whether standard metrics make sense. Maybe it’s not a question at all: every application presents them, and every marketer asks for them, usually in terms of “best practices”. It’s only an issue because when I think about this as a consultant, and when I listen to other consultants, the answer that comes back is that metrics should be tailored to the business situation. Consider, for example, choosing Key Performance Indicators on a Balanced Scorecard. But vox populi, vox dei (irony alert!), so I suppose I’ll have to start defining a standard set of my own.

- Campaign analysis in demand generation systems. This came up in last week’s post and the subsequent comments, which I highly recommend that you read. (There may be a quiz.) The question here is whether most demand generation systems (Eloqua, Vtrenz, Marketo, Market2Lead, Manticore, etc.) import sales results from CRM systems to measure campaign effectiveness. My impression was they did, but Rudin said that LucidEra created its lead analysis system precisely because they did not. I’ve now carefully reviewed my notes on this topic, and can tell you that Marketo and Market2Lead currently have this capability, while the other vendors I’ve listed should have it before the end of the year. So things are not quite as rosy as I thought but will soon be just fine.

Wednesday, July 02, 2008

The Value of Intra-Site Web Search: A Personal Example

I’ll do a real post later today or more likely tomorrow, but I thought I’d quickly share a recent personal experience that illustrated the importance in e-commerce of really good in-site search.

By way a background: having a good search capability is one of those Mom-and-apple-pie truths that everyone fully accepts in theory, but not everyone bothers to actually execute. So perhaps being reminded of the real-life value of doing it right will inspire some reader—maybe even YOU—to take another look at what you’re doing and how to improve it.

Anyway, I recently needed a notebook PC with a powerful video card for gaming on short notice. Or, at least, one of my kids did. He quickly found a Web site that had an excellent ranking of the video cards. But there are many dozens of cards even in the high-performance category, so I couldn’t just type them all into a search box, either on Google or within an e-commerce site. Nor did video cards within a given unit necessarily show up in search results even when I tried entering them individually.

To make a long story short, we found that on-line computer retailer NewEgg had a “power search” option that give checkboxes for what are presumably all available options in a wide variety of system parameters—product type, manufacturer, series, CPU type, CPU speed, screen size, wide screen support, resolution, operating system, video card, graphic type, disk size, memory, optical drive type, wireless LAN, blue tooth, Webcam and weight. This meant I could click off the video cards I was looking for, as well as other parameters such as screen size and weight class. The results came back, and that was that.

We weren’t necessarily thrilled with the product choices at NewEgg, and there were several minor snafus that left me a little annoyed with them. But I couldn’t find any other site with a way to efficiently locate the systems I needed. So they got the sale.

(In case you're wondering: yes I would order again and, for all you Net Promoter Score fans, I suppose I would recommend them if someone asked. But they missed an opportunity to become my preferred vendor for this sort of thing, a nuance the Net Promoter Score would fail to pick up.)

I suppose there is a slight marketing technology angle to this story as well. NewEgg has to somehow populate its checkboxes with all that information. They must do this automatically since it constantly changes. This requires parsing the data from specification sheets into a database. As data extraction challenges go, this isn’t the hardest I can imagine, but it’s still a bit of work. It should be a good use case or case study for somebody in that industry.