Wednesday, December 24, 2008
This gap is partly filled by analytical technologies such as columnar systems and database appliances, which can give good performance without schemas tailored for each task. But those systems are purchased and managed by the IT department, so they still leave analysts largely reliant on IT’s tender mercies.
A much larger portion of the gap is filled by products like QlikView, which the analysts can largely control for themselves. These can be divided into two subcategories: database engines like QlikView and illuminate, and visualization tools like Tableau and TIBCO Spotfire. The first group lets analysts do complex data manipulation and queries without extensive data modeling, while the latter group lets them do complex data exploration and presentation without formal programming. This distinction is not absolute: the database tools offer some presentation functions, and the visualization tools support some data manipulation. Both capabilities must be available for the analysts the work independently.
This brings us to ADVIZOR from ADVIZOR Solutions. ADVIZOR features an in-memory database and some data manipulation, but its primary strength is visualization. This includes at least fifteen chart types, including some with delightfully cool names like Multiscape, Parabox and Data Constellations. Analysts can easily configure these by selecting a few menu options. The charts are also somewhat interactive, allowing users to select records by clicking on a shape or drawing a box around data points. Some settings can be changed within the chart, such as selecting a measure to report on. Others, such as specifying the dimensions, require modifying the chart setup. The distinction won’t matter much to business analysts, who will have the tools build and modify the charts. But final consumers of the analyses typically run a viewer that does not permit changes to the underlying graph configuration.
On the other hand, that in-memory database can link several charts within a dashboard so selections made on one chart are immediately reflected in all others. This is arguably the greatest strength of the system, since it lets users slice data across many dimensions without writing complex queries. Colors are also consistent from one chart to the next, so that, for example, the colors assigned to different customer groups in a bar chart determine the color of the dot assigned to each customer in a scatter plot. Selecting a single group by clicking on its bar would turn the dots of all the other customers to gray. Keeping the excluded records visible in this fashion may yield more insight than simply removing them, although the system could also do that. These adjustments appear almost instantly even where millions of records are involved.
Dashboards can easily be shared with end-users through either a zero-footprint Web client or a downloadable object. Both use the Microsoft .NET platform, so Mac and Linux users need not apply. Images of ADVIZOR dashboards can easily be exported to Office documents, and can actually be manipulated from within Powerpoint if they are connected to the underlying dashboard. It’s also easy to export results such as lists of selected records.
Circling back to that database: it employs technology developed at Bell Labs during the 1990’s to support interactive visualization. The data model itself is a fairly standard one of tables linked by keys. Users can import data from text files, relational databases, Excel, Access, text files, Business Objects or Salesforce.com. They can map the table relationships and add some transformations and calculated fields during or after the import process. Although the mapping and transformations are executed interactively, the system records the sequence so the user can later edit it or repeat it automatically.
The import is fairly quick: the vendor said that an extract of three to four gigabytes across thirty tables runs in about twenty minutes, of which about five minutes is the build itself. The stored data is highly compressed but expands substantially when loaded into RAM: in the previous example, the three to four GB are saved as a 70 MB project file, but need 1.4 GB of RAM. The current version of ADVIZOR runs on 32 bit systems which limits it to 2-4 GB of RAM, although a 64 bit version is on track for release in January 2009. This will allow much larger implementations.
Pricing of ADVIZOR starts at $499 for a desktop version limited to Excel, Access or Salesforce.com source data and without table linking. (A 30-day trial version of this costs $49.) The full version starts at around $10,000, with additional charges for different types of user seats and professional services. Few clients pay less than $20,000 and a typical purchase is $50,000 to $60,000. Most buyers are business analysts or managers with limited technical skills, so the company usually helps set up their initial data loads and applications. ADVIZOR was introduced in 2004 and has several thousand end users, with a particular concentration in fund-raising for higher education. The bulk of ADVIZOR sales come through vendors who have embedded it within their own products.
Thursday, December 18, 2008
Oh well, no grudges here -- ‘tis the season and all that. I’m guessing the reason for the lack of comment is that the proposed methodology was too complex for people to take the time to assess and critique. In fact, the length of the post itself may be an obstacle, but that in turn reflects the complexity of the approach it describes. Fair enough.
So the question is, how do you simplify the methodology and still provide something useful?
- One approach would be to reduce the scope of the assessment. Instead of defining scenarios for all types of demand generation processes, pick a single process and just analyze that. Note that I said “single” not “simple”, because you want to capture the ability of the systems to do complicated things as well. This is a very tempting path and I might try it because it’s easy to experiment with. But it still raises all the issues of how you determine which tasks are performed by which types of users and how you account for the cost of hand-offs between those users. This strikes me as a very important dimension to consider, but I also recognize that it introduces quite a bit of complexity and subjectivity into the process. I also recognize that measuring even a single process will require measuring system set-up, content creation and other preliminary tasks. Thus, you still need to do a great deal of work to get metrics on one task, and that task isn’t necessarily representative of the relative strengths of the different vendors. This seems like a lot of effort for a meager result.
- Another approach would be to ask users rather than trying to run the tests independently. That is, you would do a survey that lists the various scenarios and asks users to estimate the time they require and how this is distributed among different user types. That sounds appealing insofar as now someone else does the work, but I can’t imagine how you would get enough data to be meaningful, or how you would ensure different users’ responses were consistent.
- A variation of this approach would be to ask vastly simpler questions – say, estimate the time and skill level needed for a half-dozen or so typical processes including system setup, simple outbound email campaign, setting up a nurturing campaign, etc. You might get more answers and on the whole they’d probably be more reliable, but you’re still at the mercy of the respondents’ honesty. Since vendors would have a major incentive to game the system, this is a big concern. Nor is it clear what incentives users would have to participate, how you screen out people with axes to grind, or whether users would be constrained by non-disclosure agreements from participating. Still, this may be the most practical of all the approaches I’ve come up with, so perhaps it’s worth pursuing. Maybe we get a sample of ten clients from each vendor? Sure they’d be hand-picked, but we could still hope their answers would accurately reflect any substantial differences in workload by vendor.
- Or, we could ask the vendors to run their own tests and report the results. But who would believe them? Forget it.
- Maybe we just give up on any kind of public reporting, and provide buyers with the tools to conduct their own evaluations. I think this is a sound idea and actually mentioned it in last week’s post. Certainly the users themselves would benefit, although it’s not clear how many buyers really engage in a detailed comparative analysis before making their choice. (For what it’s worth, our existing usability worksheet is the third most popular download from the Raab Guide site. I guess that’s good news.) But if the results aren’t shared, there is no benefit to the larger community. We could offer the evaluation kit for free in return for sharing the results, but I doubt this is enough of an incentive. And you’d still have the issue of ensuring that reported results are legitimate.
So there you have it, folks. I’m between a rock and a hard place. No matter how much I talk about usability, the existing Raab Guide mostly lists features, and people will use it to compare systems on that basis. But I can’t find a way to add usability to the mix that’s both objective and practical. Not sure where to go next.
Saturday, December 13, 2008
As Tuesday’s post suggested, my thoughts on usability measurement have now crystallized. To provide a meaningful and consistent comparison of usability across demand generation vendors, you could:
1. Define a set of business scenarios that must be supported by the system. Each scenario would describe a type of marketing campaign and the system tasks required to run it. These tasks would cover system set-up, materials creation, campaign design, execution and evaluation. Some tasks would be common to several scenarios, while others would be unique. For example, pretty much any kind of campaign would involve creating an email, but only some campaigns require adding custom fields to the system database.
The result would be a grid with tasks listed down the side, scenarios across the top, and checkmarks showing which tasks are used in which scenarios. A small sample is below. Note that you can build a single task list for any combination of scenarios by simply combining the checkmarks in their columns. Thus, in the sample table, scenarios 1 and 2 require tasks 1, 2 and 3. In many cases, there will be two entries for each task, one for setting it up and another for repeating it.
|scenario 1||scenario 2||scenario 3||...|
2. Develop a specific package for each scenario, with both a task list and standard materials such as a list of users to set up, data elements to capture, email contents to produce, campaign logic, etc. You also need a standard Salesforce.com installation and perhaps company Web site to integrate during testing. Assembling these packages would be quite a bit of work, but only has to be done once. The project could start with relatively simple packages and expand them over time.
3. Have an expert user (typically a vendor employee) run through the required tasks while the tester tracks their time and results. As I noted in the earlier post, this means a single expert user is simulating different users at a real client. (These are marketing managers, operations specialists, system administrators, database managers, etc.) This makes sense if we assume that the client users will all be experts in their own areas. But it also means that the tester must assess which type of user would perform each task. The test packages would include score sheets to make capturing this information as easy as possible.
4. Check the test results by executing the scenario campaigns and identifying any errors. You need this step to ensure the work was actually completed correctly—otherwise, the experts could simply zoom through their tasks without worrying about accuracy. Part of the process would be for the tester to “respond” to the promotions to ensure that the system reacts correctly. This is another labor-intensive process. Results will be summarized in an error report that is part of the final evaluation.
5. Have users select their scenarios they wish to evaluate. Then generate reports for the tasks in those scenarios, showing the (a) tasks completed (b) time required (c) workload on different users and (d) error rates. Comparing the results for different systems will give a good sense of strengths and weaknesses.
* * *
Of course, the process won’t end with the detailed reports. People will want to combine the results in a single score that can be used to rank the vendors. **sigh**. Building such a score requires adjusting for factors including:
- differences in system features (some systems lack some features and, thus, can’t execute all the specified tasks)
- differences in the importance of different tasks
- differences in the value of difference users’ time
- the impact of handing off tasks among users (this adds time and errors that won’t be captured in a single-user test)
- differences in error rates and in the importance of different errors
- differences in the mix of tasks and their importance at different companies
I’m sure there are other factors as well. A simple approach might just be to assign scores for each separate dimension, say on a 1-10 scale, and then add them for a combined score. You could evolve a more elaborate approach over time, but the resulting figures will never have any specific meaning. Still, they should provide a reasonably valid ranking of the competitors.
The report could also supplement or replace the single figure with a graph that plots the results on two or more dimensions. For example, a classic scatter plot could position the each system based on breadth (number of tasks completed) vs. productivity (error-free tasks completed per hour). This would more clearly illustrate the trade offs between the products.
The good news in all this is that the closer you get to a specific company’s requirements, the more you can replace any generic assumptions with that company’s own data. This means that any aggregate score becomes much more meaningful for that company.
Let me clarify and expand that last point, because it’s very important. The tests just need to be done once because the results (tasks completed, work time, user workload, error rate) don’t change based on the individual client. So you could store those results in a database and then apply client-specific parameters such as scenarios chosen, task mix, and user costs to get a client-specific ranking without actually conducting any client-specific tests.
Of course, no one would purchase a system based solely on someone else’s tests. But having the data available could greatly speed the evaluation process and improve buyers’ understanding of the real differences between systems. In addition, the test scenarios themselves should be help buyers to decide what they want to see demonstrated.
(Before I lose track of it, let me point out that this approach doesn’t address the ease-of-learning component of usability. That requires a larger base of testers, or at least an assessment of what looks difficult to learn. It’s possible that assessing ease-of-learning really involves the same judgment as assessing which type of user will perform each task. Both, after all, are based on how hard the task looks. In any case, this issue needs more thought.)
What Do You Think?
Well, this all sounds just great to me, but I’m not an objective observer. I’m really curious to learn what you think (on the assumption that “you”, the readers of this blog, include many demand generation vendors and users).
Users: Would you use something like this during the selection process? Would you be able to prioritize your requirements and estimate the numbers of different tasks (campaigns, emails, landing pages, etc.) per year? What would you pay for a customized report based on your inputs? Would you prefer to do this sort of testing for yourself? If so, would you pay for the task lists and scenario packages to help you run your own tests? Would you want consulting help with conducting those tests? In general, do you actually conduct a detailed vendor comparison before making a choice?
Vendors: Does this approach seem fair? Is it very different from the standard scenarios you’ve already worked up for training and sales demonstrations? Would the standard task lists and scenarios make it easier to gather prospects' business requirements? Would the test results accelerate your sales cycles and deployment times? Would testing provide a useful benchmark for your development efforts? Would you participate in the testing, knowing the results were going to be published in a database? Would you pay to have the tests done? Would you help to fund initial development?
Everybody: Does this make sense? What flaws or risks do you see? What’s the best way to build the scenarios, task lists and packages? In particular, could be they be built cooperatively (“crowd sourced”) with a Wiki or something similar?
Please comment and ask others to comment as well.
Tuesday, December 09, 2008
I think we all know that the simplest practical measure of intelligence is how often someone agrees with you. On that scale, University of Ottawa Professor Timothy Lethbridge must be some kind of genius, because his course notes on Software Usability express my opinions on the topic even better and in more detail than I’ve yet to do for myself. Specifically, he lists the following basic process for measuring usability:
- understand your users, and recognize that they fall into different classes
- understand the tasks that users will perform with the system
- pick a representative set of tasks
- pick a representative set of users
- define the questions you want to answer about usability
- pick the metrics that answer those questions
- have the users perform the tasks and measure their performance
This is very much the approach that I’ve been writing about, in pretty much the same words. Happily, Lethbridge provides additional refinement of the concepts. Just paging through his notes, some of his suggestions include:
- classifying users in several dimensions, including the job type, experience with the tasks, general computer experience, personality type, and general abilities (e.g. language skills, physical disabilities, etc.). I’d be more specific and add skills such as analytical or technical knowledge.
- defining tasks based on use cases (I tend to call these business processes, but it’s pretty much the same); understanding how often each task is performed, how much time it takes, and how important it is; and testing different tasks for different types of users. “THIS STEP CAN BE A LOT OF WORK” the notes warn us, and, indeed, building the proper task list is probably the hardest step in the whole process.
- a list of metrics:
- proficiency, defined as the time to complete the chosen tasks. That strikes me as an odd label, since I usually think of proficiency as an attribute of a user not a system. The obvious alternative is efficiency, but as we’ll see in a moment, he uses that for something else. Maybe “productivity” would be better; I think this comes close to the standard definition of labor productivity as output per hour.
- learnability, defined as time to reach a specified level of proficiency.
- efficiency, defined as proficiency of an expert. There’s no corresponding term for “proficiency of a novice”, which I think there should be. So maybe what you really need is “expert efficiency” and “novice efficiency”, or “expert and novice “productivity”, and discard “proficiency” altogether.
- memorability, defined as proficiency after a period of non-use. If you discard proficiency, this could be “efficiency (or productivity) after a period of non-use”, which makes just as much sense.
- error handling, defined as number or time spent on deviations from the ideal way to perform a task. I’m not so sure about this one. After all, time spent on deviations is part of total time spent, which is already captured in proficiency or efficiency or whatever you call it. I’d rather see a measure of error rate, which would be defined as number or percentage of tasks performed correctly (by users with a certain level of training). Now that I think about it, none of Lethbridge’s measures incorporate any notion of output quality—a rather curious and important omission.
- satisfaction, defined subjectively by users on a scale of 1 to 5.
- plot a “learning curve” on the two dimensions of proficiency and training / practice time; the shape of the curve provides useful insights into novice productivity (what can new users do without any training); learnability (a steep early curve means people learn the system quickly) and eventual efficiency (the level of proficiency where the curve flattens out).
- even expert users may not make best use of the system if stop learning before they master all its features. So they system should lead them to explore new features by offering tips or making contextual suggestions.
At this point, we’re about half way through the notes. The second half provides specific suggestions on:
- measuring learnability (e.g. by looking at features that make systems easy to learn);
- causes of efficiency problems (e.g. slow response time, lack of an easy step-by-step route to perform a task);
- choosing experts and what to do when experts are unavailable (basically, plot of learning curve of new users);
- measuring memorability (which may involve different retention periods for different types of tasks; and should also distinguish between frequently and infrequently used tasks, with special attention to handling emergencies)
- classifying errors (based on whether they were caused by user accidents or confusion [Lethbridge says that accidents are not the system’s fault while confusion is; this is not a distinction I find convincing]; also based on whether the user discovers them immediately or after some delay, the system points them out, or they are never made known to the user)
- measuring satisfaction (surveys should be based on real and varied work rather than just a few small tasks, should be limited to 10-15 questions, should use a “Likert Scale” of strongly agree to strongly disagree, and should vary the sequence and wording of questions)
- measuring different classes of users (consider their experience with computers, the application domain and the system being tested; best way to measure proficiency differences is to compare the bottom 25% of users with the 3rd best 25%, since this will eliminate outliers)
This is all good stuff. Of course, my own interest is applying it to measuring usability for demand generation systems. My main take-aways for that are:
1. defining user types and tasks to measure are really important. But I knew that already.
2. choosing the actual metrics takes more thought than I’ve previously given it. Time to complete the chosen tasks (I think I’ll settle on calling it productivity) is clearly the most important. But learnability (which I think comes down to time to reach a specified level of expertise) and error rate matter too.
For marketing automation systems in particular, I think it’s reasonable to assume that all users will be trained in the tasks they perform. (This isn’t the case for other systems, e.g. ATM machines and most consumer Web sites, which are used by wholly untrained users.) The key to this assumption is that different tasks will be the responsibility of different users; otherwise, I’d be assuming that all users are trained in everything. So it does require determining which users will do which tasks in different systems.
On the other hand, assuming that all tasks are performed by experts in those tasks does mean that someone who is expert in all tasks (e.g., a vendor sales engineer) can actually provide a good measure of system productivity. I know this is a very convenient conclusion for me to reach, but I swear I didn’t start out aiming for it. Still, I do think it’s sound and it may provide a huge shortcut in developing usability comparisons for the Raab Guide. What is does do is require a separate focus on learnability so we don’t lose sight of that one. I’m not sure what to do about error rate, but do know it has to be measured for experts, not novices. Perhaps when we set up the test tasks, we can involve specific contents that can later be checked for errors. Interesting project, this is.
3. the role of surveys is limited. This is another convenient conclusion, since statistically meaningful surveys would require finding a large number of demand generation system users and gathering detailed information about their levels of expertise. It would still be interesting to do some preliminary surveys of marketers to help understand the tasks they find important and, to the degree possible, to understand the system features they like or dislike. But the classic usability surveys that ask users how they feel about their systems are probably not necessary or even very helpful in this situation.
This matters because much of the literature I’ve seen treats surveys as the primary tool in the usability measurement. This is why I am relieved to find an alternative.
As an aside: many usability surveys such as SUMI (Software Usability Measurement Inventory) are proprietary. My research did turn up what looks like a good public version
Measuring Usability with the USE Questionnaire by Arnold M. Lund from the
Society for Technical Communication (STC) Usability SIG Newsletter of October 2001. The acronym USE stands for the three main categories: Usefulness, Satisfaction and Ease of Use/Ease of Learning. The article provides a good explanation of the logic behind the survey, and is well worth reading if you’re interested in the topic. The questions, which would be asked on a 7-point Likert Scale, are:
- It helps me be more effective.
- It helps me be more productive.
- It is useful.
- It gives me more control over the activities in my life.
- It makes the things I want to accomplish easier to get done.
- It saves me time when I use it.
- It meets my needs.
- It does everything I would expect it to do.
Ease of Use
- It is easy to use.
- It is simple to use.
- It is user friendly.
- It requires the fewest steps possible to accomplish what I want to do with it.
- It is flexible.
- Using it is effortless.
- I can use it without written instructions.
- I don't notice any inconsistencies as I use it.
- Both occasional and regular users would like it.
- I can recover from mistakes quickly and easily.
- I can use it successfully every time.
Ease of Learning
- I learned to use it quickly.
- I easily remember how to use it.
- It is easy to learn to use it.
- I quickly became skillful with it.
- I am satisfied with it.
- I would recommend it to a friend.
- It is fun to use.
- It works the way I want it to work.
- It is wonderful.
- I feel I need to have it.
- It is pleasant to use.
Apart from the difficulties of recruiting and analyzing a large enough number of respondents, this type of survey only gives a general view of the product in question. In the case of demand generation, this wouldn’t allow us to understand the specific strengths and weaknesses of different products, which is a key objective of any comparative research. Any results from this sort of survey would be interesting in their own right, but couldn’t themselves provide a substitute for the more detailed task-based research.
Wednesday, December 03, 2008
My little tour of demand generation vendors landed at Pardot just before Thanksgiving. As you’ll recall from my post on Web activity statistics, Pardot is one of the higher-ranked vendors not already in the Raab Guide to Demand Generation Systems. So I was quite curious to see what they had to offer.
What I found was intriguing. While last week’s post found that Marketbright aims at more sophisticated clients, Pardot explicitly targets small and midsize businesses (or SMBs as we fondly acronymize them [yes, that’s a word, at least according to http://www.urbandictionary.com/]). Actually I don’t know why I find the contrast between Pardot and Marketbright intriguing, except for the implication that marketers can be divided into two simple categories, SMB and Enterprise, and no further distinctions are necessary. The analyst in me rejects this as an obvious and appalling over-simplification, but there’s a sneaky, almost guilty pleasure in contemplating whether it might be correct.
What’s odd about the SMB vs. Enterprise dichotomy is that both sets of systems are quite similar. Pardot and other SMB systems don’t just offer a few simple features. In fact, Pardot in particular provides advanced capabilities including progressive profiling (automatically changing the questions on forms as customer answer them) and dynamic content (rule-driven selection of content blocks within emails and Web pages). The only common feature that’s missing in Pardot is rule-based branching within multi-step programs. Even this is far from a fatal flaw, since (a) users can simulate it with rules that move customers from one program to another and (b) intra-program branching will be added by the end of this month.
What really distinguishes the Enterprise vendors is the ability to limit different users to different tasks. This involves rights management and content management features that seem arcane but are nevertheless critical when marketing responsibilities are divided by function, channel, region and product organizations. Although enterprise marketing programs are more complex than SMB programs, most SMB systems can actually handle complex programs quite well. Conversely, although SMB vendors stress their products’ ease of use, simple things are not necessarily harder to do in the Enterprise products. I’m still trying to work out a systematic approach to measuring usability, but my current feeling is that there are large variations among products with both the SMB and Enterprise groups.
Back to Pardot. It certainly considers ease of use to be one of its advantages, and I saw nothing to contradict this. Functionally, it has all the capabilities you’d expect of a demand generation product: users create personalized emails and Web pages with a drag-and-drop interface; track responders with cookies; look up visitors' companies based on their IP address; run multi-step drip marketing campaigns; score leads based on activities and attributes; and integrate tightly with Salesforce.com and other CRM systems. These are nicely implemented with refinements including:
- integrated site search, including the ability to use visitor queries as part of their behavior profiles (something I haven’t seen in other demand generation products)
- different scoring rules for customers in different segments (other systems could achieve this but not so directly)
- auto-response messages tied to completion of an email form (again, this often requires more work in other systems)
- ability to post data from externally-hosted forms via API calls (not just batch file imports) and to forward posted data to external systems
- email address validation that goes beyond the format checking available in most products to include rejecting addresses from free domains such as gmail or yahoo, and validating that the address is active on the specified host
- ability to capture campaign costs by importing data from Google AdWords and other sources
- plug-ins that let the system track emails sent by Outlook, Thunderbird and Apple email clients
This is an impressive list that suggests a thoughtfully designed system. But I didn’t check Pardot against my full list of possible features, so don’t get the impression that it does everything. For example, its approach to revenue reporting is no better than average: the system imports revenue from the sales automation opportunity records, and then assigns it to the first campaign of the associated lead. This is a common approach, but quite simplistic. More sophisticated methods give more control over which campaigns are credited and can divide revenue among multiple campaigns. Nor does Pardot have the refined user rights management and content management features associated with enterprise systems. It also limits database customization to adding user-defined fields to the prospect table. (This is another area where Enterprise vendors tend to be more flexible than SMB systems, albeit with considerable variation within each group.)
The point here is that Pardot, like all systems, has its own strengths and weaknesses. This is why the simple SMB vs. Enterprise dichotomy isn’t enough. People who need specific features won’t necessarily find them in all products of one group or the other. You really do have to look closely at the individual products before making a choice. QED.
One other factor clearly distinguishes SMB from Enterprise systems, and that’s pricing. Pardot’s lowest-price system, $500 per month, may be too constrained for most companies (no CRM integration, maximum of five landing pages, etc.),. But its $750 per month offering should be practical for many SMBs and a $1,250 per month option allows still higher volumes. (Pricing details are published on their Web site – which is itself typical of SMB products.) This pricing is low even among SMB demand generation systems. By comparison, limited versions cost $1,500 per month for Marketo and $1,000 for Manticore Technology, and both charge $2,400 per month for their cheapest complete offering. (Note: other SMB-oriented vendors including ActiveConversion and OfficeAutoPilot also have entry pricing in the $500 per month range, although neither publishes the details.)
The Pardot product began as an internal project for the marketing group at Hannon Hill, a content management system developer. Pardot was spun off about two years ago and launched its product at the end of 2007. It recently signed its 100th client.