Monday, November 27, 2006

Still More on Multi-Variate Testing (Really Pushing It for a Monday)

My last entry described in detail the issues relating to automated deployment of multi-variate Web test results. But automated deployment is just one consideration in evaluating such systems. Here is an overview of some others.

- segmentation: as pointed out in a comment by Kefta’s Mark Ogne, “testing can only provide long term, fruitful answers within a relatively homogeneous group of people.” Of course, finding the best ways to segment a Web site’s visitors is a challenge in itself. But assuming this has been accomplished, the testing system should be able to identify the most productive combination of components for each segment. Ideally the segments would be defined using all types of information, including visitor source (e.g. search words), on-site behavior (previous clicks), and profiles (based on earlier visits or other transactions with the company). For maximum effectiveness, the system should be able to set up different test plans for each segment.
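To make the idea of per-segment test plans concrete, here is a minimal sketch. The segment names, visitor attributes, and plan contents are all hypothetical, invented for illustration rather than taken from any actual testing product:

```python
# Hypothetical sketch: routing visitors to segment-specific test plans.
# Segment names, visitor attributes, and plan contents are illustrative only.

def assign_segment(visitor):
    """Classify a visitor using source, behavior, and profile data."""
    if visitor.get("search_terms"):          # visitor source (search words)
        return "search"
    if visitor.get("prior_purchases", 0) > 0:  # profile (earlier transactions)
        return "returning_buyer"
    return "general"

# Each segment gets its own test plan: different components to vary.
TEST_PLANS = {
    "search":          {"headline": ["A", "B"], "offer": ["10% off", "free ship"]},
    "returning_buyer": {"headline": ["C"],      "offer": ["loyalty bonus"]},
    "general":         {"headline": ["A", "C"], "offer": ["free ship"]},
}

def plan_for(visitor):
    """Return the test plan to run for this visitor's segment."""
    return TEST_PLANS[assign_segment(visitor)]
```

The point is that segment assignment and test design are separable: once the segmentation rules exist, each segment can run an entirely different experiment.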

- force or exclude specified combinations: there may be particular combinations you must test, such as a previous winner or the boss’s favorite. You may wish to exclude other combinations, perhaps because you’ve tested them before. The system should make this easy to do.

- allow linkages among test components: certain components may only make sense in combination; for example, service plans may only be offered for some products, or some headlines may be related to specific photos. The testing system must allow the user to define such connections and ensure only the appropriate combinations are displayed. This should accommodate more than simple one-to-one relationships: for example, three different photos might be compatible with the same headline, while three different headlines might be compatible with just one of those photos. Such linkages, and tests in general, should extend across more than a single Web page so each visitor sees consistent treatments throughout the site.
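The many-to-many headline/photo linkage above can be expressed as a compatibility table that filters the full set of combinations. This is only a sketch of the idea, with made-up component names:

```python
# Hypothetical sketch: enumerate only the component combinations that a
# many-to-many compatibility rule allows. Names are illustrative.
from itertools import product

HEADLINES = ["H1", "H2", "H3"]
PHOTOS = ["P1", "P2", "P3"]

# H1 works with all three photos; H2 and H3 work only with P1 --
# the kind of asymmetric linkage described above.
COMPATIBLE = {
    "H1": {"P1", "P2", "P3"},
    "H2": {"P1"},
    "H3": {"P1"},
}

def allowed_combinations():
    """All (headline, photo) pairs the linkage rules permit."""
    return [(h, p) for h, p in product(HEADLINES, PHOTOS)
            if p in COMPATIBLE[h]]
```

Of the nine raw combinations, only five survive the filter; a testing system's design (and its statistics) must cope with that reduced, non-rectangular space.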

- allow linkages across visits: treatments for the same visitor should also be consistent across site visits. Although this is basically an extension of the need for page-to-page consistency, the technical solutions are different. Session-to-session consistency implies a persistent cookie or user profile or both, and is harder to achieve because of visitor behavior such as deleting cookies, logging in from different machines, or using different online identities.
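One common way to get page-to-page and visit-to-visit consistency is to derive the treatment deterministically from a persistent visitor ID, so no per-visitor state needs to be stored server-side. A sketch, assuming a cookie-based ID:

```python
# Hypothetical sketch: deterministic variant assignment from a persistent
# visitor ID. If the visitor deletes the cookie or switches machines, the
# ID changes and they may land in a different cell -- exactly the fragility
# described above.
import hashlib

def assign_variant(visitor_id: str, component: str, variants: list) -> str:
    """Map a visitor ID to one variant of a component, the same way every time."""
    digest = hashlib.sha256(f"{component}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because the mapping is a pure function of the ID, every page and every session renders the same treatment for the same visitor, without any lookup table.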

- measure results across multiple pages and multiple visits: even when the components being tested reside on a single page, it’s often important to look at behaviors elsewhere on the site. For example, different versions of the landing page may attract customers with different buying patterns. The system must be able to capture such results and use them to evaluate test performance. It should also be able to integrate behaviors from outside of the Web site, such as phone orders or store visits. As with linkages among test components, different technologies may be involved when measuring results within a single page, across multiple pages, across visits and across channels. This means a system’s capabilities for each type of measurement must be evaluated separately.

- allow multiple success measures: different tests may target different behaviors, such as capturing a name, generating an inquiry or placing an order. The test system must be able to handle this. In addition, users may want to measure multiple behaviors as part of a single test: say, average order size, number of orders, and profit margin. The system should be able to capture and report on these as well. As discussed in last Wednesday’s post, it can be difficult to combine several measures into one value for the test to maximize. But the system should at least be able to show the expected results of the tested combinations in terms of each measure.

- account for interactions among variables: this is a technical issue and one where vendors who use different test designs make claims that only an expert can assess. The fundamental concern is that specific combinations of components may yield results that are different from what would be predicted by viewing them independently. To take a trivial example, a headline and body text that gave conflicting information would probably depress results. Be sure to explore how any vendor you consider handles this issue and make sure you are comfortable with their approach.
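The interaction concern can be illustrated with a simple 2x2 example. The conversion rates below are invented numbers; the interaction contrast measures how far the observed cells depart from what the two main effects alone would predict:

```python
# Hypothetical conversion rates for a 2x2 headline/body test.
# A test design that assumed no interactions would mis-predict the
# (h2, b2) cell, where the conflicting copy depresses results.
rates = {
    ("h1", "b1"): 0.020,
    ("h1", "b2"): 0.025,
    ("h2", "b1"): 0.030,
    ("h2", "b2"): 0.015,  # conflicting headline/body combination
}

def interaction(rates):
    """Standard 2x2 interaction contrast:
    (h2,b2) - (h2,b1) - (h1,b2) + (h1,b1).
    Zero means the factors act independently; nonzero means they interact."""
    return (rates[("h2", "b2")] - rates[("h2", "b1")]
            - rates[("h1", "b2")] + rates[("h1", "b1")])
```

Here the contrast is -0.020: the combination performs two percentage points worse than the main effects would suggest. Fractional-factorial designs that omit some combinations must either assume such terms are small or estimate them indirectly, which is exactly the claim to probe with each vendor.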

- reporting: the basic output of a multi-variate test is a report showing how different elements performed, by themselves and in combination with others. Beyond that, you want help in understanding what this means: ranking of elements by importance; ranking of alternatives within each element; confidence statistics indicating how reliable the results are; any apparent interaction effects; estimated results for the best combination if it was not actually tested. A multi-variate test generates a great deal of data, so efficient, understandable presentation is critical. In addition to their actual reporting features, some vendors provide human analysts to review and explain results.

- integration difficulty and performance: the multi-variate testing systems all take over some aspect of Web page presentation by controlling certain portions of your pages. The work involved to set this up and the speed and reliability with which test pages are rendered are critical factors in successful deployment. Specific issues include the amount of new code that must be embedded in each page, how much this code changes from test to test, how much volume the system can handle (in number of pages rendered and complexity of the content), how result measurement is incorporated, how any cookies and visitor profiles are managed, and mechanisms to handle failures such as unavailable servers or missing content.

- impact on Web search engines: this is another technical issue, but a fairly straightforward one. Content managed by the testing system is generally not part of the static Web pages read by the “spiders” that search engines use to index the Web. The standard solution seems to be to put the important search terms in a portion of the static page that visitors will not see but the spiders will still read. Again, you need to understand the details of each vendor’s approach, and in particular how much work is involved in keeping the invisible search tags consistent with the actual, visible site.

- hosted vs. installed deployment: all of the multi-variate testing products are offered as hosted solutions. Memetrics and SiteSpect also offer installed options; the others don’t seem to but I can’t say for sure. Yet even hosted solutions can vary in details such as where test content is stored and whether software for the user interface is installed locally. If this is a major concern in your company, check with the vendors for the options available.

- test setup: last but certainly not least, what’s involved in actually setting up a test on the system? How much does the user need to know about Web technology, the details of the site, test design principles, and the mechanics of the test system itself? How hard is it to set up a new test and how hard to make changes? Does the system help to prevent users from setting up tests that conflict with each other? What kind of security functions are available—in a large organization, there may be separate managers for different site sections, for content management, and for approvals after the test is defined. How are privacy concerns addressed? What training does the company provide and what human assistance is available for technical, test design and marketing issues? The questions could go on, but the basic point is you need to walk through the process from start to finish with each vendor and imagine what it would be like to do this on a regular basis. If the system is too hard to use, then it really doesn’t matter what else it’s good at.


Jeff said...


thanks for your ongoing attention to this growing niche of web optimization.

re: the integration of optimization technology ("...the amount of new code that must be embedded in each page..."), you will note that SiteSpect is the only solution not requiring JavaScript or server-side coding. Because of this, time to implement (and iterate) tests is greatly reduced.

SiteSpect, Inc.

David Raab said...

Thanks Jeff. Actually, I DID note that when I reviewed your site before writing my post, but didn't want to say anything specific about your product without having explored it in more detail. The logic behind my language was that "none" is also a valid "amount of new code".

As to how SiteSpect does it--I thought you were somehow intercepting the Web traffic and substituting the test components that way. But now that I double-check your site, I see no mention of this. Perhaps I have you confused with someone else, or perhaps I'm just not seeing something I saw previously. In any event, would you care to explain how SiteSpect works in this regard?

Jeff said...


"none is a valid amount" ... just like a quant to say something like that (you're in good company :)

Let me try to elaborate a bit more:

SiteSpect operates on the network datastream and inserts/modifies/removes content on-the-fly as visitors browse through a site. In this fashion, SiteSpect is non-intrusive; there's no JavaScript to add, maintain or remove. Think of SiteSpect like a packet sniffer (which tracks user behavior), but with the added power of being able to actually change the web server's content on a user-to-user basis through multivariate testing.
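As a rough analogy only (this is not SiteSpect's actual code or architecture, just an illustration of the general idea of in-stream rewriting): the response flowing from the server can be pattern-matched and rewritten before it reaches the visitor's browser, so the page's own source never changes.

```python
# Illustrative analogy only -- not SiteSpect's implementation. The general
# idea of in-stream rewriting: match a pattern in the outgoing HTML and
# substitute the visitor's assigned test variant before delivery.
import re

def rewrite_response(html: str, variant_headline: str) -> str:
    """Swap the control headline for a test variant in the response stream;
    the site's underlying pages require no added code."""
    return re.sub(r"<h1>.*?</h1>", f"<h1>{variant_headline}</h1>",
                  html, count=1)
```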

The benefits of non-intrusiveness are many, but SiteSpect users cite the following as significant advantages:

- tests are created and launched in hours (not days or weeks). This is because there's no need to modify page code, stage code changes, or run through onerous Q/A cycles. This streamlined testing cycle means that more marketing questions can be asked (and answered) in less time, and findings can be put into action sooner.

- the ability to test and optimize dynamically-generated content. A prime example is a site's search engine, like (fictitious), where you search for "sterling silver bracelet". The site's search results depend on your query at that very moment (and could change in real time due to inventory, merchandising, personalization, etc.). Now, site-side search is ripe for optimization, right? The visitor is expressing intent, after all. So what if you want to test whether search results should be listed one-per-line vs. a 4-across grid? Or, swap the product image from left to right side, highlight discount/promotion in blue, enlarge/reduce image size, etc. SiteSpect is uniquely able to test these sorts of changes because it operates on the live datastream using a very flexible, standardized pattern matching system.

Hope this helps clarify.