Thursday, July 23, 2009

A/B Testing: What Result Are You Testing Against?

Most B2B marketers today understand the importance of testing. A quick change in a subject line, some copy, or an offer can dramatically alter the performance of a campaign. At Eloqua, our own marketing team does A/B testing on almost every piece that is sent out. However, the question we all face as marketers is what result we should be testing against. In B2C marketing, this is often significantly easier: campaigns are usually designed to drive explicit purchase behavior, so success can be measured against revenue results.

In B2B marketing, however, the revenue result is often significantly further away. Ideally, in testing a campaign, one would be able to determine which version drove more actual buying behavior, but with most buying processes stretching out over months, it would be impossible to efficiently test and launch a campaign in this manner. Conversely, testing against the common results of opens or click-throughs tells only a small fraction of the story, as there is only a very loose tie between an email open or click-through and the final result of a purchase.

Luckily, our options for what results we test against in fact span a spectrum. A careful selection of the end result to test against guides exactly what our tests will show. Looked at along two dimensions, ease of testing and accuracy of results, we have the following options to test against:

Email Opens: The easiest to test against, but by far the least accurate. This is mainly an indicator of the quality of your subject line. Although it can be tested very quickly, it suffers from technological differences in email platforms, as well as a very limited testing scope.

Email Click-throughs: Slightly more difficult to test against, as it requires tracking of links clicked, but this is common in most email platforms today. However, this also suffers from very limited accuracy, as it does not indicate much about the recipient’s interest in purchasing.

Inquiries: Tracking which email drove more inquiries (landing page form submissions) is a significant bump in test accuracy. This now tests whether the subject line was compelling enough to lead to the email being read, whether the content and offer were interesting enough to drive a click-through, and whether the landing page was optimized to maximize form submission rates. This is a very comprehensive test of your campaign, and usually sees results within one or two days of running an A/B test campaign.

Qualified Inquiries: Even higher in terms of testing accuracy is testing against qualified inquiries. If, in the A/B test, email A drove 100 inquiries and email B drove 80, but most of B’s inquiries are from the right target executives while most of A’s inquiries are from students and more junior staff members, clearly email B is the better option. Note that the dimension of lead scoring we are talking about here is explicit scoring, as we are just looking to see whether the right executives are the ones inquiring.

Opportunities: We do see an increase in accuracy as we move to Opportunities as a result to test against, but this also increases our difficulty significantly. There are often many more factors involved in qualifying an opportunity as being ready for sales than just one campaign, so this leads to a significant increase in complexity of the situation to analyze.

Revenue: This is clearly the highest accuracy to test against, but the length of a sales cycle means that it is prohibitively difficult to work with, and the ideal timing to run the campaign in question may have long passed by the time that the test results are available. The way we need to think about B2B marketing analysis means that, in general, it is nearly impossible to test the effectiveness of a single campaign on revenue in a meaningful way.

Each of these options has benefits and drawbacks, so it is important to consider what you are testing against when defining your A/B test. In my experience, in a B2B marketing situation, testing to determine which option produced more qualified inquiries often provides the optimal balance between ease of testing and accuracy of results.
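
As a rough illustration of testing against qualified inquiries rather than raw inquiries, here is a minimal Python sketch. The inquiry counts of 100 and 80 come from the example above; the send volumes and the qualified counts are hypothetical, as are the names VariantResult, two_proportion_p_value, and winner. The significance check is a standard pooled two-proportion z-test, which is my own addition and not something the post prescribes.

```python
from dataclasses import dataclass
from math import erf, sqrt


@dataclass
class VariantResult:
    name: str
    sent: int                 # emails delivered (hypothetical figure)
    inquiries: int            # landing page form submissions
    qualified_inquiries: int  # inquiries from the target audience (explicit score)

    def rate(self, metric: str) -> float:
        """Share of sent emails that produced the given result."""
        return getattr(self, metric) / self.sent


def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (x_a / n_a - x_b / n_b) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - normal_cdf)


def winner(a: VariantResult, b: VariantResult, metric: str) -> str:
    """Name of the variant with the higher rate on the chosen metric."""
    return a.name if a.rate(metric) >= b.rate(metric) else b.name


# Hypothetical test: same send volume, A gets more inquiries, B gets better ones.
a = VariantResult("A", sent=5000, inquiries=100, qualified_inquiries=30)
b = VariantResult("B", sent=5000, inquiries=80, qualified_inquiries=60)

for metric in ("inquiries", "qualified_inquiries"):
    p = two_proportion_p_value(getattr(a, metric), a.sent,
                               getattr(b, metric), b.sent)
    print(f"{metric}: winner={winner(a, b, metric)}, p-value={p:.4f}")
```

With these made-up counts, email A wins on raw inquiries while email B wins on qualified inquiries, so the result you choose to test against decides which email you send to the rest of the list.
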
Many of the topics on this blog are discussed in more detail in my book, Digital Body Language.
In my day job, I am with Eloqua, the marketing automation software used by the world's best marketers.


Patrick Woods said...

Thanks for the helpful post, Steve. We've been having the same kind of discussion in our office, and your chart provides a nice framework for our thoughts.

Steven Woods said...

Glad you found it useful, thanks for the kind words.

Yaj said...

Good analysis. It's probably one of the first ones that I've read addressing the buying cycle of B2B businesses.

This would mean that assessing the effectiveness of a campaign will take almost 4-6 months. But by then, buyer behavior, the market situation, or technology may have changed, or new competitors may have emerged.

How would you deduce accurate results?

Steven Woods said...

Agreed. If you're going to measure all the way to revenue results, the test has to run through a full sales cycle, so yes, it will take that long. That is often not worth it, due to the challenges you point out. Best to measure to the level of qualified inquiries in most cases.

Tim Wilson said...

I like the visual! One other factor, though, is that, even without the time lag issue between an e-mail and revenue, there is the fact that buyers ultimately make purchase decisions based on multiple touches. Campaign attribution gets messy in a hurry. It would be very dangerous to go down a path of assessing your A/B results as though an e-mail existed in a vacuum and was the lead's only interaction with your marketing efforts. Right?

Steven Woods said...

You're definitely correct on that. Perhaps it would be more accurate to say that the accuracy of the measurement begins to decrease as one looks further out, due to all the confounding factors.

Tough challenges in attribution, for sure.

Shawn D said...


We almost always test subject lines and text vs. HTML. In the results we look for clicks and inquiries. What we often see is that an email version with fewer clicks actually generates a higher number of inquiries (form submits).

We also look at lead quality, using the engagement score primarily as a metric. Were there more A1s that responded to version B etc.?

I agree with Dave, you have to look carefully at 3 fundamental points before any email broadcast let alone start any form of testing - a/b or multivariate.

1. The List
2. The Offer
3. The Creative

Steven Woods said...

Definitely a good point, Shawn - the list, offer, and creative remain the most important elements to think through (if we can extend "offer/creative" to include "matching your message to where the buyer is in his/her buying process").