The 9 Best Business Automation Software Tools in 2025

In addition to getting rid of repetitive tasks through automation, customer and employee needs are evolving, leading to an increase in the adoption of business automation software.

Some companies are looking to use one automation platform to improve the efficiency of their business processes. Others are looking for business automation software that will help them improve the customer experience while others want to improve employee experience.

To achieve each of these goals, here’s how different departments in companies have adopted business automation software in the past 12 months:

Image source

The rate of adoption differs between different departments, depending on the priorities that each company has.

So, whatever objectives you want to achieve using business automation software, you're in the right place. In this post, we're going to talk about nine of the best business automation software tools. They will help streamline your operations, improve your business processes, and deliver business results.

But first…

What is business automation software?

Business automation software is a set of tools that help you automate recurring business activities, streamline your workflows, and improve the efficiency of your business processes.

Business automation software can either be an all-in-one platform that helps you automate different business operations from one place or a standalone automation solution that is dedicated to automating a specific business operation.

And if you’re like most marketers, you have automated a business process or two at work.

It could be email automation to support your marketing activities, chatbot software for customer onboarding, or using your favorite market research tools for market research to support product development.

Even though automation helps you get rid of repetitive tasks, 40% of automations today still lack structure because businesses aren't allocating enough budget to business automation software.

This means that most automations are misaligned with what employees and customers need and expect, making it difficult to deliver the business results you’re looking for.

And if you want to scale your business operations, improve employee productivity, or deliver better buying experiences, you’ll need to go beyond basic automation and think of automating workflows. And if you want to level up, consider using an automation tool that relies on artificial intelligence.

Editor’s note: I’m going to use some affiliate links when possible to try to earn some revenue from my content. These don’t change the opinions espoused in the content nor the style in which they are written. If I think a product sucks, I’m not going to say otherwise. This is just a bonus and a way to fund the whole operation. Anyway, enjoy the article!

The 9 Best Business Automation Software Tools

That said, here are the nine best business automation tools you need:

  1. Make
  2. Workato
  3. Zapier
  4. ActiveCampaign
  5. HubSpot
  6. Sendinblue
  7. Hootsuite
  8. Google Data Studio
  9. Airtable

1. Make (Formerly Integromat)

I’ve done all the shopping around.

I’ve been a power user of Zapier. I’ve built advanced multi-chain branching logic in Workato. I used to use Automate.io before they got acquired by Notion.

But my favorite for all purpose integration and automation? Make.

Why?

First off, price. They start out for free up to 1000 operations, which is substantial. The next level after that is only $9/month.

This is miraculous when you compare this to enterprise solutions like Tray.io, Mulesoft, or Workato (though clearly there is value beyond price in these solutions).

Next, a low price usually means trading away power. Not with Make.

I’ve used it to automate SEO reporting data and blend it with Google Analytics data in a Data Studio dashboard. I’ve integrated pre and post-signup data to inform the entire customer journey and funnel. I’ve used it for simple stuff, like emailing me when someone tweets with a specific keyword.

It’s awesome. Easy to use for a non-coder, but customizable and powerful if you know how to sling a little code.

One thing I particularly like about Make is that they're inherently integration focused. This prevents major problems that occur with pure automation platforms like Zapier, which can break often. It also enables a broader swath of integrations and automations across your entire tech stack, something that all-in-one software like Sendinblue and HubSpot struggles with.

So all in – price, power, ease – I’ll take Make.

Also, very important side note: I’ve found Make to be the best tool for AI workflows and agentic processes. So learn this one if you want to be future prepped.

G2 Score: 4.8/5

2. Workato

In their report, Future of B2B Buying Journey, Brent and Nick make the following observation:

“The typical buying group for a complex B2B solution involves six to 10 decision makers, each armed with four or five pieces of information they’ve gathered independently and must deconflict with the group. At the same time, the set of options and solutions buying groups can consider is expanding as new technologies, products, suppliers and services emerge.”

Nurturing prospects in such an environment feels like jumping through hoops.

Their buying journey isn't linear, but you still want to be present at critical touchpoints without forcing them down a specific path, since that would push them away.

So, how do you make sure that your prospects and leads have the relevant information they need, when they need it, as they prepare to meet with other decision makers?

Enter Workato, a tool that uses AI for business automation.

In this case, the lead routing feature comes in handy to help you automate assigning leads and prospects to the right sales reps for nurturing.

Once a prospect or lead enters their information in one of the forms on your website, Workato's LeadBot analyzes that information and automatically assigns the lead to the right sales rep. Your sales reps receive a notification inside Slack where they can either accept or reject the lead (depending on their workload).

That way, your reps take action immediately and make sure that your leads have the right sales enablement content they need to make a case for your product. Here’s a short video of lead routing in action using the LeadBot:

And that’s not all.

Workato also provides you with a library of more than 400,000 automation recipe templates to help you explore different automation options. The recipes are two-fold:

Community recipes: Each app comes with its set of recipes that you can use to automate your business processes.

Image source

Search for an app you’d like to use and see the different recipes Workato provides for that app.

Recipe collections: A list of pre-made recipes for common automated processes for different apps.

Image source

So whether you're in finance, human resources, higher education, or IT and are looking for an advanced and intelligent automation tool, you'll find a relevant recipe in one of Workato's collections of automation recipes.

Ideal for: Mid-market and enterprise users looking for AI-powered business process management software.

G2 Score: 4.7 / 5.0

3. Zapier

In a typical workday, you’ll update a few projects you’re working on, process data, or even move information from one tool to another. Some of these tasks take a few minutes and others longer depending on what you’re working on.

However, these simple tasks add up, and you find yourself spending more time each day on non-essential tasks, leaving you rushing to beat critical deadlines for your projects.

And for 54% of employees, automation could save them more than 20 hours a week.

So what tool do you need to save more than 20 hours every week through business process automation? Zapier.

It comes with Zaps that you can set up to trigger specific actions in your workflow so you’ll no longer struggle to get back to a state of flow after taking a break to update projects or process data.

To get started, create an account and install the Chrome add-on. Once you click on the add-on, you'll receive automation suggestions depending on the webpage you're on. Take a look at the Zap suggestions that I receive once I open Omniscient Digital:

You also get suggestions on other Zaps you can create depending on your browsing history. If you’re a frequent Gmail user, then you will get suggestions on what Zaps you can set up.

While the Zaps and suggestions make it easier to automate simple tasks, Zapier also helps you build workflows and connect apps that don’t have a two-way integration.

To see this in action, select your role and the tools you already have in your tech stack from the list provided. Zapier then provides you with workflow automation templates depending on your selection. In this case, an HR professional would want to see the Zap templates that work with BambooHR for a better employee onboarding process:

Image source

Zapier also allows you to create custom logic paths and use filters to keep your data clean and avoid having to clean things up after the Zaps run. With custom logic paths, your workflow only runs if it meets the conditions you’ve set:

Image source

Filters allow your Zaps to run only if the incoming information matches the conditions you've set:

So if you're looking to supercharge your workflows through filters, conditional logic, webhooks, and multi-step Zaps, then go for Zapier.
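If you're curious what that conditional logic boils down to, here's a minimal Python sketch of the same idea. The field names and thresholds are invented for illustration; inside Zapier you'd configure this through the Filter and Paths steps rather than writing code.

```python
# Illustrative only: the kind of filter / conditional-logic check a Zap performs
# before continuing a workflow. Field names and thresholds are made up.

def passes_filter(lead: dict) -> bool:
    """Return True only if the incoming record meets the filter conditions."""
    try:
        deal_value = float(lead.get("deal_value", 0))
    except (TypeError, ValueError):
        return False  # malformed data fails the filter instead of polluting later steps
    return deal_value >= 5_000 and lead.get("country") in {"US", "CA"}


def route(lead: dict) -> str:
    """A stand-in for branching logic (Zapier calls these 'Paths')."""
    if not passes_filter(lead):
        return "ignore"
    return "enterprise_path" if float(lead["deal_value"]) >= 25_000 else "smb_path"


if __name__ == "__main__":
    print(route({"deal_value": "30000", "country": "US"}))  # -> enterprise_path
```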

Ideal for: Small, Medium, and Enterprise level users in all industries who need a tool to automate their workflows.

G2 Score: 4.5 / 5.0

4. ActiveCampaign

The tools you use to drive awareness, nurture, and convert your leads need to help you deliver a unified customer experience.

Most companies think that they’re doing a stellar job at delivering a unified experience but research from Salesforce paints a different picture.

Image source

The more tools you have, the more disconnected your customers feel from your brand.

To avoid dropping the ball and losing your leads and customers in the process due to a disconnected customer experience, consider using ActiveCampaign for marketing automation.

To get started, use the drag-and-drop builder to build your workflows, whether that's for welcome emails or cart abandonment recovery.

Set goals for the automation campaigns you’re running and use the advanced analytics from ActiveCampaign to track them and see how your campaigns are performing.

ActiveCampaign also comes with an attribution feature to help you see what channels are driving more conversions for your site. If you’re not sure which channel to focus your efforts and resources on, use the A/B testing feature to identify the channel with more conversions.

Once you’ve set all this up, it’s time to use ActiveCampaign to deliver a unified customer experience. Start by integrating it with any tool you have in your tech stack.

Let's say, for instance, you're dealing with refund requests from customers after a flash sale, and you want to make sure that you approve them without unnecessary delays.

So, if you're using a tool like Freshdesk for customer support and ActiveCampaign for lead nurturing, an integration between the two tools will give you a single source of truth about the customer's information to help you process the refund fast. Your support team won't have to keep asking customers to explain their issues all over again in order to verify and approve the refund request. Besides, with this information in one place, you'll have an easier time addressing the underlying issue so it doesn't repeat in future promotions.

Ideal for: Small and medium businesses in tech, ecommerce, higher learning, real estate, and nonprofits.

G2 Score: 4.6 / 5.0

5. HubSpot

Using different tools to manage your business operations comes with its own challenges. You will have to pay for each tool separately. Besides, the bigger your team, the more money you’ll have to pay for a seat in each of these tools.

Your team members also have to learn how each of these tools works. Over time, these costs add up, making it difficult to run a sustainable business automation program. If working with several tools to automate your business feels like a chore, consider HubSpot.

While most know it as an all-in-one marketing platform, it also doubles as business process automation software for teams looking to automate all their workflows and processes in one place.

With HubSpot, you get the Marketing, Sales, and Service hubs, which integrate with one another and align your marketing, customer support, and sales teams.

Build your workflows and sequences from scratch or use the templates to streamline your business processes and pass information about leads to teams in other departments:

Image source

When creating your workflows from scratch, you can use JavaScript to create custom, programmable automations for each of your business processes for more efficiency.
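To give a feel for what a custom-coded action can do, here's a rough sketch of the kind of logic you might put in one, written in Python for readability. The handler shape ("main", "inputFields", "outputFields") and the property names are assumptions for illustration, not HubSpot's exact API; as noted above, the actions themselves are written in JavaScript.

```python
# Hypothetical sketch of the logic inside a custom-coded workflow action.
# Handler shape, input fields, and property names are illustrative assumptions.

def main(event: dict) -> dict:
    """Score an enrolled contact and hand fields back to later workflow steps."""
    props = event.get("inputFields", {})
    score = 0
    if props.get("job_title", "").lower() in {"founder", "vp marketing", "head of growth"}:
        score += 40
    if int(props.get("employee_count") or 0) >= 50:
        score += 30
    if props.get("signed_up_for_trial") == "true":
        score += 30

    return {
        "outputFields": {
            "lead_score": score,
            "route_to": "sales" if score >= 60 else "nurture",
        }
    }


if __name__ == "__main__":
    print(main({"inputFields": {"job_title": "Founder", "employee_count": "120"}}))
```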

Ideal for: Startups looking for an all-in-one business automation platform that can scale with them.

G2 Score: 4.4 / 5.0

6. Sendinblue

If you’re a small business owner with a large email list, you’re likely to lean towards a tool like Sendinblue that allows you to pay depending on the number of emails you send per month.

After all, this model makes sense in the long run, especially if you’re in ecommerce and have to send a different number of emails per month depending on the season.

Even with their flexible pricing model, Sendinblue doesn’t skimp on features that make business process automation easier.

Using their user-friendly drag-and-drop builder, visualize your workflows and see exactly what will happen when nurturing leads, reengaging subscribers, or even running a seasonal promotion.

Sendinblue also allows you to run A/B tests on your marketing automation workflows to help you optimize and use the most effective workflow.

Image source

If you're looking to create custom workflows for your welcome emails, upsells, and cross-sells, pick an entry point, then identify the actions that will trigger the workflow and the conditions that need to be in place before it runs.

Sendinblue also allows you to build advanced automation workflows by installing a plugin that tracks the behavior of visitors on your website so you can send targeted communications. This saves you time and keeps you in contact with visitors whenever they need you.

G2 Score: 4.6 / 5.0

7. Hootsuite

In addition to increasing the efficiency of your social media marketing by scheduling posts and tracking analytics, Hootsuite comes in handy with its integrations to other tools that support your social media marketing efforts.

For example, Canva’s integration with Hootsuite allows you to save time as you create social media visuals using Canva without leaving Hootsuite:

Image Source

Once your visuals are ready, schedule or post them on different social media channels and monitor the performance of your content without leaving Hootsuite.

Image source

And when it comes to managing customer complaints and feedback from different social media channels, Hootsuite's integration with Zendesk makes it easier for you to merge all customer feedback and complaints and address them in one place, without having to move from channel to channel or leave Hootsuite.

Image source

You can also hand over customer requests from social media to the relevant customer support representative and let them handle the request as soon as possible, improving customer satisfaction.

See how “Open” requests stand out in bold, and you can even filter all the incoming tickets to check whether you missed anything, without leaving Hootsuite.

Ideal for: Small and medium level businesses looking to scale their social media activities through automation.

G2 Score: 4.1 / 5.0

8. Google Data Studio

For each tool you're using, there's a reports section and an option to export your data to an Excel sheet for analysis.

If you're using more than three tools in your department, you'll be analyzing three or more Excel sheets on a regular basis.

That's time-consuming, and it makes it difficult to see how different trends are related. For example, if your traffic increases and there's no change in conversions, how do you explain this if you don't have all of these metrics in a single dashboard?

Besides, 55.9% of companies say the ability to collect and measure data is among the top marketing skills they look for in new hires.

Image source

Why would more than half of the companies consider data collection and analysis as an essential skill?

Because tools collect data for you and provide you with insights based on their capabilities. As a marketer, your ability to tell a story with the data you analyze to help your team make decisions makes you indispensable.

And in case you missed it in our image above, SparkToro used Google Data Studio to share the results of their survey on essential marketing skills companies are looking for.

How do you use Google Data Studio to tell a story with the data you collect and analyze?

Start by creating your Google Data Studio account and connecting it to the tools you're using. In the list of Google Connectors, you'll find the tools you can use to import data and view it on a single dashboard.

Once you connect your tools, you can filter your data to find what you’re looking for, edit your reports, and even share your data with relevant stakeholders.

Image source

That way, it's less work for everyone. All the data is in one place, and stakeholders can follow everything you've created for them in real time.

Ideal for: Businesses of all sizes looking to automate how they view data.

G2 Score: 4.3 / 5.0

9. Airtable

According to a study by SEMrush, the use of collaboration and workflow management tools increased by 8% between 2019 and 2020.

It's a small improvement, but it signals that most content teams want to improve how they manage their content marketing process by getting rid of repeatable tasks in content creation.

If you're running a content operation, think of every time you have to create briefs in Google Docs, send the links to writers, and have a brief email exchange confirming that they received the work and will meet the deadline.

If you’re working with one or two writers, that’s easy to manage. With multiple writers across different time zones, it’s hectic.

Also, depending on the organization you're working with, content might need approval from the legal department or another senior person on the content team, which slows down the operation.

Airtable, however, makes project management easier by automating some of these activities in your content workflow, saving you time and helping you get more work done.

To see how this works, take a look at how Tommy Walker has set up his content operations inside Airtable and how each of these automations work:

It's a solid setup, right? So if you want something similar for your content team, sign up for Airtable, use one of the content calendar templates provided, and modify it to match what you see in the video.
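And because Airtable exposes a REST API, you can also push briefs into your base programmatically and let Airtable's automations take it from there. Here's a minimal sketch using Python's requests library; the base ID, table name, and field names are placeholders you'd swap for your own.

```python
# A minimal sketch (not Airtable's official client) of creating a new content brief
# via the Airtable REST API. Base ID, table name, and field names are placeholders.
import os
import requests

AIRTABLE_TOKEN = os.environ["AIRTABLE_TOKEN"]   # personal access token
BASE_ID = "appXXXXXXXXXXXXXX"                   # placeholder base ID
TABLE = "Content%20Calendar"                    # placeholder table name (URL-encoded)


def create_brief(title: str, writer: str, due_date: str) -> dict:
    """Create a record so an Airtable automation can notify the assigned writer."""
    resp = requests.post(
        f"https://api.airtable.com/v0/{BASE_ID}/{TABLE}",
        headers={
            "Authorization": f"Bearer {AIRTABLE_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"fields": {"Title": title, "Writer": writer, "Due date": due_date, "Status": "Briefed"}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(create_brief("Best business automation software", "Jane Doe", "2025-02-01"))
```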

Ideal for: Freelancers and content teams at small and medium-sized businesses.

G2 Score: 4.6 / 5.0

Conclusion

Before you get caught up in analysis paralysis, remember what we said at the beginning of this post: most automations are basic and lack structure.

So before you pick an automation solution, decide whether you want to automate a whole business operation or just the specific elements of your business operations that are time-consuming and annoying.

Knowing this will help you set the correct internal expectations as you choose a tool and decide what you expect from it.

Which of these business automation software tools would best meet your business needs?

The 15 Best Website Analytics Tools in 2025

Website analytics as an industry has come a long way.

Once upon a time, it wasn’t so easy to collect data (nor analyze it) from your web pages.

Now, we have an absolute glut of web analytics software available to us, ranging from the free and open source to the enterprise and highly advanced.

I’ve worked in analytics for years. By trade, I’m a conversion rate optimization specialist as well as a content agency founder. In both of these roles, my foundational skill set is in analysis. Thus, I’ve spent more time in website analytics and digital analytics tools than any other software.

And I’ve got opinions on the best ones. I’ll outline them for you here.

First, however, let me describe what website analytics are and how they differ from other forms of analytics, like marketing analytics, product analytics, or other forms of data.

The 15 Best Website Analytics Tools in 2025

  1. Google Analytics (GA4)
  2. Snowplow Analytics
  3. Heap Analytics
  4. Adobe Analytics
  5. HubSpot
  6. Matomo
  7. Yandex Metrica
  8. Amplitude
  9. Mixpanel
  10. Fullstory
  11. Woopra
  12. HotJar
  13. LuckyOrange
  14. Mouseflow
  15. Medallia

Here’s a comparison table summarizing the key features, pros, and cons of the web analytics tools mentioned in the list:

| Tool | Key Features | Pros | Cons | Price | G2 Score |
| --- | --- | --- | --- | --- | --- |
| Google Analytics (GA4) | Free-tier tracking, enhanced e-commerce, real-time analytics, cohort analysis | Ubiquitous, free for basic use, powerful integrations, customizable tracking | Enterprise version is expensive, data sampling in free version, limited raw data access | Free, then enterprise | 4.2 |
| Snowplow Analytics | Open source, raw event data, customizable, advanced analytics | Data ownership, highly customizable, ideal for advanced analysts | Requires technical setup, not plug-and-play, no transparent pricing | Request demo | 4.6 |
| Heap Analytics | Automatic event tracking, historical data, user behavior analysis | Easy setup, powerful integrations, complete historical data | Can overwhelm with data, requires careful analysis | Free, then custom pricing | 4.4 |
| Adobe Analytics | Advanced segmentation, cohort analysis, integration with Adobe suite | Enterprise-grade power, great for large-scale businesses | Expensive, steep learning curve | Contact sales | 4.0 |
| HubSpot | CRM integration, marketing attribution, reporting tools | All-in-one marketing solution, good for SMBs, integrates with other HubSpot tools | Limited customizability, data ownership concerns, expensive upgrades | Free basic, $50/month+ | 4.4 |
| Matomo | Privacy-focused, customizable, on-premise/cloud options | Open source, GDPR-compliant, great Google Analytics alternative | Requires technical expertise for advanced setup, less intuitive UI | Starts at $29/month | 4.2 |
| Yandex Metrica | Completely free, session replays, heatmaps | Free, solid features for user behavior analysis | Limited global adoption, primarily Russia-focused | Free | 4.3 |
| Amplitude | Cohort analysis, user segmentation, A/B testing integration | Excellent for product analytics, advanced automation, strong growth | Expensive for larger use cases, some limitations in channel attribution | Free, then custom pricing | 4.5 |
| Mixpanel | User identification, funnel analysis, retention metrics | Solid for product analytics, detailed user behavior tracking | Limited flexibility in custom reporting, can feel rigid for non-standard use cases | Free, $25/month+ | 4.4 |
| Fullstory | Session replays, heatmaps, form analytics | Combines qualitative and quantitative insights, excellent for conversion research | Expensive, unintuitive UI | Contact sales | 4.5 |
| Woopra | Customer journey analytics, automation tools, retention tracking | Great for tracking user journeys, automation capabilities | Expensive paid plans, less adoption compared to competitors | Free, $349/month+ | 4.4 |
| HotJar | Heatmaps, session replays, surveys, feedback tools | Affordable, intuitive UI, all-in-one CRO tool | Limited advanced analytics, less customizable compared to enterprise tools | Free, $39/month+ | 4.3 |
| LuckyOrange | Heatmaps, session replays, live chat | Affordable pricing, live chat feature, flexible | Less recognized, fewer advanced analytics features | $10/month+ | 4.3 |
| Mouseflow | Session replays, heatmaps, funnel analysis | Easy to use, qualitative insights, affordable | Limited advanced segmentation, slower than competitors | Free, $24/month+ | 4.7 |
| Medallia | Digital experience scoring, session replays, journey mapping | Enterprise-grade, deep insights into user personas, strong A/B testing integration | Expensive, unclear pricing, steep learning curve | Request demo | 4.4 |

Recommendations Based on Use Case:

  • Best for beginners: Google Analytics, Yandex Metrica
  • Best for privacy-focused businesses: Matomo
  • Best for product analytics: Amplitude, Mixpanel
  • Best for qualitative insights: HotJar, LuckyOrange, Mouseflow
  • Best for enterprise users: Adobe Analytics, Medallia
  • Best all-in-one marketing platform: HubSpot

1. Google Analytics

Ahhh Google Analytics. GA. The tried-and-true, nearly ubiquitous web analytics tool.

If you’re a marketer, or really anyone doing stuff with websites or tech, you’ve almost certainly seen a Google Analytics report. Actually, if you’re a marketer and don’t know the basics of Google Analytics, I’d run to your nearest course supplier today (free: Google’s Academy. Paid: CXL Institute).

What is Google Analytics? Well, out of the box, it's going to collect clickstream data based on cookies. You'll get channel reports: how many people came to your site from PPC, organic search (Google/Bing/etc.), social media, Direct, Referral, Email, and so on?

You’ll get landing page and behavioral reports: how many people entered the site through a given page? How many page views, event triggers, site searches were there?

You’ll get engagement metrics. What’s the bounce rate, exit rate, entrance rate of a given page? How many pages per session do you have on average? What’s the average time on site?

And you’ll get user data. What browser type and device type is a visitor using? What’s their age and demographics? Are they a new or return visitor? How *many* sessions have they had?

And that’s just the basics; wait until you learn about enhanced e-commerce, real-time analytics, cohort analysis and all the cool stuff in GA4, and the literally endless possibilities introduced by Google Tag Manager and measurement protocol.

If you can think it, with Google Analytics, you can probably track it.
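As one example of what the Measurement Protocol opens up, here's a minimal sketch of sending a server-side event to GA4 from Python. The measurement ID, API secret, client ID, and event parameters are placeholders for illustration.

```python
# A minimal sketch of sending a server-side event to GA4 via the Measurement Protocol.
# measurement_id, api_secret, client_id, and event parameters are placeholders.
import requests

MEASUREMENT_ID = "G-XXXXXXXXXX"   # placeholder
API_SECRET = "your_api_secret"    # placeholder, created in the GA4 admin UI


def send_event(client_id: str, name: str, params: dict) -> int:
    resp = requests.post(
        "https://www.google-analytics.com/mp/collect",
        params={"measurement_id": MEASUREMENT_ID, "api_secret": API_SECRET},
        json={"client_id": client_id, "events": [{"name": name, "params": params}]},
        timeout=10,
    )
    # The collect endpoint accepts hits silently; use GA4's debug variant while validating payloads.
    return resp.status_code


if __name__ == "__main__":
    send_event("555.1234567890", "newsletter_signup", {"form_location": "footer"})
```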

Cool thing? Starts out as a free tool.

Price: Starts free, then Google Analytics 360 is incredibly expensive

G2 score: 4.2

2. Snowplow Analytics

You may or may not have heard of Snowplow.

If you’re a hardcore analyst, you’ve probably heard of it. If you’re new to the game, you probably haven’t.

Snowplow analytics is indicative of a larger trend: people want to own their data.

Google Analytics is awesome, but there's a lot of fuckery with how things are surfaced and reported, and it's often difficult to get raw and unsampled data in the format you'd like it (that is, unless you're on the enterprise version).

Snowplow is an open source data analytics solution that is based on event tracking (arguably a superior way to map out your measurement strategy). They've quickly risen in popularity over the last few years, and I'd expect that only to grow in the future.

Here’s how it would work if you were a Snowplow analytics user:

Sure, a bit more complicated than plug-and-play Google Analytics. But if you’re serious about measurement and optimization (plus data ownership), then it’s likely worth it.
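To make the "map out your events" idea concrete, here's a rough sketch of the discipline involved: define your events and required properties up front, then validate every payload against that plan before it's sent. This is illustrative only, not Snowplow's actual tracker API; the event names and fields are invented.

```python
# Illustrative only: a tiny tracking plan plus a validation step, the kind of
# up-front event design that Snowplow-style tracking encourages.
from datetime import datetime, timezone

EVENT_SCHEMAS = {
    "signup_completed": {"plan", "referrer"},
    "article_read":     {"article_id", "read_seconds"},
}


def build_event(name: str, user_id: str, properties: dict) -> dict:
    expected = EVENT_SCHEMAS.get(name)
    if expected is None:
        raise ValueError(f"Unknown event {name!r}: add it to the tracking plan first")
    missing = expected - properties.keys()
    if missing:
        raise ValueError(f"{name} is missing required properties: {sorted(missing)}")
    return {
        "event": name,
        "user_id": user_id,
        "properties": properties,
        "sent_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    print(build_event("article_read", "u_123", {"article_id": "ab-test-strategy", "read_seconds": 412}))
```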

Price: Unavailable (request demo)

G2 score: 4.6

3. Heap Analytics

Heap Analytics, in many ways, is on the opposite side of the web analytics spectrum from Snowplow.

With Snowplow, you’ll map out your events, set everything up proactively, and control the flow of your data from start to finish.

Heap Analytics, on the other hand, automatically tracks everything it can about your users. This makes it one of the easiest tools to set up and use out of the box, but it also means you could end up with a glut of information.

It all depends on what type of data-driven culture you have. For example, engineering-heavy companies will have no problem unlocking the power of a platform like Snowplow (or Matomo), whereas development resources could be a constraint for many companies. Heap lets you bypass some of this resource bottleneck, though then the burden falls on analyzing the data you're collecting (there's no free lunch, they say).

In any case, Heap has made waves in the digital analytics space, and for good reason. It’s a powerful product analytics and website analytics platform that performs a wide variety of functionalities, including automated ETL, automatically merging anonymous and identified behavior, and access to complete historical data.

It also integrates well with A/B testing tools like Optimizely and VWO.

Price: Starts free (then customized pricing based on events)

G2 score: 4.4

4. Adobe Analytics

Adobe Analytics is the name I think of when I think of enterprise website analytics.

Of course, you've got the whole suite of Adobe's marketing products, including Adobe Target and, now, Marketo. So if you're already using some or all of these tools, then Adobe Analytics is the obvious choice.

In many ways, Adobe Analytics represents the high end of the analytics market. It’s incredibly powerful and highly customizable.

I’ll be honest here, though, it’s probably the tool on this list I’ve got the least hands-on experience with. It’s simply not a tool many of the organizations I’ve worked with have used.

That said, my friends in the analytics industry tend to love this one for the aforementioned reasons: power, complexity, scale.

Price: talk to sales

G2 score: 4.0

5. HubSpot

HubSpot is a full suite marketing and sales platform that makes pretty much every type of MarTech tool.

While famous for the CRM, email marketing, and marketing automation offerings, they have a pretty decent set of website analytics tools as well.

Three things to me stand out about the product:

First, since they have all the other marketing tools (landing pages, email, live chat, ads, CRM), you’re really getting a great single source of truth for your marketing and customer data. This eliminates a lot of the data engineering work typically required to blend and enrich the data you’re collecting, so you can immediately make use of it.

Second, they’re quite specifically good at marketing analytics and attribution. This is something that other tools, even Google Analytics, are frustratingly bad at. Being able to tell when, how, and from where someone came to your website is such a core business question, and HubSpot helps you answer that.

Finally, the basic reporting tools start out free, and the advanced stuff comes with any of the pricing tiers that include other tools like email and marketing automation.

There are many downsides, though. Their reporting and analytics are basically the opposite of customizable. And it would be hard to argue that you’re the owner of your data when using HubSpot. It’s nearly impossible to use the analytics API and construct things in a customizable format. So for the advanced analysts, you’ll almost certainly end up feeling that the tool is lacking in power and customizability.

But I think this one is good for marketers, and you can still set up something like Snowplow or Google Analytics for more advanced uses.

Price: starts free (then $50/month for Starter tier)

G2 score: 4.4

6. Matomo

Matomo, formerly known as Piwik, is like the DuckDuckGo of the analytics world.

Google, famous and well-liked for many things, is not necessarily a beacon of privacy and security. Matomo’s positioning here is that of the Google Analytics alternative that actually cares about user privacy.

Open source and with cloud and on-premise options, you can definitely call this platform customizable. And for IT and security teams looking for something with a little more assurance, this is a great Google Analytics alternative.

Price: depends on usage (starts near $29/mo)

G2 score: 4.2

7. Yandex Metrica

Yandex Metrica is the Russian Google Analytics. It’s also the third most widely used web analytics service on the web.

Also, it’s completely free.

Price: completely free

G2 score: 4.3

8. Amplitude

Amplitude might be the most popular product analytics tool in the game right now.

They came on my radar around 2015, back when they positioned themselves as “mobile app analytics” or “app analytics.”

The feature that I remember them for was an automated way to analyze and identify the “aha moment” in a product (like Facebook’s famous 7 friends in 7 days heuristic). This isn’t an altogether hard thing to do, analysis-wise – it’s a multivariable regression in most cases (and in many, many cases, it’s actually better to find these things qualitatively).
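For the curious, here's a toy version of that kind of analysis: a logistic regression of retention on a few early behaviors, using synthetic data. The behaviors, coefficients, and data are made up purely to show the shape of the analysis.

```python
# A toy sketch of an "aha moment" analysis: regress retention on early product
# behaviors and see which behavior carries the most weight. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5_000

friends_added  = rng.poisson(3, n)
sessions_week1 = rng.poisson(5, n)
profile_filled = rng.integers(0, 2, n)

# Synthetic ground truth: friends added drives retention most strongly.
logit = -2.0 + 0.6 * friends_added + 0.05 * sessions_week1 + 0.3 * profile_filled
retained = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([friends_added, sessions_week1, profile_filled])
model = LogisticRegression().fit(X, retained)

for name, coef in zip(["friends_added", "sessions_week1", "profile_filled"], model.coef_[0]):
    print(f"{name:>15}: {coef:+.2f}")
# On this synthetic data, friends_added gets the largest coefficient: a candidate
# "aha moment" you'd then go verify with qualitative research.
```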

Now, they’ve expanded capabilities around automating the analysis portion of a PM or marketer or analyst’s job. Things like funnels, cohort analyses, identifying personas and segments, identifying friction points, and more.

That’s just on the analytics side of things.

They just raised a shitload of money from Sequoia and have built out solutions for machine learning driven personalization and have recently launched an experimentation platform as well.

I’m bullish on Amplitude. I think the team is incredibly smart and the product has gotten remarkably better over time. If I were to pick a product analytics tool, this would be the one. I’d still probably combine that with a marketing analytics solution, especially to fill the gaps on channel attribution and conversion funnels.

Price: starts free (then contact sales)

G2 score: 4.5

9. Mixpanel

Mixpanel is the first company I can remember that marketed themselves as a product analytics solution.

They’ve been around since 2009, which seems like an eternity for the modern analytics industry. They came up in the same crop as tools like KissMetrics, which hoped to fill in the gaps that Google Analytics missed.

In terms of functionality, Mixpanel is excellent, especially when it comes to user stitching and identification. This was even a core differentiator before they called themselves a product analytics tool, and it’s something that led many users to choose them over Google Analytics in the heyday.

For transparency, I haven't used Mixpanel in the past few years. But from talking to friends who use the tool, they say it's just about on par with Amplitude and improving greatly with time. The big frustrations seem to be around the lack of custom formulas and, generally, flexibility in reporting and analysis.

This, however, is something I’ve experienced with almost all ‘product analytics’ tools. They’re great at out-of-the-box reporting, especially with things like cohort analysis and funnels. But when it comes to fringe business questions, you really have to work to pull together hacky solutions.

Anyway, if you want to measure the full funnel from conversion to product engagement to retention, Mixpanel is a solid solution.

Price: starts free, then $25/month for starter tier

G2 score: 4.4

10. Fullstory

Alright, we’re now somewhat stepping away from the world of purely quantitative website analytics, and now we’ve got a tool that does a mixture of qualitative and quantitative measurement.

Fullstory is a popular web analytics solution used by tons of SaaS companies (as well as other industries, but particularly SaaS). It’s got several powerful features to track engagement across your website:

  • Session replay videos
  • Heatmaps (clickmaps, scroll maps, hover maps)
  • Custom funnels
  • Form analytics

There is some irony in the fact that, at least in my humble opinion, Fullstory's UI is quite unintuitive. It took me a full week or two to figure out how to find the heatmap reports, for instance.

But they make up for it in their ability to tie together pretty much the whole customer journey of individual users, which is incredibly useful when it comes to doing conversion research and identifying user experience bottlenecks. The platform lets you view the behavior of individual visitors, such as where they're scrolling, where they're clicking, and where they're bouncing.

This is a great complement to a tool like Google Analytics, which purely reports on quantitative data.

Price: talk to sales

G2 score: 4.5

11. Woopra

Woopra is another customer journey analytics tool that seeks to give you a more holistic view of your visitors, users, and customers.

Their three key features are:

  • Journeys – tracking users across multiple touchpoints on your website and in your product
  • Trends – mapping the growth of key metrics
  • Retention – measuring and optimizing user retention

A cool differentiator is that Woopra not only gives you the ability to track, analyze, and report on data, but they also help you *do* something about it. They have a set of automation tools that allow you to trigger different types of experiences based on the data you’re collecting.

I like Woopra a lot. They’re another tool that I feel has just gotten progressively better, and much different from other tools in the space, over time.

Price: starts free (then $349/mo)

G2 score: 4.4

12. HotJar

HotJar is the first tool on this list that I would consider purely qualitative.

It’s also one of my favorite conversion rate optimization tools out there.

Here are just some of their features:

  • Session replay videos
  • Heatmaps (scroll maps, click maps, hover maps)
  • On-site polls
  • Customer surveys
  • Funnels
  • Form analytics
  • Visual website feedback forms

There are three things I really like about HotJar:

  • It’s affordable. Probably the best value for what you get in the market.
  • It’s super easy to use. I could grok it within a day.
  • It's all-in-one. I'd rather not weigh my site down with a ton of JavaScript snippets if I could instead just use one tool.

If I were forced to choose one qualitative tool, it would probably be HotJar. I’m not alone; this tool is incredibly popular. It’s rare to meet a website optimizer who hasn’t used this tool extensively.

Many people will recommend specific products for each of these features (like CrazyEgg for heatmaps and Qualaroo for polls). This is fine, and it's important for many companies with advanced use cases. But for many companies, HotJar will do the job just fine.

Price: starts free (then $39/month)

G2 score: 4.3

13. LuckyOrange

LuckyOrange is a qualitative user experience analytics tool that offers an incredible array of features.

Not only do they have heatmaps, session replay videos, form analytics, funnels, and polls, but they also have a live chat option. This not only helps you collect insights and feedback from visitors, but may actually help convert more visitors as well.

Apart from their stacked feature set, they’ve also got some of the best pricing options out there, the first tier starting at just $10/month. I’ve found the tools incredibly easy to use as well as flexible.

While HotJar may have greater brand awareness, LuckyOrange is a formidable alternative and in some ways is superior (though I still love the HotJar poll feature and CrazyEgg for heatmaps).

Price: starts at $10/month

G2 score: 4.3

14. Mouseflow

Mouseflow is another qualitative insights product with session replay videos, heatmaps, conversion funnel analytics, form analytics, and website feedback collection.

I haven’t used Mouseflow in a few years, but in the past I’ve always found it useful and easy to get started. Additionally, they’ve got a freemium option, so you can easily kick the tires before committing to anything.

Price: starts free, next tier begins at $24/month

G2 score: 4.7

15. Medallia

I’d consider Medallia the enterprise option for qualitative website analytics.

They’ve got all the features you’d expect: session replays, form analytics, heatmaps, and conversion funnels. These core features are also highly customizable with Medallia.

But they’ve also got some unique features, like the Digital Experience Score, a composite metric to score your website experience and identify core issues for optimization. This helps you identify and prioritize issues quickly, as well as see progress over time, which are two issues qualitative research tools tend to struggle with.

They’ve also got features to help you map out the customer journey as well as do smart segmentation. This gets you started on personalization as well as giving you deeper insights into different personas and user types that may be engaging with your website.

Price: Not available (request a demo)

G2 score: 4.4

Conclusion

Web analytics tools are a must in the modern era.

There are tons of options, ranging from the quantitative and behavioral (like Google Analytics, Snowplow and Adobe) to the product analytics category (like Mixpanel and Amplitude) to the qualitative (like HotJar, LuckyOrange, and Mouseflow). Some tools are even combining the qualitative and quantitative worlds to give you a more holistic view of the customer journey (such as Fullstory and Medallia).

There are also tools I left off of this list, like the whole class of SEO tools out there that help you monitor search performance, rankings, and backlinks (like Ahrefs or SEMRush).

There are also content-specific tools like ChartBeat that give you detailed reports and real-time data on user behavior specifically related to content, helping you craft better marketing campaigns and strategies. Alas, if I'd included them, this list would be crazy long and barely useful.

At the end of the day, though, web analytics tools are necessary but not sufficient for progress. You’ll still need a team trained in data literacy and a data-driven culture to analyze your analytics data and actually take action on the insights you gain.

But this list should give you a solid start when it comes to considering which tools to implement to help you collect data on your website visitors.

 

What's the Ideal A/B Testing Strategy?

A/B testing is, at this point, widespread and common practice.

Whether you’re a product manager hoping to quantify the impact of new features (and avoid the risk of negatively impacting growth metrics) or a marketer hoping to optimize a landing page or newsletter subject line, experimentation is the tried-and-true gold standard.

It’s not only incredibly fun, but it’s useful and efficient.

In the span of 2-4 weeks, you can try out an entirely new experience and approximate its impact. This, in and of itself, should allow creativity and innovation to flourish, while simultaneously capping the downside of shipping suboptimal experiences.

But even if we all agree on the value of experimentation, there’s a ton of debate and open questions as to how to run A/B tests.

A/B Testing is Not One Size Fits All

One set of open questions about A/B testing strategy is decidedly technical:

  • Which metric matters? Do you track multiple metrics, one metric, or build a composite metric?
  • How do you properly log and access data to analyze experiments?
  • Should you build your own custom experimentation platform or buy from a software vendor?
  • Do you run one-tailed or two-tailed t-tests, Bayesian A/B testing, or something else entirely (sequential testing, bandit testing, etc.)? [1]

The other set of questions, however, is more strategic:

  • What kind of things should I test?
  • What order should I prioritize my test ideas?
  • What goes into a proper experiment hypothesis?
  • How frequently should I test, or how many tests should I run?
  • Where do we get ideas for A/B tests?
  • How many variants should you run in a single experiment?

These are difficult questions.

It could be the case that there's a single, universal answer to each of these, but I personally doubt it. Rather, I think the answers differ based on several factors, such as the culture of the company you work at, the size and scale of your digital properties, your tolerance for risk and reward, your traffic and testing capabilities, and your philosophy on testing and ideation.

So this article, instead, will cover the various answers for how you could construct an A/B testing strategy — an approach at the program level — to drive consistent results for your organization.

I’m going to break this into two macro-sections:

  1. Core A/B testing strategy assumptions
  2. The three levers that impact A/B testing strategy success on a program level.

Here are the sections I’ll cover with regard to assumptions and a priori beliefs:

  1. A/B testing is inherently strategic (or, what’s the purpose of A/B testing anyway?)
  2. A/B testing always has costs
  3. The value and predictability of A/B testing ideas

Then I’ll cover the three factors that you can impact to drive better or worse results programmatically:

  1. Number of tests run
  2. Win rate
  3. Average win size per winning test

At the end of this article, you should have a good idea — based on your core beliefs and assumptions as well as the reality of your context — as to which strategic approach you should take with experimentation.

A/B Testing is Inherently Strategic

A/B testing is strategic in and of itself; by running A/B tests, you’re implicitly deciding that an aspect of your strategy is to spend the additional time and resources to reduce uncertainty in your decision making. A significance test is itself an exercise in quantifying uncertainty.

Image Source

This is a choice.

One does not need to validate features as they're shipped or copy as it's written. Nor do you need to validate changes as you optimize a landing page; you can simply change the button color and move on, if you'd like.

So, A/B testing isn’t a ‘tactic,’ as many people would suggest. A/B testing is a research methodology at heart – a tool in the toolkit – but by utilizing that tool, you’re making a strategic decision that data will decide, to a large extent, what actions you’ll take on your product, website, or messaging (as opposed to opinion or other methodologies like time series comparison).

How you choose to employ this tool, however, is another strategic matter.

For instance, you don’t have to test everything (but you can test everything, as well).

Typically, there are some decision criteria as to what we test, how often, and how we run tests.

This can be illustrated by a risk quadrant I made, where low risk and low certainty decisions can be decided with a coin flip, but higher risk decisions that require higher certainty are great candidates for A/B tests:

Even with A/B testing, though, you’ll never achieve 100% certainty on a given decision.

This is due to many factors, including experiment design (there’s functionally no such thing as 100% statistical confidence) but also things like perishability and how representative your test population is.

For example, macro-economic changes could alter your audience behavior, rendering a “winning” A/B test now a loser in the near future.
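For concreteness, here's what quantifying that uncertainty looks like in its simplest form: a two-tailed, two-proportion z-test on invented conversion counts. Even a "significant" result only bounds the uncertainty; it never removes it.

```python
# A minimal example of quantifying uncertainty: a two-proportion z-test on
# made-up conversion counts for a control (A) and a variant (B).
from math import sqrt
from scipy.stats import norm


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))  # two-tailed
    return z, p_value


z, p = two_proportion_ztest(conv_a=420, n_a=10_000, conv_b=480, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 2.05, p = 0.04 with these invented numbers
```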

A/B testing Always Has Associated Costs

There ain’t no such thing as free lunch.

On the surface, you have to invest in the A/B testing technology or at least the human resources to set up an experiment. So you have fixed and visible costs already with technology and talent. An A/B test isn’t going to run itself.

You’ve also got time costs.

An A/B test typically takes 2-4 weeks to run. The period that you’re running that test is a time period in which you’re not ‘exploiting’ the optimal experience. Therefore, you incur ‘regret,’ or the “difference between your actual payoff and the payoff you would have collected had you played the optimal (best) options at every opportunity.”

Image Source

This is related to but still distinct from another cost: opportunity costs.

Image Source

The time you spent setting up, running, and analyzing an experiment could be spent doing something else. This is especially important and impactful at the startup stage, when ruthless prioritization is the difference between a sinking ship and another year above water.

An A/B test also usually has a run up period of user research that leads to a test hypothesis. This could include digital analytics analysis, on-site polls using Qualaroo, heatmap analysis, session replay video, or user tests (including Copytesting). This research takes time, too.

The expected value of an A/B test is the expected value of its profit minus the expected value of its cost (and remember, expected value is calculated by multiplying each of the possible outcomes by the likelihood each outcome will occur and then summing all of those values).

Image Source

If the expected value of an A/B test isn’t positive, it’s not worth running it.

For example, if the average A/B test costs $1,000 and the average expected value of an A/B test is $500, it’s not economically feasible to run the test. Therefore, you can reduce the costs of the experiment, or you can hope to increase the win rate or the average uplift per win to tip the scales in your favor.
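Here's that expected-value arithmetic as a tiny sketch, using numbers in the same spirit as the example above. The win rate, payoff, and cost figures are illustrative, not benchmarks.

```python
# Working through the expected-value arithmetic with illustrative numbers.
def expected_value_of_test(p_win: float, value_if_win: float, cost: float) -> float:
    """EV = sum(outcome_value * outcome_probability) - cost; losing tests add no value here."""
    return p_win * value_if_win + (1 - p_win) * 0.0 - cost


# e.g. a 25% historical win rate and $1,000 of tooling + people time per test:
print(expected_value_of_test(p_win=0.25, value_if_win=2_000, cost=1_000))  # -500: not worth running
print(expected_value_of_test(p_win=0.25, value_if_win=6_000, cost=1_000))  # +500: worth running
```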

A/B testing is a tool used to reduce uncertainty in decision making. User research is a tool used to reduce uncertainty in what you test with the hope that what you test has a higher likelihood of winning and winning big. Therefore, you want to know the marginal value of additional information collected (which is a cost) and know when to stop collecting additional information as you hit the point of diminishing returns. Too much cost outweighs the value of A/B testing as a decision making tool.

This leads to the last open question: can we predict which ideas are more likely to win?

What Leads to Better A/B Testing Ideas

It’s common practice to prioritize A/B tests. After all, you can’t run them all at once.

Prioritization usually falls on a few dimensions: impact, ease, confidence, or some variation of these factors.

  • Impact is quantitative. You can figure out based on the traffic to a given page, or the number of users that will be affected by a test, what the impact may be.
  • Ease is also fairly objective. There’s some estimation involved, but with some experience you can estimate the cost of setting up a test in terms of complexity, design and development resources, and the time it will take to run.
  • Confidence (or “potential” in the PIE model) is subjective. It takes into account the predictive capabilities of the individual proposing the test. “How likely is it that this test will win in comparison to other ideas,” you’re asking.
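A bare-bones sketch of what this scoring looks like in practice, with made-up ideas and 1-10 scores; the point is only that prioritization becomes a sortable number once you write your assumptions down explicitly.

```python
# ICE-style prioritization: score each test idea on impact, ease, and confidence,
# then rank by the average. Ideas and scores are invented for illustration.
ideas = [
    {"name": "Social proof on pricing page", "impact": 8, "ease": 9, "confidence": 6},
    {"name": "Rebuild signup flow",          "impact": 9, "ease": 3, "confidence": 5},
    {"name": "Change CTA button copy",       "impact": 3, "ease": 10, "confidence": 4},
]

for idea in ideas:
    idea["score"] = (idea["impact"] + idea["ease"] + idea["confidence"]) / 3

for idea in sorted(ideas, key=lambda i: i["score"], reverse=True):
    print(f"{idea['score']:.1f}  {idea['name']}")
```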

How does one develop the fingerspitzengefühl to reliably predict winners? Depends on your belief system, but some common methods include:

  • Bespoke research and rational evidence
  • Patterns, competitor examples, historical data (also rational evidence)
  • Gut feel and experience

In the first method, you conduct research and analyze data to come up with hypotheses based on evidence you’ve collected. Forms of data collection tend to be from user testing, digital analytics, session replays, polls, surveys, or customer interviews.

Image Source

Patterns, historical data, and inspiration from competitors are also forms of evidence collection, but they don’t presuppose original research is superior to meta-data collected from other websites or from historical data.

Here, you can group tests of similar theme or with similar hypotheses, aggregate and analyze their likelihood of success, and prioritize tests based on confidence using meta-analyses.

Image Source

For example, you could group a dozen tests you've run on your own site in the past year having to do with “social proof” (for example, adding micro-copy that says “trusted by 10,000 happy customers”).

You could include data from competitors or from an experiment pattern aggregator like GoodUI. Strong positive patterns could suggest that, despite differences in context, the underlying idea or theme is strong enough to warrant prioritizing this test above others with weaker pattern-based evidence.

Patterns can also include what we call “best practices.” While we may not always quantify these practices through meta-analyses like GoodUI does, there are indeed many common practices that have been developed by UX experts and optimizers over time. [2]

Finally, some believe that you simply develop an eye for what works and what doesn’t through experience. After years of running tests, you can spot a good idea from a bad.

As much as I’m trying to objectively lay out the various belief systems and strategies, I have to tell you, I think the last method is silly.

As Matt Gershoff put it, predicting outcomes is basically a random process, so those who end up being 'very good' at forecasting are probably outliers or exemplifying survivorship bias (the same as covered in Fooled by Randomness by Nassim Taleb with regard to stock pickers).

Mats Einarsen adds that this will reward cynicism, as most tests don’t win, so one can always improve prediction accuracy by being a curmudgeon:

It’s also possible to believe that additional information or research does not improve your chance of setting up a winning A/B test, or at least not enough to warrant the additional cost in collecting it.

In this world of epistemic humility, prioritizing your tests based on the confidence you have in them doesn’t make any sense. Ideas are fungible, and anyway, you’d rather be surprised by a test you didn’t think would win than to validate your pre-conceived notions.

In this world, we can imagine ideas being somewhat random and evenly distributed, some winning big and some losing big, but most doing nothing at all.

This view has backing in various fields. Take, for instance, this example from The Mating Mind by Geoffrey Miller (bolding mine):

“Psychologist Dean Keith Simonton found a strong relationship between creative achievement and productive energy. Among competent professionals in any field, there appears to be a fairly constant probability of success in any given endeavor. Simonton’s data show that excellent composers do not produce a higher proportion of excellent music than good composers — they simply produce a higher total number of works. People who achieve extreme success in any creative field are almost always extremely prolific. Hans Eysenck became a famous psychologist not because all of his papers were excellent, but because he wrote over a hundred books and a thousand papers, and some of them happened to be excellent. Those who write only ten papers are much less likely to strike gold with any of them. Likewise with Picasso: if you paint 14,000 paintings in your lifetime, some of them are likely to be pretty good, even if most are mediocre. Simonton’s results are surprising. The constant probability-of-success idea sounds very counterintuitive, and of course there are exceptions to this generalization. Yet Simonton’s data on creative achievement are the most comprehensive ever collected, and in every domain that he studied, creative achievement was a good indicator of the energy, time, and motivation invested in creative activity.”

So instead of trying to predict the winners before you run the test, you throw out the notion that that’s even possible, and you just try to run more options and get creative in the options you’ll run.

As I’ll discuss in the “A/B testing frequency” section, this accords with something like Andrew Anderson’s “Discipline Based Testing Methodology,” but also with what I call the “Evolutionary Tinkering” strategy. [3]

Either you can try to eliminate or crowd out lower probability ideas, which implies you believe you can predict with a high degree of accuracy the outcome of a test.

Or you can iterate more frequently or run more options, essentially increasing the probability that you will find the winning variants.

Summary of A/B Testing Strategy Assumptions

How you deal with uncertainty is one factor that could alter your A/B testing strategy. Another one is how you think about costs vs rewards. Finally, how you determine the quality and predictability of ideas is another factor that could alter your approach to A/B testing.

As we walk through various A/B testing strategies, keep these things in mind:

  • Attitudes and beliefs about information and certainty
  • Attitudes and beliefs about predictive validity and quality of ideas
  • Attitudes about costs vs rewards and expected value, as well as quantitative limitations on how many tests you can run and detectable effect sizes.

These factors will change one or both of the following:

  • What you choose to A/B test
  • How you run your A/B tests, singularly and at a program level

What Are the Goals of A/B Testing?

One’s goals in running A/B tests can differ slightly, but they all tend to fall under one or multiple of these buckets:

  1. Increase/improve a business metric
  2. Risk management/cap downside of implementations
  3. Learn things about your audience/research

Of course, running an A/B test will naturally accomplish all of these goals. Typically, though, you’ll be more interested in one than the others.

For example, you hear a lot of talk around this idea that “learning is the real goal of A/B testing.” This is probably true in academia, but in business that’s basically total bullshit.

You may, periodically, run an A/B test solely to learn something about your audience, though this is typically done with the assumption that the learning will help you either grow a business metric or cap risk later on.

Most A/B tests in a business context wouldn’t be run if there weren’t the underlying goal of improving some aspect of your business. No ROI expectation, no buy-in and resources.

Therefore, there’s not really an “earn vs learn” dichotomy (with the potential exclusion of algorithmic approaches like bandits or evolutionary algorithms); every test you run teaches you something, but the primary goal is to add business value.

So if we assume that our goals are either improvement or capping the downside, then we can use these goals to map onto different strategic approaches to experimentation.

The Three Levers of A/B Testing Strategy Success

Most companies want to improve business metrics.

Now, the question becomes, “what aspects of A/B testing can we control to maximize the business outcome we hope to improve?” Three things:

  1. The number of tests (or variants) you run (aka frequency)
  2. The % of winning tests (aka win rate)
  3. The effect size of winning tests (aka effect size)

1. A/B testing frequency – Number of Variants

The number of variants you test could be the number of separate A/B tests you run or the number of variants in a single A/B/n test – there’s debate between the two approaches – but the goal of either is to maximize the number of “at bats,” or attempts at success.

This can be for two reasons.

First, to cap the downside and manage risk at scale, you should test everything you possibly can. No feature or experience should hit production without first making sure it doesn’t worsen your business metrics. This is common in large companies with mature experimentation programs, such as booking.com, Airbnb, Facebook, or Microsoft.

Second, tinkering and innovation requires a lot of attempts. The more attempts you make, the greater the chance for success. This is particularly true if you believe ideas are fungible — i.e. any given idea is not special or more likely than any other to move the needle. My above quote from Geoffrey Miller’s “The Mating Mind” illustrated why this is the case.

Image Source

Another reason for this approach: a shitload of studies (the appropriate scientific term for “a large quantity”) have shown that most A/B tests are inconclusive, and the few wins tend to pay for the program as a whole, not unlike a venture capital portfolio.
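To make the venture-portfolio comparison concrete, here’s a minimal Python sketch (every number in it is an assumption for illustration, not data from any real program) that simulates a year of testing where most variants do nothing, some lose a little, and a few winners carry the whole portfolio:

```python
import random

random.seed(42)

def simulate_program(n_tests=100):
    """Simulate a year of tests under assumed odds: ~70% do nothing,
    ~20% lose a little (and aren't shipped), ~10% win big."""
    shipped_lift = 0.0
    for _ in range(n_tests):
        roll = random.random()
        if roll < 0.70:
            lift = 0.0                          # inconclusive / no effect
        elif roll < 0.90:
            lift = random.uniform(-0.02, 0.0)   # small loss, caught by the test
        else:
            lift = random.uniform(0.02, 0.15)   # the occasional big winner
        shipped_lift += max(lift, 0.0)          # losers don't get shipped
    return shipped_lift

print(f"Cumulative lift shipped across 100 tests: {simulate_program():.1%}")
```

The exact odds are made up; the point is the shape of the outcome. A handful of uncapped winners pays for a long tail of nulls and small losses, which is exactly the venture capital dynamic.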

Take, for example, this histogram Experiment Engine (since acquired by Optimizely) put out several years ago:

Image Source

Most tests hover right around that 0% mark.

Now, it may be the case that all of these tests were run by idiots and you, as an expert optimizer, could do much better.

Perhaps.

But this sentiment is replicated by both data and experience.

Take, for example, VWO’s research that found 1 out of 7 tests are winners. A 2009 paper pegged Microsoft’s win rate at about 1 out of 3. And in 2017, Ronny Kohavi wrote:

“At Google and Bing, only about 10% to 20% of experiments generate positive results. At Microsoft as a whole, one-third prove effective, one-third have neutral results, and one-third have negative results.”

I’ve also seen a good amount of research suggesting that the wins we do see are often illusory: false positives due to improper experiment design, or results that simply lack external validity. That’s another issue entirely, though.

Perhaps your win rate will be different. For example, if your website has been neglected for years, you can likely get many quick wins using patterns, common sense, heuristics, and some conversion research. Things get harder when your digital experience is already good, though.

If we’re to believe that most ideas are essentially ineffective, then it’s natural to want to run more experiments. This increases your chance of big wins simply due to more exposure. This is a quote from Nassim Taleb’s Antifragile (bolding mine):

“Payoffs from research are from Extremistan; they follow a power-law type of statistical distribution, with big, near-unlimited upside but, because of optionality, limited downside. Consequently, payoff from research should necessarily be linear to number of trials, not total funds involved in the trials. Since the winner will have an explosive payoff, uncapped, the right approach requires a certain style of blind funding. It means the right policy would be what is called ‘one divided by n’ or ‘1/N’ style, spreading attempts in as large a number of trials as possible: if you face n options, invest in all of them in equal amounts. Small amounts per trial, lots of trials, broader than you want. Why? Because in Extremistan, it is more important to be in something in a small amount than to miss it. As one venture capitalist told me: “The payoff can be so large that you can’t afford not to be in everything.”

Maximizing the number of experiments run also deemphasizes ruthless prioritization based on subjective ‘confidence’ in hypotheses (though not entirely) and instead seeks to cheapen the cost of experimentation and enable a broader swath of employees to run experiments.

The number of variants you test is capped by the amount of traffic you have, your resources, and your willingness to try out and source ideas. These limitations can be represented by testing capacity, velocity, and coverage.
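To make those caps concrete, here’s a rough back-of-the-envelope sketch (all numbers are illustrative assumptions, not benchmarks):

```python
# Rough testing-capacity arithmetic (illustrative numbers only)
monthly_visitors = 200_000     # eligible traffic per month
visitors_per_test = 60_000     # sample one test needs to detect your minimum effect
parallel_areas = 2             # independent pages/flows you can test at the same time

tests_per_month = (monthly_visitors / visitors_per_test) * parallel_areas
print(f"~{tests_per_month:.1f} tests per month, ~{tests_per_month * 12:.0f} per year")
```

Raising any of the three inputs (more traffic, smaller required samples, broader coverage of the site or product) raises your capacity; that’s all velocity really is.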

Image Source

Claire Vo, one of the sharpest minds in experimentation and optimization, gave a brilliant talk on this topic at CXL Live a few years ago.

2. A/B testing win rate

The quality of your tests matters, too. Doesn’t matter if you run 10,000 tests in a year if none of them move the needle.

While many people may think running a high tempo testing program is diametrically opposed to test quality, I don’t think that’s necessarily the case. All you need is to make sure your testing is efficient, your data is trustworthy, and you’re focusing on the impactful areas of your product, marketing, or website.

Still, if you’re focused on improving your win rate (and you believe you can predict the quality of ideas or improve the likelihood of success), it’s likely you’ll run fewer tests and place a higher emphasis on research and crafting “better” tests.

As I mentioned above, there are two general ways that optimizers try to increase their win rate: research and meta-analysis patterns.

Conversion research

Research includes both quantitative and qualitative methods – surveys, heat maps, user tests, and Google Analytics. One gathers enough data to diagnose what is wrong, and potentially some data to build hypotheses as to why it is wrong.

See the “ResearchXL model” as well as the approach of most CRO agencies and in-house programs. This approach is what I’ll call the “Doctor’s Office Strategy.” Before you begin operating on a patient at random, you first want to take the time to diagnose what’s wrong with them.

Patterns, best practices, and observations

Patterns are another source of data.

You can find experiences that have been shown to work in other contexts and infer transferability onto your situation. Jakub Linowski, who runs GoodUI, is an advocate of this approach:

“There are thousands and thousands of experiments being run and if we just pay attention to all that kind of information and all those experiments, there’s most likely some things that repeat over and over that reproduced are largely generalizable. And those patterns I think are very interesting for reuse and exploitation across projects.”

Other patterns can be more qualitative. One can read behavioral psychology studies, Cialdini’s Influence, or just look at other companies’ websites, take what they seem to be doing, and try it on your own site.

Both the research and the patterns approach have this in common: they inherently believe that collecting a certain quality and quantity of information can lead to better experiment win rates.

Additionally, the underlying ‘why’ of a test (sometimes called the ‘hypothesis’) is very important in these strategies. By contrast, in something like the Discipline-Based Testing Methodology, the narrative or the “why” doesn’t matter; all that matters is that the test is efficient and makes money. [4] [4.5]

3. Effect Size of A/B testing Wins

Finally, the last input is the effect size of a winning test. Patterns and research may help predict if a test will win, but not by how much.

This input, then, typically involves the most surprise and serendipity. It still requires that you diagnose the areas of exposure with the highest potential for impact (e.g., running a test on a page with 1,000 visitors is worse than running a test on a page with 1,000,000).
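Here’s a quick sketch of why exposure matters so much. It uses the standard normal approximation for a two-proportion test, and the baseline conversion rate is an assumption for illustration: the less traffic a page gets, the bigger a lift has to be before you can reliably detect it at all.

```python
from scipy.stats import norm

def min_detectable_lift(visitors_per_arm, baseline=0.05, alpha=0.05, power=0.8):
    """Approximate minimum detectable absolute lift for a two-proportion test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    se = (2 * baseline * (1 - baseline) / visitors_per_arm) ** 0.5
    return (z_alpha + z_power) * se

for n in (1_000, 100_000, 1_000_000):
    lift = min_detectable_lift(n)
    print(f"{n:>9,} visitors/arm -> detectable lift ~{lift:.2%} absolute ({lift / 0.05:.0%} relative)")
```

With a thousand visitors per arm you can only detect enormous swings; with a million, even small nudges become measurable. That asymmetry is part of what pushes low-traffic teams toward the big-swing strategies described below.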

Searching for big wins also requires a bit of “irrational” behavior. As Rory Sutherland says, “Test counterintuitive things because no one else will!” [5]

The mark of a team working to increase the magnitude of a win is a willingness to try out wacky, outside-the-box, creative ideas. Not only do you want more “at bats” (thus exposing yourself to more potential positive black swans), but you also want to increase the beta of your options: the diversity and range of feasible options you test. This is sometimes referred to as “innovative testing” vs. incremental testing. To continue the baseball analogy, you’re seeking home runs, not just grounders to get on base.

All of us want bigger wins as well as a greater win rate. How we go about accomplishing those things, though, differs.

CXL’s ResearchXL model seeks to maximize the likelihood of a winning test through understanding the users. Through research, one can hone in on high impact UX bottlenecks and issues with the website, and use further research to ideate treatments.

Andrew Anderson’s Discipline Based Testing Methodology also diagnoses high impact areas of the property, likely through quantitative ceilings, though this approach ‘deconstructs’ the proposed treatments. Instead of relying on research or singular experiences, it starts from the assumption that we don’t know what will work and that, in fact, being wrong is the best possible thing that can happen. As Andrew wrote:

“The key thing to think about as you build and design tests is that you are maximizing the beta (range of feasible options) and not the delta. It is meaningless what you think will win, it is only important that something wins. The quality of any one experience is meaningless to the system as a whole.

This means that the more things you can feasibly test while maximizing resources, and the larger the range you test, the more likely you are to get a winner and more likely to get a greater outcome. It is never about a specific test idea, it is about constructing every effort (test) to maximize the discovery of information.”

In this approach, then, you don’t just want to run more A/B tests; you want to run the maximum number of variants possible, including some that are potentially “irrational.” One can only hope that Comic Sans wins a font test, because we can earn money from the surprise.

Reducing the Cost of Experimentation Increases Expected Value, Always

To summarize, you can increase the value from your testing program in two ways: lower the cost, or increase the upside.

Many different strategies exist to increase the upside, but all cost reduction strategies look similar:

  • Invest in accessible technology
  • Make sure your data is accessible and trustworthy
  • Train employees on experimentation and democratize the ability to run experiments

The emphasis here isn’t primarily on predicting wins or win rate; rather, it’s on reducing the cost, organizationally and technically, of running experiments.

Sophisticated companies with a data-driven culture usually have internal tools, data pipelines, and center-of-excellence programs that encourage, enable, and educate others to run their own experiments (think Microsoft, Airbnb, or booking.com).

When you seek to lower the cost of experimentation and run many attempts, I call that the “Evolutionary Tinkering Strategy.”

No one A/B test will make or break you, but the process of testing a ton of things will increase the value of the program over time and, more importantly, will let you avoid shipping bad experiences.

This is different than the Doctor’s Office Strategy for two reasons: goals and resources.

Companies employing the Doctor’s Office Strategy are almost always seeking to improve business metrics, and they almost always have a very real upper limit on traffic. Therefore, it’s crucial to avoid wasting time and traffic testing “stupid” ideas (I use quotes because “stupid” ideas may end up paying off big, but it’s usually a surprise if so).  [5]

The “get bigger wins” strategy is often employed due to both technical constraints (limited statistical power to detect smaller wins) and opportunity costs (small wins not worth it from a business perspective).

Thus, I’ll call this the “Growth Home Run Strategy.”

We’re not trying to avoid a strikeout; we’re trying to hit a home run. Startups and growth teams often operate like this because they have limited customer data to do conversion research, patterns and best practices tend to be implemented directly rather than tested, and opportunity costs mean you want to spend your time making bigger changes and seeking bigger results.

This approach is usually decentralized and a bit messier. Ideas can come from anywhere — competitors, psychological studies, research, other teams, strikes of shower inspiration, etc. With greater scale, this strategy usually evolves into the Evolutionary Tinkering Strategy as the company becomes more risk averse as well as capable of experimenting more frequently and broadly.

Conclusion

This was a long article covering all the various approaches I’ve come across from my time working in experimentation. But at the end of the journey, you may be wondering, “Great, but what strategy does Alex believe in?”

It’s a good question.

For one, I believe we should be more pragmatic and less dogmatic. Good strategists know the rules but are also fluid. I’m willing to apply the right strategy for the right situation.

In an ideal world, I’m inclined towards Andrew Anderson’s Discipline-Based Testing Methodology. This would assume I have the traffic and political buy-in to run a program like that.

I’m also partial to strategies that democratize experimentation, especially at large companies with large testing capacity. I see no value in gatekeeping experimentation to a single team or to a set of approved ideas that “make sense.” You’re leaving a lot of money on the table if you always want to be right.

If I’m working with a new client or an average eCommerce website, I’m almost always going to employ the ResearchXL model. Why? I want to learn about the client’s business, the users, and I want to find the best possible areas to test and optimize.

However, I would also never throw away best practices, patterns, or even ideas from competitors. I’ve frustratingly sat through hours of session replays, qualitative polls, and heat maps, only to have “dumb” ideas I stole from other websites win big.

My ethos: experimentation is the lifeblood of a data-driven organization, being wrong should be celebrated, and I don’t care why something won or where the idea came from. I’m a pragmatist and just generally an experimentation enthusiast.

Notes

[1]

How to run an A/B test is a subject for a different article (or several, which I’ve written about in the past for CXL and will link to in this paragraph). I’ve touched on a few variations here, including the question of whether you should run many subsequent tests or one single A/B/n test with as many variants as possible. Other technical test methodologies alter the accepted levels of risk and uncertainty. Such differences include one-tail vs two-tail testing, multivariate vs A/B tests, bandit algorithms or evolutionary algorithms, or flexible stopping rules like sequential testing. Again, I’m speaking to the strategic aspects of experimentation here, less so to the technical differences. Though, they do relate.

[2]

Best practices are either championed or derided, but something being considered a “best practice” is just one more data input you can use to choose whether or not to test something and how to prioritize it. As Justin Rondeau put it, a “best practice” is usually just a “common practice,” and there’s nothing wrong with trying to match customers’ expectations. In the early stages of an optimization program, you can likely build a whole backlog off of best practices, which some call low hanging fruit. However, if something is so obviously broken that fixing it introduces almost zero risk, then many would opt to skip the test and just implement the change. This is especially true of companies with limited traffic, and thus, higher opportunity costs.

[3]

This isn’t precisely true. Andrew’s framework explicitly derides “number of tests” as an important input. He, instead, optimizes for efficiency and wraps up as many variants in a single experiment as is feasible. The reason I wrap these two approaches up is that, ideologically at least, they’re both trying to increase the “spread” of testable options. This is opposed to an approach that seeks to find the “correct” answer before running the test, and then only uses the test to “validate” that assumption.

[4]

Do you care why something won? I’d like to argue that you shouldn’t. In any given experiment, there’s a lot more noise than there is signal with regard to the underlying reasons for behavior change. A blue button could win against a red one because blue is a calming hue and reduces cortisol. It could also win because the context of the website is professional, and blue is prototypically associated with professional aesthetic. Or perhaps it’s because blue contrasts better with the background, and thus, is more salient. It could be because your audiences like the color blue better. More likely, no one knows or can ever know why blue beat red. Using a narrative to spell out the underlying reason is more likely to lead you astray, not to mention waste precious time storytelling. Tell yourself too many stories, and you’re liable to limit the extent of your creativity and the options you’re willing to test in the future. See: narrative fallacy.

[4.5]

Do we need to have an “evidence-based hypothesis”? I don’t think so. After reading Against Method, I’m quite convinced that the scientific method is much messier than we were all taught. We often stumble into discoveries by accident. Rory Sutherland, for instance, wrote about the discovery of aspirin:

“Scientific progress is not a one-way street. Aspirin, for instance, was known to work as an analgesic for decades before anyone knew how it worked. It was a discovery made by experience and only much later was it explained. If science didn’t allow for such lucky accidents, its record would be much poorer – imagine if we forbade the use of penicillin, because its discovery was not predicted in advance? Yet policy and business decisions are overwhelmingly based on a ‘reason first, discovery later’ methodology, which seems wasteful in the extreme.”

More germane to A/B testing, he summarized this as follows:

“Perhaps a plausible ‘why’ should not be a pre-requisite in deciding a ‘what,’ and the things we try should not be confined to those things whose future success we can most easily explain in retrospect.”

[5]

An Ode to “Dumb Ideas”

“To reach intelligent answers, you often need to ask really dumb questions.” – Rory Sutherland

Everyone should read Alchemy by Rory Sutherland. It will shake up your idea of where good ideas (and good science) comes from.

Early in the book, Sutherland tells of a test he ran with four different envelopes used by a charity to solicit donations. They randomized the delivery across four sample groups: 100,000 envelopes announced that they had been delivered by volunteers, 100,000 encouraged people to complete a form that would boost their donation with a 25% tax rebate, 100,000 used better quality envelopes, and 100,000 were in portrait format. The only “rational” one of these was the “increase donation by 25%” option, yet it reduced contributions by 30% compared to the plain control. The other three tests increased donations by over 10%.

As Sutherland summarized:

“To a logical person, there would have been no point in testing three of these variables, but they are the three that actually work. This is an important metaphor for the contents of this book: if we allow the world to be run by logical people, we will only discover logical things. But in real life, most things aren’t logical – they are psycho-logical.”

The post What’s the Ideal A/B Testing Strategy? appeared first on Alex Birkett.

]]>
The 7 Pillars of Data-Driven Company Culture https://www.alexbirkett.com/data-driven-company-culture/ Fri, 31 Jul 2020 17:25:58 +0000 https://www.alexbirkett.com/?p=1139 “Data-driven culture” is a phrase you hear thought leaders speak about at conferences and executives fondly bestow upon their organizations. But like “freedom,” “morality,” and “consciousness,” this elusive phrase seems to evade universal understanding. That’s to say: what the hell does a “data-driven company culture” even mean? What is a “Data-Driven Company Culture,” Anyway? A ... Read more

The post The 7 Pillars of Data-Driven Company Culture appeared first on Alex Birkett.

]]>
“Data-driven culture” is a phrase you hear thought leaders speak about at conferences and executives fondly bestow upon their organizations. But like “freedom,” “morality,” and “consciousness,” this elusive phrase seems to evade universal understanding.

That’s to say: what the hell does a “data-driven company culture” even mean?

What is a “Data-Driven Company Culture,” Anyway?

A data-driven company, in simple terms, is a company whose implicit hierarchy of values leads individuals within the company to make decisions using data. (1)

Now, there’s a lot of nuance here.

What kind of data? Who gets to use data and make decisions? Which decisions are made with data — all of them?

How Data-Driven Companies Cross the Street

Imagine I’m crossing the street, and I need to use some input or inputs to determine when and how to cross.

I could be data-driven by looking at any single data point and using that to anchor (or justify) my decisions. For instance, maybe my data point is what color the light is (green means I go, red means I wait).

I could also be data-driven by including further variables such as the speed and direction of the wind, the position of the sun, the color of the eyes of the people on the other side of the street, or perhaps most importantly, whether or not there is a vehicle careening into the intersection and putting my street crossing in danger.

Perhaps I’m not the only one crossing the street, and in fact, I’ve got to consult with a small group of friends about when we decide to cross. We each contribute our various data points as well as a heavy dose of persuasion and storytelling to convince the group of our idea on when to cross. Only when we reach an agreement do we cross the street.

Or maybe one friend of mine has much more experience crossing streets, so he takes in his data points and blends that with his experience in order to come to a conclusion. In this case, I just follow the directions of my wise friend and hope that his leadership is truly driven by good data (and not something whimsical or poorly structured, such as his being driven by the desire to get to the destination as fast as possible without regard for data points like incoming traffic).

Now crossing the street is starting to resemble a Dilbert cartoon.

I could also use data to consider which street I want to cross in the first place. If I want to get to my gym on 45th street, it doesn’t make much sense crossing a street in the other direction, even if the weather is pleasant and the street is empty.

So I say this: there’s no unified definition of a ‘data-driven company’ — it means something different to everyone.

Airbnb leads by design but clearly runs tons of experiments as well. Google famously tested 41 shades of blue. Booking.com lets every single employee run website and product experiments, and they’ve built an internal database so anyone can search archived experiments. Your local startup might consider it data-driven that they talk to customers before shipping features; and they’d be right. Any of these can be called ‘data-driven.’

While that leads us to an impasse and a sense of cultural relativism (who’s to critique another’s data-driven culture?!), I believe some companies are deluding themselves and their employees when they say they’re ‘data-driven.’ (2)

There are certain pillars a true data-driven company must have in order to implicitly and explicitly elevate data-driven decision making to the most revered importance in a culture.

The 7 Pillars of a Data-Driven Company Culture

There are two types of ‘data-driven companies’ – those who say they’re data-driven and those who actually are.

In fake data-driven companies:

  • Decisions are made top down by HiPPOs
  • Data is used to justify decisions, never to invalidate or disprove preconceived notions
  • Data integrity is never questioned and validity is presumed in all cases
  • Dashboards, reports, and charts are used for storytelling and success theater, not to drive decisions or improve decision making

I asked Andrew Anderson about what makes a company truly data-driven vs. fake data-driven, and he explained well what most companies mean:

“What most companies mean when they say they are “data driven” is that they have analytics/Business Intelligence (BI) and that they grab data to justify any action they want to do. It is just another layer on top of normal business operations which is used to justify actions. In almost all cases the same people making the decisions then use whatever data they can manipulate to show how valuable their work was.

In other words data is used as a shield for people to justify actions and to show they were valuable.”

So what’s a real data-driven culture look like? In my opinion, you need these pillars in place:

  1. Ensure Data is Precise, Accessible, and Trustworthy
  2. Invest in Data Literacy for Everyone
  3. Define Key Success Metrics
  4. Kill Success Theater
  5. Be Comfortable with Uncertainty (Say “I Don’t Know”)
  6. Build and Invest in Tooling
  7. Empower Autonomy and Experimentation

Andrew explains further:

“Actual data-driven culture is one where data is used as a measure of all possible actions. Teams are driven by how many options they can execute on, how they use resources, and how big of a change they can create to predetermined KPIs. It is used as a sword to cut through opinion and “best practices” and people are measured based on how many actions they cut through and how far they move the needle.”

Now let’s walk through each of these data-driven company pillars.

1. Ensure Data is Precise, Accessible, and Trustworthy

As with many areas of life, the fundamentals are what matters. And if you can’t trust your data quality, it’s totally worthless.

This is true both directly and indirectly.

  • Directly, if you don’t have data precision (as opposed to data accuracy, a pipe dream), your data-driven decisions will be hindered. Worse yet, you’ll be driving highly confidently in the wrong direction because of the use of data; at least with opinions you have to admit epistemic humility.
  • Indirectly, imprecise data erodes cultural trust in data-driven decisions, so with time your company will revert to an opinion-driven hierarchy.

Precise data is one facet in the foundational layer of a good data culture, but you also want to have complete data. If, for instance, you can only track the behavior of a subset of users, your decisions will be based on a sampling bias, and thus still may lead you to poorer decisions.

Finally, data access: data-driven companies have accessible data. Now, there’s a whole field of data management or data governance that seeks to delineate responsibility for data infrastructure. Perhaps not everyone should be able to write new rows to a database, but in my opinion everyone should be able to query it.

Beyond that, accessing data should be made as clear and straightforward as possible. Large companies especially should look into data cataloging, good infrastructure resources, and data literacy.

2. Invest in Data Literacy for Everyone

CFO asks CEO: “What happens if we invest in developing our people and they leave us?”

CEO: “What happens if we don’t, and they stay?”

While hiring deeply trained specialists can help spur data-driven decision making, in reality you want everyone who is using data to understand how to use it.

Most data malpractice is probably committed not out of malevolence, but out of ignorance. Without proper education and data literacy training, you can only fault the organization for such a heterogeneous distribution of data skills across the company.

For example, in an HBR article titled “Building a Culture of Experimentation,” Stefan Thomke explains how Booking.com educates everyone at the company and empowers them to run experiments by putting new hires through a rigorous onboarding process which includes experimentation training (in addition to giving them access to all testing tools).

In the same article, he covered IBM’s then head of marketing analytics, Ari Sheinkin, who brought the company from running only 97 experiments in 2015 to running 2,822 in 2018.

How’d they make the change? In addition to good tooling, it was a lot of education and support:

“He installed easy-to-use tools, created a center of excellence to provide support, introduced a framework for conducting disciplined experiments, offered training for everyone, and made online tests free for all business groups. He also conducted an initial ‘testing blitz’ during which the marketing units had to run a total of 30 online experiments in 30 days. After that he held quarterly contests for the most innovative or most scalable experiments.”

Data-driven companies invest in education for their employees. I know anecdotally that Facebook, at least at one point in time, put their growth employees through a rigorous data analytics training during onboarding. And the famous example here is Airbnb, who runs Data University to train employees in the data-driven arts.

Image Source

3. Define Key Success Metrics

Even if you have all the data you could care to access and everyone knows how to use it, people can pull vastly different conclusions from the same data if you haven’t defined your desired outcomes.

In specific instances, this can muddy the results of an A/B test. Imagine, for instance, that you run a test on a landing page flow that walks through three pages: a pricing page to a signup page and then a thank you page.

You change a variable on the pricing page and you want to track that through to increase overall signups, measured by users that reach the thank you page.

Because you want a ‘full picture’ of the data, you log multiple metrics in addition to “conversions” (or users who reached the thank you page). These include bounce rate, click through rate on the pricing page, session duration, and engagement rate on the signup page.

The experiment doesn’t lift conversions, but it lifts click through rate. What do you do?

Or it does lift conversions, but bounce rate actually increases. Does this mean it messed up the user experience?

This muddiness is why you must, before you run the experiment, define an Overall Evaluation Criterion. In other words, what metric will ultimately decide the fate of the experiment?
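Here’s a minimal sketch of what that discipline looks like in code, using the made-up pricing-page example above: the OEC (users reaching the thank-you page) is declared before the test and decides the outcome, while everything else is logged as a guardrail or diagnostic.

```python
from scipy.stats import norm

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return p_b - p_a, 2 * (1 - norm.cdf(abs(z)))

# Pre-declared OEC: users reaching the thank-you page (illustrative numbers)
lift, p_value = two_proportion_z(conv_a=1_000, n_a=50_000, conv_b=1_120, n_b=50_000)
print(f"OEC lift: {lift:+.2%}, p = {p_value:.3f}")

# Secondary metrics (CTR, bounce rate, session duration) get monitored as
# guardrails and diagnostics; they inform follow-up questions, not the verdict.
```

The numbers and the threshold are assumptions; the discipline is the point. If click-through rate moves but the OEC doesn’t, the test didn’t win.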

In broader contexts, many teams can have different incentives, sometimes piecemeal towards a similar end goal (like believing increasing traffic or CTR will downstream increase conversions), but sometimes goals are diametrically opposed. In the latter case, you’ll waste more time and energy figuring out which way to go instead of actually making progress in that direction. What you want to do is define your key success metrics and align your vectors in a way that everyone is working towards the same goals and has clear indications of progress towards them.

Image Source

4. Kill Success Theater

A culture that celebrates nothing would be soulless; a culture that only celebrates and talks about success is insidious and subtly toxic.

Success theater is, at its core, an informal operating system that says to employees: “you’re expected to win, and you should only discuss wins. Failures need not be exemplified.”

What happens when employees aren’t incentivized to honestly share negative news or results? A cascading torrent of bad stuff:

  • You limit innovation and creativity due to fear of failure.
  • You cover up potentially disruptive and damaging problems in order to save face.
  • You incentivize data cherry-picking and intellectual dishonesty, which erodes cultural trust in data and in each other.
  • You cut corners and make poor long term decisions (or even unethical decisions) in order to hit your numbers.

Again, don’t fear the champagne, but don’t punish the messenger if you see numbers that don’t look great.

Further, stop incentivizing everyone to be right and successful 100% of the time. Your deepest learnings and biggest light bulb moments come from shock, surprise, disappointment, and being “wrong.” Embrace it. The best data-driven companies would never expect to bat 1.000.

5. Be Comfortable with Uncertainty (Say “I Don’t Know”)

The opposite of a data-driven culture is one where the decision-making process is driven by HiPPOs (or worse, committee).

In “Building a Culture of Experimentation,” Stefan Thomke wrote of a radical experimentation idea at Booking.com: redesigning the entire home page. This excerpt says it all (bolding is mine):

“Gillian Tans, Booking.com‘s CEO at the time, was skeptical. She worried that the change would cause confusion among the company’s loyal customers. Lukas Vermeer, then the head of the firm’s core experimentation team, bet a bottle of champagne that the test would ‘tank’ — meaning it would drive down the company’s critical performance metric: customer conversion, or how many website visitors made a booking. Given that pessimism, why didn’t senior management just veto the trial? Because doing so would have violated one of Booking.com‘s core tenets: Anyone at the company can test anything — without management’s approval.”

Some companies want you to know up front what’s going to work and what isn’t. They won’t run an experiment if there’s not a valid reason or ‘evidence’ that suggests it has high probability of winning. Similarly, you should know ahead of the experiment which segment you want to send a personalized experience to and what the content should look like.

If this is the case, you’re leaving a lot of revenue on the table by avoiding the ‘discovery’ or ‘exploration’ phase of experimentation and data-driven decision making. In pursuit of “evidence-based decision making,” we forget that we don’t always have historical data to support or refute a case, and even when we do, it doesn’t always extrapolate to the situation at hand.

Most of the time, we fear the discovery phase because the “wrong” result might win. But as Andrew Anderson wrote of personalization, “Be open to permutation winning that you never thought of. Being wrong is always going to provide the greatest return.”

Another quote I loved from the HBR article on experimentation culture:

“Everyone in the organization, from the leadership on down, needs to value surprises, despite the difficulty of assigning a dollar figure to them and the impossibility of predicting when and how often they’ll occur. When firms adopt this mindset, curiosity will prevail and people will see failures not as costly mistakes but as opportunities for learning.”

In the same article, David Vismans, CPO at Booking.com, warns that if you don’t value being wrong you’re unlikely to successfully maintain a data-driven culture:

“You need to ask yourself two big questions: How willing are you to be confronted every day by how wrong you are? And how much autonomy are you willing to give to the people who work for you? And if the answer is that you don’t like to be proven wrong and don’t want employees to decide the future of your products, it’s not going to work. You will never reap the full benefits of experimentation.”

The ability to say “I don’t know” and embrace being wrong is the mark of a strong leader.

6. Build and Invest in Tooling

Tools are nothing without the human resources to manage them and the knowledge and education to use them.

However, you need tools, too.

For example, without an experimentation platform, how many tests can you feasibly run per year? Even if you’re hard coding tests ad-hoc each time and have the technical resources to do so, you’re clearly going to miss out on marketing experiments.

Infrastructure is massively important when it comes to data integrity, accessibility, and decision making. That HBR article on experimentation culture explains that any employee at Booking.com can launch an experiment on millions of customers without management’s permission. They say roughly 75% of its 1,800 technology and product staffers actively run experiments.

How do they accomplish this? Making tools that are easy to use by everyone:

“Scientifically testing nearly every idea requires infrastructure: instrumentation, data pipelines, and data scientists. Several third-party tools and services make it easy to try experiments, but to scale things up, senior leaders must tightly integrate the testing capability into company processes…

…Standard templates allow them to set up tests with minimal effort, and processes like user recruitment, randomization, the recording of visitors’ behavior, and reporting are automated.”

In addition to the structural tools needed to run and analyze experiments, I admire their commitment to openness and knowledge sharing. For that, they’ve built a searchable repository of past experiments with full descriptions of successes, failures, iterations, and final decisions.

7. Empower Autonomy and Experimentation

At the end of the day, data analysis is a research tool for reducing uncertainty and making better decisions that improve future outcomes. Experimentation is one of the best ways to do that.

Not only do you cap your downside by limiting the damage of a bad variant, but that risk mitigation also leads to increased creativity and therefore innovation.

If you’re able to test an idea with little to no downside, theoretically that means more and better ideas will eventually be tested.

If you’re able to decouple what you test from the clusterfuck of meetings, political persuasion and cajoling, and month long buy-in process that usually precedes any decision at a company, then you’ll also ship faster.

This makes your company both more efficient and more effective. In essence, you’ll ship less bad stuff and more good stuff, reducing losses from bad ideas and exploiting gains from good ones.

No one can predict with certainty which good ideas are good and which bad ideas are bad. Most of us are no better than a coin flip (and those with better odds should re-read Fooled by Randomness lest they get too confident).

Experimentation solves that, but culturally, it also raises the average employee’s decision making ability to the level of an executive’s by way of the great equalizer: the hypothesis test.

That’s scary for most and exciting for some, which is why everyone talks about A/B testing but very few fully embrace it.

To do so would effectively devalue the years of experience that have presumably up to this point meant that your judgement was worth much more than others’ judgement. In an A/B test, it doesn’t matter which variant you thought was going to win, it just matters what value you’re able to derive from an experiment, and how that hits the top line.

Image Source

As a director at Booking.com said in that wonderful HBR article, “If the test tells you that the header of the website should be pink, then it should be pink.”

Unlike the other pillars I’ve listed in this article, this one isn’t actually about the technical capabilities or even the educational resources you’ve built. It’s about letting go of the need to control every decision by nature of opinion, judgement, and conjecture, and instead empowering employees to run experiments and to let the data lead you to an answer (ahem, to be “data-driven” is to drive with data).

Obviously, you can still choose what to test and you can encase your experiments within principles. For example, dark patterns may win tests, but you can set up rules that state not to test dark patterns in the first place.

If it accords to your principles, though, it’s fair game. I would guide you not to limit the scope of options too much. Quote from the HBR article:

“Many organizations are also too conservative about the nature and amount of experimentation. Overemphasizing the importance of successful experiments may encourage employees to focus on familiar solutions or those that they already know will work and avoid testing ideas that they fear might actually fail. And it’s actually less risky to run a large number of experiments than a small number.”

One of my favorite illustrations of this is Andrew Anderson’s story where he ran a font style test. You’ll never guess which font won.

Image Source

As Andrew explained:

“From an optimization standpoint, Comic Sans could just as easily be called “Font Variant #5,” but because we all have a visceral hatred of Comic Sans and that does not mesh with our notions of aesthetic beauty, good design, or professional pages, we must come up with an explanation to our cognitive dissonance.

Is there anything inherently wrong with comic sans? No. But from a design perspective it challenges the vision of so many. Did testing make comic sans the better option? No. It just revealed that information to us and made us face that knowledge head-on.

If you are testing in the most efficient way possible, you are going to get these results all the time.”

In any case, I won’t pressure you to test comic sans. If you hate comic sans, don’t test it. But the point here is that a culture of experimentation is the true data-driven culture.

Conclusion

There are gradations of maturity with regards to data-driven company cultures, but the basics need to be in place: if you can’t access trustworthy data, you can’t make data-driven decisions. And if data is overridden by the opinions of tenured executives, what value is it to your company? Other than providing cover for the opinions of HiPPOs, of course.

I want to sum up with what I think is a great definition of a data-driven culture from Andrew Anderson:

“In a true data driven organization the team focuses on what the measure is they want to change. They come up with different actions that can be done to impact it, they then align resources around what can accomplish the most ways to accomplish that action. They then measure each way against each other and the best performer is picked. They then continue to align resources and re-focus after each action. There is no single person picking the action nor is there the same person measuring success. Everyone can have an idea and whatever performs best wins, no matter who backed it or what they are trying to do politically.”

1

First off, what do we mean when we say “company culture?”

Highest level definition from BuiltIn.com: “Company culture can be defined as a set of shared values, goals, attitudes and practices that characterize an organization.”

However, I don’t think this adequately describes it.

Culture is the implicit hierarchy of value in an organization. It’s the unwritten handbook of what behaviors are rewarded and admired within a company.

Some companies reward collaboration and treating coworkers like family. Some reward dry language and the absence of personality from conversation (no happy hours here). Some reward cajoling, persuasion, slide decks, and storytelling, and some reward data.

Most importantly, culture is restrictive and limiting. I love how Mihaly Csikszentmihalyi put it in Flow:

“Cultures are defensive constructions against chaos designed to reduce the impact of randomness on experience. They are adaptive responses, just as features are for birds and fur is for mammals. Cultures prescribe norms, evolve goals, build beliefs that help us tackle the challenges of existence. In so doing, they must rule out many alternative goals and beliefs, and thereby limit possibilities; but this channeling of attention to a limited set of goals and means is what allows effortless action within self-created boundaries.”

So just as much as what is rewarded, a company culture can be defined by what it explicitly outlaws as well as what it subtly frowns upon and discourages. Just to be incredibly clear, if your company frowns upon experimentation, you don’t have a data-driven company or culture.

2

“The Lady Doth Protest Too Much”

Most companies that are actually data-driven don’t incessantly and loudly talk about how data-driven they are. Just as a rich man doesn’t need to tell you he’s rich, be very wary of companies whose HR and advertising materials overly emphasize a certain cultural trait, whether that’s transparency, data-driven decision making, or creativity. Be particularly wary of anyone in a suit talking loudly about big data, data science, advanced analytics, artificial intelligence, or *shudder* digital transformation.

While there is some signal in this messaging (at the very least, it says something that they’re aspiring to these things), it’s often a bigger sign that the company is striving towards that trait but presently absent of it. This is especially rampant among bleeding-edge companies who spend a lot of time speaking at or attending conferences. This puts them in a position to say the right words and phrases to attract good talent without actually developing or investing in a culture that enables those behaviors.

Caveat emptor. Talk is cheap.

The post The 7 Pillars of Data-Driven Company Culture appeared first on Alex Birkett.

]]>
Data Literacy: 10 Lesser Known But Super Important Concepts To Know https://www.alexbirkett.com/data-literacy/ Tue, 07 Jul 2020 15:44:51 +0000 https://www.alexbirkett.com/?p=1115 You and your team could be getting much more out of your data. The purpose of data, after all, is to make better business decisions through the reduction of uncertainty.  But data has tons of limitations, and by itself has no inherent value. You need a human to interpret the data and determine a course of action. ... Read more

The post Data Literacy: 10 Lesser Known But Super Important Concepts To Know appeared first on Alex Birkett.

]]>
You and your team could be getting much more out of your data.

The purpose of data, after all, is to make better business decisions through the reduction of uncertainty. 

But data has tons of limitations, and by itself has no inherent value. You need a human to interpret the data and determine a course of action.

What is Data Literacy?

Data literacy is such a simple concept: it’s being literate when it comes to reading, understanding, creating, and analyzing data. Basically, it means you know how to work with data well.

Again, data by itself isn’t valuable. A real life human with data skills needs to work with it to make better decisions.

It’s not just for data scientists or a chief data officer (CDO). Every decision maker and anyone working with data (isn’t that everyone nowadays?!) should be data literate.

Your specific data literacy index could vary depending on the needs of the job. So data literacy training could include concepts like data analytics (understanding the tool itself or the underlying data structures), data visualizations, business intelligence tools, advanced data science, or perhaps just understanding the language of data to communicate with business leaders.

There are tons of pitfalls in the process of using and understanding data. Reduce your frustrations and data malpractices, and increase your data literacy skills, with the following less known and underrated concepts.

  1. Understand database querying and the underlying infrastructure
  2. Know and understand different data types
  3. The Utility vs. Precision trade-off
  4. Bias vs. Variance
  5. Signal vs. Noise
  6. Correlation vs. Causation
  7. Correlation vs. Correlation
  8. Narrative Fallacies and Common Biases
  9. Twyman’s Law
  10. Leaky Abstractions in Analytics Tools

1. Understand database querying and underlying infrastructure

Understanding basic database architecture and querying (SQL) concepts will help you universally, no matter which specific analytics tool you’re using at your company.

First off, if your company is of a certain size, this should be managed by a specific team through data cataloging. It shouldn’t be, though it often is, a multi-day search to find basic data tables in some organizations.

Also, databases aren’t scary (nor is SQL, the common database querying language). If you’ve worked in spreadsheets, you already understand many of the basic concepts. A VLOOKUP in Excel is kind of like a JOIN in SQL. A Pivot Table lets you GROUP BY different variables. Filters are like a WHERE clause in SQL. Formulas of course let you SUM, COUNT, etc. — all of which you can do with SQL as well.
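If you’d rather see that mapping in code than in a spreadsheet, here’s a small pandas sketch with toy data (merge stands in for VLOOKUP/JOIN, boolean filters for WHERE, and groupby for Pivot Tables/GROUP BY):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [120, 80, 200, 50],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["US", "EU", "US"],
})

# VLOOKUP / JOIN: attach each order's region
joined = orders.merge(customers, on="customer_id", how="left")

# WHERE: keep only US orders
us_orders = joined[joined["region"] == "US"]

# Pivot Table / GROUP BY with SUM and COUNT
summary = us_orders.groupby("region")["amount"].agg(["sum", "count"])
print(summary)
```

The mental model transfers in both directions: once GROUP BY and JOIN feel natural here, reading (or writing) the equivalent SQL is mostly a syntax exercise.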

Image Source

To start, take a quick SQL course. There are free ones online from Udacity and Codecademy, or you can do a quick day course at General Assembly or something. Helps to learn the syntax and get some practice, but eventually you’ll want to query real life databases.

I find theory and practice datasets can hit quick diminishing returns, and eventually you’ve gotta play with some skin in the game.

Beyond databases and generic SQL knowledge, I recommend taking a course or reading the documentation on your analytics tool of choice. If your organization uses Adobe Analytics, it would help to really understand how that data is collected and surfaced. I dove deep in Google Analytics and it has helped me a bunch at HubSpot (and now with agency clients as well).

Knowing the differences in Google Analytics Scopes will let you understand, for example, why your “All Pages” report in Google Analytics (hit-scoped) doesn’t show conversion rates (which are session-scoped).

Knowing the difference in how a user, session, hit, etc. are calculated matters a lot for how you interpret that data. Best GA course to take is from CXL Institute, hands down.

2. Know Different Data Types

Knowing how data is stored is great, but even better is to grok the core underlying data types.

At a high level, data types branch off into either:

  • Categorical data
  • Numerical data

Categorical data represents characteristics. They can also be represented numerically (think 1 for a dog and 0 for a cat, or something like that).

Categorical data splits off into two more branches:

  • Nominal
  • Ordinal

Nominal data are discrete variables that don’t have quantitative value (“which college did you attend?” would create nominal categorical data).

Ordinal data represents discrete and ordered units (the same idea as nominal, but order matters; think “education level” with values like “high school,” “bachelor’s,” “master’s,” etc.).

Image Source

Numerical data can be discrete (conversion rate, or number of heads in 100 coin flips) or continuous variables (things like height or temperature — these can be further broken up into ratio or interval variables). The difference here can be confusing so here’s a post to explain.
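Here’s a small pandas sketch of those distinctions with invented data: nominal categories, ordered (ordinal) categories, and numeric columns each get a different treatment, which changes which operations make sense on them.

```python
import pandas as pd

df = pd.DataFrame({
    "college": ["State U", "Tech", "State U"],       # nominal categorical
    "education": ["high school", "bachelor's", "master's"],
    "height_cm": [170.2, 182.5, 165.0],              # continuous numerical
    "conversions": [3, 0, 1],                        # discrete numerical
})

df["college"] = df["college"].astype("category")
df["education"] = pd.Categorical(
    df["education"],
    categories=["high school", "bachelor's", "master's"],
    ordered=True,                                    # ordinal: order matters
)

print(df.dtypes)
print(df["education"].min())   # ordering makes min/max meaningful
```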

Why does this matter? If you ever plan on designing a customer survey or doing cursory exploratory data analysis, you’ll benefit from understanding basic data types.

That’s the academic stance, but in a more pragmatic stance, analytics tools (including programming languages like R and Python and tools like Excel) also have distinct data types, and these matter for both formatting and analysis.

For example, R has 6 main data types:

  • character: “a”, “swc”
  • numeric: 2, 15.5
  • integer: 2L (the L tells R to store this as an integer)
  • logical: TRUE, FALSE
  • complex: 1+4i (complex numbers with real and imaginary parts)
  • raw: as.raw(2) (raw bytes)

If you don’t understand data types, you’ll get a lot of “error” messages when you start coding.

3. Utility vs. Precision Tradeoff

Data analysis is a method by which we attempt to reduce uncertainty, and uncertainty can never be completely eliminated.

Therefore, there comes a point of diminishing marginal utility with increased precision of the model.

In other words, collecting more data or more accurate data matters up to a certain point; beyond that point, additional data or accuracy adds very little value, while the cost of collecting the data or increasing the precision becomes very high.

For example, an A/B test is the gold standard of causal inference, but you can never be completely 100% confident that a variant is truly better than the original.

That’s why we use inferential statistics (and why you should grok what a P Value means).

Running an A/B test for 2-4 weeks with a P value threshold of <.05 tends to be an acceptable level of uncertainty, versus the hyperbolic flip side of just running an experiment for eternity (which costs you in “regret”: the value you missed out on by not exploiting the optimal variant more quickly).
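The diminishing returns are easy to see in a few lines. The baseline rate below is an assumption and the intervals use the normal approximation, but the shape holds generally: each 10x increase in sample size only shrinks your uncertainty by a factor of about three.

```python
from scipy.stats import norm

baseline = 0.05            # assumed conversion rate
z = norm.ppf(0.975)        # 95% confidence

for n in (1_000, 10_000, 100_000, 1_000_000):
    margin = z * (baseline * (1 - baseline) / n) ** 0.5
    print(f"n={n:>9,}  95% CI: {baseline - margin:.3%} to {baseline + margin:.3%}")
```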

There’s a point at which “more data” is not only not beneficial, but may harm you.

4. Bias vs. Variance

This is a cool concept from Machine Learning, very related to the modeling trade-off above between the overly simple and the overly precise.

This is from TowardsDataScience:

“What is bias?

Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. Model with high bias pays very little attention to the training data and oversimplifies the model. It always leads to high error on training and test data.

What is variance?

Variance is the variability of model prediction for a given data point or a value which tells us spread of our data. Model with high variance pays a lot of attention to training data and does not generalize on the data which it hasn’t seen before. As a result, such models perform very well on training data but has high error rates on test data.”

There’s always a trade-off, no perfect model (“all models are wrong; some are useful”). One favors simplicity and has a high rate of Type I errors, the other overcomplicates things as misses things it should include (Type II errors)

5. Signal vs. Noise

Personally, I believe most data we look at is actually noise.

While humans constantly seek patterns and explanations, I think more of what we see can be described by randomness than we’d imagine. Whether it's regression to the mean, seasonality (misunderstood and over-invoked, by the way), Type I (or Type II) errors, or just a tracking error, a lot of what looks peculiar (especially when we’re seeking and incentivized to seek peculiar “insights”) is just noise.

Your goal is to find the signal, data that helps inform action and better decision making.

This is easier in some contexts than others.

For example, running a controlled experiment on a landing page with only paid acquisition traffic is a highly controlled environment, and you can use tools like statistical significance, statistical power, and confidence intervals to infer whether or not what you’re seeing is signal or noise.

Tracking time-series variables across a website and with little experience parsing out seasonality and random factors, you’re constantly at the whim of noise. This leads you to see patterns where they may not exist, and at the worst case, that leads you to take actions or make decisions based on illusions.

Image Source

How much variance is expected in the data environment you’re working in? Does this observation fall outside of the expected margin of error? If so, how can we explain it, and if possible, can we run experiments to counter-intuit our way to a future action?

(Side note: noise and the storytelling that follows it is why I generally hate the use of things like click and scroll heat maps in decision making, as well as overly frequent causal analyses on time series or extrapolating too much meaning on tiny segments of users).
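One way to calibrate your intuition is to look at pure noise and notice how much “story” it seems to contain. A quick sketch in base R, with no real data involved:

# Four fake KPIs: each is just a random walk with no underlying signal at all.
set.seed(7)
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(cumsum(rnorm(90)), type = "l",
       xlab = "day", ylab = "metric", main = paste("Fake KPI", i))
}
# Most of these will appear to trend up or down, dip, or "recover" -- and a
# motivated storyteller would have no trouble explaining why.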

6. Correlation vs. Causation

This is elementary stuff. Just because one thing happens while another thing happens doesn’t mean they’re related (though it also doesn’t mean they’re not related — sorry to confuse things, let’s move on).

The more you look for correlations, the more they appear. That’s why “Swimming in the Data” is such a tragedy, and you need to ask clear business questions.

This is actually a much deeper topic than it appears on the surface, though in your average business role you probably don’t need to dive down the causality rabbit hole, as it gets quite philosophical (seeking answers to the epistemological question, “how does one truly know anything?”).

If you do want to go down that rabbit hole, The Book of Why by Judea Pearl is a cool non-technical book on the subject. But you may also wind up reading Hume and Popper and whatnot too (don’t blame me if you fall down that rabbit hole).

7. Correlation vs. Correlation

I’d even be careful with correlation in many cases. While they can be great starting points for further business questions and probing, they can be horrible as conclusions, and in summary statistics, they can often mislead.

See Anscombe’s quartet:
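The quartet actually ships with base R, so you can reproduce it yourself in a few lines:

data(anscombe)   # four x/y pairs with nearly identical summary statistics
sapply(1:4, function(i) round(cor(anscombe[[paste0("x", i)]],
                                  anscombe[[paste0("y", i)]]), 3))
# All four correlations come out around 0.816. Now plot them:
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
# Same correlation, wildly different shapes: a line, a curve, an outlier, and
# a near-vertical cluster with one leverage point.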

It’s important to understand concepts like variance and outliers.

8. Narrative Fallacies and Common Biases

Cognitive biases are the subject of many great books. Some of my favorites include:

Say what you will about “rationality” (I find it overrated in many cases), humans are prone to fooling themselves, especially in our world of increasingly big data.

There’s no way you’ll ever stop fooling yourself (this is called “bias blind spot”), but knowing a few common fallacies and biases that have to do with data will help prevent you from at least some percentage of misreading and mistakes.

  • Narrative fallacy (really a series of fallacies that have to do with connecting disparate and unrelated dots in an attempt to form a cohesive pattern or narrative)
  • Hindsight bias (overestimating your ability to have predicted an outcome that could not possibly have been predicted. “I knew it!”)
  • Texas Sharpshooter Fallacy (repainting the target after the data points have accumulated to make random data points appear causal and clustered/related. Also known as clustering illusion or cherrypicking.)

I find these biases and fallacies endlessly fascinating. No matter what, we’ll all still fall into them from time to time, but learning about them can help mitigate that and can help you see it when it happens in your organization.

9. Twyman’s Law

Twyman’s law states that “Any figure that looks interesting or different is usually wrong.”

So if you see an A/B test reporting a 143%+ lift, I’d ask questions. What was the sample size? The segment? How long did the test run? Was there peeking at the data or p-hacking?

Not saying all surprising data is wrong, but it’s often the case that it is.
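As a quick illustration with made-up numbers, here's what a “143% lift” can look like under the hood when the sample is small:

# Hypothetical test: 17 vs. 7 conversions on 500 visitors per arm
# (3.4% vs. 1.4%, i.e. a reported lift of roughly 143%).
prop.test(x = c(17, 7), n = c(500, 500))
# The 95% confidence interval on the difference is enormous relative to the
# 1.4% baseline -- consistent with anything from no real effect to a massive
# one. Eye-popping lifts from small samples are usually noise or a bug, not a
# breakthrough.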

In short, skepticism and critical thinking are your best friends when it comes to data.

As my friend Mercer says, “trust, but verify.”

10. Leaky Abstractions in Analytics Tools

The platform you’re using for data has abstracted away some level of the underlying function of the tool. That’s why you bought the platform — so you don’t have to develop everything from scratch.

However, with that abstraction comes a loss in comprehension of necessary components of the system. In a great paper by Lukas Vermeer of Booking.com, he explains that experimentation platforms abstract away such aspects of experimental design as:

  • statistical units and tracking (how do you determine a unit of observation and log it)
  • sampling (how do you randomize and sample from your population)
  • definition and implementation of metrics (how are metrics grouped and defined)
  • business meaning of metrics

Yet, “the experimenter needs to be well-informed of their intricacies in order to base a sensible decision on the collected data.”

In short, using tools helps democratize data, but you still need experts with understanding of the underlying structure of the system.

If you use Google Analytics, you need someone on your team who understands how the system works and how the data is structured.

You should understand how your A/B testing tool randomizes users, logs units of observation, defines metrics, and performs statistical inference.
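For instance, here are two checks you can run yourself in base R, using hypothetical counts exported from whatever tool you use:

# 1) Sample ratio mismatch: did the intended 50/50 split actually come out 50/50?
chisq.test(c(10242, 10260), p = c(0.5, 0.5))
# A p-value near 1 here is fine; a tiny one suggests the randomization or the
# logging is broken and the test results shouldn't be trusted.

# 2) Re-run the inference yourself on the raw conversion counts rather than
# trusting the tool's dashboard math:
prop.test(x = c(520, 588), n = c(10242, 10260))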

In short, there’s no shortcut. Skip the deep understanding of your tools and you’re liable to misunderstand the data they provide, which could lead you to worse outcomes or spending a ton of time and money fixing them (which defeats the utility of data in the first place!)

Conclusion (and Further Data Literacy Resources)

Data literacy is a strange term. In fact, I actually didn’t know it was a term until I started doing keyword research after I had already written this article. After learning that it basically means “knowing how to understand and interpret data,” I figured it was a good fit for this article. However, there’s no end point to data literacy. It’s a constant journey where the more you learn, the more you realize you still don’t know.

In any case, I want to leave you with some of my favorite resources for learning more about data:

The post Data Literacy: 10 Lesser Known But Super Important Concepts To Know appeared first on Alex Birkett.

]]>
The 21 Best CRO Tools in 2025 https://www.alexbirkett.com/cro-tools-conversion-optimization/ Fri, 08 May 2020 20:31:51 +0000 https://www.alexbirkett.com/?p=1055 What we call conversion rate optimization is actually an expansive suite of distinct functions that blend together to form this art-and-science craft of CRO. CRO includes components of digital analytics, experimentation (A/B testing), user psychology, project management, copywriting, design, and UX research. Nowadays, I look at it as “website product management.” We’ve all got our ... Read more

The post The 21 Best CRO Tools in 2025 appeared first on Alex Birkett.

]]>
What we call conversion rate optimization is actually an expansive suite of distinct functions that blend together to form this art-and-science craft of CRO.

CRO includes components of digital analytics, experimentation (A/B testing), user psychology, project management, copywriting, design, and UX research.

Nowadays, I look at it as “website product management.”

We’ve all got our preferred products, and this list is no different: it’s largely based on my own extensive experience optimizing website experiences.

Some of these will include affiliate links, which if you click and sign up for the product, might result in me getting paid. This is a win-win, because you get a good new tool and I get paid. I promise I won’t change my list based on how well these affiliate programs pay.

The 21 Best CRO (Conversion Rate Optimization) Tools

  1. Google Analytics
  2. Google Tag Manager
  3. Amplitude
  4. R & SQL
  5. HotJar
  6. Qualaroo
  7. TypeForm
  8. Google Forms
  9. Balsamiq
  10. Convert
  11. Optimizely
  12. Conductrics
  13. Evolv
  14. Instapage
  15. Unbounce
  16. UserTesting
  17. Wynter
  18. 5 Second Test
  19. Pingdom
  20. CXL Institute
  21. CRO Books

1. Google Analytics (GA4)

At the core of optimization lies measurement. While you can get a good read on weight loss by looking in the mirror, website optimization benefits from a bit more precision, a proverbial scale.

Digital analytics is an old industry with a graveyard of historical solutions, most of which led to the nearly ubiquitous use of Google Analytics today. It’s used widely because, in its basic form, it’s free, and the free version offers an immense level of value.

Beyond that, it’s also somewhat easy to understand out of the box, and the advanced features can satisfy the esoteric end of analytics purists.

If you’re a conversion optimizer, it would be foolish not to learn and understand Google Analytics. Learn its data model. Learn the lexicon and how the data is being tracked. Learn the basic building blocks of a set up, like goal tracking, event tracking, and advanced segmentation.

Not only will Google Analytics give you a good quantitative basis to diagnose website problems and opportunity areas, it’ll likely be the solution where you eventually analyze your experiments and treatments.

You can’t go wrong taking a Google Analytics course or two and setting up the free version on your website.

And look, everyone seems to hate GA4, but it’s primarily because they haven’t learned how to use it. The interface is wonky, but if you learn how to port your data into BigQuery and Looker Studio, it’s still amazing.

2. Google Tag Manager


Google Tag Manager is Google’s tag management solution.

A tag manager is basically what it sounds like: it lets you manage and deploy various ‘tags’ or scripts you execute on your website. These could be simple third party tools that you deploy with a javascript snippet (for instance, HotJar). You can also set up advanced tracking solutions in Google Analytics using Tag Manager.

My former boss and mentor Peep Laja told me early on in my CXL days, “if you want to 10x your value as a growth marketer, learn Google Tag Manager.” He wasn’t wrong.

GTM, wielded by someone with the skill level of Simo Ahava, grants a level of near digital omniscience. The fringe cases are unlimited and expanding continuously. But even if you just deploy your 3rd party tooling and manage it via Tag Manager, you’ll get more than your requisite value from the tool.

If you’re new to tag managers, it can take some learning to ramp up on terminology and how things are set up, but there are a variety of great courses, including some basic materials from Google themselves.

3. Amplitude

Amplitude specializes in product analytics, everything that happens post sign up. This is one of the more popular tools in SaaS, and I’ve spoken to some consumer marketing leaders who use the tool for their product analytics as well.

Amplitude is wonderful because it “productizes analysis,” or in other words, it builds common analyst techniques into the platform itself so you don’t have to go through leaps and bounds to export, transform, load, clean, and analyze your data using other tools. You can view cohorts and run regressions right in the tool.

Product analysts can easily find correlative events that predict desired goals, view the success rates of various cohorts, and run what basically amounts to SQL queries within the tool itself. Good “level up” on granularity past the typical Google Analytics setup.

Similar solutions to Amplitude include KissMetrics, MixPanel, and Woopra.

4. R & SQL

Despite the abundance of analytics tooling, I’ve found more value from learning R and SQL than anything else on this list.


That Maslow quote above about every problem looking the same if you only have one tool? That’s wildly common in CRO and analytics. If all you have is Google Analytics, well, you’re going to have a lot of session-based waterfall charts and channel grouping pie charts.

R lets you break free from the paradigms of a tool’s data model and clean, organize, and analyze data your own way. It’s got great built in statistics libraries, which are particularly appropriate for A/B testing analysis. It’s also a fully fledged programming language, so you can use it to scrape web data, automate boring tasks, build data visualizations, and even host interactive applications using Shiny.

SQL is the lingua franca of data. One of the smartest data scientists I know, Begli Nursahedov, told me learning SQL is the highest leverage skill you can learn. It’s useful at any organization, and at its core, it will help you better understand the data you’re collecting at a foundational level.

Clearly, these aren’t “CRO tools” in the same sense as the other SaaS solutions on this list, but I can’t pass them up in importance.

5. HotJar

HotJar is my go-to qualitative data analysis tool. Where Google Analytics and the above tools help you diagnose the “what” on your website, HotJar’s suite of qualitative tools can help you add some color to the quantitative trends. Typically, people refer to this as helping to answer the “why.”

  • Why are website visitors struggling to finish the checkout experience?
  • Why are mobile users or first-time visitors underperforming?
  • Why aren’t our CTAs being clicked?
  • What does the customer journey and user behavior look like from the first pageviews through to the end purchase?

While no tool can fully answer these questions, HotJar has several features – heat maps, session replays, form analytics, surveys and polls – that help you look in the right direction for experiment ideas and solutions.

I love that it’s a full solution, a sort of all-in-one qualitative analytics platform. Before HotJar you had to buy Crazy Egg, SurveyMonkey, KissMetrics, and ClickTale for session recording data just to get the basics. It’s just fun to use as well; great user experience.

6. Qualaroo

Quick confession: I’m not a huge fan of heat maps. Mostly noise and colorful illustrations to tell stories in my mind. My favorite qualitative tools are the ones that allow you to better probe on business questions that matter.

Sure, surveys and polls can be misused as well, but a well-worded survey can crack into that Rumsfeldian sphere of “you don’t know what you don’t know.”

To that end, Qualaroo is the best in breed software to design and deploy on-site polls and surveys. They’ve got the highest end targeting, integrations, segmentation capabilities, and logic jumps. Many alternative tools exist, but Qualaroo is my favorite.

7. TypeForm


I’m a TypeForm fanboy and power user. Their product itself is a delight, perhaps one of the only true “product led growth” companies despite everyone now claiming to operate as such.

Clearly they care deeply about customer experience and the way you feel when you use the product. That’s true of both the survey designer and the survey taker.

For some reason, taking a TypeForm survey is an order of magnitude easier than any other tool.

Anyway, not to gush too much more: TypeForm is the best survey design product I know of. It’s so flexible, and I run almost all of my user research through a TypeForm when doing CRO work.

8. Google Forms

Google Forms is free, so I use it sometimes. It’s great when you don’t need to layer on elaborate targeting and logic parameters or for internal form submission needs.

For me, Google Forms is quick and dirty; TypeForm is for when you want to do it right.

However, Google Forms does benefit from native integrations with other Google Drive products. So you can easily set up Google Sheets to receive submissions (you can do this in other tools, but typically it requires some set up).

9. Balsamiq


I’m not a visual designer, but Balsamiq gives me an easy and effective platform to design wireframes that communicate my vision for landing pages, home pages, checkout flows, or other digital experiences.

Simply put, it’s the easiest wireframing tool I’ve found for a non-designer to use.

Wireframes are all about communication, not pristine detail. I usually sketch something out on a whiteboard or pen and paper, then draw it up in Balsamiq, and then send it to designers who bring it to life (and then send it to A/B test developers to get that up and running).

If you’re more on the design side of CRO, you’ll likely explore more robust prototyping solutions like Invision or Figma or design tools like Adobe Creative Suite or Sketch.

For me, I get tons of ROI from Balsamiq.

10. Convert

A/B testing tools!

This is the meat and potatoes of conversion optimization, isn’t it?

Yes, and no. It’s clearly a myth that CRO = A/B testing, but for most programs with sufficient traffic volume, A/B testing is the gold standard for determining the effectiveness of a given user experience.

Convert is my favorite “all purpose” testing platform, for a few reasons:

  • It’s feature rich and goes toe to toe with Optimizely and other higher priced solutions in most features that companies typically use
  • It’s much more affordable
  • The team and customer service are leagues above other products
  • Great documentation and education materials
  • Transparent in their tracking and how they operate
  • Privacy focused and forward thinking.

Convert may not have some of the more advanced features of Optimizely or other personalization tools, but for the vast majority of companies, it will satisfy your experimentation needs.

In addition to Convert, I really love VWO (also known as Visual Website Optimizer). VWO has the bonus of including other CRO and analytics tools like session replays, heat and scroll maps, polls etc.

11. Optimizely

Optimizely is the biggest name in A/B testing nowadays, and for good reason: they’ve trained a generation of people on how to run A/B tests (for better or for worse – many have rightly argued that they’ve botched their statistics education and made it seem far too easy to simply set up and analyze an experiment with no statistics knowledge).

They used to have a free tier and several more affordable options, although they’ve since drastically moved up-market. This move is prohibitive for many companies in terms of pricing, but it has also brought more advanced features like server side experimentation, predictive targeting and personalization, and feature flags for product teams.

12. Conductrics

Conductrics is actually my favorite experimentation platform, though it’s probably best reserved for more advanced practitioners.

To start, Conductrics gives you options to design, deploy, and analyze experiments exactly how you’d like to, whether that’s client- or server-side, using a WYSIWYG editor or not, or analyzing the experiment with a one- or two-tailed t-test or Bayesian statistics. You can also run multi-armed bandit experiments, an interesting option with different use cases than your typical fixed-time-horizon A/B test.

It’s also got powerful predictive pooling and targeting. In other words, when you’re running new variants, it will detect segments of your user base that respond particularly favorably and you can run arms of that experience to target that population.

It’s one of the more powerful experimentation platforms, my go-to choice all things considered.

Bonus: Google Optimize

Google Optimize is one of my least favorite A/B testing platforms in direct comparison with all the others, but it’s free, so it’s a great learning tool or way to get tests live if you don’t have the budget to spend.

Despite my smack talk, it does the basics. You can safely randomize users, stamp them with custom dimensions to analyze the data in Google Analytics, and even use these dimensions to do interesting integrative campaigns with Adwords or display ads. The native integrations with other Google tools are the real treat.

One note is that you should absolutely pull the data to a separate platform to analyze; the statistics in play are quite black box/opaque.

Note: Google Optimize was sunset in September 2023 and is no longer available. I compiled GO alternatives here.

13. Evolv

Evolv is another experimentation tool but of a different flavor. They deploy ‘evolutionary algorithms’ in order to splice together the ‘genes’ from your different creative and variants. These undergo transformations over ‘generations’ and evolve to produce the highest performing combination of creative.

That’s a really simplistic explanation, but for the most part pretty accurate. It’s a machine learning based optimization tool that is designed to rapidly explore different patterns and ideas.

I love it. Especially when you’re in the “discovery” or “exploration” stage of optimization, this tool can let you throw ideas together much faster and more efficiently than sequential A/B tests or even more advanced designs of experiments like factorial design.


14. Instapage

Not necessarily a conversion rate optimization tool, but landing page builders are clearly part of the arsenal of web strategists. Few CRO conversations occur without the words “landing page” thrown in.

Landing pages are just dedicated website pages. They exist to serve a conversion-oriented purpose, be it lead generation or simply a product purchase.

At scale, landing pages allow you to test your messaging and creative beyond the website, since you can tie-in ad targeting and testing. In fact, that’s why I love Instapage – it was built for high output advertisers and optimizers.

With sophisticated personalization features, easy templatization and creative management, and a fairly easy to use editor, this thing can get marketers really cranking on campaigns without the bottlenecks apparent in most developer-heavy environments.

15. Unbounce

Unbounce is my other favorite landing page builder, and in fact, I use it much more frequently. It’s got integrations with most popular marketing technology solutions, so you can pipe your leads directly into Mailchimp or whatever email tool you use.

It’s also got a lot of the same templatization features that allows for high scale creative testing.

I find Unbounce pretty easy to use, though the WYSIWYG editor does get buggy. I’d almost prefer it to be a little bit *more* developer friendly, as the marketer-friendliness seems to bring a lack of precision.

16. UserTesting

UserTesting is the best, you guessed it, user testing platform in the world! You could conduct a poor man’s user test and go to a coffee shop and have people try your website. Or, you could just do the easy thing and pay UserTesting to find you a qualified panel of users to run through your digital experience.

Of course, you can run moderated or unmoderated user tests.

User testing is an absolutely critical component of website strategy and conversion rate optimization, and I wouldn’t start a job without this tool in my arsenal.

17. Wynter

Wynter is a new player, a user testing software specifically designed to help you optimize website copywriting.

Copywriting is the last bastion of “I feel like ___,” mainly because it’s hard to get quantified data at a granular level (caveat: of course you can run a controlled experiment, but you’re still choosing *which components* to test by gut). This tool lets you know which phrases and words to look into and gives you insight on how to improve the copy on a page.

18. 5 Second Test

Another qualitative research staple, Five Second Test is an old tried and true piece of software that helps you test the clarity of your messaging. It is what it sounds like: you flash your page in front of a panel for 5 seconds and they explain what it is trying to say. You’d be surprised at how unclear your copy is (or at least, I’m surprised at how unclear my copy is much of the time).

Simple tool, profound impact. Try it on your homepage and on your value proposition.

19. Pingdom

Page speed is clearly important. Some studies at Microsoft and other juggernauts have pinpointed the value in mere millisecond page speed increases (however, other studies have not shown such sensitivity, though perhaps due to less statistical power).

Anyway, it’s simple logic: the faster a page loads, the better the user experience. The better the user experience, the more money you make. Heuristics, but mostly true.

Pingdom is a site speed tester, among other performance monitoring products. It also gives you suggestions on how to fix your page speed.

20. CXL Institute

CXL Institute is an education platform that trains digital marketers, product managers, analytics, and UX professionals.

Full disclosure: I’m biased, since I was in the room before, during, and after the CXL Institute launch and helped coordinate a lot of the early education programs. Still, I believe it’s without peer and the absolute best place you can learn about CRO. Nothing compares. There are other programs that may dive deep on specialties (e.g. statistics), but taken as a whole, nothing will set you up for a CRO career better than CXL.

I still keep up to date on their courses because those that stagnate fall behind, and I don’t want to fall behind. If you don’t want to fall behind, check out the Institute.

21. CRO Books

Cheesy chorus at this point, but CRO isn’t about the tools, it’s about the people and their know-how. Give an amateur Conductrics and an Adobe Analytics setup, and it won’t amount to much. But a master optimizer could make do with freemium tools and still kick back an ROI.

I like courses, but I really like books. Here are some of my favorites to get you started:

I also wrote two entire blog posts outlining my favorite CRO books and my favorite A/B testing books.

Conclusion

Conversion rate optimization is an art, a science, an operating system, and a good reason to go down the never ending rabbit hole of marketing technology.

I’ve got my preferred solution, but – and I genuinely mean this – send me your new and underrated tools that I missed. Throw a comment below. Email me. Doesn’t matter. I wanna know what’s going on in this space that I might be missing.

Otherwise, hope you enjoyed this list! Now go read my article on A/B testing guidelines.

The post The 21 Best CRO Tools in 2025 appeared first on Alex Birkett.

]]>
Mo Data Mo Problems? When More Information Makes You More Wrong https://www.alexbirkett.com/too-much-data-problems/ Tue, 26 Nov 2019 02:59:48 +0000 https://www.alexbirkett.com/?p=897 More data isn’t necessarily better, and in fact, sometimes more data leads to much worse decision making. We’re all trying to make better business decisions (I hope, at least). All decisions include some level of uncertainty. Properly collected and analyzed data can reduce that uncertainty, but never eliminate it. However, in practice, I’ve noticed many ... Read more

The post Mo Data Mo Problems? When More Information Makes You More Wrong appeared first on Alex Birkett.

]]>
More data isn’t necessarily better, and in fact, sometimes more data leads to much worse decision making.

We’re all trying to make better business decisions (I hope, at least). All decisions include some level of uncertainty. Properly collected and analyzed data can reduce that uncertainty, but never eliminate it.

However, in practice, I’ve noticed many executives and thought leaders leaning on data like the proverbial drunkard and the lamppost (using it for support rather than illumination). This can lead to both over-certainty and inaccurate decision making, the combination of the two being quite deadly.

Image Source

I absolutely love data – collecting it, analyzing it, visualizing it. It’s probably my favorite part of my job.

But as marketers and analysts, it would be irresponsible if we didn’t speak realistically about the shortcomings of data, particularly when it comes in very large quantities.

When is ‘Too Much Data’ a Bad Thing?

There’s generally a tradeoff between accuracy and utility. In other words, collecting more data can be costly or can incur costs in how complex the dataset is, but that tradeoff may be worth it if you can gain a greater degree of precision in your measure.

In the easy example of an A/B test, you clearly want to collect enough data to feel confident enough to pull the switch and make a decision. However, there’s a lot of nuance here, even: the consequences of some decisions can be much greater than others, therefore the cost of collecting the data can be outweighed by the importance of making the right decision.

For example, it’s much more important that you not make the wrong decision when designing a bridge or researching a medicine than it is when you’re trying out new CTA button colors.

So right off the bat, we can establish that the value of data also depends on the utility it provides, given the impact of the decision.

Beyond that, sometimes collecting more data isn’t just wrong in the sense that it is costly; sometimes more data makes it less likely you’ll actually make the right decision.

Here are the five times when that is the case:

  1. When we’re tracking the wrong thing
  2. When we’re incorrectly tracking the right thing
  3. When you’re able to find spurious correlations because of “swimming in the data”
  4. When the cost of data collection supersedes its utility
  5. When what we’re tracking is unmeasurable and we’re using data to save face

A Confidence Problem: Boldly Walking in the Wrong Direction?

The underlying theme in all of these is that more data leads to greater confidence in decision making, and making a bad decision with great confidence and ‘the data on your side’ is more dangerous than acknowledging the uncertainty in the decision.

Saying “I don’t actually know, but I have a hunch,” gives you the freedom to pivot upon receiving new data, which is a form of optionality. The opposite is when you commit too heavily to a poor decision due to misinterpreting the data. The more data that backs up your wrong decision, the more likely you are to zealously pursue it.

A real benefit to acting without data, or without much data, is we are forced to acknowledge the inherent uncertainty involved in the decision. When we have too much data, we’re often placated by the numbers. We believe the room for error is much smaller than it really is.

I’ll walk through each of these in detail through stories, quotes from smarter people than myself, and also technical explanations where applicable.

1. You’re Measuring the Wrong Things

The first mistake is when you make decisions on data that is actually tracking the wrong things.

Andrea Jones-Rooy, Professor of Data Science at NYU, gave the example of using data to make better hiring decisions. Here’s how she put it:

“Very few pause to ask if their data is measuring what they think it’s measuring. For example, if we are looking for top job candidates, we might prefer those who went to top universities.

But rather than that being a measure of talent, it might just be a measure of membership in a social network that gave someone the “right” sequence of opportunities to get them into a good college in the first place.

A person’s GPA is perhaps a great measure of someone’s ability to select classes they’re guaranteed to ace, and their SAT scores might be a lovely expression of the ability of their parents to pay for a private tutor.”

“What to measure” is a common topic in conversion rate optimization, as your impact will depend on the yardstick by which you measure yourself. If you’re optimizing for increased click through rates, you may not be improving the bottom line, but simply shuffling papers.

Similarly, we often try to quantify the user experience and tend to choose between a few different metrics – NPS, CSAT, CES, etc. – even though all of these things measure completely distinct things, none of which encompass the entire user experience.

What you track is highly important and shouldn’t be overlooked. If you’ll use a metric to make a decision in the future, put in the time to make sure it means what you think it means (this, of course, is why the Overall Evaluation Criterion, the North Star Metric, the One Metric That Matters, etc., are all such big points of discussion in our respective fields).

Practical aside, you can ignore the following: bounce rates, time on site, pages per session (unless you sell ads), click through rate, pageviews, social shares, “brand awareness” (whatever that means), and whatever other vanity metrics you use to tell stories about your work.

Tools and strategies that involve “tracking everything” are wrong because of this reason: you introduce so much noise that you can’t distinguish the signal. You’re swimming in so much unimportant data that you can’t see the stuff that matters. Nassim Nicholas Taleb explained this in Antifragile:

“More data – such as paying attention to the eye colors of the people around when crossing the street – can make you miss the big truck. When you cross the street, you remove data, anything but the essential threat.”

Measure enough shit and you’ll find a significant correlation somewhere and miss what matters for your business.

2. You’re incorrectly tracking the right things

This is one of the most heartbreaking of the big data errors I see, and it’s probably the most common.

You and your team, including executives, hash out the strategy and map out what you’ll measure to weigh its performance. You spend time mapping out your data strategy – making sure you can technically implement the tracking and that the end user can access and analyze it.

Everyone has a plan until they get punched in the face, as Mike Tyson said.

Your tracking can break down for a truly unlimited amount of reasons.

Tiny variable name changes will ruin R scripts I’ve written. Redirects can strip tracking parameters. During the two plus years I’ve been at HubSpot, we’ve had numerous tracking bugs on both the product and marketing side of things. At CXL, same thing. We did our best to remain vigilant and debug things, of course. But shit happens, and to pretend otherwise isn’t just naive, it’s foolish.

Many end-users of an analytics tool will simply put their faith in the tool, assuming what it says it is tracking is what it is actually tracking. A bounce rate means a bounce rate means a bounce rate…

Of course, a sophisticated analyst knows this isn’t the case (rather, anyone who has spun up more than a few UTM parameters has seen how things can break down in practice).

In a more theoretical context, here’s how Andrea Jones-Rooy explained data instrumentation problems:

“This could take the form of hanging a thermometer on a wall to measure the temperature, or using a stethoscope to count heartbeats. If the thermometer is broken, it might not tell you the right number of degrees. The stethoscope might not be broken, but the human doing the counting might space out and miss a beat.

Generally speaking, as long as our equipment isn’t broken and we’re doing our best, we hope these errors are statistically random and thus cancel out over time—though that’s not a great consolation if your medical screening is one of the errors.”

Take, for example, the act of running an A/B test. There are many nodes in the system here, from putting the javascript tag on your website (assume we’re using a popular client-side browser testing tool like VWO) to the targeting of your users and pages, to the splitting and randomizing of traffic, the experience delivery, and the logging of events.

Basically, there are a shitload of places for your perfectly planned A/B test to return incomplete or inaccurate data (and it happens all the time).

Flicker effect is commonly talked about, but that’s the tip of the iceberg. If you want to shake your faith up, just read Andrew Anderson’s spiel on variance studies.

Additionally, with A/B testing, the greater the sample size you introduce, the more able you are to detect smaller effect sizes. If you have greater than normal variance or a bug that introduces a flicker or load time increase, then that effect could turn into a false positive at large samples (more data = worse outcome).

Inaccurate tracking and bugs are inevitable in the grand scheme of things. That’s why it’s important to hire as many intelligent, experienced, curious, and, most importantly, diligent humans as you can.

Another great example is using survey data or customer interview data, but not analyzing the bias you (the survey designer or interviewer) introduce. Your data, then, will of course be faulty and will lead you in a poor direction.

The other side of this is if you survey or interview the wrong people, even if you ask the right questions. Bad inputs = bad outputs. Formally, this is known as “selection bias,” and it’s a problem with so many measures. Take, for example, sentiment analysis from Tweets. Andrea Jones-Rooy explains:

“Using data from Twitter posts to understand public sentiment about a particular issue is flawed because most of us don’t tweet—and those who do don’t always post their true feelings. Instead, a collection of data from Twitter is just that: a way of understanding what some people who have selected to participate in this particular platform have selected to share with the world, and no more.”

The more flawed interviews you do, tweets you analyze, or responses you collect, the more confident you’ll be to continue in that direction (thus, more data = worse outcome in this case).

The solution: Trust but verify, as my friend Mercer says.

3. When you’re able to find spurious correlations because of “swimming in the data”

When you have a lot of data, it’s easy to find patterns in it that are completely meaningless.

Image Source

Segmenting after an A/B test? Well, that’s a best practice. However, if you make decisions on those segments without accounting for multiple comparisons, you’re significantly raising the risk of false positives. The same goes for measuring multiple metrics during a test (which is also super common, unfortunately).

Image Source
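To make the risk concrete, here's a tiny simulation in base R with made-up data: slice an A/A test, where nothing actually changed, into enough post-hoc segments and some of them will look "significant."

# 100 post-hoc "segments" of an A/A test: both groups are drawn from the same
# distribution, so any significant result is a false positive by construction.
set.seed(1)
p_values <- replicate(100, t.test(rnorm(500), rnorm(500))$p.value)
sum(p_values < 0.05)                              # typically around 5 "wins" by chance alone
sum(p.adjust(p_values, method = "holm") < 0.05)   # after correcting for multiple comparisons, typically none survive
# With just 20 uncorrected comparisons, the chance of at least one false
# positive is already 1 - 0.95^20, or about 64%.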

Really, a lot of the problem with too much data is just the mishandling and misinterpretation of it. You can solve a lot by reading a book like Georgi Georgiev’s “Statistical Methods in Online A/B Testing.”

Outside of simple misunderstandings with statistics, we truly do have a problem of “swimming in the data,” which is another way to say we’re drowning in bullshit insights.

If you’ve got enough time, watch Justin Rondeau’s talk about the mistake of “swimming in the data” in the context of customer surveys:

The more data points you track and the more data you compile, the more meaningless insights will crop up. If you’re not careful, you’ll spend all of your time chasing statistical ghosts (or if you’re a bit more amoral, intentionally cherry-picking patterns to back up your story). Nassim Taleb says it like this in Antifragile:

“Data is now plentiful thanks to connectivity, and the proportion of spuriousness in the data increases as one gets more immersed in it. A very rarely discussed property of data: it is toxic in large quantities – even in moderate quantities.”

Related, the more often you look at data, the more likely you are to find something of interest (meaningful or not). This all roots back to multiple comparisons and alpha error inflation, but to put it into a more concrete and less academic context, here’s another quote from Taleb’s Antifragile:

“The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part, called the signal); hence the higher the noise-to-signal ratio. and there is a confusion which is not psychological at all, but inherent in the data itself.

Say you look at information on a yearly basis, for stock prices, or the fertilizer sale of your father-in law’s factory, or inflation numbers in Vladivostok. Assume further that for what you are observing, at a yearly frequency, the ratio of signal to noise is about one to one (half noise, half signal) – this means that about half the changes are real improvements or degradations, the other half come from randomness. This ratio is what you get from yearly observations.

But if you look at the very same data on a daily basis, the composition would change to 95% noise, 5 % signal. And if you observe data on an hourly basis, as people immersed in the news and market price variations do, the split becomes 99.5% noise to .5% signal.

That is 200 times more noise than signal – which is why anyone who listens to the news (except when very, very significant events take place) is one step below a sucker.”

What we need is less data but more precisely tuned to deliver information to questions we care to answer. This is much better than the standard “let’s track everything and see what the data tells us” approach, which as you can see, can be dangerous in even benevolent hands, let alone those of an intentional and savvy cherry picker.

Image Source

4. When the cost of data collection supersedes its utility

The expected value is the probabilistic, anticipated future value of an action. In basically every business case, we’re hoping to maximize the ratio between the expected value/benefit and the cost, so that the cost-benefit trade-off makes the action worth taking. In other words, we want to get as much as we can for as little cost as possible, and we want to know our ROI with a high degree of certainty.

It’s easy to measure costs in monetary terms, so calculating ROAS isn’t complex. It’s a bit harder to measure resource costs, but smart growth teams calculate the “ease” of a given action and use it to prioritize.

Measuring the cost of data is difficult, though, because the outputs aren’t linear. In many cases, the marginal utility of collecting more data decreases, even if it is accurately collected.

This is a commonly occurring phenomenon: eating a donut is satisfying, but the second donut is a bit less so, with each subsequent one having a lower marginal utility (and past a certain point, perhaps a lower total utility, as you tend to get sick from eating too many).

Image Source

Two marketing examples:

  1. User testing
  2. A/B testing

In user testing, you don’t need more than 5-7 users. You’ll find ~80% of usability issues with 5 users, and past 7 the curve pretty much flattens:

Image Source

As Jakob Nielsen puts it:

“As you add more and more users, you learn less and less because you will keep seeing the same things again and again. There is no real need to keep observing the same thing multiple times, and you will be very motivated to go back to the drawing board and redesign the site to eliminate the usability problems.”
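The back-of-the-envelope model behind that curve (commonly attributed to Nielsen and Landauer) assumes each user surfaces roughly 31% of the usability problems, so the share found with n users is 1 - (1 - 0.31)^n. You can eyeball it in a couple of lines of R:

n <- 1:15
found <- 1 - (1 - 0.31)^n
round(found, 2)
# About 0.31 with one user, ~0.84 with five, and the curve is nearly flat past
# seven or eight users -- more sessions mostly re-surface the same problems.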

When you run an A/B test, there are multiple costs incurred, including the cost of setting up the test and the “regret” accrued during the course of the experiment (either delivering a suboptimal experience to your test group or failing to exploit a better variant sooner).

Image Source

A/B testing, like other forms of research and data collection, is a trade-off between accuracy and utility. Run the test forever and you (may) get more precision, but you lose almost all of the usefulness. You also introduce powerful opportunity costs, because you could have been running more experiments or doing more impactful work.

In reality, there are problems with running A/B tests for too long that go beyond marginal utility. In fact, due to cookie expiration and other external validity threats, your results will likely become less trustworthy by collecting too much data.

When a marketing leader can’t make a decision due to too much data and reflection, we refer to it as “analysis paralysis,” and it’s not a good thing. Sometimes it’s best to just make a decision and move to the next thing.

Image Source

5. When what we’re tracking is unmeasurable and we’re using data to save face

We make decisions all day long, and some of them are going to be poor decisions – no one bats a thousand. Someone with a mature understanding of data and decision making realizes this and factors it into their expectations. In reality, your good bets just need to outweigh your bad ones, and the (a) more often you can do that or (b) the bigger you can win when you’re right, the better.

The opposite would be a misunderstanding of the nature of uncertainty in the business world, and in that world, your nonoptimal decisions need to have a fallback, something to point to that “caused” the bad decision (god forbid it was poor judgement).

That’s why, even when it is truly (or very nearly) impossible to measure what they want to measure, leaders will sometimes ask for enough data to back up their decision (often something they’ve already decided on and just need justification for). Because, then, in the event of failure, their face is saved and their blame is absolved. After all, “the data said it would work.”

(Remember: data doesn’t say anything. We use data to reduce uncertainty and make better decisions, but we can never fully reduce uncertainty or make perfect decisions).

Another Nassim Taleb quote fits here:

“What is nonmeasurable and nonpredictable will remain nonmeasurable and nonpredictable, no matter how many PhDs with Russian and Indian names you put on the job – and no matter how much hate mail I get.”

This is a tough point to swallow, because any data we’re using as a proxy for the untrackable is probably a waste of time. Tricia Wang refers to “Quantification bias” – the unconscious valuing of the measurable over the immeasurable. Eugen Eşanu summed up the problem in UX Planet:

“People become so fixated on numbers or on quantifying things, that they can’t see anything outside of it. Even when you show them clear evidence. So quantifying is addictive, and when we don’t have something to keep that in check, it’s easy to fall into a trap. Meanwhile, you are searching for future you are trying to predict in a haystack, you don’t feel or see the tornado that is coming behind your back.”

Most business decisions we make are in complex domains. We may be able to tease out some causality running experiments on a signup flow, but for strategic decisions, brand marketing, and team structure decisions (and many more examples)…well, they have first order effects that may be quasi-trackable, but the second and third order effects are usually latent and more important by orders of magnitude.

Negative second and third order effects can usually be mitigated by acting from principles first, and only then letting the data drive you.

You’ve undoubtedly come across a website that looked like a Christmas tree with all of its popups, or you’ve dealt with an unethical business leader who forgot that success is derived from compound interesting and life is a series of iterated games.

In these cases, the allure of the first order effect (higher CTR, more money made by ripping someone off) overshadowed the long term loss. So in absence of measurability, define your principles (personal, team, company) and operate within that sphere. Your career is long, so don’t burn out seeking short term wins.

Okay, so not everything can be tracked (or tracked easily) – What’s the solution, particularly for data driven marketers and organizations?

My take: Track what we can with as much precision as possible, and leave a percentage of your business portfolio open for the “unmeasurable but probably important.”

Dave Gerhardt recently posted something on this topic that I really liked:

That’s an eminently mature way to look at a strategic marketing portfolio.

As an example, I don’t think many companies are accurately measuring “brand awareness,” but clearly branding and brand awareness as concepts are important. So just do things that help prospects learn about what you do and don’t try to tie it to some proxy metric like “social media impressions” – that’s a form of scientism, not to mention it’s game-able to the point where quantification may even backfire.

I like using an 80/20 rule in portfolio development, which I’ve borrowed from Mayur Gupta:

“Do your growth efforts and performance spend benefit from a strong brand (efficiency and/or effectiveness or organic growth)? Are you able to measure and correlate?

Think about the 80–20 rule when it comes to budget distribution — if you can spend 80% of your marketing dollars on everything that is measurable and can be optimized to get to the “OUTCOMEs”, you can spend 20% however you want. Because 100% of marketing will NEVER be measurable (there is no need).”

The ratio isn’t important, only to note that not everything can be forecasted, predicted, or chosen with perfect certainty. Alefiya Dhilla, VP of Marketing at A Cloud Guru, mentioned once to me she thinks in terms of certainty/risk portfolios as well, balancing it with around 70% in tried & true trackable actions, 10-20% in optimizing and improving current systems, and the remainder in unproven or untrackable (but possibly high reward) bets.

The point is humility, much like the serenity prayer, tracking accurately what you can and being okay with what you can’t.

Conclusion

Data is a tool used to reduce the uncertainty in decision making, hopefully allowing us to make better decisions more often. However, in many cases, more data does not equal better outcomes. Data collection comes with a cost, and even if everything is tracked correctly, we need to weigh our decisions by their efficiency and ROI.

Additionally, data often makes us more confident. As Nassim Taleb put it, “Conversely, when you think you know more than you do, you are fragile (to error).”

This is a big problem if what the data is telling us is poorly calibrated with what we want to know. Whether through measuring the wrong thing or measuring the right thing incorrectly, things aren’t always clean and perfect in the world of data.

All that said, I love data and spend most of my time analyzing it. The point is just to be critical (doubt by default, especially if the numbers look too good to be true), be humble (we’ll never know everything or have 100% certainty), and constantly be improving (the best analysts still have a lot of room to grow).

The post Mo Data Mo Problems? When More Information Makes You More Wrong appeared first on Alex Birkett.

]]>
How to Learn R (for Marketers and Business Folks) https://www.alexbirkett.com/how-to-learn-r-for-marketers/ https://www.alexbirkett.com/how-to-learn-r-for-marketers/#comments Fri, 08 Nov 2019 17:06:06 +0000 https://www.alexbirkett.com/?p=841 I’m a marketer, but I spend a lot of time in R. I use it to analyze A/B tests and explore data sets. I’ve also built fully functional web applications using R and Shiny to enable new processes for my team at HubSpot using the language. There are so many free resources for learning technical ... Read more

The post How to Learn R (for Marketers and Business Folks) appeared first on Alex Birkett.

]]>
I’m a marketer, but I spend a lot of time in R. I use it to analyze A/B tests and explore data sets. I’ve also built fully functional web applications using R and Shiny to enable new processes for my team at HubSpot using the language.

There are so many free resources for learning technical skills nowadays. It’s a darn good time to be a technical marketer (and also, the “non-technical marketer” is a myth and everyone can learn this stuff).

My first project was a few years ago. While working at CXL, I undertook a project inspired by Stefania Mereu, using R to analyze survey data and create data-driven user personas.

Basically, learning R (and subsequently Python) has been a super power in many ways.

I paid my sister to learn R and take notes

I’ve wanted to write down some notes on how to get started, in case other marketers wanted to give it a crack. However, it’s tough to write about what it was like to learn when you’re standing from your current perspective.

So I paid my sister a few hundred dollars to go through a couple R courses (specifically, DataCamp’s Intro to R) and take notes on what she learned (side note: she’s a college student and would be a great intern if you want to hire her). Her background is having used a bit of SPSS and SQL for data analytics classes in school, so this was her first exposure to R.

My sister, the future data scientist. Hire her for an internship.

I took her notes and combined them with my memory and experience of learning the language, as well as some examples of projects you can do as a marketer (click here to skip to that section). If I can learn R and use it daily/weekly, and if my sister can learn the basics of R in a few weeks as a college student, you can too! So here’s my beginner’s guide to learning R. First, why…

Why use R? What can marketers/growth/product managers do with R?

If you work with data you probably know how to use tools like Google Analytics and Microsoft Excel. You might even know some SQL or perhaps how to set up reports in a BI tool like Looker.

These are great starting points, but if you want to unlock interesting new analysis tools, production capabilities, or things like working with APIs and scraping/cleaning data, then you should learn a scripting language like R or Python. You can also automate a bunch of stuff if you learn R or Python.

For all of its benefits (particularly with machine learning), I still prefer using R to Python, and I still think R is easier to get up and running for new users (especially marketers). Because there are so many digital analysts working with R, there are tons of ready made packages to do things like access Google Analytics, visualize data, print to Google Sheets, etc. Additionally, R Studio is an epically easy place to learn R and work. I’m biased because I learned R first, but it felt so actionable right away. For example, in like an hour, you can make a heat map of your website traffic like this:
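(The sketch below uses simulated traffic so it runs anywhere; in practice you'd swap in your own export, e.g. pulled with the googleAnalyticsR package or from a CSV, and use ggplot2 for the plotting.)

library(ggplot2)
# Fake sessions by weekday and hour, with a bump during working hours:
traffic <- expand.grid(hour = 0:23,
                       weekday = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
traffic$sessions <- rpois(nrow(traffic), lambda = 40) *
  ifelse(traffic$hour %in% 9:17, 2, 1)

ggplot(traffic, aes(x = hour, y = weekday, fill = sessions)) +
  geom_tile() +
  labs(title = "Website sessions by weekday and hour")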

Some benefits of R:

  • Easy to set up, write, and QA
  • Centralized language with only one version being actively supported
  • A big community with many packages to help cut down on your learning time and coding time (it’s particularly popular among analysts and marketers)

The biggest reason I can think of is that, given your current use of tools like Google Analytics, Adwords, or even a tool like SEMRush, we can get wayyy more out of our data by learning a little R (I’ll show some examples in the article below – click here to skip the setups and basic syntax lessons and go right to the code/examples).

Sections of this article will be:

Installing R on Your Computer and Getting Setup Properly

This will be a relatively quick section, Texas-style:

  1. Install R
  2. Install R Studio

R Studio is an integrated development environment (IDE). It makes programming much easier, as you can edit and test code and see the output and your variables/data in real-time.

While you’re messing around with R, check out Swirl. It’s an R package that teaches you to code R, in R. It’s how I picked up the syntax initially. All you have to do is install it and call it, like so:

install.packages("swirl")
library(swirl)
swirl()

Now you’re up and running and already have a (free) R course! Maybe you don’t even need the rest of this article.

The Super Easy Basic Stuff

First off, R is case sensitive. Second, remember that you define variables in R by using an arrow, like this:

variable_name <- "Here's a string"

My sister was thrilled about the easiness of defining variables. Her notes:

“To assign variables is simple. Assigning a variable allows you to store a value into R – it is done by typing something such as “x <- 50”, which would assign “50” to “x”. To print out the value of a variable, all you have to do is write the name of the variable on a line – easy!”

I share her enthusiasm.

You can comment out code (write lines that are not executed, mainly for documentation and communication) by using the # sign at the beginning of a line. R skips everything after the #, so comments never show up in your output when you run the code.
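
A minimal sketch (the variable and the comment text are my own, just to illustrate):

# This whole line is a comment – R ignores it when the code runs
channel <- "organic search" # comments can also sit at the end of a line
channel # prints [1] "organic search" – the comments themselves never print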

You can do basic arithmetic easily

At its most basic level, you can use R as a calculator. Give it a try. Type in something like this:

5 + 5

Highlight that bit of code and hit command + enter (control + enter on Windows).

As my sister Reilly wrote, “you can do all basic arithmetic, as well as modulo, which looks like this: %%; for example, writing “28 %% 6” gives you 4 (it returns the remainder of the division number on the left by the number on the right).”

This can be helpful if, say, you’re trying to isolate only even numbers in a dataset:
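
For example, a quick sketch (the numbers vector is my own made-up data):

numbers <- c(3, 8, 15, 22, 41, 50)
numbers[numbers %% 2 == 0] # keeps only the even values: 8 22 50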

Of course, you can use variables for your mathematics as well. E.g. you can assign an integer (say 50) to a variable (say 'x') like so:

x <- 50

Then you can use your variable to do math! Check it out:
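
Something like this (my own toy numbers):

x * 2 # 100
x + x # 100
x / 4 # 12.5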

Functions and Loops in R

Let’s not dive too heavily into the stuff that is common to all programming languages, but I did already briefly show you how to build a function. Here’s the syntax:

function_name <- function(arg) {
## your function here
}

A For Loop is pretty easy, too:

for (val in sequence) {
statement
}

I'll use a for loop and a small data set to show you what we can do. Here we just loop through the integers 1 through 10 and add 1 to each of them:
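
A sketch of that loop (my reconstruction, not the exact snippet from the course):

for (val in 1:10) {
  print(val + 1)
}
# prints 2 through 11, one number per line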

And here’s a totally useless function, just to illustrate syntax:
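
Mine is equally useless, just to show the pieces:

add_one <- function(x) {
  x + 1
}
add_one(41) # returns 42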

 

The different types of data in R

Okay, here’s the R specific stuff when it comes to how data is codified, stored, and utilized in R. This stuff matters when it comes to querying, cleaning, and analyzing data you’ll work with.

Basic values:

  • Decimal values and integers are both numeric
  • Boolean values are logical (true or false)
  • Text/string values are characters.

Those are the building blocks, the root values of data, in R. Then you have different data structures, such as:

  • Vectors
  • Matrices
  • Factors
  • Data Frames
  • Lists

You can check what data type you have by using the function ‘class()’ like the example here:
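
For instance (my own quick examples):

class(5.5) # "numeric"
class(TRUE) # "logical"
class("hello marketer") # "character"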

The course Reilly took, as far as I can tell, was a compendium of data types in R and mainly covered this stuff. We’ll walk through her notes and each type to explain how it is used. After that, I’ll show you a few examples of how you can easily get started with some R projects, as well as some resources for further learning (click here to skip to examples).

Vectors in R

A vector is the simplest data type in R. It's a one-dimensional array that can hold just one data type at a time. Examples:
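
A minimal sketch of the three flavors – the exact values are mine, and I've picked numeric_vector so the indexing examples below line up:

numeric_vector <- c(1, 5, 10)
character_vector <- c("seo", "ppc", "email")
logical_vector <- c(TRUE, FALSE, TRUE)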

You can select a single unit of a vector by using this syntax: vector_name[vector_position]

Unlike many other languages, R starts counting at 1 (not 0), so numeric_vector[2] gives us 5.

If you want to select more than one, wrap the positions in c(), e.g. numeric_vector[c(1,3)], or you can select a series of units using a colon, e.g. numeric_vector[1:3]

My sister took a lot of notes on vectors, and probably for good reason (they’re the core data structures in R, really). I, however, find it boring to write about all the different rules and stipulations to vectors (and data types generally), so I’ll do two things:

Matrices in R

Awesome, now we move to matrices: a two-dimensional collection of values of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. It’s like a spreadsheet!

You can create matrices using the function matrix(), and there are three main arguments: a collection of values, byrow = TRUE/FALSE, and nrow = n (there's a quick sketch after the list below):

  • Argument one: a collection of values that will be arranged into rows and columns – 1:5 is equivalent to c(1, 2, 3, 4, 5)
  • Argument two: byrow = TRUE/FALSE – if you want the matrix to be filled by rows use TRUE (like the example), if you want it to be filled by columns use FALSE
  • Argument three: nrow = ___ – fill in the blank with however many rows you want the matrix to have (the sketch below uses 4).
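
A minimal sketch of matrix() with those three arguments (the values and the 4 rows are my own choice):

my_matrix <- matrix(1:20, byrow = TRUE, nrow = 4)
my_matrix # a 4 x 5 matrix, filled row by row with 1 through 20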

You can, of course, combine multiple vectors together into a matrix with rbind() or cbind() – there's a sketch of that after the next couple of points.

Woo! More points: select elements the same way as you did with vectors (brackets). You can use rowSums() and colSums() to calculate the totals for each row and each column in the matrix (these functions return a new vector).

Another cool function, cbind() lets you add columns to a matrix without having to redo it. It looks like this: new_matrix_name <- cbind(old_matrix, new_matrix/new_vector…..) where new_matrix_name is the new matrix with the added columns, old_matrix is the completed matrix, and new_matrix or new_vector is the part you want to add on.

  • rbind() is the same thing but for rows!
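
Here's a rough sketch that strings those pieces together (the traffic numbers are invented):

organic <- c(100, 120, 90)
paid <- c(40, 60, 55)
traffic_matrix <- rbind(organic, paid) # each vector becomes a row
colSums(traffic_matrix) # totals per column: 140 180 145

email <- c(10, 20, 30)
bigger_matrix <- rbind(traffic_matrix, email) # rbind() adds another row
rowSums(bigger_matrix) # totals per channel: organic 310, paid 155, email 60

day4 <- c(95, 70, 25)
cbind(bigger_matrix, day4) # cbind() tacks on a new column instead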

Alright, again, I’m getting bored just writing this stuff, so just try things out and maybe save this reference document for later.

Factors in R

Factors are a statistical data type used to store categorical variables (limited to a set of categories). Read more on categorical variables here.

In the vast majority of cases, I find that when I'm uploading data (from an API or a spreadsheet or something), R tries to coerce strings into factors, and it's annoying – that's why, when we upload a CSV or TXT, there's an argument called stringsAsFactors, and I usually set it to FALSE.
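
A quick sketch of factors in action (the plan data is made up, and the read.csv() line is just to show where stringsAsFactors lives):

plans <- factor(c("free", "pro", "free", "enterprise", "pro", "free"))
levels(plans) # "enterprise" "free" "pro"
summary(plans) # counts per category: enterprise 1, free 3, pro 2
# survey <- read.csv("survey.csv", stringsAsFactors = FALSE) # keep strings as strings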

Enough about factors, here’s what my sister wrote verbatim about what she struggled with:

“I initially didn’t understand the reasoning behind levels and the summary function. I understood what they did, but I was hesitant with understanding what to do with the information given. After a lot of trial and error of applying levels() and summary() to the data sets in the course, I began to understand their purpose. Best advice: practice, research, and continue doing so until it clicks, and never hesitate to ask for help if needed.”

Bolding mine: don’t be afraid to just tinker and learn as you go.

Data Frames in R

We are so close to being finished with data types in R! Push through. Two more and they’re the things I use most in data analysis and tool building.

What’s a data frame in R?

A data frame is a two-dimensional structure where columns contain values of a variable and rows contain a set of values (observations) from each column – the data can be numeric, logical, character, etc. Now this is like a spreadsheet!

This is more practical because analyzing data sets usually involves data frames that contain more than one type of data – for example, when working with survey answers, as the course uses: yes/no questions = logical, “how old are you” = numeric, open ended questions = character.

Data sets can clearly become massive, but cutting them down into sections can be useful: use head() and tail() to see the first and final observations of a data set, respectively. The first function you'll often use on a data frame is str(), which shows you the number of observations and variables, plus each column's name, type, and a preview of its values.

You can create your own data frame using the data.frame() function
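
A rough sketch, using a tiny made-up data frame (plus mtcars, one of R's built-in practice data sets):

survey <- data.frame(
  age = c(24, 31, 45),
  is_customer = c(TRUE, FALSE, TRUE),
  favorite_channel = c("email", "organic", "ppc")
)
str(survey) # 3 obs. of 3 variables, with each column's type and first values
head(mtcars) # first six rows of a built-in data frame
tail(mtcars, 3) # last three rows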

I’m going to burn through this section on data frames, despite them being my most used data type in R, simply because there’s a massive amount of information about them. You’re best off playing around and looking up some documentation (which you can do from R, by the way, with a question mark before a value, like ?data.frame):

For what it’s worth, my sister said the data frames lesson in the Data Camp course was super intuitive. I’m pretty sure if you work in Excel a lot, you’ll understand data frames right away and get a lot of use from them.

Lists in R

Last one!

What’s a list in R?

Here's how my sister put it: "A list is literally a list like you would use in your daily life – it has items that differ in characteristics, activity, time frame, etc."

In R, a list lets you put an array of objects under a single list name in an orderly fashion. The objects in a list can be matrices, vectors, data frames, even other lists – and they can be as random as you like; they do not have to be related to one another.

You can make a list like this: “list_name <- list()”, where list_name is the name of the list you are making, and inside the parentheses are the contents of the list (remember, they can be vectors, matrices, etc.)

You can assign names to the components of your list by adding another line after the previous example: names(list_name) <- c("____", …), where c() holds the names of the list's components, in order.

To add elements to an existing list, just use c(): new_list_name <- c(old_list, value = ____), where new_list_name is the new list you would like to make, old_list is the list you made that you want to add the information to, and value = ____ is the part you want to add to the list.
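
Putting those pieces together in one sketch (all the contents are arbitrary, which is sort of the point of lists):

my_vector <- c(1, 5, 10)
my_matrix <- matrix(1:6, nrow = 2)
my_df <- head(mtcars, 3)

my_list <- list(my_vector, my_matrix, my_df)
names(my_list) <- c("a_vector", "a_matrix", "a_data_frame")
my_list$a_matrix # grab a component by name
my_list[["a_vector"]] # or with double brackets
my_list <- c(my_list, note = "lists can hold anything") # add an element with c()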

Four Use Cases for Marketers Learning R (with Code)

I’ve written in the past about how I’ve learned R (in 2017 I wrote about the user personas projects as well as attribution modeling and Google Analytics heatmaps).

Here are four new sample projects I've worked on that you can try, too.

1. Automate Data Collection and Analysis in Google Analytics

You can do a lot of things with Google Analytics and R (read this post here to get started, and check out this site for other great ideas). Here’s a super basic script to pull some quick GA data for blog landing pages and organic traffic:
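
Here's a rough sketch of the kind of thing I mean, using the googleAnalyticsR package – the view ID, dates, and the /blog/ filtering logic are placeholders you'd swap for your own setup:

library(googleAnalyticsR)
library(dplyr)

ga_auth() # authenticate in the browser
my_view_id <- 123456789 # placeholder – find yours with ga_account_list()

blog_organic <- google_analytics(
  my_view_id,
  date_range = c("2021-01-01", "2021-03-31"),
  metrics = c("sessions", "goalCompletionsAll"),
  dimensions = c("landingPagePath", "channelGrouping"),
  anti_sample = TRUE # fetches the data in batches to avoid sampling
) %>%
  filter(channelGrouping == "Organic Search", grepl("/blog/", landingPagePath)) %>%
  arrange(desc(sessions))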

Since GA data and what you want to learn from it are highly contextual, I recommend not even copying my script here and instead just playing around with the queries/questions you want to answer. It's all doable from R (and, like I note in my script, you can get away from sampled data!).

2. Pull the “Head Keyword” for a Given URL with SEMRush

Recently, I was working on a “Product Led Content” audit of the HubSpot blog – essentially, I was looking for previously published articles that have a strong product focus, where we could potentially inject more mentions of our freemium tools or more conversion points.

I searched things like “site:blog.hubspot.com intitle:”how to”” and then scraped all the URLs using SEOquake.

I then pulled the title in via =ImportXML(A2, "//title"), and I ran the list of URLs through Ahrefs' bulk upload tool to get the estimated keyword and traffic volume. This let me quickly see which posts were the most impactful and popular.

I also wanted the head keyword, though, because we can then use that keyword to prioritize link building outreach. The logic is that, if we rank and bring in conversions for a keyword (say “email marketing”), then we should also seek to get links from other sites who rank for that keyword.

It would take forever to get each head keyword manually through Ahrefs, so I whipped up a script in R that does so using the SEMRush API. Here’s the code:
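
A stripped-down sketch of the idea looks roughly like this – note that the report type, sort parameter, and column codes are from my memory of the SEMrush API docs, so double-check them against the current docs (and swap in your own API key and URLs):

library(httr)

get_head_keyword <- function(page_url, api_key, database = "us") {
  resp <- GET("https://api.semrush.com/", query = list(
    type = "url_organic", # organic keywords a URL ranks for
    key = api_key,
    url = page_url,
    database = database,
    display_sort = "tr_desc", # sort by estimated traffic share
    display_limit = 1, # we only want the top ("head") keyword
    export_columns = "Ph,Po,Nq" # phrase, position, search volume
  ))
  raw <- content(resp, as = "text", encoding = "UTF-8")
  read.csv(text = raw, sep = ";", stringsAsFactors = FALSE) # SEMrush returns ;-separated text
}

urls <- c("https://blog.hubspot.com/marketing/some-post", "https://blog.hubspot.com/sales/another-post") # placeholders
head_keywords <- do.call(rbind, lapply(urls, get_head_keyword, api_key = "YOUR_API_KEY"))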

3. Analyze A/B Test Results in R

I always use R to analyze my A/B test results, and normally to slice and dice segments for post-hoc analysis.

It’s easy to run a simple t.test() in R, but there is a lot more you can do as well. Instead of listing it all here or trying to rewrite the code to share, just read this post. It’s awesome.
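
For reference, the simple version looks something like this (with made-up data):

# made-up revenue-per-visitor values for two variants
control <- c(0, 0, 49, 0, 99, 0, 0, 49, 0, 0)
variant <- c(0, 49, 49, 0, 99, 0, 49, 0, 0, 149)
t.test(variant, control) # Welch two-sample t-test by default

# or, for plain conversion counts: 520 vs. 580 conversions out of 10,000 visitors each
prop.test(x = c(520, 580), n = c(10000, 10000))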

Image Source

4. Check a list of URLs to see if it links to your website

People comparison shop. When someone searches “best live chat software,” they’re very likely about to buy some live chat software. The more times they see your product mentioned in a search like that, the more likely they are to consider it as something serious to check out.

That’s why I like “SERP Real Estate” (the percentage of search results for a given query that mention your brand) as an important measure of brand awareness. Unfortunately, no SEO or PR tool gives you this data. So let’s use R to pull it from SEMRush, scrape the page, and check against each page for our link!

I’ve built a whole interactive application for this, but I’ll write the generic script that does the function here:
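
Here's a rough, generic sketch of that logic using rvest – the ranking_urls vector stands in for however you pull the top results (SEMrush, a rank tracker, a manual export), and yourbrand.com is obviously a placeholder:

library(rvest)

check_serp_real_estate <- function(urls, your_domain) {
  mentions <- sapply(urls, function(u) {
    page <- tryCatch(read_html(u), error = function(e) NULL)
    if (is.null(page)) return(NA)
    links <- html_attr(html_nodes(page, "a"), "href")
    any(grepl(your_domain, links, fixed = TRUE))
  })
  data.frame(url = urls, links_to_you = mentions, row.names = NULL)
}

ranking_urls <- c("https://example.com/best-live-chat-software", "https://example.org/live-chat-tools") # placeholders
results <- check_serp_real_estate(ranking_urls, "yourbrand.com")
mean(results$links_to_you, na.rm = TRUE) # your share of that SERP's real estate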

Further Resources for Learning R

If you like courses, DataCamp is the best for pure, practical programming. Coursera has some more in-depth and formal courses (though it is my least favorite platform of those mentioned here). Udacity also has a great exploratory data analysis course using R.

I don't find courses all that useful beyond the beginning of your learning journey; instead, I like to work on projects and figure things out as I go. Some good inspiration for projects and blogs/people to follow in the R world:

Conclusion

Here’s my sister’s summary of her R learning journey:

“Learning R has been rather easy so far – in terms of learning the functions and how to use them. When actually applying what is learned to data sets and analyzing them independently, I can only guess will be much more difficult. From what I’ve gathered in the four hour course, plus some practice, is that learning R isn’t difficult in itself, but it’s more about the problems you’re working on.”

Couldn't have said it better myself! Learning R isn't that difficult; it's just a tool you apply to the problems you're trying to solve. So get learning!

The post How to Learn R (for Marketers and Business Folks) appeared first on Alex Birkett.

]]>
https://www.alexbirkett.com/how-to-learn-r-for-marketers/feed/ 1
What is A/B Testing? An Advanced Guide + 29 Guidelines https://www.alexbirkett.com/ab-testing/ Mon, 12 Nov 2018 15:33:19 +0000 https://www.alexbirkett.com/?p=609 A/B testing (aka split testing or online controlled experiments) is hard. It’s sometimes billed as a magic tool that spits out a decisive answer. It’s not. It’s a randomized controlled trial, albeit online and with website visitors or users, and it’s reliant upon proper statistical practices. At the same time, I don’t think we should ... Read more

The post What is A/B Testing? An Advanced Guide + 29 Guidelines appeared first on Alex Birkett.

]]>
A/B testing (aka split testing or online controlled experiments) is hard. It’s sometimes billed as a magic tool that spits out a decisive answer. It’s not. It’s a randomized controlled trial, albeit online and with website visitors or users, and it’s reliant upon proper statistical practices.

At the same time, I don’t think we should hold the standards so high that you need a data scientist to design and analyze every single experiment. We should democratize the practice to the most sensible extent, but we should create logical guardrails so the experiments that are run are run well.

The best way to do that I can think of is with education and a checklist. If it works for doctors, I think we can put it to use, too.

So this article is two things: a high level checklist you can use on a per test basis (you can get a Google Docs checklist here), and a comprehensive guide that explains each checklist item in detail. It’s a choose your own adventure. You can read it all (including outbound links), or just the highlights.

Also, don't expect it to be completely exhaustive or to cover every fringe case. I want this checklist to be usable by people at all levels of experimentation, and at any type of company (ecommerce, SaaS, lead generation, whatever). As such, I'll break it into three parts:

  • The Basics – don’t run experiments if you don’t follow these guidelines. If you follow these, ~80% of your experiments should be properly run.
  • Intermediate Topics – slightly more esoteric concepts, but still largely useful for anyone running tests consistently. This should help reduce errors in ~90% of experiments you run.
  • Advanced Topics – won’t matter for most people, but will help you decide on fringe cases and more advanced testing use cases. This should bring you up to ~95-98% error reduction rate in running your tests.

I’ll also break this up into simple heuristics and longer descriptions. Depending on your level of nerdiness or laziness, you can choose your own adventure:

The frustrating part about making a guide or a checklist like this is there is so much nuance. I’m hyper aware that this will never be complete, so I’m setting the goal to be useful. To be useful means it can’t run on for the length of a textbook, though it almost does at ~6000 words.

(In the case that you want to read a textbook, read this one).

I’m not reinventing the wheel here. I’m basically compiling this from my own experiences, my mentors, papers from Microsoft, Netflix, Amazon, Booking.com and Airbnb, and other assorted sources (all listed at the end).

What is A/B Testing?

A/B testing is a controlled experiment (typically online) where two or more different versions of a page or experience are delivered randomly to different segments of visitors. Imagine a homepage where you’ve got an image slider above the fold, and then you want to try a new version instead showing a product image and product description next to a web form. You could run a split test, measure user behavior, and get the answer as to which is optimal:

Image Source

Statistical analysis is then performed to infer the performance of the new variants (the new experience or experiences, version B/C/D, etc.) in relation to the control (the original experience, or version A).

A/B tests are performed commonly in many industries including ecommerce, publications, and SaaS. In addition to running experiments on a web page, you can set up A/B tests on a variety of channels and mediums, including Facebook ads, Google ads, email newsletter workflows, email subject line copy, marketing campaigns, product features, sales scripts, etc. – the limit is really your imagination.

Experimentation typically falls under one of several roles or titles, which vary by industry and company. For example, A/B testing is strongly associated with CRO (conversion optimization or conversion rate optimization) as well as product management, though marketing managers, email marketers, user experience specialists, performance marketers, and data scientists or analysts may also run A/B tests.

The Basics: 10 Rules of A/B Testing

  1. Decide, up front, what the goal of your test is and what metric matters to you (the Overall Evaluation Criterion).
  2. Plan upfront what action you plan on taking in the event of a winning, losing, or inconclusive result.
  3. Base your test on a reasonable hypothesis.
  4. Determine specifically which audience you’ll be targeting with this test.
  5. Estimate your minimum detectable effect, required sample size, statistical power, and how long your test will be required to run before you start running the test.
  6. Run the test for full business cycles, accounting for naturally occurring data cycles.
  7. Run the test for the full time period you had planned, and only then determine the statistical significance of the test (normally, as a rule of thumb, accepting a p value of <.05 as “statistically significant”).
  8. Unless you’re correcting for multiple comparisons, stick to running one variant against the control (in general, keep it simple), and using a simple test of proportions, such as Chi Square or Z Test, to determine the statistical significance of your test.
  9. Be skeptical about numbers that look too good to be true (see: Twyman’s Law)
  10. Don’t shut off a variant mid test or shift traffic allocation mid test

The Basics of A/B Testing: Explained

1. Decide Your Overall Evaluation Criterion Up Front

Where you set your sights is generally where you end up. We all know the value of goal setting. Turns out, it’s even more important in experimentation.

Even if you think you’re a rational, objective person, we all want to win and to bring results. Whether intentional or not, sometimes we bring results by cherry picking the data.

Here’s an example (a real one, from the wild). Buffer wants to A/B test their Tweets. They launch two of ‘em out:

Can you tell which one the winner was?

Without reading their blog post, I genuinely could not tell you which one performed better. Why? I have no idea what metric they’re looking to move. On Tweet two, clicks went down but everything else went up. If clicks to the website is the goal, Tweet one is the winner. If retweets, tweet number two wins.

So, before you ever set a test live, choose your overall evaluation criterion (or North Star metric, whatever you want to call it), or I swear to you, you’ll start hedging and justifying that “hey, but click through rate/engagement/time on site/whatever increase on the variation. I think that’s a sign we should set it live.” It will happen. Be objective in your criterion.

(Side note, I’ve smack talked this A/B test case study many times, and there are many more problems with it than just the lack of a single metric that matters, including not controlling for several confounding variables – like time – or using proper statistics to analyze it.)

Make sure, then, that you’re properly logging your experiment data, including number of visitors and their bucketing, your conversion goals, and any behavior necessary to track in the conversion funnel.

2. Plan Your Proposed Action Per Test Result

What do you hope to do if your test wins? Usually this is a pretty easy answer (roll it out live, of course).

But what do you plan to do if your test loses? Or even murkier, what if it’s inconclusive?

I realize this sounds simple on paper. You might be thinking, “move onto the next test.” Or “try out a different variation of the same hypothesis.” Or “test on a larger segment of our audience to get the necessary data.”

That’s the point, there are many decisions you could make that affect your testing process as a whole. It’s not as simple as “roll it out live” or “don’t roll it out live.”

Say your test is trending positive but not quite significant at a p value of < .05. You actually do see a significant lift, though, in a micro-conversion, like click through rate. What do you do?

It’s not my place to tell you what to do. But you should state your planned actions up front so you don’t run into the myriad of cognitive biases that we humans have to deal with.

Related reading here.

3. Base your test on a reasonable hypothesis

What is a hypothesis, anyway?

It’s not a guess as to what will happen in your A/B test. It’s not a prediction. It’s one big component of ye old Scientific Method.

A good hypothesis is “a statement about what you believe to be true today.” It should be falsifiable, and it should have a reason behind it.

This is the best article I’ve read on experiment hypotheses: https://medium.com/@talraviv/thats-not-a-hypothesis-25666b01d5b4

I look at developing a hypothesis as a process of being clear in my thinking and approach to the science of A/B testing. It slows me down, and it makes me think “what are we doing here?” As the article above states, not every hypothesis needs to be based on mounds of data. It quotes Feynman: “It is not unscientific to take a guess, although many people who are not in science believe that it is.”

I do believe any mature testing program will require the proper use of hypotheses. Andrew Anderson has a different take, and a super valid one, about the misuse of hypotheses in the testing industry. I largely agree with his take, and I think it’s mostly based on the fact that most people are using the term “hypothesis” incorrectly.

4. Determine specifically which audience you’ll be targeting with this test

This is relatively quick and easy to understand. Which population would you like to test on – desktop, mobile, PPC audience #12, users vs. non-users, customers who read our FAQ page, a specific sequence of web pages, etc. – and how can you take measures to exclude the data of those who don't apply to that category?

It’s relatively easy to do this, at least for broad technological categorizations like device category, using common A/B testing platforms.

Point is this: you want to learn about a specific audience, and the less you pollute that sample, the cleaner your answers will be.

5. Estimate your MDE, sample size, statistical power, and how long your test will run before you run it

Most of the work in A/B testing comes before you ever set the test live. Once it's live, it's easy! Analyzing the test after the fact is much easier if you've done the hard and prudent work up front.

What do you need to plan? The feasibility of your test in terms of traffic and time length, what minimum detectable effect you’d need to see to discern an uplift, and the sample size you’ll need to reach to consider analyzing your test.

It sounds like a lot, but you can do all of this with the help of an online calculator.

I actually like to use a spreadsheet that I found on the Optimizely knowledge base (here’s a link to the spreadsheet as well). It visually shows you how long you’d have to run a test to see a specific effect size, depending on the amount of traffic you have to the page and the baseline conversion rate.

You can also use Evan Miller’s Awesome A/B testing tools. Or, CXL has a bunch of them as well. Search Discovery also has a calculator with great visualizations.
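
If you'd rather script it than click around a calculator, base R's power.prop.test() does the same kind of arithmetic. A quick sketch, assuming a 5% baseline conversion rate and that you want to reliably detect a 10% relative lift:

power.prop.test(p1 = 0.05, p2 = 0.055, sig.level = 0.05, power = 0.8)
# returns n, roughly 31,000 visitors per variant for these inputs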

6. Run the test for full business cycles, accounting for naturally occurring data cycles

One of the first and common mistakes everyone makes when they start A/B testing is calling a test when it “reaches significance.” This, in part, must be because in our daily lives, the term “significance” means “of importance” so it sounds final and deterministic.

Statistical significance (or the confidence level) is just an output of some simple math that tells you how unlikely a result is given the assumption that both variants are the same.

Huh?

We’ll talk about p-values later, but for now, let’s talk about business cycles and how days of the week can differ.

Image Source

The days of the week tend to differ quite a bit. Our goal in A/B testing is to get a representative sample of our population, which generally involves collecting enough data to smooth out any jagged edges, like a super Saturday where conversion rates tank and maybe the website behavior is different.

Website data tends to be non-stationary (as in, it changes over time) or sinusoidal – or rather, it looks like this:

Image Source

While we can’t reduce the noise to zero, we can run our tests for full weeks and business cycles to try to smooth things out as much as possible.

7. Run the test for the full time period you had planned

Back to those pesky p-values. As it turns out, an A/B test can dip below a .05 p-value (the commonly used rule to determine statistical significance) at many points during the test, and at the end of it all, sometimes it can turn out inconclusive. That’s just the nature of the game.

Image Source

Anyone in the CRO space will tell you that the single most common mistake people make when running A/B tests is ending the test too early. It's the "peeking" problem. You see that the test has "hit significance," so you stop the test, celebrate, and launch the next one. Problem? It may not have been a valid test.

The best post written about this topic, aptly titled, is Evan Miller’s “How Not To Run An A/B Test.” He walks through some excellent examples to illustrate the danger with this type of peaking.

Essentially, if you're running a controlled experiment, you're generally setting a fixed time horizon at which you view the data and make your decision. When you peek before that time horizon, you're introducing more points at which you can make an erroneous decision, and the risk of a false positive goes wayyy up.

Image Source

8. Stick to testing only one variant (unless you’re correcting for it…)

Here we’ll introduce an advanced topic: the multiple comparisons problem.

When you test several variants, you run into a problem known as “cumulative alpha error.” Basically, with each variant, sans statistical corrections, you risk a higher and higher probability of seeing a false positive. KonversionsKraft made a sweet visualization to illustrate this:

This looks scary, but here's the thing: almost every major A/B testing tool has some built-in mechanism to correct for multiple comparisons. Even if your testing tool doesn't, or if you use a home-brew testing solution, you can correct for it yourself very simply using one of several standard corrections (Bonferroni, Sidak, Dunnett's, etc. – a quick example follows below).
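
If you're analyzing by hand, base R's p.adjust() handles the classic corrections – a quick sketch with made-up p-values from an A/B/C/D test:

raw_p <- c(0.04, 0.11, 0.008) # one p-value per variant vs. control (invented numbers)
p.adjust(raw_p, method = "bonferroni")
p.adjust(raw_p, method = "holm") # a slightly less conservative option
# Dunnett's test (several variants against one control) lives in the multcomp package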

However, if you’re not a nerd and you just want to test some shit and maybe see some wins, start small. Just one v one.

When you do feel more comfortable with experimentation, you can and should look into expanding into A/B/n tests with multiple variants.

This is a core component of Andrew Anderson's Discipline Based Testing Methodology, and if I can, I'll wager to say it's because it increases the beta of the options, or the differences between each one of the experiences you test. This, at heart, decreases your reliance on hard opinions or preconceived ideas about "what works" and opens you up to trying things you may not have tried in a simple A/B test.

But start slowly, keep things simple.

9. Be skeptical about numbers that look too good to be true

If there's one thing CRO has done to my personality, it's heightened my level of skepticism. If anything looks too good to be true, I assume something went wrong. Actually, most of the time, I'm poking and prodding at things, seeing where they may have been broken or set up incorrectly. It's an exhausting mentality, but one that is necessary when dealing with so many decisions.

Ever see those case studies that proclaim a call to action button color change on a web page led to a 100%+ increase in conversion rate? Almost certainly bullshit. If you see something like this, even if you just get a small itch where you think, “hmm, that seems…interesting,” go after it. Also second guess data, and triple guess yourself.

As the analytics legend Chris Mercer says, “trust but verify.”

And read about Twyman’s Law here.

10. Don’t shut off a variant mid test or shift traffic allocation mid test

I guess this is sort of related to two previous rules here: run your test for the full length and start by only testing one variant against the control.

If you’re testing multiple variants, don’t shut off a variant because it looks like it’s losing and don’t shift traffic allocation. Otherwise, you may risk Simpson’s Paradox.

Intermediate A/B Testing Issues: A Whole Lot More You Should Maybe Worry About

  1. Control for external validity factors and confounding variables
  2. Pay attention to confidence intervals as well as p-values
  3. Determine whether your test is a Do No Harm or a Go For It test, and set it up appropriately.
  4. Consider which type of test you should run for which problem you’re trying to solve or answer you’re trying to find (sequential, one tail vs two tail, bandit, MVT, etc)
  5. QA and control for “flicker effect”
  6. Realize that the underlying statistics are different for non-binomial metrics (revenue per visitor, average order value, etc.) – use something like the Mann-Whitney U-Test or robust statistics instead.
  7. Trigger the test only for those users affected by the proposed change (lower base rates lead to greater noise and underpowered tests)
  8. Perform an A/A test to gauge variance and the precision of your testing tool
  9. Correct for multiple comparisons
  10. Avoid multiple concurrent experiments and make use of experiment “swim lanes”
  11. Don’t project precise uplifts onto your future expectations from those you see during an experiment.
  12. If you plan on implementing the new variation in the case of an inconclusive test, make sure you’re running a two-tailed hypothesis test to account for the possibility that the variant is actually worse than the original.
  13. When attempting to improve a “micro-conversion” such as click through rate, make sure it has a downstream effect and acts as a causal component to the business metric you care about. Otherwise, you’re just shuffling papers.
  14. Use a hold-back set to calculate the estimated ROI and performance of your testing program

Intermediate A/B Testing Issues: Explained

1. Control for external validity factors and confounding variables

Well, you know how to calculate statistical significance, and you know exactly why you should run your test for full business cycles in order to capture a representative sample.

This, in most cases, will reduce the chance that your test will be messed up. However, there are plenty more validity factors to worry about, particularly those outside of your control.

Anything that reduces the representativeness or randomness of your experiment sample can be considered a validity factor. In that regard, some common ones are:

  • Bot traffic/bugs
  • Flicker effect
  • PR spikes
  • Holidays and external events
  • Competitor promotions
  • Buggy measurement setup
  • Cross device tracking
  • The weather

I realize this tip is frustrating, because the list of potential validity threats is expansive, and possibly endless.

However, understand: A/B testing always involves risks. All you need to do is understand that and try to document as many potential threats as possible.

You know how in an academic paper, they have a section on limitations and discussion? Basically, you should do that with your tests as well. It’s impossible to isolate every single external factor that could affect behavior, but you can and should identify clearly impactful things.

For instance, if you raised a round of capital and you’re on the front page of TechCrunch and Hacker News, maybe that traffic isn’t exactly representative? Might be a good time to pause your experiments (or exclude that traffic from your analysis).

2. Pay Attention to Confidence Intervals as Well as P-Values

It's common knowledge among experimenters that one should only call a test "significant" if the p-value is below .05. That threshold, while technically arbitrary, ensures the risk in our decision making never rises above a level we're uncomfortable with. We're sort of saying: 5% of experiments may show results purely due to chance, but we're okay with that, in the long run.

Many people, however, fail to understand or use confidence intervals in decision making.

What’s a confidence interval in relation to A/B testing?

A confidence interval is the range within which the true value (or the true difference between variants) is likely to fall, given your sample – a measure of the reliability of an estimate. Here's an example outlined by PRWD:

Image Source

Basically, if your results, including confidence intervals, overlap at all, then you may be less confident that you have a true winner.

John Quarto-vonTivadar has a great visual explaining this:

Image Source

Of course, the greater your sample size, the lower the margin of error becomes in an A/B test. As is usually the case with experimentation, high traffic is a luxury and really helps us make clearer decisions.
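
For what it's worth, prop.test() in R reports a confidence interval for the difference between two conversion rates right alongside the p-value – a quick sketch with invented numbers:

prop.test(x = c(500, 560), n = c(10000, 10000))
# the "95 percent confidence interval" line is the plausible range for the
# difference in conversion rate; if it comfortably excludes zero, that's a clearer win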

3. Determine whether your test is a Do No Harm or a Go For It test, and set it up appropriately.

As you run more and more experiments, you'll find yourself less focused on an individual test and more on the system as a whole. When this shift happens, you begin to think more in terms of risk, resources, and upside, and less in terms of how much you want your new call to action button color to win.

A fantastic framework to consider comes from Matt Gershoff. Basically, you can bucket your test into two categories:

  1. Do No Harm
  2. Go For It

In a Do No Harm test, you care about the potential downside and you need to mitigate it or avoid it. In a Go For It test, we have no additional cost to making a Type 1 error (false positive), so there is no direct cost invoked when making a given decision.

In the article, Gershoff gives headline optimization as an example:

“Each news article is, by definition, novel, as are the associated headlines.

Assuming that one has already decided to run headline optimization (which is itself a ‘Do No Harm’ question), there is no added cost, or risk to selecting one or the other headlines when there is no real difference in the conversion metric between them. The objective of this type of problem is to maximize the chance of finding the best option, if there is one. If there isn’t one, then there is no cost or risk to just randomly select between them (since they perform equally as well and have the same cost to deploy). As it turns out, Go For It problems are also good candidates for Bandit methods.”

Highly suggested that you read his full article here.

4. Consider which type of test you should run for which problem you’re trying to solve or answer you’re trying to find (sequential, one tail vs two tail, bandit, MVT, etc)

The A/B test is sort of the gold standard when it comes to online optimization. It’s the clearest way to infer a difference between a given element or experience. Though there are other methods to learning about your users.

Two in particular that are worth talking about:

  1. Multivariate testing
  2. Bandit tests (or other algorithmic optimization)

Multivariate experiments are wonderful for testing multiple micro-components (e.g. a headline change, CTA change, and background color change) and determining their interaction effects. You find which elements work optimally with each other, instead of a grand and macro-level lift without context as to which micro-elements are impactful.

Image Source

In my anecdotal experience, I’d say good testing programs usually run one or two multivariate tests for every 10 experiments run (the rest being A/B/n).

Bandit tests are a different story, as they are algorithmic. The hope is that they minimize "regret," or the amount of time you're exposing your audience to a suboptimal experience. So the algorithm updates in real time, showing the winning variant to more and more people over time.

Image Source 

In this way, it sort of "automates" the A/B testing process. But bandits aren't always the best option. They sway with new data, so there are contextual problems associated with, say, running a bandit test on an email campaign.

However, bandit tests tend to be very useful in a few key circumstances:

  • Headlines and Short-Term Campaigns (e.g. during holidays or short term, perishable campaigns)
  • Automation for Scale (e.g. when you have tons and tons of tests you’d like to run on thousands of templatized landing pages)
  • Targeting (we’ll talk about predictive targeting in “advanced” stuff)
  • Blending Optimization with Attribution (i.e. testing, while at the same time, determining which rules and touch points contribute to the overall experience and goals).

5. QA and control for “flicker effect”

Flicker effect is a very special type of A/B test validity threat. It’s basically when your testing tool causes a slight delay on the experiment variation, briefly flashing the original content before serving the variation.

There are tons of ways to reduce flicker effect that I won’t go into here (read this article instead). A broader point is simply that you should “measure twice, cut once,” and QA your test on all major devices and categories before serving it live. Better to be prudent and get it right than to fuck up your test data and waste all the effort.

6. Realize that the underlying statistics are different for non-binomial metrics (revenue per visitor, average order value, etc.) – use something like the Mann-Whitney U-Test instead of a Z test.

When you run an A/B test with the intent to increase revenue per visitor or average order value, you can’t just plug your numbers into the same statistical significance calculator as you would with conversion rate tests.

Essentially, you’re looking at a different underlying distribution of your data. Instead of a binomial distribution (did convert vs. didn’t convert), you’re looking at a variety of order sizes, and that introduces the concept of outliers and variance into your calculations. It’s often the case that you’ll have a distribution affected by a very small amount of bulk purchasers, who skew a distribution to the right:

Image Source

In these cases, you'll want to use a statistical test that does not assume a normal distribution, such as the Mann-Whitney U-Test.
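
In R, that's wilcox.test(), which runs the Wilcoxon rank-sum (Mann-Whitney U) test – a quick sketch with made-up order values, including one whale:

control_aov <- c(20, 35, 40, 55, 60, 80, 90, 1500) # note the bulk-purchase outlier
variant_aov <- c(25, 45, 50, 65, 75, 95, 110, 120)
wilcox.test(variant_aov, control_aov) # compares the distributions without assuming normality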

7. Trigger the test only for those users affected by the proposed change (lower base rates lead to greater noise and underpowered tests)

Only those affected by the test should be bucketed and included for analysis. For example, if you’re running a test on a landing page, where a modal pops up after scrolling 50%, you’d only want to include those who scroll 50% in the test (those who don’t would never have been the audience intended for the new experience anyway).

The mathematical reasoning for this is that filtering out unaffected users can improve the sensitivity (statistical power) of the test, reducing noise and making it easier for you to find effects/uplifts.

Most of the time, this is a fairly simple solution involving triggering an event at the moment where you’re looking to start analysis (at 50% scroll depth in the above example).

Read more on triggering here.

8. Perform an A/A test to gauge variance and the precision of your testing tool

While there’s a constant debate as to whether A/A tests are important or not, it sort of depends on your scale and what you hope to learn.

The purpose of an A/A test – testing the original vs the original – is mainly to establish trust in your testing platform. Basically, you’d expect to see statistically significant results – despite the variants being the same – about 5% of the time with a p-value of < .05.

In reality, A/A tests often open up and introduce you to implementation errors like software bugs. If you truly operate at high scale and run many experiments, trust in your platform is pivotal. An A/A test can help provide some clarity here.

This is a big topic. Ronny Kohavi wrote a great paper on it, which you can find here.

9. Correct for multiple comparisons whenever applicable

We’ve talked a bit of about the multiple comparisons problem, and how, when you’re just starting out, it’s best to just run simple A/B test. But you’re eventually going to get curious, and you’ll eventually want to run a test with multiple variants, say an A/B/C/D/E test. This is good, and you can often get more consistent results from your program when you test a greater variety of options. However, you do want to correct for multiple comparisons when doing this.

It’s fairly simple mathematically. Just use Dunnett’s test or the Sidak correction.

You also need to keep this multiple comparisons problem in mind when you do post-test analysis on segments. Basically, if you look at enough segments, you’ll find a statistically significant result. The same principle applies (you’re increasing the risk of a false positive with every new comparison).

When I do post-test segmentation, I often use it more as a tool to find research questions than to find answers and insights to base decisions on. So if I find a "significant" lift in a given segment, say Internet Explorer visitors in Canada, I note that as an insight that may or may not be worth testing. I don't just implement a personalization rule, as doing that each time would certainly lead to organizational complexity, and would probably result in many false positives.

10. Avoid multiple concurrent experiments and make use of experiment “swim lanes”

Another problem that comes with scale is running multiple concurrent experiments. Basically, if you run two tests, and they’re being run on the same sample, you may have interaction effects that ruin the validity of the experiment.

Best case scenario: you (or your testing tool) create technical swim lanes where a group can only be exposed to one experiment at a time. This automatically prevents that sort of cross-pollination and reduces sample pollution.

A scrappier solution, one more fit for those running fewer tests, is to run your proposed experiments through a central team who gives the green-light and can see, at a high level, where there may be interaction effects, and avoid them.

11. Don’t project precise uplifts onto your future expectations from those you see during an experiment.

So, you got a 10% lift at 95% statistical significance. That means you get to celebrate that win in your next meeting. You do want to state the business value of an experiment like this, of course – what does a 10% relative lift mean in isolation? – so you also include a projection of what this 10% lift means for the business. "We can expect this to bring us 1,314 extra subscriptions per month," you say.

While I love the idea of tying things back to the business, you want to tread lightly in matters of pure certainty, particularly when you're dealing with projections.

An A/B test, despite misconceptions, can only truly tell you the difference between variants during the time the experiment is running. We do hope that differences between variants extend past the duration of the test itself, which is why we go through so much trouble in our experiment design to make sure we're randomizing properly and testing on a representative sample.

But a 10% lift during the test does not mean you’ll see a 10% lift during the next few months.

If you do absolutely need to project some sort of expected business results, at least do so using confidence intervals or a margin of error.

"We can expect, given the limitations of our test, to see X more subscriptions on the low side, and on the high side, we may see as many as Y more subscriptions, but there's a level of uncertainty involved in making these projections. Regardless, we're confident our result is positive and will result in an uptick in subscriptions."

Nuance may be boring and disappointing, but expectation setting is cool.

12. If you plan on implementing the new variation in the case of an inconclusive test, make sure you’re running a two-tailed hypothesis test to account for the possibility that the variant is actually worse than the original.

One-tail vs. two-tail a/b testing. This can seem like a somewhat pedantic debate in many cases, but if you’re running an A/B test where you expect to roll out the variant even if the test is inconclusive, you will want to protect your downside with a two-sided hypothesis test.

Read more on the difference between one-tail and two-tail A/B tests here.

13. When attempting to improve a “micro-conversion” such as click through rate, make sure it has a downstream effect and acts as a causal component to the business metric you care about. Otherwise, you’re just shuffling papers.

Normally, you should choose a metric that matters to your business. The conversion rate, revenue per visitors, activation rate, etc.

Sometimes, however, that’s not possible or feasible, so you work on moving a “micro-conversion” like click through rate or improving the number of people who use a search function. Often, these micro-conversions are correlative metrics, meaning they tend to associate with your important business metric, but aren’t necessarily causal.

Increased CTR might not increase your bottom line (Image Source)

A good example is if you find a piece of data that says people who use your search bar purchase more often and at higher volumes than those who don’t. So, you run a test that tries to increase the amount of people using that search feature.

This is fine, but make sure, when you’re analyzing the data, that your important business metric moves. So you increased people who use the search feature – does that also increase purchase conversion rate and revenue? If not, you’re shuffling papers.

14. Use a hold-back set to calculate the estimated ROI and performance of your testing program

Want to know the ROI of your program? Some top programs make use of a “holdback set” – keeping a small subset of your audience on the original version of your experience. This is actually crucial when analyzing the merits of personalization/targeting rules and machine learning-based optimization systems, but it’s also valuable for optimization programs overall.

A universal holdback – keeping say 5% of traffic as a constant control group – is just one way to try to parse out your program’s ROI. You can also do:

  • Victory Lap – Occasionally, run a split test combining all winning variants over the last 3 months against a control experience to confirm the additive uplift of those individual experiments.
  • Re-tests – Re-test individual, winning tests after 6 months to confirm that “control” still underperforms (and the rate at which it does).

If you’re only running a test or two per month, these system-level decisions may be less important. But if you’re running thousands of tests, it’s important to start learning about program effectiveness as well as the potential “perishability” or decay of any given test result.

Here are a bunch of other ways to analyze the ROI of a program (just don’t use a simple time period comparison, please).

Advanced A/B Testing Issues – Mostly Fringe Cases That Some Should Still Consider

  1. Look out for sample ratio mismatch.
  2. Consider the case for a non-inferiority test when you only want to mitigate potential downsides on a proposed change
  3. Use predictive targeting to exploit segments who respond favorably to an experience.
  4. Use a futility boundary to mitigate regret during a test
  5. When a controlled experiment isn’t possible, estimate significance using a bayesian causal model

Advanced A/B Testing Issues: Explained

1. Look out for sample ratio mismatch.

Sample Ratio Mismatch is a special type of validity threat. In an A/B test with two variants, you’d hope that your traffic would be randomly and evenly allocated among both variants. However, in certain cases, we see that the ratio of traffic allocation is off more than would be natural. This is known as “sample ratio mismatch.”

This, however, is another topic I’m going to politely duck out of explaining, and instead, link to the master, Ronny Kohavi, and his work.

He also has a handy calculator so you can see if your test is experiencing a bug like this.
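
You can also sanity-check this yourself in R with a chi-squared goodness-of-fit test against your intended split – a quick sketch, assuming a planned 50/50 allocation (the visitor counts are invented):

observed <- c(50214, 48876) # visitors bucketed into control and variant
chisq.test(observed, p = c(0.5, 0.5))
# a very small p-value here suggests sample ratio mismatch – go bug hunting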

2. Consider the case for a non-inferiority test when you only want to mitigate potential downsides on a proposed change

Want to run a test solely to mitigate risk and avoid implementing a suboptimal experience? You could try out a “non-inferiority” test (as opposed to the normal “superiority” test) in the case of easy decision tests and tests with side benefits outside of measurement capability (e.g. brand cohesiveness).

This is a complicated topic, so I'll link out to a post here.

3. Use predictive targeting to exploit segments who respond favorably to an experience.

A/B testing is cool, as is personalization. But after a while, your organization may be operating at such a scale that it isn't feasible to manage, let alone choose, targeting rules for all those segments you're hoping to reach. This is a great use case for machine learning.

Solutions like Conductrics have powerful predictive targeting engines that can find and target segments who respond better to a given experience than the average user. So Conductrics (or another solution) may find that rural visitors using smartphones convert better with Variant C. You can weigh the ROI of setting up that targeting rule and do so, managing it programmatically.

Image Source

4. Use a futility boundary to mitigate regret during a test

This is basically a testing methodology to improve efficiency and allow you to stop A/B tests earlier. I’m not going to pretend I fully grok this one or have used it, but here’s a guide if you’d like to give it a try. This is something I’m going to look into trying out in the near future.

5. When a controlled experiment isn’t possible, estimate significance using a bayesian causal model

Often, when you're running experiments, particularly those that are not simple website changes like landing page CTAs, you may not be able to run a fully controlled experiment. I'm thinking of things like SEO changes, campaigns you're running, etc.

In these cases, I usually try to estimate how impactful my efforts were using a tool like GA Effect.

It appears my SEO efforts have paid off marginally
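
As far as I know, GA Effect is built on top of Google's CausalImpact package, so you can run the same kind of analysis yourself in R – a minimal sketch, assuming a daily sessions series and a change that shipped on day 71 (both placeholders):

library(CausalImpact)

# sessions: a numeric vector of daily organic sessions (pull it from GA however you like)
pre_period <- c(1, 70) # days before the SEO change
post_period <- c(71, 100) # days after
impact <- CausalImpact(sessions, pre_period, post_period)
summary(impact)
plot(impact)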

Conclusion

As I mentioned up front, by its very nature, A/B testing is a statistical process, and statistics deals with the realm of the uncertainty. Therefore, while rules and guidelines can help reduce errors, there is no decision tree that can result in the perfect, error-less testing program.

The best weapon you have is your own mind, inquisitive, critical, and curious. If you come across a fringe issue, discuss it with colleagues or Google it. There are tons of resources and smart people out there.

I’m not done learning about experimentation. I’ve barely cracked the surface. So I may reluctantly come to find out in a few years that this list is naive, or ill-suited for actual business needs. Who knows.

But that’s part of the point: A/B testing is difficult, worthwhile, and there’s always more to learn about it.

Key Sources:

Also, thanks to Erik Johnson, Ryan Farley, Joao Correia, Shanelle Mullin, and David Khim for reading this and adding suggestions before publication.

The post What is A/B Testing? An Advanced Guide + 29 Guidelines appeared first on Alex Birkett.

]]>
Content Optimization: How to Make Content Better https://www.alexbirkett.com/content-optimization/ Fri, 23 Mar 2018 20:39:12 +0000 https://www.alexbirkett.com/?p=354 They say the third lever of content marketing growth is content optimization. Content creation, content promotion, and content optimization. Who’s they? Bloggers, speakers, thought leaders – you know the lot Because of my background in conversion optimization, and just a general desire to improve and optimize things, content optimization is exciting to me. Optimization implies ... Read more

The post Content Optimization: How to Make Content Better appeared first on Alex Birkett.

]]>
They say the third lever of content marketing growth is content optimization.

Content creation, content promotion, and content optimization.

Who's they? Bloggers, speakers, thought leaders – you know the lot.

Because of my background in conversion optimization, and just a general desire to improve and optimize things, content optimization is exciting to me.

Optimization implies improved ROI, efficiency, and scale – returns that continue and compound over time.

Content optimization means (presumably) that we can spend less time creating and distributing our work, and get more value from what we’re putting out there.

That’s the theory, anyway.

What doesn’t get talked about as much is how the hell one optimizes old content in the first place.

Well, it’s something I’ve thought a lot about and done even more of.

So that’s what this article will cover: how to look back at content you’ve already launched into the world and improve it to rank higher in the search engine or go further on social media or improve conversions, systematically and at scale.

My case is that content optimization strategy should be a core part of your marketing strategy, especially if you’ve published at scale already.

Content Optimization: Two Different Approaches

When looking back at old content, you can look at things two different ways (both valid and valuable):

  1. Find high traffic but low converting posts and increase the conversion rate.
  2. Find low traffic but high search volume/potential posts and increase search engine rankings or distribution.

The first method, in my opinion, is easier, at least from a prioritization standpoint.

You can very easily build out a model using your total traffic and your historical conversion rate metrics to calculate, with some degree of accuracy, how much value you can expect. This is basically a “what if?” analysis and I’ll walk you through how to build one out in a minute.

It’s also easier because we usually “set it and forget it” when it comes to conversion offers with content. With a little care and thought it’s usually pretty easy to optimize this part.

The second method (search engine rankings) usually has a higher ceiling in terms of how much value you can squeeze out of it. Terms like “great content” and “high-quality content” and even user experience are all somewhat subjective in search engine optimization, so it’s harder to know exactly how to update a piece and how much extra value you can get from it.

The difference between clicks on the first, second, third, and all the other SERP results is astounding, and if you can lift your rankings you can gain a lot of traffic. Similarly, even the top SEO companies have tons of pages ranking in positions 5-20, and with a bit of effort, it's always possible to lift those.

Image Source

The first method is mostly going to involve strategic work.

You’ll run an analysis of top opportunities, calculate the upside, and then go through the process of optimizing the acquisition pathways on each page you deem worth it. That last step, the optimizing of the acquisition pathways, is a ton of hard work and takes a talented hand to do so. It takes creativity, empathy, skill (i.e. good marketing).

The second method also takes a lot of analysis work, but it’s generally a bit easier to understand how you can improve a page if you’ve got a decent understanding of SEO. It’s usually some combination of content quality, internal linking or site architecture, external linking work, or some low hanging fruit like H1/H2/title tag optimization.

I'll walk through each of these things in depth, to the point that it may get strenuous to read this guide if you're only interested in one of the methods. Realistically, you should probably focus on one of these at a time, as each step will require a ton of trial and error and will rarely be easy or clean in practice.

This post is like a darn book. To that end, here’s a table of contents to help you jump around as you please.

CRO & Maximizing Conversions on High Traffic Content

If you’re doing content marketing or digital marketing at all, it’s very likely your content follows a power law: most of your traffic comes from a few posts. That’s the way it was at CXL, and it’s that way at HubSpot, too. Most content powerhouses deal with this type of distribution.

Image Source

The wrong way to look at this (as many “analysts” have) is that you should produce less content. That’s not a solution, that’s the table measuring the ruler.

You can’t guess what’s going to be a massive success ahead of time (though you can increase the probability with good strategy and execution).

No, the point of this is to say that you’re going to have some posts that have way more traffic than other posts. However — these posts will often have a much lower conversion rate than lower traffic pages.

This may or may not be true of your site, but I’ve seen this firsthand from every site I’ve worked with.

(There's also usually a power law in which blog posts deliver the highest percentage of leads/conversions. Sometimes there's overlap between the high traffic and high conversion blog posts, and that's just magical.)

The most common explanation is that the top post is so top-of-funnel that users don't convert at as high a rate on the same offers as they do on your bottom-of-funnel posts.

The second most common explanation is that you hit a high traffic topic that’s slightly outside your niche (if you sell commercial kitchen supplies to restaurants in Austin, an infographic on the top coffees in town may or may not be super relevant to conversion).

Image Source

In both cases, the fix is to align your offers on-page with your visitors’ intent and customer journey stage. You have to match the incoming temperature of your visitor – don’t offer them an ebook if they want a demo, and don’t offer them a dress suit if they just want a first time visitor discount (and maybe a tie).

And if you have no conversion points on your page, well, add one. Easy fix, there.

How to Find High Traffic/Low Conversion Pages

There are many analytics platforms. The most ubiquitous of the analytics tools is Google Analytics, so even though you can probably grab insights from HubSpot or Sumo or whatever lead capture tool you use, we’ll use GA here.

We’ll pull a quick report that will give us an approximation of the conversion rates of different posts. This assumes that you have goals set up for your “conversion,” which could be an email collection form or otherwise. This report also relies on “landing pages” as the variable, so we may be missing out on some nuance with people who view lots of blog posts or navigate your site from somewhere else, but then convert on a specific blog post.

Anyway, we want useful, not perfect.

Go to Behavior > Site Content > Landing Pages. Then use the "comparison" view instead of the table view. Change the metric that you're comparing from "Sessions" to "Goal Conversion Rate." It should look like this:

You can also use a filter like “/blog/” or whatever you use to distinguish your blog posts from non-blog posts (sometimes you’ll have a specific View for your blog, in which case just use the whole report).

From there, you can find which high traffic blog posts are converting much lower than the site average. I talk more about how to do this on my post in content marketing analytics, by the way.

You can also pull this data to Excel in raw format and do a similar analysis, but it usually suffices to just focus on the highest traffic, lowest converting posts, and you can see that starkly with this report.

If you’re doing it in Excel, pull your data over and use conditional formatting to highlight blog posts that convert less than the average. Then use a filter to only look at those:
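If you'd rather script this than spreadsheet it, here's a minimal pandas sketch of the same analysis. It assumes an export with hypothetical columns named landing_page, sessions, and conversions; rename them to match whatever your report actually spits out.

import pandas as pd

# Hypothetical GA/HubSpot export with columns: landing_page, sessions, conversions
df = pd.read_csv("landing_pages.csv")
df["conversion_rate"] = df["conversions"] / df["sessions"]

# Site-wide average conversion rate, weighted by sessions
site_avg = df["conversions"].sum() / df["sessions"].sum()

# High traffic + below-average conversion rate = the biggest CRO opportunities
opportunities = (
    df[df["conversion_rate"] < site_avg]
    .sort_values("sessions", ascending=False)
    .head(10)
)
print(f"Site average conversion rate: {site_avg:.2%}")
print(opportunities[["landing_page", "sessions", "conversion_rate"]])

Conditional formatting gets you to the same place; this just makes the analysis repeatable month over month.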

Quick point: I love Google Analytics as much as the next guy, but it actually may be easier to use the analytics from your marketing tool in this case.

At least in the case of HubSpot, the CTAs tool has great reporting and you can compare side-by-side with all of your CTAs (or export the data and analyze it elsewhere). It shows which pages are converting best that are using the same CTAs and it also aggregates CTA conversion rates so you can compare apples to apples.

Image Source

Now you have a good idea of which blog posts represent the biggest opportunities, at least from a bird’s eye view. Next you need to prioritize which ones you’ll focus on first and how much lift you can expect.

How to Prioritize and Size Opportunities

We’ll need to dump our data into Excel for this. We only need three basic variables: blog post title (or URL), page views (try to do an average monthly count from a spread of 3-6 months), and conversion rate (same thing with the average).

Where you get this doesn’t matter. You can pull it from Google Analytics, your marketing automation tool, or your analyst’s magic crystal ball (just not from your imagination).

Just make darn sure you have good quality data and that you trust it.

What you’re about to do is a common planning and projection analysis used to see what the upside of certain actions is (a watered down version of it, anyway). If the data isn’t right, your projections aren’t going to be worth much.

So, pull your data into Excel. On the first pass, I like to pull only the top ten trafficked posts that are below the site average. You can find those using the above Google Analytics report, or by bringing your data into Excel and using conditional formatting to highlight those below average.

Then use a filter to only show those that are highlighted:

Once you have those, build out some additional columns for your projected values. You can get more precise with this, but to keep things simple, I like to use the site average to project out numbers. The assumption is that, if that’s the average, we can probably get any post there with some optimization effort (obviously that simplifies things, but it’s good for prioritization):

From there, it's extremely easy to see which opportunities are the biggest. You can even project these numbers out over a longer time period (such as a year) to see what the potential upside could be.
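If you want the same "what if?" math outside of a spreadsheet, this continues the pandas sketch from earlier (the column names are still assumptions):

# Continuing the earlier pandas sketch: project each below-average post up to
# the site average and estimate the extra conversions per month.
opportunities = opportunities.copy()
opportunities["current_conversions"] = opportunities["sessions"] * opportunities["conversion_rate"]
opportunities["projected_conversions"] = opportunities["sessions"] * site_avg
opportunities["monthly_lift"] = opportunities["projected_conversions"] - opportunities["current_conversions"]

# Multiply monthly_lift by 12 for a yearly view when comparing against other projects
print(opportunities.sort_values("monthly_lift", ascending=False)[
    ["landing_page", "sessions", "conversion_rate", "monthly_lift"]
])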

This type of modeling helps especially when you have to make tradeoffs. For instance, if you have enough content resources to either invest in this type of conversion optimization, in net new content creation, or in SEO projects to lift current content to get more traffic, then you can see which one merits the prioritization.

Note: this is but one way to model things out.

Also, "all models are wrong, but some are useful." The point here isn't to project the exact number of conversions you'll get, but rather to choose between projects when you have a set amount of resources.

Even within this list, it helps you choose which articles you should focus the most attention on.

How to Gauge Intent of Visitors and Align Your Offer

In PPC advertising, there’s a popular notion that takes into account the “temperature” of a target audience. A display ad may be reaching completely cold traffic, so your offer shouldn’t be something bottom of the funnel like a demo. Maybe it should be an e-book, or something that pushes them down the funnel until they’re a warmer temperature.

Image Source

People don’t talk about this as much with organic search traffic, but it’s the same case: people land on your site with widely varying levels of intent.

How do you determine the intent and user journey stage?

There are many ways to do so, but they all start with understanding which channels people are coming from and what keywords they're searching. Analyzing your marketing channels is simple. Log into Google Analytics and go to your Acquisition > All Traffic > Source/Medium report.

Start digging around and asking questions. What are your highest performing channels? Lowest performing? If you're running campaigns, which ones are doing well and which ones aren't?

Explore the data a bit.

Specific to SEO traffic, you need to analyze what keywords are bringing users to your pages. To do that, enter the URL of a blog post in Ahrefs and click on “Organic Keywords” (you can also get this info from Search Console or many other SEO tools):

What *did* happen to Alex and ROI?!

Then you need to classify these keywords into a temperature state: are they warm, ready-to-buy visitors, or cold visitors who barely know your brand? This helps define your offer and conversion pathway (there's a crude rule-based sketch of this right after the list):

  • If you’re a nerd like me, you might be interested in running clustering and classification algorithms to place keywords in user journey state buckets (read this on how to do that). (Disclosure: I’m still working on doing this in a way that I trust and that doesn’t take lots of tinkering and tweaking. Work in progress but promising)
  • If you’re not, you may have just as much success using common sense to bucket keywords into user state (read this on how to do that).
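To make the common-sense version concrete, here's a minimal Python sketch of that bucketing. The modifier lists are purely illustrative assumptions, not a tested taxonomy; swap in whatever buying signals make sense for your market.

# Naive keyword "temperature" bucketing by modifier words.
# The modifier lists are illustrative guesses, not a tested taxonomy.
WARM_MODIFIERS = ["pricing", "price", "buy", "vs", "alternative", "template", "software", "demo"]
COLD_MODIFIERS = ["what is", "how to", "guide", "definition", "examples", "ideas"]

def bucket_keyword(keyword: str) -> str:
    kw = keyword.lower()
    if any(m in kw for m in WARM_MODIFIERS):
        return "warm"
    if any(m in kw for m in COLD_MODIFIERS):
        return "cold"
    return "unknown"

for kw in ["customer satisfaction survey template", "what is nps", "survey software pricing"]:
    print(kw, "->", bucket_keyword(kw))

It's not clustering, and it will misfire on plenty of keywords, but it gets you a first-pass segmentation you can sanity check by hand.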

Running Content Experiments and Converting Visitors to Leads or Customers

Content experiments are tricky because they are at an increased risk of being affected by things like seasonality and other validity threats. Google’s algorithm changes a ton, people search with different intent at different times of the year, and it’s hard to test on a truly representative sample.

However, you can still test, and you should still test – the same way you would with any other website element or experience.

I like to test at the bottom of the funnel. Don't worry about things like time on page or bounce rate; use something like conversions as the metric you optimize against.

Most lead capture tools allow you to do this on their platform (if they don’t, get a new one). You still have to adhere to the same statistics principles you would with any other A/B test (and time period comparisons are still a bad methodology, as is always the case when trying to infer causality).
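On the statistics side, a plain two-proportion z-test is usually enough to sanity check a CTA test after it has run its course. Here's a minimal sketch using statsmodels; the counts are made up.

from statsmodels.stats.proportion import proportions_ztest

# Made-up numbers: conversions and visitors for the control and variant CTAs
conversions = [180, 228]
visitors = [9500, 9480]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Decide your significance threshold and sample size before the test, not after it.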

I’ve written a million articles on A/B testing at this point, but these three will cover everything to get you started:

Lifting Traffic Where There’s Potential

There’s another side of this content optimization coin: lifting up traffic. If you have lots of content already, it’s likely you rank for some stuff, don’t rank for other stuff, and rank on the second or third pages for the rest.

Content optimization is all about lifting those high value pages that aren't ranking, and especially those that are almost on page one, to the front.

How to Find Articles That Are Losing Traffic

Here’s a sad fact marketers have to grapple with: even if you build a great piece of content and it ranks well, eventually it may start to lose traffic.

That could happen for a variety of reasons:

  • Competitors start to create content that outranks you
  • Google's SERP changes (adding featured snippets, ads, etc.)
  • Search volume for your keywords drops

There's not much you can do about the third one, but knowing what the issue is (and that there is an issue) helps you move forward on a potential plan. Competitors outranking you? Beef up your content and build links. SERP changes? Optimize your content to capture that featured snippet, carousel, or whatever else.

Image Source

First step, though, is to find out if you’re losing traffic (and which posts are losing the most). Here’s how you do that.

Log into Google Analytics, pick a period of time (let’s say 3 months) from last year (let’s say from January 1 – March 1 2017).

Then, go to Behavior > Site Content > Landing Pages and set your time range. Also, set your filter so that you’re only analyzing the property you care to analyze (e.g. /blog/).

You could get a high level view from here, but I prefer to narrow down to only organic traffic. To do that, set up a secondary dimension of “Default Channel Grouping.”

Then set up an advanced filter that only includes “organic search.”

Next, include all rows (scroll to the bottom and adjust the number where it says “show rows”) and export this data to CSV.

Open your spreadsheet and name the first tab whatever month and year it is (Jan – Mar 2017). Then delete all the data you don’t need (leave only the URL and the Sessions columns):

Go back to GA and change the date range to the current period. Make sure it’s the same time period and same start and stop dates, but for this year. This helps iron out traffic differences due to seasonality (always compare apples to apples). In this case, it means we need to set our date range from Jan 1 – March 1 2018.

Export to CSV, and bring it to tab 2 of your spreadsheet. Again, delete all data except for the URL and Sessions. Then rename the tab to something like Jan – Mar 2018.

Now add another column (Column C) to tab #1 and name it something like "Sessions 2018" (also rename Column B to something like "Sessions 2017"). Then do a VLOOKUP like the following (in Column C), where 'tab 2' is the title of your second tab:

=VLOOKUP(A2, '[tab 2]'!A:B, 2, FALSE)

Should look like this:

Now we're going to see if there has been a significant drop. You can use whatever percentage you think is significant, but in this example we'll flag anything that has dropped by 20%.

Add column D and title it "20%+ decline?" then insert this formula in D2 and fill it down:

=IF(C2<(B2-(B2*0.2)),TRUE,FALSE)

Looks like this:

That formula checks whether the number in Column C is at least 20% lower than the number in Column B. Then you can use conditional formatting to highlight the rows where that's the case.

Note: the data I'm using is from the Google Merchandise Store demo account, so it's kind of boring. It's way more interesting if you're using blog data because of the natural fluctuations in rankings and traffic over time. But alas, my personal site doesn't have enough organic traffic and HubSpot probably wouldn't love it if I shared screenshots of GA data, so Google demo account it is ¯\_(ツ)_/¯
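If you'd rather skip the VLOOKUP entirely, the same year-over-year comparison is a few lines of pandas. The file names and the landing_page/sessions column names below are assumptions; rename them to match your exports.

import pandas as pd

# Two Google Analytics exports covering the same date range, one year apart.
last_year = pd.read_csv("jan_mar_2017.csv").rename(columns={"sessions": "sessions_2017"})
this_year = pd.read_csv("jan_mar_2018.csv").rename(columns={"sessions": "sessions_2018"})

merged = last_year.merge(this_year, on="landing_page", how="left")
merged["sessions_2018"] = merged["sessions_2018"].fillna(0)

# Flag anything that dropped by 20% or more year over year
merged["declined_20pct"] = merged["sessions_2018"] < merged["sessions_2017"] * 0.8

print(merged[merged["declined_20pct"]].sort_values("sessions_2017", ascending=False))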

The next question, if you’re losing organic traffic over time, is why? There are a few common culprits:

  • You’ve fallen in rankings
  • The SERP experience has changed (featured snippets, carousels, etc., have been added)
  • Your click-through-rate has changed
  • Search volume for your keywords has dropped

So, you need to triangulate. Tracking rankings is easy. Every SEO tool does it, and you can also do it in Google Search Console.

If you haven’t dropped rankings, has your CTR fallen? Again, you can track this in Search Console.

If your CTR hasn't fallen, has the SERP changed? If there are featured snippets, carousels, ads, etc., can you capture those spots without a herculean amount of effort?

If the answer is no to all of those, it's likely that search volume for the keywords you were ranking for has fallen. You can get an approximation of this effect in Search Console by looking at your position over time and your impressions over time, but it still won't be precise: you don't know which long tail keywords you may have been ranking for that dropped off, and the trends are approximate and averaged.
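If it helps, you can encode that checklist as a tiny diagnostic helper. The function below is just the order of the questions written down; how you decide what counts as "dropped" is up to you.

def diagnose_traffic_drop(rank_dropped: bool, ctr_dropped: bool, serp_changed: bool) -> str:
    # Walks the checklist above in order; the thresholds behind each flag are up to you.
    if rank_dropped:
        return "Rankings fell: beef up the content, internal links, and backlinks."
    if ctr_dropped:
        return "CTR fell: rework the title tag and meta description."
    if serp_changed:
        return "SERP changed: chase the featured snippet, carousel, or other new feature."
    return "Search volume for your keywords has probably dropped."

print(diagnose_traffic_drop(rank_dropped=False, ctr_dropped=False, serp_changed=False))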

What should you do in that case?

My advice: Drink a glass of wine and take your dog to the park. Maybe learn a new language. Life isn’t all about SEO and marketing.

How to Find Articles That Are Almost Ranking Well

The best way to grow your traffic may be to publish net new articles. That’s true especially if you’re starting out. But it’s more likely, especially if you have a lot of content already published, that you’re almost ranking well for a ton of high value keywords. You’ve just gotta find ‘em, analyze them, and optimize them.

There are a few ways to do that. I’ll show you one of those ways (one that assumes you have an Ahrefs account, which you totally should have).

First, log into Ahrefs and enter the domain that you’d like to analyze.

It’s possible, too, that you just want to analyze a specific subfolder or subdomain if your site is set up that way (e.g. site.com/blog). Whatever the case, enter that in the domain explorer.

I’ll use CXL as an example since my personal site has virtually zero traffic (you can analyze any property you want in Ahrefs – pretty neat for competitor analysis or client work, but that’s another story).

You'll see a variety of interesting numbers on your dashboard and features on the side. Ignore them all except for "Organic keywords" at the top. Click on the number (in this case "113K"). That will bring you to a dashboard that shows all the keywords you're ranking for in the search engine and the corresponding URLs.

From here, you’ll want to filter things down. It depends on what rankings you’d like to isolate, but I consider anything in the 10-21 range worthy of optimization (another nice set could be from 6-10 if you really wanna inch up on the results page, or 11-21, or really whatever range you want. These are arbitrary numbers for the most part).

So click on “Position” and choose which rankings you want to filter for.

After that, set up a filter for volume. Again, this depends on what you consider a worthy amount of volume. I try to optimize for keywords above 1k, but let’s set the bar at 200 for now.

This will allow us to combine similar keywords later in Excel to get a better picture of the overall opportunity (e.g. if “Customer Satisfaction Surveys” ranks for both “how to measure customer satisfaction” and “satisfaction survey template,” we want to include both of those in our opportunity analysis).

Now export your file to CSV.

Cut down the columns you don’t care about (historical rankings, etc.). You now have raw data, and actually, you can get a pretty good picture of which opportunities exist from a qualitative look at this data:

Especially if you add conditional formatting to the volume and difficulty (or CPC) columns, you can see which blog posts represent the bigger opportunities for optimization.

However, my favorite thing to do here is to create a Pivot Table. Doing so can allow you to combine the volume of two or more keywords that a single blog post is ranking for.

For example, if Blog Post X is ranking in position 12 for Keyword A (500 volume) and in position 14 for Keyword B (1,000 volume), then we can see that the average ranking for this URL is 13 and it has a potential search volume of 1,500 (note: you don't have to use average position. It can be confusing, but it helps me size the ease of an opportunity). This makes it easier to look at absolute opportunities.

Here’s how I set that up in Excel:
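If you exported the Ahrefs keywords to CSV, the same pivot is a few lines of pandas. The url, keyword, position, and volume column names are assumptions; rename them to match the export.

import pandas as pd

# Ahrefs organic keywords export; assumed columns: url, keyword, position, volume
kw = pd.read_csv("organic_keywords.csv")

# Keep the "almost ranking" set: positions 10-21 with at least 200 monthly searches
kw = kw[(kw["position"].between(10, 21)) & (kw["volume"] >= 200)]

# One row per URL: total search volume across its keywords, average position, keyword count
pivot = (
    kw.groupby("url")
      .agg(total_volume=("volume", "sum"),
           avg_position=("position", "mean"),
           keyword_count=("keyword", "count"))
      .sort_values("total_volume", ascending=False)
)
print(pivot.head(10))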

If you’d like, you can then pull these entries to a different sheet and order them by traffic potential. If we do that, we can see that the top 10 opportunities represent a search volume potential of about 500,000:

From there, you can head back over to your raw data sheet and check out which keywords correspond to the URLs with high search potential. Here are the keywords for my example URL (a post on cognitive biases written by my past colleague, the super talented Shanelle Mullin).

What can they do from here? Well, a few things, depending on the context.

The first thing I would do is type in each of these keywords into a) Google and b) Ahrefs and see what is currently ranking and the backlink profiles and competitiveness of the other sites ranking.

Let’s try that with “list of cognitive biases,” for which CXL is ranking #20.

It’s not a shock that many of the currently ranking articles are informational and come from top sites, like Wikipedia, Mental Floss, and Business Insider.

Another thing to note is that they're more general than the CXL title, as they relate to all applications of cognitive bias and not just CRO. Realistically, it's a better branding play for CXL to keep the focus on CRO, but that may be limiting the search traffic and intent the page can capture, which is something to consider in optimization.

Next, I would look at how these results stack up from a competitive perspective. Plug in your own URL into Ahrefs and get your baseline data on quantity of backlinks, domain rating, URL rating, etc.

Then plug in the keyword you’re trying to rank for (reminder “list of cognitive biases”) in the keyword explorer tool:

Scroll all the way to the bottom of this report and look at the current rankings. You can see, side by side, the backlink counts, Domain Rating, URL Rating, and “Ahrefs Rank” (a sort of aggregate metric that attempts to tell you how strong your search capability is).

Learning from a quick scan: Wikipedia is a monster and won't be fucked with, but the others can all be overtaken.

It would take a bit more effort to analyze the quality of each of the articles on that list (and I won't walk you through that), but you essentially want to match the search intent (clearly a list of cognitive biases), optimize on-page for that, and build links.

Optimizing on-page is a huge topic, so I’ll defer to the master on that topic: On-Page SEO: Anatomy of a Perfectly Optimized Page

You can also use a nice tool like SEMrush’s SEO Writing Assistant or other content optimization software like Surfer SEO.

Finally, you can work on click-through rate to squeeze even more traffic out of your rankings. Here's a good article from Wordstream on how to do that.

So, to optimize this piece of content, we have: a) a possible page title change, b) some on-page optimization, c) internal linking, d) some beefing up of the content to make it more thorough than the others, and e) link building.

I won’t go into link building fully, as I’ve done that in a previous article on content promotion. But I want to briefly go over how to optimize your content to make it easier to build links (by building in linkable assets).

6 “Hooks” for Rankable and Linkable Content

One way to create linkable content is to genuinely write the best thing on the internet on that topic. It may sound grandiose, but that was the explicit content strategy we held at CXL.

Outside of that, there are other more tactical things you can do to help out with link acquisition and social media shares. There are a variety of these, but in my experience, it comes down to a few really effective ones. Scott Tousley and I call them “content hooks”:

  • Original data & stats
  • Original Images
  • Charts and Graphs
  • Quotes from influencers
  • Frameworks
  • Pros and Cons Tables

The mindset here is that you work backwards and think, "given the target sites I'd like to get links from, how can I craft my content to make it easier to acquire those links?" In the marketing world, having original data, fancy new frameworks, or original images and charts makes it leagues easier to add value.

A brief walk through these, with examples, is in order.

1. Original data & stats

This one is a bit of heavy lifting in terms of costs, but if you can pull legit, impressive data and publish it, you’re going to have a competitive advantage. Certain companies really excel at this, including CXL with their UX studies.

Image Source

Buzzsumo also does this really well with their huge content analyses.

Image Source

HubSpot has a whole research program dedicated to original insights.

2. Original Images

True story: I was recently at a conference where I saw that some original images we created to explain A/B testing had been used by a keynote speaker (without crediting us, by the way).

People search for images, especially when creating content (blog, conference talk, or otherwise), and if your images come up when they search, you get a link (as long as they actually credit you).

My line of thought is, if you’re going to use images, why not try to create your own wherever that is possible? We did that for HubSpot with our NPS survey image:

This is an especially helpful tactic if you can create a visualization for a complicated topic, like segmentation or multivariate testing.

3. Charts and Graphs

This one is sort of a hybrid between “original images” and “original data,” but essentially you want to give some impressive data visualization to explain concepts or insights. It’s a big trend for bloggers to write data-driven posts, and images like these give the impression of using data to support your claims (doesn’t matter if the chart is bullshit, it’s going to get links anyway).

Here’s an example of a CSAT journey map I put together in R for a HubSpot post:

I’m no master of data visualization, and things can get super sophisticated, especially when you start to implement interactive visualizations. Ryan Farley did a great job of this with his interactive retention visualization:

4. Quotes from influencers

Roundups are usually boring, but quotes from smart people help you a) create better content and b) promote that content on social media once it’s published. Working with smart people to put together content also helps you build relationships and support smart voices by giving them a platform.

I certainly have an affinity for BigCommerce when they feature my opinions in their articles:

Image Source

There may not be a direct route to a link here, but there is a pathway through increased social shares and distribution that usually leads to natural links. Plus, as I mentioned, if you curate your features well, it can help you create better content. Matt Gershoff, CEO of Conductrics, has certainly made my articles smarter than I could have made them on my own:

Roundups can work, too, if they don’t suck. Peep put together an awesome one on new GA features. Luiz Centenaro put together a nice one as well on community building:

5. Frameworks

When in doubt, invent a framework. Bonus points if it’s actually useful. I’ve done it a bunch at HubSpot:

This framework is admittedly not that useful. I just made an acronym out of the process for running customer satisfaction surveys. Who knows, though, maybe it helps someone remember the information better.

A better example is something like PXL, an A/B test prioritization framework that is undeniably useful. It’s something that I’ve used with clients to help prioritize experiments:

Brian Dean, however, is the king of this tactic. He not only uses this technique all the time, popularizing terms like Skyscraper Technique, but he also named the technique of naming techniques. Meta! His frameworks genuinely help explain SEO concepts in a simple and actionable way, so they catch on.

The best thing you can do is create a framework that truly helps fill a knowledge gap or helps people put a concept to use. I think Brian Dean, CXL, WiderFunnel, Reforge, and others have done this really well.

6. Pros and Cons Tables

The world is a confusing place. If you can help visitors clear up confusion on a given topic or set of solutions, you deserve a link. For example, there are lots of customer feedback survey types, so we listed pros and cons of each one to help people choose the appropriate type for their scenario:

We also created original images of these tables, combining that tactic as well.

Any way you can visualize or simplify comparisons or pros and cons can help users make decisions. Can you do it with software or pricing? Conversion Rate Experts did that really well with A/B test software comparisons:

Optimize On-Page SEO

There are tools now like Clearscope and Surfer that help you figure out what the gaps are in your SEO content.

Basically, you can plug in a target keyword and your text, and then get a score and recommendations to better position yourself to rank in search engines. They reverse engineer ranking factors and can help you find relevant keywords to use, generally making the article more SEO-friendly so it matches the searcher's intent. Here's a screenshot of this very piece in Clearscope:

This will give you keyword, word count, subheadings, and readability recommendations. Outside of that, you can make marginal gains from improving title tags, alt tags, meta tags, etc. Same thing with internal linking and other HTML updates. At scale on a large enough website, they can move the needle, but on any one given piece, they’re smaller potatoes.

Sometimes, you need to re-sculpt the article to rank for an entirely different keyword. This happens when content doesn’t match the search intent of the search query driving people to the post.

Figuring that out requires some keyword research. You want to see, other than your target keyword / primary keyword, what search terms you already rank for and some new ideas for search terms you could target directly. These will likely be search terms that you don’t rank on the first page for.

For this piece, I rank for terms like "email blast examples," but the piece is currently written as a generic, high-level guide. So I could rewrite the piece to focus more on the examples intent.

Relaunch: How to Get Back Off the Ground

After you beef up your content with some on-page optimization and add some link hooks, you should relaunch it. Give it a little velocity. It's a new and improved piece, after all. Why not give it some content promo love?

Basically, you can launch the thing like it’s new again. After all, it kind of is. As with most things content & SEO related, Brian Dean is the master and he’s already written a great guide/case study that covers how to do this. Check it out here.

Conclusion

Content optimization is important, often talked about, and rarely understood. How do you optimize content? What’s that even mean?

Here we’ve laid out two paths to doing so: improving conversion paths and improving traffic growth. Within those two paths there are multiple tactics for analyzing, prioritizing, and optimizing content for increased traffic, conversions, and whatever else you’re chasing after.

One can never truly encompass a topic and all the creative tactics that are possible, though. For that reason, I leave it as an open question: what am I missing? Any creative ways to surface optimization opportunities, uses of personalization, or otherwise? Feel free to comment or shoot me an email or whatever.

The post Content Optimization: How to Make Content Better appeared first on Alex Birkett.

]]>