When to A/B Test
"Just ship it" or Experiment?
In Today’s Newsletter:
Lessons for Builders Out of Apple’s WWDC
🔒 Paid: When to A/B Test in Product
What You Need to Know: WWDC
I previewed every major announcement from Apple’s Worldwide Developers Conference (WWDC) the day before the event. Even so, the event was still packed with details that matter for builders.
As gatekeeper to the 1B+ iPhone ecosystem, 100M+ Watch ecosystem, and 100M+ Mac ecosystem, Apple is the most important tech company on Earth. So when Apple releases its first new category in a decade, it’s worth studying.
I watched the two-hour keynote, read the thought pieces, and pondered the implications in my dreams, so you can focus on your day job. Here are the three points you should walk away with:
1: Great AI features don’t need AI branding
The headline out of WWDC in the mainstream press was that Apple didn’t mention or release an AI feature. They missed two right under their noses: transformer-powered autocorrect and dictation.
Apple released its own proprietary LLMs that run natively on your device and personalize to your usage over time. Craig Federighi covered it in under two minutes - because that’s how packed WWDC was.
The big takeaway: focus on the value you deliver to your consumers via AI, not the fact that it uses AI. Yes, you might end up in the AI press, but after the initial push of traffic, you’re not going to have retention unless you solve a problem users already have.
In the case of the Apple OS, the big problem users already have is input: typing and dictating fast without error. In fact, that’s the big interface problem Apple is taking on as a company with products like Vision Pro, as well. So launching something in this space is the perfect problem for Apple to apply the latest in generative AI technology.
Apple was never going to rush to build a chatbot. Instead, it found its own perfect way to deploy LLMs.
2: Great positioning is about making everything your own
Apple didn’t go agile. It didn’t release an MVP and then iterate.
Tim Cook and co. allocated 3,000 people to Vision Pro over five plus years. It was a massive investment, one that some even within Apple weren’t privy to.
How did Apple introduce the results of 5 years of massive secret investment? As the future of spatial computing.
This is clever because Apple already is the leader in spatial computing. Between the Watch, iPhone, and Mac, it’s got computing leaders in every category. So it benefits Apple to describe the Vision Pro as “just another computer.”
There was an alternative, more accurate way to introduce the Vision Pro: as the best augmented reality device out there. But Meta already has a major lead in that market, with Oculus commanding an 80% market share.
Apple positioned its product in a category it leads.
This decision has implications for every team from product to marketing:
On the product side, Apple chose to prioritize remote work use cases over gaming use cases. If you’re really going on the offensive, positioning yourself as a product of one in a category you lead is very powerful.
On the marketing side, it didn’t compare its product to anyone. Apple didn’t even allude to the Quest. Under Tim Cook, Apple tends to just focus on itself and its users.
As to the hot topic of whether we have entered the post-iPhone era with the Vision Pro? I tend to think too many forget the Apple Pippin or HomePod, both failed devices. It remains to be seen whether the Vision Pro will succeed.
3: Copy ruthlessly - for users
The average American receives 16 spam calls a month. Apple released an amazing weapon against that spam epidemic: automatic live voicemail transcription.
If a spammer wants to get in touch with you, they’ll have to say something useful at the beginning of their voicemail. That’ll be quite the barrier. Today, these spam calls don’t actually connect until after you say, “Hello.” This will stop many of those spammers.
Live voicemail is also great if you’re in a meeting and want to see if the call is really urgent. So, all in all, it’s a lovely feature.
But - this is a feature that has existed for years on Android. Apple regularly copies Android a few years later, without any mention of the original feature. In product building, there is no attribution.
So copy ruthlessly, in the interest of your users, as Apple always has (while steering clear of infractions of IP law).
ICYMI - My Other Writing:
Welcome to over 2,000 new subscribers since last week 🙌
When to Use Different Testing Methods
It’s one of the most hotly debated topics in product development:
Should we test it?
Some people are zealots who insist you must test everything. Others go with the flow. And most just don’t know.
There’s scant content
I can understand why most just don’t know. There’s no course or video on this. Heck, there’s hardly an article on it.
It’s a sad state of affairs. I didn’t realize the content landscape was so barren until my friend Carl Velotti, who also came up with the idea for the (super well received) impact sizing piece, flagged it to me.
After going deep on the web, I did dig up some good guidance. The problem? Google and Bing search, which power ChatGPT, just don’t rank those pieces highly enough.
The Mount Rushmore of product canon on this topic consists of:
Emily Robinson’s Guidelines for A/B Testing (2341 words)
Tal Raviv’s Please, Please Don’t A/B Test That (2168 words)
Lenny Rachitsky’s When NOT to Run an Experiment (1318 words)
What you’ll notice about these three pieces, compared to Product Growth deep-dives, is that they are relatively short. They get into the high-level tactics. But, simply due to size constraints, they don’t go into the hairy details of edge cases, or provide a first-principles framework for when to experiment.
It’s a really important topic
As a VP of Product, I was actually worried for all the junior PMs out there when I did this research. Because there are a few harsh truths about A/B testing every PM needs to know:
Most A/B tests are inconclusive
A/B testing is the best way to prove your impact
A/B testing has a real and tangible cost in speed
Truly great changes win in A/B tests and the bottom-line metrics
Many teams make mistakes running and interpreting results of A/B tests
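To make the first harsh truth concrete: most A/B tests come back inconclusive because they’re underpowered for the lift they’re chasing. Here’s a quick back-of-the-envelope power calculation - a sketch using only the standard two-proportion z-test approximation; the function name and example numbers are mine, not from any particular analytics tool:

```python
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm to detect a relative lift
    in a conversion rate (two-sided z-test, 50/50 allocation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power=0.8
    p_bar = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return int(n) + 1

# Detecting a 2% relative lift on a 5% conversion rate
# takes roughly 750k visitors per arm.
print(sample_size_per_arm(0.05, 0.02))
```

With a 5% baseline, a 2% relative lift needs on the order of three-quarters of a million users per arm - which is why small improvements on low-traffic surfaces so often come back inconclusive.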
Together, these harsh truths make it quite important for PMs to have an informed opinion on when to test. Some execs will want you to test things you shouldn’t. You need the knowledge to push back. Other execs will want you to just ship something - and you have to be able to explain why it needs to be a test.
That’s where today’s piece comes in.
I’ve been involved in 100+ A/B tests in my career (thanks to PMs on my teams). And I’ve been involved in even more features shipped without a test. So, in today’s piece, I cover:
The spectrum of POVs
Easy heuristics to start with
Myths to forget about A/B testing
Features of a great experiment
The first principles framework to organize it
Making the call on 5 example features for texture
It’s 6,399 words filled to the brim with tactics and takeaways. Let’s get into it.
The spectrum of POVs
At each end of the spectrum lie some interesting viewpoints. Let’s go through those first. They help illustrate where you could end up. Then we’ll address the messy middle.
One end of the spectrum: Scientific or it’s Questionable (SQ)
Default Tendency: Experiment
Prominent Companies: Spotify, Indeed, Meta
~30% of PM roles
On one end of the spectrum is SQ culture.
This tends to come from product leaders who excel at demonstrating impact. They don’t claim a cent more impact than they have actually driven. When it comes to meetings with skeptical analysts, they walk out unscathed - with the same impact as originally claimed.
They tend to have come from big company backgrounds, where claiming impact based on a pre-/post-analysis doesn’t pass muster. Over years of career success, they’ve been trained never to ship something without an experiment. For them, the answer to “should we test it?” is always yes.
Roughly 15% of the product leaders you encounter are SQ. But because they lean toward bigger companies, SQ culture represents reality for ~30% of PMs.
The other end of the spectrum: We have Big, Global Launches (BG)
Default Tendency: Just Ship It
Prominent Companies: Apple, Tesla
~20% of PM roles
On the other end of the spectrum is BG culture.
The product leaders who drive BG teams tend to come from highly marketing-driven environments. The big launch of product features, at an event like WWDC, is everything (which is why I opened with WWDC).
So instead of focusing on testing, these leaders tend to have a high bar for the product development process itself. They want teams to validate problems are real with data, and then user test prototypes extensively before shipping to everyone.
Roughly 20% of product leaders are BG. And, as a result, about 20% of PMs live in a BG world.
The rest of folks: Different Tools for Different Times (DT)
Default Tendency: It Depends
Prominent Companies: Google, Netflix, Amazon
~50% of PM roles
The rest live somewhere in the middle. This can be a messy space. Some people test most of the time. Others “just ship” most of the time.
But the most common model is one where leaders empower PMs: PMs have the infrastructure to test, but the decision is up to them. For these companies, you need to have a method for sorting out what to test and what to just ship.
That’s where this piece comes in. We’re going to walk through a first-principles framework to help you decide for every feature you are working on.
Even if you’re in an SQ or BG environment, today’s piece will help you rightly push back on your leaders when taking the other approach can help you achieve more impact 🚀