A/B Testing Email Subject Lines: A Systematic Approach to Higher Open Rates
Subject line A/B testing is one of the most widely practised activities in email marketing, and one of the most widely botched. Most brands treat it as a curiosity exercise: run two variations, pick the winner, move on. Without structure, minimum sample sizes, or a testing log, this produces opinions dressed up as data.
A systematic subject line testing programme, done correctly, compounds over time. Each test teaches you something real about your audience. Applied consistently, these learnings raise open rates and, more importantly, raise revenue per recipient — which is the metric that actually matters.
What Makes a Good Subject Line Test
A good A/B test isolates one variable and has a clear hypothesis. This is the rule most brands break first.
One variable at a time
“We tested ‘Hurry — 48 hours left’ against ‘Your skin deserves better. Shop the sale’” is not a useful test. The two subject lines differ in length, tone, urgency approach, personalisation, and structure. When one wins, you have no idea which element caused the difference.
A useful test isolates one element:
- Curiosity vs direct: “Why your skin stops improving after 30” vs “Our bestselling serum — now restocked”
- Personalisation vs generic: “Rachel, your new arrivals are here” vs “New arrivals are here”
- Short vs long: “Still thinking about it?” vs “Your cart is still waiting — and so is this offer”
- Emoji vs no emoji: “New season just dropped 🌸” vs “New season just dropped”
When a single variable is isolated, the winning variant tells you something you can apply beyond this one campaign.
A meaningful hypothesis
Before running the test, write down what you expect to happen and why. “I believe curiosity-style subject lines will outperform direct subject lines for our audience because our subscribers tend to be information-seeking” is a hypothesis. “Let’s see what happens” is not.
The hypothesis doesn’t have to be right — often the result is the opposite of what you expected, and that’s valuable information. The point is to start with a prediction so you can learn from whether reality matches it.
Minimum Sample Size for Statistical Significance
This is where most email A/B tests fail silently. A list of 3,000 subscribers testing two subject lines with a 50/50 split sends each variant to 1,500 people. If variant A gets 32% open rate and variant B gets 28% open rate, is that a real difference or random variation?
With 1,500 recipients per variant, the 95% margin of error on the difference between two typical open rates is roughly ±3 percentage points. A 4-point gap is therefore borderline: it might be a real difference, or it might be noise.
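To make the arithmetic concrete, here is a minimal Python sketch (independent of any Klaviyo tooling) that runs a standard two-proportion z-test on the example above; a z-statistic beyond ±1.96 corresponds to 95% confidence under the normal approximation:

```python
import math

def two_proportion_z(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """z-statistic for the difference between two observed proportions,
    using the pooled standard error (normal approximation)."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# The example above: 32% vs 28% open rate, 1,500 recipients per variant
z = two_proportion_z(0.32, 0.28, 1500, 1500)
print(f"z = {z:.2f}")  # ≈ 2.39, only just past the 1.96 cutoff for 95% confidence
```

In this example the 4-point gap only just clears the cutoff, which is exactly why differences of this size sit on the border between signal and noise.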
General guidance on sample size
For a subject line test at 95% confidence, assuming a baseline open rate of around 35% and a minimum detectable effect of 5 percentage points:
- You need approximately 2,500–3,000 recipients per variant to detect a real difference reliably
- For smaller effects (2–3 percentage points), you need 7,000+ per variant
- For a list of 10,000 or fewer, most individual campaign tests will not reach statistical significance for small differences
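If you want to compute the threshold for your own baseline rather than rely on rules of thumb, the standard two-proportion sample size formula fits in a few lines of Python. The sketch below assumes 95% confidence and 80% statistical power by default; the output moves a lot as you change the power assumption, which is why published guidance (including the conservative figures above) varies:

```python
import math

def sample_size_per_variant(p_base: float, effect: float,
                            z_alpha: float = 1.96,       # 95% confidence, two-sided
                            z_power: float = 0.84) -> int:  # 80% power
    """Recipients needed per variant to reliably detect an absolute lift of
    `effect` over a baseline rate `p_base` (standard two-proportion formula)."""
    p_test = p_base + effect
    variance = p_base * (1 - p_base) + p_test * (1 - p_test)
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

print(sample_size_per_variant(0.35, 0.05))                  # ≈ 1,467 at 80% power
print(sample_size_per_variant(0.35, 0.05, z_power=1.2816))  # ≈ 1,965 at 90% power
print(sample_size_per_variant(0.35, 0.025))                 # ≈ 5,794 for a 2.5-point effect
```

Treat the output as an idealised floor: it assumes clean randomisation and a single planned comparison, so rounding up, as the guidance above does, is sensible in practice.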
This doesn’t mean testing is pointless on smaller lists. It means you need to:
- Test for larger differences (aim to detect 5+ point improvements, not 1–2 point differences)
- Run tests consistently over time and look for directional patterns rather than single-test conclusions
- Consider using Klaviyo’s “send winning variant automatically” feature only when your send size is large enough to trust the result
How to Set Up Subject Line A/B Tests in Klaviyo
In Klaviyo, subject line A/B testing is available natively in the campaign builder.
When creating a campaign, select “A/B test” and choose “Subject line” as the variable. Add your two variants. Then configure:
Test duration: Klaviyo needs time to determine a winner before sending to the remaining list. A test window of 4–8 hours is standard for most campaigns; longer windows produce more reliable winners, at the cost of delaying the main send.
Winning metric: Choose your winner metric carefully. “Open rate” is the default — but post-iOS 15, open rate is inflated and unreliable for a meaningful portion of your audience (Apple users with Mail Privacy Protection enabled). Where possible, use click rate as the winning metric instead. It’s a lower number but a more reliable signal.
Sample size: Decide what percentage of your list sees the test and what percentage gets the winning variant sent automatically. A 20% test (10% per variant) followed by 80% winner send is a common structure for larger lists.
What to Test: A Priority Order
Not all subject line variables are equally worth testing. Here’s a useful priority order based on observed impact:
High-impact tests
Curiosity vs direct: Does your audience respond better to intriguing, open-loop subject lines, or to clear, direct statements of what’s inside? This is one of the most fundamental splits and should be tested early.
Personalisation: Name personalisation is the simplest form, but more interesting tests include personalising by purchase behaviour (“Since you loved X…”) or location (“Shipping to London just got faster”).
Urgency framing: Timer-based urgency (“12 hours left”) vs benefit-based urgency (“Only 3 left in your size”) vs social proof urgency (“Selling fast”). These can produce meaningfully different results by audience.
Medium-impact tests
Length: Short subject lines (under 35 characters) tend to work well on mobile, where preview space is limited. Longer lines (55–70 characters) can convey more context. Test both for your audience.
Question vs statement: “Is your moisturiser actually working?” vs “The moisturiser that actually works.” Questions engage curiosity differently than statements — results vary by brand tone and audience.
Lower-impact tests
Emoji vs no emoji: Emojis can increase attention in a crowded inbox but can also feel off-brand or frivolous depending on your audience. Worth testing once, but rarely a major mover.
Capitalisation style: Sentence case vs title case. Minor impact for most brands, but some audiences respond differently.
Interpreting Results: Look Beyond Open Rate
This is the second place most brands go wrong. A subject line test that lifts open rate by 5 percentage points looks like a win — but if the variant that got more opens produced fewer clicks and less revenue, it’s not a win at all.
The full result picture includes:
- Open rate (directional, unreliable post-iOS 15)
- Click rate (more reliable engagement signal)
- Revenue per recipient (the ultimate arbiter)
A subject line that draws opens on a false premise — implying a discount that doesn’t exist, or being so vague it attracts curious openers who don’t convert — can increase open rate while damaging revenue and trust.
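As a concrete illustration of that failure mode, with entirely hypothetical numbers, here is the same 1,500-per-variant test viewed across all three metrics:

```python
def revenue_per_recipient(revenue: float, recipients: int) -> float:
    """Attributed revenue for a variant divided by the number who received it."""
    return revenue / recipients

# Hypothetical results: variant A wins on opens but loses where it counts
results = {
    "A": {"recipients": 1500, "opens": 480, "clicks": 42, "revenue": 610.0},
    "B": {"recipients": 1500, "opens": 420, "clicks": 57, "revenue": 840.0},
}
for name, r in results.items():
    print(f"{name}: open {r['opens'] / r['recipients']:.1%}, "
          f"click {r['clicks'] / r['recipients']:.1%}, "
          f"RPR £{revenue_per_recipient(r['revenue'], r['recipients']):.2f}")
# A: open 32.0%, click 2.8%, RPR £0.41
# B: open 28.0%, click 3.8%, RPR £0.56
```

Variant A "wins" on opens by 4 points yet earns roughly 27% less per recipient, so sending it to the remaining 80% of the list would be a costly mistake.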
When to override the automatic winner
If Klaviyo selects a winning variant based on open rate and you can see that the “winner” has a lower click rate than the other variant, consider whether open rate really was the right proxy. For revenue-generating campaigns, a small open rate advantage that comes with a lower click rate is often a net loss.
Building a Subject Line Testing Log
The value of a systematic testing programme accumulates in the log. A simple spreadsheet with the following columns is sufficient:
- Campaign date
- Test variable (what you changed)
- Variant A (text)
- Variant B (text)
- Hypothesis
- Winner
- Open rate (A vs B)
- Click rate (A vs B)
- Revenue per recipient (A vs B)
- Learning (one sentence)
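If you prefer to keep the log programmatically rather than in a spreadsheet, a minimal Python sketch might look like the following; the file name and the example row are hypothetical, and the columns mirror the list above:

```python
import csv
from pathlib import Path

LOG = Path("subject_line_tests.csv")  # hypothetical file name
COLUMNS = ["campaign_date", "test_variable", "variant_a", "variant_b",
           "hypothesis", "winner", "open_rate_a_vs_b", "click_rate_a_vs_b",
           "revenue_per_recipient_a_vs_b", "learning"]

def log_test(row: dict) -> None:
    """Append one test result to the CSV log, writing the header on first use."""
    is_new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

# Hypothetical entry using the curiosity-vs-direct example from earlier
log_test({
    "campaign_date": "2025-03-04",
    "test_variable": "curiosity vs direct",
    "variant_a": "Why your skin stops improving after 30",
    "variant_b": "Our bestselling serum — now restocked",
    "hypothesis": "Curiosity outperforms direct for our information-seeking audience",
    "winner": "A",
    "open_rate_a_vs_b": "34% vs 29%",
    "click_rate_a_vs_b": "3.1% vs 2.6%",
    "revenue_per_recipient_a_vs_b": "£0.52 vs £0.44",
    "learning": "Curiosity framing lifted opens and clicks for a product launch",
})
```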
After 20–30 tests, patterns emerge. “Our audience responds consistently better to curiosity subject lines for new product launches but prefers direct subject lines for promotional campaigns.” This is audience intelligence that you can’t buy and can only build through consistent, structured testing.
The Compounding Return of Systematic Testing
A single well-structured subject line test might lift your open rate by 3–5 points. Compounded across 12 months of regular testing, the learning accumulates into a genuine strategic advantage: a tested understanding of what your specific audience responds to, applied consistently to every send.
Most brands don’t have this. They guess, run ad hoc tests, and never capture the learnings. A systematic approach is a genuine differentiator.
At Excelohunt, we build and run structured A/B testing programmes as part of our ongoing email management service — ensuring every test is properly structured, every result is interpreted correctly, and every learning is applied.
Related Excelohunt Services
Looking to implement these strategies with expert support?
- A/B Testing: learn how we implement this for clients
Book a free strategy call with Excelohunt →
Want Us to Implement This for Your Brand?
Get a free email audit and see exactly where you're losing revenue.
Get Your Free Audit