
How to run A/B tests in Aimfox for LinkedIn outreach

How to set up and run A/B tests in Aimfox to compare LinkedIn connection request messages, track acceptance rates per variant, and scale winning copy.

James Whitfield

Lead gen agency owner, 50+ campaigns/month · Updated June 23, 2026

Last updated: July 2026 · James Whitfield, Lead gen agency owner, 50+ campaigns/month

TL;DR — 5 things to know before reading

Aimfox A/B testing splits your prospect list between two or more message variants and tracks which one gets a higher acceptance or reply rate
You need at least 100 prospects per variant before the data is meaningful enough to declare a winner
Test one variable at a time: connection note tone, opening line, or call-to-action positioning
The winning variant should run at full scale before you move on to testing the next variable
Run A/B tests on connection notes first, then follow-up messages — improving acceptance rates has a compounding effect on all downstream sequence performance; pair your tested campaigns with verified contacts from Quarvio for clean data

Our take

Most LinkedIn outreach campaigns run one message variant until the campaign ends, then move on. The problem is that you never know if a different note would have produced 50% more acceptances. A/B testing removes that uncertainty by systematically comparing variants against each other on the same audience.

Aimfox allows you to split your prospect list between message variants and track performance separately for each. The winner scales; the loser is retired. Over several testing cycles, your connection notes and follow-up messages improve to the point where they consistently outperform industry averages. This guide covers how to set up an A/B test in Aimfox, what to test, how to read the results correctly, and what to do with the results to maximize LinkedIn outreach performance at scale.

Why A/B testing LinkedIn outreach matters

Acceptance rates on LinkedIn connection campaigns vary significantly based on how the note is written. A generic note to the same audience might achieve 15% acceptance while a specific, personalized variant achieves 35%. That difference compounds through the entire sequence:

Metric	Generic note (15% acceptance)	Specific note (35% acceptance)
Requests sent	500	500
Connections accepted	75	175
Follow-up replies (at 10%)	7	17
Conversations generated	7	17

Source: LinkedIn's official connection and outreach policy and Aimfox reviews on G2 — verified June 2026

A 20-percentage-point improvement in connection note acceptance more than doubles the number of conversations generated from the same prospect list. This is why testing connection notes is the highest-leverage place to start.
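The arithmetic behind the table can be reproduced with a short Python sketch. The function name `funnel` and its parameters are illustrative only and are not part of Aimfox. Whole-number percentages and floor division are used so that partial replies round down, as they do in the table.

```python
def funnel(requests: int, acceptance_pct: int, reply_pct: int) -> dict:
    """Project a connection campaign funnel using whole-number percentages."""
    # Floor division rounds partial people down, matching the table above.
    accepted = requests * acceptance_pct // 100
    replies = accepted * reply_pct // 100
    return {"requests": requests, "accepted": accepted, "replies": replies}


generic = funnel(500, acceptance_pct=15, reply_pct=10)
specific = funnel(500, acceptance_pct=35, reply_pct=10)
print(generic)
print(specific)
```

Swapping in your own request volume and rates shows how much a given acceptance improvement is worth before you commit prospects to a test.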

The case for systematic A/B testing becomes even stronger when you consider the alternative: optimizing by feel. Most outreach practitioners make message changes based on anecdotal impressions ("this note feels better") without the data to confirm whether the change actually improved results. After six months of anecdotal optimization, they have no reliable knowledge of what is driving their results. After six months of systematic A/B testing, they have a validated library of what works with their specific audience, tested against real performance data. The difference in performance is significant and cumulative.

Understanding what makes A/B test results valid

Before running tests, understanding what makes a result valid prevents two common errors: declaring a winner too early and drawing false conclusions from invalid comparisons.

Statistical significance in the context of LinkedIn outreach:

Statistical significance is the concept that a result is unlikely to have occurred by chance. In LinkedIn A/B testing, you are comparing two acceptance rates and asking whether the difference between them reflects a real performance difference or just random variation in who happened to accept on a given day.

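The practical guideline for LinkedIn outreach A/B testing: a difference of 5 percentage points or more that holds across the entire test period (not driven by a single day's results) on a sample of 100+ per variant is the minimum bar for acting on a result. At exactly 100 sends per variant, treat a 5-point gap as indicative only; the sample size table later in this guide asks for 8+ points at that volume, and for 200 sends per variant before a 5-point gap counts as reliable. Below these thresholds, the result is not reliable enough to declare a winner.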

Why small samples are misleading:

Imagine Variant A sends 20 requests on day 1 and gets 6 acceptances (30%). Variant B sends 20 requests and gets 3 acceptances (15%). Is Variant A better? You cannot tell — 20 requests per variant is far too small a sample. The next day might reverse entirely. Only after 100+ sends per variant do the daily fluctuations average out enough to see a reliable signal.
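One way to sanity-check an example like this is a standard two-proportion z-test, sketched below with only the Python standard library. This is a generic statistical check, not an Aimfox feature, and the function name is hypothetical. The normal approximation it relies on is rough at counts this small, so read the output as a ballpark figure.

```python
import math


def two_proportion_p_value(accepted_a: int, sent_a: int,
                           accepted_b: int, sent_b: int) -> float:
    """Two-sided p-value for the difference between two acceptance rates."""
    rate_a = accepted_a / sent_a
    rate_b = accepted_b / sent_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (accepted_a + accepted_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = abs(rate_a - rate_b) / std_err
    # erfc converts the z-score into a two-sided tail probability.
    return math.erfc(z / math.sqrt(2))


p = two_proportion_p_value(6, 20, 3, 20)
print(f"p-value: {p:.2f}")
```

For 6 of 20 versus 3 of 20, the p-value lands well above the conventional 0.05 cutoff, which is the statistical way of saying the two variants cannot yet be told apart.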

The common mistake of evaluating mid-test:

Looking at A/B test results on day 3 of a 14-day test and making decisions based on what you see is a reliability trap. Early results are dominated by whoever happened to log into LinkedIn and check their connection requests in the first few days. This is a biased sample. Wait for the full test period before evaluating.

What you can test in Aimfox

Element	Examples
Connection note tone	Professional vs conversational
Opening line	Role-specific vs company-specific
Note length	2 sentences vs 4 sentences
Call-to-action	Soft question vs direct ask
Follow-up step 1	Value-focused vs curiosity-focused
Follow-up step 2	Specific ask vs open-ended check-in

Test one variable at a time. If you change both the tone and the length in the same test, you cannot isolate which change drove the result.

The full testing sequence: connection note first, then follow-up messages

The optimal A/B testing sequence follows the order in which prospects experience the campaign. Improving connection note acceptance rate is the highest-leverage place to start because it expands the audience that sees all subsequent follow-up messages. Once acceptance rate is optimized, improvements to follow-up messages apply to a larger base.

Phase 1: Connection note optimization (Month 1–2)

Test the connection note first. Run 2–3 testing cycles focused exclusively on the note:

Test 1: Note length (short 40 words vs medium 60 words)
Test 2: Opening approach (role-specific vs company-specific vs mutual connection reference)
Test 3: Call-to-action style (soft question vs no CTA vs direct ask)

After 3 cycles, you have a validated connection note optimized for your ICP. Use this as the fixed baseline for all subsequent follow-up testing.

Phase 2: Follow-up message 1 optimization (Month 2–3)

With a fixed, optimized connection note in place, begin testing Follow-up Message 1. This is the first message sent after a connection is accepted. Test:

Test 4: Opening of follow-up 1 (problem-first vs value-first vs direct ask)
Test 5: Length of follow-up 1 (2 sentences vs 4 sentences)
Test 6: CTA in follow-up 1 (meeting request vs information offer vs question)

Phase 3: Full sequence optimization (Month 3+)

With connection note and Follow-up 1 optimized, test Follow-up Messages 2 and 3. At this point, you have a fully tested and validated outreach sequence that has been systematically improved from first touch to final follow-up.

This sequential approach means that by month 3, the campaign is significantly better than month 1 across every element, and the improvements are evidence-based rather than intuition-based.

Step 1: Create two campaign variants in Aimfox

In Aimfox, create two separate campaigns targeting the same audience type. Use the same LinkedIn search URL or a split of the same prospect list — ensure the two audiences are as similar as possible to isolate the message as the variable being tested.

Name the campaigns clearly: "Q3 SaaS Founders — Variant A" and "Q3 SaaS Founders — Variant B". This makes it easy to compare results in the Analytics section without confusion.

Create a tracking document for the test — even a basic spreadsheet — that records:

Campaign name and variant label
Test start date
Element being tested and what differs between variants
Target sample size per variant
Check-in dates for monitoring
Results at test completion

Without this documentation, it is easy to confuse which variant tested which element after running multiple cycles.
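If you prefer to keep the tracking document as a CSV file rather than a hand-built spreadsheet, the fields listed above map onto a small record type. The sketch below is one possible layout; the class name, field names, and sample values are all hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ABTestRecord:
    campaign_name: str
    variant_label: str
    start_date: str            # ISO date, e.g. "2026-07-06"
    element_tested: str
    variant_difference: str
    target_sample_size: int
    check_in_dates: str        # semicolon-separated ISO dates
    results: str = ""          # filled in at test completion


def to_csv(records: list) -> str:
    """Serialize tracking records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ABTestRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


records = [
    ABTestRecord("Q3 SaaS Founders", "A", "2026-07-06", "connection note",
                 "role-specific opening", 100, "2026-07-09;2026-07-13"),
    ABTestRecord("Q3 SaaS Founders", "B", "2026-07-06", "connection note",
                 "company-specific opening", 100, "2026-07-09;2026-07-13"),
]
print(to_csv(records))
```

The resulting text opens directly in any spreadsheet tool, and the `results` column stays empty until the test period ends.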

Step 2: Write your two message variants

Write two distinct versions of the element you are testing. For a connection note test:

Variant A — Role-specific note: Reference the prospect's specific job title and what people in that role typically face. Keep it to 40–50 words.

Variant B — Company-specific note: Reference something specific about the prospect's company — industry position, recent news, or company stage. Keep it to 40–50 words.

Everything else should be identical between the two campaigns: the same LinkedIn account sending, the same daily limit, the same working hours, the same follow-up sequence.

Writing notes that actually differ:

A common mistake is writing two "different" variants that are actually just word-swapped versions of the same approach. "Hi [Name], I help VP Sales professionals at SaaS companies improve their outbound" and "Hello [Name], I work with VP Sales leaders in SaaS to improve outbound" are not meaningfully different. The two will produce nearly identical results because the prospect's response is driven by the same underlying message, so the test tells you nothing.

Write variants that represent genuinely different approaches: different persona framing, different problem references, different conversational register. If you cannot explain in a single sentence what is different about each variant and why you expect them to perform differently, the test will not generate useful data.

Step 3: Split your prospect list evenly

Divide your prospect source evenly between the two campaigns. If you have a LinkedIn search with 1,000 profiles, run 500 through Variant A and 500 through Variant B. Aimfox lets you set a prospect limit per campaign, so you can control this precisely.

Avoid testing on lists smaller than 200 total (100 per variant). Below that, random variation in who happens to see your note on a given day can produce misleading results.

How to split the list for comparability:

If using a LinkedIn search, do not simply send Variant A to the first 500 results and Variant B to the next 500 results. LinkedIn search results are ordered and the first 500 may be systematically different from the next 500 (newer connections, more active users, etc.). A cleaner approach is to use a third-party sourcing tool or Quarvio to get the prospect list first, then randomly split it before importing into Aimfox.

If you are sourcing directly from LinkedIn's search in Aimfox, accept that there may be some ordering bias and account for it by running the test long enough for both variants to reach the same random mix of the audience.
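When the prospect list is exported before import, the random split itself takes only a few lines of Python. This sketch assumes the prospects are available as a plain list of profile URLs; the function name and the example URLs are placeholders.

```python
import random


def split_prospects(prospects: list, seed: int = 42) -> tuple:
    """Shuffle a prospect list and split it into two equal-sized halves."""
    shuffled = list(prospects)             # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    half = len(shuffled) // 2
    # With an odd-length list, the one leftover prospect is dropped to keep halves equal.
    return shuffled[:half], shuffled[half:half * 2]


urls = [f"https://www.linkedin.com/in/example-{i}" for i in range(1000)]
variant_a, variant_b = split_prospects(urls)
print(len(variant_a), len(variant_b))
```

Shuffling before splitting removes the ordering bias described above, because each prospect has the same chance of landing in either variant regardless of where it appeared in the search results.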

Step 4: Set identical campaign parameters

Configure both campaigns with identical parameters except for the message being tested:

Same LinkedIn account
Same daily limit
Same working hours and timezone
Same follow-up sequence (if testing connection notes) or same connection note (if testing follow-up messages)
Same start date

Why identical parameters matter:

If Variant A runs Monday–Friday and Variant B runs Monday–Sunday, any difference in acceptance rate may be caused by the weekend sends rather than the message. If Variant A has a daily limit of 20 and Variant B has a limit of 30, Variant B may reach different prospect types due to the extended reach. Every parameter difference is a potential confounding variable that makes the test results uninterpretable.
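A quick way to catch accidental differences is to write both campaigns' settings down side by side and compare them before launch. The sketch below assumes you record the settings by hand as Python dictionaries; the key names are invented for illustration and do not correspond to any Aimfox export format.

```python
def parameter_mismatches(campaign_a: dict, campaign_b: dict, tested_field: str) -> list:
    """Return the names of settings that differ, ignoring the field under test."""
    all_fields = set(campaign_a) | set(campaign_b)
    return sorted(
        name for name in all_fields
        if name != tested_field and campaign_a.get(name) != campaign_b.get(name)
    )


variant_a = {
    "linkedin_account": "main",
    "daily_limit": 20,
    "working_days": "Mon-Fri",
    "start_date": "2026-07-06",
    "connection_note": "role-specific note",
}
variant_b = {**variant_a, "daily_limit": 30, "connection_note": "company-specific note"}

print(parameter_mismatches(variant_a, variant_b, tested_field="connection_note"))
```

An empty list means the only difference between the two campaigns is the element being tested. Here the check flags `daily_limit`, which would need to be aligned before the test starts.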

Step 5: Run the test for at least 2 weeks

Let both campaigns run simultaneously for at least 2 weeks before evaluating results. This ensures both variants have sent a comparable number of requests and reduces the influence of day-of-week variation (some days have higher response rates on LinkedIn than others).

Monitor but do not adjust the campaigns mid-test. Changing a variable mid-way invalidates the comparison.

What to monitor during the test (without acting on it):

Check the campaigns every 3–4 days to confirm both are running correctly: both are sending at the configured daily rate, no technical errors, both following the same schedule. If one campaign pauses due to a technical issue and the other continues, the test data is compromised for that period. Note any interruptions in the tracking document and adjust the planned test end date to compensate.

Do not check the performance metrics until the test period is complete. Checking intermediate results and then continuing creates a temptation to end the test early if one variant looks like a clear leader — but early-stage data is not reliable data, and ending the test early risks acting on a false lead.

Step 6: Evaluate results correctly

After at least 2 weeks, and once each variant has at least 100 requests sent, compare:

Acceptance rate: Percentage of requests that were accepted
Reply rate: Percentage of accepted connections who replied to any sequence step
Conversations generated: Total number of active two-way conversations produced

The variant with a statistically meaningful advantage on acceptance rate wins. "Meaningful" means a consistent difference of 5 percentage points or more across the full test period, not a single-day spike.

How to read Aimfox analytics:

In Aimfox's analytics dashboard, each campaign shows connection request count, acceptance count, acceptance rate, reply count, and reply rate. For an A/B test evaluation, open both campaign analytics views simultaneously and compare:

Total requests sent per variant — confirm they are comparable (within 10% of each other)
Acceptance rate per variant — the primary metric for connection note tests
Reply rate per variant — the primary metric for follow-up message tests
Any notable spikes or drops in either campaign — if one campaign had a technical interruption, its data for that period is less reliable

Handling inconclusive results:

If the two variants produce results within 3 percentage points of each other after a full test, the result is inconclusive. This does not mean the test failed — it means the two variants perform similarly for your audience. Retire both and write two new variants that differ more fundamentally. An inconclusive result is valuable information: it tells you that the specific variable you tested does not significantly affect acceptance rate for your audience, which narrows the search space for future tests.
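The decision thresholds in this step can be written down as a small rule so that every test is judged the same way. The sketch below encodes the 5-point and 3-point thresholds from this guide; it does not check sample size or day-to-day consistency, which still need to be confirmed separately. The function name and the "borderline" label are assumptions of this sketch.

```python
def judge_test(rate_a_pct: float, rate_b_pct: float,
               win_gap: float = 5.0, tie_gap: float = 3.0) -> str:
    """Classify an A/B result using the guide's percentage-point thresholds."""
    gap = abs(rate_a_pct - rate_b_pct)
    if gap >= win_gap:
        return "A wins" if rate_a_pct > rate_b_pct else "B wins"
    if gap <= tie_gap:
        return "inconclusive"
    # The guide does not say what to do between 3 and 5 points;
    # labelling it "borderline" is an assumption of this sketch.
    return "borderline"


print(judge_test(18, 31))
print(judge_test(26, 24))
print(judge_test(28, 24))
```

A borderline result is a reasonable cue to extend the test and collect more sends before deciding either way.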

Step 7: Scale the winner and retire the loser

Pause the losing campaign variant. Scale the winning variant to your full prospect list. Run it as your standard campaign until the next testing cycle.

After the winning variant has run for 3–4 weeks at full scale, design the next A/B test. Iterate one variable at a time. Over 3–4 testing cycles, your connection notes and follow-up messages will be significantly better-tuned to your specific audience than any generic template.

The multi-variable testing framework: testing beyond connection notes

After optimizing connection notes and immediate follow-up messages, the most impactful variables to test are often structural rather than copy-based:

Sequence length testing:

Compare a 3-step sequence (connection note + 2 follow-ups) against a 4-step sequence (connection note + 3 follow-ups). Measure total conversations generated per 100 connection requests. This tells you whether the additional follow-up step is generating conversations or just increasing send volume without proportional return.

Sequence timing testing:

Compare a tight sequence (follow-up sent 2 days after connection) versus a spaced sequence (follow-up sent 5 days after connection). Acceptance rate should be roughly the same for both, since follow-up timing does not change the connection note, so use it as a sanity check that the two audiences are comparable. Reply rate to follow-up messages is the metric to watch: it may differ based on when the prospect receives the follow-up relative to the recency of the connection.

Segmentation approach testing:

Compare a single campaign targeting a broad audience (VP Sales + Director of Sales + CRO) against three segmented campaigns, each with tailored messaging for the specific role. The segmented campaigns may produce higher acceptance and reply rates if the messaging is genuinely differentiated for each role, or they may perform similarly if the audience's concerns are homogeneous enough that role-specific targeting does not add value.

Channel combination testing:

For prospects who can be reached through both Aimfox LinkedIn outreach and Instantly cold email, test whether combining the channels produces better total conversion than either channel alone. Run LinkedIn outreach on Cohort A, email outreach on Cohort B, and LinkedIn + email on Cohort C. To compare LinkedIn-first against email-first ordering as well, split Cohort C into two halves that differ only in which channel goes out first. The multi-channel cohort typically outperforms either single-channel approach by 40–60% per Woodpecker's multichannel outreach research, but the testing confirms this for your specific audience.

Connecting LinkedIn A/B test results to email outreach improvements

LinkedIn A/B test data reveals more than just which LinkedIn message performs better — it tells you what messaging your ICP responds to. These insights transfer directly to cold email copy.

If Variant A's role-specific framing outperformed Variant B's company-specific framing in LinkedIn tests: Apply the role-specific framing to cold email personalization. Write email templates that reference what people in the prospect's role deal with rather than what you know about the prospect's company.

If the problem-first follow-up message outperformed the value-first message: Use problem-first framing in email body copy. Open cold emails with the problem the reader is experiencing rather than the outcome your solution delivers.

If shorter connection notes (40 words) outperformed longer ones (60 words): Apply brevity to cold email. If prospects are accepting shorter LinkedIn messages at higher rates, they likely prefer shorter emails as well. Test shorter email copy in parallel with your LinkedIn findings.

LinkedIn and cold email are different channels with different conventions, but they share the same audience (your ICP) and therefore share the same underlying preferences about how they like to receive outreach. LinkedIn A/B test data is free audience research that applies across channels.

What practitioners say

"We A/B tested connection note variants in Aimfox over a 3-week period on the same target audience. Variant B — which referenced the prospect's company industry rather than their job title — produced a 31% acceptance rate versus 18% for Variant A. That improvement changed our entire campaign baseline." — G2 reviewer, Aimfox reviews on G2

Aimfox holds a 4.6/5 rating on G2. Woodpecker's multichannel outreach research shows that combining optimized LinkedIn messages with email outreach produces 40–60% higher reply rates than either channel alone. A/B testing ensures both channels are running optimized copy.

Configuration reference for Aimfox A/B testing

Campaign parameter settings for a valid A/B test

Parameter	Required setting	Why
Daily connection request limit	Same for both campaigns	Different limits create different audience reach patterns
Working hours	Identical (e.g., Mon–Fri 9am–5pm)	Prevents day-of-week bias from different schedules
LinkedIn account	Same account for both campaigns	Different accounts have different network reach
Follow-up sequence	Identical (when testing connection note)	Isolates the connection note as the only variable
Connection note	Identical (when testing follow-up messages)	Isolates follow-up messages as the variable
Start date	Same day	Prevents temporal bias from different market conditions

Sample size requirements by desired confidence level

Confidence level	Minimum sends per variant	Minimum difference to detect
Indicative (act with caution)	100 per variant	8+ percentage points
Reliable	200 per variant	5+ percentage points
High confidence	500 per variant	3+ percentage points
Very high confidence	1,000+ per variant	1–2 percentage points

For most LinkedIn outreach A/B tests, the "Reliable" threshold (200 per variant, 5+ point difference) is the appropriate standard. Tests below this threshold should inform future testing rather than immediately change campaign strategy.
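The table above can also be used as a lookup when planning a test. The following sketch copies its values into a small function; the names are hypothetical, and for the top tier, where the table gives a range of 1 to 2 points, the sketch uses the more cautious value of 2.

```python
# (minimum sends per variant, label, smallest difference in points worth acting on)
# Values are copied from the sample size table in this guide.
TIERS = [
    (1000, "Very high confidence", 2),
    (500, "High confidence", 3),
    (200, "Reliable", 5),
    (100, "Indicative", 8),
]


def confidence_tier(sends_per_variant: int):
    """Return (label, minimum difference in points), or None below 100 sends."""
    for minimum_sends, label, min_points in TIERS:
        if sends_per_variant >= minimum_sends:
            return label, min_points
    return None


print(confidence_tier(250))
print(confidence_tier(80))
```

A result of `None` means the sample is below the smallest tier in the table and the test should keep running.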

Metrics to track during Aimfox A/B tests

Metric	When to track	What it tells you
Requests sent	Daily (monitoring only)	Confirms both variants are running at similar pace
Acceptance rate	After test completion	Primary metric for connection note tests
Reply rate	After test completion	Primary metric for follow-up message tests
Conversation rate	After test completion	Overall efficiency metric (conversations per 100 requests)
Pending acceptance	After test completion	How many requests are still awaiting response

Pending acceptances — requests that have been sent but not yet accepted or declined — should be excluded from the acceptance rate calculation when comparing results, unless both variants have the same pending-to-completed ratio.
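As a worked example of that adjustment, the sketch below computes acceptance rate over resolved requests only. The function name and the sample numbers are hypothetical.

```python
def resolved_acceptance_rate(sent: int, accepted: int, pending: int) -> float:
    """Acceptance rate in percent over requests that are no longer pending."""
    resolved = sent - pending
    if resolved <= 0:
        raise ValueError("no resolved requests to evaluate yet")
    return 100 * accepted / resolved


# Hypothetical numbers: 200 sent, 50 accepted, 40 still pending.
print(resolved_acceptance_rate(200, 50, 40))   # 31.25
print(100 * 50 / 200)                          # 25.0, if pending requests were counted
```

The gap between the two figures shows why a variant with many pending requests can look weaker than it is when pending requests are left in the denominator.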

A/B test result interpretation guide

Result pattern	Interpretation	Action
Variant A outperforms by 5+ points consistently	Clear winner	Scale Variant A immediately, retire B
Variant B outperforms by 5+ points consistently	Clear winner	Scale Variant B immediately, retire A
Within 3 points of each other	Inconclusive	Design a new test with more different variants
Early lead reversed by end of test	False early lead	Use end-of-test data only; early data was noise
One variant performed better on acceptance, other on reply rate	Split result	Keep connection note from acceptance winner; use follow-up from reply-rate winner

Troubleshooting common A/B test problems in Aimfox

Problem 1: Variant A is clearly winning but Variant B is barely sending requests

Symptoms: After 10 days, Variant A has sent 180 requests and Variant B has sent only 45 requests. The acceptance rate comparison shows Variant A at 28% and Variant B at 31%, but the sample size for Variant B is too small to trust.

Cause: The most common cause is that Variant B's campaign has a lower daily limit configured, or Variant B's LinkedIn search returned fewer prospects than expected. If LinkedIn's search for Variant B's defined audience is smaller, the campaign exhausts the list faster and then idles.

Fix: Check Variant B's campaign configuration to confirm the daily limit matches Variant A. If the prospect pool was smaller than expected, add additional prospects to Variant B's campaign from the same audience type. Do not restart the test — just add prospects and allow Variant B to catch up. Extend the test period by however many days it takes Variant B to reach 100+ sends.

Problem 2: Acceptance rates are unusually low for both variants (under 10%)

Symptoms: Both Variant A and Variant B are showing acceptance rates of 6–8%, well below the typical 15–35% range. The messages seem well-written.

Cause: The acceptance rate problem is most likely not the message — it is the audience quality. Connection acceptance rates this low usually indicate that a large portion of the prospect list has LinkedIn profiles that are inactive or not regularly monitored. Many LinkedIn accounts remain active in LinkedIn's search database but are checked infrequently by the user.

Fix: Check the profile quality of the prospect list used in the test. Look for: low profile completion scores, no recent activity (no recent posts or engagement), job titles that suggest the account is a placeholder rather than an active user. Replace low-quality profiles with higher-engagement contacts from Quarvio. Acceptance rates in the 15–35% range indicate an active, engaged prospect pool — if rates are consistently below 10% on tested messaging, the audience quality is the constraint, not the messaging.

Problem 3: Test ran successfully but results were inconsistent week-over-week

Symptoms: Reviewing the weekly data in Aimfox analytics, Variant A outperformed Variant B in Week 1 (A at 32%, B at 19%), then Variant B outperformed in Week 2 (A at 25%, B at 38%), leaving no clear winner over the full two-week period.

Cause: This pattern typically reflects audience heterogeneity: the prospect list contains two distinct audience types that respond differently to each variant. The LinkedIn search may have returned a mix of company sizes, seniority levels, or industries that have different preferences, and each variant happened to reach a different mix in each week.

Fix: Segment the prospect list more narrowly before the next test. Split by company size (small vs mid-market), seniority level (VP vs Director), or industry (SaaS vs services). Run separate A/B tests for each segment. The audience-specific results will be more reliable than results across a heterogeneous mix.

Problem 4: Reply rate is high for Variant B but acceptance rate is higher for Variant A

Symptoms: Variant A has 32% acceptance rate. Variant B has 24% acceptance rate. But the prospects who accepted from Variant B are replying at 22%, while Variant A's accepted connections reply at only 9%.

Cause: Variant A's connection note is more accessible or appealing but may set lower expectations, attracting connections who are not genuinely interested in the topic. Variant B's connection note is more selective — it attracts fewer but more engaged connections, producing a higher reply rate among those who do accept.

Fix: Calculate the conversations per 100 requests for each variant. Variant A: 100 requests × 32% acceptance × 9% reply = 2.9 conversations. Variant B: 100 requests × 24% acceptance × 22% reply = 5.3 conversations. Variant B generates more conversations per request despite having a lower acceptance rate. In this scenario, the higher reply rate variant is the functional winner — use conversation rate (not acceptance rate alone) as the decision metric.
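The same calculation as a reusable Python sketch, with a hypothetical function name. Both rates are passed as whole percentages.

```python
def conversations_per_100(acceptance_pct: float, reply_pct: float) -> float:
    """Expected conversations per 100 connection requests, rounded to one decimal."""
    # 100 requests x acceptance rate x reply rate, with both rates given in percent.
    return round(acceptance_pct * reply_pct / 100, 1)


print(conversations_per_100(32, 9))    # 2.9
print(conversations_per_100(24, 22))   # 5.3
```

Running every split result through this one metric keeps the decision consistent across test cycles.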

Problem 5: Both variants are now at 100+ sends, but the winning variant is getting worse over time

Symptoms: At 100 sends per variant, Variant A was winning 33% to 19%. At 250 sends, the gap has narrowed to 29% to 24%. At 400 sends, both are at around 27%.

Cause: The early test audience within the LinkedIn search may have been the most active and engaged segment of the total population — these prospects accepted at a higher rate because they are consistently active on LinkedIn. As the campaign reaches deeper into the search results, the audience quality declines and both variants' performance converges.

Fix: The correct interpretation here is that the campaign is reaching audience quality saturation, not that the message variants are equal. The quality of the remaining prospect pool is the binding constraint, not the messaging. Rather than continuing the test, use both variants at the current best performance level and add fresh, high-quality prospects from Quarvio to restart on a cleaner audience sample.

Problem 6: Aimfox paused one campaign automatically during the test

Symptoms: Aimfox paused Variant B's campaign for 3 days mid-test. During those 3 days, Variant A continued running. Now the sends per variant are 310 vs 145.

Cause: LinkedIn safety features or Aimfox's internal safety controls paused the campaign. This may be due to the LinkedIn account approaching the connection request limit for the current account warmup level, a login issue, or a LinkedIn security check.

Fix: Restart Variant B and extend the test period to allow it to catch up to Variant A's send count. Document the pause dates in the tracking document. Do not use data from the pause period (when A was running and B was paused) in the final comparison. Evaluate results only from the days when both campaigns were running simultaneously.
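If you keep daily counts in the tracking document, restricting the comparison to days when both campaigns were sending can be sketched as follows. The data layout (date mapped to requests sent and accepted) and the function name are assumptions, and acceptances are attributed here to the day the request was sent.

```python
def overlapping_totals(daily_a: dict, daily_b: dict) -> dict:
    """Sum (sent, accepted) per variant over days when both campaigns sent requests."""
    shared_days = [
        day for day in daily_a
        if day in daily_b and daily_a[day][0] > 0 and daily_b[day][0] > 0
    ]
    totals = {}
    for label, daily in (("A", daily_a), ("B", daily_b)):
        sent = sum(daily[day][0] for day in shared_days)
        accepted = sum(daily[day][1] for day in shared_days)
        totals[label] = (sent, accepted)
    return totals


# Hypothetical daily logs: date -> (requests sent, requests accepted).
daily_a = {"2026-07-06": (20, 6), "2026-07-07": (20, 5), "2026-07-08": (20, 7)}
daily_b = {"2026-07-06": (20, 4), "2026-07-07": (0, 0), "2026-07-08": (20, 6)}
print(overlapping_totals(daily_a, daily_b))
```

In this example the middle day is dropped for both variants because Variant B was paused, leaving 40 sends each to compare.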

Problem 7: Testing is showing no improvement after 5 cycles of optimization

Symptoms: After 5 A/B testing cycles over 4 months, the best acceptance rate achieved is 22%, and the most recent test cycles are producing variants within 2–3 points of each other.

Cause: The campaign has likely reached the performance ceiling for the current audience type and LinkedIn outreach format. The connection note and follow-up message are as optimized as testing can achieve with this audience.

Fix: The next performance improvement requires changes outside the message: audience expansion (target a related but different ICP segment), channel expansion (add Instantly cold email as a parallel channel to the same prospects), or offer development (change what the conversation is about rather than just how it is initiated). A/B testing optimizes the path to a conversation — if the ceiling has been reached, the next variable to test is the offer itself, not the mechanics of how it is delivered.

Problem 8: Results look great in Aimfox analytics but no meetings are being booked

Symptoms: Aimfox shows 28% acceptance rate and 15% reply rate, which look like strong numbers. But over 8 weeks of running, only 2 meetings have been booked from LinkedIn outreach.

Cause: High reply rate with low meeting rate means the conversations are not converting. Most replies are likely "not interested" responses or neutral acknowledgments that do not progress toward a meeting. This is a different problem from what A/B testing of the message variants addresses — it is a conversation quality problem, not a message quality problem.

Fix: Review the content of the replies coming in through Aimfox Unibox. If replies are mostly declines, the ICP targeting may be off — the people accepting and replying are not the right buyers. If replies are neutral ("thanks, I'll keep it in mind"), the conversation is being ended too early without advancing to a meeting. The fix is likely in the Unibox conversation handling (how you respond to interested replies) or in the ICP targeting (using more selective audience criteria to focus on prospects with genuine buying intent). A/B testing the initial message will not solve a conversion problem at the conversation stage.

Advanced A/B testing tactics for Aimfox campaigns

Building a quarterly test plan before the quarter starts

Instead of running tests reactively ("let's test a new message next month"), plan the full quarter's test sequence at the start. A quarterly plan for a mid-volume LinkedIn operation:

Month 1: Connection note test (role-specific vs problem-specific opening)
Month 2: Follow-up message 1 test (value-first vs question-first)
Month 3: Follow-up sequence length test (2-step vs 3-step) + timing test (2-day vs 5-day gap)

This approach ensures continuous optimization throughout the quarter without gaps between tests, and prevents optimization from stalling because no one planned the next test.

Segment-specific A/B testing for multi-ICP campaigns

Most B2B outreach teams target multiple ICP segments simultaneously (e.g., VP Sales at SaaS companies AND VP Marketing at agencies). Running the same A/B test across both segments averages the results, which may mask that Variant A is much stronger for VP Sales at SaaS companies while Variant B is stronger for VP Marketing at agencies.

Run separate A/B tests for each ICP segment with at least 100 prospects per variant per segment. Record the winning variant for each segment separately and configure Aimfox campaigns with segment-specific messages rather than a single winner applied to all.

The control campaign: protecting against seasonal variation

When running A/B tests across multiple months, an external factor (LinkedIn algorithm change, industry news event, seasonal variation in LinkedIn activity) can affect acceptance rates independently of any message change. A control campaign — an always-running campaign using the best-performing message from all previous tests — provides a stable baseline for comparison.

If the control campaign's acceptance rate drops 8 points in January, and your current test variant also drops 8 points, the decline is likely seasonal. If the control holds steady but the new variant declines, the new variant is performing worse than the baseline — retire it.
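That comparison can be expressed as a simple rule. In the sketch below, both inputs are changes in acceptance rate in percentage points over the same period. The function name and the 2-point tolerance for "moving together" are assumptions, not figures from this guide.

```python
def diagnose_change(control_change: float, variant_change: float,
                    tolerance: float = 2.0) -> str:
    """Compare percentage-point changes in the control and the test variant."""
    # A variant that moves roughly in step with the control points to an external cause.
    if abs(variant_change - control_change) <= tolerance:
        return "variant tracks control: any change is likely external"
    if variant_change < control_change:
        return "variant underperforms control: consider retiring it"
    return "variant outperforms control"


print(diagnose_change(control_change=-8, variant_change=-8))
print(diagnose_change(control_change=0, variant_change=-8))
```

The first call corresponds to the seasonal January drop described above, and the second to a new variant that is genuinely weaker than the baseline.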

Using LinkedIn follow-up test data to inform Instantly email sequences

When testing follow-up message timing in Aimfox (2 days vs 5 days after connection), the results reveal something about your ICP's "recency sensitivity" — whether they respond better when outreach is timely (2 days) or spaced out (5 days). This preference likely extends to cold email sequences as well.

If 2-day spacing outperforms 5-day in Aimfox follow-up testing, consider testing shorter sequence intervals in Instantly email sequences for the same ICP. If the preference carries across channels, the email sequence can start from the LinkedIn A/B data instead of an untested default interval.

Building a cumulative performance record across testing cycles

Most testing programs treat each test as independent. A better approach maintains a cumulative performance record that shows how each successive test moved the overall campaign baseline:

Test cycle	Element tested	Variant A result	Variant B result	Winner	Baseline improvement
1	Note length	18% acceptance	26% acceptance	B (longer)	+8 points
2	Opening approach	26%	34%	B (problem-first)	+8 points
3	CTA	34%	37%	B (soft question)	+3 points
4	Follow-up timing	37%	40%	B (2-day gap)	+3 points

This table shows cumulative progress: from 18% acceptance at the start to 40% after four test cycles, a 22-point gain. That cumulative improvement is the real ROI of systematic testing.
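One way to keep such a record is a small script that recomputes the per-cycle and cumulative gains whenever a row is added. This is a sketch, assuming the record is kept by hand outside Aimfox; the tuple layout and function name are illustrative, and the figures are the ones from the table above.

```python
# Rows mirror the table above:
# (cycle, element tested, baseline before, baseline after), in % acceptance.
cycles = [
    (1, "Note length", 18, 26),
    (2, "Opening approach", 26, 34),
    (3, "CTA", 34, 37),
    (4, "Follow-up timing", 37, 40),
]


def cumulative_record(cycles):
    """Per-cycle gain and running gain over the starting baseline, in points."""
    start = cycles[0][2]
    record = []
    for cycle, element, before, after in cycles:
        record.append({
            "cycle": cycle,
            "element": element,
            "gain": after - before,
            "cumulative_gain": after - start,
        })
    return record


for row in cumulative_record(cycles):
    print(row)
```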

Our actual stack

Need	Tool	Notes
Verified B2B contacts	Quarvio	One-time purchase, no subscription
Email inboxes	Inframail	Microsoft 365 inboxes, auto DNS
Cold email sending	Instantly	Sequences, warm-up, reply tracking
LinkedIn outreach	Aimfox	Connection campaigns, Unibox

Frequently asked questions

How many prospects do I need to run a meaningful A/B test in Aimfox?

At minimum, 100 prospects per variant (200 total for a two-variant test). Below 100 per variant, random variation in individual acceptance behaviour produces misleading results. 200–300 per variant gives clearer data. If your prospect list is smaller than 200 total, run the full list on one variant first, then run the comparison in the next campaign cycle.
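To illustrate why larger samples give clearer data, here is a minimal sketch of a standard two-proportion z-test in plain Python. This is a general statistical check, not a feature of Aimfox, and the acceptance counts are hypothetical.

```python
import math


def two_proportion_z_test(accepted_a, sent_a, accepted_b, sent_b):
    """Return (z, two-sided p-value) for the difference in acceptance rates."""
    p_a = accepted_a / sent_a
    p_b = accepted_b / sent_b
    pooled = (accepted_a + accepted_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        # All accepted or none accepted in both variants: no measurable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the same 20% vs 28% gap at two sample sizes.
print(two_proportion_z_test(20, 100, 28, 100))
print(two_proportion_z_test(60, 300, 84, 300))
```

With these made-up counts, the 8-point gap does not clear the conventional 0.05 threshold at 100 per variant but does at 300 per variant, so treat a result at the 100-prospect minimum as provisional.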

How long should I run an Aimfox A/B test before evaluating results?

At least 2 weeks, and until each variant has sent at least 100 connection requests, whichever takes longer. Running for less than 2 weeks introduces day-of-week bias (acceptance rates vary across the working week). Do not evaluate results mid-test based on early data. Wait for the test to complete.

Can I test multiple elements at the same time in Aimfox?

You can run multiple simultaneous campaigns testing different elements, but you should not change two variables within a single campaign pair. Test the connection note in one A/B setup and the follow-up Step 1 message in a separate A/B setup on a different audience. Mixing multiple variables in one test makes it impossible to attribute the result to a specific change.

What is the most important element to A/B test first in a LinkedIn campaign?

Start with the connection note. It is the first thing the prospect sees, and its performance (acceptance rate) determines the size of the audience available for all subsequent follow-up steps. A 15-percentage-point improvement in acceptance rate on a 500-prospect list creates roughly 75 additional prospects for follow-up, which compounds through the entire sequence.
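The arithmetic behind that figure, as a short Python sketch. The function names and the 20% downstream reply rate are assumptions added for illustration; only the 500-prospect list and the 15-point lift come from the answer above.

```python
def extra_followup_audience(list_size, lift_points):
    """Additional accepted connections from a lift given in percentage points."""
    return list_size * lift_points / 100


def extra_replies(list_size, lift_points, reply_rate):
    """Additional replies, assuming a fixed reply rate among accepted connections."""
    return extra_followup_audience(list_size, lift_points) * reply_rate


print(extra_followup_audience(500, 15))  # additional prospects entering follow-up
print(extra_replies(500, 15, 0.20))      # hypothetical 20% reply rate downstream
```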

What is the minimum acceptable acceptance rate for a LinkedIn connection campaign?

For cold LinkedIn outreach to B2B prospects, an acceptance rate below 15% indicates a problem with either the connection note or the quality of the prospect list. The average acceptance rate for optimised connection notes to well-targeted audiences is 25–35%, according to G2 reviews of Aimfox. If your rate is below 15%, run an A/B test comparing a fundamentally different note approach rather than adjusting the existing note.

How do I handle A/B testing when I have multiple LinkedIn accounts in Aimfox?

If Aimfox is managing multiple LinkedIn accounts, run both variants from the same account to prevent account-level performance differences from confounding the test. If you want to test account-level performance (one account vs another), run that as a separate test with identical messages on each account, not in combination with message testing.

Can I use A/B test data from one ICP segment to infer what will work for another?

With caution. If you find that problem-first opening in connection notes significantly outperforms role-specific opening for VP Sales at SaaS companies, this finding may transfer to VP Sales at enterprise software companies (similar role and concerns) but may not transfer to Head of Recruiting at those same companies (different role, different concerns). Use cross-segment inference as a hypothesis for the next test, not as a conclusion that skips the test.

Should I keep testing once I have a strong result (e.g., 35% acceptance rate)?

Yes, but switch from message optimization to structural testing. At a 35% acceptance rate, the message itself is likely close to its ceiling. The next performance gains come from sequence structure (timing, length), audience segmentation (narrower ICP targeting), and channel coordination (adding email outreach). Continue testing, but broaden the scope of what you test.

How does Aimfox Unibox integrate with A/B testing?

Aimfox Unibox aggregates all conversation replies from all campaigns into a single inbox. During an A/B test, replies from both Variant A and Variant B appear in Unibox. Tag incoming replies by the campaign variant they came from (either manually or using Aimfox labels) so you can compare not just acceptance rate but conversation quality across variants. Variant A may have a higher acceptance rate but produce more low-quality "not interested" replies; Variant B may produce fewer acceptances but more substantive conversations. Unibox data captures this quality dimension that the acceptance rate metric alone misses.
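Once replies are tagged, the quality comparison is a simple tally. The sketch below assumes a hand-built list of (variant, label) pairs; the labels and the data are hypothetical and do not reflect built-in Aimfox categories or an Aimfox export format.

```python
from collections import Counter

# Hypothetical hand-tagged replies: (variant, label).
replies = [
    ("A", "not interested"),
    ("A", "not interested"),
    ("A", "substantive"),
    ("B", "substantive"),
    ("B", "substantive"),
    ("B", "not interested"),
]


def substantive_share(replies, label="substantive"):
    """Share of each variant's replies that carry the given label."""
    totals = Counter(variant for variant, _ in replies)
    hits = Counter(variant for variant, tag in replies if tag == label)
    return {variant: hits[variant] / totals[variant] for variant in totals}


print(substantive_share(replies))
```

Reading this share next to the acceptance rate shows whether a variant that wins on acceptances also wins on conversation quality.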

What should I do when a new test variant outperforms the previous champion?

When the new variant wins, update the champion in your tracking document and retire the previous champion from active campaign use. Begin designing the next test using the new champion as the baseline. Resist the temptation to run the previous champion alongside the new one "just to be sure": doing so uses up prospects and delays the next optimization cycle. Trust the test result and move to the next variable.

How do I know if my A/B testing program is generating compounding improvements?

Track your campaign baseline acceptance rate and conversation rate at the start of each month. If the metrics are not improving across testing cycles, review whether the tests compare meaningfully different approaches or just variations within the same pattern. Compounding improvement requires each test cycle to find a genuine winner that outperforms the previous champion. If tests are consistently inconclusive, the variants are probably not different enough to produce a detectable difference.

Can I use Aimfox A/B test results to inform LinkedIn Ads creative testing?

Yes. If a specific problem framing or audience persona outperforms others in connection note A/B tests, the same creative direction is worth testing in LinkedIn Sponsored Content. The A/B test data shows you what messaging resonates with your exact ICP on LinkedIn, which translates directly into ad creative strategy. The formats are different (single-person message vs paid ad) but the audience and messaging preferences are shared.

Test and optimise — then scale with verified contacts

A/B testing improves your message, but the right prospect list is what makes the result matter. Quarvio delivers pre-verified B2B contacts as a one-time purchase — no subscription, no recycled data — so your optimised Aimfox campaigns reach accurate decision-makers at scale.

Start your order on Quarvio →

