A/B Testing for Ecommerce: How to Run Split Tests That Actually Improve Sales
How A/B Testing Works
An A/B test randomly divides your traffic into two groups. Group A sees the original version of a page (the control), and Group B sees a modified version (the variation). Both groups interact with the page naturally while the testing tool tracks a predefined conversion metric, typically purchases, add-to-cart actions, or click-throughs. After collecting enough data to be statistically meaningful, you compare the conversion rates of the two groups to determine whether the change made a real difference or whether the observed difference could be explained by random chance.
The key concept is statistical significance. A test showing a 5 percent improvement after 50 visitors per variation is meaningless because random variation at that sample size easily produces swings of 20 percent or more. You need hundreds or thousands of conversions per variation before the result is reliable. Most testing tools calculate significance automatically and display it as a confidence level, typically requiring 95 percent confidence before recommending a winner. Strictly speaking, this means that if your change had no real effect, a difference at least this large would appear by chance less than 5 percent of the time. It does not mean there is a 95 percent probability that the variation is truly better.
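To make the sample size point concrete, here is a minimal sketch of the kind of calculation a testing tool might run, assuming a pooled two-proportion z-test. Real tools may use other methods (Bayesian or sequential tests, for example), and the function name and visitor counts below are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pool both groups to estimate the rate under "no real difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: the same 20% vs 22% conversion rates at two sample sizes.
small_sample = two_proportion_p_value(10, 50, 11, 50)
large_sample = two_proportion_p_value(2_000, 10_000, 2_200, 10_000)

print(f"50 visitors per variation:     p = {small_sample:.3f}")
print(f"10,000 visitors per variation: p = {large_sample:.4f}")
```

The same observed lift is indistinguishable from noise at 50 visitors per variation but clears the 95 percent threshold (p below 0.05) comfortably at 10,000.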
A/B testing is different from simply making a change and comparing this week's numbers to last week's. Time-based comparisons are unreliable because traffic volume, visitor quality, seasonality, promotions, and dozens of other factors change from week to week. A/B testing controls for all of these variables because both groups are exposed to the same external conditions simultaneously. The only difference between the groups is the specific change you are testing, which is why the results are trustworthy.
Step by Step: Running Your First A/B Test
Start with pages that receive the most traffic and are closest to the purchase decision. Your product pages, cart page, and checkout page are almost always better testing targets than your homepage or category pages because visitors on those pages are closest to buying, so an improvement there translates most directly into completed orders. Within the page, choose one specific element to test: the add-to-cart button (color, size, text, placement), the product image layout, the price display format, the shipping information placement, or the product description structure. Do not test multiple changes simultaneously in your first experiments because you will not be able to determine which change drove the result. Use your heatmap data and analytics to identify which element is most likely contributing to the drop-off you want to fix.
A hypothesis is not a guess. It is a structured statement based on data. Use this format: "Based on [observation], we believe that [specific change] will [improve specific metric] because [reasoning]." Example: "Based on session recordings showing that 35 percent of visitors on mobile scroll past the add-to-cart button without seeing it because it sits below a long product description, we believe moving the add-to-cart button above the description fold will increase mobile add-to-cart rate because visitors will encounter the purchase action before they start scrolling through content." This format forces you to connect the change to observed behavior and a measurable outcome, which makes the test meaningful regardless of whether it wins or loses.
Before launching the test, determine how many visitors each variation needs to detect a meaningful difference. Free sample size calculators (available from Optimizely, VWO, and Evan Miller's tools) require three inputs: your current conversion rate (the baseline), the minimum improvement you want to detect (the minimum detectable effect, usually 10 to 20 percent relative improvement), and the statistical significance level (usually 95 percent). Many also ask for statistical power, which is commonly left at 80 percent. If your product page converts at 3 percent and you want to detect a 15 percent relative improvement (from 3 to 3.45 percent), you need roughly 24,000 visitors per variation at 95 percent significance and 80 percent power. If your page gets 1,000 visitors per day split evenly between two variations, that test needs about 48 days to complete. Knowing this upfront prevents the common mistake of ending tests too early.
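The arithmetic behind these calculators can be sketched as follows. This is a minimal version assuming the standard two-proportion formula, a two-sided test, and 80 percent power. The power setting is an assumption here, and individual calculators may use different defaults or methods, so their numbers can differ. The function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variation(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_complete(per_variation, daily_visitors, variations=2):
    """Days needed when daily traffic is split evenly across variations."""
    return ceil(per_variation * variations / daily_visitors)


n_15 = visitors_per_variation(0.03, 0.15)  # 3% baseline, 15% relative lift
n_20 = visitors_per_variation(0.03, 0.20)  # 3% baseline, 20% relative lift

print(n_15, days_to_complete(n_15, 1_000))
print(n_20, days_to_complete(n_20, 1_000))
```

Under these assumptions, a 3 percent baseline with a 15 percent relative lift comes out at roughly 24,000 visitors per variation, and a 20 percent lift at roughly 14,000. Small changes to the minimum detectable effect move the required sample size a great deal, which is why it is worth fixing that input before launch.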
The CRO tools guide covers specific platforms, but the general setup process is the same across tools. Create the experiment, define the original page as the control, build the variation with your one specific change, set the traffic allocation (50/50 is standard), define the primary conversion metric (the one that determines the winner), and add any secondary metrics you want to monitor. Set the test duration based on your sample size calculation. Double-check that the variation renders correctly on all device types by previewing the page on desktop, tablet, and mobile before launching.
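Testing tools handle traffic allocation for you, but it helps to understand what a 50/50 split involves. The sketch below shows one common approach, deterministic hashing of a visitor ID, so that a returning visitor always sees the same version. It is an illustration under that assumption, not a description of how any particular tool works, and `assign_variation`, the visitor IDs, and the experiment name are hypothetical.

```python
import hashlib


def assign_variation(visitor_id: str, experiment: str, split: float = 0.5) -> str:
    """Map a visitor to 'control' or 'variation', stable across visits."""
    # Including the experiment name keeps assignments independent between tests.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "control" if bucket < split else "variation"


# The same visitor gets the same answer every time.
print(assign_variation("visitor-123", "pdp-button-position"))
print(assign_variation("visitor-123", "pdp-button-position"))
```

Because the assignment depends only on the visitor ID and the experiment name, it needs no stored state, and changing `split` adjusts the traffic allocation.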
This is where discipline matters most. The first few days of data will look dramatic, with one variation often showing a 30 or 40 percent improvement or decline. These early numbers are statistical noise, not real results. Resist the urge to call the test early. Let it run for at least two complete weeks to capture day-of-week variation in visitor behavior (shopping patterns differ between weekdays and weekends), and do not stop until you reach the sample size you calculated before launch (step 3). If your testing tool shows "statistical significance" before the minimum runtime, let it continue running anyway. Every extra look at the data is another chance for random noise to cross the significance threshold, so significance reached early on a small sample frequently reverses as more data accumulates.
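The cost of stopping early can be illustrated with a small simulation. The sketch below runs A/A tests, in which both groups have the same true conversion rate, so any "winner" is a false positive. It compares checking for significance every day against checking once at the end. All parameters (300 experiments, 20 days, 100 visitors per arm per day, a 10 percent conversion rate) are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Pooled two-proportion z-test, two-sided."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return False
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_aa_tests(experiments=300, days=20, daily_per_arm=100,
                      rate=0.10, seed=7):
    """Return (share flagged by any daily peek, share flagged at the end)."""
    rng = random.Random(seed)
    flagged_by_peeking = 0
    flagged_at_end = 0
    for _ in range(experiments):
        conv_a = conv_b = visitors = 0
        ever_significant = False
        for _ in range(days):
            conv_a += sum(rng.random() < rate for _ in range(daily_per_arm))
            conv_b += sum(rng.random() < rate for _ in range(daily_per_arm))
            visitors += daily_per_arm
            if is_significant(conv_a, visitors, conv_b, visitors):
                ever_significant = True  # a daily peek would have "called" it
        flagged_by_peeking += ever_significant
        flagged_at_end += is_significant(conv_a, visitors, conv_b, visitors)
    return flagged_by_peeking / experiments, flagged_at_end / experiments


peeking_rate, final_rate = simulate_aa_tests()
print(f"False positives when peeking daily: {peeking_rate:.0%}")
print(f"False positives at planned end:     {final_rate:.0%}")
```

Even though nothing differs between the two groups, a test that is stopped the first time it shows significance declares a winner far more often than one evaluated only at its planned end.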
When the test reaches your predetermined sample size and shows at least 95 percent significance, examine the results across segments. Check whether the winning variation performs consistently across devices (desktop versus mobile), traffic sources (organic versus paid versus direct), and visitor types (new versus returning). Keep in mind that each segment contains only part of your sample, so small segment-level differences may be noise rather than a real pattern. If the variation wins across all major segments, implement it permanently by making the change live for all visitors. If it wins overall but clearly loses for a specific segment (for example, improves desktop but hurts mobile), you may need a device-specific implementation. Document the full result in a testing log including the hypothesis, the data that inspired it, the measured outcome, and the lesson learned. This log becomes increasingly valuable as you build institutional knowledge about what works for your specific audience.
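A segment check of this kind can be sketched in a few lines. The counts below are hypothetical, and the segment names simply mirror the device split mentioned above. In practice you would export these numbers from your testing tool.

```python
def relative_lift(control_conv, control_n, variation_conv, variation_n):
    """Relative change in conversion rate, variation versus control."""
    control_rate = control_conv / control_n
    variation_rate = variation_conv / variation_n
    return (variation_rate - control_rate) / control_rate


# Hypothetical results:
# (control conversions, control visitors, variation conversions, variation visitors)
segments = {
    "desktop": (300, 10_000, 360, 10_000),
    "mobile": (280, 10_000, 266, 10_000),
}

lifts = {name: relative_lift(*counts) for name, counts in segments.items()}
losing_segments = [name for name, lift in lifts.items() if lift < 0]

for name, lift in lifts.items():
    print(f"{name}: {lift:+.1%}")
print("Segments where the variation loses:", losing_segments)
```

A negative lift in one segment is a prompt to look closer, not proof of harm. Check that the segment is large enough for the difference to be meaningful before building a device-specific implementation.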
What to Test First
Product page add-to-cart buttons are the most common first test for a reason: the button is the critical conversion point, variations are easy to create, and the metric (add-to-cart rate) accumulates data quickly. Test the button text ("Add to Cart" versus "Buy Now" versus "Add to Bag"), color (high-contrast versus your current color), size (larger versus current), and position (above fold versus below description). Each of these is a separate test, not one combined change.
Product image presentation is another high-impact first test. Try larger images versus your current size, lifestyle images versus white-background product shots, adding video to the image gallery versus static-only, or showing the image gallery in a grid versus a slider. Product images are among the most viewed elements on a product page, and changes to image presentation can produce add-to-cart rate improvements of 10 to 25 percent. The product page design guide covers the specific image formats and layouts that testing data shows work best for ecommerce.
Checkout page simplification consistently tests as a winner. If your checkout has multiple pages, test a single-page checkout against the multi-step version. If you require account creation, test offering guest checkout. If you show the order summary in a sidebar, test moving it inline above the payment form on mobile. If your forms have many fields, test removing optional fields (like "Company Name" or "Address Line 2") and only showing them when requested. Each of these changes reduces friction at the highest-intent stage of your funnel, where every percentage point of improvement translates directly to completed sales.
Common A/B Testing Mistakes
Testing on pages with insufficient traffic produces inconclusive results that waste weeks of effort. A page with 100 visitors per day cannot reliably test subtle conversion rate differences because the required sample size would mean running the test for months. Focus your testing on pages with at least 1,000 daily visitors, or accept that you can only detect large differences (20 percent or more relative improvement) on lower-traffic pages.
Testing trivial elements like button colors or font sizes before addressing fundamental page problems wastes your testing capacity. If your product page lacks reviews, has a single low-quality image, and buries the price below the fold, testing whether the add-to-cart button should be green or blue is optimizing the wrong thing. Fix the obvious, fundamental issues first (through direct implementation or testing), then test the subtle elements once the foundation is solid.
Running multiple tests on the same page simultaneously can create interaction effects that make both results hard to trust. If you are testing a new product image layout and a new add-to-cart button at the same time on the same page, you cannot cleanly attribute the outcome to either change because the two may amplify or cancel each other's effects. Run one test per page at a time, or use a multivariate testing (MVT) tool that is specifically designed to handle multiple simultaneous variables. Be aware that MVT requires significantly larger sample sizes to reach significance because traffic is split across every combination of changes.
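The reason multivariate tests need more traffic is easy to see by counting combinations. The sketch below assumes a full-factorial design, in which every combination of options gets its own share of traffic, and a hypothetical requirement of 10,000 visitors per combination. The element names and options are illustrative.

```python
from itertools import product

# Hypothetical elements under test and their options.
image_layouts = ["grid", "slider"]
button_texts = ["Add to Cart", "Buy Now"]

# A full-factorial MVT shows every combination to a separate group.
cells = list(product(image_layouts, button_texts))

visitors_per_cell = 10_000  # hypothetical per-group requirement
ab_test_total = 2 * visitors_per_cell
mvt_total = len(cells) * visitors_per_cell

for layout, text in cells:
    print(f"layout={layout}, button={text}")
print(f"A/B test: {ab_test_total:,} visitors; MVT: {mvt_total:,} visitors")
```

Two elements with two options each already double the traffic needed compared with a simple A/B test, and each additional element or option multiplies the number of groups again.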
Ignoring losing tests is a missed learning opportunity. When a test loses, the data tells you something valuable about your audience, specifically, that the change you hypothesized would help actually does not resonate with them. A well-documented losing test refines your understanding of your customers and prevents you from repeatedly testing similar ideas that your audience has already rejected. Over time, your testing log of wins and losses builds a detailed profile of what your specific customers respond to and what they do not.
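A testing log does not need special software. As one possible structure, the sketch below records the fields this guide recommends (the hypothesis, the observation that inspired it, the outcome, and the lesson learned) and exports them as JSON. The class name, field names, and example entry are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentLogEntry:
    observation: str  # the data that inspired the test
    change: str       # the specific change tested
    metric: str       # the expected improvement in the primary metric
    reasoning: str    # why the change was expected to help
    outcome: str      # "win", "loss", or "inconclusive"
    lesson: str       # what the result says about your audience

    def hypothesis(self) -> str:
        """Render the hypothesis in the 'Based on ... we believe ...' format."""
        return (
            f"Based on {self.observation}, we believe that {self.change} "
            f"will {self.metric} because {self.reasoning}."
        )


# Hypothetical losing test, recorded so the idea is not retested blindly.
entry = ExperimentLogEntry(
    observation="recordings showing mobile visitors scrolling past the button",
    change="moving the add-to-cart button above the description",
    metric="increase mobile add-to-cart rate",
    reasoning="visitors will see the purchase action before scrolling",
    outcome="loss",
    lesson="Button position alone did not change mobile behavior.",
)

print(entry.hypothesis())
print(json.dumps(asdict(entry), indent=2))
```

Keeping wins and losses in the same structure makes it easy to search past experiments before proposing a new one.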
