
A/B Test Analysis Using Chi-Square & Python Visualizations

  • Writer: Katie Wojciechowski
  • Sep 1
  • 3 min read

In this project, I designed a theoretical A/B testing framework to evaluate the potential impact of personalized outreach on student re-engagement and re-enrollment. The goal was to simulate how an education-focused organization might structure an experiment to optimize email campaigns targeting “stopped-out” students—those who left college before earning a degree.


The Challenge

Many students who leave college temporarily never return. Re-engaging these students requires thoughtful outreach that motivates them to take actionable steps—clicking links, booking advising calls, and ultimately re-enrolling.


My goal was to test whether personalized emails would outperform standard outreach emails in driving these key actions.


The Experiment Setup

To explore this question, I structured a simulated A/B test with the following design:


  • Group A (Control Group): Would receive a standard, non-personalized outreach email—generic language, no mention of the student's background or academic history.

  • Group B (Test Group): Would receive a theoretically personalized email—addressed by name, referencing their last known major, total credits completed, and emphasizing how close they are to a degree.


This design mirrors real-world experiments where personalization is used to increase user engagement by making communication more relevant and motivating.


Simulated Metrics Tracked

To simulate the full conversion journey, I tracked three key stages of engagement:

  1. Click-Through Rate (CTR): Percentage of students who clicked the email’s link.

  2. Booking Rate: Percentage of those who clicked who then booked an advising call.

  3. Re-Enrollment Rate: Percentage of those who booked a call who theoretically re-enrolled.

This mimics a funnel model frequently used in education and marketing analytics to measure engagement effectiveness.
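As a minimal sketch of how these conditional funnel rates relate, here is the arithmetic in Python. The counts are made up for illustration; they are not the project's simulated data:

```python
# Hypothetical funnel counts for one email group (illustrative only).
emailed = 250      # students who received the email
clicked = 45       # of those, students who clicked the link
booked = 18        # of those who clicked, students who booked a call
re_enrolled = 9    # of those who booked, students who re-enrolled

# Each rate is conditional on reaching the previous funnel stage.
ctr = clicked / emailed
booking_rate = booked / clicked
re_enrollment_rate = re_enrolled / booked

print(f"CTR: {ctr:.1%}, Booking: {booking_rate:.1%}, Re-enrollment: {re_enrollment_rate:.1%}")
```

Note that each denominator shrinks as the funnel narrows, which is why a small absolute number of re-enrollments can still be a high re-enrollment *rate*.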


Data Generation & Analysis


The code in this project performs the following:

  • Data Simulation using NumPy and pandas

    • Generates a dataset of 500 stopped-out students with random attributes such as:

      • Age

      • Last known major

      • Credits completed

      • Days since last enrollment

    • Randomly assigns each student to Group A or Group B

    • Simulates behavioral outcomes based on group assignment (e.g., Group B has a higher chance of clicking and re-enrolling)

    • Exports the simulated dataset to a CSV file

  • Data Cleaning & Aggregation

    • Uses a pandas DataFrame to group and summarize conversion behavior by email group

    • Calculates key funnel metrics: click-through rate, call booking rate, and re-enrollment rate

  • Statistical Testing

    • Builds contingency tables for each funnel stage

    • Runs Chi-Square tests (via scipy.stats) to assess whether group differences are statistically significant

    • Outputs p-values and test interpretations for each metric

  • Data Visualization

    • Constructs a stacked bar chart to visualize the conversion funnel per group using matplotlib and seaborn

    • Annotates bars with percentage values for clarity

    • Customizes labels, colors, and layout for professional presentation
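The simulation step described above can be sketched roughly as follows. Column names, probabilities, and the seed are illustrative assumptions, not the project's exact code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed for reproducibility
n = 500

# Simulate student attributes and random A/B group assignment.
df = pd.DataFrame({
    "age": rng.integers(20, 45, size=n),
    "last_major": rng.choice(["Business", "Nursing", "IT", "Liberal Arts"], size=n),
    "credits_completed": rng.integers(0, 120, size=n),
    "days_since_enrollment": rng.integers(90, 1500, size=n),
    "group": rng.choice(["A", "B"], size=n),
})

# Group B (personalized email) gets a higher simulated click probability.
click_prob = np.where(df["group"] == "B", 0.25, 0.18)
df["clicked"] = rng.random(n) < click_prob

# Export the simulated dataset and summarize click rates by group.
df.to_csv("simulated_students.csv", index=False)
print(df.groupby("group")["clicked"].mean())
```

Baking the effect into the simulation (0.25 vs. 0.18 here) is what lets the later chi-square test have something real to detect.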

The full project is reproducible and environment-independent, using only open-source libraries: pandas, numpy, scipy, matplotlib, and seaborn.
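For the statistical testing step, a chi-square test on one funnel stage can be run with `scipy.stats.chi2_contingency`. The counts in this contingency table are illustrative, not results from the project:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table for the click-through stage (illustrative counts):
# rows = Group A / Group B, columns = clicked / did not click.
table = np.array([
    [45, 205],   # Group A
    [57, 193],   # Group B
])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")

alpha = 0.05
if p_value < alpha:
    print("Difference between groups is statistically significant.")
else:
    print("No statistically significant difference between groups.")
```

The same pattern repeats for the booking and re-enrollment stages, with one table per stage.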

Visualization

[Figure: stacked bar chart of the conversion funnel (clicked → booked → re-enrolled) for Groups A and B]
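A funnel chart like the one described can be sketched with matplotlib. The rates below are illustrative placeholders, and because each stage is a subset of the previous one, drawing the bars at the same x-position makes them nest visually:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

groups = ["Group A", "Group B"]
x = np.arange(len(groups))

# Illustrative funnel rates as fractions of all emailed students.
clicked = np.array([0.18, 0.23])
booked = np.array([0.07, 0.09])
re_enrolled = np.array([0.044, 0.044])

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(x, clicked, color="#c6dbef", label="Clicked")
ax.bar(x, booked, color="#6baed6", label="Booked call")
ax.bar(x, re_enrolled, color="#2171b5", label="Re-enrolled")

# Annotate each stage with its percentage for clarity.
for rates in (clicked, booked, re_enrolled):
    for xi, rate in zip(x, rates):
        ax.text(xi, rate, f"{rate:.1%}", ha="center", va="bottom", fontsize=8)

ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel("Share of emailed students")
ax.set_title("Conversion funnel by email group")
ax.legend()
fig.tight_layout()
fig.savefig("funnel_by_group.png")
```

Expressing every stage as a fraction of *emailed* students (rather than of the previous stage) keeps the bars on one comparable scale.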

Interpretation (Theoretical):

  • Personalized emails (Group B) led to a higher initial click-through rate (+4.7 percentage points), suggesting that tailored messaging is more effective at grabbing attention and prompting students to take action.

  • Booking rates were nearly identical between the two groups, indicating that once students clicked, both messages were similarly effective in motivating them to schedule a call.

  • Surprisingly, the re-enrollment rate was higher for Group A (63.0% vs. 48.5%), which may suggest that students who responded to the standard email were more committed—or that personalization raised interest but not necessarily follow-through.


Insights from the Framework

While this was a simulated scenario, building the framework gave me practice in:

  • How to structure controlled experiments for outreach optimization

  • How to think about metrics that may or may not be actionable for educational re-engagement

  • How statistical testing can validate (or disprove) assumptions about communication effectiveness

  • How data visualization plays a role in communicating results to non-technical stakeholders


Applications & Extensions

This framework can be adapted for:

  • Real-world campaigns targeting re-enrollment

  • Testing different messaging channels (e.g., SMS vs. email)

  • Experimenting with advisor follow-up strategies

  • Automation of student segmentation and messaging personalization


 
 
 
