A/B Test Analysis Using Chi-Square & Python Visualizations
- Katie Wojciechowski
- Sep 1
- 3 min read
In this project, I designed a theoretical A/B testing framework to evaluate the potential impact of personalized outreach on student re-engagement and re-enrollment. The goal was to simulate how an education-focused organization might structure an experiment to optimize email campaigns targeting “stopped-out” students—those who left college before earning a degree.
The Challenge
Many students who leave college temporarily never return. Re-engaging these students requires thoughtful outreach that motivates them to take actionable steps—clicking links, booking advising calls, and ultimately re-enrolling.
My goal was to test whether personalized emails would outperform standard outreach emails in driving these key actions.
The Experiment Setup
To explore this question, I structured a simulated A/B test with the following design:
Group A (Control Group): Would receive a standard, non-personalized outreach email—generic language, no mention of the student's background or academic history.
Group B (Test Group): Would receive a theoretically personalized email—addressed by name, referencing their last known major, total credits completed, and emphasizing how close they are to a degree.
This design mirrors real-world experiments where personalization is used to increase user engagement by making communication more relevant and motivating.
Simulated Metrics Tracked
To simulate the full conversion journey, I tracked three key stages of engagement:
Click-Through Rate (CTR): Percentage of students who clicked the email’s link.
Booking Rate: Percentage of those who clicked who then booked an advising call.
Re-Enrollment Rate: Percentage of those who booked a call who theoretically re-enrolled.
This mimics a funnel model frequently used in education and marketing analytics to measure engagement effectiveness.
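As a quick illustration, the three funnel rates are conditional percentages computed from stage counts. The counts below are hypothetical, not the project's actual numbers:

```python
# Hypothetical stage counts for one email group.
emailed, clicked, booked, reenrolled = 250, 60, 21, 12

ctr = clicked / emailed * 100              # % of emailed students who clicked
booking_rate = booked / clicked * 100      # % of clickers who booked a call
reenroll_rate = reenrolled / booked * 100  # % of bookers who re-enrolled
print(f"CTR {ctr:.1f}% | booking {booking_rate:.1f}% | re-enroll {reenroll_rate:.1f}%")
```

Because each rate is conditioned on the previous stage, the percentages do not multiply against the full sample except when chained together.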
Data Generation & Analysis
The code in this project performs the following:
Data Simulation using NumPy and pandas
Generates a dataset of 500 stopped-out students with random attributes such as:
Age
Last known major
Credits completed
Days since last enrollment
Randomly assigns each student to Group A or Group B
Simulates behavioral outcomes based on group assignment (e.g., Group B has a higher chance of clicking and re-enrolling)
Writes the simulated dataset to a CSV file
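A minimal sketch of this simulation step. The column names, majors, probabilities, and output filename below are illustrative assumptions; the original script's exact values may differ:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # 500 stopped-out students

# Random student attributes (hypothetical choices and ranges).
df = pd.DataFrame({
    "age": rng.integers(20, 45, n),
    "last_major": rng.choice(["Business", "Nursing", "IT", "Education"], n),
    "credits_completed": rng.integers(12, 110, n),
    "days_since_enrollment": rng.integers(180, 2000, n),
    "group": rng.choice(["A", "B"], n),  # random A/B assignment
})

# Group B gets a higher simulated click probability (assumed lift).
p_click = np.where(df["group"] == "B", 0.28, 0.20)
df["clicked"] = rng.random(n) < p_click
# Each downstream stage is only possible after the previous one.
df["booked"] = df["clicked"] & (rng.random(n) < 0.35)
df["reenrolled"] = df["booked"] & (rng.random(n) < 0.55)

df.to_csv("stopped_out_students.csv", index=False)
```

Gating each stage on the previous one keeps the simulated funnel internally consistent: no student can book without clicking, or re-enroll without booking.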
Data Cleaning & Aggregation
Uses pandas groupby operations to summarize conversion behavior by email group
Calculates key funnel metrics: click-through rate, call booking rate, and re-enrollment rate
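The aggregation might look like this sketch, which rebuilds a small simulated frame and summarizes it with a pandas groupby (column names and probabilities are assumptions carried over from the simulation step):

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the simulated data.
rng = np.random.default_rng(0)
df = pd.DataFrame({"group": rng.choice(["A", "B"], 500)})
df["clicked"] = rng.random(500) < np.where(df["group"] == "B", 0.28, 0.20)
df["booked"] = df["clicked"] & (rng.random(500) < 0.35)
df["reenrolled"] = df["booked"] & (rng.random(500) < 0.55)

# Count each funnel stage per group, then convert to conditional rates.
summary = df.groupby("group").agg(
    emailed=("clicked", "size"),
    clicked=("clicked", "sum"),
    booked=("booked", "sum"),
    reenrolled=("reenrolled", "sum"),
)
summary["ctr_pct"] = summary["clicked"] / summary["emailed"] * 100
summary["booking_pct"] = summary["booked"] / summary["clicked"] * 100
summary["reenroll_pct"] = summary["reenrolled"] / summary["booked"] * 100
print(summary.round(1))
```

Named aggregation (`emailed=("clicked", "size")`, etc.) keeps the summary columns self-describing, which matters when the table is handed to stakeholders.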
Statistical Testing
Builds contingency tables for each funnel stage
Runs Chi-Square tests (via scipy.stats) to assess whether group differences are statistically significant
Outputs p-values and test interpretations for each metric
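A sketch of one such test, using `scipy.stats.chi2_contingency` on a 2×2 table for the click stage. The counts are made up for illustration, not taken from the project's results:

```python
from scipy.stats import chi2_contingency

# Hypothetical click-stage contingency table:
# rows = Group A, Group B; columns = clicked, did not click.
click_table = [[50, 200],
               [72, 178]]

chi2, p, dof, expected = chi2_contingency(click_table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
if p < 0.05:
    print("Click-through difference between groups is statistically significant.")
else:
    print("No significant click-through difference at the 0.05 level.")
```

The same pattern repeats for the booking and re-enrollment stages, with one contingency table per funnel stage.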
Data Visualization
Constructs a stacked bar chart to visualize the conversion funnel per group using matplotlib and seaborn
Annotates bars with percentage values for clarity
Customizes labels, colors, and layout for professional presentation
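A matplotlib-only sketch of the stacked funnel chart with percentage annotations (the project also uses seaborn for styling; the percentages below are placeholders, not the actual results):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

groups = ["A (standard)", "B (personalized)"]
# Placeholder per-group percentages for each funnel stage.
stages = [
    ("Clicked", np.array([20.0, 24.7]), "#4c72b0"),
    ("Booked", np.array([7.2, 8.6]), "#dd8452"),
    ("Re-enrolled", np.array([4.5, 4.2]), "#55a868"),
]

fig, ax = plt.subplots(figsize=(7, 4))
bottoms = np.zeros(len(groups))
for label, vals, color in stages:
    ax.bar(groups, vals, bottom=bottoms, label=label, color=color)
    # Annotate each segment at its vertical center with its percentage.
    for x, (b, v) in enumerate(zip(bottoms, vals)):
        ax.text(x, b + v / 2, f"{v:.1f}%", ha="center", va="center", fontsize=8)
    bottoms += vals

ax.set_ylabel("Share of emailed students (%)")
ax.set_title("Conversion funnel by email group")
ax.legend()
fig.tight_layout()
fig.savefig("funnel.png", dpi=150)
```

Stacking via the `bottom` argument lets each funnel stage sit on top of the previous one, so the total bar height shows cumulative engagement per group.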
The full project is reproducible and environment-independent, using only open-source libraries: pandas, numpy, scipy, matplotlib, and seaborn.
Visualization
[Figure: stacked bar chart of click, booking, and re-enrollment rates for Groups A and B]
Interpretation (Theoretical):
Personalized emails (Group B) led to a higher initial click-through rate (+4.7 percentage points), suggesting that tailored messaging is more effective at grabbing attention and prompting students to take action.
Booking rates were nearly identical between the two groups, indicating that once students clicked, both messages were similarly effective in motivating them to schedule a call.
Surprisingly, the re-enrollment rate was higher for Group A (63.0% vs. 48.5%), which may suggest that students who responded to the standard email were more committed—or that personalization raised interest but not necessarily follow-through.
Insights from the Framework
While this was a simulated scenario, building the framework gave me practice in:
How to structure controlled experiments for outreach optimization
How to evaluate which metrics are genuinely actionable for educational re-engagement
How statistical testing can validate (or disprove) assumptions about communication effectiveness
How data visualization plays a role in communicating results to non-technical stakeholders
Applications & Extensions
This framework can be adapted for:
Real-world campaigns targeting re-enrollment
Testing different messaging channels (e.g., SMS vs. email)
Experimenting with advisor follow-up strategies
Automation of student segmentation and messaging personalization