A/B Test Analysis Using Chi-Square & Python Visualizations
- Katie Wojciechowski
- Sep 1
- 3 min read
In this project, I designed a theoretical A/B testing framework to evaluate the potential impact of personalized outreach on student re-engagement and re-enrollment. The goal was to simulate how an education-focused organization might structure an experiment to optimize email campaigns targeting “stopped-out” students—those who left college before earning a degree.
The Challenge
Many students who leave college temporarily never return. Re-engaging these students requires thoughtful outreach that motivates them to take actionable steps—clicking links, booking advising calls, and ultimately re-enrolling.
My goal was to test whether personalized emails would outperform standard outreach emails in driving these key actions.
The Experiment Setup
To explore this question, I structured a simulated A/B test with the following design:
Group A (Control Group): Would receive a standard, non-personalized outreach email—generic language, no mention of the student's background or academic history.
Group B (Test Group): Would receive a theoretically personalized email—addressed by name, referencing their last known major, total credits completed, and emphasizing how close they are to a degree.
This design mirrors real-world experiments where personalization is used to increase user engagement by making communication more relevant and motivating.
Simulated Metrics Tracked
To simulate the full conversion journey, I tracked three key stages of engagement:
Click-Through Rate (CTR): Percentage of students who clicked the email’s link.
Booking Rate: Percentage of those who clicked who then booked an advising call.
Re-Enrollment Rate: Percentage of those who booked a call who theoretically re-enrolled.
This mimics a funnel model frequently used in education and marketing analytics to measure engagement effectiveness.
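As a quick illustration, the three funnel rates are conditional percentages computed from stage counts. The counts below are hypothetical, not the project's actual numbers:

```python
# Hypothetical stage counts for one email group.
emailed, clicked, booked, reenrolled = 250, 60, 21, 12

ctr = clicked / emailed * 100              # % of emailed students who clicked
booking_rate = booked / clicked * 100      # % of clickers who booked a call
reenroll_rate = reenrolled / booked * 100  # % of bookers who re-enrolled
print(f"CTR {ctr:.1f}% | booking {booking_rate:.1f}% | re-enroll {reenroll_rate:.1f}%")
```

Because each rate is conditioned on the previous stage, the percentages do not multiply against the full sample except when chained together.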
Data Generation & Analysis
The code in this project performs the following:
Data Simulation using NumPy and pandas
Generates a dataset of 500 stopped-out students with random attributes such as:
Age
Last known major
Credits completed
Days since last enrollment
Randomly assigns each student to Group A or Group B
Simulates behavioral outcomes based on group assignment (e.g., Group B has a higher chance of clicking and re-enrolling)
Writes the simulated dataset to a CSV file
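A minimal sketch of this simulation step. The column names, majors, probabilities, and output filename below are illustrative assumptions; the original script's exact values may differ:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # 500 stopped-out students

# Random student attributes (hypothetical choices and ranges).
df = pd.DataFrame({
    "age": rng.integers(20, 45, n),
    "last_major": rng.choice(["Business", "Nursing", "IT", "Education"], n),
    "credits_completed": rng.integers(12, 110, n),
    "days_since_enrollment": rng.integers(180, 2000, n),
    "group": rng.choice(["A", "B"], n),  # random A/B assignment
})

# Group B gets a higher simulated click probability (assumed lift).
p_click = np.where(df["group"] == "B", 0.28, 0.20)
df["clicked"] = rng.random(n) < p_click
# Each downstream stage is only possible after the previous one.
df["booked"] = df["clicked"] & (rng.random(n) < 0.35)
df["reenrolled"] = df["booked"] & (rng.random(n) < 0.55)

df.to_csv("stopped_out_students.csv", index=False)
```

Gating each stage on the previous one keeps the simulated funnel internally consistent: no student can book without clicking, or re-enroll without booking.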
Data Cleaning & Aggregation
Uses pandas groupby operations to summarize conversion behavior by email group
Calculates key funnel metrics: click-through rate, call booking rate, and re-enrollment rate
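The aggregation might look like this sketch, which rebuilds a small simulated frame and summarizes it with a pandas groupby (column names and probabilities are assumptions carried over from the simulation step):

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the simulated data.
rng = np.random.default_rng(0)
df = pd.DataFrame({"group": rng.choice(["A", "B"], 500)})
df["clicked"] = rng.random(500) < np.where(df["group"] == "B", 0.28, 0.20)
df["booked"] = df["clicked"] & (rng.random(500) < 0.35)
df["reenrolled"] = df["booked"] & (rng.random(500) < 0.55)

# Count each funnel stage per group, then convert to conditional rates.
summary = df.groupby("group").agg(
    emailed=("clicked", "size"),
    clicked=("clicked", "sum"),
    booked=("booked", "sum"),
    reenrolled=("reenrolled", "sum"),
)
summary["ctr_pct"] = summary["clicked"] / summary["emailed"] * 100
summary["booking_pct"] = summary["booked"] / summary["clicked"] * 100
summary["reenroll_pct"] = summary["reenrolled"] / summary["booked"] * 100
print(summary.round(1))
```

Named aggregation (`emailed=("clicked", "size")`, etc.) keeps the summary columns self-describing, which matters when the table is handed to stakeholders.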
Statistical Testing
Builds contingency tables for each funnel stage
Runs Chi-Square tests (via scipy.stats) to assess whether group differences are statistically significant
Outputs p-values and test interpretations for each metric
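A sketch of one such test, using `scipy.stats.chi2_contingency` on a 2×2 table for the click stage. The counts are made up for illustration, not taken from the project's results:

```python
from scipy.stats import chi2_contingency

# Hypothetical click-stage contingency table:
# rows = Group A, Group B; columns = clicked, did not click.
click_table = [[50, 200],
               [72, 178]]

chi2, p, dof, expected = chi2_contingency(click_table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
if p < 0.05:
    print("Click-through difference between groups is statistically significant.")
else:
    print("No significant click-through difference at the 0.05 level.")
```

The same pattern repeats for the booking and re-enrollment stages, with one contingency table per funnel stage.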
Data Visualization
Constructs a stacked bar chart to visualize the conversion funnel per group using matplotlib and seaborn
Annotates bars with percentage values for clarity
Customizes labels, colors, and layout for professional presentation
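A matplotlib-only sketch of the stacked funnel chart with percentage annotations (the project also uses seaborn for styling; the percentages below are placeholders, not the actual results):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

groups = ["A (standard)", "B (personalized)"]
# Placeholder per-group percentages for each funnel stage.
stages = [
    ("Clicked", np.array([20.0, 24.7]), "#4c72b0"),
    ("Booked", np.array([7.2, 8.6]), "#dd8452"),
    ("Re-enrolled", np.array([4.5, 4.2]), "#55a868"),
]

fig, ax = plt.subplots(figsize=(7, 4))
bottoms = np.zeros(len(groups))
for label, vals, color in stages:
    ax.bar(groups, vals, bottom=bottoms, label=label, color=color)
    # Annotate each segment at its vertical center with its percentage.
    for x, (b, v) in enumerate(zip(bottoms, vals)):
        ax.text(x, b + v / 2, f"{v:.1f}%", ha="center", va="center", fontsize=8)
    bottoms += vals

ax.set_ylabel("Share of emailed students (%)")
ax.set_title("Conversion funnel by email group")
ax.legend()
fig.tight_layout()
fig.savefig("funnel.png", dpi=150)
```

Stacking via the `bottom` argument lets each funnel stage sit on top of the previous one, so the total bar height shows cumulative engagement per group.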
The full project is reproducible and environment-independent, using only open-source libraries: pandas, numpy, scipy, matplotlib, and seaborn.
Visualization
[Figure: stacked bar chart of click, booking, and re-enrollment rates for Groups A and B]
Interpretation (Theoretical):
Personalized emails (Group B) led to a higher initial click-through rate (+4.7 percentage points), suggesting that tailored messaging is more effective at grabbing attention and prompting students to take action.
Booking rates were nearly identical between the two groups, indicating that once students clicked, both messages were similarly effective in motivating them to schedule a call.
Surprisingly, the re-enrollment rate was higher for Group A (63.0% vs. 48.5%), which may suggest that students who responded to the standard email were more committed—or that personalization raised interest but not necessarily follow-through.
Insights from the Framework
While this was a simulated scenario, building the framework gave me practice in:
How to structure controlled experiments for outreach optimization
How to evaluate which metrics are genuinely actionable for educational re-engagement
How statistical testing can validate (or disprove) assumptions about communication effectiveness
How data visualization plays a role in communicating results to non-technical stakeholders
Applications & Extensions
This framework can be adapted for:
Real-world campaigns targeting re-enrollment
Testing different messaging channels (e.g., SMS vs. email)
Experimenting with advisor follow-up strategies
Automation of student segmentation and messaging personalization