Overview
This project evaluated the impact of a training program on employee performance, using simulated data. By comparing pre-training and post-training scores, I aimed to determine whether the program led to a statistically significant improvement in the assessed competencies.
Objectives
- Determine whether the difference between pre- and post-training scores is statistically significant.
- Visualize the distribution of scores pre- and post-training for easy comparison.
- Assess the performance of employees before and after undergoing training.
Data Visualization

Methodology
- Data Simulation: Generated a simulated dataset mimicking scores of employees before and after they undergo training.
- Paired t-test: Executed a paired t-test on the simulated data to ascertain if there’s a significant difference in the scores.
- Data Visualization: Data was reshaped for a side-by-side comparison of pre-training and post-training scores. Boxplots, created with the ggplot2 package in R, were used to visualize score distributions.
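To make the paired t-test step more concrete, here is a minimal sketch showing that a paired t-test is the same as a one-sample t-test on the per-employee score differences. The data, seed, sample size, and variable names (`pre`, `post`, `paired_res`, `diff_res`) are illustrative assumptions and are not the values used in this project.

```r
# Hypothetical illustration data; seed and parameters are assumptions,
# not the values used in the project.
set.seed(42)
pre  <- rnorm(30, mean = 50, sd = 10)
post <- pre + rnorm(30, mean = 5, sd = 8)

# Paired t-test on the two score vectors
paired_res <- t.test(post, pre, paired = TRUE)

# One-sample t-test on the per-employee differences
diff_res <- t.test(post - pre, mu = 0)

# Both approaches should agree on the t statistic and the p-value
print(all.equal(unname(paired_res$statistic), unname(diff_res$statistic)))
print(all.equal(paired_res$p.value, diff_res$p.value))
```

The pairing matters because each employee acts as their own control: the test looks at the within-employee change rather than at two unrelated groups.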
Tools & Technologies
- R: For statistical analysis and data visualization.
- ggplot2: Employed for creating plots to compare pre- and post-training scores.
Key Results
- The scores of the simulated employees differed noticeably after training. The paired t-test gave p = 0.0002188, below the 0.05 threshold, indicating a statistically significant difference between the pre-training and post-training scores.
- The average improvement in scores after the training was approximately 5.48 points.
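Figures like the p-value and the average improvement can be read directly from the object that `t.test()` returns. The sketch below uses hypothetical data (the seed, sample size, and the names `scores`, `res`, `p_value`, `mean_gain`, and `ci_95` are assumptions), so its numbers will not match the results reported above.

```r
# Hypothetical illustration data; not the dataset behind the reported results.
set.seed(1)
scores <- data.frame(pre = rnorm(50, mean = 50, sd = 10))
scores$post <- scores$pre + rnorm(50, mean = 5, sd = 10)

# Putting the post-training scores first makes a positive estimate mean improvement
res <- t.test(scores$post, scores$pre, paired = TRUE)

p_value   <- res$p.value            # p-value of the paired test
mean_gain <- unname(res$estimate)   # mean of the per-employee differences
ci_95     <- res$conf.int           # 95% confidence interval for the mean difference

cat(sprintf("p = %.4g, mean gain = %.2f, 95%% CI [%.2f, %.2f]\n",
            p_value, mean_gain, ci_95[1], ci_95[2]))
```

Note that the sign of the estimate depends on argument order: calling `t.test(pre, post, paired = TRUE)` reports the mean of pre minus post, so an improvement shows up as a negative number.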
Code & Resources
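library(ggplot2)
library(tidyr)  # assumed here for the reshaping step

# Simulate pre- and post-training scores for 100 employees.
# No seed is set, so the exact values differ on every run.
data_1 <- data.frame(
  employee_id = 1:100,
  pre_training_score = rnorm(100, 50, 10),
  post_training_score = rnorm(100, 75, 10)
)

# Paired t-test on each employee's two scores
t.test(data_1$pre_training_score, data_1$post_training_score, paired = TRUE)

# Reshape to long format for plotting. The original reshaping call is not shown;
# tidyr::pivot_longer is an assumption.
data_1_long <- pivot_longer(data_1, cols = c(pre_training_score, post_training_score),
                            names_to = "Training_Phase", values_to = "Score")
data_1_long$Training_Phase <- factor(data_1_long$Training_Phase, levels = c("pre_training_score", "post_training_score"))

# Side-by-side boxplots of the two score distributions
ggplot(data_1_long, aes(x = Training_Phase, y = Score, fill = Training_Phase)) +
  geom_boxplot() +
  labs(y = "Score", x = "Training Phase", title = "Distribution of Pre vs Post Training Scores", fill = "Training Phase") +
  theme_minimal() +
  scale_fill_manual(values = c("pre_training_score" = "#F8766D", "post_training_score" = "#00BFC4")) +
  scale_x_discrete(labels = c("pre_training_score" = "Pre-training", "post_training_score" = "Post-training"))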