Statistics and Research Design

Overview

The objective of this project was to evaluate the impact of a training program on employee performance, using simulated data. By comparing pre-training and post-training scores, I aimed to determine whether the program led to a statistically significant improvement in the assessed competencies.

Objectives

  • Assess the performance of employees before and after training.
  • Test whether the difference between pre- and post-training scores is statistically significant.
  • Visualize the distribution of scores pre- and post-training for easy comparison.

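Data Visualization

[Figure: boxplots titled "Distribution of Pre vs Post Training Scores", with Training Phase on the x-axis and Score on the y-axis]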

Methodology

  • Data Simulation: Generated a simulated dataset of scores for 100 employees before and after training.
  • Paired t-test: Ran a paired t-test on the simulated data to check whether the pre-training and post-training scores differ significantly.
  • Data Visualization: Reshaped the data to long format for a side-by-side comparison of pre-training and post-training scores, then drew boxplots of the score distributions with the ggplot2 package in R.
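As a rough illustration of what the paired t-test step computes, the sketch below derives the statistic by hand from the per-employee differences and checks it against R's built-in t.test. The seed, sample size, and distribution parameters are illustrative assumptions, not the values behind the reported results.

```r
# Illustrative values only (assumed): the seed, n, means, and SDs are not from the project.
set.seed(42)
n <- 100
pre  <- rnorm(n, mean = 50, sd = 10)
post <- rnorm(n, mean = 55, sd = 10)

# A paired t-test is a one-sample t-test on the per-employee differences.
d <- post - pre
t_manual <- mean(d) / (sd(d) / sqrt(n))          # t statistic with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

# Built-in equivalent; t.test(x, y, paired = TRUE) uses x - y, so this matches d.
res <- t.test(post, pre, paired = TRUE)

# Compare the hand-computed values with the built-in result.
print(all.equal(unname(res$statistic), t_manual))
print(all.equal(res$p.value, p_manual))
```

Because the test works on differences within each employee, it only makes sense when each pre-training score is matched to the same employee's post-training score.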

Tools & Technologies

  • R: For statistical analysis and data visualization.
  • ggplot2: Employed for creating plots to compare pre- and post-training scores.

Key Results

  • There was a noticeable difference in the scores of the simulated employees after undergoing training. The resulting p-value was less than 0.05 (p = 0.0002188), indicating a statistically significant difference between the pre-training and post-training scores.
  • The average improvement in scores after the training was approximately 5.48 points.
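A minimal sketch of how figures like these can be read off in R, assuming the scores sit in two paired vectors. The simulated inputs are hypothetical (arbitrary seed and parameters), so the numbers it prints will not match the values reported above.

```r
# Hypothetical inputs: seed and distribution parameters are assumptions for illustration.
set.seed(1)
pre  <- rnorm(100, mean = 50, sd = 10)
post <- rnorm(100, mean = 55, sd = 10)

# Paired test with differences taken as post - pre, so a gain is positive.
res <- t.test(post, pre, paired = TRUE)

mean_improvement <- mean(post - pre)   # average gain per employee
ci <- res$conf.int                     # 95% confidence interval for the mean gain
significant <- res$p.value < 0.05      # decision at the 0.05 level

cat("Mean improvement:", round(mean_improvement, 2), "\n")
cat("95% CI:", round(ci[1], 2), "to", round(ci[2], 2), "\n")
cat("p-value:", signif(res$p.value, 4), "\n")
```

Reporting the confidence interval alongside the p-value shows how large the improvement plausibly is, not only whether it differs from zero.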

Code & Resources

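library(ggplot2)

# Simulate pre- and post-training scores for 100 employees
data_1 <- data.frame(
  employee_id = 1:100,
  pre_training_score = rnorm(100, 50, 10),
  post_training_score = rnorm(100, 75, 10)
)

# Paired t-test on each employee's pre- and post-training scores
t.test(data_1$pre_training_score, data_1$post_training_score, paired = TRUE)

# Reshape to long format for plotting. The original reshaping step is not shown;
# this base R version is one assumed way to build data_1_long.
data_1_long <- data.frame(
  employee_id = rep(data_1$employee_id, times = 2),
  Training_Phase = rep(c("pre_training_score", "post_training_score"), each = nrow(data_1)),
  Score = c(data_1$pre_training_score, data_1$post_training_score)
)

# Fix the factor order so the pre-training box is drawn first
data_1_long$Training_Phase <- factor(data_1_long$Training_Phase, levels = c("pre_training_score", "post_training_score"))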

ggplot(data_1_long, aes(x = Training_Phase, y = Score, fill = Training_Phase)) +
  geom_boxplot() +
  labs(y="Score", x="Training Phase", title="Distribution of Pre vs Post Training Scores", fill="Training Phase") +
  theme_minimal() +
  scale_fill_manual(values = c("pre_training_score" = "#F8766D", "post_training_score" = "#00BFC4")) +
  scale_x_discrete(labels = c("pre_training_score" = "Pre-training", "post_training_score" = "Post-training"))