Description:

  • This project analyzes marketing data from two test groups: advertisements (ads) and public service announcements (PSAs).
  • Total number of ads or PSAs the customer has seen. (total ads)
  • The day on which the customer saw the most ads or PSAs. (most ads day)
  • The hour of the day in which the customer saw the most ads or PSAs. (most ad hours)
  • Whether the customer bought the product. (converted)


Questions to ask:

  • Which variables are associated with the customer's conversion rate?
  • Which hypothesis test or visualization is appropriate for analyzing each variable?
  • How should the findings be interpreted?


Breakdown:

    1. Preprocessing

    • I started by cleaning the data: checking for nulls, empty values and duplicates, and removing outliers with the IQR method.
    • After cleaning, I separated the continuous variables from the categorical ones.
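
As an illustration of the IQR step, here is a minimal sketch using only the Python standard library. The function names, the conventional multiplier k = 1.5, and the sample values are assumptions for illustration; they are not taken from the project's code or data.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Made-up ad counts, for illustration only
total_ads = [1, 2, 3, 4, 5, 6, 7, 8, 9, 200]
cleaned = remove_outliers(total_ads)
```

With these made-up values the extreme count of 200 falls outside the upper fence and is dropped, while the other nine values are kept.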

    2. Visualization

    • I plotted each continuous variable as a histogram, which helped me understand the overall shape of its distribution.
    • I also plotted the categorical variables as pie charts, in percentages, to compare conversion rates across categories.

    3. Chi-Square Test for Independence

    Purpose:
    • The Chi-Square test of independence is used to determine whether there is an association between two categorical variables.
    • Residuals show which categories deviate from the counts expected under independence, while Cramér's V measures the strength of the association.
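
To make these three quantities concrete, here is a dependency-free sketch that computes the chi-square statistic, the Pearson residuals and Cramér's V from a contingency table. The function name and the counts are hypothetical. It does not compute a p-value, which needs the chi-square distribution (a library routine such as scipy.stats.chi2_contingency provides one).

```python
from math import sqrt


def chi_square_summary(table):
    """table: list of rows of observed counts (a contingency table).

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Pearson residual: (observed - expected) / sqrt(expected)
    residuals = [
        [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

    # The chi-square statistic is the sum of the squared Pearson residuals
    chi2 = sum(r * r for row in residuals for r in row)
    dof = (len(table) - 1) * (len(table[0]) - 1)

    # Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
    k = min(len(table), len(table[0]))
    cramers_v = sqrt(chi2 / (n * (k - 1)))
    return chi2, dof, cramers_v, residuals


# Made-up counts: rows = two days, columns = (not converted, converted)
observed = [[90, 10], [80, 20]]
chi2, dof, v, residuals = chi_square_summary(observed)
```

A positive residual marks a cell with more observations than independence would predict; a negative one marks fewer. Cramér's V ranges from 0 (no association) to 1 (perfect association).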

    Interpretations (day of the week):
    • There is an association between the day on which users see the most ads and their likelihood of converting. However, Cramér's V suggests that this association is relatively weak.
    • Users who saw the most ads on Monday tend to convert more, while Saturday has the lowest conversion.

    Feature engineering:
    • Next, I engineered a feature that groups the hour of the day into "Morning", "Afternoon", "Evening" and "Night".
    • The same test was then applied to this feature to determine the association between time of day and conversion rate.

    Interpretations (time of day):
    • There is a statistically significant relationship between time slots and conversion rates, although the strength of this association is weak.
    • The Afternoon time slot has the highest conversion rate, suggesting it may be the best time to reach users.
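
The time-slot feature described above could be sketched as follows. The four labels come from this write-up, but the hour boundaries are assumptions chosen for illustration, since the exact cut-offs are not stated here.

```python
def time_slot(hour):
    """Map an hour of the day (0-23) to a coarse time slot.

    The cut-offs below are illustrative assumptions.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 22:
        return "Evening"
    return "Night"  # 22:00 to 05:59


slots = [time_slot(h) for h in (3, 9, 14, 20, 23)]
```

The resulting categorical column can then be cross-tabulated against conversion and passed to the same Chi-Square test.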

    4. A/B Test for Equality of Proportions

    Purpose:
    • The purpose of this test is to evaluate the efficacy of the two test groups (a categorical variable).
    • The test groups consist of advertisements and public service announcements. I want to determine which group converts better.
    • This is where the Z-test for two proportions is applied.
    Explanation:
    • Before running the Z-test, I need to check how many samples are required for the test to have adequate statistical power.
    • This is where NormalIndPower() comes into play.
    • With too small a sample the test is underpowered, which raises the risk of a Type II error (missing a real difference).
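
For intuition about what such a power calculation involves, here is a standard-library sketch of the usual normal-approximation formula for the per-group sample size, using Cohen's h as the effect size. This is a simplified stand-in and not the NormalIndPower implementation, which solves a closely related equation numerically; the conversion rates, alpha and power below are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Effect size for two proportions (arcsine transformation)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided test with equal group sizes."""
    h = cohens_h(p1, p2)
    if h == 0:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return ceil(((z_alpha + z_power) / h) ** 2)


# Hypothetical conversion rates for the two groups
n_per_group = sample_size_per_group(0.025, 0.018)
```

The smaller the difference between the two rates, the larger the sample needed to detect it at the same power.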
    Interpretations (Z-test):
    • A positive Z-statistic means that the first group (success_ad) has a higher conversion proportion than the second group (success_psa).
    • The result indicates a statistically significant difference in conversion proportions between the ad group and the PSA group; a difference this large is unlikely to be due to random chance alone.
    • However, we also have to look at the odds ratio to gauge the relative likelihood of converting.
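
The pooled two-proportion Z-test can be sketched with the standard library as below. The names success_ad and success_psa follow this write-up; n_ad, n_psa and all counts are made up for illustration and are not the project's data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts, for illustration only
success_ad, n_ad = 120, 4000
success_psa, n_psa = 80, 4000
z, p_value = two_proportion_ztest(success_ad, n_ad, success_psa, n_psa)
```

The sign of z depends only on the order of the arguments: swapping the two groups flips the sign and leaves the p-value unchanged.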
    Interpretations (odds ratio):
    • There is a statistically significant difference in conversion rates between the ad and PSA groups.
    • The conversion rate in the ad group is slightly higher than in the PSA group. Because odds rise whenever the proportion rises, the ad group also has the higher odds of converting.
    • An odds ratio below 1 therefore reflects only the direction of the ratio (PSA odds divided by ad odds): it means users in the PSA group have lower odds of converting than users in the ad group, not a better relative chance.
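
Since the direction of an odds ratio depends on which group is placed in the numerator, a small sketch may help when reading it. The function name and counts are hypothetical.

```python
def odds_ratio(success_a, n_a, success_b, n_b):
    """Odds of converting in group A divided by the odds in group B."""
    odds_a = success_a / (n_a - success_a)
    odds_b = success_b / (n_b - success_b)
    return odds_a / odds_b


# Made-up counts: the ad group converts at 3%, the PSA group at 2%
or_ad_vs_psa = odds_ratio(120, 4000, 80, 4000)
or_psa_vs_ad = odds_ratio(80, 4000, 120, 4000)
```

The two ratios are reciprocals of each other. The group with the higher conversion proportion always has the higher odds, so a value below 1 means the numerator group converts less often.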

    5. Continuous Variable Hypothesis Tests

    Explanation:
    • Before choosing a test, Levene's test and the Shapiro-Wilk test are used to check the homogeneity of variance and the normality of the data.
    • Normality and equal variances both hold: independent t-test
    • Normality holds but variances are unequal: Welch's t-test
    • Both assumptions fail: Mann-Whitney U test
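
The decision rule above can be written as a small helper. This sketch takes the p-values as inputs (in practice they would come from routines such as scipy.stats.shapiro and scipy.stats.levene); the function name and the 0.05 threshold are assumptions. It also assumes that the Mann-Whitney U test is the fallback whenever normality fails, even if the variances are equal, a case the list above does not cover.

```python
def choose_test(p_shapiro_a, p_shapiro_b, p_levene, alpha=0.05):
    """Pick a two-sample test from the assumption-check p-values.

    A p-value above alpha means the assumption is not rejected.
    """
    normal = p_shapiro_a > alpha and p_shapiro_b > alpha
    equal_var = p_levene > alpha
    if normal and equal_var:
        return "independent t-test"
    if normal:
        return "Welch's t-test"
    # Assumption: any failure of normality falls back to the rank-based test
    return "Mann-Whitney U test"


choice = choose_test(p_shapiro_a=0.001, p_shapiro_b=0.002, p_levene=0.01)
```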
    Interpretation:
    • Since the assumptions were not met, the Mann-Whitney U test is used to check for a significant difference in total ads seen between customers who converted and those who did not.
    • An effect size is then computed to gauge the direction and strength of the relationship.
    • Strong inverse relationship between conversion and total ads: users who did not convert saw more ads than those who converted.
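
To show how a U statistic and a signed effect size are obtained, here is a dependency-free sketch. The effect size used is the rank-biserial correlation, which is an assumption, since the write-up does not name the measure. The data are made up, and no p-value is computed (a library routine such as scipy.stats.mannwhitneyu provides one).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, rank-biserial correlation)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    # r = 1 when every a exceeds every b, r = -1 in the opposite case
    r = 2 * u_a / (len(a) * len(b)) - 1
    return u_a, r


# Made-up total-ads counts, for illustration only
converted = [5, 8, 12, 12, 20]
not_converted = [15, 22, 30, 41]
u, r = mann_whitney_u(converted, not_converted)
```

A negative r here means the first sample (converted) tends to have the smaller values, which is the direction described in the interpretation above.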

    6. Conclusion

    • Ads vs. PSAs: ads showed a marginally higher conversion rate than PSAs, and the difference is statistically significant. A higher conversion rate also means higher odds of converting, so the odds ratio points in the same direction as the raw proportions.
    • Day and time of exposure are significantly but only weakly associated with conversion, so strategically timed marketing campaigns may yield modest gains in user engagement.
    • The total ads test indicates that seeing more ads is associated with lower conversion. Reducing ad exposure while focusing on message quality could improve overall effectiveness, although this association alone does not establish cause.