Queen's University MMAI Capstone Team Project

HelloFresh – AI-Driven Visitor Intent Prediction

Optimizing Conversion Rates Through Clickstream Data Analysis & Machine Learning

Date: July 20th, 2022

Overview & Business Challenge:

For any online business, and subscription services in particular, maximizing the conversion rate of website/app visitors into paying customers is crucial for ROI and sustainable growth. A common challenge is understanding and mitigating visitor abandonment before a desired action (e.g., subscribing) is completed. This project aimed to address this challenge by developing predictive models that use clickstream data to estimate, in real time, a user’s likelihood to convert, enabling targeted interventions that improve conversion rates and reduce wasted marketing spend.

Approach & AI/ML Methodology:

  • Data Foundation: Analyzed 2.5 million rows of raw clickstream data (from BigQuery) representing 26,000 unique customer journeys on a leading subscription service platform.
  • Extensive Data Engineering: Cleaned the data and engineered features that transform raw user “hits” into meaningful behavioral signals. This included:
    • Indexing unique customer journeys.
    • Dimensionality reduction for high-cardinality features (e.g., “screen_name”).
    • Creating aggregated statistical features (for shallow models) and sequential features (for LSTM) representing user behavior patterns.
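
The data engineering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's pipeline: each hit is assumed to be a dict with hypothetical fields `visitor_id`, `visit_id`, `hit_number`, and `screen_name`; one (visitor_id, visit_id) pair is treated as one journey; cardinality is reduced by keeping the `top_k` most frequent screen names and bucketing the rest as `OTHER`; and sequences are left-padded to 55 steps (the BiLSTM's hit-step count) with 0 reserved for padding. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical field names; the project's actual BigQuery schema is not shown.
JOURNEY_KEY = ("visitor_id", "visit_id")


def index_journeys(hits):
    """Group hits into journeys keyed by a dense integer id (0, 1, 2, ...).

    Assumes one journey per (visitor_id, visit_id) pair and orders the hits
    inside each journey by hit_number.
    """
    ids = {}
    journeys = defaultdict(list)
    for hit in hits:
        key = tuple(hit[f] for f in JOURNEY_KEY)
        journey_id = ids.setdefault(key, len(ids))
        journeys[journey_id].append(hit)
    return {j: sorted(hs, key=lambda h: h["hit_number"]) for j, hs in journeys.items()}


def top_categories(values, top_k):
    """Return the top_k most frequent categories; all others count as rare."""
    return {v for v, _ in Counter(values).most_common(top_k)}


def bucket(value, keep, other="OTHER"):
    """Collapse a rare category into a single shared bucket."""
    return value if value in keep else other


def build_vocab(keep, other="OTHER"):
    """Map each kept category (plus the bucket) to an id; 0 is reserved for padding."""
    return {s: i for i, s in enumerate(sorted(keep) + [other], start=1)}


def aggregate_features(journey, keep):
    """Per-journey statistical features for a shallow model such as CatBoost."""
    screens = [bucket(h["screen_name"], keep) for h in journey]
    features = {"n_hits": len(journey), "n_unique_screens": len(set(screens))}
    for screen, count in Counter(screens).items():
        features[f"count_{screen}"] = count
    return features


def sequence_features(journey, vocab, keep, max_len=55):
    """Integer-encoded screen sequence for a sequence model such as a BiLSTM.

    Keeps the most recent max_len hits and left-pads shorter journeys with 0.
    """
    encoded = [vocab[bucket(h["screen_name"], keep)] for h in journey][-max_len:]
    return [0] * (max_len - len(encoded)) + encoded


# Toy usage with made-up hits.
hits = [
    {"visitor_id": "u1", "visit_id": 1, "hit_number": 2, "screen_name": "menu"},
    {"visitor_id": "u1", "visit_id": 1, "hit_number": 1, "screen_name": "home"},
    {"visitor_id": "u2", "visit_id": 7, "hit_number": 1, "screen_name": "home"},
    {"visitor_id": "u1", "visit_id": 1, "hit_number": 3, "screen_name": "faq"},
    {"visitor_id": "u2", "visit_id": 7, "hit_number": 2, "screen_name": "menu"},
    {"visitor_id": "u3", "visit_id": 2, "hit_number": 1, "screen_name": "home"},
]
journeys = index_journeys(hits)
keep = top_categories([h["screen_name"] for h in hits], top_k=2)  # {"home", "menu"}
vocab = build_vocab(keep)  # {"home": 1, "menu": 2, "OTHER": 3}
print(aggregate_features(journeys[0], keep))
print(sequence_features(journeys[0], vocab, keep, max_len=5))  # [0, 0, 1, 2, 3]
```

Keeping the most recent hits and padding at the front mirrors the default `padding="pre"` and `truncating="pre"` behavior of Keras' `pad_sequences`; the summary does not say which end of a journey the project truncated. In a real pipeline, the `top_k` frequencies would be computed on training data only so that the bucketing does not leak information from the test set.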

Predictive Modeling (Dual Track):

  • Shallow Machine Learning (Adjusted Statistical Behavior Predictive Method – ASBPM):
    • Utilized ensemble models such as CatBoost on the engineered statistical features.
    • Achieved a strong ROC AUC of 0.8715 in predicting conversion intent, using features designed to avoid data leakage.
  • Deep Learning (LSTM Method):
    • Developed a Bidirectional LSTM (BiLSTM) model to capture the nuances of sequential user behavior.
    • With SMOTE used to address class imbalance, the BiLSTM model (using 55 hit steps) achieved an F1-macro score of 0.87 and a ROC AUC of 0.875; earlier tests with 50 steps reached a ROC AUC of 0.905.
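
SMOTE balances the classes by synthesizing new minority-class examples instead of duplicating existing ones. The sketch below shows the core interpolation step in plain Python, assuming numeric feature vectors. It is illustrative only: the function name and parameters are hypothetical, and in practice a library implementation such as imbalanced-learn's `SMOTE(...).fit_resample(X, y)` is the usual choice. The summary does not say which implementation the project used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class points with SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        # (Euclidean distance; math.dist requires Python 3.8+).
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage: 4 minority vs. 10 majority samples -> generate 6 synthetic points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_majority = 10
new_points = smote(minority, n_synthetic=n_majority - len(minority), k=2, seed=42)
print(len(minority) + len(new_points))  # 10, now balanced with the majority class
```

Oversampling should be applied to the training split only; synthesizing points before the train/test split lets information from test examples leak into training and inflates metrics such as ROC AUC.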

Rigorous Model Evaluation: Assessed models using ROC AUC, Precision-Recall AUC, and F1-macro scores; the latter two are more sensitive than plain accuracy to performance on the minority class of an imbalanced dataset.
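
For reference, the three evaluation metrics can be computed from labels and predicted conversion probabilities as follows. This is a dependency-free sketch assuming binary labels (1 = converted) and a hypothetical 0.5 decision threshold for F1. Normally a library such as scikit-learn (`roc_auc_score`, `average_precision_score`, `f1_score(..., average="macro")`) would be used. Average precision is one common estimator of the area under the precision-recall curve.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half). O(n_pos * n_neg): fine for a sketch only."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """PR AUC estimated as average precision: mean precision at the rank of
    each positive, ranking by descending score (assumes distinct scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    found, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            found += 1
            total += found / rank
    return total / n_pos


def f1_macro(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy usage (1 = converted); the 0.5 decision threshold is an assumption.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p))  # 0.75
print(average_precision(y, p))
print(f1_macro(y, [int(s >= 0.5) for s in p]))
```

Because F1-macro averages the per-class scores without weighting by class size, a model that ignores the rarer class is penalized even if its overall accuracy looks high.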

Key Outcomes & Business Impact:

  • High-Accuracy Intent Prediction: Developed models that accurately predicted a user’s likelihood to convert or abandon their journey.
  • Real-Time Intervention Capability: The models provide probabilities of abandonment, enabling businesses to:
    • Identify at-risk users in real time.
    • Deploy targeted interventions (e.g., personalized chatbot offers, support outreach) at optimal points in the user journey (e.g., the “Checkout” page, which the LSTM identified as an optimal intervention point for non-converting users).
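
One way abandonment probabilities could drive real-time interventions is a simple threshold policy evaluated after each new hit. The sketch assumes a hypothetical `InterventionPolicy` class, a hypothetical threshold of 0.7, a hypothetical action name, and at most one intervention per journey. The summary does not specify how interventions were triggered; in practice the threshold would be tuned against the cost of each intervention.

```python
from dataclasses import dataclass, field


@dataclass
class InterventionPolicy:
    """Hypothetical threshold rule for acting on per-hit abandonment scores."""

    threshold: float = 0.7  # hypothetical; tune against intervention cost
    watched_pages: frozenset = frozenset({"Checkout"})
    action: str = "chatbot_offer"  # hypothetical intervention name
    _fired: set = field(default_factory=set, init=False, repr=False)

    def decide(self, journey_id, page, p_abandon):
        """Return the action to take after the latest hit, or None.

        Fires at most once per journey, and only on a watched page when the
        model's abandonment probability reaches the threshold.
        """
        if journey_id in self._fired:
            return None
        if page in self.watched_pages and p_abandon >= self.threshold:
            self._fired.add(journey_id)
            return self.action
        return None


# Toy stream of (journey_id, page, abandonment probability) per hit.
policy = InterventionPolicy()
stream = [(1, "Menu", 0.9), (1, "Checkout", 0.5), (1, "Checkout", 0.8),
          (1, "Checkout", 0.95), (2, "Checkout", 0.72)]
print([policy.decide(*event) for event in stream])
# [None, None, 'chatbot_offer', None, 'chatbot_offer']
```

Firing only once per journey keeps a user from being shown the same offer on every hit after the probability first crosses the threshold.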

Significant Projected Revenue Uplift:

  • Projections based on the CatBoost model indicated a potential 8% revenue increase from targeted interventions.
  • Projections based on the LSTM model indicated a potential 12% revenue increase, suggesting the value of more complex sequential modeling.
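
The summary does not describe how the 8% and 12% projections were calculated. Purely as an illustration of how such a projection can be framed, the sketch below estimates uplift from how many flagged users are true abandoners (precision), what share of them an intervention recovers, and the cost per intervention. The function name and every input are hypothetical, and this is not the project's method.

```python
def projected_uplift(n_flagged, precision, recovery_rate,
                     revenue_per_conversion, baseline_revenue,
                     cost_per_intervention=0.0):
    """Illustrative uplift estimate as a fraction of baseline revenue.

    - precision: share of flagged users who would really have abandoned
    - recovery_rate: share of those abandoners an intervention converts
    Every input is an assumption; this is not the project's calculation.
    """
    true_abandoners = n_flagged * precision
    recovered = true_abandoners * recovery_rate
    incremental = recovered * revenue_per_conversion - n_flagged * cost_per_intervention
    return incremental / baseline_revenue


# Made-up inputs: 1,000 flagged users, 80% precision, 5% recovered,
# 60 per conversion, 100,000 baseline revenue, 0.50 per intervention.
print(round(projected_uplift(1000, 0.8, 0.05, 60, 100_000, 0.5), 4))  # 0.019
```

Under this framing, a more accurate model raises the projection because a larger share of flagged users are genuine abandoners who can be recovered, while the intervention cost stays the same.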

Optimized Marketing Efficiency:

Enables more efficient allocation of marketing resources by focusing retention efforts on the users most likely to abandon.

Key AI Concepts & Technologies Applied:

  • Machine Learning: CatBoost, Random Forest, Ensemble Methods.
  • Deep Learning: Bidirectional LSTM (BiLSTM) for Sequence Classification.
  • Clickstream Data Analysis: Processing large-scale user interaction data.
  • Advanced Feature Engineering: For both statistical and sequential modeling.
  • Big Data Technologies: Google BigQuery.
  • Predictive Modeling: Forecasting user conversion intent.
  • Data Imbalance Techniques: SMOTE.