Queen's University MMAI "DEEP LEARNING" Team Project

Mailman – AI-Powered Email Classification

A Deep Learning Solution for Enhanced Productivity

Date: March 13, 2022

Overview:

Mailman is an AI-driven productivity tool designed to automatically classify incoming emails into designated folders using deep learning and Natural Language Processing (NLP). Conceptualized as an Outlook add-in compatible with desktop and Office 365, its primary goal is to mitigate email overload and enhance organizational efficiency within enterprise environments.

The Business Problem:

Despite the availability of other communication tools, email remains a critical but often overwhelming channel for enterprises. Congested inboxes cost time, reduce productivity, and make essential information harder to find. Existing email sorting tools often lack the user-friendliness or reliability needed to manage high email volumes, and manual inbox management is estimated to consume around 80 hours per employee annually.

The Solution & AI Approach:

The core technical process involved:

  • Dataset: Utilized the Enron Email Dataset, a substantial corpus of real-world emails.
  • Data Preprocessing: Cleaned and prepared email text (subject and message content) through tokenization, lemmatization, and removal of stop words, punctuation, and digits. Explored email metadata like senders, recipients, and subjects.
  • Word Embeddings:
    • Experimented with Word2Vec (CBOW architecture) and pre-trained GloVe embeddings to convert text into numerical vectors, capturing semantic meaning.
    • Fine-tuned GloVe embeddings on the Enron dataset for improved contextual relevance.
  • Clustering & Feature Engineering (Hybrid Approach):
    • Initially used K-means clustering on GloVe word embeddings to identify thematic groups representing potential email categories/departments.
    • Engineered features by counting, for each email, the words associated with each derived department, effectively creating labeled data for the subsequent supervised classification task. The department keyword lists were augmented with synonyms identified by cosine distance over the full set of GloVe embeddings.
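
The keyword-based labeling step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the department names, the seed keywords, the 0.9 similarity threshold, the tiny stop-word list, and the 3-dimensional toy vectors standing in for GloVe are all hypothetical. Lemmatization is omitted because it would require an external library, and picking the department with the most keyword hits is only one plausible way to turn counts into a label.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"the", "a", "an", "to", "of", "for", "and", "is", "on", "in", "please", "by"}

def preprocess(text):
    """Lowercase, replace punctuation and digits with spaces, tokenize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_keywords(seeds, vectors, threshold=0.9):
    """Add every vocabulary word whose vector is close to at least one seed word."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(s in vectors and cosine(vec, vectors[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded

def label_email(tokens, dept_keywords):
    """Return the department with the most keyword hits, or None if there are none."""
    counts = Counter(tokens)
    scores = {d: sum(counts[k] for k in kws) for d, kws in dept_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Made-up 3-d vectors standing in for GloVe embeddings.
VECTORS = {
    "contract": [0.9, 0.1, 0.0],
    "agreement": [0.85, 0.15, 0.05],
    "invoice": [0.1, 0.9, 0.0],
    "payment": [0.15, 0.85, 0.1],
    "meeting": [0.0, 0.1, 0.9],
}
# Hypothetical departments and seed keywords.
SEEDS = {"legal": {"contract"}, "finance": {"invoice"}}
DEPT_KEYWORDS = {d: expand_keywords(s, VECTORS) for d, s in SEEDS.items()}

tokens = preprocess("Please send the payment for invoice 4471 by Friday.")
print(tokens)                              # cleaned tokens
print(label_email(tokens, DEPT_KEYWORDS))  # derived department label
```

With the toy vectors above, "agreement" joins the legal keyword list and "payment" joins the finance list, so the sample email is labeled as finance even though only "invoice" was a seed word.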

Deep Learning Classification:

  • Evaluated both LSTM (Long Short-Term Memory) and Convolutional Neural Network (CNN) architectures for multi-class email classification.
  • Conducted extensive hyperparameter tuning (e.g., convolution layers, kernel size, learning rate, dropout rates, batch size, embedding dimensions, padding length) to optimize model performance.
  • The final, best-performing model was a CNN architecture using GloVe embeddings fine-tuned on the Enron dataset.
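
To make hyperparameters such as kernel size, padding length, and embedding dimension concrete, here is a didactic forward pass of a very small text CNN in plain Python. It is a sketch only: the 2-dimensional embeddings, the hand-set filter and dense weights, the padding length of 5, and the two class names are all invented for illustration, and none of it reflects the trained Mailman model, which would be built and trained in a deep learning framework.

```python
import math

# Hypothetical 2-d embeddings; "<pad>" is the padding vector.
EMBED = {
    "<pad>": [0.0, 0.0],
    "invoice": [1.0, 0.0],
    "payment": [0.8, 0.2],
    "contract": [0.0, 1.0],
    "send": [0.1, 0.1],
}
PAD_LEN = 5  # padding length (assumed)

def embed(tokens):
    """Pad or truncate to PAD_LEN, then look up each token's vector."""
    padded = (tokens + ["<pad>"] * PAD_LEN)[:PAD_LEN]
    return [EMBED.get(t, EMBED["<pad>"]) for t in padded]

def conv1d_relu(seq, kernel, bias=0.0):
    """Slide one filter of shape (kernel_size, embed_dim) along the sequence, then apply ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(seq) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(kernel[j], seq[i + j]))
        out.append(max(0.0, s))
    return out

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(tokens, filters, dense_w, dense_b):
    """Embedding lookup -> convolution -> global max pooling -> dense softmax."""
    seq = embed(tokens)
    pooled = [max(conv1d_relu(seq, f)) for f in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)

# Two hand-set filters with kernel size 2: the first responds to
# embedding dimension 0, the second to embedding dimension 1.
FILTERS = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
DENSE_W = [[2.0, -2.0], [-2.0, 2.0]]  # one row of weights per class
DENSE_B = [0.0, 0.0]
CLASSES = ["finance", "legal"]        # hypothetical class names

probs = forward(["send", "payment", "invoice"], FILTERS, DENSE_W, DENSE_B)
print(CLASSES[probs.index(max(probs))])
```

Each filter produces one activation per window of two consecutive tokens, and global max pooling keeps only the strongest activation, which is why the position of a keyword within the email does not matter to the final prediction.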

Model Performance & Projected Impact:

  • The optimized CNN model achieved 91.14% accuracy on the test set in classifying emails into their respective department folders.
  • The confusion matrix indicated strong performance in accurately classifying emails across all defined categories.
  • Automated classification through Mailman is projected to save approximately 80 hours per employee annually, reducing cognitive load, minimizing distractions, and enabling employees to focus on critical, value-adding tasks.
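
For readers unfamiliar with the two evaluation measures mentioned above, the following sketch shows how test-set accuracy and a confusion matrix are computed from true and predicted labels. The labels and predictions here are invented toy data with hypothetical department names; they are not the project's results.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data for illustration only.
LABELS = ["finance", "legal", "hr"]
y_true = ["finance", "finance", "legal", "legal", "hr", "hr"]
y_pred = ["finance", "legal", "legal", "legal", "hr", "finance"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
for label, row in zip(LABELS, matrix):
    print(f"{label:>8}: {row}")
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")
```

The diagonal of the matrix holds the correctly classified emails per department, so a strong model shows large diagonal counts and small off-diagonal counts.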

Deployment Plan:

Envisioned using TensorFlow Serving for scalable and maintainable model deployment, with TF Lite as a possible option for mobile applications.

TensorFlow Serving is a high-performance model-serving system that makes it easy to maintain and update a model over time in a production environment.
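
As a rough sketch of what calling a model hosted on TensorFlow Serving could look like, the snippet below builds a request for its REST predict endpoint using only the Python standard library. The model name "mailman", the host and port, and the assumption that the exported model accepts a padded list of token IDs are all hypothetical; the actual input signature would depend on how the model is exported.

```python
import json
import urllib.request

def build_predict_request(token_ids, host="localhost", port=8501, model="mailman"):
    """Build a POST request for the TensorFlow Serving REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    # The REST API expects a JSON body with a list of input instances.
    body = json.dumps({"instances": [token_ids]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(token_ids, **kwargs):
    """Send the request and return the class scores for the single instance.

    Requires a running TensorFlow Serving instance at the given address.
    """
    request = build_predict_request(token_ids, **kwargs)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.loads(resp.read())["predictions"][0]

# Hypothetical padded token IDs for one email.
req = build_predict_request([12, 845, 3, 0, 0])
print(req.full_url)
```

Because clients address the model by name through a stable URL, a retrained model version can be swapped in on the server without changing client code, which is the maintainability benefit described above.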

Continuous Training: Acknowledged the necessity of continuous model training (e.g., online learning) to adapt to data drift, concept drift (new business cases or terminology), and evolving user behavior.

Key AI Concepts & Technologies Applied:

  • Deep Learning (CNN, LSTM)
  • Natural Language Processing (NLP)
  • Word Embeddings (Word2Vec, GloVe)
  • K-means Clustering (for initial topic discovery and feature engineering)
  • Feature Engineering from Unstructured Text
  • Supervised Learning (Multi-class Classification)
  • TensorFlow (planned for deployment using TensorFlow Serving and TF Lite for mobile)