Hi, welcome to my project portfolio!

My name is Wouter and I am a Machine Learning Engineer and LLM enthusiast. At work I build production ML and LLM systems for large-scale business problems. Outside of that, I pick up projects out of genuine curiosity or practical need. This site documents both, with a focus on what was actually hard to build and why the architecture ended up the way it did. Feel free to reach me at wouterwijffels@gmail.com or via LinkedIn.

Background

My educational background is in industrial engineering and logistics optimization, where I learned to think about systems and how to improve them mathematically. My career started at the intersection of workflow automation, analytics, and data science. After a few years, my interest shifted toward machine learning and language modelling, which is where most of my focus sits today. Outside of work I go to open source events, industry conferences, and meetups in the fields I follow.

Projects

Seven years of projects, from my first educational data science projects to ML systems now serving millions of customers. Dates mark when a project started; most are still running or have evolved since. Impact figures for production projects are measured at scale; for MVP and hobby projects, numbers are early estimates or personal assessments. Click any card for the full business context, solution design, and technical challenges.

Production · MVP / impact uncertain · Hobby

Customer Intent Engine

Live in internal beta with CS Analytics and Customer Research

AI agent that answers analytical questions about customer service calls and survey data. A FastMCP server exposes per-topic tools, and AWS AgentCore provides memory for campaign and brand context. Outputs are aggregated-only by design for GDPR compliance.

AWS AgentCore, FastMCP, AWS Bedrock, Snowflake, MLflow, CloudWatch
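The card does not say how aggregated-only outputs are enforced. One common pattern, sketched here as an assumption rather than the system's actual implementation, is for every tool to return group counts only and to drop groups below a minimum size. In the real system each such function would sit behind a per-topic FastMCP tool; the function name, record fields and the threshold of 10 below are hypothetical.

```python
from collections import Counter

# Hypothetical minimum group size; the real threshold is not stated on the card.
MIN_GROUP_SIZE = 10


def aggregate_counts(records, key, min_group_size=MIN_GROUP_SIZE):
    """Return counts per value of `key`, never individual records.

    Groups smaller than `min_group_size` are dropped from the output so a
    small group cannot be traced back to individual customers.
    """
    counts = Counter(record[key] for record in records)
    return {value: n for value, n in counts.items() if n >= min_group_size}


# Usage with hypothetical call-intent records.
calls = [{"intent": "billing"}] * 12 + [{"intent": "cancellation"}] * 3
print(aggregate_counts(calls, "intent"))  # {'billing': 12}
```

Dropping small groups trades completeness for privacy; whether to fold them into an "other" bucket instead is a design choice that depends on how many small groups a query can produce.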

Content Personalization Engine

80% time reduction in content writing · 1 A/B test, still being evaluated

Snowflake Cortex reads marketing automation segments and generates a persona profile per group. AWS Bedrock then produces personalized campaign copy for each segment. All output is human-reviewed before go-live.

Snowflake Cortex, AWS Bedrock, AWS Lambda, Marketing Automation

2026

Customer Service Conversation Wrap-Up Automation Pipeline

10% reduction in call handling time · 1.3M conversations/year

Bedrock extracts 5 summary texts and 10 classification features per conversation, with PII masked before any LLM call. Validated output auto-files the CRM wrap-up and feeds downstream analytics. The pipeline processes 1.3M conversations per year and has cut agent handling time by 10%.

AWS Bedrock, Lambda, Snowflake, MLflow, Python
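The two guard rails on this card, masking PII before the LLM call and validating the output before it is filed, can be illustrated with a minimal sketch. It assumes regex-based masking and a simple schema check; the patterns, the `summary` and `contact_reason` fields and the allowed labels are all hypothetical, and a production masker would need much broader coverage (names, addresses, locale-specific IDs).

```python
import re

# Hypothetical patterns. Order matters: IBAN runs before PHONE so the digits
# inside an IBAN are not partially masked as a phone number.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}

# Hypothetical label set for one of the classification features.
ALLOWED_REASONS = {"billing", "moving", "outage", "contract", "other"}


def mask_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder before any LLM call."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def validate_output(output: dict) -> bool:
    """Accept LLM output only if the summary is non-empty and the label is known."""
    summary = output.get("summary")
    return (
        isinstance(summary, str)
        and summary.strip() != ""
        and output.get("contact_reason") in ALLOWED_REASONS
    )


print(mask_pii("Call me at +31 6 1234 5678 or mail jan@example.com"))
# Call me at [PHONE] or mail [EMAIL]
```

Output that fails validation would be kept out of the CRM rather than auto-filed; how rejected conversations are handled is not described on the card.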

Model Performance Monitoring & Alerting System

>20 models · >400 pipelines · 12 data scientists

Central MLflow tracking server on SageMaker with an S3 artifact store, provisioned via Terraform. SageMaker Pipelines evaluation steps compare each run against a hard threshold and a benchmark run; Alertmanager deduplicates and throttles alerts before publishing them to SNS. Microsoft Teams channels subscribe to the SNS topic directly for production alerts.

Terraform, MLflow, S3, AMP, SNS, GitLab, SageMaker Pipelines
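The two checks in the evaluation step, an absolute hard threshold and a comparison against a benchmark run, can be expressed as a small gate function. This is a minimal sketch under assumptions: the 5% relative tolerance, the function and field names, and the `higher_is_better` switch are hypothetical, and deduplication and throttling are left to Alertmanager as the card describes.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    reasons: list[str]


def evaluate_run(metric: float, hard_threshold: float, benchmark: float,
                 max_relative_drop: float = 0.05,
                 higher_is_better: bool = True) -> EvalResult:
    """Gate a new run on an absolute floor and on a benchmark comparison."""
    sign = 1 if higher_is_better else -1
    reasons = []
    # Check 1: the metric must not be on the wrong side of the hard threshold.
    if sign * (metric - hard_threshold) < 0:
        reasons.append(f"metric {metric:.3f} breaches hard threshold {hard_threshold:.3f}")
    # Check 2: relative change versus the benchmark run, oriented so negative means worse.
    rel_change = sign * (metric - benchmark) / abs(benchmark)
    if rel_change < -max_relative_drop:
        reasons.append(f"metric is {-rel_change:.1%} worse than benchmark {benchmark:.3f}")
    return EvalResult(passed=not reasons, reasons=reasons)


# Usage: an AUC-style metric (higher is better) that fails both checks.
result = evaluate_run(0.70, hard_threshold=0.75, benchmark=0.80)
for reason in result.reasons:
    print(reason)
```

Keeping the gate a pure function makes it easy to unit-test; the pipeline step would only need to raise an alert when `passed` is false.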

Procurement Benchmark Webapp

80% time saving · testing with 1 buyer, not yet evaluated

Industrial PO line items are clustered by semantic similarity into a three-level hierarchical taxonomy. Buyers search by part description or upload a supplier invoice for automatic line-by-line price comparison. Hybrid BM25 and dense vector search with cross-encoder reranking surfaces the benchmarks of the matching clusters.

Sentence Transformers, BERTopic, OpenAI API, ChromaDB, Streamlit
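The card does not say how the BM25 and dense results are combined. A minimal sketch, assuming reciprocal rank fusion (RRF) as the combination step and plain whitespace tokenisation: the BM25 part is implemented directly, while the dense ranking is a hypothetical stub standing in for an embedding search (for example Sentence Transformers vectors queried in ChromaDB). The cross-encoder rerank of the fused top candidates is not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised doc for a tokenised query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings (lists of doc ids, best first) into one fused ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


# Usage with hypothetical PO line descriptions.
docs = ["hex bolt m8 stainless", "hex nut m8 galvanised", "ball bearing 6204"]
tokenised = [d.split() for d in docs]
bm25 = bm25_scores("stainless bolt m8".split(), tokenised)
bm25_rank = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)
dense_rank = [0, 2, 1]  # hypothetical stub for the embedding-search ranking
fused = reciprocal_rank_fusion([bm25_rank, dense_rank])
```

RRF only needs ranks, not comparable scores, which is why it is a common default for mixing lexical and vector retrieval; a weighted score blend would be the main alternative.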

2025

A/B Test Evaluation and Uplift Modelling Pipeline

~35% uplift improvement · monthly in 5 countries

Historical A/B experiment data trains an uplift model that predicts the incremental campaign effect per customer. Results feed a Tableau dashboard that marketers use to target the segments most likely to respond. Retention rates have increased by 35% since the system went live.

SageMaker Pipelines, Snowflake, Tableau
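Uplift is the difference in response between treated and control customers, which is what makes it "incremental". The card does not name the model, so this is only the simplest version of the idea: a per-segment estimate computed directly from A/B data. The row format and segment names are hypothetical; a per-customer uplift model (for example a two-model approach) generalises this by predicting both response rates from customer features.

```python
from collections import defaultdict


def segment_uplift(rows):
    """Estimate uplift per segment as treated minus control response rate.

    rows: iterable of (segment, treated: bool, responded: bool)
    """
    # counts[segment][treated] = [responders, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in rows:
        cell = counts[segment][treated]
        cell[0] += int(responded)
        cell[1] += 1
    uplift = {}
    for segment, arms in counts.items():
        (resp_t, n_t), (resp_c, n_c) = arms[True], arms[False]
        if n_t and n_c:  # an effect needs both a treated and a control group
            uplift[segment] = resp_t / n_t - resp_c / n_c
    return uplift


# Usage with hypothetical experiment results.
rows = (
    [("young", True, True)] * 30 + [("young", True, False)] * 70
    + [("young", False, True)] * 10 + [("young", False, False)] * 90
    + [("loyal", True, True)] * 20 + [("loyal", True, False)] * 80
    + [("loyal", False, True)] * 20 + [("loyal", False, False)] * 80
)
u = segment_uplift(rows)
targets = sorted(u, key=u.get, reverse=True)  # segments ranked by uplift
```

Here the "loyal" segment responds just as often without the campaign, so targeting it adds cost but no retention; that distinction is what an uplift model exposes and a plain response model hides.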

Agentic Apartment Rental Scraper

Used to find my own apartment

Started as a Hugging Face agents course project and grew into a real tool I used to find my apartment. A smolagents agent orchestrates scrapers across rental sites, checks for new listings every morning, and can be queried over WhatsApp.

smolagents, Selenium, Python, WhatsApp API

2024

International Churn Model Deployment

40x better than random selection · monthly in 5 countries

A single XGBoost churn model is deployed via SageMaker and GitLab CI/CD to 5 countries, each with its own data warehouse and marketing system. Its churn predictions are 40x better than random selection and 8x better than the best benchmark. The project established the ML platform integration pattern now used by 20+ models.

AWS SageMaker, GitLab CI/CD, Snowflake, XGBoost, MLflow

Commodity & Gas Index Mapping

NLP pipeline maps public commodity and gas price indices to the raw material components in supplier contracts. The lag and magnitude with which supplier price requests follow index movements are measured and surfaced. Used as a verification tool during active supplier negotiations.

Python, NLP, Snowflake, Public Gas & Commodities APIs
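Measuring lag and magnitude can be framed as a lagged correlation between an index and a supplier's price series. A minimal sketch, assuming both series are already aligned on the same period grid and expressed as period-over-period changes; the function names, the 6-period search window, and the choice of Pearson correlation for the lag and an OLS slope as the "magnitude" (pass-through) are assumptions, not the pipeline's documented method. The NLP mapping step is not shown.

```python
def _corr_and_slope(x, y):
    """Pearson correlation of x and y, and the OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5, cov / vx


def estimate_lag(index_changes, price_changes, max_lag=6):
    """Find the lag at which supplier price changes best track index changes.

    Returns (lag, correlation, slope): a lag of k compares price_changes[t]
    with index_changes[t - k]; the slope is the pass-through magnitude.
    """
    assert len(index_changes) == len(price_changes), "series must be aligned"
    best = None
    for lag in range(max_lag + 1):
        x = index_changes[: len(index_changes) - lag]
        y = price_changes[lag:]
        corr, slope = _corr_and_slope(x, y)
        if best is None or corr > best[1]:
            best = (lag, corr, slope)
    return best


# Usage with synthetic data: prices pass on half of each index move, 3 periods later.
index = [0.0, 1.0, 0.0, 2.0, -1.0, 3.0, 0.0, 1.0, -2.0, 2.0, 1.0, 0.0]
prices = [0.0] * 3 + [0.5 * v for v in index[:-3]]
lag, corr, slope = estimate_lag(index, prices, max_lag=5)
```

With short contract histories the correlation at neighbouring lags can be close, so in negotiations the estimate is better read as supporting evidence than as a precise figure.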

2023

Supplier Offer Assessment Tool

>90% time reduction · >2000 negotiations/yr

Supplier price offers are captured via structured Excel templates, competitor prices are web-scraped into Snowflake, and margin analysis is surfaced per supplier, product, and category in Google Sheets. Buyers use the benchmark during live negotiations to assess whether an offer is competitive.

Snowflake, Google Sheets, Google Apps Script

2022

Time Series & Regression Supply Forecasting Model

8.5/10 research grade · company invested to productionize internationally

Exponential smoothing combined with regression on COVID-19 signals forecasts brewery supply demand under uncertainty. The model achieved 15% lower MAPE than incumbent projections and was presented to the Chief Procurement Officer. Extended to glass and carton supply categories after initial validation.

Time Series, Regression, Our World in Data
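The card says exponential smoothing was combined with regression on COVID-19 signals but not how. One simple combination, sketched under that assumption: regress demand on a single COVID-19 indicator (for example a case-rate series from Our World in Data), then apply simple exponential smoothing to the regression residuals so the forecast also tracks recent drift. The single regressor, `alpha = 0.3` and the function names are hypothetical.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def ses_level(series, alpha=0.3):
    """Final level of simple exponential smoothing over `series`."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast_next(demand, covid_signal, covid_next, alpha=0.3):
    """Regression on the COVID-19 signal plus exponentially smoothed residuals."""
    intercept, slope = fit_ols(covid_signal, demand)
    residuals = [d - (intercept + slope * c) for d, c in zip(demand, covid_signal)]
    return intercept + slope * covid_next + ses_level(residuals, alpha)


# Usage with synthetic data where demand drops 2 units per unit of the signal.
covid = [0, 5, 10, 20, 15, 8]
demand = [100 - 2 * c for c in covid]
next_period = forecast_next(demand, covid, covid_next=10)
```

The regression term carries the external shock while the smoothed residual absorbs slow shifts the signal does not explain; comparing forecasts like this against incumbent projections on MAPE is how the improvement on the card would be measured.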

2021

Warehouse Stock Scraper

Saves 20 min/day for 5 operations teams · status uncertain

A Python script parses HTML stock update emails from the logistics partner and writes structured Excel output. It replaced 20 minutes of daily manual copy-pasting for operations teams in 4 countries and eliminated the transcription errors that came with copying HTML tables by hand.

Python, BeautifulSoup, Gmail API, openpyxl
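The original script used BeautifulSoup and the Gmail API. To keep the example dependency-free, this sketch swaps in Python's standard-library `html.parser` for the table-parsing step only, and assumes the stock update arrives as a plain `<table>` with one product per row. The class and function names are hypothetical, and email retrieval is omitted.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_stock_table(html: str) -> list[list[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_stock_table(
    "<table><tr><th>SKU</th><th>Qty</th></tr>"
    "<tr><td>A-100</td><td> 42 </td></tr></table>"
)
```

The resulting rows map directly onto openpyxl: append each one to the active worksheet of a `Workbook()` with `ws.append(row)` and write the file with `wb.save(...)`.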

2020

DataQuest.io: Data Science Foundations

A series of guided projects covering the core ML workflow: data cleaning and EDA, feature engineering, hyperparameter tuning, regularization, regression, and tree-based models. The capstone was optical character recognition on the USPS handwritten digit dataset using a neural network built from scratch in PyTorch.

Python, Pandas, Scikit-learn, PyTorch

2019