About

Hey there! I am a Staff Machine Learning Engineer @ Meta. I have 13 years of industry experience and have delivered more than 20 innovative data projects and products. I am also the founder of uniframe.io, an open-source String-Matching-as-a-Service product.

I was a freelancer between November 2020 and July 2022, during which I delivered data consulting and training. I am not taking on consulting projects for now.

Summary

Latest Professional Experience

  • 2022.08 – present, Staff Machine Learning Engineer, Meta
  • 2021.01 – present, Founder, uniframe.io
  • 2020.11 – 2022.07, Lead Machine Learning Engineer (Contractor), Nike
  • 2018.10 – 2020.10, Data Science and Data Engineering Consultant, Amazon Web Services
  • 2015.11 – 2018.09, Senior Data Scientist, ING Wholesale Banking Advanced Analytics Team

Project Experience

  • [Banking][Prod] Next Best Action Model Migration and Productization (ML Ops on AWS)
  • [Banking][Prod] Personally Identifiable Information Detection (NLP, classification, Spark, Airflow)
  • [Banking][Prod] Company Similarity Detection (graph network embedding, similarity computation, Airflow)
  • [Banking][Prod] Large Scale Fuzzy Entity Matching (Spark, NLP, similarity computation, Airflow)
  • [Banking][Prod] Customer Segment Leads Detection (classification, Spark, name matching)
  • [Banking][Prod] Mortgage Arrears Repayment Classification (imbalance learning, Random Forest)
  • [Banking] Company Sales Prediction (time series forecasting, Seq2Seq)
  • [Banking][Prod] Company Financial Insight Dashboard (data pipeline, Spark, front-end, back-end, name matching, dashboarding)
  • [Banking] Data Lake Proof of Concept (AWS data lake, S3, Glue, Kafka, DynamoDB, Elasticsearch)
  • [Manufacturing][Prod] Instagram Influencer Sales and Trend Prediction (classification, computer vision)
  • [Manufacturing] Pill Image Classification on IoT Device (IoT, image classification)
  • [Energy][Prod] Dirty Cars and Smoking Person Detection in Petrol Station (IoT, image classification)
  • [Energy] Broken Device Detection on Electricity Transmission Network (object detection, multi-GPU)
  • [Energy][Prod] Customer Churn Prediction Model Data Pipeline (Spark, S3, Glue, SageMaker, AWS security)
  • [Retail][Prod] Advanced Analytics Platform: managed JupyterHub and Airflow on Kubernetes (AWS EKS, Helm, Spark, Airflow, DevOps)
  • [Telecom] Real-time Streaming Data Ingestion and Personalized Recommendation PoC (Kafka, Glue, S3, DynamoDB, Lambda, Amazon Personalize)
  • [Telecom][Prod] Data Science Model Platform (SageMaker, Glue, Step Functions, Lambda, API Gateway)
  • [HR] Job Category Classification from 500K Job Descriptions (NLP classification, multi-class multi-label)
  • [Environment] Predictive Maintenance on Dike Sensor Data (time series clustering, Dynamic Time Warping)
  • [Lottery][Prod] Data Platform (AWS S3, Redshift, DMS, Glue, CloudWatch)
  • [Media] Oracle Database Event-driven Automated Migration (AWS DMS, Lambda, CloudFormation)

Open Source Contributions

  • Fast way to get top n results from sparse matrix multiplication, co-author (link), ~15,000 daily downloads, 285 stars
  • Factorization Machine on Spark, co-author (link), 210 stars
  • Time Series Generator, co-author (link), 66 stars
  • Industry Code Embedding, co-author (link)
  • Time Series Forecasting in PyTorch, author (link)
  • Amazon SageMaker Examples, contributor (link)

Blogs

  • Boosting the Selection of the Most Similar Entities in Large Scale Datasets (link)
  • Industry2vec: A Novel Approach to Get Industry Code Embedding (link)
  • Accurately Labeling Subjective Question-Answer Content Using BERT (link)
  • Some AWS SageMaker related blogs: (link), (link)

Talks (selected)

  • Machine Learning Industrialization, AWS User Group the Netherlands, Nov 2019 (link)
  • ML Ops and ML Industrialization, GoDataFest 2019, Amsterdam, Oct 2019 (link)
  • Peer Detection in Massive Payment Transaction Network, Open Data Science Conference, London, August 2018 (link)
  • Large Scale Fuzzy Name Matching with a Custom ML Pipeline in Batch and Streaming, Spark+AI Summit 2018, San Francisco, June 2018 (link)

Kaggle Competitions

  • Gold medal (6th of 1,571 teams): Google Quest Q&A Labelling, 2020 (link)
  • Silver medal (43rd of 924 teams): Give Me Some Credit, 2010 (link)

Publication

  • Classification System for Mortgage Arrear Management, Proceedings of IEEE Computational Intelligence for Financial Engineering and Economics 2014 (link)

Certifications