Hi there! 👋🏼

I'm Tri Watanasuparp, a data scientist specializing in machine learning, NLP, and building intelligent systems that turn complex data into actionable insights.

Currently pursuing an M.S. in Data Science at Northeastern University

watanasuparp.t@northeastern.edu

experience
Lightcast
Sep 2025 – Present
Data Analyst (Applied NLP)

• Built and optimized NLP pipelines to classify Thai-language job titles, skills, and occupations across large-scale labor market datasets
• Designed rule-based and statistical parsing systems using Python, SQL, and RegEx, improving classifier precision and recall by ~20%
• Addressed challenges in Thai text processing (tokenization, normalization, and ambiguity) to improve model accuracy and consistency
• Partnered with engineering, product, and linguistics teams to deliver scalable, production-ready language processing solutions

Cognex Corporation
Jan 2023 – June 2023
IT Business Applications Co-op

• Analyzed enterprise-scale Salesforce and SAP datasets to identify inefficiencies and recommend data-driven process improvements
• Translated business requirements into technical workflows, improving system usability and reporting clarity
• Led documentation, tracking, and delivery planning for global initiatives, improving execution efficiency and cross-team communication

Silicon Valley Bank
Jan 2022 – June 2022
Digital Marketing & Analytics Co-op

• Designed and analyzed A/B tests for email campaigns, improving click-through rates (CTR) by ~10–15%
• Built dashboards in Google Analytics to track conversion rates, engagement metrics, and user behavior
• Delivered actionable insights that informed data-driven marketing strategy and campaign optimization

projects
TasteMatch CA: Hybrid Recommender System (NLP + RAG)
  • Built a hybrid restaurant recommender system combining collaborative filtering (ALS), content-based filtering (TF-IDF), and Retrieval-Augmented Generation (RAG)
  • Developed an NLP-based query parser to extract user intent (cuisine, price, location)
  • Improved recommendation relevance through feature engineering
  • Evaluated performance using Precision@5 (0.82), NDCG (0.83), and MRR (0.85)
  • Deployed an interactive Streamlit application for real-time recommendations
Python, NLP, RAG, TF-IDF, ALS, ChromaDB, Streamlit

MassWeatherHub: Real-Time Weather Data Dashboard
  • Built a real-time data pipeline integrating multiple weather APIs (Open-Meteo, OpenWeather)
  • Processed and merged geospatial datasets (TIGER/Line Shapefiles)
  • Developed interactive dashboards and maps using Flask and Folium
  • Improved data processing efficiency by ~30% through optimized transformations
  • Enabled exploration of real-time and historical weather data
Python, Flask, REST APIs, Folium, Pandas, Geospatial Data
Exoplanet Habitability Prediction (Machine Learning)
  • Developed ML models (SVM, Random Forest, Logistic Regression) for habitability prediction
  • Achieved an F1-score of up to ~0.78 through feature engineering and tuning
  • Performed cross-validation and hyperparameter optimization
  • Addressed class imbalance and multicollinearity
  • Built visualizations to communicate insights
Python, Scikit-learn, Pandas, NumPy, SVM, Random Forest, EDA
skills
Core
  • Python
  • SQL (MySQL)
  • R
  • Machine Learning
  • Statistical Inference
  • A/B Testing
NLP / AI
  • Natural Language Processing
  • RegEx
  • NLTK
  • Embeddings
  • RAG
  • LLMs
Data & Visualization
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Tableau
  • EDA
Tools & Systems
  • PyTorch
  • Apache Spark
  • Flask
  • Streamlit
  • dbt
  • AWS
  • Git