I am a data scientist with strong skills in software engineering, natural language
processing, and applied machine learning. Born and raised in Texas to Brazilian
immigrants, I'm a recent computer science graduate, having completed my Master's
degree at the University of Massachusetts Amherst in May 2026 and before that my
undergraduate degree at Texas A&M University in May 2024. Outside of computer and
data work, my interests include music, running, and reading.
I've recently completed my Master's degree and accepted a full-time position as a
data scientist at Walmart Global Tech! I will be relocating to the San
Francisco Bay Area and will be based there starting in July. If you're interested in learning more about my
work, feel free to take a look at my
resume (last updated June 2026). If you would like to
connect with me, please feel free to reach out.
I'd love to talk!
Experience
Data Scientist III | Walmart Global Tech
Jul '26 - Present | Sunnyvale, CA
Incoming data scientist at Walmart Global Tech.
Data Scientist Intern | Walmart Global Tech
Jun '25 - Aug '25 | Sunnyvale, CA
Built a scalable attribute value extraction (AVE) pipeline for
Walmart product titles, improving missing Google Merchant Center
attribute coverage to support a projected 3% increase in organic
CTR and $10M growth in GMV.
Created a 20M+ sample AVE training dataset across thousands of
product attributes using weak supervision, public e-commerce
datasets, and internal Walmart product data.
Fine-tuned and benchmarked transformer models for product attribute
retrieval, improving F1 over the production baseline by up to 25%
and achieving an F1 score of 84% on a held-out test set.
Skills: Python (Transformers, PyTorch, Pandas, Sklearn),
PySpark, Google Cloud Platform (GCP), SQL, Git/GitHub, Natural
Language Processing (NLP), Deep Learning
Software Engineering Intern | Microsoft
May '24 - Aug '24 | Redmond, WA
Developed a pipeline for a summary statistics page for Windows
Autopilot, saving an estimated 40 hours per month among on-call
Microsoft engineers and Intune admins.
Gained expertise in privacy engineering and threat modeling by
leading team discussions during privacy and security reviews for
a new Microsoft feature.
Skills: C#, Scala, TypeScript, Microsoft Power BI, Azure,
Git/GitHub, Software Engineering, Data Engineering
Software Engineering Intern | Microsoft
May '23 - Aug '23 | Redmond, WA
Developed a pipeline for an updated setting recommendation system
in collaboration with the machine learning team on Intune,
increasing scope to over 10k settings.
Implemented a setting recommendation feature as a domain-specific
task for Microsoft Security Copilot, enabling administrators to
query and receive personalized setting recommendations through
natural language prompts.
Skills: C#, Scala, TypeScript, Large Language Models (LLMs),
Git/GitHub, Software Engineering, Data Engineering
Peer Teacher | Department of Computer Science & Engineering, Texas A&M University
Tutored over 50 students per semester in computer science courses,
with a 95% approval rate.
Uploaded weekly review sessions to
YouTube, netting over 8,000 views
since January 2022 across all sections I've worked with (4,000
views coming from the first year alone), and increasing the review
viewership of CSCE 314 by 150% from Spring 2022 to Fall 2022.
Skills: C++, Python, Java, Data Structures, Computing
Systems, Software Engineering
Education
University of Massachusetts Amherst
Master of Science in Computer Science | Awarded May '26
GPA: 3.818/4.000
Concentration: Data Science
Selected Coursework: Algorithms in Data Science,
Computational Biology & Bioinformatics, Decarbonization and Data
Science, Fixing Social Media, Information Retrieval, Machine Learning,
Natural Language Processing, Research Methods in Empirical Computer
Science, Simulation Methods, Systems for Data Science
Texas A&M University
Bachelor of Science in Computer Science | Awarded May '24
GPA: 3.825/4.000
Minors: Mathematics, Statistics
Selected Coursework: Artificial Intelligence, Bayesian
Statistics, Computer Security, Data Analytics in Cybersecurity,
Databases, Information Retrieval, Machine Learning, Software
Engineering, Statistical Computing, Statistics
Projects
Note: Due to academic honor codes or other confidentiality agreements, the
code for some projects cannot be made public. If you would like to learn more about
a certain project, please send me an email and provide context to request access.
SciEncoder: A Tiny Domain-Specific Encoder
Oct '25 - May '26 | Personal Project / UMass AI/ML Club | [link]
Designed a compact 54M-parameter BERT-style scientific encoder
pre-trained on ~1.5B tokens of multi-domain scientific text,
creating a lightweight alternative with 50% fewer parameters than
BERT Base.
Led a team of five through their first end-to-end data science
project, assigning work in data preprocessing and benchmarking, and
teaching concepts in NLP, computer science research, and GitHub.
Skills: Python, PyTorch, Hugging Face Transformers
Identifying Connections in the New York Times Connections [IConnNYC]
Oct '24 - May '25 | Personal Project / UMass AI/ML Club | [link]
Developed a multi-model inference pipeline combining word2vec,
a web-based word search API, and a knowledge graph, with
LLM-based semantic reasoning for answer validation and re-ranking.
Published a daily self-updating dataset of previous NYT Connections
games to Kaggle and Hugging Face, using APIs to automate data
collection from over 500 games. The dataset accumulated over 100
downloads across platforms in its first month of publication.
Achieved a 40% improvement in solve accuracy and 25% improvement
in first-round accuracy over baseline LLMs (GPT-4o) by
leveraging structured knowledge sources and LLM inferences.
Won 1st place out of 30+ teams at the UMass Machine Learning Club
project showcase.
Mar '25 - May '25 | COMPSCI 685: Advanced Natural Language Processing | [link]
Developed an autonomous agent using Proximal Policy Optimization
(PPO) integrated with a Greedy Best-First Graph Search strategy
to play Semantle, a semantic word-guessing game.
Curated a Semantle-specific vocabulary dataset of 1,200 entries and
nearly 10,000 unique words using API data mining and Google N-grams
for frequency-based filtering.
Outperformed the GPT-4o baseline with an 8.5x increase in games solved
within 200 guesses, achieving a 94% solve rate compared to 11% for
the LLM baseline.
Served as project manager for a partnership between the Aggie Data
Science Club and General Motors, expanding upon transportation
research from the Federal Highway Administration's National Household
Travel Survey (NHTS) and identifying applications aligned with GM's
mission statement.
Led a team of 40 students in building skills in applying data
science to research, including studies of California traffic flow
and commuter trends across NHTS surveys.
Jan '24 - May '24 | CSCE 482: Senior Capstone | [link]
Worked on the frontend in a team of 5 to develop an Android-based
cookbook application for a senior capstone course.
Conducted user acceptance tests with students, ensuring ease of
use and accessibility by identifying user interaction challenges
and sharing results with the team on a biweekly basis.
Served as scrum master in a group of 6 to develop a web
application for Maroon Health, a student organization that gives
medical students volunteer opportunities providing health care to
Houston's homeless population.
Successfully developed a web application using the Ruby on Rails
framework with a PostgreSQL backend.
Exceeded customer expectations by improving efficiency and accuracy
of volunteer sign-ups and scheduling.
Dec '22 - May '24 | TAMU Album of the Week | [link]
Developed a website to centralize activities for a student
organization with approximately 60 active members per semester.
Automated club tasks for members and officers, saving officers an
estimated 90 hours of manual work every year and reducing errors
committed by members by approximately 80%.
Managed a partnership between the Aggie Data Science Club and
American Airlines to predict the number of checked-in bags for flights,
helping reduce costs through improved resource allocation.
Led a team of over 50 students throughout a semester, enabling
novice students to gain hands-on experience in machine learning
techniques.
Created an XGBoost regression model to estimate the number of
checked-in bags per flight, achieving an R² score of
0.841 and a mean absolute error of 10.99.
Summarized an Apple Music user's listening habits and preferences
by analyzing play history spreadsheets to surface additional user
statistics and insights.
Presented these listening-history statistics in a clean dashboard
akin to Spotify Wrapped.
Served as a project leader for undergraduate students on projects
involving AI and machine learning, including the AI Connections
Solver in Spring 2025 and SciEncoder in Spring 2026.
Created challenge problems for a data science-oriented hackathon
over two years, reaching over 300 competitors.
Designed a sketch-recognition challenge for TAMU Datathon 2023
built around sketches of Texas A&M symbols, drawing over 40
competitors.
Produced code notebooks and led workshops on data analytics and
web scraping to help attendees gain skills and find
solutions.
Centralized organization activities (such as album nomination,
voting, and rating) and gave the organization greater presence
through the continuous development of the club's website and
Discord bots.
Built the club's website on and off over a span of 8 months, and
have continued to moderate it and fix bugs since July 2023.
Assumed data science responsibilities in the club, including developing an album recommendation tool based on
member ratings (with an RMSE of 1.34), creating infographics of song recommendations posted by members in 2023 and 2024,
and analyzing and plotting club statistics for end-of-year celebrations.
The analyses for end-of-year celebrations can be viewed here: Spring 2024, Fall 2023, Spring 2023.
Aggie Data Science Club
Projects Officer, Treasurer | Dec '21 - May '24 | [link]
As a member of the original officer team at the club's founding,
served as treasurer from December 2021 to May 2023 and as projects
officer from May 2023 to May 2024.
Delegated financial obligations and defined the treasurer role,
handling transactions for meetings, special events, and merchandise
orders.
Led a combined total of over 100 students in data science projects
across three semesters with over 90% positive feedback. Projects
include:
American Airlines Collaboration Project (Spring 2023)
Kaggle Walkthrough (Fall 2023)
General Motors Collaboration Project (Spring 2024)
Album of the Week Recommendation Tool (Spring 2024)
Resolved roadblocks and ensured student-led projects ran
smoothly, overseeing projects with over 250 members
combined in the Fall 2023 and Spring 2024 semesters.
Miscellaneous
I love discovering new music, particularly indie music! I was a highly
active member of Texas A&M's Album of the Week club and UMass Amherst's
WMUA 91.1 FM.
If you're interested in what I'm listening to, you can stalk me on Apple Music.
Running is a huge part of my identity! I'm currently training to run
my first full marathon in Sacramento this December. If you want to learn
more about my physical activity, you can stalk me on Strava.
One of my favorite hobbies (and biggest distractions) is going down
Wikipedia rabbit holes! Unless I'm off the grid, I'm reading something
on Wikipedia every single day. I also edit Wikipedia from time to
time.
In high school, I was a member of the Vandegrift HS Marching Band for four
years, and had the opportunity to perform at the Bands of America Grand
Nationals in Indianapolis in 2019, where the band won that year's national
championship! You can watch a video of the performance here.
I've been to all 50 states in the United States! The last state I visited
was Alaska in August 2023.
Alongside being a U.S. citizen 🇺🇸, I am also a citizen of Brazil 🇧🇷, and can
speak Portuguese at a professional level! I can also speak Spanish at an
intermediate level and French at a beginner level.