Open to roles

Muhammad Nuril Huda

AI engineer and data scientist. I build systems that hold up once real people start using them.

Two years of evaluating model output taught me to distrust a demo that works once. So I test behaviour first, then ship.

Bojonegoro, Indonesia

About

Vague problem in. Working system out.

I am an AI engineer and data scientist in Bojonegoro, East Java. Most jobs reach me as a sentence or two, and the real problem usually sits three layers under that sentence. Finding it is my favourite part of the work.

My route here was not a straight line. I started in computer vision, wrote a book about Java while I was still an undergraduate, and led a small team building a deepfake classifier. Then forecasting, Indonesian NLP, and LLM evaluation each showed up because a project needed them. My master's research at UGM paired plantar thermograms with temperature readings to catch diabetic foot ulcers earlier, and it remains the work I am most attached to.

Reviewing model output for Outlier and Alignerr changed how I build. Spend two years reading answers that are wrong and completely confident about it, and you stop trusting anything that only worked once in a demo. Now I write the evaluation before I write the feature, and I keep arithmetic out of the language model where I can.

These days I am building Aureum, an AI financial copilot, mostly on my own. I like messy data, requirements that are still forming, and users who tell me when something is broken.

Build

Four kinds of systems.

LLM & GenAI systems

Retrieval that returns the right passage, prompts that survive strange input, and evaluation that catches the model being confidently wrong before a user has to.

AI product engineering

The unglamorous half: APIs, queues, databases, containers, deployment. A model nobody can reach is just a research note.

Data science & forecasting

Pipelines, analysis, and forecasting. Data tends to arrive messy, inconsistent, and missing the columns you wanted, and my job is turning it into something you can decide with. Most of my work has been some version of this.

Vision & media AI

Thermal imaging, image enhancement, detection and tracking, speech, video pipelines. Some of it research, some of it shipped.

Stack

Technical capability, grouped by function.

AI Engineering & LLM Systems

Generative AI · Natural language processing (NLP) · Agentic AI & multi-agent systems · MCP (Model Context Protocol) · RAG · hybrid retrieval · reranking · Semantic search & embeddings · Vector databases · LangChain · LangGraph · Hugging Face Transformers · Fine-tuning (LoRA) · IndoBERTweet · Context & prompt engineering · Structured outputs & tool calling · LLM evaluation & LLM-as-judge · RLHF/SFT evaluation · Guardrails & AI safety · Prompt caching & cost optimization

Backend & AI Product Engineering

Python · FastAPI · REST APIs · API design · SQL · PostgreSQL · Relational databases · Caching · Asynchronous processing · Celery · Docker · Git · Backend architecture · Workflow automation

Full-Stack Web & TypeScript

TypeScript · Next.js (App Router) · React · Node.js · Tailwind CSS · Drizzle ORM · PostgreSQL · Server actions · OIDC / OAuth (Cognito) · PWA · Monorepo (pnpm) · Vitest

Data Engineering & Analytics

PySpark · ETL pipelines · Big data processing · Batch processing · Data modeling · Schema design · Data warehousing · BigQuery · Looker Studio · Data visualization · Dashboard design · Reporting automation

Data Science & Machine Learning

Python · R · SQL · Scikit-learn · NumPy · Pandas · Matplotlib · Seaborn · Jupyter · Exploratory data analysis (EDA) · Predictive modeling · Supervised & unsupervised learning · Linear & logistic regression · Ridge & lasso · Decision trees · Random Forest · SVM · k-NN · Naive Bayes · Gradient boosting · XGBoost · LightGBM · K-Means · DBSCAN · Hierarchical clustering · PCA · t-SNE · Ensembles (hard/soft voting, weighted voting, stacking, bagging, boosting) · Deep learning · Neural networks · ANN / MLP · Feature engineering & selection

Statistics & Time Series

Descriptive statistics · Distribution analysis · Correlation analysis · Stationarity testing (ADF) · Seasonal decomposition · ACF / PACF · Time-series forecasting · ARIMA · ARIMAX · SARIMA · SARIMAX · LSTM · GRU · Demand forecasting

Model Evaluation & Tuning

Cross-validation (k-fold, stratified) · GridSearchCV · RandomizedSearchCV · Hyperparameter tuning · Class imbalance (SMOTE, class weighting) · Regularization · Precision / recall / F1 · ROC-AUC · Confusion matrix · MAE / RMSE / MAPE · Model comparison & selection · Error analysis

Computer Vision & Media AI

TensorFlow · PyTorch · Keras · Transfer learning · OpenCV · CNN classification · Image classification · Object detection · Object tracking · Image enhancement · Medical image analysis · Thermal imaging · Deepfake detection · YOLOv11 · MediaPipe · ByteTrack · Speech recognition (ASR) · Whisper · FFmpeg · Image generation & editing · Stable Diffusion · ComfyUI · Automatic1111

Cloud & Deployment

AWS ECS · AWS S3 · AWS Lambda · Serverless architecture · DynamoDB · CloudFront · Self-hosted LLM gateway (9router) · CI/CD · GCP · BigQuery · Compute Engine · Vertex AI · Streamlit · Static hosting

AI-Accelerated Engineering

I built most of this site, including the chatbot in the corner, by pair-programming with AI agents. It is how a one-person team ships at this pace.

AI coding agents (Claude Code, Hermes) · Spec-driven development · AI code review & auditing · Eval-driven development · LLM training data quality · Human-in-the-loop workflows · Rapid AI prototyping

Product, Research & Collaboration

Business problem framing · Technical documentation · Stakeholder reporting · Async collaboration · Fast iteration · Startup leadership · Research writing · Team leadership

Work

Eight roles since 2021, and the systems that came out of them.

Jan 2026 — Present

Founder · Aureum

A conversational personal-finance copilot. It reads balances and market data, then talks through allocation and stock analysis without ever holding anyone's money. I designed the AI orchestration, the analytics, the market intelligence pipelines, and the infrastructure under all of it. The model was never the hard part. Making the numbers trustworthy enough to act on was, which is why the arithmetic lives in code and not in the prompt.

AI orchestration · Financial analytics · Market intelligence · Cloud infrastructure
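A minimal sketch of what keeping the arithmetic in code can look like. This is not Aureum's actual implementation: the function name `allocation_weights` and the balances are hypothetical, and the idea is only that the model is handed already-computed figures to talk about.

```python
from decimal import Decimal, ROUND_HALF_UP


def allocation_weights(holdings: dict[str, Decimal]) -> dict[str, Decimal]:
    """Return each holding's share of the total as a percentage with two decimals."""
    total = sum(holdings.values(), Decimal("0"))
    if total <= 0:
        raise ValueError("total portfolio value must be positive")
    cent = Decimal("0.01")
    return {
        name: (value / total * 100).quantize(cent, rounding=ROUND_HALF_UP)
        for name, value in holdings.items()
    }


# Hypothetical balances; a prompt would receive these computed figures as text.
weights = allocation_weights(
    {"cash": Decimal("2500"), "stocks": Decimal("6000"), "gold": Decimal("1500")}
)
print(weights)
```

Using `Decimal` instead of floats keeps the rounding explicit, so the percentages quoted in a conversation can be reproduced exactly.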

Mar 2026 — Jul 2026

AI Engineer · Ayclip

Long video in, short clips out. I built and deployed the containerised Python service behind it: face tracking with 9:16 auto-reframing, semantic highlight detection, speech-to-text with word-level subtitle sync, and async processing so one enormous upload never blocks the queue.

Python · FastAPI · Celery · YOLOv11 · MediaPipe · ByteTrack · Whisper · FFmpeg · Docker · AWS ECS/S3
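The 9:16 auto-reframing step comes down to choosing a crop window that follows a tracked face. The sketch below shows only that geometry, under stated assumptions: `portrait_crop` is a hypothetical helper, the face centre is assumed to come from an upstream tracker, and the width is rounded down to an even number because common 4:2:0 video encoders expect even dimensions. It is not the Ayclip service code.

```python
def portrait_crop(frame_w: int, frame_h: int, face_cx: float) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of a full-height 9:16 crop centred on face_cx."""
    # Widest 9:16 window that fits the frame height, rounded down to an even width.
    crop_w = min(frame_w, int(frame_h * 9 / 16) // 2 * 2)
    # Centre the window on the face, then clamp it so it stays inside the frame.
    x = round(face_cx - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))
    return x, 0, crop_w, frame_h


# A 1920x1080 frame with the face in the middle, then near each edge.
print(portrait_crop(1920, 1080, 960))
print(portrait_crop(1920, 1080, 50))
print(portrait_crop(1920, 1080, 1900))
```

In a real pipeline the face centre would usually be smoothed over time before cropping, so the window does not jitter from frame to frame.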

Oct 2024 — Present

AI Evaluation Consultant & Code Review Specialist · Alignerr

I write adversarial data science and machine learning problems used to test frontier models. Each one ships with a deterministic grader, private held-out fixtures, and a reference solution pinned to a fixed score anchor. A task only counts if the model fails all five scored attempts, so the problems have to resist shortcuts and survive review. Before this I reviewed Python and ML code for logic errors and edge cases.

Python · Benchmark design · Deterministic graders · Rubric evaluation · RLHF/SFT · QA workflows
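The acceptance rule described above, where a task counts only if every scored attempt fails, can be sketched as follows. The numeric-tolerance grader and the names `grade` and `task_counts` are assumptions for illustration; the real graders and fixtures are private and are not described here.

```python
import math


def grade(submission: float, reference: float, rel_tol: float = 1e-6) -> bool:
    """Deterministic check: same inputs always give the same pass/fail result."""
    return math.isclose(submission, reference, rel_tol=rel_tol)


def task_counts(attempts: list[float], reference: float, required_attempts: int = 5) -> bool:
    """A task is accepted only if none of the scored attempts passes the grader."""
    if len(attempts) != required_attempts:
        raise ValueError(f"expected {required_attempts} scored attempts")
    return not any(grade(a, reference) for a in attempts)


# Hypothetical model answers against a reference value of 0.125.
print(task_counts([1.0, 2.0, 3.0, 4.0, 5.0], reference=0.125))
print(task_counts([1.0, 2.0, 0.125, 4.0, 5.0], reference=0.125))
```

A single passing attempt is enough to reject the task, which is why the problems have to resist shortcuts.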

2022 — Present

Freelance Data Scientist & AI Engineer

Machine learning and analysis for individuals and small teams across finance, healthcare, and consumer work. Three that I can describe: gold price forecasting with Random Forest against regression baselines, compared on held-out error before delivery; an Indonesian sentiment system over comments from four beauty brands, weighing a curated lexicon against fine-tuned IndoBERTweet and few-shot LoRA, which flagged the posts where a competitor was pulling stronger positive sentiment; and a hospital booking assistant on FastAPI, Docker, and LangChain RAG that let patients pick a doctor and confirm a slot without an administrator in the loop. Client names stay private.

Python · scikit-learn · IndoBERTweet · LoRA · Hugging Face · LangChain · FastAPI · Docker

May 2024 — Oct 2025

AI Trainer & LLM Evaluator · Outlier

Scored RLHF and SFT output against task rubrics across general, technical, and computer science domains, and documented where the reasoning, the facts, or the instruction-following broke down.

RLHF · SFT · Rubric evaluation · Annotation QA

Jan 2024 — Jul 2024

Data Scientist Intern · Data Glacier

Travel demand forecasting on PySpark and GCP, joining flight records with taxi and weather data. I checked the series for seasonal structure before picking a model, using an Augmented Dickey-Fuller test, seasonal decomposition, and ACF/PACF. SARIMA and LSTM then cut held-out MAPE by 15%, and redesigning the schema and storage layout dropped measured pipeline runtime by 90%.

80M+ flight records processed

PySpark · GCP · SARIMA · LSTM
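Held-out MAPE only means something relative to a baseline. The sketch below computes MAPE and a seasonal-naive forecast in plain Python. The numbers are made up, and the seasonal-naive baseline is an assumption chosen for illustration, not the baseline or the SARIMA and LSTM models used in the project.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def seasonal_naive(history: list[float], season_length: int, horizon: int) -> list[float]:
    """Forecast by repeating the last full season of the history."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Made-up numbers: two seasons of four periods each, then four held-out periods.
history = [100, 120, 80, 60, 110, 130, 90, 70]
held_out = [100, 130, 100, 70]
baseline = seasonal_naive(history, season_length=4, horizon=len(held_out))
print(f"seasonal-naive MAPE: {mape(held_out, baseline):.2f}%")
```

A candidate model would be scored with the same `mape` call on the same held-out periods, so the two numbers are directly comparable.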

Jul 2023 — Dec 2023

Data Analyst Apprentice · GoTo Impact Foundation

Defined the learning metrics and wrote the SQL behind BigQuery and Looker Studio dashboards. Program owners used them to spot participants who were falling behind, find the strongest ones, and judge how classes were performing.

SQL · BigQuery · Looker Studio · Dashboard design

Aug 2021 — Feb 2022

AI Engineer Apprentice · Orbit Future Academy

Led a team of four building a deepfake classifier with MTCNN face extraction and InceptionResNetV1 transfer learning, plus the modular preprocessing and evaluation workflow around it.

TensorFlow · CNN · MTCNN · InceptionResNetV1 · Transfer learning

Open source

DeciSense

A Python assistant for the repetitive start of any data science job: dataset intake, validation, profiling, model recommendation, evaluation, and reporting. Deterministic checks run first and the language model only speaks once there is something real to explain.

Python · Data validation · Profiling · ML planning · LLM explanations
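The "deterministic checks first" ordering can be sketched like this. The function names, the row format, and the `llm` callable are all hypothetical and do not reflect DeciSense's actual API. The point is only that the model is never called unless the checks produced findings.

```python
from typing import Callable, Optional


def validate_rows(rows: list[dict], required_columns: list[str]) -> list[str]:
    """Run deterministic checks and return human-readable findings."""
    if not rows:
        return ["dataset is empty"]
    findings = []
    for column in required_columns:
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        if missing:
            findings.append(f"{column}: {missing} of {len(rows)} rows missing")
    return findings


def explain(findings: list[str], llm: Optional[Callable[[str], str]] = None) -> str:
    """Only hand findings to a language model when there is something to explain."""
    if not findings:
        return "No issues found."  # the model is never called on a clean dataset
    if llm is None:
        return "\n".join(findings)
    return llm("Explain these data issues in plain language:\n" + "\n".join(findings))


rows = [{"price": 10, "region": None}, {"price": 12, "region": "east"}]
print(explain(validate_rows(rows, ["price", "region"])))
```

Because the findings are plain strings produced by code, they can be tested and logged independently of whatever the model later says about them.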

Education

Computer science, twice, plus the clubs that taught me the rest.

Aug 2023 — Jan 2025

Master's degree, Computer Science

Universitas Gadjah Mada · GPA 3.79/4.00

HIMPASIKOM student association, talents and interests division

Aug 2018 — Dec 2022

Bachelor's degree, Informatics

Universitas Muhammadiyah Malang · GPA 3.84/4.00

Data Science Club · Google Developer Student Club

2015 — 2018

MAN Model Bojonegoro

Senior high school, where the programming habit started.

Certificates

Verifiable, and mostly still valid.

Professional Data Scientist · DataCamp · Oct 2025 — Oct 2027 · DS0024331192493

Machine Learning Engineer, expert level · DBS Foundation Coding Camp · Dec 2024 · DCC2024/PS/L3-ML-044

Google Project Management Specialization · Google · Coursera · Jul 2024 · GGCHUU5PFC48

Database Administrator · National Professional Certification Board (BNSP) · Sep 2022 — Sep 2025 · 62019 2521 0040139 2022

A few more sit on LinkedIn. I have not framed any of them yet.

Publication

One paper, one book.

SINTECHCOM · Feb 2025

Optimization of Plantar Foot Thermogram for Diabetic Foot Ulceration Early Detection: An Image Enhancement Approach

My master's research, peer reviewed. It combines plantar thermogram imaging with temperature data in a multi-input CNN and ANN model to catch ulceration risk earlier. I compared CLAHE, gamma correction, solarize, and posterize enhancement and measured what each one did to classification instead of assuming.

97.06% reported classification accuracy · DOI 10.59190/stc.v5i2.273
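Three of the enhancement methods named above are simple per-pixel transforms on 8-bit values, sketched below in plain Python. The parameters and conventions are assumptions: gamma is applied as an exponent on the normalised intensity, solarize inverts values at or above a threshold, and posterize keeps the top bits. The settings used in the paper are not stated here, and CLAHE is omitted because it works on local histograms, not single pixels.

```python
def gamma_correct(pixel: int, gamma: float) -> int:
    """Apply out = 255 * (in / 255) ** gamma to one 8-bit value."""
    return round(255 * (pixel / 255) ** gamma)


def solarize(pixel: int, threshold: int = 128) -> int:
    """Invert values at or above the threshold, leave darker values unchanged."""
    return 255 - pixel if pixel >= threshold else pixel


def posterize(pixel: int, bits: int) -> int:
    """Keep only the `bits` most significant bits of an 8-bit value."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    mask = (0xFF << (8 - bits)) & 0xFF
    return pixel & mask


# One row of made-up grayscale intensities passed through each transform.
row = [0, 64, 100, 200, 255]
print([gamma_correct(p, 0.5) for p in row])
print([solarize(p) for p in row])
print([posterize(p, 2) for p in row])
```

Measuring each transform's effect on classification then means training and evaluating the same model once per enhanced copy of the dataset.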

DIVA Press · Dec 2020

Java Itu Mudah

A beginner's guide to Java in Indonesian, written while I was still an undergraduate. ISBN 9786023918980.