AI Engineer & Data Scientist · Indonesia
Building practical AI systems from data, models, and real-world workflows.
I turn messy data and manual workflows into AI products that people actually use. My evaluation background means I test model behavior before I trust it.
About
I work where AI engineering, data science, and product execution meet. Give me an operational problem that arrives half-formed, and I'll turn it into a structured system: define the workflow, reason about the data, evaluate how the model behaves, then ship something people can actually use.
I've worked on LLM evaluation, AI products, forecasting pipelines, backend RAG applications, and medical computer vision research. The common thread: I care whether the system actually helps someone decide or act, not whether the demo looks good.
What I build
Practical AI workflows using RAG, prompt orchestration, structured outputs, and model evaluation patterns that prioritize correctness and usefulness.
Backend-driven AI applications with Python, FastAPI, Docker, relational workflows, caching, and cloud deployment.
Data pipelines, analytics workflows, and forecasting systems using Python, SQL, PySpark, SARIMA/LSTM, and stakeholder-ready reporting.
Research and project experience in thermal imaging, deep learning, image enhancement, face/deepfake detection, and video intelligence pipelines.
Featured work
AI products, LLM evaluation, data pipelines, and applied computer vision.
Founder · AI Product
Founded and built an AI financial copilot that lets users manage personal finances and weigh investment decisions through plain conversation. I designed the AI orchestration, financial analytics, market intelligence pipelines, and the underlying cloud infrastructure, keeping the product non-custodial and every recommendation transparent.
GenAI · Computer Vision · Media
AI-powered video clipping architecture that turns long-form video into short-form clips: YOLOv11 + MediaPipe + ByteTrack for face tracking and 9:16 auto-reframing, Claude-based semantic highlight detection, Whisper speech-to-text with word-level subtitle synchronization, and asynchronous processing on Docker and AWS ECS/S3.
LLM Evaluation
I evaluate AI-generated STEM, coding, ML, and data science reasoning. In practice that means reviewing code for logic errors and edge cases, writing structured feedback for LLM training workflows, and auditing annotation quality against rubrics.
RAG · Backend AI
A medical chatbot that took a manual appointment workflow and made it self-service. Under the hood: a LangChain RAG pipeline, document retrieval, multi-intent handling, and automated doctor assignment on a FastAPI backend.
AI Assistant · DS Tooling
Python assistant that automates data science workflows: dataset intake, validation, profiling, task inference, planning, modeling, evaluation, and reporting. Simple inspectable logic first, LLM assistance second.
Data Science · Forecasting
Forecasting pipelines that join taxi, flight, and weather data on PySpark and GCP. The work improved prediction accuracy by 15% with SARIMA and LSTM models and cut pipeline runtime by 90%.
Case-study details for each system are available on request.
Capability matrix
Most of this site, including the recruiter chatbot in the corner, was built pair-programming with AI agents. Working this way is how I ship faster than a one-person team usually can.
Experience
Jan 2026 — Present
Building an AI financial copilot end to end: AI orchestration, financial analytics, market intelligence pipelines, and cloud infrastructure.
Mar 2026 — Present
Designing AI video clipping architecture: CV pipelines, semantic highlight detection, ASR/subtitles, and asynchronous media processing on AWS.
Oct 2024 — Present
Evaluating AI-generated STEM/coding/ML solutions, reviewing code quality, and auditing annotation outputs for rubric adherence and reasoning quality.
May 2024 — Oct 2025
Reviewed and annotated AI responses for RLHF/SFT across general, technical, and CS domains; assessed instruction-following, truthfulness, and reasoning.
Jan 2024 — Jul 2024
Built PySpark/GCP pipelines integrating taxi, flight, and weather data; SARIMA & LSTM forecasting improved accuracy 15% and cut runtime 90%.
2023
Designed dashboards and automated reporting for student performance tracking; translated findings for non-technical stakeholders.
Aug 2021 — Feb 2022
Led a team of four building a deepfake detection system with CNN architectures (MTCNN, InceptionResNetV1), from preprocessing to evaluation.
Research & education
Master's thesis · Universitas Gadjah Mada
97.06% reported classification accuracy
Combined plantar thermogram imaging with temperature data to catch diabetic foot ulceration risk earlier. The work covered image enhancement, feature integration, and model evaluation.
Published in SINTECHCOM Journal, Feb 2025 · DOI 10.59190/stc.v5i2.273
Education
Universitas Gadjah Mada · GPA 3.79/4.00
Education
Universitas Muhammadiyah Malang · GPA 3.86/4.00
Book
DIVA Press, 2020 · ISBN 9786023918980
Contact
Interested in AI engineering, GenAI, LLM systems, data science, or applied AI product work? Reach me by email or LinkedIn. I'm open to roles and collaborations.
nurilhuda3333@gmail.com