Case Study 1
Nusantara - Digital Divide Dashboard (Award-Winning)
Favorite Champion, Data Slayer 3.0
Dec 2025 | Power BI | CRISP-DM | Policy Analytics | BPS Official Data
Problem statement: Digital poverty is more complex than internet access alone. Policymakers need a clear dashboard to identify priority provinces and choose intervention strategies.
Objectives
- Map digital inequality trends across Indonesia (2017-2024).
- Diagnose the relationship between poverty, internet access, and ICT skills.
- Support policy prioritization using hotspot identification and projections.
Dataset
- Official BPS indicators: internet access, device ownership, ICT skills, poverty, HDI, and IP-TIK.
- Multi-year, panel-style data (2017-2024), transformed into an analysis-ready model.
Methodology
- CRISP-DM pipeline: business understanding, data understanding, preparation, modeling, evaluation, deployment.
- ETL, normalization, star-schema modeling, and DAX measures in Power BI.
- Multi-level analytics framework: descriptive, diagnostic, predictive, prescriptive, exploratory.
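The normalization and hotspot steps can be illustrated with a small Python sketch. It makes several assumptions. Scaling is min-max. A hypothetical quadrant rule flags a province as a digital-exclusion hotspot when its normalized poverty rate is above 0.5 and its normalized internet access is below 0.5. The dashboard's actual normalization method and hotspot thresholds are not stated, and the province values are invented.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def flag_hotspots(rows, cutoff=0.5):
    """Return provinces with high normalized poverty and low normalized
    internet access (a hypothetical quadrant rule, not the project's)."""
    poverty = min_max([r["poverty"] for r in rows])
    access = min_max([r["internet_access"] for r in rows])
    return [
        r["province"]
        for r, p, a in zip(rows, poverty, access)
        if p > cutoff and a < cutoff
    ]


# Illustrative values only, not BPS figures.
provinces = [
    {"province": "A", "poverty": 26.0, "internet_access": 40.0},
    {"province": "B", "poverty": 6.0, "internet_access": 95.0},
    {"province": "C", "poverty": 20.0, "internet_access": 70.0},
    {"province": "D", "poverty": 9.0, "internet_access": 85.0},
]
print(flag_hotspots(provinces))  # → ['A']
```

In the dashboard itself, comparable logic would typically be written as DAX measures. Python is used here only to make the arithmetic explicit.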
Outcomes
- National internet access rose to 89.8% in 2024.
- Identified 11 provinces in the digital-exclusion hotspot category as intervention priorities.
- Produced a 2025-2027 outlook for poverty and IP-TIK to support policy planning.
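The projection method behind the 2025-2027 outlook is not specified. One simple possibility is an ordinary least-squares linear trend extended three years ahead. The sketch below shows that approach using invented yearly values, not official IP-TIK data.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def project(years, values, future_years):
    """Extend the fitted linear trend to future years."""
    intercept, slope = fit_linear_trend(years, values)
    return {yr: intercept + slope * yr for yr in future_years}


# Invented index values for 2017-2024 (not official data).
hist_years = list(range(2017, 2025))
hist_values = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
outlook = project(hist_years, hist_values, [2025, 2026, 2027])
for year, value in outlook.items():
    print(year, round(value, 2))
```

A linear trend ignores saturation. An indicator approaching 100%, such as internet access, would need a bounded model or a shorter horizon.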
Impact
- Strengthened evidence-based discussion of digital inclusion policy.
- Provided interactive scenario testing, not just static reporting.
Tech Stack
Power BI, DAX, Excel, CRISP-DM, Data Modeling
Case Study 2
Fatigue Detection Camera for Motorcycle Drivers
Computer Vision + Mobile AI
Feb 2025 - Jun 2025 | MediaPipe | GRU | Real-time Inference | Android
Problem statement: Rider fatigue often causes preventable motorcycle accidents. A lightweight, mobile-first AI detector is needed to issue real-time warnings.
Objectives
- Accurately detect drowsiness from facial movement patterns.
- Deliver real-time alerts on low- to mid-range phones.
- Keep inference efficient enough for practical use.
Dataset
- 6 subjects, 22 videos, 720p, day and evening sessions.
- Frame extraction produced 66,472 training, 16,835 validation, and 9,328 test frames.
- Labels: Alert (0) and Drowsy (1).
Methodology
- Feature extraction with MediaPipe Face Mesh.
- Derived EAR (Eye Aspect Ratio) and MAR (Mouth Aspect Ratio) features.
- GRU training configuration: sequence length 24, 49 epochs, Adam optimizer, early stopping, and a learning-rate scheduler.
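The two feature steps above can be sketched in Python. The first computes the widely used six-point eye aspect ratio, (|p2-p6| + |p3-p5|) / (2 |p1-p4|). The second slices per-frame features into 24-frame windows for the GRU. Several details are assumptions made for this sketch. The project's Face Mesh landmark indices are not listed. Reusing the same six-point formula for the mouth is a simplification. Labeling each window by its last frame is a hypothetical rule.

```python
import math

SEQ_LEN = 24  # sequence length reported for the GRU


def aspect_ratio(p):
    """Six-point aspect ratio: (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    p holds six (x, y) points ordered p1..p6, with p1 and p4 the
    horizontal corners. Using the same formula for the mouth is an
    assumption of this sketch.
    """
    d = math.dist  # Euclidean distance, Python 3.8+
    return (d(p[1], p[5]) + d(p[2], p[4])) / (2.0 * d(p[0], p[3]))


def make_sequences(features, labels, seq_len=SEQ_LEN, stride=1):
    """Slice per-frame feature vectors (e.g. [EAR, MAR]) into overlapping
    windows; each window takes the label of its last frame (hypothetical)."""
    windows, window_labels = [], []
    for end in range(seq_len, len(features) + 1, stride):
        windows.append(features[end - seq_len:end])
        window_labels.append(labels[end - 1])
    return windows, window_labels


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(aspect_ratio(open_eye), 3))  # → 0.667
```

A falling EAR across consecutive frames (eyes closing) is the kind of temporal pattern a GRU can pick up from these windows.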
Outcomes
- Test accuracy: 78%.
- Per-class precision, recall, and F1 of about 0.78 for both classes.
- Built an end-to-end inference workflow from camera frame to Drowsy/Alert output.
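The streaming part of the inference workflow can be sketched as a rolling buffer. The buffer keeps the latest 24 feature frames and emits a Drowsy/Alert label once it is full. The app itself runs on Android (Kotlin, TFLite). In this Python sketch, the GRU is replaced by a hypothetical stand-in function, and the 0.5 decision threshold is an assumption.

```python
from collections import deque

SEQ_LEN = 24
THRESHOLD = 0.5  # assumed decision threshold


def stand_in_model(window):
    """Hypothetical placeholder for the GRU: returns a 'drowsy'
    probability that rises as the window's mean EAR drops."""
    mean_ear = sum(ear for ear, _mar in window) / len(window)
    return max(0.0, min(1.0, (0.3 - mean_ear) / 0.15 + 0.5))


class DrowsinessMonitor:
    def __init__(self, model, seq_len=SEQ_LEN, threshold=THRESHOLD):
        self.model = model
        self.buffer = deque(maxlen=seq_len)  # keeps only the latest frames
        self.threshold = threshold

    def update(self, ear, mar):
        """Add one frame's features; return 'Drowsy', 'Alert', or None
        while the buffer is still filling."""
        self.buffer.append((ear, mar))
        if len(self.buffer) < self.buffer.maxlen:
            return None
        prob = self.model(list(self.buffer))
        return "Drowsy" if prob >= self.threshold else "Alert"


monitor = DrowsinessMonitor(stand_in_model)
outputs = [monitor.update(0.35, 0.4) for _ in range(SEQ_LEN)]   # eyes open
outputs += [monitor.update(0.15, 0.6) for _ in range(SEQ_LEN)]  # eyes closing
print(outputs[SEQ_LEN - 1], outputs[-1])
```

A fixed-length deque keeps memory and per-frame work constant, which matters on low-end phones.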
Impact
- Offers a potential early-warning mechanism for reducing fatigue-related accidents.
- Demonstrated a practical deployment path from trained model to user-facing app.
Tech Stack
Python, TensorFlow, MediaPipe, GRU, Android Studio, Kotlin, TFLite
Case Study 3
CineMood - Emotion-Based Movie Recommender
NLP + Mobile + Cloud
Oct 2024 - Dec 2024 | LSTM | TFLite | Cloud Run | Android
Problem statement: Most recommendation systems rely on viewing history and ignore emotional context. Users need recommendations that reflect their current mood.
Objectives
- Classify the mood of input text into six emotion classes.
- Serve mood-aware movie recommendations within the mobile app flow.
- Deploy the API on scalable cloud infrastructure.
Dataset
- TMDb metadata as the candidate pool for movie recommendations.
- Emotion dataset of annotated Twitter text covering six classes.
Methodology
- Emotion classifier built on an LSTM architecture with an embedding layer and a softmax output.
- Model conversion to TFLite for mobile compatibility.
- Cloud deployment on Google Cloud Run, with Cloud Storage support.
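The parameter count of an embedding → LSTM → softmax classifier can be checked by hand. The sketch below uses the standard counts for a Keras-style LSTM: four gates, each with input weights, recurrent weights, and a bias. The vocabulary size, embedding width, and LSTM units are hypothetical, because CineMood's layer sizes are not listed.

```python
def embedding_params(vocab_size, embed_dim):
    """One trainable vector per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus bias for the softmax output layer."""
    return input_dim * units + units


def classifier_params(vocab_size, embed_dim, lstm_units, num_classes=6):
    return (
        embedding_params(vocab_size, embed_dim)
        + lstm_params(embed_dim, lstm_units)
        + dense_params(lstm_units, num_classes)
    )


# Hypothetical sizes, not the project's configuration.
print(classifier_params(vocab_size=30_000, embed_dim=128, lstm_units=128))  # → 3972358
```

With these made-up sizes the embedding table accounts for most of the total. The example does not reproduce the project's 7.6 million parameters.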
Outcomes
- Model complexity: 7.6 million trainable parameters.
- Achieved high training and validation performance during experimentation.
- Delivered an end-to-end app flow from mood input to the recommendation page.
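The mood-to-recommendation step can be sketched as follows: apply softmax to the classifier's six logits, take the top emotion, then filter and rank candidate movies. Several pieces are assumptions. The label order follows the commonly used Twitter emotion dataset (sadness, joy, love, anger, fear, surprise). The emotion-to-genre mapping is hypothetical. The candidate records are placeholders shaped like TMDb metadata. The project's actual label set and ranking logic are not described.

```python
import math

# Assumed label order; the project's six classes are not listed.
EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Hypothetical emotion-to-genre mapping, for illustration only.
MOOD_GENRES = {
    "sadness": {"Comedy", "Family"},
    "joy": {"Adventure", "Comedy"},
    "love": {"Romance"},
    "anger": {"Action"},
    "fear": {"Animation", "Family"},
    "surprise": {"Mystery", "Science Fiction"},
}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(logits, candidates, top_k=3):
    """Pick the most probable emotion, keep movies sharing a mapped genre,
    and rank them by average vote."""
    probs = softmax(logits)
    mood = EMOTIONS[probs.index(max(probs))]
    wanted = MOOD_GENRES[mood]
    matches = [m for m in candidates if wanted & set(m["genres"])]
    matches.sort(key=lambda m: m["vote_average"], reverse=True)
    return mood, [m["title"] for m in matches[:top_k]]


pool = [
    {"title": "Movie A", "genres": ["Romance", "Drama"], "vote_average": 7.9},
    {"title": "Movie B", "genres": ["Action"], "vote_average": 6.8},
    {"title": "Movie C", "genres": ["Romance"], "vote_average": 8.3},
]
print(recommend([0.1, 0.2, 2.5, 0.3, 0.0, 0.1], pool))  # → ('love', ['Movie C', 'Movie A'])
```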
Impact
- Created a more personalized entertainment recommendation experience.
- Showcased cross-functional collaboration across ML, mobile, and cloud roles.
Tech Stack
TensorFlow, LSTM, Kotlin, Android Studio, Google Cloud Run, TFLite
Case Study 4
Liver Disease Prediction with Stacking Ensemble
Clinical Classification Modeling
Mar 2024 - Sep 2024 | Ensemble Learning | SMOTENC | AUC 0.91 | Healthcare ML
Problem statement: Late or inaccurate identification of liver disease can delay treatment. A robust predictive model can support early clinical decisions.
Objectives
- Predict liver disease risk from demographic and laboratory features.
- Reduce misdiagnosis risk through stronger classification performance.
- Benchmark ensemble performance against baseline models.
Dataset
- Clinical dataset with demographic and laboratory variables such as bilirubin, albumin, enzymes, and proteins.
- Mild class imbalance handled with SMOTENC.
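SMOTENC oversamples the minority class when the data mixes continuous and categorical features. Continuous values are interpolated between a minority row and one of its nearest minority neighbors. Each categorical value is set to the most frequent category among those neighbors. In the distance calculation, every categorical mismatch adds a penalty equal to the median standard deviation of the continuous columns. The sketch below is a simplified re-implementation for illustration, using invented rows. A real pipeline would normally rely on a library implementation such as imbalanced-learn's SMOTENC, whose internals differ in detail.

```python
import math
import random
import statistics
from collections import Counter


def smotenc_sample(minority, cont_idx, cat_idx, k=3, rng=random):
    """Create one synthetic minority-class row in the SMOTE-NC style.

    minority: list of rows (lists) from the minority class only.
    cont_idx / cat_idx: positions of continuous / categorical columns.
    """
    # Median of the continuous columns' standard deviations; each categorical
    # mismatch adds this penalty (squared) to the squared Euclidean distance.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    )

    def distance(a, b):
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(d2)

    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    neighbors = sorted(others, key=lambda row: distance(base, row))[:k]
    partner = rng.choice(neighbors)
    gap = rng.random()  # interpolation factor in [0, 1)

    synthetic = list(base)
    for j in cont_idx:
        # Continuous features: a random point on the segment base -> partner.
        synthetic[j] = base[j] + gap * (partner[j] - base[j])
    for j in cat_idx:
        # Categorical features: most frequent value among the k neighbors.
        synthetic[j] = Counter(row[j] for row in neighbors).most_common(1)[0][0]
    return synthetic


# Toy minority rows: [bilirubin, albumin, gender] (invented values).
minority = [
    [1.2, 3.1, "M"],
    [1.5, 3.4, "M"],
    [0.9, 2.8, "F"],
    [1.8, 3.0, "M"],
]
sample = smotenc_sample(minority, cont_idx=[0, 1], cat_idx=[2], k=2, rng=random.Random(0))
print(sample)
```

Calling the function repeatedly produces as many synthetic rows as needed to reach the target class balance.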
Methodology
- Data cleaning, encoding, scaling, a documented rationale for outlier handling, and missing-value treatment.
- Stacking ensemble with base estimators (KNN, GaussianNB, LightGBM, SVC, XGBoost, Random Forest, CatBoost).
- Meta estimator: Logistic Regression.
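Stacking trains a meta-learner on out-of-fold predictions from the base models. Because each base model predicts only rows it never saw during training, the meta-learner learns from realistic rather than overfitted scores. The sketch below keeps the code dependency-free, with two substitutions. Two toy base learners (nearest centroid and a k-NN vote) stand in for the seven estimators listed above. The logistic-regression meta-learner is fitted by plain gradient descent. In practice, scikit-learn's StackingClassifier implements this pattern.

```python
import math
import random


def nearest_centroid(train_X, train_y, test_X):
    """Score = distance to class-0 centroid minus distance to class-1 centroid."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    return [math.dist(x, c0) - math.dist(x, c1) for x in test_X]


def knn_vote(train_X, train_y, test_X, k=3):
    """Score = share of class-1 labels among the k nearest training rows."""
    scores = []
    for x in test_X:
        nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores


def out_of_fold(X, y, base_models, n_folds=5, seed=0):
    """Meta-features: each base model's prediction on rows it never trained on."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    meta = [[0.0] * len(base_models) for _ in X]
    for fold in folds:
        held = set(fold)
        tr = [i for i in idx if i not in held]
        for m, model in enumerate(base_models):
            preds = model([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in fold])
            for i, p in zip(fold, preds):
                meta[i][m] = p
    return meta


def fit_logistic(Z, y, lr=0.1, epochs=500):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = 1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            for j, zj in enumerate(z):
                gw[j] += (p - t) * zj
            gb += p - t
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b


def stack_predict(X, y, X_new, base_models):
    """Fit the meta-learner on out-of-fold features, then combine base models
    refitted on all training data."""
    w, b = fit_logistic(out_of_fold(X, y, base_models), y)
    feats = [model(X, y, X_new) for model in base_models]
    return [1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            for z in zip(*feats)]


# Two well-separated toy clusters (invented data).
X = [[i * 0.1, (i % 3) * 0.1] for i in range(10)] + \
    [[5 + i * 0.1, 5 + (i % 3) * 0.1] for i in range(10)]
y = [0] * 10 + [1] * 10
probs = stack_predict(X, y, [[0.2, 0.1], [5.3, 5.1]], [nearest_centroid, knn_vote])
print([round(p, 3) for p in probs])
```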
Outcomes
- Accuracy: 82.85%.
- AUC: 0.9128.
- Recall: 79.23%; Precision: 85.19%.
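These metrics follow standard definitions. The sketch below computes them from labels, hard predictions, and scores, using toy values rather than the project's predictions. AUC uses the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, not the project's results.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred), round(roc_auc(y_true, scores), 3))
```

Recall measures how many true liver-disease cases are caught, which makes it the metric to watch when missed diagnoses are costly.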
Impact
- Demonstrated improvement over several baseline models.
- Useful as a decision-support companion to expert diagnosis.
Tech Stack
Python, Scikit-learn, CatBoost, XGBoost, LightGBM, SMOTENC
Case Study 5
Suicidal Ideation Detection from Reddit
NLP for Early Mental-Health Signals
Jun 2025 | 232K posts | TF-IDF | Naive Bayes | ROC-AUC 98.2%
Problem statement: Early suicidal ideation signals are hard to identify at scale. Text mining can support timely intervention.
Objectives
- Classify Reddit posts as suicidal or non-suicidal.
- Build a reproducible NLP pipeline for large-scale social media text.
- Evaluate robustness with a train-test split and cross-validation.
Dataset
- 232,074 Reddit posts from related subreddits (balanced classes).
- 80/20 train-test split with 10-fold cross-validation.
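The evaluation setup can be sketched with index-level splitting. The sketch shuffles once and holds out 20% for testing. It assumes the 10 cross-validation folds are drawn from the remaining 80% training portion, which the description does not state explicitly. scikit-learn's train_test_split and KFold/StratifiedKFold provide equivalent utilities.

```python
import random


def train_test_split_indices(n, test_size=0.2, seed=42):
    """Shuffle row indices once and hold out a test fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_size)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


train_idx, test_idx = train_test_split_indices(232_074)
print(len(train_idx), len(test_idx))  # → 185659 46415
```

Keeping the test indices out of every fold means the final test score is never used for tuning.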
Methodology
- Text preprocessing: lowercasing, URL removal, mention/hashtag cleanup, digit removal, and stopword handling.
- TF-IDF vectorization with tuned settings (max_df, min_df, ngram_range, sublinear_tf).
- Multinomial Naive Bayes model for binary text classification.
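The pipeline shape above can be shown in a compact, dependency-free sketch with three stages:

- regex cleaning;
- sublinear TF-IDF with smoothed IDF and L2 normalization, the same formulas scikit-learn's TfidfVectorizer uses with smooth_idf=True and sublinear_tf=True;
- multinomial Naive Bayes with Laplace smoothing.

The stopword list, regex patterns, and toy texts are assumptions. The project's tuned max_df, min_df, and n-gram settings are not reproduced.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "to", "of", "i", "is", "it", "my"}  # tiny illustrative list


def clean(text):
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions / hashtags
    text = re.sub(r"\d+", " ", text)                    # digits
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def vectorize(doc, idf):
    """Sublinear TF (1 + ln tf) times IDF, L2-normalized; unknown terms dropped."""
    w = {t: (1 + math.log(c)) * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    return {t: v / norm for t, v in w.items()}


def fit_mnb(vectors, labels, alpha=1.0):
    """Multinomial NB on TF-IDF weights with Laplace (alpha) smoothing."""
    vocab = {t for v in vectors for t in v}
    prior, likelihood = {}, {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        prior[c] = math.log(len(rows) / len(vectors))
        totals = Counter()
        for v in rows:
            totals.update(v)  # sums TF-IDF weight per term for this class
        denom = sum(totals.values()) + alpha * len(vocab)
        likelihood[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return prior, likelihood


def predict_mnb(model, vector):
    prior, likelihood = model
    scores = {c: prior[c] + sum(w * likelihood[c][t] for t, w in vector.items()
                                if t in likelihood[c]) for c in prior}
    return max(scores, key=scores.get)


# Toy placeholder texts (1 = suicidal, 0 = non-suicidal).
train_texts = [
    "I feel hopeless and alone tonight",
    "nothing matters anymore, I want it to end",
    "Watched a great movie with friends",
    "Check out my new recipe at https://example.com #cooking",
]
train_labels = [1, 1, 0, 0]
docs = [clean(t) for t in train_texts]
idf = fit_idf(docs)
model = fit_mnb([vectorize(d, idf) for d in docs], train_labels)
print(predict_mnb(model, vectorize(clean("feeling hopeless and alone"), idf)))
```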
Outcomes
- Accuracy: 92.9%.
- ROC-AUC: 98.2%.
- Reliable separation between suicidal and non-suicidal classes.
Impact
- Illustrates how NLP can serve social good and support mental-health systems.
- Serves as a strong benchmark for high-volume text analytics.
Tech Stack
Python, NLTK, Scikit-learn, TF-IDF, Naive Bayes
Case Study 6
INNOVA - Integrated News & Overview Analysis
Internal Tool for BPS Brebes
Jul 2025 - Sep 2025 | 27+ portals | Automation | Flask API | SQLite
Problem statement: Manual news monitoring was slow, inconsistent, and difficult to archive for statistical analysis work.
Objectives
- Automate scraping and ingestion from multiple local news portals.
- Provide a searchable archive and an export-ready dataset.
- Improve usability through iterative product versions.
Dataset
- News content pulled from 27+ online sources, including local and national portals.
- Structured archival data indexed by source, date, and category.
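An archive indexed by source, date, and category maps naturally onto a single SQLite table with secondary indexes. The sketch below uses Python's built-in sqlite3 module. A UNIQUE constraint on the URL keeps re-scraped articles from being stored twice. The table name, columns, and sample rows are hypothetical, as INNOVA's actual schema is not described.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    published TEXT NOT NULL,      -- ISO date string, e.g. '2025-08-01'
    category TEXT,
    title TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,     -- UNIQUE blocks duplicate articles
    body TEXT
);
CREATE INDEX IF NOT EXISTS idx_articles_source ON articles(source);
CREATE INDEX IF NOT EXISTS idx_articles_published ON articles(published);
CREATE INDEX IF NOT EXISTS idx_articles_category ON articles(category);
"""


def open_archive(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_article(conn, article):
    """Insert one article dict; rows with an already-stored URL are skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO articles (source, published, category, title, url, body) "
        "VALUES (:source, :published, :category, :title, :url, :body)",
        article,
    )
    conn.commit()


def search(conn, keyword, source=None, date_from=None, date_to=None):
    """Keyword search on titles with optional source and date filters."""
    sql = "SELECT source, published, title FROM articles WHERE title LIKE ?"
    params = [f"%{keyword}%"]
    if source:
        sql += " AND source = ?"
        params.append(source)
    if date_from:
        sql += " AND published >= ?"
        params.append(date_from)
    if date_to:
        sql += " AND published <= ?"
        params.append(date_to)
    return conn.execute(sql + " ORDER BY published DESC", params).fetchall()


conn = open_archive()
rows = [
    {"source": "portal-a", "published": "2025-08-01", "category": "economy",
     "title": "Rice prices rise", "url": "https://portal-a.example/1", "body": None},
    {"source": "portal-b", "published": "2025-08-03", "category": "economy",
     "title": "Chili prices fall", "url": "https://portal-b.example/7", "body": None},
]
for row in rows + rows[:1]:  # the repeated first row is ignored
    save_article(conn, row)
print(search(conn, "prices", date_from="2025-08-02"))
```

Storing dates as ISO strings keeps them sortable, so range filters and ORDER BY work without date parsing.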
Methodology
- Backend architecture with modular scrapers and task-progress management.
- API endpoint development and a data export workflow.
- Frontend iteration from a low-fidelity prototype to a polished, user-centered v2.0.
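A scraping module with progress tracking can be sketched as a parser per portal plus a shared progress record. The stack lists BeautifulSoup, but this sketch uses the standard library's html.parser instead so it runs without dependencies. Three further elements are hypothetical. Headlines are assumed to be links inside h2/h3 elements, whereas real portals each need their own selectors. The progress dictionary stands in for however INNOVA actually tracks tasks. The sample HTML is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (text, absolute URL) pairs for links inside <h2>/<h3> headings."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_heading = False
        self.current_href = None
        self.text_parts = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            self.current_href = dict(attrs).get("href")
            self.text_parts = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.text_parts.append(data)  # text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag in ("h2", "h3"):
            self.in_heading = False
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.text_parts).strip()
            if text:
                self.headlines.append((text, urljoin(self.base_url, self.current_href)))
            self.current_href = None


def run_scrape_task(pages, progress):
    """pages: list of (portal_url, html). progress is updated after each
    portal so a status endpoint could report how far the task has got."""
    progress.update(total=len(pages), done=0, found=0)
    results = []
    for base_url, html in pages:
        parser = HeadlineParser(base_url)
        parser.feed(html)
        results.extend(parser.headlines)
        progress["done"] += 1
        progress["found"] = len(results)
    return results


sample_html = (
    '<h2><a href="/news/1">Rice prices rise</a></h2>'
    '<p><a href="/about">About us</a></p>'
    '<h3><a href="https://other.example/2">Road repairs <b>delayed</b></a></h3>'
)
progress = {}
results = run_scrape_task([("https://portal.example", sample_html)], progress)
print(results, progress)
```

In a Flask app, a route could return the progress dictionary as JSON while a background worker runs the task.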
Outcomes
- Reduced the time spent on news collection and monitoring.
- Created a centralized, up-to-date local news archive for analysis.
- Improved retrieval and documentation for long-term maintainability.
Impact
- Increased operational efficiency for the local statistical division.
- Converted a repetitive manual process into a reusable digital workflow.
Tech Stack
Python, Flask, SQLite, BeautifulSoup, HTML, CSS, Pandas