Detailed Portfolio

In-depth Case Studies with Methodology, Metrics, and Impact

Explore each project with complete context: problem framing, dataset design, modeling approach, evaluation metrics, and real-world value.


Case Study 1

Nusantara - Digital Divide Dashboard (Award-Winning)

Favorite Champion, Data Slayer 3.0

Dec 2025
Power BI | CRISP-DM | Policy Analytics | BPS Official Data

Problem statement: Digital poverty is more complex than internet access alone. Policymakers need a clear dashboard to identify priority provinces and choose intervention strategies.


Objectives

  • Map digital inequality trends across Indonesia (2017-2024).
  • Diagnose the relationship between poverty, internet access, and ICT skills.
  • Support policy prioritization using hotspot identification and projections.

Dataset

  • Official BPS indicators: internet access, device ownership, ICT skills, poverty, HDI, and IP-TIK.
  • Multi-year, panel-style data (2017-2024), transformed into an analysis-ready data model.

Methodology

  • CRISP-DM pipeline: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
  • ETL, normalization, star-schema modeling, and DAX measures in Power BI.
  • Multi-level analytics framework: descriptive, diagnostic, predictive, prescriptive, and exploratory.
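The normalization and hotspot steps can be illustrated outside Power BI. The following is a minimal Python analogue, not the dashboard's actual DAX logic. It min-max scales three indicators so they share a 0-1 range. It then averages them into a hypothetical composite exclusion score, in which higher poverty, lower internet access and lower ICT skills all raise the score. Finally, it flags provinces at or above an assumed threshold of 0.6. The indicator keys, the equal weighting, the threshold, the helper names `min_max` and `flag_hotspots`, and the sample values are all illustrative assumptions. The values are not BPS figures.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def flag_hotspots(provinces: Dict[str, Dict[str, float]], threshold: float = 0.6) -> List[str]:
    """Hypothetical rule: average the normalized indicators into an exclusion
    score (poverty counts up, internet access and ICT skills count down) and
    return provinces at or above the threshold, highest score first."""
    names = list(provinces)
    poverty = min_max([provinces[n]["poverty"] for n in names])
    access = min_max([provinces[n]["internet_access"] for n in names])
    skills = min_max([provinces[n]["ict_skills"] for n in names])
    scores = {
        n: (p + (1 - a) + (1 - s)) / 3
        for n, p, a, s in zip(names, poverty, access, skills)
    }
    flagged = [n for n in names if scores[n] >= threshold]
    return sorted(flagged, key=lambda n: scores[n], reverse=True)


# Placeholder percentages for illustration only; these are NOT BPS figures.
sample = {
    "Province A": {"poverty": 26.0, "internet_access": 60.0, "ict_skills": 40.0},
    "Province B": {"poverty": 8.0, "internet_access": 92.0, "ict_skills": 75.0},
    "Province C": {"poverty": 12.0, "internet_access": 85.0, "ict_skills": 65.0},
    "Province D": {"poverty": 20.0, "internet_access": 70.0, "ict_skills": 45.0},
}
print(flag_hotspots(sample))  # ['Province A', 'Province D']
```

Without the scaling step, an indicator with a wide raw range would dominate the average. A production index might also weight indicators unequally rather than taking a plain mean.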

Outcomes

  • National internet access rose to 89.8% in 2024.
  • Identified 11 provinces in the digital exclusion hotspot category as intervention priorities.
  • Produced a 2025-2027 outlook for poverty and IP-TIK to support policy planning.

Impact

  • Strengthened evidence-based discussion of digital inclusion policy.
  • Enabled interactive scenario testing rather than static reporting alone.

Tech Stack

Power BI, DAX, Excel, CRISP-DM, Data Modeling

Case Study 2

Fatigue Detection Camera for Motorcycle Driver

Computer Vision + Mobile AI

Feb 2025 - Jun 2025
MediaPipe | GRU | Real-time Inference | Android

Problem statement: Rider fatigue often causes preventable motorcycle accidents. A lightweight, mobile-first AI detector is needed to issue real-time warnings.


Objectives

  • Accurately detect drowsiness from facial movement patterns.
  • Provide real-time alerts on low- to mid-range phones.
  • Keep inference efficient enough for practical use.

Dataset

  • 6 subjects and 22 videos at 720p, recorded in day and evening sessions.
  • Frame extraction produced 66,472 training, 16,835 validation, and 9,328 test frames.
  • Labels: Alert (0) and Drowsy (1).

Methodology

  • Facial landmark extraction with MediaPipe Face Mesh.
  • Derived features: EAR (Eye Aspect Ratio) and MAR (Mouth Aspect Ratio).
  • GRU training configuration: sequence length 24, 49 epochs, Adam optimizer, early stopping, and a learning-rate scheduler.
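EAR and MAR are ratios of landmark distances. This sketch assumes the common six-point formulation:

EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|)

It applies the same formula to six mouth points as a simplified MAR. Which MediaPipe Face Mesh landmark indices correspond to p1..p6 is not stated in the case study, so it is left open here. The sketch also shows one assumed way to turn per-frame [EAR, MAR] vectors into length-24 windows (stride 1) for the GRU. The helper names `aspect_ratio` and `make_sequences` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def aspect_ratio(p: Sequence[Point]) -> float:
    """Six-point aspect ratio (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    p[0] and p[3] are the horizontal corners; p[1], p[2] lie on the upper
    lid (or lip) and p[5], p[4] are the points directly below them.
    """
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])
    horizontal = math.dist(p[0], p[3])
    return vertical / (2.0 * horizontal)


def make_sequences(frames: List[List[float]], seq_len: int = 24) -> List[List[List[float]]]:
    """Slide a window of seq_len consecutive frames (stride 1, assumed) over
    per-frame feature vectors such as [EAR, MAR]."""
    return [frames[i:i + seq_len] for i in range(len(frames) - seq_len + 1)]


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(round(aspect_ratio(open_eye), 3), round(aspect_ratio(closed_eye), 3))

# Hypothetical 30-frame clip with constant [EAR, MAR] features.
clip = [[0.30, 0.20]] * 30
windows = make_sequences(clip)
print(len(windows), len(windows[0]))  # 7 24
```

A closing eye shrinks the vertical distances while the corner-to-corner width stays roughly constant, so EAR drops. Feeding windows of frames to the GRU lets it learn how long such drops persist, rather than reacting to a single blink.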

Outcomes

  • Test accuracy: 78%.
  • Per-class precision, recall, and F1 of about 0.78 for both classes.
  • Built an end-to-end inference workflow from camera frame to drowsy/alert output.

Impact

  • Potential early-warning mechanism to reduce fatigue-related accidents.
  • Demonstrated a practical deployment path from model to user-facing app.

Tech Stack

Python, TensorFlow, MediaPipe, GRU, Android Studio, Kotlin, TFLite


Case Study 3

CineMood - Emotion-Based Movie Recommender

NLP + Mobile + Cloud

Oct 2024 - Dec 2024
LSTM | TFLite | Cloud Run | Android

Problem statement: Most recommendation systems rely on user history and ignore emotional context. Users need recommendations that reflect their current mood.


Objectives

  • Classify the mood expressed in text into six emotion classes.
  • Serve mood-aware movie recommendations within the mobile app flow.
  • Deploy the API infrastructure in a scalable cloud environment.

Dataset

  • TMDb metadata as the candidate pool for movie recommendations.
  • Emotion dataset of annotated Twitter text covering 6 classes.

Methodology

  • Emotion classifier using an LSTM architecture with an embedding layer and softmax output.
  • Model conversion to TFLite for mobile compatibility.
  • Cloud deployment on Google Cloud Run, supported by Cloud Storage.
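The architecture can be sketched in Keras as follows: embedding, LSTM, softmax, then TFLite conversion. The vocabulary size, sequence length, embedding width and LSTM units are hypothetical placeholders, as are the helper names `build_emotion_model` and `to_tflite`. This configuration therefore does not reproduce the reported 7.6 million parameters. Allowing `SELECT_TF_OPS` during conversion is an assumed fallback for ops that TFLite builtins may not cover. It comes at the cost of a larger runtime in the Android app.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 6      # six emotion classes, as stated in the case study
VOCAB_SIZE = 20_000  # hypothetical tokenizer vocabulary size
MAX_LEN = 64         # hypothetical padded sequence length (tokens)
EMBED_DIM = 128      # hypothetical embedding width
LSTM_UNITS = 128     # hypothetical LSTM width


def build_emotion_model() -> tf.keras.Model:
    """Token ids -> embedding -> LSTM -> softmax over six emotions."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(MAX_LEN,), dtype="int32"),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        tf.keras.layers.LSTM(LSTM_UNITS),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer class labels
                  metrics=["accuracy"])
    return model


def to_tflite(model: tf.keras.Model) -> bytes:
    """Convert the Keras model to a TFLite flatbuffer for on-device inference."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Fall back to TensorFlow ops if an op is not covered by TFLite builtins.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    return converter.convert()


model = build_emotion_model()
probs = model.predict(np.zeros((1, MAX_LEN), dtype="int32"), verbose=0)
print(probs.shape, float(probs.sum()))
```

With these placeholder sizes the model has 2,692,358 parameters, almost all in the 20,000 x 128 embedding table. A parameter count like the reported 7.6 million would therefore depend mainly on the real vocabulary size and embedding width.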

Outcomes

  • Model complexity: 7.6 million trainable parameters.
  • Achieved high training and validation performance during experimentation.
  • Delivered an end-to-end app flow from mood input to the recommendation page.

Impact

  • Created a more personalized entertainment recommendation experience.
  • Showcased cross-functional collaboration across ML, mobile, and cloud roles.

Tech Stack

TensorFlow, LSTM, Kotlin, Android Studio, Google Cloud Run, TFLite

Case Study 4

Liver Disease Prediction with Stacking Ensemble

Clinical Classification Modeling

Mar 2024 - Sep 2024
Ensemble Learning | SMOTENC | AUC 0.91 | Healthcare ML

Problem statement: Late or inaccurate identification of liver disease can delay treatment. A robust predictive model can support early clinical decisions.


Objectives

  • Predict liver disease risk from demographic and laboratory features.
  • Reduce misdiagnosis risk through stronger classification performance.
  • Benchmark ensemble performance against baseline models.

Dataset

  • Clinical dataset with demographic and laboratory variables such as bilirubin, albumin, enzymes, and proteins.
  • Mild class imbalance handled with SMOTENC.

Methodology

  • Data cleaning, encoding, scaling, outlier handling (with documented rationale), and missing-value treatment.
  • Stacking ensemble with base estimators (KNN, GaussianNB, LightGBM, SVC, XGBoost, Random Forest, CatBoost).
  • Meta-estimator: Logistic Regression.
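The stacking setup can be sketched with scikit-learn's `StackingClassifier`. This is a simplified version under stated assumptions:

- It uses only the scikit-learn base estimators from the list (KNN, GaussianNB, SVC, Random Forest).
- It trains on synthetic data from `make_classification` rather than the clinical dataset.
- Its hyperparameters and the helper name `build_stack` are arbitrary.

LightGBM, XGBoost and CatBoost all provide scikit-learn-compatible classifiers and would be appended as further `(name, estimator)` pairs. SMOTENC oversampling (from imbalanced-learn) would typically be applied to the training split only, before fitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stack(random_state: int = 42) -> StackingClassifier:
    """Base learners feed out-of-fold probabilities to a logistic-regression meta-estimator."""
    base_estimators = [
        # Distance- and margin-based models are wrapped with feature scaling.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("gnb", GaussianNB()),
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        # LightGBM, XGBoost, and CatBoost classifiers would be appended here.
    ]
    return StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,  # meta-features come from out-of-fold predictions, limiting leakage
    )


# Synthetic stand-in for the clinical data (mildly imbalanced, roughly 60/40).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

stack = build_stack().fit(X_train, y_train)
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"Test AUC on synthetic data: {auc:.3f}")
```

Training the meta-estimator on cross-validated predictions, rather than on in-sample fits, keeps it from simply trusting whichever base model overfits most.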

Outcomes

  • Accuracy: 82.85%.
  • AUC: 0.9128.
  • Recall: 79.23%; Precision: 85.19%.
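F1 is not listed among the outcomes, but it follows directly from the reported precision (P) and recall (R) as their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.8519 \times 0.7923}{0.8519 + 0.7923}
    = \frac{1.3499}{1.6442}
    \approx 0.821
```

Since P = TP / (TP + FP) and R = TP / (TP + FN), precision exceeding recall means the model produces fewer false positives than false negatives. Missed cases are therefore the larger error source, which matters when the goal is to reduce misdiagnosis.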

Impact

  • Outperformed several baseline models.
  • Useful as a decision-support companion to expert diagnosis.

Tech Stack

Python, Scikit-learn, CatBoost, XGBoost, LightGBM, SMOTENC

Case Study 5

Suicidal Ideation Detection from Reddit

NLP for Early Mental-Health Signals

Jun 2025
232K posts | TF-IDF | Naive Bayes | ROC-AUC 98.2%

Problem statement: Early suicidal ideation signals are hard to identify at scale. Text mining can support timely intervention.


Objectives

  • Classify Reddit posts as suicidal or non-suicidal.
  • Build a reproducible NLP pipeline for large-scale social media text.
  • Evaluate robustness with a train-test split and cross-validation.

Dataset

  • 232,074 Reddit posts from relevant subreddits, with balanced classes.
  • 80/20 train-test split with 10-fold cross-validation.

Methodology

  • Text preprocessing: lowercasing, URL removal, mention/hashtag cleanup, digit removal, and stopword handling.
  • TF-IDF vectorization with tuned settings (max_df, min_df, ngram range, sublinear TF).
  • Multinomial Naive Bayes for binary text classification.
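These preprocessing and modeling steps map naturally onto a scikit-learn `Pipeline`. The tuned settings are not listed, so several parts of this sketch are assumptions: the regular expressions, the parameter values (`min_df`, `max_df`, `ngram_range`) and the helpers `clean_text` and `build_pipeline`. It also swaps NLTK stopwords for scikit-learn's built-in English list, so the example runs without downloading corpora.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
DIGIT_RE = re.compile(r"\d+")


def clean_text(text: str) -> str:
    """Lowercase, then strip URLs, @mentions/#hashtags, and digits."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline(min_df=1, max_df=0.95, ngram_range=(1, 2)) -> Pipeline:
    """TF-IDF features into a multinomial Naive Bayes classifier (placeholder settings)."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            preprocessor=clean_text,   # replaces TfidfVectorizer's own lowercasing step
            stop_words="english",      # stand-in for an NLTK stopword list
            min_df=min_df,
            max_df=max_df,
            ngram_range=ngram_range,
            sublinear_tf=True,         # use 1 + log(tf) instead of raw term counts
        )),
        ("nb", MultinomialNB()),
    ])


print(clean_text("Visit https://x.com NOW @user #tag 123 ok"))  # visit now ok
```

For the 10-fold evaluation, `cross_val_score(build_pipeline(), texts, labels, cv=10, scoring="roc_auc")` from `sklearn.model_selection` would score the whole pipeline. Because the TF-IDF vocabulary is refit inside each fold, no held-out vocabulary leaks into training.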

Outcomes

  • Accuracy: 92.9%.
  • ROC-AUC: 98.2%.
  • Reliable separation between suicidal and non-suicidal classes.

Impact

  • Illustrates how NLP can aid social-good initiatives and mental-health support systems.
  • A strong benchmark project for high-volume text analytics.

Tech Stack

Python, NLTK, Scikit-learn, TF-IDF, Naive Bayes

Case Study 6

INNOVA - Integrated News & Overview Analysis

Internal Tool for BPS Brebes

Jul 2025 - Sep 2025
27+ portals | Automation | Flask API | SQLite

Problem statement: Manual news monitoring was slow, inconsistent, and difficult to archive for statistical analysis work.


Objectives

  • Automate scraping and ingestion from multiple local news portals.
  • Provide a searchable archive and an export-ready dataset.
  • Improve usability through iterative product versions.

Dataset

  • News content collected from 27+ online sources, including local and national portals.
  • Structured archive indexed by source, date, and category.
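The source/date/category indexing can be sketched with Python's built-in `sqlite3` module. Several details are hypothetical: the table and column names, the use of the article URL as a deduplication key, the helper functions, and the sample rows. The example also covers a keyword and date-range search and a CSV export, mirroring the searchable-archive and export-ready objectives.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    published_date TEXT NOT NULL,  -- ISO 8601, e.g. 2025-08-01
    category TEXT,
    title TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE       -- used to skip re-scraped duplicates
);
CREATE INDEX IF NOT EXISTS idx_articles_source_date ON articles (source, published_date);
CREATE INDEX IF NOT EXISTS idx_articles_category ON articles (category);
"""

COLUMNS = ("source", "published_date", "category", "title", "url")


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_articles(conn: sqlite3.Connection, rows: Iterable[Dict[str, str]]) -> None:
    """Insert scraped rows; INSERT OR IGNORE drops rows whose URL already exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO articles (source, published_date, category, title, url) "
        "VALUES (:source, :published_date, :category, :title, :url)",
        rows,
    )
    conn.commit()


def search(conn: sqlite3.Connection, keyword: str,
           start: Optional[str] = None, end: Optional[str] = None) -> List[Tuple]:
    """Keyword match on titles with an optional inclusive date range, newest first."""
    sql = f"SELECT {', '.join(COLUMNS)} FROM articles WHERE title LIKE ?"
    params: list = [f"%{keyword}%"]
    if start:
        sql += " AND published_date >= ?"
        params.append(start)
    if end:
        sql += " AND published_date <= ?"
        params.append(end)
    sql += " ORDER BY published_date DESC"
    return conn.execute(sql, params).fetchall()


def export_csv(rows: List[Tuple]) -> str:
    """Serialize search results to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


conn = open_archive()
save_articles(conn, [
    {"source": "portal-a.example", "published_date": "2025-08-01", "category": "economy",
     "title": "Rice prices rise", "url": "https://portal-a.example/1"},
    {"source": "portal-b.example", "published_date": "2025-08-02", "category": "disaster",
     "title": "Flood warning issued", "url": "https://portal-b.example/2"},
    {"source": "portal-a.example", "published_date": "2025-08-01", "category": "economy",
     "title": "Rice prices rise", "url": "https://portal-a.example/1"},  # duplicate URL
])
print(export_csv(search(conn, "flood")))
```

Dates are stored as ISO 8601 text, so lexicographic order matches chronological order. Range filters and `ORDER BY` therefore work without date parsing, and the composite `(source, published_date)` index serves per-portal timelines directly.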

Methodology

  • Backend architecture with scraping modules and task-progress management.
  • API endpoint development and a data export workflow.
  • Frontend iteration from a low-fidelity prototype to a polished, user-centered v2.0.

Outcomes

  • Reduced the time required for news collection and monitoring.
  • Centralized an up-to-date local news archive for analysis.
  • Improved retrieval and documentation for long-term maintainability.

Impact

  • Increased operational efficiency for the local statistical division.
  • Converted a repetitive manual process into a reusable digital workflow.

Tech Stack

Python, Flask, SQLite, BeautifulSoup, HTML, CSS, Pandas