Detailed Portfolio

In-depth Case Studies with Methodology, Metrics, and Impact

Explore each project with complete context: problem framing, dataset design, modeling approach, evaluation metrics, and real-world value.

Quick jump

- Nusantara - Digital Divide Dashboard (Award Winning)
- Fatigue Detection Camera for Motorcycle Driver
- CineMood - Emotion-Based Movie Recommender
- Liver Disease Prediction with Stacking Ensemble
- Suicidal Ideation Detection from Reddit
- INNOVA - Integrated News & Overview Analysis

Case Study 1

Nusantara - Digital Divide Dashboard (Award Winning)

Favorite Champion, Data Slayer 3.0

Dec 2025

Power BI, CRISP-DM, Policy Analytics, BPS Official Data

Problem statement: Digital poverty is more complex than internet access alone. Policymakers need a clear dashboard to identify priority provinces and intervention strategies.

Objectives

- Map digital inequality trends across Indonesia (2017-2024).
- Diagnose the relationship between poverty, internet access, and ICT skills.
- Support policy prioritization using hotspot identification and projections.

Dataset

- Official BPS indicators: internet access, device ownership, ICT skills, poverty, HDI, IP-TIK.
- Multi-year, panel-style data (2017-2024), transformed into an analysis-ready model.

Methodology

- CRISP-DM pipeline: business understanding, data understanding, preparation, modeling, evaluation, deployment.
- ETL, normalization, star-schema modeling, and DAX measures in Power BI.
- Multi-level analytics framework: descriptive, diagnostic, predictive, prescriptive, exploratory.
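
The star-schema step can be illustrated outside Power BI. The following is a minimal Python sketch, assuming a flat indicator table with one row per province and year; the column names, province names, and values are placeholders rather than BPS figures, and the project's actual model may be organized differently.

```python
# Placeholder rows: names and numbers are illustrative, not BPS data.
flat_rows = [
    {"province": "Province A", "year": 2023, "internet_access_pct": 60.0, "poverty_pct": 12.0},
    {"province": "Province A", "year": 2024, "internet_access_pct": 65.0, "poverty_pct": 11.0},
    {"province": "Province B", "year": 2023, "internet_access_pct": 80.0, "poverty_pct": 6.0},
    {"province": "Province B", "year": 2024, "internet_access_pct": 84.0, "poverty_pct": 5.5},
]


def build_star_schema(rows):
    """Split flat rows into two dimension tables and one fact table."""
    provinces = sorted({r["province"] for r in rows})
    years = sorted({r["year"] for r in rows})

    # Each dimension maps a natural value to a surrogate key starting at 1.
    dim_province = {name: key for key, name in enumerate(provinces, start=1)}
    dim_year = {year: key for key, year in enumerate(years, start=1)}

    # The fact table keeps only keys and numeric measures.
    fact = [
        {
            "province_key": dim_province[r["province"]],
            "year_key": dim_year[r["year"]],
            "internet_access_pct": r["internet_access_pct"],
            "poverty_pct": r["poverty_pct"],
        }
        for r in rows
    ]
    return dim_province, dim_year, fact


dim_province, dim_year, fact_indicator = build_star_schema(flat_rows)
```

Keeping measures in one fact table and descriptive attributes in dimensions is what lets a single measure be sliced by province or year without duplicating logic.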

Outcomes

- National internet access rose to 89.8% in 2024.
- Identified 11 provinces in the digital-exclusion hotspot category as intervention priorities.
- Produced a 2025-2027 outlook for poverty and IP-TIK to support policy planning.
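
The source does not say how the 2025-2027 outlook was computed. As one simple possibility, an ordinary least-squares linear trend can be extrapolated three years ahead. The sketch below assumes that approach; the function names and the history values are hypothetical placeholders, not project data.

```python
def fit_linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(years, values, future_years):
    """Extrapolate the fitted line to each requested future year."""
    slope, intercept = fit_linear_trend(years, values)
    return {year: slope * year + intercept for year in future_years}


# Placeholder history for one indicator (assumed values, not BPS figures).
history_years = [2021, 2022, 2023, 2024]
history_values = [5.0, 5.5, 6.0, 6.5]

outlook = project(history_years, history_values, [2025, 2026, 2027])
```

A straight-line trend ignores saturation effects, so for bounded indicators such as access percentages a capped or logistic form may be more appropriate.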

Impact

- Strengthened evidence-based discussion of digital inclusion policy.
- Provided interactive scenario testing, not just static reporting.

Tech Stack

Power BI, DAX, Excel, CRISP-DM, Data Modeling

Case Study 2

Fatigue Detection Camera for Motorcycle Driver

Computer Vision + Mobile AI

Feb 2025 - Jun 2025

MediaPipe, GRU, Real-time Inference, Android

Problem statement: Rider fatigue often causes preventable motorcycle accidents. A lightweight, mobile-first AI detector is needed for real-time warnings.

Objectives

- Accurately detect drowsiness from facial movement patterns.
- Provide real-time alerts on low- to mid-range phones.
- Keep inference efficient for practical use.

Dataset

- 6 subjects, 22 videos, 720p, day/evening sessions.
- Frame extraction produced 66,472 training, 16,835 validation, and 9,328 test frames.
- Labels: Alert (0) and Drowsy (1).

Methodology

- Feature extraction with MediaPipe Face Mesh.
- Derived EAR (Eye Aspect Ratio) and MAR (Mouth Aspect Ratio).
- GRU training configuration: sequence length 24, 49 epochs, Adam optimizer, early stopping, LR scheduler.
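
The EAR feature and the 24-frame sequences can be sketched in plain Python. This is a minimal illustration that assumes the widely used six-point EAR definition, (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners. Which Face Mesh landmark indices feed these six points is left to the caller, and the coordinates below are made up for the example.

```python
import math


def eye_aspect_ratio(eye):
    """EAR for six (x, y) points ordered p1..p6.

    p1 and p4 are the horizontal eye corners, p2 and p3 lie on the upper
    lid, and p6 and p5 are the matching points on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def sliding_windows(features, length=24, step=1):
    """Group per-frame features into fixed-length sequences for a recurrent model."""
    return [
        features[start:start + length]
        for start in range(0, len(features) - length + 1, step)
    ]


# Made-up coordinates: a wide-open eye and a nearly closed one.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)

# Thirty frames of a single EAR feature yield seven windows of length 24.
frame_features = [[ear_open]] * 30
windows = sliding_windows(frame_features, length=24)
```

EAR drops toward zero as the eyelids close, which is why a sequence of low values is a useful drowsiness cue. MAR can be computed in the same way from mouth landmarks, though its exact point set varies between implementations.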

Outcomes

- Test accuracy: 78%.
- Class-level precision/recall/F1 around 0.78 for both classes.
- Built end-to-end inference workflow from camera frame to drowsy/alert output.

Impact

- Potential early warning mechanism to reduce fatigue-related accidents.
- Demonstrated practical deployment path from model to user-facing app.

Tech Stack

Python, TensorFlow, MediaPipe, GRU, Android Studio, Kotlin, TFLite

Case Study 3

CineMood - Emotion-Based Movie Recommender

NLP + Mobile + Cloud

Oct 2024 - Dec 2024

LSTM, TFLite, Cloud Run, Android

Problem statement: Recommendation systems mostly rely on history and ignore emotional context. Users need mood-aware recommendations.

Objectives

- Classify the mood of input text into six emotion classes.
- Serve mood-aware movie recommendations in the mobile flow.
- Deploy the API infrastructure in a scalable cloud environment.

Dataset

- TMDb metadata as the candidate pool for movie recommendations.
- Emotion dataset from annotated Twitter text with 6 classes.

Methodology

- Emotion classifier using LSTM architecture with embedding and softmax output.
- Model conversion to TFLite for mobile compatibility.
- Cloud deployment on Google Cloud Run, with Cloud Storage for supporting assets.
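
The softmax output stage of the classifier can be sketched independently of TensorFlow. The example below assumes the model emits six raw scores per input; the label names are placeholders because the source does not list the six emotion classes, and `predict_emotion` is a hypothetical helper.

```python
import math

# Placeholder labels: the source does not name the six classes.
EMOTION_LABELS = [f"emotion_{i}" for i in range(6)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits, labels=EMOTION_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores standing in for one model output.
example_logits = [0.1, 2.0, 0.3, -1.0, 0.0, 0.5]
label, confidence = predict_emotion(example_logits)
```

The predicted label, optionally gated by a confidence threshold, is what a downstream recommender would use to pick candidates from the movie pool.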

Outcomes

- Model complexity: 7.6 million trainable parameters.
- Achieved high training and validation performance in the experimentation phase.
- Delivered end-to-end app flow from mood input to recommendation page.

Impact

- Created a more personalized entertainment recommendation experience.
- Showcased cross-functional collaboration across ML, mobile, and cloud roles.

Tech Stack

TensorFlow, LSTM, Kotlin, Android Studio, Google Cloud Run, TFLite

Case Study 4

Liver Disease Prediction with Stacking Ensemble

Clinical Classification Modeling

Mar 2024 - Sep 2024

Ensemble Learning, SMOTENC, AUC 0.91, Healthcare ML

Problem statement: Late or inaccurate detection of liver disease can delay treatment. A robust predictive model can support early clinical decisions.

Objectives

- Predict liver disease risk using demographic and laboratory features.
- Minimize misdiagnosis risk via stronger classification performance.
- Benchmark ensemble performance against baseline models.

Dataset

- Clinical dataset with demographic and lab variables such as bilirubin, albumin, enzymes, proteins.
- Handled mild class imbalance using SMOTENC.

Methodology

- Data cleaning, encoding, scaling, missing-value treatment, and a documented rationale for outlier handling.
- Stacking ensemble with base estimators (KNN, GaussianNB, LightGBM, SVC, XGBoost, Random Forest, CatBoost).
- Meta estimator: Logistic Regression.
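
The stacking layout described above can be sketched with scikit-learn's `StackingClassifier`. This is a reduced illustration, not the project's configuration: it uses synthetic data in place of the clinical dataset, keeps only the base estimators that ship with scikit-learn (the LightGBM, XGBoost, and CatBoost models are omitted to avoid extra dependencies), and all hyperparameters are assumed defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the clinical dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Distance- and margin-based models get their own scaler inside a pipeline.
base_estimators = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("nb", GaussianNB()),
    ("svc", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Logistic Regression learns from the base models' out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Probability of the positive class for each held-out sample.
test_proba = stack.predict_proba(X_test)[:, 1]
```

Because the meta estimator is trained on cross-validated predictions, it sees how each base model behaves on data that model did not fit, which limits leakage from the first level into the second.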

Outcomes

- Accuracy: 82.85%.
- AUC: 0.9128.
- Recall: 79.23% and Precision: 85.19%.
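
The reported precision and recall imply an F1 score, which the list above does not state. A short check, using only the two published figures, puts it at roughly 0.82; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision 85.19% and recall 79.23%, as reported above.
f1 = f1_from(0.8519, 0.7923)
```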

Impact

- Demonstrated improvement over several baseline models.
- Useful as a decision-support companion to expert diagnosis.

Tech Stack

Python, Scikit-learn, CatBoost, XGBoost, LightGBM, SMOTENC

Case Study 5

Suicidal Ideation Detection from Reddit

NLP for Early Mental-Health Signals

Jun 2025

232K posts, TF-IDF, Naive Bayes, ROC-AUC 98.2%

Problem statement: Early suicidal ideation signals are hard to identify at scale. Text mining can support timely intervention.

Objectives

- Classify subreddit posts into suicidal vs non-suicidal categories.
- Build a reproducible NLP pipeline for large-scale social text.
- Evaluate robustness with train-test split and cross-validation.

Dataset

- 232,074 Reddit posts from related subreddits (balanced classes).
- 80/20 train-test split, with 10-fold cross-validation.

Methodology

- Text preprocessing: lowercase, URL removal, mention/hashtag cleanup, digit cleanup, stopword handling.
- TF-IDF vectorization with tuned settings (max_df, min_df, ngram range, sublinear TF).
- Multinomial Naive Bayes model for binary text classification.
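
The preprocessing and modeling steps above map onto a small scikit-learn pipeline. The sketch below is an approximation under stated assumptions: the regular expressions, the TF-IDF settings, and the toy texts are placeholders, and scikit-learn's built-in English stop-word list stands in for the NLTK list in the project's stack.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def clean_text(text):
    """Lowercase, then strip URLs, mentions/hashtags, and digits."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = re.sub(r"\d+", " ", text)                    # digits
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


pipeline = Pipeline([
    (
        "tfidf",
        TfidfVectorizer(
            preprocessor=clean_text,
            stop_words="english",  # stand-in for the NLTK stop-word list
            ngram_range=(1, 2),    # assumed value; the tuned range is not stated
            min_df=1,              # assumed value, kept low for the toy corpus
            max_df=1.0,            # assumed value
            sublinear_tf=True,
        ),
    ),
    ("nb", MultinomialNB()),
])

# Toy placeholder texts; 1 marks the class of interest, 0 the other class.
train_texts = [
    "I feel hopeless and cannot see a way forward",
    "Everything feels hopeless and empty lately",
    "Great hiking trip this weekend with friends",
    "New recipe turned out delicious, sharing photos soon",
]
train_labels = [1, 1, 0, 0]

pipeline.fit(train_texts, train_labels)
predictions = pipeline.predict(["Feeling hopeless tonight", "Weekend hiking photos"])
```

On a full dataset, passing the same pipeline to `cross_val_score` with `cv=10` would refit the vectorizer inside each fold, so vocabulary statistics never leak from validation folds into training.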

Outcomes

- Accuracy: 92.9%.
- ROC-AUC: 98.2%.
- Reliable separation between suicidal and non-suicidal classes.

Impact

- Illustrates how NLP can aid social good and mental-health support systems.
- Strong benchmark project for high-volume text analytics.

Tech Stack

Python, NLTK, Scikit-learn, TF-IDF, Naive Bayes

Case Study 6

INNOVA - Integrated News & Overview Analysis

Internal Tool for BPS Brebes

Jul 2025 - Sep 2025

27+ portals, Automation, Flask API, SQLite

Problem statement: Manual news monitoring was slow, inconsistent, and difficult to archive for statistical analysis work.

Objectives

- Automate scraping and ingestion from multiple local portals.
- Provide a searchable archive and an export-ready dataset.
- Improve usability through iterative product versions.

Dataset

- News content pulled from 27+ online sources including local and national portals.
- Structured archival data with source/date/category indexing.

Methodology

- Backend architecture design with scraping modules and task progress management.
- API endpoint development and data export workflow.
- Frontend iteration from low-fidelity prototype to polished user-centered v2.0.
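
The archive and export workflow can be sketched with Python's standard `sqlite3` and `csv` modules. The table layout, column names, and helper functions below are hypothetical; the source states only that articles are indexed by source, date, and category and that data can be exported.

```python
import csv
import io
import sqlite3


def create_archive(conn):
    """Create the articles table plus an index on source/date/category."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL,
            title TEXT NOT NULL,
            source TEXT NOT NULL,
            published_date TEXT NOT NULL,
            category TEXT
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_articles_lookup "
        "ON articles (source, published_date, category)"
    )


def save_article(conn, article):
    """Insert one scraped article; a repeated URL is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO articles "
        "(url, title, source, published_date, category) "
        "VALUES (:url, :title, :source, :published_date, :category)",
        article,
    )


def export_csv(conn, source=None):
    """Return archived rows as CSV text, optionally filtered by source."""
    query = "SELECT source, published_date, category, title, url FROM articles"
    params = ()
    if source is not None:
        query += " WHERE source = ?"
        params = (source,)
    query += " ORDER BY published_date, id"

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "published_date", "category", "title", "url"])
    writer.writerows(conn.execute(query, params))
    return buffer.getvalue()


conn = sqlite3.connect(":memory:")
create_archive(conn)
sample = {
    "url": "https://example.com/a",
    "title": "Sample headline",
    "source": "portal-a",
    "published_date": "2025-08-01",
    "category": "economy",
}
save_article(conn, sample)
save_article(conn, sample)  # same URL again: ignored by the UNIQUE constraint
save_article(conn, {**sample, "url": "https://example.com/b", "source": "portal-b"})
csv_text = export_csv(conn, source="portal-a")
```

The `UNIQUE` constraint on `url` makes repeated scraper runs idempotent, so the scraper can revisit a portal without creating duplicate archive rows.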

Outcomes

- Reduced time for news collection and monitoring.
- Centralized up-to-date local news archive for analysis.
- Improved retrieval and documentation for long-term maintainability.

Impact

- Increased operational efficiency for the local statistical division.
- Converted repetitive manual process into reusable digital workflow.

Tech Stack

Python, Flask, SQLite, BeautifulSoup, HTML, CSS, Pandas