Open to Opportunities

Data Scientist &
Credit Strategy Professional.

Bridging advanced predictive analytics, scalable ML architecture, and cross-functional business execution.

Drawing on over a decade of experience across banking, insurance, and tech manufacturing, I specialize in applying cross-industry predictive frameworks to engineer modern credit and growth strategies. With expertise in Python, Databricks, and SQL, I design end-to-end analytical pipelines. Beyond the code, I thrive as a strategic bridge: translating complex data models to secure executive buy-in, mentoring technical teams, and partnering directly with marketing and business units to execute highly targeted campaigns.

View My Work Get In Touch

Core Competencies

Technical Stack

Python SQL Databricks PySpark VS Code

Data Science

Propensity Modeling Predictive Analytics Feature Engineering A/B Testing

Credit & Strategy

Portfolio Optimization Risk vs Reward Model Governance Product Cross-selling

Leadership

Executive Presentations Stakeholder Alignment Cross-Team Execution Technical Mentorship

Strategic Projects

Case studies demonstrating my ability to navigate complex enterprise data systems, engineer predictive features, and collaborate across teams to optimize business outcomes.

Supervised ML / Survival Analysis

Fintech Probability of Default Engine

The Challenge: Legacy binary classification models predict only whether a customer will default, not when the business will incur the loss.

The Solution: Engineered an end-to-end AWS cloud pipeline using Python and Boto3. Developed a Cox Proportional Hazards model to calculate dynamic Hazard Ratios and 12-month survival curves.

The Impact: Establishes a scalable cloud architecture that translates hazard ratios into actionable intelligence, enabling credit teams to deploy targeted interventions and refinancing offers before the critical default window.

View Code Repository
AWS S3 CoxPH Model Boto3
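The repository implements this on AWS with Boto3 and a fitted CoxPH model; the core survival logic can be sketched dependency-free with a Kaplan-Meier baseline curve and the proportional-hazards relation S_i(t) = S0(t)^HR. All data below is synthetic and the hazard ratio is an assumed illustration, not a fitted value:

```python
import numpy as np
import pandas as pd

def kaplan_meier(durations, events):
    """Baseline survival curve S0(t) via the Kaplan-Meier estimator."""
    df = pd.DataFrame({"t": durations, "e": events}).sort_values("t")
    at_risk = len(df)
    surv = 1.0
    times, curve = [], []
    for t, grp in df.groupby("t"):
        surv *= 1 - grp["e"].sum() / at_risk  # multiply in this time step's survival
        at_risk -= len(grp)
        times.append(t)
        curve.append(surv)
    return pd.Series(curve, index=times)

# Synthetic book: months until default, right-censored at the 12-month window
rng = np.random.default_rng(42)
months = np.minimum(rng.exponential(scale=18, size=500).round() + 1, 12)
defaulted = (months < 12).astype(int)

baseline = kaplan_meier(months, defaulted)

# Under proportional hazards, a customer with hazard ratio HR = exp(beta . x)
# has S_i(t) = S0(t) ** HR. We assume HR = 2.0 here purely for illustration.
hr = 2.0
risky_curve = baseline**hr

print(baseline.iloc[-1], risky_curve.iloc[-1])
```

The gap between the two 12-month survival probabilities is exactly the signal that lets a credit team prioritize which accounts get an intervention or refinancing offer first.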
Supervised ML / Regression

Predictive Liability & Risk Engine

The Challenge: Traditional actuarial methods rely on broad averages, leading to over-reserving (trapped capital) or under-reserving (P&L shocks) for liability claims.

The Solution: Architected an end-to-end PySpark ML pipeline on Databricks. Engineered a Gradient Boosted Tree model that captures non-linear risk interactions, outperforming standard GLMs, and tracked deployment via MLflow with models registered in Unity Catalog.

The Impact: Demonstrates the power of non-linear risk modeling via decile analysis. This framework successfully isolates routine claims from catastrophic shock losses, unlocking the ability to optimize capital reserves and pricing strategies.

View Code Repository
Databricks PySpark GBT MLflow MLOps
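The production pipeline runs on Databricks with Spark's GBT implementation; the modeling idea, a tree ensemble capturing a non-linear interaction a linear GLM would miss, then validated by decile analysis, can be sketched with scikit-learn on synthetic claims (all feature names hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic liability claims: severity depends non-linearly on the features
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "claimant_age": rng.uniform(18, 80, n),
    "injury_score": rng.uniform(0, 10, n),
    "litigated": rng.integers(0, 2, n),
})
# Interaction term (litigated AND severe injury) that linear terms cannot capture
severity = (
    2000
    + 50 * X["claimant_age"]
    + 800 * X["injury_score"] ** 1.5
    + 15000 * X["litigated"] * (X["injury_score"] > 7)
    + rng.normal(0, 2000, n)
)

X_tr, X_te, y_tr, y_te = train_test_split(X, severity, random_state=0)
gbt = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)

# Decile analysis: rank held-out claims by predicted severity,
# then compare actual mean severity per decile
scored = pd.DataFrame({"pred": gbt.predict(X_te), "actual": y_te})
scored["decile"] = pd.qcut(scored["pred"], 10, labels=False)
lift = scored.groupby("decile")["actual"].mean()
print(lift)
```

A clean separation between the bottom and top deciles is what justifies reserving capital differently for routine claims versus shock losses.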
Supervised ML / Classification

Strategic Retention Engine

The Challenge: Reactive call-center retention efforts led to high policyholder churn rates and bloated "Cost-to-Serve" operational metrics for enterprise insurance clients.

The Solution: Architected a Databricks pipeline leveraging PySpark and XGBoost to predict customer churn probability based on historical interaction frequency, billing friction, and policy metadata.

The Impact: Provides a predictive blueprint for proactive intervention. By isolating the key drivers of churn, this engine enables operations teams to reduce Cost-to-Serve and strategically protect Customer Lifetime Value (LTV).

View Code Repository
Databricks PySpark XGBoost Classification
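The deployed engine uses XGBoost on PySpark; the churn-scoring pattern, a boosted classifier plus a feature-importance ranking that names the churn drivers, can be sketched with scikit-learn's GradientBoostingClassifier standing in for XGBoost, on synthetic policyholder data (hypothetical feature names):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic policyholders: churn driven by billing friction and call volume
rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({
    "calls_last_90d": rng.poisson(2, n),
    "failed_payments": rng.poisson(0.5, n),
    "tenure_years": rng.uniform(0, 15, n),
})
logit = (-2 + 0.9 * df["failed_payments"] + 0.4 * df["calls_last_90d"]
         - 0.15 * df["tenure_years"])
df["churned"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(
    df.drop(columns="churned"), df["churned"], random_state=0
)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
# Rank the churn drivers so operations knows where to intervene first
drivers = pd.Series(clf.feature_importances_, index=X_tr.columns).sort_values(
    ascending=False
)
print(f"AUC: {auc:.2f}")
print(drivers)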
Unsupervised ML / Clustering

Customer Segmentation & Acquisition

The Challenge: A "spray and pray" digital marketing strategy led to high Customer Acquisition Costs (CAC) and low conversion rates across diverse demographic funnels.

The Solution: Built a data generation engine to simulate 100k+ realistic inbound leads. Designed a PySpark pipeline utilizing VectorAssembler and StandardScaler (z-score normalization) to feed a K-Means clustering algorithm.

The Impact: Delivers a robust clustering framework that uncovers distinct behavioral personas, empowering marketing teams to replace generic campaigns with dynamic, personalized routing to optimize Customer Acquisition Cost (CAC).

View Code Repository
Databricks PySpark K-Means Feature Scaling
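The production pipeline uses Spark's VectorAssembler, StandardScaler, and K-Means; the same scale-then-cluster logic can be sketched with scikit-learn on a small synthetic lead pool. The three personas and their features are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic inbound leads drawn around three behavioural personas
rng = np.random.default_rng(1)
personas = [
    {"age": 25, "income": 35_000, "sessions": 12},  # young, digitally active
    {"age": 45, "income": 95_000, "sessions": 3},   # affluent, low touch
    {"age": 65, "income": 55_000, "sessions": 6},   # retiree
]
leads = np.vstack([
    np.column_stack([
        rng.normal(p["age"], 4, 300),
        rng.normal(p["income"], 8_000, 300),
        rng.normal(p["sessions"], 1.5, 300),
    ])
    for p in personas
])

# Z-score scaling first: otherwise income's raw scale dominates the
# Euclidean distances K-Means relies on
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = model.fit_predict(leads)

# Profile each cluster so marketing can name the personas and route campaigns
for k in range(3):
    print(k, leads[labels == k].mean(axis=0).round(1))
```

Scaling before clustering is the design choice that matters here; without it, a feature measured in dollars swamps one measured in sessions.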
Supervised ML / Time-Series

Enterprise Demand Forecasting

The Challenge: Reactive agent scheduling and manual volume forecasting lead to under-staffing during peak hours and wasted operational expenditure during lulls.

The Solution: Built a PySpark Pandas UDF architecture on Databricks to distribute the training of Facebook Prophet models. This enables simultaneous, parallel forecasting across multiple business departments.

The Impact: Establishes a highly scalable forecasting engine that predicts daily demand up to 365 days out, allowing operations teams to optimize workforce allocation and strictly control Cost-to-Serve.

View Code Repository
Databricks Pandas UDFs Prophet
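On Databricks this is a grouped Pandas UDF (applyInPandas) that trains one Prophet model per department in parallel. The same per-group pattern can be sketched in plain pandas, with a seasonal-naive forecaster (mean volume per weekday) standing in for Prophet so the sketch has no extra dependencies; department names and volumes are synthetic:

```python
import numpy as np
import pandas as pd

# Synthetic daily contact volume for three departments, with weekly seasonality
rng = np.random.default_rng(3)
dates = pd.date_range("2024-01-01", periods=112, freq="D")
history = pd.concat([
    pd.DataFrame({
        "ds": dates,
        "y": base + 30 * (dates.dayofweek < 5) + rng.normal(0, 5, len(dates)),
    })
    for base in (100, 60, 40)
], keys=["claims", "billing", "sales"], names=["department"]).reset_index(level=0)

def forecast_one(group: pd.DataFrame, horizon: int = 28) -> pd.DataFrame:
    """Per-department forecast. In the Databricks version this function body
    calls Prophet and is distributed via applyInPandas; here a seasonal-naive
    model (average volume per weekday) keeps the sketch dependency-free."""
    weekday_mean = group.groupby(group["ds"].dt.dayofweek)["y"].mean()
    future = pd.date_range(group["ds"].max() + pd.Timedelta(days=1), periods=horizon)
    return pd.DataFrame({"ds": future, "yhat": weekday_mean.loc[future.dayofweek].values})

# The groupby-apply pattern: each group is forecast independently,
# which is exactly what Spark parallelises across the cluster
forecasts = pd.concat(
    {dept: forecast_one(g) for dept, g in history.groupby("department")},
    names=["department"],
).reset_index(level=0)
print(forecasts.groupby("department")["yhat"].mean().round(1))
```

The key property is that `forecast_one` only ever sees one department's frame; swapping the naive model for Prophet changes nothing about the distribution pattern.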
Supervised ML / Survival Analysis

Industrial Predictive Maintenance

The Challenge: Legacy maintenance strategies rely on reactive repairs or fixed schedules, leading to unplanned downtime and bloated operational costs.

The Solution: Engineered an end-to-end PySpark pipeline on Databricks using synthetic IoT sensor data. Trained an Accelerated Failure Time (AFT) Survival Regression model to predict Remaining Useful Life (RUL).

The Impact: Quantifies the exact lifespan impact of industrial friction (voltage spikes, vibration), providing a mathematical foundation to shift operational strategies from reactive repairs to predictive, just-in-time maintenance.

View Code Repository
Databricks PySpark AFT Unity Catalog
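The pipeline uses Spark's AFTSurvivalRegression; the underlying AFT idea, log-lifetime as a linear function of stressors, so each coefficient is a multiplicative effect on lifespan, can be sketched in plain NumPy. This simplified fit ignores right-censoring (it assumes every unit ran to failure), and all sensor features and coefficients are synthetic:

```python
import numpy as np

# Synthetic IoT sensor data: lifespan shrinks with vibration and voltage spikes
rng = np.random.default_rng(5)
n = 2000
vibration = rng.uniform(0, 1, n)      # normalised vibration level
voltage_spikes = rng.poisson(2, n)    # spike count per week

# AFT ground truth: log(T) = b0 + b1*vibration + b2*spikes + noise
log_t = 6.0 - 1.2 * vibration - 0.3 * voltage_spikes + rng.normal(0, 0.2, n)
lifetimes = np.exp(log_t)             # hours until failure

# Fit AFT coefficients by least squares on log-lifetimes
# (Spark's AFTSurvivalRegression additionally handles censored units)
X = np.column_stack([np.ones(n), vibration, voltage_spikes])
beta, *_ = np.linalg.lstsq(X, np.log(lifetimes), rcond=None)

# Remaining Useful Life: exp(x . beta) is the predicted lifetime scale,
# and exp(b1) is the lifespan multiplier per unit of vibration
healthy = np.exp(beta @ [1, 0.1, 0])
stressed = np.exp(beta @ [1, 0.9, 5])
print(f"healthy: {healthy:.0f}h, stressed: {stressed:.0f}h")
print(f"lifespan multiplier per vibration unit: {np.exp(beta[1]):.2f}x")
```

Reading coefficients as lifespan multipliers is what turns the model into a maintenance schedule: a stressed machine's predicted RUL tells the team exactly when to service it.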