Open to Opportunities

Data Scientist &
Credit Strategy Professional.

Bridging advanced predictive analytics, scalable ML architecture, and cross-functional business execution.

Drawing on over a decade of experience across banking, insurance, and tech manufacturing, I specialize in applying cross-industry predictive frameworks to engineer modern credit and growth strategies. With expertise in Python, Databricks, and SQL, I design end-to-end analytical pipelines. Beyond the code, I thrive as a strategic bridge: translating complex data models to secure executive buy-in, mentoring technical teams, and partnering directly with marketing and business units to execute highly targeted campaigns.

View My Work Get In Touch

Core Competencies

Technical Stack

Python SQL Databricks PySpark VS Code

Data Science

Propensity Modeling Predictive Analytics Feature Engineering A/B Testing

Credit & Strategy

Portfolio Optimization Risk vs Reward Model Governance Product Cross-selling

Leadership

Executive Presentations Stakeholder Alignment Cross-Team Execution Technical Mentorship

Strategic Projects

Case studies demonstrating my ability to navigate complex enterprise data systems, engineer predictive features, and collaborate across teams to optimize business outcomes.

Supervised ML / Survival Analysis

Fintech Probability of Default Engine

The Challenge: Legacy binary classification models predict only whether a customer will default, not when the business will incur the loss.

The Solution: Engineered an end-to-end AWS cloud pipeline using Python and Boto3. Developed a Cox Proportional Hazards model to calculate dynamic Hazard Ratios and 12-month survival curves.

The Impact: Establishes a scalable cloud architecture that translates hazard ratios into actionable intelligence, enabling credit teams to deploy targeted interventions and refinancing offers before the critical default window.

View Code Repository
AWS S3 CoxPH Model Boto3
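The repository implements this on AWS with Boto3 and a fitted CoxPH model; the core survival logic can be sketched dependency-free with a Kaplan-Meier baseline curve and the proportional-hazards relation S_i(t) = S0(t)^HR. All data below is synthetic and the hazard ratio is an assumed illustration, not a fitted value:

```python
import numpy as np
import pandas as pd

def kaplan_meier(durations, events):
    """Baseline survival curve S0(t) via the Kaplan-Meier estimator."""
    df = pd.DataFrame({"t": durations, "e": events}).sort_values("t")
    at_risk = len(df)
    surv = 1.0
    times, curve = [], []
    for t, grp in df.groupby("t"):
        surv *= 1 - grp["e"].sum() / at_risk  # multiply in this time step's survival
        at_risk -= len(grp)
        times.append(t)
        curve.append(surv)
    return pd.Series(curve, index=times)

# Synthetic book: months until default, right-censored at the 12-month window
rng = np.random.default_rng(42)
months = np.minimum(rng.exponential(scale=18, size=500).round() + 1, 12)
defaulted = (months < 12).astype(int)

baseline = kaplan_meier(months, defaulted)

# Under proportional hazards, a customer with hazard ratio HR = exp(beta . x)
# has S_i(t) = S0(t) ** HR. We assume HR = 2.0 here purely for illustration.
hr = 2.0
risky_curve = baseline**hr

print(baseline.iloc[-1], risky_curve.iloc[-1])
```

The gap between the two 12-month survival probabilities is exactly the signal that lets a credit team prioritize which accounts get an intervention or refinancing offer first.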
Supervised ML / Regression

Predictive Liability & Risk Engine

The Challenge: Traditional actuarial methods rely on broad averages, leading to over-reserving (trapped capital) or under-reserving (P&L shocks) for liability claims.

The Solution: Architected an end-to-end PySpark ML pipeline on Databricks. Engineered a Gradient Boosted Tree model that captures non-linear risk interactions, outperforming standard GLMs, and tracked deployment via MLflow with models registered in Unity Catalog.

The Impact: Demonstrates the power of non-linear risk modeling via decile analysis. This framework successfully isolates routine claims from catastrophic shock losses, unlocking the ability to optimize capital reserves and pricing strategies.

View Code Repository
Databricks PySpark GBT MLflow MLOps
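The production pipeline runs on Databricks with Spark's GBT implementation; the modeling idea, a tree ensemble capturing a non-linear interaction a linear GLM would miss, then validated by decile analysis, can be sketched with scikit-learn on synthetic claims (all feature names hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic liability claims: severity depends non-linearly on the features
rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "claimant_age": rng.uniform(18, 80, n),
    "injury_score": rng.uniform(0, 10, n),
    "litigated": rng.integers(0, 2, n),
})
# Interaction term (litigated AND severe injury) that linear terms cannot capture
severity = (
    2000
    + 50 * X["claimant_age"]
    + 800 * X["injury_score"] ** 1.5
    + 15000 * X["litigated"] * (X["injury_score"] > 7)
    + rng.normal(0, 2000, n)
)

X_tr, X_te, y_tr, y_te = train_test_split(X, severity, random_state=0)
gbt = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)

# Decile analysis: rank held-out claims by predicted severity,
# then compare actual mean severity per decile
scored = pd.DataFrame({"pred": gbt.predict(X_te), "actual": y_te})
scored["decile"] = pd.qcut(scored["pred"], 10, labels=False)
lift = scored.groupby("decile")["actual"].mean()
print(lift)
```

A clean separation between the bottom and top deciles is what justifies reserving capital differently for routine claims versus shock losses.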
Supervised ML / Classification

Strategic Retention Engine

The Challenge: Reactive call-center retention efforts led to high policyholder churn rates and bloated "Cost-to-Serve" operational metrics for enterprise insurance clients.

The Solution: Architected a Databricks pipeline leveraging PySpark and XGBoost to predict customer churn probability based on historical interaction frequency, billing friction, and policy metadata.

The Impact: Provides a predictive blueprint for proactive intervention. By isolating the key drivers of churn, this engine enables operations teams to reduce Cost-to-Serve and strategically protect Customer Lifetime Value (LTV).

View Code Repository
Databricks PySpark XGBoost Classification
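The deployed engine uses XGBoost on PySpark; the churn-scoring pattern, a boosted classifier plus a feature-importance ranking that names the churn drivers, can be sketched with scikit-learn's GradientBoostingClassifier standing in for XGBoost, on synthetic policyholder data (hypothetical feature names):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic policyholders: churn driven by billing friction and call volume
rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({
    "calls_last_90d": rng.poisson(2, n),
    "failed_payments": rng.poisson(0.5, n),
    "tenure_years": rng.uniform(0, 15, n),
})
logit = (-2 + 0.9 * df["failed_payments"] + 0.4 * df["calls_last_90d"]
         - 0.15 * df["tenure_years"])
df["churned"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(
    df.drop(columns="churned"), df["churned"], random_state=0
)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
# Rank the churn drivers so operations knows where to intervene first
drivers = pd.Series(clf.feature_importances_, index=X_tr.columns).sort_values(
    ascending=False
)
print(f"AUC: {auc:.2f}")
print(drivers)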
Unsupervised ML / Clustering

Customer Segmentation & Acquisition

The Challenge: A "spray and pray" digital marketing strategy led to high Customer Acquisition Costs (CAC) and low conversion rates across diverse demographic funnels.

The Solution: Built a data generation engine to simulate 100k+ realistic inbound leads. Designed a PySpark pipeline utilizing VectorAssembler and StandardScaler (z-score normalization) to feed a K-Means clustering algorithm.

The Impact: Delivers a robust clustering framework that uncovers distinct behavioral personas, empowering marketing teams to replace generic campaigns with dynamic, personalized routing to optimize Customer Acquisition Cost (CAC).

View Code Repository
Databricks PySpark K-Means Feature Scaling
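The production pipeline uses Spark's VectorAssembler, StandardScaler, and K-Means; the same scale-then-cluster logic can be sketched with scikit-learn on a small synthetic lead pool. The three personas and their features are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic inbound leads drawn around three behavioural personas
rng = np.random.default_rng(1)
personas = [
    {"age": 25, "income": 35_000, "sessions": 12},  # young, digitally active
    {"age": 45, "income": 95_000, "sessions": 3},   # affluent, low touch
    {"age": 65, "income": 55_000, "sessions": 6},   # retiree
]
leads = np.vstack([
    np.column_stack([
        rng.normal(p["age"], 4, 300),
        rng.normal(p["income"], 8_000, 300),
        rng.normal(p["sessions"], 1.5, 300),
    ])
    for p in personas
])

# Z-score scaling first: otherwise income's raw scale dominates the
# Euclidean distances K-Means relies on
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = model.fit_predict(leads)

# Profile each cluster so marketing can name the personas and route campaigns
for k in range(3):
    print(k, leads[labels == k].mean(axis=0).round(1))
```

Scaling before clustering is the design choice that matters here; without it, a feature measured in dollars swamps one measured in sessions.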
Supervised ML / Time-Series

Enterprise Demand Forecasting

The Challenge: Reactive agent scheduling and manual volume forecasting lead to under-staffing during peak hours and wasted operational expenditure during lulls.

The Solution: Built a PySpark Pandas UDF architecture on Databricks to distribute the training of Facebook Prophet models. This enables simultaneous, parallel forecasting across multiple business departments.

The Impact: Establishes a highly scalable forecasting engine that predicts daily demand up to 365 days out, allowing operations teams to optimize workforce allocation and strictly control Cost-to-Serve.

View Code Repository
Databricks Pandas UDFs Prophet
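On Databricks this is a grouped Pandas UDF (applyInPandas) that trains one Prophet model per department in parallel. The same per-group pattern can be sketched in plain pandas, with a seasonal-naive forecaster (mean volume per weekday) standing in for Prophet so the sketch has no extra dependencies; department names and volumes are synthetic:

```python
import numpy as np
import pandas as pd

# Synthetic daily contact volume for three departments, with weekly seasonality
rng = np.random.default_rng(3)
dates = pd.date_range("2024-01-01", periods=112, freq="D")
history = pd.concat([
    pd.DataFrame({
        "ds": dates,
        "y": base + 30 * (dates.dayofweek < 5) + rng.normal(0, 5, len(dates)),
    })
    for base in (100, 60, 40)
], keys=["claims", "billing", "sales"], names=["department"]).reset_index(level=0)

def forecast_one(group: pd.DataFrame, horizon: int = 28) -> pd.DataFrame:
    """Per-department forecast. In the Databricks version this function body
    calls Prophet and is distributed via applyInPandas; here a seasonal-naive
    model (average volume per weekday) keeps the sketch dependency-free."""
    weekday_mean = group.groupby(group["ds"].dt.dayofweek)["y"].mean()
    future = pd.date_range(group["ds"].max() + pd.Timedelta(days=1), periods=horizon)
    return pd.DataFrame({"ds": future, "yhat": weekday_mean.loc[future.dayofweek].values})

# The groupby-apply pattern: each group is forecast independently,
# which is exactly what Spark parallelises across the cluster
forecasts = pd.concat(
    {dept: forecast_one(g) for dept, g in history.groupby("department")},
    names=["department"],
).reset_index(level=0)
print(forecasts.groupby("department")["yhat"].mean().round(1))
```

The key property is that `forecast_one` only ever sees one department's frame; swapping the naive model for Prophet changes nothing about the distribution pattern.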
Supervised ML / Survival Analysis

Industrial Predictive Maintenance

The Challenge: Legacy maintenance strategies rely on reactive repairs or fixed schedules, leading to unplanned downtime and bloated operational costs.

The Solution: Engineered an end-to-end PySpark pipeline on Databricks using synthetic IoT sensor data. Trained an Accelerated Failure Time (AFT) Survival Regression model to predict Remaining Useful Life (RUL).

The Impact: Quantifies the exact lifespan impact of industrial friction (voltage spikes, vibration), providing a mathematical foundation to shift operational strategies from reactive repairs to predictive, just-in-time maintenance.

View Code Repository
Databricks PySpark AFT Unity Catalog
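The pipeline uses Spark's AFTSurvivalRegression; the underlying AFT idea, log-lifetime as a linear function of stressors, so each coefficient is a multiplicative effect on lifespan, can be sketched in plain NumPy. This simplified fit ignores right-censoring (it assumes every unit ran to failure), and all sensor features and coefficients are synthetic:

```python
import numpy as np

# Synthetic IoT sensor data: lifespan shrinks with vibration and voltage spikes
rng = np.random.default_rng(5)
n = 2000
vibration = rng.uniform(0, 1, n)      # normalised vibration level
voltage_spikes = rng.poisson(2, n)    # spike count per week

# AFT ground truth: log(T) = b0 + b1*vibration + b2*spikes + noise
log_t = 6.0 - 1.2 * vibration - 0.3 * voltage_spikes + rng.normal(0, 0.2, n)
lifetimes = np.exp(log_t)             # hours until failure

# Fit AFT coefficients by least squares on log-lifetimes
# (Spark's AFTSurvivalRegression additionally handles censored units)
X = np.column_stack([np.ones(n), vibration, voltage_spikes])
beta, *_ = np.linalg.lstsq(X, np.log(lifetimes), rcond=None)

# Remaining Useful Life: exp(x . beta) is the predicted lifetime scale,
# and exp(b1) is the lifespan multiplier per unit of vibration
healthy = np.exp(beta @ [1, 0.1, 0])
stressed = np.exp(beta @ [1, 0.9, 5])
print(f"healthy: {healthy:.0f}h, stressed: {stressed:.0f}h")
print(f"lifespan multiplier per vibration unit: {np.exp(beta[1]):.2f}x")
```

Reading coefficients as lifespan multipliers is what turns the model into a maintenance schedule: a stressed machine's predicted RUL tells the team exactly when to service it.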