Arun

Hi, I'm Arun

Data Scientist | AI/ML Engineer | Data Analyst | Data Engineer

Passionate about solving problems using data and AI. 5 years of experience turning real-world data into actionable insights and solutions.

 

Summary

I am a data science and engineering professional with 5 years of experience spanning the technology, mining, and manufacturing industries. I hold a Master’s degree in Data Science from Rutgers University and a dual degree in Engineering from IIT Dhanbad. I am proficient in Python, R, SQL, and modern data tools, with hands-on expertise across Data Science, Artificial Intelligence & Machine Learning, Data Analysis, and Data Engineering, applied to drive revenue growth and optimize critical business metrics. I thrive in cross-functional environments, transforming complex data into actionable insights and leading projects from ideation to deployment. I bring strong leadership and team coordination skills, analytical thinking, clear communication, adaptability, and a collaborative approach. My business acumen enables me to align data-driven solutions with strategic objectives and deliver measurable value for the organisation. Curiosity-driven and committed to continuous learning, I’m passionate about using data and technology to solve real-world problems and contribute to high-impact, innovative initiatives.

LANGUAGES

Python
SQL
R
JavaScript
TypeScript
C++
HTML

Machine Learning

Pandas
NumPy
Matplotlib
TensorFlow
PyTorch
scikit-learn
Seaborn
Hugging Face
Keras

Generative AI

OpenAI
Llama 2
Mistral
LangChain
Natural Language Processing
Retrieval Augmented Generation
Agentic AI
Vector DB

DATABASES

PostgreSQL
MySQL
MongoDB
DuckDB
Elasticsearch
Snowflake

INFRASTRUCTURE

AWS Cloud Services
Azure Cloud Services
Google Cloud Platform
Docker
Kubernetes
GitHub
GitLab

TOOLS

Visual Studio Code
Power BI
MS Office
ChatGPT
Git

FRONTEND

CSS
React
Three.js

BACKEND

Django
Node.js
Next.js
 

Work Experience

 

Selected Projects

Domain-Specific Medical LLM Fine-Tuning using LoRA & Scalable QA Pipeline

Engineered a scalable, parameter-efficient fine-tuning pipeline for Llama 2 using LoRA adapters, achieving faster training and lower GPU memory usage. Implemented custom instruction-style prompting and used float16 precision to further accelerate training, roughly halve inference time, and reduce GPU memory usage, while maintaining high accuracy with minimal degradation from the reduced precision.
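As a rough illustration of why LoRA adapters and float16 reduce memory, the sketch below counts trainable parameters for a single linear layer. The layer size (4096 x 4096) and rank (8) are assumed values chosen for illustration, not figures from this project, and the function names are hypothetical.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # updating the dense weight W directly
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora


def bytes_for(params, bits):
    """Memory needed to store `params` values at the given bit width."""
    return params * bits // 8


# Assumed layer size and rank, for illustration only.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(full, lora, full // lora)                    # 16777216 65536 256
print(bytes_for(full, 32), bytes_for(full, 16))    # 67108864 33554432
```

Under these assumptions the adapter trains 256 times fewer parameters than the dense layer, and storing weights in float16 takes half the bytes of float32.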

Fraud Detection with a novel math-driven oversampling strategy outperforming ADASYN, SMOTE and TabNet (attention-based architecture from Google for tabular data)

Developed a novel math-driven oversampling strategy that identifies and amplifies hard-to-classify samples using a custom probability-based scoring system. Engineered dynamic, adaptive sampling that targets rare and difficult instances, boosting the model's focus where it matters most. Tuned class balance precisely through adaptive lambda scaling, avoiding dataset bloat. Rigorously benchmarked against SMOTE, ADASYN, TabNet, and the base model, consistently outperforming them across metrics with far fewer oversampled instances. Ensured robust evaluation with strict test set isolation, avoiding data leakage. This solution delivers sharper classification by strategically reinforcing model weaknesses.
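The general idea of probability-based hard-sample oversampling can be sketched as follows. This is not the project's actual scoring system, which is not described here. It is a minimal illustration that assumes hardness is `1 - p`, where `p` is the model's predicted probability for the true class, and that `lam` scales the number of extra samples relative to the minority-class size. All names are hypothetical.

```python
import random


def hardness_scores(p_true):
    """Score each minority sample by how unsure the model is about it.

    Assumption: hardness = 1 - predicted probability of the true class.
    """
    return [1.0 - p for p in p_true]


def oversample_indices(p_true, lam, seed=0):
    """Pick minority-sample indices to duplicate, favouring hard samples.

    Assumption: `lam` sets the oversampling budget as a fraction of the
    minority-class size, so a small `lam` keeps the dataset from bloating.
    """
    scores = hardness_scores(p_true)
    budget = round(lam * len(p_true))
    if budget == 0 or sum(scores) == 0:
        return []
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return rng.choices(range(len(p_true)), weights=scores, k=budget)


# Illustrative probabilities for four minority-class training samples.
picks = oversample_indices([0.9, 0.2, 1.0, 0.5], lam=1.5)
print(picks)
```

To match the leakage safeguards described above, the probabilities would come from training data only, so the held-out test set never influences which samples are amplified.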

AI-Driven Optimization for Maximizing Metal Recovery in Mining

Optimised reagent dosing in iron ore flotation using a Random Forest model, automating the process while minimising chemical waste. Engineered and refined a large-scale industrial dataset (580,000+ records, 29 features), conducting rigorous feature selection using OLS p-values and multicollinearity diagnostics. Delivered a robust, data-driven solution that enhanced process efficiency and supported sustainable extraction.

Sentiment Analysis of Popular Songs' Lyrics

Analysed over 169,000 top Spotify tracks using R to uncover evolving trends in music sentiment and audio attributes. Conducted an in-depth exploration of features, highlighting how musical composition shifted across decades and artist eras. Delivered key findings showing modern songs trend toward higher energy, louder volumes, and increased danceability, while exhibiting lower acousticness and instrumentation.

Time Series Analysis with Conformal Prediction

Benchmarked traditional ARIMA against the state-of-the-art TSDiff diffusion model for probabilistic stock price forecasting using Yahoo Finance data across various tickers. Engineered a forecasting pipeline to evaluate model accuracy and trend-tracking capabilities on highly volatile time series. Leveraged ARIMA with dynamic retraining and TSDiff with quantile-guided DDPM sampling, generating 100 probabilistic samples per prediction. Analysed performance via MAE and MSE, revealing ARIMA's superior precision on short-horizon forecasts. Delivered key insights into the limitations of diffusion-based models on non-periodic financial data, while showcasing practical strengths of classical statistical methods.
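The evaluation step can be sketched as below: each set of probabilistic samples is collapsed to a point forecast and scored with MAE and MSE. Using the sample median as the point forecast is an assumption, the prices are made up, and three samples per step stand in for the 100 used in the project.

```python
from statistics import median


def point_forecast(samples):
    """Collapse probabilistic samples to one value (assumed: the median)."""
    return median(samples)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up closing prices and sampled forecasts for two time steps.
actual = [100.0, 102.0]
sample_sets = [[99, 101, 100], [104, 103, 105]]

predicted = [point_forecast(s) for s in sample_sets]
print(mae(actual, predicted))  # 1.0
print(mse(actual, predicted))  # 2.0
```

MSE penalises large misses more heavily than MAE, so reporting both shows whether a model's errors are evenly spread or dominated by a few large deviations.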

 

Leadership, Honors and Extracurricular Activities

Scored in the 98.7th percentile in the Common Admission Test (CAT) 2022 among 300,000 candidates, earning interviews at India’s top management institutes.

@ Common Admission Test-2022

Ranked in the top 0.8% out of 1 million candidates in the IIT-Joint Entrance Examination 2013, earning admission to IIT Dhanbad.

@ IIT-Joint Entrance Examination (Advanced)-2013

Elected as Placement Representative and Coordinator, Student Body, IIT Dhanbad. The elected representative speaks for the entire class in the college's Training and Placement (job) cell.

@ IIT Dhanbad

Founded "Vakta [Orator]", a Toastmasters-inspired club at IIT Dhanbad, empowering 100+ students in public speaking and interview readiness.

@ IIT Dhanbad

Completed 500+ hours of active practice in presentations, public speaking, and body language over four years, resulting in significant improvement in communication skills.

@ IIT Dhanbad

Community Service: Volunteered as a Math Teacher with "Kartavya [Duty]" (2014-2017), supporting over 50 students at one of the largest student-run NGOs in India.

@ IIT Dhanbad