Academic Projects

As part of my Master's in Data Science at Deakin University, ranked 197th globally by QS World Rankings, I completed an extensive portfolio of 32 academic projects across diverse subjects, including Machine Learning, Modern Data Science, Real World Analytics, Data Wrangling, and Engineering AI Solutions. Each project was designed to address real-world challenges using cutting-edge tools and techniques, ranging from predictive analytics and optimization to advanced AI bot development. These experiences have provided me with a well-rounded skill set in data science and AI, equipping me to drive data-driven solutions in any professional setting. My academic performance culminated in a High Distinction with a score of 91.3%, further validating my dedication and expertise.

Engineering AI Solutions (SIG788) - 8 Projects

  • Developed the Shortage Product AI Bot using OpenAI, Azure Cognitive Services (AI Search, Blob Storage, Speech, and Computer Vision), and Flask, leveraging PSNC drug shortage data to provide personalized recommendations and address pharmaceutical supply chain challenges through multimodal interactions (text, voice, and image) (High Distinction) Reference

  • Built the Drug Tariff Advisor Bot using Azure Cognitive Language Studio and Python SDK to provide healthcare professionals in England with instant access to NHS drug tariff information, enhancing decision-making and inventory management through custom question-answering capabilities integrated with Webchat and Facebook Messenger channels (Distinction) Reference

  • Built an Object Detection model for vehicle tracking using Azure's Custom Vision API, training on frames extracted from videos across categories such as cars, buses, bicycles, and pedestrians, achieving a precision of 94.6% and an mAP of 75.0% and identifying areas for improvement on imbalanced data (Distinction) Reference

  • Conducted predictive analysis for chronic kidney disease using machine learning models (Decision Trees, Random Forests) with an end-to-end pipeline of data preprocessing, feature engineering, hyperparameter tuning, and evaluation on imbalanced datasets, achieving a robust Random Forest model for early CKD detection (Credit) Reference

  • Explored Azure OpenAI services to create advanced intelligent systems using Retrieval-Augmented Generation (RAG), embeddings, and hybrid retrieval for real-world applications, emphasizing responsible AI practices and multimodal inputs for enhanced system performance (Pass) Reference

  • Designed and deployed a predictive model for Chronic Kidney Disease using Azure Machine Learning Studio, Python SDK, and Random Forest, with MLflow for experiment tracking, enhancing early CKD detection through data preprocessing and model optimization (Pass) Reference

  • Utilized Azure Computer Vision API to detect objects, tags, and landmarks in images, drawing bounding boxes with Python SDK, achieving accurate object recognition and actionable insights for real-world applications like inventory management and traffic analysis (Pass) Reference

  • Explored machine learning workflows, software engineering practices for ML, and the application of fuzzy logic in healthcare supply chain management to optimize inventory levels, improve decision-making, and enhance operational efficiency (Pass) Reference

Machine Learning (SIG720) - 8 Projects

  • Conducted a comparative analysis of models (SVM, KNN, Decision Tree, Random Forest, DNN, and RNN) using the NSL-KDD dataset for intrusion detection, addressing overfitting and class imbalance through advanced data preprocessing and model optimization techniques (High Distinction) Reference

  • Applied advanced clustering methods (UMAP, DBSCAN, Spectral, OPTICS, Ensemble Clustering) on the TripAdvisor Review dataset to improve traveler grouping, and evaluated performance metrics like ARI, Silhouette Score, and Davies-Bouldin Index to identify optimal clustering approaches (Distinction) Reference

  • Utilized SCADI and Obesity datasets to evaluate clustering algorithms (Hierarchical, DBSCAN) and regression models (Linear Regression, Random Forest) with dimensionality reduction (PCA) and hyperparameter tuning for improved accuracy in self-care activity classification and obesity level prediction (Credit) Reference

  • Analyzed and optimized Logistic Regression, SVM, and KNN models for classification on the Yeast dataset using hyperparameter tuning, feature importance, and advanced evaluation techniques to improve generalization and address overfitting challenges (Credit) Reference

  • Compared Decision Trees, KNN, Random Forest, and Gradient Boosting models to evaluate performance on the HR-Employee-Attrition dataset (Pass) Reference

  • Implemented Logistic Regression, SVM, Decision Trees, and Gradient Boosting models with SMOTE to handle imbalanced HR-Employee-Attrition data (Pass) Reference

  • Refined Random Forest and Gradient Boosting models using hyperparameter tuning and SMOTE for better performance on an imbalanced dataset (Pass) Reference

  • Developed an MLP neural network to achieve high accuracy in handwritten digit recognition using the MNIST dataset with hyperparameter tuning (Pass) Reference
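
Several of the projects above rely on SMOTE to rebalance classes. The sketch below is not the coursework code (which would typically use a library implementation); it is a simplified, standard-library illustration of the core SMOTE idea: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point somewhere on the line segment between them. The function name `smote_like_oversample` and the toy data are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation.

    Simplified sketch of SMOTE: needs at least two minority samples,
    each given as a tuple of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples with two features each.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority_points, n_new=10, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies, which is why SMOTE is usually applied to the training split only.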

Data Wrangling (SIG731) - 8 Projects

  • Implemented machine learning algorithms (including Random Forest, Gradient Boosting, and XGBoost) on the Breast Cancer Wisconsin Diagnostic dataset to analyze predictive performance, optimize feature selection, and evaluate results using advanced metrics like ROC-AUC and F1 Score (High Distinction) Reference

  • Analyzed the Stack Overflow Dataset using Natural Language Processing (NLP) and machine learning techniques, including feature engineering, clustering, and classification, to understand user behavior, tag popularity, and community engagement patterns (High Distinction) Reference

  • Analyzed the Flights and Airports Dataset using Python and SQL to compare query execution efficiency and accuracy, integrating data on flight schedules, delays, weather, and planes to identify actionable insights for the aviation industry (Distinction) Reference

  • Analyzed hourly meteorological data from the NYC Flights Weather Dataset (2013) using Power BI to evaluate trends in wind speed, visibility, and precipitation across LaGuardia (LGA), JFK, and Newark (EWR) airports, enabling insights into operational efficiency and safety measures (Credit) Reference

  • Analyzed the NYC Flights Weather Dataset (2013) using Python to calculate and visualize daily mean wind speeds for LaGuardia Airport (LGA), employing pandas and matplotlib to highlight seasonal trends and their implications on flight operations (Pass) Reference

  • Analyzed the BMI Distribution Dataset to explore gender-specific trends using histograms, box plots, and statistical metrics, identifying outliers and insights into obesity prevalence and health risks (Pass) Reference

  • Analyzed Bitcoin to USD Daily Closing Rates (BTC-USD) for Q3 2023 using Yahoo Finance data, employing NumPy for statistical analysis (mean, median, IQR, etc.) and Matplotlib for visualizations to uncover market trends, price volatility, and outliers (Pass) Reference

  • Analyzed Body Mass Index (BMI) and New BMI values for individuals using custom Python scripts that calculate traditional and revised BMI (height exponent of 2.5), categorize the results, and visualize the data with Matplotlib for insights into health and fitness levels (Pass) Reference
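
To illustrate the BMI comparison in the last project, here is a minimal sketch rather than the original script. It assumes the revised formula is the commonly cited "New BMI" with a scaling factor of 1.3 alongside the exponent of 2.5, and it assumes the standard WHO cut-offs for the categories; neither detail is stated above, so both should be read as assumptions.

```python
def traditional_bmi(weight_kg: float, height_m: float) -> float:
    """Classic BMI: weight divided by height squared."""
    return weight_kg / height_m ** 2


def new_bmi(weight_kg: float, height_m: float) -> float:
    """Revised BMI with a height exponent of 2.5.

    The 1.3 scaling factor is an assumption taken from the commonly
    cited "New BMI" proposal; it is not stated in the project summary.
    """
    return 1.3 * weight_kg / height_m ** 2.5


def categorize(bmi: float) -> str:
    """Map a BMI value to a category using WHO cut-offs (assumed)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


# Hypothetical individual: 70 kg, 1.75 m.
old_value = traditional_bmi(70, 1.75)
new_value = new_bmi(70, 1.75)
print(f"traditional: {old_value:.2f} ({categorize(old_value)})")
print(f"new:         {new_value:.2f} ({categorize(new_value)})")
```

With the 1.3 factor, the two formulas agree at a height of 1.69 m; the revised value is lower for taller people and higher for shorter people.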

Modern Data Science (SIG742) - 5 Projects

  • Analyzed the Item Listings Dataset by removing duplicates, handling missing values, and splitting categories into meaningful segments. Explored pricing trends, brand impact, and category popularity through visualizations like boxplots and word clouds, providing actionable insights for pricing and category management Reference

  • Processed the NYC Taxi Demand Dataset by aggregating data into daily and hourly intervals, uncovering weekly seasonality through trend, seasonality, and residual decomposition. Built ARIMA and Holt-Winters models for demand forecasting and used Isolation Forest for anomaly detection, identifying outliers linked to holidays and events, enabling enhanced operational planning Reference

  • Analyzed Monty Hall gameplay scenarios using Python simulations to evaluate probability outcomes of sticking versus switching doors, validating theoretical insights with empirical evidence over 100,000 trials for statistical accuracy Reference

  • Analyzed Body Mass Index (BMI) values using Python scripts that compute both traditional and revised BMI (exponent 2.5), categorize health levels, and visualize trends with Matplotlib for insights into health and fitness Reference

  • Processed text descriptions using NLP techniques to generate word clouds and categorize content by price quantiles, revealing patterns in descriptive language and aligning them with pricing dynamics for actionable insights Reference
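
The Monty Hall simulation mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the submitted code: the function names, the fixed seed, and the three-door setup are my assumptions, while the 100,000-trial count comes from the project description.

```python
import random


def play_round(switch: bool, rng: random.Random) -> bool:
    """Play one Monty Hall round and report whether the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    host_opens = rng.choice([d for d in doors if d != first_pick and d != car])
    if switch:
        # Switch to the only remaining closed door.
        final_pick = next(d for d in doors if d not in (first_pick, host_opens))
    else:
        final_pick = first_pick
    return final_pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the win probability for a strategy over many trials."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stick:  {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

The estimates should land close to the theoretical values of 1/3 for sticking and 2/3 for switching.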

Real World Analytics (SIG718) - 3 Projects

Project Overview:

  • Developed and solved advanced Linear Programming (LP) models to optimize production schedules and bidding strategies in a competitive business environment. Incorporated practical applications of LP methods, sensitivity analysis, game theory, and R programming for decision-making and profit maximization.

Garment Factory Optimization:

  • Formulated and solved LP models to maximize profit for a garment factory producing shirts and pants under resource, labor, and demand constraints. Using Excel Solver and graphical methods, determined the optimal production quantities and conducted sensitivity analysis, demonstrating robustness in profit maximization strategies.
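
The graphical method used here can be illustrated with a short Python sketch (the project itself used Excel Solver and hand-drawn graphs). For a two-variable LP, the optimum lies at a corner of the feasible region, so the sketch intersects every pair of constraint lines, keeps the feasible intersections, and picks the most profitable one. All coefficients below are hypothetical and are not the figures from the assignment.

```python
from itertools import combinations

# Hypothetical data: (a, b, c) means a*shirts + b*pants <= c.
CONSTRAINTS = [
    (2.0, 3.0, 120.0),  # fabric available
    (1.0, 2.0, 70.0),   # labor hours available
    (1.0, 0.0, 45.0),   # demand cap on shirts
    (-1.0, 0.0, 0.0),   # shirts >= 0
    (0.0, -1.0, 0.0),   # pants >= 0
]
PROFIT = (20.0, 35.0)  # hypothetical profit per shirt, per pair of pants


def intersect(c1, c2):
    """Intersection of two constraint boundary lines (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel lines never meet in a single point
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det


def is_feasible(point, constraints, tol=1e-9):
    """Check that a point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def solve_graphically(constraints, profit):
    """Return the most profitable feasible corner point and its profit."""
    corners = []
    for c1, c2 in combinations(constraints, 2):
        point = intersect(c1, c2)
        if point is not None and is_feasible(point, constraints):
            corners.append(point)
    best = max(corners, key=lambda p: profit[0] * p[0] + profit[1] * p[1])
    return best, profit[0] * best[0] + profit[1] * best[1]


best_plan, best_profit = solve_graphically(CONSTRAINTS, PROFIT)
print(best_plan, best_profit)
```

For these made-up numbers the best corner is 30 shirts and 20 pairs of pants, where the fabric and labor constraints meet, with a profit of 1300.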

Textile Production Planning:

  • Designed LP models for optimizing the production of Bloom, Amber, and Leaf textile products under material proportion constraints. Solved models using R and Excel Solver, achieving an optimal profit of $141,850 while meeting demand and resource limitations.

Game Theory in Bidding:

  • Analyzed a two-player zero-sum bidding game using payoff matrices, saddle points, and mixed strategies. Constructed and solved LP models in R, determining optimal bidding strategies for Sky Construction Company. Sky alternated between $10M and $40M bids with equal probability, minimizing expected losses.
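
The saddle-point check and mixed-strategy calculation can be sketched as follows. This is a Python illustration under my own assumptions (the project solved the game as an LP in R): it handles only 2x2 games using the closed-form solution, and the payoff matrices are hypothetical examples, not Sky Construction's actual payoffs.

```python
def find_saddle_point(matrix):
    """Return (row, col, value) of a pure-strategy saddle point, or None.

    Payoffs are from the row player's point of view. A saddle point
    exists when the maximin of the rows equals the minimax of the columns.
    """
    row_minima = [min(row) for row in matrix]
    col_maxima = [max(col) for col in zip(*matrix)]
    if max(row_minima) != min(col_maxima):
        return None
    value = max(row_minima)
    for i, row in enumerate(matrix):
        for j, payoff in enumerate(row):
            if payoff == value and row_minima[i] == value and col_maxima[j] == value:
                return i, j, value
    return None


def solve_2x2_mixed(matrix):
    """Closed-form mixed strategies for a 2x2 game without a saddle point.

    Returns (p, q, value): p is the probability that the row player picks
    row 0, q the probability that the column player picks column 0.
    """
    if find_saddle_point(matrix) is not None:
        raise ValueError("game has a saddle point; use the pure strategies")
    (a, b), (c, d) = matrix
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical payoff matrix with no saddle point.
p, q, value = solve_2x2_mixed([[3, -1], [-2, 4]])
print(p, q, value)
```

For this example the row player mixes 60/40, the column player mixes 50/50, and the value of the game is 1. A matrix such as [[4, 2], [3, 1]] has a saddle point instead, so both players would use pure strategies.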

Technologies & Tools:

  • Linear Programming (LP)

  • Excel Solver, R/RStudio

  • Game Theory (Saddle Points & Mixed Strategies)

  • lpSolveAPI Library in R

Reference

As part of my Post Graduate Program in Artificial Intelligence and Machine Learning from Great Learning, certified by the University of Texas at Austin, a top-ranked global institution, I completed 12 projects in various domains like Deep Learning, NLP, Computer Vision, and Reinforcement Learning. These projects addressed real-world challenges using advanced techniques such as neural networks and AI-driven solutions. This rigorous program provided me with deep expertise in AI and ML, allowing me to craft innovative, data-driven strategies. I finished with an impressive A+ score of 98%, reflecting my dedication and proficiency in this dynamic field.

Post Graduate Program Projects - 12 Projects - References

  • Developed a Car Dataset Clustering Model using K-means clustering to categorize vehicles based on city-cycle fuel consumption and other attributes, enabling efficient segmentation and analysis of vehicle datasets.

  • Built a Vehicle Silhouette Classification Model using PCA for dimensionality reduction and supervised learning models, enabling accurate classification of vehicle types from diverse angles.

  • Created a Bank Marketing Prediction System leveraging machine learning to predict customer conversion potential during focused marketing campaigns, driving digital transformation strategies for a banking client.

  • Designed a Pharmacy Volume Estimation Framework using statistical and machine learning techniques to predict monthly prescriptions and volume for individual pharmacies, optimizing supply chain operations.

  • Developed a Basketball Analytics Dashboard by analyzing data of professional teams, providing actionable insights for managing investments in the league's top-performing teams.

  • Conducted Startup Ecosystem Analysis by evaluating datasets of startups, uncovering trends, and offering strategic insights for growth in the European market.

  • Developed the Pneumonia Detection AI Model using VGG19, YOLO, and deep learning techniques to diagnose pneumonia from chest X-rays with a classification accuracy of 99.34%, enhancing medical imaging analysis.

  • Built a Sequential NLP Model using LSTM and attention mechanisms for text classification tasks, enabling precise categorization of unstructured text data.

  • Created a Statistical NLP Sentiment Analysis System leveraging Naive Bayes and logistic regression models to analyze customer sentiment in textual datasets, improving sentiment-driven decision-making.

  • Enhanced Object Detection with Computer Vision by implementing advanced models like Faster R-CNN and YOLOv8, improving detection accuracy and real-time processing of visual data.