My Projects

A showcase of my work, including web applications, machine learning, and data engineering projects.

Spotify End-to-End Data Pipeline

Data Engineering

An end-to-end data pipeline that tries to replicate Spotify Wrapped functionality with a PowerBI dashboard, allowing users to view their personalized music insights at any time. The pipeline extracts data from the Spotify Web API, processes it using modern data engineering technologies, and presents user-friendly visualizations.

Python, Apache Kafka, Apache Airflow, MinIO Object Store, Great Expectations, Git, PostgreSQL, PowerBI, Spotify Web API, Docker

Source Code

Azure F1 End-to-End Data Pipeline

Data Engineering

This project builds a data pipeline on Azure using PySpark, Azure Data Lake Storage (ADLS), Azure Databricks, and Azure Data Factory. The pipeline moves data from the raw bronze layer to the refined gold layer, using incremental loading and external tables to support analytics. Delta Lake, an open-source storage layer, provides ACID transactions.
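To make the idea of incremental loading concrete, here is a minimal sketch in plain Python. It is not the project's code and does not use the Delta Lake or PySpark APIs: a dictionary stands in for the target table, and the field names (`race_id`, `updated_at`), the sample rows, and the integer watermark are all assumptions made for illustration. Only rows newer than the last watermark are merged, and a row either updates an existing key or inserts a new one.

```python
def incremental_upsert(target, batch, key, watermark_field, last_watermark):
    """Merge rows newer than last_watermark into target, keyed by `key`.

    Returns the new watermark: the highest value seen, or the old one
    if nothing in the batch was newer.
    """
    new_watermark = last_watermark
    for row in batch:
        if row[watermark_field] <= last_watermark:
            continue  # already loaded by an earlier run
        target[row[key]] = row  # insert a new key or overwrite an existing one
        if row[watermark_field] > new_watermark:
            new_watermark = row[watermark_field]
    return new_watermark


# Hypothetical target table and incoming batch.
silver = {1: {"race_id": 1, "winner": "Driver A", "updated_at": 1}}
batch = [
    {"race_id": 1, "winner": "Driver A", "updated_at": 1},  # old row, skipped
    {"race_id": 1, "winner": "Driver B", "updated_at": 2},  # update
    {"race_id": 2, "winner": "Driver C", "updated_at": 3},  # insert
]

watermark = incremental_upsert(
    silver, batch, key="race_id", watermark_field="updated_at", last_watermark=1
)
print(watermark, silver)
```

In a Delta Lake setting, the same update-or-insert behaviour would typically be expressed as a merge on the key column rather than a Python loop.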

Python, PySpark, ADLS Gen2, Azure Databricks, Azure Data Factory, RBAC, Azure Unity Catalog

Source Code

Keyword Spotting with Discrete vs. Continuous Audio Representations

Machine Learning

This study compares discrete versus continuous audio representations for keyword spotting, finding that while WavLM's continuous features achieved superior accuracy (0.92% error rate), discrete EnCodec representations demonstrated potential with sequence-based models despite information capture limitations.

Python, PyTorch, SpeechBrain, Transformers, Data Augmentation, Matplotlib, Hyperparameter Tuning, LSTM, CRDNN, RNN, Xvector

Source Code

CIFAR-10 Image Classification on a Limited Dataset

Machine Learning

This project achieved second place in a competition classifying CIFAR-10 images with only 50 training samples by comparing machine learning models (70% accuracy), custom neural networks (79.28%), and transfer learning (92% with ImageNet pre-training), demonstrating effective strategies for learning from extremely limited data.

PyTorch, Data Cleaning, Data Augmentation, Transfer Learning, Custom Neural Network Architecture

Source Code

LanguageMate: An AI-Powered Multilingual Voice Assistant

Web App

LanguageMate is an AI-powered voice assistant for multilingual voice interactions. The system integrates automatic speech recognition (ASR), natural language processing (NLP) via a large language model (LLM), and text-to-speech (TTS) to provide a complete conversational experience.

Python, FastAPI, React, Ollama, TTS model, ASR model, Locally hosted LLMs

Source Code

Stock Annual/Quarterly Report Analyzer

Web App

A Retrieval-Augmented Generation (RAG) application focused on financial data analysis 📈, built with Python, LangChain, Ollama, and Streamlit. The application combines Large Language Models (LLMs) with a vector database (ChromaDB) 🧠 to generate summaries and calculate important metrics based on the annual and quarterly returns filed by the company. It uses LangChain to orchestrate the RAG pipeline, Ollama to run local LLMs, and Streamlit to provide a user-friendly web interface.
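The retrieval step of a RAG pipeline can be sketched without any of the libraries above. The example below is an illustration only, not the application's code: word counts stand in for a real embedding model, a plain list stands in for ChromaDB, and the function names (`embed`, `retrieve`, `build_prompt`) and sample chunks are made up for this sketch. It ranks text chunks by cosine similarity to the question and places the best match into a prompt for the LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks, as if split from a quarterly filing.
chunks = [
    "Revenue for the quarter rose compared with the prior year",
    "The board approved a new share buyback program",
    "Operating expenses for the quarter were flat",
]

question = "what was revenue for the quarter"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to the locally hosted model; that call is left out here because it depends on the model and client in use.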

Python, Docker, Ollama, Agentic RAG, ChromaDB, LangChain, Streamlit

Source Code

Image Classification of Grocery Products

Machine Learning

This project benchmarks three CNN architectures (ResNet-18, GoogLeNet, and AlexNet) on three small image datasets (Vegetable Images, Freiburg Groceries, and Grocery Store), using data augmentation (flips, rotations, normalization), transfer learning, batch normalization, dropout, and Adam optimization to maximize classification accuracy.
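The augmentation steps named above can be illustrated on a tiny grayscale image stored as nested lists. This is a simplified sketch, not the project's PyTorch pipeline: the rotation is fixed at 90 degrees rather than a random angle, and the mean and standard deviation of 127.5 are assumed values that map 0..255 pixels to the range -1..1.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def normalize(img, mean, std):
    """Shift and scale every pixel: (p - mean) / std."""
    return [[(p - mean) / std for p in row] for row in img]


def augment(img, rng):
    """Randomly flip and rotate, then normalize (assumed mean/std of 127.5)."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = rotate90(img)
    return normalize(img, mean=127.5, std=127.5)


# Hypothetical 2x2 grayscale image with pixel values in 0..255.
img = [[0, 64], [128, 255]]

print(hflip(img))     # [[64, 0], [255, 128]]
print(rotate90(img))  # [[128, 0], [255, 64]]
print(augment(img, random.Random(0)))
```

Because each call to `augment` draws fresh random choices, the model sees slightly different versions of the same image across epochs, which is the point of augmentation on small datasets.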

Python, PyTorch, Pandas, Jupyter Notebooks, Hyperparameter Tuning, Data Augmentation

Source Code