My Projects
A showcase of my work, including web applications, machine learning, and data engineering projects.

Spotify End-to-End Data Pipeline
Data Engineering
An end-to-end data pipeline that aims to replicate Spotify Wrapped with a Power BI dashboard, so users can view their personalized music insights at any time rather than once a year. The pipeline extracts data from the Spotify Web API, processes it with modern data engineering tools, and presents the results as user-friendly visualizations.
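At its core, a Wrapped-style summary is an aggregation over listening history. The sketch below assumes play records have already been flattened from the Spotify Web API response into a simple structure. The `Play` fields, the `wrapped_summary` helper and the sample data are hypothetical illustrations, not the pipeline's actual transformation code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    # Hypothetical flattened play record; field names are illustrative only.
    track: str
    artist: str
    ms_played: int


def wrapped_summary(plays, top_n=3):
    """Return top tracks, top artists and total minutes listened."""
    track_counts = Counter((p.track, p.artist) for p in plays)
    artist_counts = Counter(p.artist for p in plays)
    total_minutes = sum(p.ms_played for p in plays) / 60_000
    return {
        "top_tracks": [t for t, _ in track_counts.most_common(top_n)],
        "top_artists": [a for a, _ in artist_counts.most_common(top_n)],
        "minutes_listened": round(total_minutes, 1),
    }


if __name__ == "__main__":
    history = [
        Play("Song A", "Artist X", 180_000),
        Play("Song B", "Artist Y", 200_000),
        Play("Song A", "Artist X", 180_000),
        Play("Song C", "Artist X", 240_000),
    ]
    print(wrapped_summary(history, top_n=2))
```

In a real pipeline this aggregation would more likely run as SQL or in a dataframe step before the data reaches Power BI. The logic would be the same.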

Azure F1 End-to-End Data Pipeline
Data Engineering
This project builds an end-to-end data pipeline on Azure: data is processed with PySpark on Azure Databricks, stored in Azure Data Lake Storage (ADLS), and orchestrated with Azure Data Factory. Data moves from the raw bronze layer to the refined gold layer, using incremental loading and external tables for in-depth analytics. Delta Lake, an open-source storage layer, underpins the pipeline and provides ACID transactions.
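Incremental loading means processing only records newer than the last successful load (a "high watermark") and upserting them into the target table. The plain-Python sketch below shows that idea with an in-memory dict standing in for a table. The record fields (`race_id`, `file_date`, `points`) and the `incremental_upsert` helper are hypothetical. In a Databricks setup this step would typically be a Delta Lake `MERGE` written in PySpark, which is not reproduced here.

```python
from datetime import date


def incremental_upsert(target, source_rows, watermark, key="race_id", ts="file_date"):
    """Upsert rows newer than `watermark` into `target` (a dict keyed by `key`).

    Roughly mirrors an upsert/MERGE: existing keys are updated, new keys are
    inserted, and rows at or before the watermark are skipped.
    Returns the new watermark.
    """
    new_watermark = watermark
    for row in source_rows:
        if row[ts] <= watermark:
            continue  # already loaded in a previous run
        target[row[key]] = row  # update if key exists, insert otherwise
        new_watermark = max(new_watermark, row[ts])
    return new_watermark


if __name__ == "__main__":
    silver = {}
    wm = date(2021, 3, 21)
    batch = [
        {"race_id": 1, "file_date": date(2021, 3, 21), "points": 10},  # skipped
        {"race_id": 2, "file_date": date(2021, 3, 28), "points": 25},
        {"race_id": 1, "file_date": date(2021, 4, 18), "points": 18},
    ]
    wm = incremental_upsert(silver, batch, wm)
    print(wm, silver)
```

Persisting the returned watermark between runs is what makes the next load incremental rather than a full reload.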

Keyword Spotting with Discrete vs. Continuous Audio Representations
Machine Learning
This study compares discrete and continuous audio representations for keyword spotting. WavLM's continuous features achieve the highest accuracy (0.92% error rate), while discrete EnCodec representations show promise with sequence-based models despite capturing less information.
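EnCodec produces its discrete tokens with residual vector quantization (RVQ): each stage picks the codeword nearest to whatever the previous stages failed to capture. The toy sketch below illustrates that mechanism with tiny hand-made 2-D codebooks. These are hypothetical values, not EnCodec's learned codebooks or its API. Decoding only approximates the input, which is one way to see why discrete codes can lose information.

```python
import math

# Hypothetical two-stage codebooks (EnCodec's real ones are learned and much larger).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]


def nearest(codebook, vec):
    """Index of the codeword closest to `vec` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], vec))


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes what the previous stages missed."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


if __name__ == "__main__":
    frame = [1.2, 0.8]  # one hypothetical continuous feature frame
    codes = rvq_encode(frame, CODEBOOKS)
    print(codes, rvq_decode(codes, CODEBOOKS))
```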

CIFAR-10 Image Classification on a Limited Dataset
Machine Learning
This project placed second in a competition to classify CIFAR-10 images using only 50 training samples. It compares classical machine learning models (70% accuracy), custom neural networks (79.28%), and transfer learning with ImageNet pre-training (92%), demonstrating effective strategies for learning from extremely limited data.

LanguageMate: An AI-Powered Multilingual Voice Assistant
Web App
LanguageMate is an AI-powered voice assistant for multilingual voice conversations. It chains automatic speech recognition (ASR), a large language model (LLM) for natural language processing (NLP), and text-to-speech (TTS), so a user can speak, receive a generated answer, and hear it spoken back.
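The ASR → LLM → TTS design can be sketched as a chain of interchangeable components. The interfaces (`SpeechToText`, `ChatModel`, `TextToSpeech`) and the fake implementations below are hypothetical placeholders, not LanguageMate's code. In practice, each stage would wrap a real speech recognition model, LLM, and TTS engine. Passing the detected language through every stage is one simple way to keep the reply in the user's language.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> tuple[str, str]: ...  # (text, language code)


class ChatModel(Protocol):
    def reply(self, text: str, language: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


@dataclass
class VoiceAssistant:
    asr: SpeechToText
    llm: ChatModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        text, lang = self.asr.transcribe(audio)   # 1. speech -> text + detected language
        answer = self.llm.reply(text, lang)       # 2. generate a reply in that language
        return self.tts.synthesize(answer, lang)  # 3. text -> speech


# Hypothetical stand-ins so the sketch runs without any models.
class FakeASR:
    def transcribe(self, audio: bytes) -> tuple[str, str]:
        return audio.decode("utf-8"), "es"


class FakeLLM:
    def reply(self, text: str, language: str) -> str:
        return f"[{language}] You said: {text}"


class FakeTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")
```

For example, `VoiceAssistant(FakeASR(), FakeLLM(), FakeTTS()).handle_turn("hola".encode())` runs all three stages end to end. Swapping in a real model only requires an object with the matching method.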

Stock Annual/Quarterly Report Analyzer
Web App
A Retrieval-Augmented Generation (RAG) application for financial data analysis 📈, built with Python, LangChain, Ollama, and Streamlit. It combines Large Language Models (LLMs) with a vector database (ChromaDB) 🧠 to generate summaries and calculate key metrics from the annual and quarterly reports filed by a company. LangChain orchestrates the RAG pipeline, Ollama runs the LLMs locally, and Streamlit provides a clean, user-friendly web interface.
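The retrieval step of RAG can be sketched without any libraries:
1. Rank the stored chunk embeddings by cosine similarity to the query embedding.
2. Paste the best matches into the prompt as context.

The toy 3-D embeddings, chunk texts and helper names below are hypothetical. In the app itself, ChromaDB handles storage and similarity search and LangChain assembles the prompt; neither is reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [c["text"] for c in ranked[:k]]


def build_prompt(question, context_chunks):
    """Ground the LLM by putting the retrieved chunks ahead of the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = [  # hypothetical report chunks with toy embeddings
        {"text": "Revenue discussion", "embedding": [0.9, 0.1, 0.0]},
        {"text": "Dividend policy", "embedding": [0.1, 0.9, 0.0]},
        {"text": "Operating margin discussion", "embedding": [0.7, 0.0, 0.3]},
    ]
    top = retrieve([1.0, 0.0, 0.1], chunks, k=2)
    print(build_prompt("How profitable was the company?", top))
```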

Image Classification of Grocery Products
Machine Learning
This project benchmarks three CNN architectures (ResNet-18, GoogLeNet, and AlexNet) on three small image datasets (Vegetable Images, Freiburg Groceries, and Grocery Store). It uses data augmentation (flips and rotations), input normalization, transfer learning, batch normalization, dropout, and the Adam optimizer to maximize classification accuracy.
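The augmentation and normalization steps can be illustrated on a tiny grayscale "image" stored as nested lists. This is a hypothetical, dependency-free sketch. A real training pipeline would normally use a deep learning library's transforms. Real pipelines also usually rotate by small random angles, not by the fixed 90° used here for simplicity. The mean and standard deviation defaults are placeholders, not the project's dataset statistics.

```python
import random


def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def normalize(img, mean, std):
    """Standardize pixels using the given mean and standard deviation."""
    return [[(p - mean) / std for p in row] for row in img]


def augment(img, rng, mean=0.5, std=0.25):
    """Randomly flip and/or rotate (each with probability 0.5), then always normalize."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = rot90(img)
    return normalize(img, mean, std)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    print(augment([[0.0, 0.25], [0.5, 1.0]], rng))
```

Random flips and rotations are applied only at training time so each epoch sees slightly different images. Normalization is applied identically at training and inference time.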