
Projects

Friend-Lite Audio Intelligence Platform
AGI · FastAPI · WebSockets · Qdrant · ASR

Friend-Lite / Chronicle: Local-first Audio Intelligence

Local-first audio intelligence platform exploring small-model AGI.

**Stack:** FastAPI + WebSockets · Qdrant (semantic memory) · Parakeet ASR + Hindi LoRA

Exploring the boundaries of local-first AI by building an audio-first intelligence platform. The goal was to create a system that could ingest real-time audio, process it using small but efficient models, and maintain a persistent semantic memory without relying on cloud-heavy infrastructure.

I engineered the core real-time audio ingestion pipeline using FastAPI and WebSockets to handle low-latency data streams. To make the intelligence actionable, I implemented a semantic memory layer via Qdrant and integrated specialized adapters like Hindi LoRA for localized language support.

---

*Efficient local intelligence.*

1. **Low Latency First:** Real-time audio requires immediate processing. We prioritized WebSocket efficiency to ensure no data loss during ingestion.
2. **Semantic Retrieval:** Instead of keyword search, we used vector embeddings to allow the agent to "remember" the context of past conversations.
3. **Human-in-the-Loop:** We designed a feedback loop for alignment, allowing the user to correct and refine the agent's understanding in real time.

---

A robust, local-first architecture:

* **Real-time Ingestion:** FastAPI + WebSockets for handling live audio streams.
* **Semantic Memory:** Vectorized storage using Qdrant for long-term context retention.
* **Efficient ASR:** Parakeet ASR with Hindi LoRA adapters for high-accuracy local inference.

GitHub: [github.com/0xrushi/friend-lite](https://github.com/0xrushi/friend-lite)
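The ingestion side of such a pipeline is easy to get subtly wrong: WebSocket messages arrive in arbitrary sizes, while an ASR model wants fixed-duration frames. Below is a minimal, dependency-free sketch of that re-framing step. It is illustrative only; the class name, frame size, and audio format are assumptions, not Friend-Lite's actual code. It assumes 16 kHz, 16-bit mono PCM:

```python
# Sketch of re-framing a raw PCM byte stream into fixed-size ASR chunks.
# Names and constants are hypothetical, not the project's real code.

SAMPLE_RATE = 16_000   # assumed 16 kHz mono
BYTES_PER_SAMPLE = 2   # assumed 16-bit PCM
CHUNK_SECONDS = 0.5    # hand the ASR half-second frames

CHUNK_BYTES = int(SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS)

class FrameBuffer:
    """Accumulates arbitrary-sized network packets into fixed ASR frames."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def push(self, packet: bytes) -> list[bytes]:
        """Append one WebSocket message; return any complete frames."""
        self._buf.extend(packet)
        frames = []
        while len(self._buf) >= CHUNK_BYTES:
            frames.append(bytes(self._buf[:CHUNK_BYTES]))
            del self._buf[:CHUNK_BYTES]
        return frames

buf = FrameBuffer()
# Simulate three uneven packets totalling exactly one 0.5 s frame (16000 bytes).
out = []
for packet in (b"\x00" * 6000, b"\x00" * 6000, b"\x00" * 4000):
    out.extend(buf.push(packet))
print(len(out), len(out[0]))  # one complete frame of 16000 bytes
```

Inside a FastAPI WebSocket endpoint, each received binary message would be pushed through a buffer like this before being handed to the ASR stage.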

HeteroShard distributed GPU training
Homelab · LLMs · Python · Open Source

Frankenstein GPUs: Training LLMs on Mixed Hardware

A DIY framework for running LLMs across whatever random GPUs I had lying around (Nvidia & AMD).

If you try to mix GPUs today, you hit a wall:

* **Driver Hell:** Getting PyTorch Distributed to work with mismatched CUDA versions is a recipe for a headache.
* **The "Walled Garden":** Most frameworks just assume you are using a data center full of identical A100s.
* **Wasted Hardware:** I had capable GPUs sitting idle just because they weren't the "right" brand or generation.

I fell down a rabbit hole of research papers (Cephalo, HetHub) that claimed mixed-hardware training was mathematically possible, but none of them had code I could actually download and run. The blockers weren't physical; they were software. If I just treated the GPUs as separate nodes and passed data via simple network sockets, I could bypass the driver issues entirely.

I built a custom pipeline that acts as a translator between machines:

* **Layer Splitting:** The model gets chopped up. GPU A takes layers 0-15, GPU B takes layers 16-31.
* **The Coordinator:** One script loads the data and sends tensors over the network to the other machine.
* **The Profiler:** A script that runs a quick speed test on each GPU before training to figure out the perfect split.

---

I looked at everything: DeepSpeed, Megatron-LM, torchgpipe. They were all too rigid. I wanted something that felt like LEGO bricks, not a pre-built model kit.

So I designed a simple topology: Machine A does its math, then throws the results over the local network (TCP/IP) to Machine B. Because the machines don't share memory, they don't need to share drivers.

I also needed to solve the "slow kid in class" problem: a pipeline moves at the pace of its slowest stage. I implemented a profiler (inspired by the Cephalo paper) that measures compute speed and memory usage to auto-balance the load.

I ran benchmarks on three setups: a single RTX 5090, the AMD Strix Halo alone, and the two machines as a distributed pipeline. It actually worked: by splitting the VRAM load, I could train faster than on the single high-end card because I wasn't bottlenecking on memory.
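The profiler's auto-balancing idea is simple to sketch: measure each GPU's throughput, then hand out contiguous layer ranges in proportion. This toy version is my illustration, not HeteroShard's actual code, and the GPU names and speeds are invented:

```python
# Sketch of profiler-driven load balancing: split N transformer layers
# across GPUs in proportion to measured throughput (layers/second).
# GPU names and speed numbers are illustrative, not real benchmarks.

def balance_layers(num_layers: int, speeds: dict[str, float]) -> dict[str, range]:
    """Assign contiguous layer ranges proportional to each GPU's speed."""
    total_speed = sum(speeds.values())
    assignments: dict[str, range] = {}
    start = 0
    gpus = list(speeds)
    for i, gpu in enumerate(gpus):
        if i == len(gpus) - 1:
            count = num_layers - start   # last GPU takes the remainder
        else:
            count = round(num_layers * speeds[gpu] / total_speed)
        assignments[gpu] = range(start, start + count)
        start += count
    return assignments

# A fast card paired with a slower one: the fast card gets ~2/3 of the depth.
split = balance_layers(32, {"rtx5090": 2.0, "strix_halo": 1.0})
print({gpu: (r.start, r.stop) for gpu, r in split.items()})
# {'rtx5090': (0, 21), 'strix_halo': (21, 32)}
```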
**Results:** Single RTX 5090: 289 seconds. Distributed pipeline: 184 seconds.

**Speed isn't everything.** In pipeline parallelism, network latency is a real factor. But for LLMs, VRAM is usually the bigger constraint. Trading a little network lag for double the VRAM is a trade I'll take any day.

**Open source is key.** All the enterprise solutions ignored this use case because it doesn't make money. But for the homelab community, this is a game changer.

GitHub: [github.com/0xrushi/HeteroShard](https://github.com/0xrushi/HeteroShard)
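The "throw results over the network" step can be sketched with nothing but the standard library: length-prefixed frames over a plain socket. A real implementation would move `torch` buffers; here a list of floats stands in so the example is self-contained, and all names are mine, not the project's:

```python
# Sketch of the coordinator's tensor hand-off: length-prefixed frames
# over a plain socket. A list of floats stands in for a real tensor.
import socket
import struct

def send_tensor(sock: socket.socket, values: list[float]) -> None:
    """Serialize a 1-D float32 'tensor' and send it with a length header."""
    payload = struct.pack(f"<{len(values)}f", *values)
    sock.sendall(struct.pack("<I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over partial recv() results."""
    data = bytearray()
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:
            raise ConnectionError("peer closed mid-frame")
        data.extend(part)
    return bytes(data)

def recv_tensor(sock: socket.socket) -> list[float]:
    """Read one length-prefixed frame and decode it back to floats."""
    (size,) = struct.unpack("<I", recv_exact(sock, 4))
    payload = recv_exact(sock, size)
    return list(struct.unpack(f"<{size // 4}f", payload))

# Demo over an in-process socket pair (machine A -> machine B).
a, b = socket.socketpair()
send_tensor(a, [0.5, -1.25, 3.0])
received = recv_tensor(b)
a.close(); b.close()
print(received)  # [0.5, -1.25, 3.0]
```

Because only bytes cross the wire, Machine A can run CUDA and Machine B can run ROCm; the socket neither knows nor cares.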

Predictive Analytics Platform
MLOps · Data Engineering · Platform

Operationalizing Predictive Analytics: From Experiment to Enterprise Platform

Transformed a disconnected proof-of-concept into a scalable forecasting engine by engineering the missing data foundations.

Tags: `Data Engineering` `MLOps` `Platform Architecture`

- **Domain:** Enterprise Operations
- **Stakeholders:** Data Engineering, Data Science, Operations
- **Role:** Data Engineer, Platform Architect

A large operational organization had developed an early forecasting model to predict critical field events. While the math showed promise, the model was a "science project" stuck in a silo. It relied on fragmented logs, external signals, and disconnected workflows. It wasn't driving decisions because it wasn't connected to the systems where decisions actually happened.

Acting as the bridge between experimental data science and production engineering, I led the platform architecture and data engineering efforts. I shifted the focus from "improving model accuracy" to "building the infrastructure to support it." I designed the pipelines, feature-engineering workflows, and deployment architecture necessary to turn a static script into a living, scalable data product.

---

*A model is only as good as the infrastructure that feeds it.*

1. **Engineering Before Algorithms:** What looked like a modeling problem was actually a data availability problem. We prioritized fixing the data supply chain before tuning the predictive math.
2. **Integration Is the Product:** A forecast is useless if it sits in a database. We designed the architecture backward from the operational dashboard, ensuring insights were consumable in real time.
3. **Pipelines Over Patches:** We moved away from ad-hoc data pulls and built repeatable, monitored pipelines. Reliability was treated as a feature, not an afterthought.

---

The analytics initiative was failing to reach production due to significant technical debt:

* **Fragmented Data Ecosystem:** Critical predictive signals were trapped in disconnected logs and historical records.
* **Inconsistent Quality:** Feature readiness was low, with sparse datasets limiting model granularity.
* **The "Deployment Gap":** There was no architecture to get the model's output into the hands of operators.
* **Zero Lifecycle Management:** The model lacked monitoring, making it impossible to trust over time.

The organization believed they had an *accuracy* problem. In reality, they had an *infrastructure* problem. The model worked mathematically, but it failed operationally because the data feeding it was brittle and the output had nowhere to go.

We treated forecasting as a data product, not an experiment:

* **Unified Data Layer:** A consolidated pipeline architecture merging internal logs, external signals, and system states.
* **Scalable Feature Engine:** Automated workflows to turn raw, sparse data into reliable predictive features.
* **Production Integration:** A robust framework for embedding model outputs directly into operational platforms.

---

We reframed the initiative from a modeling exercise to a platform-engineering effort, audited the existing workflows, and identified the four core gaps: fragmentation, feature engineering, deployment, and integration.

We designed pipelines to consolidate distributed predictive signals. This involved heavy timestamp alignment, schema harmonization, and event normalization to create a single, trustworthy analytical dataset.

Forecast performance depended on signal quality, so we engineered infrastructure to handle sparse event indicators and structured the datasets to support repeatable feature generation, enabling faster experimentation.

Finally, we ensured predictive insights were actually consumed: we defined data interfaces for dashboard integration and built output pipelines that fed directly into the tools operators used daily.

---

**Foundations First.** By prioritizing data engineering over model tuning, we created a dataset that actually supported high-granularity forecasting, proving that better pipes lead to better predictions.
**From Project to Product.** We successfully transformed an isolated proof-of-concept into a scalable capability. The model is no longer a static experiment; it is an embedded part of the operational workflow.

**The Engineering Edge.** The project demonstrated that in complex operational environments, the barrier to AI isn't the AI itself; it's the messy, fragmented data ecosystem underneath it. We fixed the ecosystem to unlock the value.
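The consolidation work described above (timestamp alignment, schema harmonization, event normalization) can be illustrated with a toy sketch. All sources, schemas, and field names here are invented for illustration; they are not the organization's actual data:

```python
# Sketch of merging event records from heterogeneous sources into one
# time-aligned stream. Source names and field names are invented.
from datetime import datetime, timezone

def normalize(record: dict, source: str) -> dict:
    """Map a source-specific record onto one shared schema, in UTC."""
    if source == "field_log":            # epoch seconds + 'msg' field
        ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        event = record["msg"]
    elif source == "external_feed":      # ISO-8601 string + 'description'
        ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
        event = record["description"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {"ts": ts, "source": source, "event": event}

def unify(*streams: list[dict]) -> list[dict]:
    """Merge normalized streams into a single timeline, sorted by time."""
    merged = [rec for stream in streams for rec in stream]
    return sorted(merged, key=lambda r: r["ts"])

logs = [normalize({"epoch": 1_700_000_100, "msg": "pump fault"}, "field_log")]
feed = [normalize({"time": "2023-11-14T22:14:30+00:00",
                   "description": "storm warning"}, "external_feed")]
timeline = unify(logs, feed)
print([r["event"] for r in timeline])
```

The point of the sketch is the shape of the work: every source gets forced onto one schema and one clock before anything downstream (features, forecasts, dashboards) is allowed to touch it.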

Sports AI Knowledge System
AI · RAG · Data Engineering

Sports RAG: Transforming Rulebook Knowledge with AI

Transforming a static, fragmented rulebook into a dynamic AI-powered knowledge engine to resolve operational bottlenecks.

Tags: `GenAI Strategy` `Data Engineering` `Operational Efficiency`

- **Domain:** Sports Governance
- **Stakeholders:** Data Engineering & Domain Experts
- **Role:** Data Engineer, Architect

The rulebook is not just a document; it is the law of the game. But as the sport expanded globally, the governing body faced a crushing operational bottleneck: 27,000+ annual inquiries from players seeking clarification on complex scenarios. With only a handful of qualified experts capable of interpreting the dense hierarchy of rules, the system was unscalable. The organization was drowning in email queues, risking inconsistent rulings and delayed responses.

I translated this operational crisis into a data and AI platform challenge. I spearheaded the end-to-end architecture of a Retrieval-Augmented Generation (RAG) system, moving from static PDFs to a vectorized knowledge base. I aligned domain experts and engineering requirements to build a system that didn't just "chat," but provided cited, defensible governance decisions at scale.

---

*Trust is built on explainability, not just accuracy.*

1. **Governance Is the Guardrail:** In a regulatory environment, "hallucinations" are liabilities. We architected the system to prioritize grounding over creativity: if the model cannot cite the specific rule article, it refuses to answer.
2. **Context Is the Query:** Players describe messy, real-world scenarios; the rulebook uses rigid, legalistic terminology. The architecture had to bridge this semantic gap, translating "user slang" into "governance logic."
3. **Evaluation Is Engineering:** Building the pipeline is easy; proving it works is hard. We established a rigorous "ground truth" framework, treating evaluation metrics as first-class citizens in the development lifecycle.
---

The organization's knowledge ecosystem was analog and fragmented:

* **Expert Bottlenecks:** Deep institutional knowledge was siloed within 3-4 individuals.
* **Ambiguous Inputs:** Users rarely used official terminology, making keyword search useless.
* **Complex Hierarchy:** Hundreds of sub-rules, exceptions, and addendums were buried in static documents.
* **Zero Margin for Error:** Incorrect answers could impact the integrity of the sport.

The core challenge wasn't generating an answer; it was identifying the context. A standard large language model (LLM) fails when asked about specific rules because it lacks the "legal precedent" of the sport. We realized we needed to treat historical support emails not just as logs, but as a living library of interpretations to augment the static rulebook.

We moved from manual interpretation to an intelligent digital assistant:

* **A RAG-Based Knowledge Engine:** Combining vectorized rulebooks with 10,000+ historical expert conversations.
* **Semantic Search Layer:** A retrieval system capable of understanding the *intent* of a question, not just matching words.
* **Cited Evidence:** A frontend that displays the generated answer alongside the specific rule extracts used to form the opinion.

---

We analyzed the anatomy of the 27,000 annual inquiries, mapping the "question patterns" against the rulebook hierarchy to understand where the gaps in understanding were occurring.

We designed a secure data flow: User Input → PII Redaction → Semantic Retrieval (Rules + History) → Context Injection → LLM Generation. This ensured data privacy while leveraging the full power of the organization's knowledge.

We discovered that ingesting the rulebook as a whole failed. We shifted to a "chunking" strategy, treating individual articles and email threads as discrete semantic units, significantly boosting retrieval precision.

We didn't launch on faith.
We built a validation set of 30 complex "edge case" questions and measured the model's output against answers written by the Chief Rules Official.

---

> "I know the rule exists, but I don't know what it's called. I tried searching the PDF for 'obstruction' but there are 50 results and none of them fit my specific scenario. I just emailed support because I gave up."
>
> — Archetype: The Player

**Evaluation Outcomes:** *Achieved a ~0.85 similarity score against expert ground-truth answers for complex scenarios.*

---

**Granularity Drives Accuracy.** Breaking down the "monolith" of data into smaller, semantic chunks was the turning point for accuracy. The machine needs distinct concepts, not long chapters.

**Hybrid Knowledge Bases.** The rulebook provided the "theory," but the historical emails provided the "practice." Combining them allowed the AI to handle edge cases that the rules technically covered but didn't explicitly describe.

**From Chatbot to Infrastructure.** This project wasn't just about building a chat interface; it was about turning a complex, static ecosystem into a queryable API that can power future public-facing tools and referee training modules.
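The chunking insight can be illustrated with a toy retriever. The production system retrieves over vector embeddings; here, Jaccard word overlap stands in as the similarity function so the sketch stays dependency-free, and the rule snippets are invented examples, not the real rulebook:

```python
# Toy illustration of article-level chunking + retrieval. The production
# system uses vector embeddings; word overlap stands in here, and the
# rule snippets are invented.

def tokenize(text: str) -> set[str]:
    """Lowercase, strip punctuation, split into a set of words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between token sets (embedding stand-in)."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Chunked as discrete articles, not one monolithic document.
chunks = {
    "Article 12.3": "A runner may not obstruct a fielder attempting a play.",
    "Article 12.4": "A fielder may not block the base path without the ball.",
    "Article 07.1": "Equipment must meet the official size specifications.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k article IDs most similar to the question."""
    ranked = sorted(chunks,
                    key=lambda cid: similarity(question, chunks[cid]),
                    reverse=True)
    return ranked[:k]

hits = retrieve("Can a fielder obstruct the runner by standing "
                "in the base path without the ball?")
print(hits)  # ['Article 12.4', 'Article 12.3']
```

The point survives the simplification: retrieval ranks discrete, article-sized chunks against the question, which is why moving from whole-document ingestion to per-article chunks boosted precision.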

too lazy to browse? feed this to Claude!