Problem: Large language models require huge memory for multi-vector retrieval.

Technologies: Python, PyTorch, CUDA, SSDs

My Role: Led SSD-based embedding offload, designed software prefetcher, and optimized retrieval pipeline.

Outcome: Achieved 16x memory reduction and 6.4x faster retrieval.