1
Problem: Large language models require huge memory for multi-vector retrieval.
Technologies: Python, PyTorch, CUDA, SSDs
My Role: Led SSD-based embedding offload, designed software prefetcher, and optimized retrieval pipeline.
Outcome: Achieved 16x memory reduction and 6.4x faster retrieval. Paper | Code |