A scalable multi-vector information retrieval system for large language models, built with Python, PyTorch, and CUDA. Led the SSD-based embedding offload, achieving a 16x memory reduction and 6.4x faster retrieval. Code
A new method for structured sparsity in deep neural networks using polar coordinate transformations. Enables efficient model pruning and compression for real-world deployment. Paper