Publications

My research focuses on building efficient and scalable machine learning systems, with an emphasis on inference optimization through sparsity, hardware-aware design, and distributed architectures.

Featured Work

📚 Conference Publications

ISMM 2024 Conference

ESPN: Memory Efficient Multi-Vector Information Retrieval

Susav Shrestha, Narasimha Reddy, Zongwang Li

Presents a memory-efficient approach to multi-vector retrieval that significantly improves performance on large-scale information retrieval tasks.

📖 Journal Publications

The Journal of Supercomputing 2025 Journal

Storage Access Optimization for Efficient GPU-Centric Information Retrieval

Susav Shrestha, Aayush Gautam, Narasimha Reddy

Optimizes storage access patterns for GPU-centric retrieval systems, delivering substantial speedups in embedding processing.

🔬 Preprints

arXiv 2025 Preprint

Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding

Aayush Gautam*, Susav Shrestha*, Narasimha Reddy

*Equal contribution

Develops adaptive calibration techniques for speculative decoding, improving efficiency in language model inference.