Publications

Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity

Susav Shrestha, Brad Settlemyer, Nikoli Dryden, Narasimha Reddy. arXiv, 2025.


ESPN: Memory Efficient Multi-Vector Information Retrieval

Susav Shrestha, Narasimha Reddy, Zongwang Li. ISMM 2024.


Storage Access Optimization for Efficient GPU-Centric Information Retrieval

Susav Shrestha, Aayush Gautam, Narasimha Reddy. The Journal of Supercomputing, 2025.


Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding

Aayush Gautam*, Susav Shrestha*, Narasimha Reddy. arXiv, 2025.

*Equal contribution
