Publications
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Brad Settlemyer, Nikoli Dryden, Narasimha Reddy. arXiv, 2025.
PDF Code
ESPN: Memory Efficient Multi-Vector Information Retrieval
Susav Shrestha, Narasimha Reddy, Zongwang Li. ISMM 2024.
PDF Code
Storage Access Optimization for Efficient GPU‑Centric Information Retrieval
Susav Shrestha, Aayush Gautam, Narasimha Reddy. The Journal of Supercomputing, 2025.
PDF
Token‑Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding
Aayush Gautam*, Susav Shrestha*, Narasimha Reddy. arXiv, 2025.
*Equal contribution
PDF