Susav Shrestha
I specialize in building efficient and scalable machine learning systems for real-world deployment. My work bridges the gap between algorithm design and system-level efficiency, with a focus on accelerating large-scale inference through hardware-aware design, sparsity, and parallelism.
I am a PhD candidate in Computer Engineering at Texas A&M University, advised by Dr. Narasimha Reddy.
Research Interests
- Efficient and Sparse LLM Inference
- Hardware-Efficient, High-Throughput Distributed Inference at Scale
Selected Publications
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Brad Settlemyer, Nikoli Dryden, Narasimha Reddy. Paper, Code
ESPN: Memory-Efficient Multi-vector Information Retrieval
Susav Shrestha, Narasimha Reddy, Zongwang Li. In Proceedings of ISMM 2024. Paper, Code
Selected Experience
- Research Intern, NVIDIA, Santa Clara, CA, May - Aug 2025
- Research Intern, NVIDIA, Austin, TX, May - Aug 2024
- Research Intern, Samsung Semiconductor, San Jose, CA, May - Aug 2022
Updates
- 🔬 Started a research internship at NVIDIA in Santa Clara, CA for Summer 2025
- 📄 Released our new paper on Polar Sparsity, achieving up to 2.2× decoding speedup in batched LLM inference