Susav Shrestha

I specialize in building efficient and scalable machine learning systems for real-world deployment. My work bridges the gap between algorithm design and system-level efficiency, with a focus on accelerating large-scale inference through hardware-aware design, sparsity, and parallelism.

I am a PhD candidate in Computer Engineering at Texas A&M University, advised by Dr. Narasimha Reddy.


Research Interests

  • Efficient and Sparse LLM Inference
  • Hardware-Efficient and High-Throughput Distributed Inference at Scale

Selected Publications

  • Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
    Susav Shrestha, Brad Settlemyer, Nikoli Dryden, Narasimha Reddy. Paper, Code

  • ESPN: Memory-Efficient Multi-vector Information Retrieval
    Susav Shrestha, Narasimha Reddy, Zongwang Li. In Proceedings of ISMM 2024. Paper, Code

Selected Experience

  • Research Intern, NVIDIA, Santa Clara, CA, May - Aug 2025
  • Research Intern, NVIDIA, Austin, TX, May - Aug 2024
  • Research Intern, Samsung Semiconductor, San Jose, CA, May - Aug 2022

Updates

  • 🔬 Started a Research Internship at NVIDIA for Summer 2025 in Santa Clara
  • 📄 Released a new paper, Polar Sparsity, achieving up to 2.2× decoding speedup in LLM inference