Susav Shrestha
I am a PhD candidate in Computer Engineering at Texas A&M University, advised by Dr. Narasimha Reddy. My work bridges the gap between algorithm design and system-level efficiency, with a focus on accelerating large-scale inference.
I aim to make machine learning systems more efficient and accessible through sparsity, hardware-aware design, and distributed inference.
Research Interests
Efficient & Sparse LLM Inference
Optimizing large language models for high-throughput deployment
HPC and Distributed Systems
Building scalable, distributed architectures for efficient multi-GPU and multi-node LLM serving
Featured Publications
NeurIPS 2025
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Experience
2025
Research Intern · NVIDIA
Santa Clara, CA · May - Aug 2025
2024
Research Intern · NVIDIA
Austin, TX · May - Aug 2024
2022
Research Intern · Samsung Semiconductor
San Jose, CA · May - Aug 2022
Recent Updates
2025
Polar Sparsity accepted at NeurIPS 2025
