Susav Shrestha

I research high-throughput, low-latency LLM inference systems at scale. My focus spans model parallelism, efficient attention, and sparsity-driven optimizations for scalable distributed inference across GPU clusters.


Experience

2026–

Senior AI and HPC Engineer · NVIDIA

Feb 2026 – Present

2024–25

Research Intern · NVIDIA

Austin, TX · May – Aug 2024
Santa Clara, CA · May – Aug 2025
2022

Research Intern · Samsung

San Jose, CA · May – Aug 2022

Education

PhD, Computer Engineering
Texas A&M University · Aug 2021 – Feb 2026

Recent Updates

Feb 2026

🎓 Successfully defended my PhD dissertation

2025

📄 Polar Sparsity accepted at NeurIPS 2025