CV
Education
Texas A&M University
Doctor of Philosophy in Computer Engineering
College Station, TX, Aug 2021 - Est. 2025
Thesis: Hardware Efficient ML System Design in Knowledge Retrieval
University of Texas Arlington
Bachelor of Science in Electrical Engineering with Honors
Minor in Computer Science
Arlington, TX, Aug 2017 - May 2021
Honors Thesis: A novel remote sensing system for in-situ measurement of subsurface soil properties
Work Experience
NVIDIA
Research Intern
Austin, TX, May 2024 – Aug 2024
- Led research to accelerate LLM inference via activation and contextual sparsity.
- Built sparsely activated OPT and LLaMA models by training activation routers for MLP and Attention layers.
- Developed custom sparse GPU kernels achieving 1.5–3× speedup in MLP layers and up to 2.5× in Attention.
- Delivered end-to-end decoding speedups up to 2.2× across diverse batch sizes and sequence lengths.
Samsung Semiconductor Inc.
Research Intern
San Jose, CA, May 2022 - Aug 2022
- Reduced CPU workload by 4× and accelerated neural inference by 64% by optimizing the data pipeline and model execution.
- Filed 2 patent applications for efficient neural network deployment on edge devices.
Texas A&M University
Graduate Research Assistant
College Station, TX, Aug 2021 - Est. 2025
- Led a project that resulted in a 23% speedup in embedding processing for large-scale information retrieval.
- Developed scalable multi-vector retrieval systems for large language models.
University of Texas Arlington
Undergraduate Research and Teaching Assistant
Arlington, TX, Aug 2019 - May 2021
- Developed signal processing algorithms to measure dynamic soil properties using a radar system; simulated, designed, and fabricated electromagnetic sensors.
- Teaching assistant for EE2347, Mathematical Foundation of Electrical Engineering. Responsibilities: served as lab instructor, taught sophomores algorithms and introductory Python, and graded assignments.
- Teaching assistant for EE3407, Fundamentals of Electromagnetics. Responsibilities: conducted study sessions and exam reviews, and assisted in redesigning lab experiments to integrate new simulation tools such as Ansys HFSS.
University of Texas Arlington
Academic Tutor, Supplemental Instructor, and Mentor
Arlington, TX, Aug 2018 - May 2021
- One-on-one tutoring for UTSI. Academic tutor for Calculus I & II, Physics I & II, and Chemistry.
- Supplemental Instructor (SI Leader) for UTSI. SI leader for Calculus III and Differential Equations & Linear Algebra. Responsibilities: conducted biweekly study sessions, helped students learn course material, and led exam reviews.
- Supplemental Instructor Mentor for UTSI. Responsibilities: managed and mentored 5-6 SI leaders per semester, conducted training for SI leaders, and hosted weekly mentoring sessions.
Publications
Shrestha, S., Settlemyer, B., Dryden, N., & Reddy, N. (2025). Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity. arXiv preprint. https://arxiv.org/abs/2505.14884
Shrestha, S., Reddy, N., & Li, Z. (2024). ESPN: Memory Efficient Multi-Vector Information Retrieval. In Proceedings of the ACM SIGPLAN International Symposium on Memory Management (ISMM 2024). https://doi.org/10.1145/3652024.3665515
Shrestha, S., Gautam, A., & Reddy, N. (2025). Storage Access Optimization for Efficient GPU‑Centric Information Retrieval. The Journal of Supercomputing. https://link.springer.com/article/10.1007/s11227-025-07118-9
Gautam, A., Shrestha, S., & Reddy, N. (2025). Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding. arXiv preprint. https://arxiv.org/abs/2504.00030
*Equal contribution
Patents
System and Method for Embeddings Retrieval
US20240330193A1 — Samsung Electronics Co., Ltd.
System and Method for Processing Embeddings
US20240330290A1 — Samsung Electronics Co., Ltd.
Honors
- Cambridge Learners Award 2015
- Academic Excellence Award 2016
- Dean’s List 2018-21
- Innovation Day Award 2021
Skills
Relevant Coursework: Advanced Computer Architecture, Parallel Computing, Distributed Processing, Deep Learning, Machine Learning, NLP, Information Retrieval, Memory & Storage Systems, Operating Systems, Advanced Algorithms.
Technical: C/C++, Python, Java, CUDA, MATLAB, OpenCL, OpenMP, Pthreads, MPI, PyTorch, scikit-learn, gRPC, Hadoop, Azure, AWS, Spark, Verilog, VHDL, Vitis/Vivado HLS, Object-Oriented Programming (OOP), Git, Linux.