Seminar: How to scale your research on ARC efficiently: from GPU-nicorns to LLMagic
Alberto Cano
Associate Vice President for Research Computing,
Advanced Research Computing
Associate Professor of Computer Science, Virginia Tech
Friday, September 5
2:30 - 3:45 p.m.
Classroom Building, Room 260

Abstract
This seminar gives Computer Science students hands-on guidance for running and scaling research computing jobs on the ARC clusters. We'll cover best practices: strategic resource allocation via Slurm and Open OnDemand (including budgeting and QoS utilization), performance optimization through job tuning, and critical "do's and don'ts" of cluster etiquette. Practical sessions focus on essential workflows: crafting efficient job scripts, leveraging containerized environments, deploying GPU workloads, and managing complex LLM and AI API deployments while navigating quotas and monitoring resource usage. Participants will learn to identify common issues, build foundational profiling skills for diagnosing performance bottlenecks, and understand the principles of "cluster citizenship" that reduce interference. Students will be able to map research problems to appropriate cluster resources, choose suitable execution patterns, and apply optimizations that reduce runtime.
Biography
Alberto Cano is the Associate Vice President for Research Computing and an Associate Professor in the Department of Computer Science at Virginia Tech. He directs the university's Advanced Research Computing unit, overseeing campus-wide high-performance computing services and initiatives to expand research computing and AI infrastructure. His research spans machine learning, data streams, concept drift, continual learning, and scalable GPU and distributed computing.