Seminar: How to scale your research on ARC efficiently: from GPU-nicorns to LLMagic

Alberto Cano

Associate Vice President for Research Computing,
Advanced Research Computing
Associate Professor of Computer Science, Virginia Tech

Friday, September 5
2:30 - 3:45 p.m.
Classroom Building, Room 260

Abstract


This seminar gives Computer Science students hands-on guidance for running and scaling research computing jobs on the ARC clusters. We'll cover best practices: strategic resource allocation via Slurm and Open OnDemand (including budgeting and QoS utilization), performance optimization through job tuning, and critical "do's and don'ts" of cluster etiquette. Practical sessions focus on essential workflows: crafting efficient job scripts, leveraging containerized environments, deploying GPU workloads, and managing complex LLM and AI API deployments while navigating quotas and resource monitoring. Participants will learn to identify common issues, gain foundational profiling skills for diagnosing performance bottlenecks, and understand the principles of "cluster citizenship" that reduce interference with other users. By the end, students will be able to map research problems to appropriate cluster resources, choose suitable execution patterns, and apply optimizations that reduce runtime.

Biography

Alberto Cano is the Associate Vice President for Research Computing and an Associate Professor in the Department of Computer Science at Virginia Tech. He directs the university's Advanced Research Computing unit, overseeing campus-wide high-performance computing services and initiatives to expand research computing and AI infrastructure. His research spans machine learning, data streams, concept drift, continual learning, and scalable GPU and distributed computing.