Yao Fu (University of Edinburgh): “Efficient Sharing of AI Infrastructures with Specialized Serverless Computing”
January 29 @ 12:00 PM - 1:15 PM
Abstract:
The efficient sharing of AI infrastructures is becoming increasingly important in both public and private data centers. This demand is driven by two key factors: the proliferation of specialized AI models tailored to different users and applications, and the highly dynamic, on-demand nature of requests. Dedicated GPU allocation in such scenarios results in prohibitively high costs and poor resource utilization.
In this talk, I will introduce serverless computing as a promising paradigm for addressing these challenges by enabling efficient, on-demand sharing of AI infrastructures. I will highlight its use cases and discuss key barriers to broader adoption. Following this, I will present ServerlessLLM, a state-of-the-art system designed to tackle key challenges in serverless large language model (LLM) inference, particularly cold-start latency. Specifically, I will cover ServerlessLLM’s novel contributions, including its checkpoint format design, locality-aware scheduling, and inference request live migration. Finally, I will outline open challenges beyond efficiency, such as fairness, privacy, and sustainability, which are critical for the future of serverless AI systems.
Biography:
Yao is a final-year PhD student at the University of Edinburgh, advised by Professor Luo Mai. His research focuses on distributed machine learning systems, with a particular interest in building efficient and cost-effective systems for large-scale AI model deployment. His work has appeared in top-tier venues such as OSDI and JMLR. He is the founder of the ServerlessLLM open-source community, which has drawn significant attention from academia, industry, and the open-source ecosystem. In 2024, he was recognized as a Rising Star in ML and Systems.
Zoom Link: https://upenn.zoom.us/j/95090162762