Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing

https://www.youtube.com/watch?v=hk2R51pw-Xg&t=566s

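At KubeCon + CloudNativeCon, NVIDIA gave a talk on dynamic resource allocation (DRA) for inference workflows. The talk included two demos: one used a stock container setup with the upstream NVIDIA DRA driver, and the other showed more advanced functionality that requires modifications to the DRA driver.

An attendee asked how DRA compares with other existing GPU sharing mechanisms, such as time-slicing and Multi-Process Service (MPS), and what the speakers would recommend. The speaker replied that although Multi-Instance GPU (MIG) was the focus of this talk, time-slicing and MPS are also useful for inference workloads.

Another attendee asked how InstaSlice injects information for the NVIDIA driver. The speaker confirmed that InstaSlice does the injection through a mutating webhook, which updates the information passed to the NVIDIA driver.

The next question was about the impact of webhooks on this architecture, and whether the scheduler has to be extended to track the new resources. The speaker replied that DRA relies on scheduler extensions and that the initial version of DRA is an alpha feature upstream, which implies a change in the scheduler. The speaker added that autoscaling could also work on pending claims: once the claims are known, they can be bunched together and used to request a node through the KP project. The speaker also noted that there are several autoscaling solutions, and the right one depends on what you want to explore.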
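To make the mutating-webhook answer above more concrete, here is a minimal Python sketch of how a Kubernetes mutating admission webhook can return a JSON Patch that adds a resource limit to each container in a pod. It is not the code from the talk. The function name `build_mutation_response` and the resource name `example.com/gpu-slice` are hypothetical and chosen only for illustration. The response shape (`uid`, `allowed`, `patchType: JSONPatch`, base64-encoded `patch`) follows the `admission.k8s.io/v1` AdmissionReview format.

```python
import base64
import json


def build_mutation_response(admission_review: dict, resource_name: str, quantity: str) -> dict:
    """Build an AdmissionReview response that adds a resource limit to every container.

    `resource_name` is a hypothetical extended resource used for illustration;
    what a real webhook injects depends on the driver it targets.
    """
    request = admission_review["request"]
    pod = request["object"]

    patch = []
    for i, container in enumerate(pod["spec"]["containers"]):
        resources = container.get("resources")
        limits = (resources or {}).get("limits")
        base = f"/spec/containers/{i}/resources"
        if resources is None:
            # No resources block yet: create it together with the limit.
            patch.append({"op": "add", "path": base,
                          "value": {"limits": {resource_name: quantity}}})
        elif limits is None:
            # resources exists but has no limits map.
            patch.append({"op": "add", "path": f"{base}/limits",
                          "value": {resource_name: quantity}})
        else:
            # JSON Pointer escaping: "~" becomes "~0", then "/" becomes "~1".
            key = resource_name.replace("~", "~0").replace("/", "~1")
            patch.append({"op": "add", "path": f"{base}/limits/{key}",
                          "value": quantity})

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

In a real deployment this function would sit behind an HTTPS endpoint registered through a MutatingWebhookConfiguration. The sketch covers only the patch construction, since that is the part the Q&A answer describes.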