Publications
See a full list on Google Scholar
2024
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao*, Xiaolong Jin*, Kai Wang*, Yang You
arXiv
| paper | code | blog |
Wallfacer: Guiding transformer model training out of the long-context dark forest with n-body problem
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Kai Wang, Xuanlei Zhao, James Demmel, Yang You
arXiv
| paper |
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
Xuanlei Zhao, Shenggan Cheng, Chang Chen, Zangwei Zheng, Ziming Liu, Zheming Yang, Yang You
arXiv
| paper | code |
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Xuanlei Zhao*, Bin Jia*, Haotian Zhou*, Ziming Liu, Shenggan Cheng, Yang You
MLSys 2024
| paper |
FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters
Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You
PPoPP 2024
| paper | code |
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You
ICLR 2024
| paper | code |