Ching-Hsiang Chu

朱慶翔 (in Traditional Chinese)

I have been a Research Scientist at Meta since August 2020. I received my PhD in Computer Science and Engineering from The Ohio State University in 2020. I was born in the beautiful country of Taiwan and grew up there until I moved to the US in 2014 to pursue my PhD.

About Me

Education

Doctor of Philosophy (PhD), 2014-2020

Computer Science and Engineering,
THE Ohio State University, Columbus, OH, USA

Dissertation: "Accelerator-enabled Communication Middleware for Large-scale Heterogeneous HPC Systems with Modern Interconnects"

Advisor: Dr. Dhabaleswar K. Panda

Master of Science (MS), 2010-2012

Computer Science and Information Engineering,
National Central University, Taiwan

Thesis: "Jitter-based TCP for Incast Communication on Data Center Networks"

Advisor: Dr. Eric Hsiao-Kuang Wu

Bachelor of Science (BS), 2006-2010

Computer Science and Information Engineering,
National Changhua University of Education, Taiwan

Research Interests and Professional Skills (aka buzzwords)

High-performance Computing (HPC), AI Infrastructure, Distributed AI Systems, GPU Communication, Deep Learning, Wireless Networking, PyTorch, NCCL, UCC/UCX, CUDA, MPI, OpenMP, Python, C/C++, Java

Experience

Research Scientist, Meta

Menlo Park, CA, USA
08/2020 - Present

MSL Infra Kernel and Optimization Team
AI System Co-design team

Contributed Open-source projects:

Graduate Research Associate, The Ohio State University

Columbus, OH, USA
01/2015 - 08/2020

Department of Computer Science and Engineering
Network-based Computing Lab (NOWLAB)

Advisor: Dr. DK Panda

Core developer of MVAPICH2-GDR, a CUDA-aware MPI library in the MVAPICH project.

Software Engineer Intern, NVIDIA Corp.

Santa Clara, CA, USA
05/2018 - 08/2018

GPU Communication team

Developed an OpenSHMEM-based key-value store mechanism that achieves a 4.8x speedup over state-of-the-art (SOTA) GPU-based schemes.
Ching-Hsiang Chu, Potluri S, Goswami A, Gorentla Venkata M, Imam N, Newburn CJ. "Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM". In Workshop on OpenSHMEM and Related Technologies 2018 Aug 21 (pp. 148-164).

Research Assistant, Academia Sinica

Taipei, Taiwan
08/2013 - 07/2014

Institute of Information Science

Supervisor: Dr. Ling-Jyh Chen

Corporal of Artillery, Army of the Republic of China (R.O.C.)

Matsu, Taiwan
08/2012 - 07/2013

Army of the Republic of China (R.O.C.)

* Please visit my LinkedIn for more details.

Selected Publications

I have been lucky enough to collaborate with many top-notch researchers, scientists, and engineers, and have co-authored 50+ peer-reviewed papers in the areas of HPC, ML Systems, Networking, and Computer Architecture. A nearly complete list is available on Google Scholar Citations.

arXiv/Preprints

  1. Meta AI teams, "Collective Communication for 100k+ GPUs", 2025.

  2. Meta Llama team, "The Llama 3 Herd of Models", 2024.

Journal

  1. Dhabaleswar K. (DK) Panda, Hari Subramoni, Ching-Hsiang Chu and Mohammadreza Bayatpour, "The MVAPICH Project: Transforming Research into High-Performance MPI Library for HPC Community," in Journal of Computational Science, Special issue on Translational Computer Science, Vol 52, 2021. (2019 Impact Factor: 2.644)

  2. Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. (DK) Panda, "Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast," in IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 3, pp. 575-588, 1 March 2019. (2019 Impact Factor: 2.6)

Conference/Workshop

  1. Weiwei Chu, Xinfeng Xie, Jiecao Yu, Jie Wang, Amar Phanishayee, Chunqiang Tang, Yuchen Hao, Jianyu Huang, Mustafa Ozdal, Jun Wang, Vedanuj Goswami, Naman Goyal, Abhishek Kadian, Andrew Gu, Chris Cai, Feng Tian, Xiaodong Wang, Min Si, Pavan Balaji, Ching-Hsiang Chu, and Jongsoo Park, "Scaling Llama 3 Training with Efficient Parallelism Strategies," In Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25). Association for Computing Machinery, New York, NY, USA, 1703–1716.

  2. Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao, "Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression," SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, Nov. 17-22, 2024.

  3. Kshiteej Mahajan, Ching-Hsiang Chu, Srinivas Sridharan, Aditya Akella, "Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE," 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), Boston, MA. 2023.

  4. Q Zhou, Ching-Hsiang Chu, NS Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K Panda, "Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters," 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Portland, OR, USA, May 17-21, 2021.

    Best Paper
  5. Ching-Hsiang Chu, Pouya Kousha, Ammar Awan, Kawthar Shafie Khorassani, Hari Subramoni and D. K. Panda, "NV-Group: Link-Efficient Reductions for Distributed Deep Learning on Modern Dense GPU Systems," The 34th ACM International Conference on Supercomputing (ICS-2020), Barcelona, Spain (ONLINE due to COVID-19), June 29 - July 2, 2020. (Acceptance rate: 30%, 40/132)

Poster/Demo

  1. Ching-Hsiang Chu and Dhabaleswar Panda, "Efficient and Scalable Communication Middleware for Emerging Dense-GPU Clusters," The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19), Denver, CO, USA, Nov. 18-21, 2019. (Doctoral Showcase, with TCHPC Travel Award)

Dissertation

  1. Ching-Hsiang Chu, "Accelerator-enabled Communication Middleware for Large-scale Heterogeneous HPC Systems with Modern Interconnects," July 2020.

Contact

Email

kingchc0120_AT_gmail.com