CS Seminar: “Improving the Performance of 1D Vertex Parallel GNN Training on Distributed Memory Systems”, Kutay Taşcı, 11:00AM July 31 2024 (EN)

Improving the Performance of 1D Vertex Parallel GNN Training on Distributed Memory Systems

Kutay Taşcı
Master's Student

(Supervisor: Prof. Dr. Cevdet Aykanat)
Computer Engineering Department
Bilkent University

Abstract: Graph Neural Networks (GNNs) are pivotal for analyzing data in graph-structured domains such as social media, biological networks, and recommendation systems. Despite their advantages, scaling GNN training to large datasets in distributed settings is challenging because computation and communication costs must be managed together. The objective of this work is to scale 1D vertex-parallel GNN training on distributed memory systems via (i) a two-constraint partitioning formulation for better computational load balancing and (ii) overlapping communication with computation to reduce communication overhead. In the proposed two-constraint formulation, the first constraint encodes the computational load balance during forward propagation, whereas the second encodes the computational load balance during backward propagation. We propose three methods for overlapping communication with computation, each performing the overlap at a different level. These methods were tested against traditional approaches on benchmark datasets and demonstrated improved training efficiency without altering the model structure. The outcomes indicate that multi-constraint graph partitioning and the integration of communication-computation overlapping schemes can significantly mitigate the challenges of distributed GNN training. The research concludes with recommendations for future work, including adapting these techniques to dynamic and more complex GNN architectures, which promises further improvements in the efficiency and applicability of GNNs in real-world scenarios.

DATE: July 31, Wednesday @ 11:00
Place: EA 409