Seminar @ Cornell Tech: CS Candidate Mark Zhao

Bloomberg Center Room 401 / Cornell Tech Campus / NYC

In this talk, Zhao will emphasize the importance of building scalable systems across the entire ML pipeline. In particular, Zhao will explore how large-scale ML training pipelines, including those deployed at Meta, require distributed data storage and ingestion systems to manage massive training datasets. Optimizing these data systems is essential as data demands continue to grow. Zhao will demonstrate how synergistic optimizations across the training data pipeline can unlock performance and efficiency gains beyond what isolated system optimizations can achieve.