OPTIMIZING COMMUNICATION IN PARALLEL DEEP LEARNING ON EXASCALE-CLASS MACHINES