Large scale optimization for deep learning