"Methods that scale with computation are the future of AI", Richard Sutton, father of reinforcement learning. Large labelled training datasets were only one of the key pillars of the deep learning revolution, the widespread availability of GPU compute was the other. The next phase of deep learning is the widespread availability of distributed GPU compute. As data volumes increase, GPU clusters will be needed for the new distributed methods that already produce produce the state-of-the-art results for ImageNet and Cifar-10, such as neural architecture search. Auto-ml is also predicated on the availability of GPU clusters. Deep Learning systems at hyper-scale AI companies attack the toughest problems with distributed deep learning. Distributed Deep Learning enables both AI researchers and practioners to be more productive and the training of models that would be intractable on a single GPU server.
Jim introduces the latest developments in distributed deep learning and shows how distribution can both massively reduce training time and enable parallel experimentation for AutoML and hyperparameter optimization. We will introduce different distributed deep learning paradigms, including model-level parallelism and data-level parallelism, and show how data parallelism can be used for distributed training. We will also introduce the open-source Hops platform, which supports GPUs as a resource, a more scalable HDFS filesystem, and a secure multi-tenant environment. We will show how to program a machine learning pipeline, end-to-end, with only Python code on Hops.
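The core idea behind data parallelism mentioned above can be illustrated with a minimal sketch (not Hops-specific code — the worker count, model, and gradient function here are illustrative assumptions): each worker computes a gradient on its own shard of the batch, and the gradients are then averaged (an all-reduce), which for equal-sized shards matches the single-machine full-batch gradient.

```python
import numpy as np

def gradient(w, X, y):
    # Mean-squared-error gradient for a linear model y ~ X @ w
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # toy batch of 8 examples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)               # current model parameters

# Data parallelism: each "worker" holds one shard of the batch,
# computes a local gradient, then the gradients are averaged
# (the all-reduce step in a real distributed setting).
shards = np.array_split(np.arange(len(y)), 4)  # 4 hypothetical workers
local_grads = [gradient(w, X[idx], y[idx]) for idx in shards]
avg_grad = np.mean(local_grads, axis=0)

# With equal-sized shards, the averaged gradient equals the
# single-machine full-batch gradient.
assert np.allclose(avg_grad, gradient(w, X, y))
```

In practice the all-reduce is performed by a communication library rather than `np.mean`, and each worker applies the averaged gradient to its own replica of the model, keeping all replicas synchronized.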