Accelerate Machine Learning on Google Cloud with Intel Xeon processors
11th December 2018
The world’s data is expected to grow 10-fold over the next decade, so increasingly sophisticated solutions are needed to store and process it. In particular, the Machine Learning workflows that businesses rely on to improve their customers’ experience need to be optimised for these growing data volumes, delivering great results while keeping costs contained. At Datatonic, we have partnered with Intel and Google to accelerate Machine Learning for enterprises by leveraging the brightest minds and latest research in Artificial Intelligence, Data Science, and Engineering.
Customers are frequently led to believe that they must invest in new infrastructure built on GPUs to achieve their AI goals. We know this is not always the case. Here we want to share the solution we designed for one of our customers, a top-5 UK retailer: a proprietary recommender model that predicts customer purchases across thousands of brands, all at a lower cost of ownership by leveraging Google Cloud and Intel Xeon CPUs.
Google Compute Engine DL Images with Intel’s optimizations
Google Compute Engine offers Deep Learning Images that come with popular deep learning and machine learning frameworks preinstalled. This makes it quick and easy to get started, as the user does not need to worry about installing and configuring the frameworks or their dependencies. Most recently, Google Cloud has launched two new Deep Learning Images in Google Compute Engine:
– Intel® optimized Deep Learning Image: Base m12 (with Intel® MKL and CUDA 10.0). A Debian-based image with CUDA 10.0 plus Intel optimized NumPy, SciPy, and scikit-learn.
– Intel® optimized Deep Learning Image: TensorFlow 1.12.0 m12 (with Intel® MKL-DNN/MKL and CUDA 10.0). A Debian-based image with TensorFlow (with Intel® MKL-DNN, Intel® MKL, and CUDA 10.0) plus Intel® optimized NumPy, SciPy, and scikit-learn.
These easy-to-use images are regularly updated so that Google Cloud’s end-users always have the most up-to-date software optimizations at their fingertips to run their mission-critical workloads at scale. Deep Learning Images remove the need to build Machine Learning libraries from scratch in order to optimise them for the hardware at hand, so users get the latest optimizations without the installation and configuration overhead.
Best platform for our workload
Our work for the top-5 UK retailer encompassed millions of customers, thousands of products, and billions of answers to various online questions. Together with Google Cloud, we identified the best solution to be an ad-hoc cluster of several cloud instances that could accelerate the customer’s environment and provide the agility and scale they required.
During our testing, we compared performance, in terms of execution time and running costs, on an n1-standard-8 VM with Intel Xeon Scalable processors (aka Skylake CPUs), first on its own and then with each of two different Nvidia GPU offerings attached (K80 and V100). Over hundreds of runs, we determined that Intel Xeon Scalable processors were up to 57% faster and up to 11x cheaper for this workload.
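To make the comparison concrete, here is a minimal sketch of how a "faster" and a "cheaper" figure can be derived from a run's wall-clock time and the hourly price of the VM it ran on. The definitions (time saved relative to the GPU run, and the ratio of per-run costs), the function names, and the numbers are all illustrative assumptions. They are not our benchmark harness or our measured results.

```python
def cost_per_run(runtime_hours, hourly_price):
    """Cost of one training run: wall-clock hours times the hourly VM price."""
    return runtime_hours * hourly_price


def compare(cpu, gpu):
    """Compare two configurations given as (runtime_hours, hourly_price) tuples.

    Returns (time_saved_pct, cost_ratio):
      time_saved_pct: how much shorter the CPU run is, as a % of the GPU run time
      cost_ratio:     GPU cost per run divided by CPU cost per run
    """
    cpu_hours, cpu_price = cpu
    gpu_hours, gpu_price = gpu
    time_saved_pct = 100.0 * (gpu_hours - cpu_hours) / gpu_hours
    cost_ratio = cost_per_run(gpu_hours, gpu_price) / cost_per_run(cpu_hours, cpu_price)
    return time_saved_pct, cost_ratio


# Hypothetical placeholder figures, NOT the measurements from our tests.
cpu_only = (1.0, 0.5)       # (hours per run, price per hour)
cpu_plus_gpu = (2.0, 2.0)

time_saved_pct, cost_ratio = compare(cpu_only, cpu_plus_gpu)
print(f"CPU-only run is {time_saved_pct:.0f}% shorter and {cost_ratio:.1f}x cheaper per run")
```

Note that the cost gap can be much larger than the speed gap, because a VM with a GPU attached is both slower for this kind of workload and more expensive per hour.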
This may look like a surprising result, but through our R&D we have experienced this behaviour consistently. Other experiments on standard datasets like Movielens and Census have given comparable results, leading us to conclude that Intel CPUs can outshine GPUs for Machine Learning workflows involving Multilayer Perceptron architectures.
Following our findings, we rebuilt the retailer’s customer propensity models into a set of neural networks, with tooling that prepares the data and trains across thousands of brands in just 4 hours using TensorFlow optimized with the Intel MKL-DNN library.
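For readers who want to experiment with an MKL-DNN build of TensorFlow on CPUs, the sketch below shows the kind of threading settings that are commonly tuned. The helper name `mkl_thread_settings` is hypothetical, and the values follow Intel's general public guidance rather than the configuration used for the retailer. We also assume that the 8 vCPUs of an n1-standard-8 correspond to 4 physical cores. Treat all of it as a starting point to benchmark, not a recommendation.

```python
import os


def mkl_thread_settings(physical_cores, sockets=1):
    """Return (env, session) thread settings for an MKL-DNN build of TensorFlow.

    Hypothetical helper: the values are common starting points, not tuned results.
    """
    env = {
        # Number of OpenMP threads used inside MKL-DNN kernels.
        "OMP_NUM_THREADS": str(physical_cores),
        # Milliseconds a thread waits after finishing work before sleeping.
        "KMP_BLOCKTIME": "1",
        # Pin OpenMP threads to cores to limit migration between them.
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }
    session = {
        # Threads used within a single op (for example one matrix multiply).
        "intra_op_parallelism_threads": physical_cores,
        # Number of independent ops that may run concurrently.
        "inter_op_parallelism_threads": sockets,
    }
    return env, session


# Assumption: n1-standard-8 exposes 8 vCPUs backed by 4 physical cores.
env, session = mkl_thread_settings(physical_cores=4)

# Set the environment variables before TensorFlow is imported.
os.environ.update(env)

# With TensorFlow 1.x, the session values would then be passed as
# tf.ConfigProto(**session) when creating the tf.Session.
print(env, session)
```

The best values depend on the model and the batch size, so it is worth measuring a few combinations on your own workload.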
The new solution enables the retailer’s Customer Insights team to leverage data from across thousands of online and offline touchpoints and predict which brands customers will shop in the next 30 days. Their Modelling Team now has a single, highly efficient propensity model for their entire brand portfolio. It can be retrained quickly when needed, and additional brands can be added without compromising performance.
Following the amazing feedback we received after sharing these benchmark results at the Google Next 2018 UK conference last month (you can find the slides presented at the conference on SlideShare), interest in expanding our research to better understand which Machine Learning workflows run best on CPUs or GPUs is greater than ever!
If you are interested in sharing your R&D results with us, or in understanding how your Machine Learning workflows can benefit from running on optimised Intel Xeon Scalable processors, do not hesitate to contact us!