Reducing the Carbon Impact of AI Inference Using State-of-the-Art Infrastructure
AI has been accelerating rapidly over the past decade. Most recently, the rise of Generative AI has captured the imagination of us all, with Google at the forefront of creating Generative models – such as code assistants and image-generation models – that are revolutionising how we complete everyday tasks.
This trend is predicted to continue throughout this decade, with a recent study showing that the AI market will grow 20-fold in the next seven years.
Fig 1: Predicted growth of the AI market from 2023-2030
We are creating more and more models that cover every aspect of our lives. This is fantastic from an innovation perspective, but a subject that is often overlooked, or forgotten about completely, is the energy that goes into training AI and ML models, as well as the energy required for model inference.
Before we dig into how we can reduce the carbon footprint of our AI estate by using Intel’s state-of-the-art infrastructure, let’s go back to basics – how does Cloud infrastructure create emissions?
Carbon Footprint of the Cloud
Over the past decade, an increasing percentage of IT infrastructure has been migrated to the Cloud. All software running in the Cloud, from applications to data pipelines to AI, runs in Data Centres (DCs) and consumes electricity.
In short, electricity can either come from dirty sources (i.e., fossil fuels) or clean sources (i.e., renewables, such as solar and wind). This means that calculating the carbon impact of a particular piece of software depends on several factors, including the electricity make-up of the grid at runtime. To encapsulate all of the various factors that go into calculating the carbon impact of software, we use the Software Carbon Intensity (SCI) equation.
The SCI is a method of scoring any software application, regardless of where it runs, and comprises the following:
SCI = ((E * I) + M) per R
- E = Energy consumed by the software, in kWh
- I = Carbon emitted per kWh of energy (gCO2e/kWh)
- M = Embodied carbon: emissions from manufacturing and disposing of the hardware that the software runs on
- R = Functional unit, e.g., per user or per device
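The equation above is simple enough to express directly in code. The sketch below is a minimal illustration of the SCI calculation; the function name and example values are ours, not part of any specification.

```python
def sci(energy_kwh: float, intensity_g_per_kwh: float,
        embodied_g: float, functional_units: int) -> float:
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2e per functional unit."""
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units


# Example: 2 kWh consumed on a 100 gCO2e/kWh grid, 50 gCO2e embodied,
# spread across 10 users -> 25 gCO2e per user.
per_user = sci(2.0, 100.0, 50.0, 10)
```

Note how reducing either E (more efficient software) or I (a cleaner grid) lowers the score directly, which is exactly the lever discussed below.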
From this equation, we can see that the part (E * I) is heavily influential when calculating the carbon footprint of a particular piece of software.
In the simplest form, E is dependent on the execution of the software itself. Logically, if software can run more efficiently, then it will consume less energy (this is important for later!).
The I part of the equation is all about the make-up of the grid that was mentioned earlier – i.e., the “carbon intensity” of the energy being used by the Cloud provider’s DC at that time. In the case of Google Cloud, these are published for all to see here.
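Because I varies by region, one practical use of the published data is to pick the greenest region available for a workload. The sketch below uses made-up sample intensities purely for illustration; consult Google Cloud's published carbon data for real, current figures.

```python
# Illustrative grid carbon intensities in gCO2e/kWh. These numbers are
# assumptions for the example only -- not real published values.
SAMPLE_INTENSITIES = {
    "europe-west1": 110,
    "us-central1": 394,
    "asia-southeast1": 580,
}


def greenest_region(intensities: dict) -> str:
    """Return the region whose grid has the lowest carbon intensity (the I term)."""
    return min(intensities, key=intensities.get)
```

With the sample data above, `greenest_region(SAMPLE_INTENSITIES)` selects `"europe-west1"` as the lowest-intensity option.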
So now that we’ve covered the basics of Cloud carbon impact, let’s show how we can reduce the carbon impact of our AI inferencing!
Reducing the Carbon Impact of AI Inferencing on Google Cloud
The scene is set: AI is growing exponentially, it uses a lot of energy, and energy use is a key part of calculating the carbon footprint of a particular piece of software (in our case, AI inference).
As mentioned previously, the more efficiently a piece of software runs, the less energy it will consume. This is where Intel’s new state-of-the-art infrastructure on Google Cloud comes in for AI inference!
Introducing the new C3 machine series on Google Cloud
A new addition to Google’s ecosystem of workload-optimised infrastructure is the C3 VM instance type – powered by the 4th Gen Intel Xeon Scalable processor and Google’s custom Intel Infrastructure Processing Unit (IPU).
C3 instances were the first VMs in the public cloud with this next-gen processor, which also features Intel® AMX (Advanced Matrix Extensions) – a built-in accelerator for deep-learning training and inference workloads. Combining “Tiles” (hardware registers for storing larger chunks of data) with “TMUL” (matrix multiplication instructions), and with support for INT8 and BF16 data types, Intel AMX enables 4th Gen Xeons to deliver significantly higher training and inference performance than prior processor generations.
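On Linux, you can verify whether the CPU you are running on actually exposes AMX before relying on it. The helper below is our own sketch: it parses the `flags` line from `/proc/cpuinfo`, where 4th Gen Xeons advertise `amx_tile`, `amx_int8`, and `amx_bf16`.

```python
def cpu_supports_amx(flags_line: str) -> bool:
    """Return True if a /proc/cpuinfo 'flags' line advertises AMX tile support."""
    return "amx_tile" in flags_line.split()


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Read the first 'flags' line from /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.partition(":")[2]
    return ""
```

On a C3 instance, `cpu_supports_amx(read_cpu_flags())` should return `True`; on older N2 hardware it will not.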
But how does this translate into impact in the real world? In short, it makes the C3 instance particularly good for AI inferencing – the process of sending new data to a trained model and receiving predictions back.
It’s worth pointing out that the C3 VM provides us with CPU-based compute. You might be thinking “Why not use a GPU to improve efficiency?” – and GPUs are indeed very effective for AI workloads. However, GPU-based workloads tend to be considerably more expensive. This puts the C3 instance in a unique position: you can take advantage of the lower cost of CPU compute for AI inference while still achieving excellent performance.
To demonstrate the performance improvements for AI inference with the C3 instance type, Datatonic ran a benchmarking exercise against its predecessors!
Putting C3 to the test
To highlight the benefit of running AI inference on the new C3 instance, we compared it to its older cousin, the N2 instance (all running on Google Cloud). In our experiments, we ran two tests: one ResNet model and one BERT model (both with INT8 quantization and utilising the Intel Extension for PyTorch to enable AMX support) – measuring the CO2 produced and the time taken.
Note: to capture CO2 metrics, we used the open-source Python library CodeCarbon.
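As a rough sketch of how such a benchmark harness can be structured: CodeCarbon's `EmissionsTracker` brackets the inference loop, and wall-clock time is measured alongside it. The `run_inference` stand-in below is ours – in the real benchmark it would be the quantized ResNet or BERT model call.

```python
import time

try:
    from codecarbon import EmissionsTracker  # real CodeCarbon API
except ImportError:
    EmissionsTracker = None  # degrade gracefully if CodeCarbon is not installed


def run_inference(batch):
    # Stand-in workload for illustration; swap in model(batch) here.
    return [x * 2 for x in batch]


def benchmark(batches):
    """Run inference over all batches; return (results, elapsed_s, emissions_kg).

    emissions_kg is None when CodeCarbon is unavailable or fails to start.
    """
    tracker = None
    if EmissionsTracker is not None:
        try:
            tracker = EmissionsTracker(log_level="error")
            tracker.start()
        except Exception:
            tracker = None
    start = time.perf_counter()
    results = [run_inference(b) for b in batches]
    elapsed_s = time.perf_counter() - start
    emissions_kg = tracker.stop() if tracker else None
    return results, elapsed_s, emissions_kg
```

Running the same harness on both instance types, with identical batches, gives a like-for-like comparison of time and emissions.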
The results are quite clear to see:
| Instance Type | ResNet – CO2 (kgCO2e) | ResNet – Time (s) | BERT – CO2 (kgCO2e) | BERT – Time (s) |
| --- | --- | --- | --- | --- |
When compared to the N2 instance type, the C3 reduces the carbon impact of AI inference by more than 50%! In addition, we received results roughly twice as quickly too – meaning that we can not only serve predictions faster, but also more than halve our carbon impact in the process, simply by switching the underlying infrastructure!
AI will be one of the key influences that shape the future of humankind. However, the environmental impact of AI is a topic that is often pushed to one side. If we are to live in a future where AI models are responsible for assisting or even fully owning everyday tasks, we must ensure that we build AI in a responsible manner – especially from an environmental perspective.
There are many ways to reduce the carbon footprint of your AI, such as deploying it into the greenest Cloud regions possible or utilising more efficient model architectures, but one very simple way to reduce the carbon impact of AI inference, while keeping an eye on cost, is to switch the underlying CPU infrastructure.
We have demonstrated that switching from an N2 instance to the new C3 instance, powered by Intel’s 4th Gen Intel Xeon Scalable processor, more than halves the carbon emissions of AI inference!
To find out how Data, AI, and Machine Learning can benefit your business and help you to become more sustainable, get in touch here.