Computer Vision: Deploying Image Segmentation Models on Vertex AI

Authors: Alex Thomas, Senior Data Scientist and Matt Gela, Senior Data Scientist

In the most recent blog of our Computer Vision series, we showed you how to use Vertex AI’s AutoML to train and deploy an Object Detection model. As powerful and efficient as AutoML is, you may want the flexibility to train and deploy more bespoke models, or tackle a Computer Vision task that isn’t currently covered by AutoML. Fortunately, Vertex AI has you covered here as well, with custom options for training and deploying your models on Google Cloud. 

In this blog, we’ll show you how to train and deploy your own custom Image Segmentation model using Vertex AI. We’ll provide some background on Image Segmentation and on Vertex AI’s training options. Then, we will walk you through the step-by-step process of getting your own model up and running.


Image Segmentation

Image Segmentation is a technique used in digital image processing, where an image is partitioned into multiple different segments, each representing a different part of the image. Image Segmentation can be used for tasks like distinguishing the background from the foreground in an image, or clustering pixels in an image together based on colour similarities.

In particular, we are going to focus on building a model that performs one type of Image Segmentation, called Semantic Segmentation.

Semantic Segmentation

Semantic Segmentation is a type of Image Segmentation where the aim is to partition the image into semantically meaningful classes and to classify each pixel into one of these predetermined classes.

Semantic Segmentation is one step more sophisticated than Object Detection, in that rather than trying to draw a bounding box around the object, the goal is to draw a careful outline around the object that is detected so that you know exactly which pixels belong to the object and which pixels don’t.

Fig 1: An example of Semantic Segmentation

One important thing to note is that in Semantic Segmentation we don’t differentiate between instances of the same object. For example, if there are two bikes in an image, Semantic Segmentation would label all of the pixels corresponding to either bike with the same “bike” label. This is in contrast to Instance Segmentation, which gives a unique label to every instance of a particular object in the image.
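To make the difference concrete, here is a minimal sketch using a made-up 4×6 scene containing two bikes. The masks, the label values, and the helper names are illustrative assumptions rather than anything from the Cityscapes data.

```python
# Toy scene with two separate bikes (hypothetical data for illustration).
BACKGROUND, BIKE = 0, 1

# Semantic Segmentation: every bike pixel gets the same class label.
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance Segmentation: each bike gets its own label (1 and 2).
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def distinct_labels(mask):
    """Return the sorted set of label values that appear in a mask."""
    return sorted({value for row in mask for value in row})


def to_semantic(mask):
    """Collapse instance labels into a single 'bike' class."""
    return [[BIKE if value > 0 else BACKGROUND for value in row] for row in mask]
```

Collapsing the instance labels recovers the semantic mask, which shows that a semantic mask carries strictly less information about individual objects.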

Semantic Segmentation is particularly useful in applications such as self-driving cars, where the car needs to understand exactly which pixels represent a drivable surface and hence where it is safe to drive. Additionally, it can be used in medical imaging to segment sections of the patient’s anatomy, which can make it easier to spot irregularities and diagnose serious diseases.


In this blog, the model architecture that we will be using is a special type of Convolutional Neural Network, called a U-Net. U-Net was initially developed for biomedical Image Segmentation, but the architecture is useful in many other applications, making it one of the most important and foundational neural network architectures of Computer Vision today.
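As a rough illustration of the U-Net idea, the sketch below builds a small encoder-decoder with skip connections in Keras. It is not the model used in our training script: the filter counts, depth, "same" padding, and function names are assumptions chosen to keep the example short, and TensorFlow is imported only when the model is built.

```python
def unet_spatial_sizes(input_size=160, depth=3):
    """Trace the feature-map width through the encoder and decoder."""
    sizes = [input_size]
    for _ in range(depth):  # encoder: each pooling step halves the size
        if sizes[-1] % 2:
            raise ValueError("input size must be divisible by 2**depth")
        sizes.append(sizes[-1] // 2)
    for _ in range(depth):  # decoder: each upsampling step doubles it
        sizes.append(sizes[-1] * 2)
    return sizes


def build_unet(input_shape=(160, 160, 3), num_classes=8, filters=(32, 64, 128)):
    """Build a small U-Net style model (illustrative, not the blog's model)."""
    from tensorflow import keras  # imported lazily; requires TensorFlow
    from tensorflow.keras import layers

    inputs = keras.Input(shape=input_shape)
    x = inputs
    skips = []
    # Encoder: convolve, remember the feature map, then downsample.
    for f in filters:
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    # Bottleneck at the lowest resolution.
    x = layers.Conv2D(filters[-1] * 2, 3, padding="same", activation="relu")(x)
    # Decoder: upsample and concatenate the matching encoder feature map.
    for f, skip in zip(reversed(filters), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    # One softmax score per class for every pixel.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```

The skip connections are what give the architecture its "U" shape: fine spatial detail from the encoder is reused in the decoder, so the output mask has the same height and width as the input image.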

To train and deploy such a model on Google Cloud, we’ll need to use Vertex AI’s custom training options.

Custom Model Training with Vertex AI

Vertex AI’s training options allow you to have more control over the model’s architecture, meaning you can train and deploy model architectures that you write yourself. It does this through the use of containers: you can choose either pre-built containers (available for supported frameworks such as TensorFlow) or custom containers, which allow you to put your code in a Docker container and push it to the Container Registry to run on Vertex AI.

This means that you can deploy pretty much any model framework or architecture you want with relative ease, and can even accelerate the training process using GPUs. Let’s get started!


In this tutorial, we train and deploy a model which provides a Semantic Segmentation of urban street scene images. We will use the Cityscapes dataset for this tutorial, where the objective is to label each pixel in the image as one of the following eight categories listed under ‘Group’ in the table below:

Fig 2: Categories in the Cityscapes dataset

Creating a Project on GCP

To get started in training a custom Image Segmentation model on Google Cloud, create a Google Cloud project and set up a Google account with sufficient permissions. For this tutorial, you will need permission to use Vertex AI and to create and manage Storage Buckets. You may also need to enable certain APIs such as the Vertex AI API, the Compute Engine API, and the Cloud Storage API.

Next, choose a region to run your project in. If you would like to use GPUs to train your model, make sure to choose a region where they are available, as GPUs are not offered in every region.

Finally, create a Cloud Storage bucket. This is the location on Google Cloud where all input and output data will be stored.
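If you prefer to create the bucket from Python rather than the console, a minimal sketch using the google-cloud-storage client library might look like the following. The helper names are our own, and the project, region, and bucket values are placeholders you would supply.

```python
def gcs_uri(bucket_name, *parts):
    """Build a gs:// URI for a path inside the bucket."""
    path = "/".join(part.strip("/") for part in parts if part)
    return f"gs://{bucket_name}/{path}" if path else f"gs://{bucket_name}"


def create_bucket(project_id, region, bucket_name):
    """Create a bucket in the chosen region (needs google-cloud-storage)."""
    from google.cloud import storage  # imported lazily

    client = storage.Client(project=project_id)
    # Keeping the bucket in the same region as the training job avoids
    # cross-region data transfer.
    return client.create_bucket(bucket_name, location=region)
```

The gcs_uri helper is just a convenience for building the gs:// paths used when reading data from, or saving a model to, the bucket.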

Preparing the Training Dataset

To download and use the Cityscapes data, you will need to sign up for an account and request access. Once you have access, download the two required dataset files. Decompress them and upload the resulting folders to the base of your Cloud Storage bucket.

Training a Model

The first step in training a model using Vertex AI custom training is to write a Python training script. For this tutorial, we have adapted this script for use with the Cityscapes dataset. Our updated version of the script can be found here.

The first part of this script (see here) is a helper class which makes it easier to read in batches of the training images and their labels, known as segmentation masks, from the Cloud Storage bucket. Each of these segmentation masks is a single-channel 2-D array with the same height and width as its corresponding image, where each entry relates to a pixel in the original image.

Each entry in the mask holds a numeric ID which maps to one of the classes in the dataset. For example, if a pixel in the segmentation mask has a value of 11, that pixel belongs to the “building” class.

Note that instead of training using these classes directly we instead group them into the 8 broad categories mentioned above (flat, human, vehicle, construction, object, nature, sky, void). The id_to_cat variable is used to do this.
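As an illustration of this grouping, the sketch below builds a lookup table from the standard Cityscapes class IDs (0 to 33) to the eight categories. The ID ranges follow the published Cityscapes label definitions, but the category ordering and the names used here are assumptions, and the id_to_cat variable in our script may be defined differently.

```python
# Category index = position in this list (ordering is an assumption).
CATEGORIES = ["void", "flat", "construction", "object",
              "nature", "sky", "human", "vehicle"]

# Cityscapes class ID ranges for each category, e.g. 7 = road, 11 = building.
CATEGORY_ID_RANGES = {
    "void": range(0, 7),
    "flat": range(7, 11),
    "construction": range(11, 17),
    "object": range(17, 21),
    "nature": range(21, 23),
    "sky": range(23, 24),
    "human": range(24, 26),
    "vehicle": range(26, 34),
}


def build_id_to_cat():
    """Return a list where position = class ID and value = category index."""
    id_to_cat = [0] * 34
    for name, class_ids in CATEGORY_ID_RANGES.items():
        for class_id in class_ids:
            id_to_cat[class_id] = CATEGORIES.index(name)
    return id_to_cat


ID_TO_CAT = build_id_to_cat()


def remap_mask(mask):
    """Replace every class ID in a 2-D mask with its category index."""
    return [[ID_TO_CAT[class_id] for class_id in row] for row in mask]
```

With NumPy arrays, the same remapping can be done in one step by indexing the lookup table with the mask, for example np.array(ID_TO_CAT)[mask].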

Finally, we define the part of the script (see here) which uses our helper classes to load in our training data from our storage bucket, create a model using the get_model method above, train a model, and save it to the storage bucket.
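The batching helper described earlier in this section could be sketched roughly as follows. This is not the class from our script: the class name and the injected load_image and load_mask callables are hypothetical, and the interface simply mirrors the __len__ and __getitem__ pattern used by Keras data sequences.

```python
import math


class SegmentationBatches:
    """Serve (images, masks) batches from paired lists of file paths."""

    def __init__(self, image_paths, mask_paths, batch_size, load_image, load_mask):
        if len(image_paths) != len(mask_paths):
            raise ValueError("every image needs exactly one mask")
        self.image_paths = list(image_paths)
        self.mask_paths = list(mask_paths)
        self.batch_size = batch_size
        self.load_image = load_image  # e.g. reads and resizes an image
        self.load_mask = load_mask    # e.g. reads a mask and remaps its IDs

    def __len__(self):
        """Number of batches, counting a final partial batch."""
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, index):
        """Load only the files that belong to batch number `index`."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        stop = start + self.batch_size
        images = [self.load_image(p) for p in self.image_paths[start:stop]]
        masks = [self.load_mask(p) for p in self.mask_paths[start:stop]]
        return images, masks
```

Loading files lazily, one batch at a time, keeps memory use flat regardless of how many images are stored in the bucket.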

Packaging the Model Code to Run Vertex AI Training

The next step is to package up the training code as a Docker container that can be run by the Vertex AI training service. There are two main ways this can be done. One is to create our own custom Docker container which packages all of the dependencies and training code together.

The other, which we use in this tutorial, is to make use of Google’s pre-built training containers. These pre-built containers are a great option if you are training a model which uses one of the common Python libraries for Machine Learning, such as scikit-learn, TensorFlow, PyTorch, or XGBoost. Given the model in this tutorial uses the TensorFlow framework, we use a TensorFlow image here. Select the image you wish to use and make a note of its URI.

At the same time, we also select a pre-built prediction container image that will be used later when deploying our trained model to a Vertex AI Endpoint. Here, again, we use a TensorFlow image. Select the image you wish to use and make a note of its URI.

We can also use GPUs to train the model. Make sure you check your chosen GCP region has GPUs available before deciding to use them.

If your region doesn’t have them available, or you do not want to use them, set the accelerator_type variable in the scripts below to “ACCELERATOR_TYPE_UNSPECIFIED” and make sure the training and prediction container images you select are for the CPU version of the particular framework you are using. Otherwise, select the GPU you wish to use and make a note of its name. For this tutorial, we use a single NVIDIA Tesla K80 GPU.

To make use of the pre-built training containers, we use the Vertex AI SDK for Python (aiplatform) to write a script that automatically packages up the training code with a pre-built training container and runs it on Vertex AI. You can find this script here. We need to update the following variables in the script before running it:

  • the name of our GCP project (project_id),
  • the GCP region we’re using (region), 
  • our GCS Bucket in the same region (bucket_name), 
  • the URI of our pre-built training container image (train_image), 
  • the URI of our pre-built prediction container image (deploy_image), 
  • the type of GPU we want to use (accelerator_type), and 
  • the path to our training script (script_path)
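A minimal sketch of how these variables might be passed to the Vertex AI SDK is shown below. It is an outline under stated assumptions rather than our exact script: the display names, the default machine type, and the helper function names are hypothetical, and we assume a single GPU whenever an accelerator is requested.

```python
def training_job_kwargs(script_path, train_image, deploy_image):
    """Arguments for aiplatform.CustomTrainingJob (display name is made up)."""
    return {
        "display_name": "cityscapes-segmentation-training",
        "script_path": script_path,
        "container_uri": train_image,
        "model_serving_container_image_uri": deploy_image,
    }


def run_kwargs(accelerator_type, machine_type="n1-standard-8"):
    """Arguments for job.run(); GPU settings are added only when requested."""
    kwargs = {
        "model_display_name": "cityscapes-segmentation-model",
        "replica_count": 1,
        "machine_type": machine_type,  # assumed machine type
    }
    if accelerator_type != "ACCELERATOR_TYPE_UNSPECIFIED":
        kwargs["accelerator_type"] = accelerator_type  # e.g. "NVIDIA_TESLA_K80"
        kwargs["accelerator_count"] = 1
    return kwargs


def submit_training(project_id, region, bucket_name, train_image,
                    deploy_image, accelerator_type, script_path):
    """Package the training script and start a Vertex AI training job."""
    from google.cloud import aiplatform  # needs google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region,
                    staging_bucket=f"gs://{bucket_name}")
    job = aiplatform.CustomTrainingJob(
        **training_job_kwargs(script_path, train_image, deploy_image))
    # Blocks until training finishes, then returns the registered model.
    return job.run(**run_kwargs(accelerator_type))
```

Supplying the prediction container image when the job is created is what lets Vertex AI register the trained model automatically once training completes.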

When the script has run successfully, this should start a training job on Vertex AI.

Once your training job is running, you should see a new entry appear in the Training section of the Vertex AI UI.

Fig 3: The Training section of the Vertex AI UI.

You can click into your training job, select its associated CustomJob, and then select View Logs to view the logs produced as your model trains.

Fig 4: Logs Explorer in Vertex AI 

Deploying Your Model on Vertex AI

Wait until your model has finished training, then click again on its training job entry in the Vertex AI UI. This will take you to the entry for your model artifact in the Model Registry. Click on Version Details here and make a note of your model’s ID.

Fig 5: The Model Registry

Once again we make use of the Vertex AI Python SDK to deploy the trained model to a Vertex AI Endpoint. To do this we need to supply the script with the project_id, region, bucket_name and accelerator_type variables, and also provide the model’s ID (model_id).
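The deployment step could be sketched along the following lines. The machine type, replica counts, and helper names are assumptions made for illustration, so treat this as an outline rather than our exact deployment script.

```python
def deploy_kwargs(accelerator_type, machine_type="n1-standard-4"):
    """Arguments for Model.deploy(); GPU settings are added only when requested."""
    kwargs = {
        "machine_type": machine_type,  # assumed machine type
        "min_replica_count": 1,
        "max_replica_count": 1,
    }
    if accelerator_type != "ACCELERATOR_TYPE_UNSPECIFIED":
        kwargs["accelerator_type"] = accelerator_type
        kwargs["accelerator_count"] = 1
    return kwargs


def deploy_model(project_id, region, bucket_name, model_id, accelerator_type):
    """Deploy a model from the Model Registry to a new Vertex AI Endpoint."""
    from google.cloud import aiplatform  # needs google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region,
                    staging_bucket=f"gs://{bucket_name}")
    model = aiplatform.Model(model_name=model_id)
    # deploy() creates an endpoint when none is given; this can take a while.
    endpoint = model.deploy(**deploy_kwargs(accelerator_type))
    return endpoint.name  # the endpoint ID needed later for predictions
```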

Once your model has been deployed, you should see an entry appear in the Endpoints section of the Vertex AI UI for your model. Make a note of the ID of this endpoint, as we will need it when using the model to make predictions.

Fig 6: Vertex AI Endpoints

Using Your Deployed Model to Segment an Image

We will now use our model to segment a test image. The script takes the image, converts it to a numpy array and sends it to the endpoint. The deployed model returns a segmentation mask which is colourised and saved as a PNG (mask.png) using the Pillow Python library. After this, the script overlays the segmentation mask onto the test image and also saves this overlay as a separate PNG (overlay.png).

Remember to update the project_id, region, and bucket_name variables, and provide the path to the test image (image_path) and the ID of the deployed endpoint (endpoint_id) in the script below. Then, run it to test your deployed model.
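The prediction steps described above can be sketched as follows. Several details are assumptions: that the endpoint returns one score per category for every pixel, that the image is sent as raw pixel values, the category ordering, and the exact RGB values in the palette (the colours for "object" and "human" in particular are placeholders).

```python
# Category index -> RGB colour (ordering and exact values are assumptions).
PALETTE = {
    0: (255, 255, 255),  # void: white
    1: (128, 128, 128),  # flat: grey
    2: (255, 0, 0),      # construction: red
    3: (255, 255, 0),    # object: placeholder colour
    4: (0, 128, 0),      # nature: green
    5: (0, 0, 255),      # sky: blue
    6: (255, 165, 0),    # human: placeholder colour
    7: (255, 105, 180),  # vehicle: pink
}


def argmax_mask(class_scores):
    """Turn H x W x C per-pixel scores into an H x W mask of category indices."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in class_scores]


def colourise(mask, palette=PALETTE):
    """Turn an H x W mask of category indices into H x W RGB tuples."""
    return [[palette[category] for category in row] for row in mask]


def segment_image(project_id, region, endpoint_id, image_path, size=(160, 160)):
    """Send an image to the endpoint and save mask.png and overlay.png."""
    import numpy as np
    from PIL import Image
    from google.cloud import aiplatform  # needs google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region)
    image = Image.open(image_path).convert("RGB").resize(size)
    instance = np.asarray(image, dtype="float32").tolist()

    endpoint = aiplatform.Endpoint(endpoint_name=endpoint_id)
    response = endpoint.predict(instances=[instance])

    mask = argmax_mask(response.predictions[0])
    mask_image = Image.fromarray(np.array(colourise(mask), dtype="uint8"))
    mask_image.save("mask.png")
    # Blend the resized input and the coloured mask at equal weight.
    Image.blend(image, mask_image, alpha=0.5).save("overlay.png")
    return mask
```

Taking the highest-scoring category for each pixel is what converts the model’s per-class scores into a single segmentation mask.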


Finally, we can examine the two output images to see how well the model performed at segmenting the test image.

Fig 7: The original image used in our Semantic Segmentation example

Fig 8: The segmentation mask

Fig 9: The segmentation mask overlaid on the original image

As you can see, in this case, the model has successfully segmented the image into its component parts. The cars parked on the street (pink) are separated from the trees (green) and buildings (red) surrounding them. The bonnet of the driving car (white) is separated from the road (grey), and the sky (blue) is separated from the trees (green). 

The full list of colour mappings is as follows:

Fig 10: The list of colour mappings

It is worth acknowledging that the boundaries of the masks here are a bit fuzzy; this is because we are shrinking all of our images to 160×160 pixels during training and prediction to simplify the training process. If you want to improve on these results, feel free to try reducing the amount of shrinkage, as this may help to make the boundaries between objects sharper. Additionally, we advise experimenting with training the model for a greater number of epochs, as this should give better results.
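To get a feel for how much detail is lost, the short calculation below works out how many original pixels each predicted pixel has to represent. It assumes the standard Cityscapes image resolution of 2048×1024; the function name is our own.

```python
def pixels_per_cell(original_size, model_size):
    """Return how many original pixels (width, height) map to one model pixel."""
    original_width, original_height = original_size
    model_width, model_height = model_size
    return original_width / model_width, original_height / model_height


# Assumed Cityscapes resolution, downscaled to the 160x160 used in this blog.
at_160 = pixels_per_cell((2048, 1024), (160, 160))
# The same calculation with half as much shrinkage.
at_320 = pixels_per_cell((2048, 1024), (320, 320))
```

At 160×160, each predicted pixel covers a block of roughly 13 by 6 original pixels, so any object boundary that falls inside a block cannot be resolved. Doubling the model resolution halves the block size in each direction, at the cost of more memory and longer training.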


In this article, we’ve shown you how to use Vertex AI to train and deploy a custom Image Segmentation model. Hopefully, you have seen how straightforward it is to train and deploy your model using Vertex AI once you have developed it. Although the focus of this blog was on Image Segmentation, you can use the same training and deployment methods via the Vertex AI Python SDK for a much larger range of Machine Learning tasks.

The scripts we’ve provided in this blog are easily adaptable and need very few modifications, so we hope you enjoy tweaking them for your own use case!

As 4x Google Cloud Partner of the Year, Datatonic has a wealth of experience in Computer Vision, Machine Learning, and a range of Google products and services. Get in touch to learn how your business can benefit from Computer Vision or other Machine Learning models.

Check out our other Computer Vision blogs in this series:

Part 1: Computer Vision: Insights from Datatonic’s Experts

Part 2: Computer Vision: Emerging Trends and Google Cloud Technology

Part 3: Computer Vision: Object Detection and No-Code AI with AutoML

Part 4: Computer Vision: Deploying Image Segmentation Models on Vertex AI

Part 5: Computer Vision: Generative Models and Conditional Image Synthesis
