Author: Jonny Browning, Principal MLOps Engineer
Vertex Pipelines provides a great way to orchestrate your Machine Learning workloads serverlessly on Google Cloud. It is based on Kubeflow Pipelines, an open-source platform for building Machine Learning (ML) pipelines using containers.
However, unlike Kubeflow Pipelines, Vertex Pipelines does not have a built-in mechanism for scheduling pipeline runs. For many use cases, the ability to schedule pipeline runs is a key element of ML automation (e.g. scheduled batch prediction pipelines, scheduled model retraining).
In the Vertex Pipelines documentation, Google provides an approach for scheduling Vertex Pipeline jobs using a few other GCP services: a Cloud Scheduler job triggers a Cloud Function, which in turn submits the pipeline run.
However, did you know that you can eliminate the need for a Cloud Function and trigger Vertex Pipelines using only Cloud Scheduler?
One of the great features of the Vertex AI platform is that it is fully modular, and each component of the platform is exposed as an HTTP REST API. This includes a method for creating PipelineJobs (i.e. Vertex Pipeline runs). Since Cloud Scheduler can schedule HTTP requests, we can use it to call the Vertex REST API directly, rather than using it to trigger a Cloud Function!
So, we can schedule Vertex Pipeline runs by crafting the right HTTP request in our Cloud Scheduler job, which will interact directly with the Vertex API. Great! But creating this HTTP request by hand is fiddly and not user-friendly; we need a way to automate it and make it easier to use.
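To see just how fiddly, here is a minimal sketch of what that raw HTTP request could look like when written directly as a Cloud Scheduler job in Terraform. The project, bucket, and service account names are placeholders, and the exact PipelineJob fields depend on how your pipeline was compiled; this is an illustration, not the module's implementation.

# Minimal sketch: a Cloud Scheduler job that POSTs directly to the Vertex AI
# pipelineJobs endpoint. All names (project, bucket, service accounts) are
# placeholders for illustration only.
resource "google_cloud_scheduler_job" "trigger_pipeline" {
  name      = "trigger-my-pipeline"
  project   = "my-gcp-project-id"
  region    = "europe-west2"
  schedule  = "0 0 * * *"
  time_zone = "UTC"

  http_target {
    http_method = "POST"
    # Vertex AI REST endpoint for creating PipelineJobs
    uri = "https://europe-west2-aiplatform.googleapis.com/v1/projects/my-gcp-project-id/locations/europe-west2/pipelineJobs"

    # The PipelineJob request body, base64-encoded as Cloud Scheduler expects.
    # Assumes pipeline.json contains the compiled pipeline spec.
    body = base64encode(jsonencode({
      displayName  = "hello-world-pipeline"
      pipelineSpec = jsondecode(file("pipeline.json"))
      runtimeConfig = {
        gcsOutputDirectory = "gs://my-bucket/my-output-directory"
        parameterValues    = { text = "Hello, world!" }
      }
      serviceAccount = "my-vertex-service-account@my-gcp-project-id.iam.gserviceaccount.com"
    }))

    headers = {
      "Content-Type" = "application/json"
    }

    # Cloud Scheduler authenticates to the Vertex API with an OAuth token
    oauth_token {
      service_account_email = "my-cloud-scheduler-service-account@my-gcp-project-id.iam.gserviceaccount.com"
      scope                 = "https://www.googleapis.com/auth/cloud-platform"
    }
  }
}

Every time a pipeline parameter, schedule, or service account changes, this request body has to be rebuilt by hand, which is exactly the friction the module below removes.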
Introducing Datatonic’s Open-Source Terraform Module
To address this issue, Datatonic has just released an open-source Terraform module that makes it really simple to manage your scheduled Vertex Pipelines using Infrastructure-as-Code. Let's take a look at what this looks like in practice. There are three main steps:
1. Compile your pipeline to a pipeline spec file (e.g. pipeline.json), stored either locally or in a GCS bucket.
2. In your main.tf file, include the module definition following the example below. Note that pipeline_spec_path can either be a local path or a GCS path.
module "hello_world_pipeline" {
  source  = "teamdatatonic/scheduled-vertex-pipelines/gcp"
  version = "1.0.0"

  project                = "my-gcp-project-id"
  vertex_region          = "europe-west2"
  cloud_scheduler_region = "europe-west2"

  pipeline_spec_path = "pipeline.json"
  parameter_values = {
    "text" = "Hello, world!"
  }
  gcs_output_directory          = "gs://my-bucket/my-output-directory"
  vertex_service_account_email  = "my-vertex-service-account@my-gcp-project-id.iam.gserviceaccount.com"

  time_zone                = "UTC"
  schedule                 = "0 0 * * *"
  cloud_scheduler_sa_email = "my-cloud-scheduler-service-account@my-gcp-project-id.iam.gserviceaccount.com"
  cloud_scheduler_job_name = "my-first-pipeline"
}
Fig 4: Example code for using Datatonic’s Terraform module for scheduling Vertex Pipelines
3. Set up CI/CD to automatically deploy your Terraform configuration when you merge your code. There are different ways to do this — for example, check out this guide in the Google Cloud Architecture Center for setting up Terraform using Google Cloud Build.
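As a rough illustration of step 3, the sketch below defines a Cloud Build trigger (itself managed in Terraform) that runs a cloudbuild.yaml from the repository on every push to the main branch. The GitHub owner, repository name, branch pattern, and build file are assumptions; adapt them to your own setup, or follow the Architecture Center guide linked above.

# Minimal sketch: run Cloud Build (e.g. terraform init/plan/apply, defined in
# cloudbuild.yaml) whenever code is merged to main. Owner, repo name and
# branch are placeholders.
resource "google_cloudbuild_trigger" "terraform_deploy" {
  project = "my-gcp-project-id"
  name    = "deploy-scheduled-pipelines"

  github {
    owner = "my-org"   # assumed GitHub organisation
    name  = "my-repo"  # assumed repository containing the Terraform config
    push {
      branch = "^main$"
    }
  }

  # Build steps for applying the Terraform configuration live in this file
  filename = "cloudbuild.yaml"
}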
Once that’s done, you can easily specify your pipeline schedules in your Terraform configuration, and merge your code. Your CI/CD pipeline will automatically deploy the Cloud Scheduler jobs to Google Cloud.
For more details, check out the Terraform module on the Terraform Registry and on GitHub. Don't forget to follow us on Medium for more Vertex AI Tips and Tricks and much more!
Datatonic are Google Cloud's Machine Learning Partner of the Year, with a wealth of experience developing and deploying impactful Machine Learning models and MLOps Platform builds. Need help with developing an ML model, or deploying your Machine Learning models fast? Have a look at our MLOps 101 webinar, where our experts talk you through how to get started with Machine Learning at scale, or get in touch to discuss your ML or MLOps requirements!