Vertex AI Tips and Tricks: Improving Debugging in Vertex AI Batch Prediction Jobs

Author: Pat Wang, Machine Learning Engineer

Vertex AI Batch Prediction provides a managed service for serving Machine Learning predictions for your batch use cases. It can take input data from Google Cloud Storage (GCS) (in JSONL, CSV, or TFRecord format) or a BigQuery table, run predictions against it, and return the results to GCS or BigQuery, respectively.

Under the hood, the Batch Prediction service creates a model endpoint to serve model predictions, and a Dataflow job to fetch the data, split it into batches, get predictions from the endpoint, and return the results to GCS or BigQuery. All of this is done in a Google-managed project, so you won’t see the model endpoint or the Dataflow job in your own project.

As a result, debugging issues can be difficult. Read on to learn how you can make it easier to debug your Vertex AI Batch Prediction jobs.

Model Serving Containers

To serve predictions from your model, you will need a serving container image. You have two choices for this in Vertex AI:

  • Use a pre-built container image provided by Google, if you are using TensorFlow, XGBoost, or scikit-learn
  • Create your own custom container image, if you are using another framework or have custom pre-processing steps in your serving code
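As an illustration, a custom serving container can be a small Dockerfile that packages a serving app. The sketch below is an assumption for this article, not an official template: the file names (`main.py`, `requirements.txt`) are hypothetical, and it relies on Vertex AI passing the serving port via the `AIP_HTTP_PORT` environment variable.

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Hypothetical requirements file listing fastapi, uvicorn,
# google-cloud-storage, pandas, and scikit-learn
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# main.py is assumed to contain the FastAPI serving app
COPY main.py .

# Vertex AI tells the container which port to listen on via AIP_HTTP_PORT
CMD exec uvicorn main:app --host 0.0.0.0 --port ${AIP_HTTP_PORT:-8080}
```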

For more information on how to build a container for model serving, see Google's documentation on custom container requirements for prediction.

The code below shows an example of a FastAPI app that can serve a model on Vertex AI:


import logging
import os
import pickle

import pandas as pd
from fastapi import FastAPI, Request
from google.cloud import storage

# GCS path of your model object, provided as an environment variable by Vertex
model_uri = f"{os.environ.get('AIP_MODEL_URI')}/model.pkl"

# Download the model artifact from GCS
gcs_client = storage.Client()
with open("model.pkl", "wb") as model_f:
    gcs_client.download_blob_to_file(model_uri, model_f)

# Load the model into memory
with open("model.pkl", "rb") as model_f:
    _model = pickle.load(model_f)

# Initialise API
app = FastAPI()

# Define health check route
@app.get("/healthz", status_code=200)
async def health(request: Request):
    return "Healthy!"

# Define prediction route
@app.post("/predict")
async def predict(request: Request):
    """Make a test prediction to demonstrate the model."""
    logging.debug("Successfully called route /predict.")
    body = await request.json()
    instances = body["instances"]
    data = pd.DataFrame(instances)
    # Predict and output
    predictions = _model.predict_proba(data).tolist()
    logging.debug(f"predictions: {predictions}")
    response = {"predictions": predictions}
    return response
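For reference, Vertex AI wraps prediction requests in an "instances" envelope before sending them to the serving container. A minimal sketch of the JSON body the `/predict` route above receives; the feature values are made up for illustration:

```python
import json

# The envelope Vertex AI sends to the serving container's predict route.
# Each element of "instances" is one row of input data.
request_body = {
    "instances": [
        [14.2, 19.1, 92.4],   # hypothetical feature values for row 1
        [11.8, 17.5, 75.0],   # hypothetical feature values for row 2
    ]
}
payload = json.dumps(request_body)

# The serving code reads the rows back out of the envelope
instances = json.loads(payload)["instances"]
print(len(instances))  # 2
```

The same envelope shape is what you pass when testing the container locally, which makes it easy to reproduce a failing batch row on your machine.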


Debugging Batch Prediction Jobs

Following the instructions above, you can build a container image that returns correct predictions when nothing goes wrong. In real life, however, errors appear more often than expected, and when they do, your Batch Prediction job only returns a vague error message:


('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 500 
(<!doctype html>\n<html lang=en>\n<title>500 Internal Server Error</title>\n<h1>Internal Server
Error</h1>\n<p>The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there is an error in the application.</p>\n) from
server, retry=3.', 569)


With only this message, the team might end up spending weeks debugging an issue because:

  • The team has to try different solutions in the dark until the job returns predictions as expected.
  • Running a Batch Prediction job is time-consuming even when the volume of input data is small, since the Batch Prediction service needs time to be ready (at least 15 minutes).

Adding Logging Statements

Adding more logging statements to the model serving code and checking the logs sounds like an ideal solution. However, while it works well for online prediction, it does not work well for Batch Prediction jobs.

Even though you can add logging statements in the model serving code above, you will not be able to see the logs in your Google Cloud Logs Explorer when you run the Batch Predictions. This is because the model is deployed to an endpoint in your Google Cloud project when doing online prediction. In contrast, the endpoint for batch prediction sits in a Google-managed project, which is invisible to users.

Running Test Predictions

Alternatively, you can create an endpoint, deploy the model, and run some test predictions in the Google Cloud console. Be aware that you need to pass the same data in the same format for online prediction and batch prediction. For example, if the model object contains a pre-processing step which drops a column with a specific name, your input data must contain the feature names. Online prediction input is generally JSON, with feature names and data values stored as key-value pairs.

However, your Batch Prediction job might take a BigQuery table as an input. If this is the case, the Vertex AI Batch Prediction service will transform the data so that it will provide only a list of values, but not the feature names. Since your model serving code expects a different input format, your batch prediction will fail, and you won’t be able to see the logs from the model serving container.
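One defensive option is to make the serving code accept both shapes. The sketch below is an illustration, not part of Vertex AI: the feature names and the helper function are hypothetical, standing in for whatever columns your model was trained on. It converts list-style instances (as produced from a BigQuery input) into the keyed records that dict-style (online / JSONL) inputs already provide:

```python
# Hypothetical feature names matching the model's training data
FEATURE_NAMES = ["mean_radius", "mean_texture", "mean_perimeter"]

def normalize_instances(instances):
    """Accept dict-style instances (online / JSONL input) or
    list-style instances (BigQuery batch input) and return rows
    keyed by feature name."""
    rows = []
    for instance in instances:
        if isinstance(instance, dict):
            rows.append(instance)  # already keyed by feature name
        else:
            rows.append(dict(zip(FEATURE_NAMES, instance)))
    return rows

# Both input shapes normalise to the same rows
print(normalize_instances([{"mean_radius": 14.2, "mean_texture": 19.1, "mean_perimeter": 92.4}]))
print(normalize_instances([[14.2, 19.1, 92.4]]))
```

Calling a helper like this before building the DataFrame lets one serving container handle both online requests and BigQuery-sourced batch requests.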

Another downside of this method is that an endpoint incurs charges 24/7, while Batch Prediction jobs only charge per job. Remember to shut down your test endpoints once you have finished with them!

Returning an HTTP Response

There is another method you can use to debug your Vertex Batch Prediction jobs. In your model serving code, catch any errors that occur and return them as part of your HTTP response; they will then be visible to you when you need to debug an issue.


@app.post("/predict")
async def predict(request: Request):
    """Make a test prediction to demonstrate the model."""
    try:
        body = await request.json()
        instances = body["instances"]
        data = pd.DataFrame(instances)

        # Predict and output
        predictions = _model.predict_proba(data).tolist()
        response = {"predictions": predictions}
    except Exception as error:
        # Surface the error in the response so it appears in the job output
        response = {"error": f"{error}"}
    return response


In the example above, rather than outputting a prediction, the job writes an “error” column to the BigQuery output table, or error records to the output files in a Google Cloud Storage bucket, once the batch prediction job has finished. With this informative error message, developers can quickly take action to solve the issue, saving a lot of debugging time.
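When the output goes to GCS, each line of the result files is a JSON record. A small sketch for collecting the error messages from those files; the record layout, with "instance" and "prediction" keys, is an assumption for this illustration based on the output described above:

```python
import json

def collect_errors(output_lines):
    """Collect error messages from batch prediction output lines.

    Each line is assumed to be a JSON record whose "prediction" field
    holds the model server's response; with the error-returning route
    above, failed rows carry an "error" key instead of predictions.
    """
    errors = []
    for line in output_lines:
        record = json.loads(line)
        prediction = record.get("prediction")
        if isinstance(prediction, dict) and "error" in prediction:
            errors.append(prediction["error"])
    return errors

sample = [
    '{"instance": [1.0, 2.0], "prediction": {"error": "feature names missing"}}',
    '{"instance": [3.0, 4.0], "prediction": {"predictions": [[0.1, 0.9]]}}',
]
print(collect_errors(sample))  # ['feature names missing']
```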

Here is an example error message which tells you that your input data for batch prediction doesn’t contain the feature names. You can then check your batch prediction inputs directly, making sure they contain feature names.


Post request fails. Cannot get predictions. Error: Predictions are not in the response. 
Got: {"error":""None of [Index(['primary_key', 'mean_radius', 'mean_texture', 'mean_perimeter',
\n 'mean_area', 'mean_smoothness', 'mean_compactness', 'mean_concavity',\n 'mean_concave_
points', 'mean_symmetry', 'mean_fractal_dimension',\n 'radius_error', 'texture_error',
'perimeter_error', 'area_error',\n 'smoothness_error', 'compactness_error', 'concavity_error',
\n 'concave_points_error', 'symmetry_error', 'fractal_dimension_error',\n 'worst_radius',
'worst_texture', 'worst_perimeter', 'worst_area',\n 'worst_smoothness', 'worst_compactness',
'worst_concavity',\n 'worst_concave_points', 'worst_symmetry', 'worst_fractal_dimension'],\n
dtype='object')] are in the [columns]""}.


Note: the column names above are from a public dataset, UCI ML Breast Cancer Wisconsin (Diagnostic), that we used for this example.


Summary

We have seen how debugging Vertex Batch Prediction jobs can be challenging because your model (and model serving code) are run in a Google-managed project, so you can’t see all the logs.

As we saw, the Batch Prediction service takes care of reading in your data from GCS/BigQuery, splitting it, transforming it, and sending it to a Vertex AI endpoint for predictions. As a result, sometimes your model (and model serving code) may work correctly using a Vertex AI endpoint but will return a vague error message when used in a Vertex Batch Prediction job.

Using the method described above, you can return any errors in the HTTP response, where they will be visible to you to help during debugging.

Datatonic are Google Cloud’s Machine Learning Partner of the Year, with a wealth of experience developing and deploying impactful Machine Learning models and MLOps platform builds. Need help developing an ML model, or deploying your Machine Learning models fast? Take a look at our MLOps 101 webinar, where our experts talk you through how to get started with Machine Learning at scale, or get in touch to discuss your ML or MLOps requirements!
