Generative AI 101: GenAI on Google Cloud
Authors: Alvaro Azabal Favieres, Senior Machine Learning Engineer, Adham Al Hossary, Data Scientist and Pablo Carretero Álvarez, Data Scientist
This is the first of a series of blogs dedicated to Generative AI and how it can be adopted by enterprises to leverage the power of the new generation of Large Language Models.
The release of ChatGPT in late 2022 was the culmination of a set of innovations that had been taking part in the field of Natural Language Processing since Google released the Transformer model in 2017. Large Language Models (LLMs) have demonstrated not only to be a major advancement in the field of Machine Learning, but also a great opportunity for enterprises to innovate with their use of Artificial Intelligence technology.
With the introduction of these new technologies, companies will be able to create new business value by reaching larger audiences, as well as boosting their employees’ efficiency by accelerating their access to complex information. However, as businesses explore the possibilities of Generative AI (GenAI), they often face a challenge; until very recently, leveraging this technology typically incurred significant overhead costs, while also requiring very specialised expertise in training and implementing GenAI models.
To address this, Google has introduced a range of new GenAI products aimed at advancing and simplifying the adoption of GenAI models through Google Cloud. In this blog post, we will first describe the revolution GenAI has caused in the field of AI, going through why a new set of tools and methods are required to generate value from these models. We will then discuss how Google Cloud and Datatonic can help businesses leverage the latest models in the field, as well as some potential use cases. Lastly, this blog will introduce how a custom implementation of Generative AI can enhance the usage of AI in businesses for a set of common, realistic scenarios.
Introduction to Generative AI and the Paradigm Shift
Generative AI is a branch of artificial intelligence that involves training models to generate new, original content based on patterns and trends in existing data. It can be used to create images, videos or text that can be similar to existing inputs or creative and unique. Examples of these include text generation models like PaLM, which can generate articles, stories, and even poetry; image generation models like StableDiffusion, which can create realistic images of people and animals; and music generation models like MuseNet, which can generate original pieces of music in various genres.
Generative AI introduces a new paradigm shift in the work of AI. With the rise of Foundation Models (FMs), which are the core of GenAI applications, you no longer need to train a model for a specific use case. The ease of use of these huge models is opening a new range of opportunities for users; they can now translate, summarise, draw and create content in an extremely accessible manner. Before delving into this, let’s look at what an ML workflow without Foundation Models looks like.
For classical ML problems, when a good business idea for a use case is thought of (for example predicting customer churn), a data collection process is required to obtain training and evaluation data to train an ML model from scratch. This model can then be deployed and used to address specific business problems. There is a 1:1 relationship between a trained model and a use case (see figure below). This has been the standard approach until the rise of Foundation Models and Generative AI, which enable businesses to address a multitude of use cases without training or fine-tuning ML models. Foundation Models permit a one-to-many relationship between the model and use cases.
Although most Cloud providers have offered API services that allow common Machine Learning tasks to be performed without any training process taking place on your data for a while, these were tied to one very specific application and could not be used for anything else. For example, these APIs could be used to either summarise paragraphs, extract sentiment from phrases or translate text from one language to another, amongst many other applications, but different APIs had to be used for each of these tasks. With the appearance of multi-purpose Foundation Models, you can now use the same model for all of the aforementioned text tasks. This means that rather than needing one application per task, you only need to develop a single application that will let you achieve multiple tasks.
Foundation Models are general-purpose models, meaning that their immense size (often consisting of billions of parameters) allows them to be used for multiple tasks. They can even be used for different modalities (multimodal), so the same model can handle text and images. The appearance of these models has changed the classic workflow. Now, a good business application for a use case only requires the writing of a prompt to an existing FM, which skips the whole data collection and training steps. This enables the same language model to perform an extensive number of text-to-text tasks, from translation or summarisation to dialogue, and other models such as Stable Diffusion to work incredibly well in cross-model text-to-image scenarios.
The real breakthrough of this is that the barrier of entry to AI applications in a business is drastically reduced. While in the past companies had to gather labelled data and train or tune models, getting started with Generative AI use cases only requires deciding between Foundation Models and building your application around that model. Therefore companies can focus their efforts on building the best applications (introducing the concept of prompt engineering, which will be discussed in detail in a future blog), rather than having to build and maintain AI models themselves. Although many traditional Machine Learning use cases still cannot be addressed with FMs (think of most Supervised Learning scenarios, such as fraud prediction or spam detection), the low barrier of entry characteristic of the versatile and complex FM will enable companies to tackle highly sophisticated tasks at a faster pace than ever.
Given their size and compute requirements, Foundation Models are typically hosted by third parties and accessed via API calls, an aspect that can compromise the privacy policies of companies. An alternative to this approach is the set of Generative AI capabilities that Google Cloud has announced, which still provides Foundation Models via API calls but creates a secure environment for your data, ensuring that the model and the data can only be accessed by yourself, thus increasing application privacy. By doing so, Google ensures that customer data will not be used to improve Foundation Models nor be accessed or stored by Google employees at all.
Generative AI on Google Cloud
Google Cloud has recently announced a set of differentiating GenAI products aiming to accelerate the enterprise adoption of GenAI. First, Generative App Builder has been introduced, an application that aims at powering companies with out-of-the-box solutions for search and conversational experiences. Additionally, a whole collection of GenAI services has been integrated with Vertex AI, which will allow developers to leverage GenAI on Google’s end-to-end Machine Learning platform (Vertex AI), where they will be able to securely develop and deploy their models in production-ready environments. This allows companies to have state-of-the-art models at hand and integrates the enterprise security and deployment support inherent to Google Cloud. We will go through these products in detail, as well as through some of the use cases that they can enable.
Generative App Builder allows businesses to utilise the power of conversational and search GenAI tools with no code. Without the need to manage orchestration, it creates a smooth user experience for practitioners willing to develop these two modern applications.
With Conversational AI, users have access to human-like interactions that can handle common tasks such as authenticating users, checking order status, placing orders or making payments. Developers are given granular control over the generated responses, allowing them to edit the outputs provided by the model to maximise its value.
It also grants developers the ability to add business logic when building the chatbot through a logic graph that relates how states and tasks relate to each other (see figure below). Retail companies could use this product to help their customers find their products and place orders, or the healthcare industry could use the service to schedule appointments.
Enterprise Search allows businesses from all industries to provide their users with generative search experiences. By using multi-modal inputs (i.e. more than one instruction in the form of prompts, images, map locations…), customers can obtain personalised recommendations and can follow up with questions that delve deeper into the original topic at search.
This could be useful, for example, for an analyst in a financial services company looking to find key challenges in a certain industry. Through a conversational search, this analyst could access summaries of certain aspects of the sector, supplementary resources to investigate the topic at hand or even find recommendations that might answer the original question prompted in the search.
It can also be of great use for functional internal applications and vertical solutions, allowing companies to perform knowledge-base queries on their internal document repositories. Enterprise Search allows for a degree of complexity tuning; users with no ML skill can simply use the out-of-the-box chat and search, while more advanced practitioners can tune the search or even build their own search by training and tuning the original models. Additionally, enterprise search will integrate with Vertex AI, bringing the benefits of this MLOps platform into the Search product.
GenAI on Vertex AI
Aside from Generative App Builder, Google Cloud will also integrate a range of GenAI services on Vertex AI, Google Cloud’s ML platform. Here, ML practitioners can conveniently design, fine-tune and evaluate Machine Learning models. They can also deploy and productionise their work, ensuring the prediction performance of their models is maintained across time. This is executed while guaranteeing enterprise security, building on top of Responsible AI principles that guarantee its users standards of integrity and safety of their data. Developers can focus almost exclusively on the value of the technology at hand, instead of spending their time managing the infrastructure required to safeguard all the aforementioned aspects.
Google has released two main new GenAI capabilities in Vertex AI: Generative AI Studio, an interactive User Interface (UI) where users will be able to test prompt tuning for different LLMs and tasks, and Model Garden, which allows data scientists to host and fine-tune Foundational Models securely, without handing their data to third-parties.
Generative AI Studio users will be able to run jobs such as generating images from natural language and edit these by tweaking the prompt used to create the original images. This capability could open many opportunities for marketing teams across industries, hugely accelerating content creation. To achieve this, companies need to educate practitioners to learn how to assemble prompts optimally (a procedure called prompt engineering), since a small variation in prompt formulation can lead to results of very different quality.
Users will be able to iterate through different prompt templates, to then save the optimal template for use in production by every user across the company. In addition, GenAI Studio offers the option of obtaining the code required to generate the content created through the UI for use in applications or endpoints.
Model Garden helps data scientists develop fully custom GenAI applications. This democratises a wide set of high-quality LLMs (such as PaLM, LaMDA or other open-source pre-trained models that excel in tasks such as chat, text, image, dialogue, video and code generation or completion) securely, without data ever leaving a company’s Google Cloud estate. Model Garden avoids all the overhead effort required to host the models while giving users full control over their data. This is a key advantage for sensitive use cases, common in industries such as healthcare, financial services, insurance or defence.
By using this product, ML practitioners will be able to fine-tune state-of-the-art Foundation Models on new data specific to a task to optimise performance: for example, designing models specific for the summarisation of healthcare documents or insurance claims, instead of relying on general summarisation models only. Furthermore, more advanced developers will also be able to perform model distillation, a process that consists of training a smaller student model, more adequate for production environments, that still maintains the quality of the original model. This can help significantly to reach latency requirements at the time of inference, which is extremely useful for scenarios that call for real-time answers (such as real-time chatbots). All the aforementioned models can be integrated directly into an application using an SDK and accessed via notebooks, APIs or interactive prompts. This enables seamless deployment of any model to a production-ready environment.
Lastly, Google Cloud has also hinted at the release of six new APIs related to Generative AI. These APIs are essential because they will enable developers to access and integrate these Foundation Models in end-to-end applications using orchestration tools such as LangChain. Additionally, it will allow practitioners to fine-tune these models programmatically and create pipelines in production to automate these processes. By doing so, end users can then access the Foundation Models through a UI with no code. These APIs are:
- Text: a single API encompassing a wide set of tasks such as summarisation, sentiment analysis or classification.
- Code Generation: natural language for building code, to increase productivity for developers.
- Image & Video: generate images and videos through natural language.
- Dialogue: chat functionality through an API.
- Code Completion: another product aiming to increase developers’ productivity, finishing up code at the time of typing, finding bugs and correcting in real-time (similar to GitHub’s Copilot)
- Embeddings: aiming at facilitating the obtention of high-quality semantic information from unstructured data, allowing the possibility of recommendations and search. Embeddings are at the core of vector databases and retrieval use cases, such as the Enterprise Search application mentioned at the start of the blog. This API enables the creation and maintenance of production-ready applications.
Implementing Generative AI
So far, this blog has introduced the concept of Generative AI and has explained why Foundation Models bring a new window of opportunity and lower the barrier of entry for new use case applications in many businesses. As we have also described in this blog, Google Cloud’s GenAI offerings unlock many use cases that will enhance customer experience (company chatbots, AI assistants, personalised content recommendations..), or that aim at boosting employee productivity.
With tools such as Enterprise Search, conducting market research, extracting relevant information from documents and analysing datasets will be significantly less time-consuming for specialised analysts. Although the barrier to entry for these use cases is lower than ever, customising them for individual companies and deploying them to production still poses some challenges that must be addressed.
To illustrate this, we can look at an example of Enterprise Search. While calling an API to generate the response the user is looking for is straightforward, this use case still depends on the creation of a vector database to store all company documents and information. On top of this, all the mentioned documents must be embedded (using the Embeddings API, for example) so that documents with similar characteristics can be retrieved.
Additionally, one common challenge of generative LLMs is that the same prompt written in a slightly different way can lead to very different generated answers. This makes the concept of prompt engineering a central aspect of any custom Generative AI application. While the above points make it accessible to create an enterprise search prototype, establishing an actual enterprise search product can become a more complex job.
Similarly, other challenges must be addressed before deploying a GenAI application to production. If we take the example of chatbots, during the early stage of creation, developers must always consider how to align the model with the expected behaviour, including a human in the loop process that provides feedback to guide the model. AI assistants that aid you with multiple tasks such as sending emails, automating orders, etc, are prone to prompt injection (i.e.: when the model receives a new set of instructions that differ from the original set of instructions and starts acting unsafely).
These and many other challenges will be covered in detail in our following blog, “Generative AI Applications in Production”, covering the main factors to consider when deploying applications to production. A future blog will also extend on the importance of embedding Responsible AI principles within the context of Generative AI. Lastly, in a follow-up post, we will also deep dive into vector databases, which have become the core of many Generative AI applications. Here we will weigh the pros and cons of different databases to help businesses find vector databases that align with their needs.
Datatonic is Google Cloud’s Machine Learning Partner of the Year with a wealth of experience developing and deploying impactful Machine Learning models and MLOps Platform builds.
Turn Generative AI hype into business value with our workshops and packages:
- One-day workshop
- One-day hackathon
- PoCs + MVPs
Get in touch to discuss your Generative AI, ML or MLOps requirements!