Computer Vision: Emerging Trends and Google Cloud Technology


Contributors: Christelle Xu, Business Analyst, Tobi Lipede, Senior Data Scientist, Matt Gela, Senior Data Scientist, Alex Thomas, Senior Data Scientist, Sofie Verrewaere, Senior Data Scientist, Daniel Miskell, Machine Learning Engineer

In the first blog of this series, we explored some of the most exciting current use cases of Computer Vision, and how they are already starting to benefit our daily lives, as well as businesses. In this blog, we spoke with some of the team again to hear more about where Computer Vision is heading, and how Google Cloud can be used to make it all happen! 

The Future of Computer Vision

Much of Computer Vision’s future depends on its increasing availability: as it becomes more widely used, new models can build upon accumulated research and training, making innovative use cases easier to create. We are only just scratching the surface of what it is capable of, so we asked some of the team what they expect in the near future:

Where do you see Computer Vision heading in the next few years?

  1. Democratisation

Christelle Xu, Business Analyst: I think we’ll see the democratisation of tooling that Data Scientists can use to develop sophisticated use cases. So for example, something that used to take hundreds of lines of code and deep expertise can now be deployed more easily by a Data Scientist using AutoML or XGBoost. I think with respect to Computer Vision, a lot of the tooling will become more democratised and more accessible to Data Scientists, so what used to be novel will be much more common. In the next few years, we’re going to begin to see more tooling to support easier modelling of use cases. We’re already starting to see it now with Vision AI and Document AI. Those use cases will largely be in areas that drive business value.

  2. Deploying Computer Vision Use Cases to Remain Competitive

Tobi Lipede, Senior Data Scientist: In certain industries, it’s going to be necessary to keep a competitive edge by leveraging Computer Vision. In industries like manufacturing, agriculture, and others with complex supply chains, companies that don’t begin to leverage Computer Vision will lose their ability to compete quite quickly and won’t be able to scale as fast as others. In the next couple of years, I definitely see it becoming a lot more standard.

  3. Enhancing Hardware

Many companies do not yet realise the potential of the hardware they already use in more traditional ways, or how a simple Computer Vision installation on that hardware can power models that create significant value for their business.

The agricultural sector has already started to incorporate several use cases of artificial intelligence and Computer Vision in areas such as plant health detection and monitoring, planting, weeding, harvesting, and advanced analysis of weather conditions. We developed a similar model for Crocus that can classify more than 5,000 different plant varieties in under 30 seconds.


Crocus' Iris uses Image Classification, a type of Computer Vision model
Fig 1: Some of the features available in Crocus’ mobile application, Iris.

Applying Computer Vision in such industries would open up new use cases for existing hardware, giving it a greater impact on the business. For example, Computer Vision can enrich the data captured by CCTV and satellite imagery.

Christelle Xu: I think one of the biggest challenges we have is that companies want to get involved but don’t always have cameras or the right data. This is part of the reason why big factories and the manufacturing industry, for example, can accelerate Computer Vision use cases, because often they already have cameras and mature image, video and IoT data. Many companies need help not only identifying use cases but also advice about the best approaches to collect data quickly and cost-efficiently, ultimately creating meaningful use cases with strategic business impact. This is something we’ve been working on with our clients at Datatonic.


Fig 2: Computer Vision being used to monitor the safety of workers in a warehouse

On Edge Deployment

Matt Gela, Senior Data Scientist: One of the interesting trends is that models will be deployed more and more on edge devices, meaning we can capture and process images in near real-time, without a connection to the cloud when making predictions. This allows Computer Vision to perform tasks where action may need to be taken immediately, which will greatly advance its use in areas such as manufacturing safety.

Edge computing brings computation and data storage closer to the devices where the data is collected, rather than relying on a central location. One significant advantage of this is lower latency for Machine Learning predictions.

Christelle Xu: On edge means being able to use Machine Learning at the point of the user. For example, I can use Computer Vision on my phone or my laptop, which can run much faster while also maintaining the user’s privacy. I think part of my excitement is seeing how we reach a point where it is much more widespread. 

One example of this is Google Lens, which allows you to use Optical Character Recognition on your mobile phone. This can be used for instant search features, as well as translation: 


Fig 3: Google Lens being used on edge to translate text

Building Upon Existing Models

The future of Computer Vision is also likely to see us improving upon existing types of models, for similar use cases that we are starting to experiment with:

Alex Thomas, Senior Data Scientist: I think text-to-image generation is going to be increasingly used over the next few years. These are the early days for this technique and I could see it getting better and expanding as time goes on. Potentially in the future, we could even generate 3D assets just by typing out some text.

Computer Vision is expected to develop in many ways by building upon existing work. Some of the most significant and universal developments over the next few years will be:

  • Models that require fewer data points to train on before they are ready for deployment, significantly reducing training time and cost.
  • Ability to apply pre-trained models to other use cases; this will grow rapidly as more models are developed and more work involving Computer Vision is carried out.
  • Models that incorporate emerging types of imagery, enabling us to work within three dimensions. This includes Computer Vision that can incorporate other features such as depth and density.

Furthermore, developing better models faster and more easily will help to enable the aggregation of models. By having several models in one place, we can combine them, and create a propensity model supported by scalable architecture (MLOps) with pipelines that can support multiple complex models. This would also allow for data unification through a data warehouse. To learn more about MLOps, look at some of our case studies, or check out our recent MLOps 101 webinar about how to get started with Machine Learning at scale.

Multimodal Models

We’re heading toward greater adoption of multimodal models – models that combine several types of data. This can be done by aggregating the outputs of several models or by developing a single model with a customised architecture, enabling us to create models that use a combination of inputs such as image, text, and sound. These models are especially powerful in situations where we need to make a decision based on multiple different types of input, such as reviewing and analysing evidence in legal cases.

This is already being developed through Visual Question Answering (VQA), a Computer Vision task where a system is given text-based questions about an image and can infer the answer. 


Fig 4: Formula 1 cars on a race track

For example, you could type “What sport is this?”, and a trained Computer Vision model could tell you it is Formula 1. Alternatively, you could ask for the main colour of the car at the front (orange). While these examples seem relatively simple, they require our model to have a complex understanding of both image and language and there are several potential real-life use cases.
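
To make the idea of aggregating several models a little more concrete, here is a minimal sketch of one common approach, often called late fusion: each model scores the same set of labels, and the scores are blended with a weighted average. The function name `late_fusion`, the labels, and the probabilities are all hypothetical and chosen purely for illustration; a real system would take these scores from trained image and text models.

```python
from typing import Dict


def late_fusion(
    image_probs: Dict[str, float],
    text_probs: Dict[str, float],
    image_weight: float = 0.5,
) -> Dict[str, float]:
    """Blend per-label probabilities from two models with a weighted average."""
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be between 0 and 1")

    # Use every label seen by either model; a missing label counts as 0.
    labels = set(image_probs) | set(text_probs)
    fused = {
        label: image_weight * image_probs.get(label, 0.0)
        + (1.0 - image_weight) * text_probs.get(label, 0.0)
        for label in labels
    }

    # Renormalise so the fused scores sum to 1 (skipped if everything is 0).
    total = sum(fused.values())
    if total > 0:
        fused = {label: score / total for label, score in fused.items()}
    return fused


# Hypothetical outputs from an image model and a text model (assumed values).
image_probs = {"formula_1": 0.6, "rally": 0.4}
text_probs = {"formula_1": 0.9, "rally": 0.1}

fused = late_fusion(image_probs, text_probs, image_weight=0.5)
best_label = max(fused, key=fused.get)
print(best_label, round(fused[best_label], 2))
```

The weight is a design choice: giving the image model more weight makes sense when the accompanying text is sparse or unreliable, and vice versa. A single model with a customised architecture, the other approach mentioned above, would instead learn how to combine the inputs during training.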

Integrating Computer Vision Models

Computer Vision models should not be seen as isolated from the more traditional data found within businesses. They can be used to exploit previously siloed data points and integrate them into Modern Data Stacks or multi-model architectures to create significant overarching business value.

Images are often accompanied by text or metadata, which can all be combined through related modelling to produce a more informed and powerful output. Computer Vision models detecting anomalies on a factory floor could be combined with other testing data to determine the impact of a product flaw or inconsistency (e.g., identifying cracks using Computer Vision combined with the vibration of moving parts, the tension of screws, or the frequency of sound emitted during testing).


Fig 5: Crack detection using Computer Vision
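
As a rough illustration of combining a Computer Vision output with other testing data, the sketch below merges a crack probability with vibration and sound readings into a single decision. Everything here is an assumption made for the example: the `PartInspection` fields, the threshold values, and the decision rules are hypothetical, and real limits would come from engineering specifications and historical test data.

```python
from dataclasses import dataclass


@dataclass
class PartInspection:
    crack_probability: float  # score from a hypothetical crack detection model, 0 to 1
    vibration_mm_s: float     # vibration of moving parts measured during testing
    sound_peak_hz: float      # dominant frequency of sound emitted during testing


# Illustrative thresholds only (assumed values, not real engineering limits).
VIBRATION_LIMIT_MM_S = 4.5
EXPECTED_SOUND_HZ = 440.0
SOUND_TOLERANCE_HZ = 25.0


def assess(part: PartInspection) -> str:
    """Return 'reject', 'review' or 'pass' from vision and sensor evidence."""
    sensor_flags = 0
    if part.vibration_mm_s > VIBRATION_LIMIT_MM_S:
        sensor_flags += 1
    if abs(part.sound_peak_hz - EXPECTED_SOUND_HZ) > SOUND_TOLERANCE_HZ:
        sensor_flags += 1

    # Reject only when the vision model and at least one sensor agree.
    if part.crack_probability >= 0.8 and sensor_flags >= 1:
        return "reject"
    # Any single strong signal is sent for manual review.
    if part.crack_probability >= 0.5 or sensor_flags >= 1:
        return "review"
    return "pass"


print(assess(PartInspection(crack_probability=0.9, vibration_mm_s=6.0, sound_peak_hz=440.0)))
```

Simple rules like these are easy to audit; with enough labelled inspection data, the same inputs could instead feed a trained model that learns how much weight each signal deserves.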

Internet of Things (IoT)

The Internet of Things describes physical objects with sensors, processing ability, software, or other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks. As the technology becomes more advanced, Computer Vision will likely be used in more devices which make up the IoT.

Christelle Xu: I find the Internet of Things (IoT) interesting. With regard to Computer Vision, I’m looking forward to the increased availability, resiliency, and speed that will be enabled as a result of improved technology, particularly as edge technology matures. We may already have a lot of IoT devices in our homes, but I’m interested in the use cases that begin to exist, as this technology matures. Furthermore, from a business perspective, manufacturers will be able to create new products with differentiators, enabling them to grow their business. The company can form a partnership with Google, and ultimately, customers are excited because they can take advantage of innovative products.


Fig 6: Google Home, an IoT smart home device

The IoT already includes devices such as Google Home, as well as wearable smartwatches. Computer Vision is likely to play a bigger role in the IoT over the next few years with devices such as next-generation home security systems, which make use of Computer Vision models including facial recognition and action recognition. 

Facial Recognition – matching faces to identities. Commonly used for:
+ Security applications, allowing people into restricted areas
+ Unlocking smartphones

Computer Vision and Image Captioning

Sofie Verrewaere, Senior Data Scientist: I’d like to see more Computer Vision being used for people who are visually impaired to interpret the objects in their surroundings. The next frontier for Computer Vision technologies could be acquiring and utilising visual common sense reasoning so that machines can move beyond just identifying the types of objects in image data. Computer Vision could then be used to answer more complex questions, such as who is doing what and for what reason? 

Image Captioning – automatically writing descriptive captions for images and videos. Commonly used for:
+ Describing visual media for visually impaired people
+ Summarising and identifying key points in a video

Features such as Google’s “Get Image Descriptions from Google” have come a long way in automatic image description. This feature uses Computer Vision to generate descriptions for images that do not have any alt text.

Computer Vision is being used in a multitude of ways to aid visually impaired people. A study of one use case found that people using a long cane or a guide dog were able to reduce the number of collisions by 37 per cent with a wearable Computer Vision aid. The user wears a camera and two wristbands that vibrate when the camera detects a potential upcoming collision, prompting the user to stop and change direction.


Fig 7: Image processing unit of the wearable collision avoidance device

Over the next few years, trends suggest that developments in Computer Vision and wearable technology will become even more valuable in use cases that aim to assist visually impaired people.

Scaling Computer Vision

As previously mentioned, we are still in the ‘exploration’ phase of Computer Vision applications in the industry. Models are being developed, but in a lot of cases, they are not being applied at scale yet. There are a few reasons for this:

  • Image and video data can initially be difficult to work with; they can take up more space than some other types of data and have different formats.
  • Companies are not aware of the value that this data holds; they know that their spreadsheets and tables have value but not necessarily the image and video data that they collect.
  • Computer Vision can sometimes be more costly than ‘standard’ Machine Learning models and sometimes requires a higher upfront investment.

However, deploying effective Computer Vision models can provide a huge Return On Investment (ROI). Computer Vision can be used to automate many tasks, leading to lower running costs and less manual effort for less complex tasks. It can also be used to find less expensive solutions to problems, often by simply integrating Computer Vision into existing hardware to remove the need for manual monitoring. The sooner businesses invest in Computer Vision, the sooner they will be able to benefit from these long-term advantages!


Using Google Cloud

Obstacles to developing and productionising effective Computer Vision models can be overcome by using Google Cloud, and its various applications and services. 

Why use Google Cloud? What are some of the useful tools available?

Alex Thomas: The big tool that you’ll hear a lot of people talking about, for Computer Vision, in particular, is Google Cloud’s AutoML. At the moment you can use it for Object Detection and Image Classification. It’s relatively straightforward to develop a solution as you don’t have to use any coding or know Machine Learning in-depth to get started. On the most recent project I worked on, we used AutoML, and it was easy for us to iterate on and improve our model throughout the project. We could train a model, quickly see how it performed, tweak the images, and then generate an improved model. I think that’s the silver bullet that Google Cloud has over many of its competitors…

…Google also created a framework called TensorFlow which is a key piece of software used to build neural networks for Computer Vision work. A lot of Google Cloud services tie in well with TensorFlow. If you’ve built something that’s really cutting-edge, it’s easier to deploy it with Google Cloud and get it out there than with other platforms. You can deploy a high-tech model with only a few lines of code.

Being able to get started on Machine Learning models without needing massive amounts of coding experience makes it a lot more accessible for businesses, greatly reducing the time and cost to develop and deploy a model. 

Tobi Lipede: I’ve previously worked with other cloud providers. I think what distinguishes Google Cloud is how easy it is to get started. Something as simple as setting up a notebook environment in Google Cloud can be done without major DevOps/SRE overheads. Their hierarchical permissions model is simple to reason about while maintaining security. Going from the notebook to production is easy too – integrations between the (Vertex AI) workbench, BigQuery, the feature store and endpoints work really well.

“Google Cloud has many great APIs that trickle down from Google’s cutting-edge research, which will enable customers to implement Computer Vision features with ease and great benefit.” – Daniel Miskell, Machine Learning Engineer

Christelle Xu: I think for Google Cloud, the way that it tries to differentiate itself is in the Machine Learning space with tools such as Vertex AI. With Computer Vision in Google Cloud, there are loads of features available through AutoML, letting you use tools like Vision AI and Document AI. These are fast-forwarded tools that use transfer learning; a lot of the deep learning has already been done and all you need to do is train it on your data, and all of a sudden, you have the power of Google Cloud behind you which you can apply to your business use case.

Fig 8: Google’s Document AI turns unstructured documents into clear structured insights

The features and services provided by Google will continue to enable rapid progress toward the exciting Computer Vision developments and use cases discussed in this blog. The next blog in this series will talk you through setting up your own Object Detection model using Vertex AI’s AutoML without a single line of code!

As 4x Google Cloud Partner of the Year, Datatonic has a wealth of experience in Computer Vision, Machine Learning, and a range of Google products and services. Get in touch to learn how your business can benefit from Computer Vision or other Machine Learning models.


Check out our other Computer Vision blogs in this series:

Part 1: Computer Vision: Insights from Datatonic’s Experts

Part 2: Computer Vision: Emerging Trends and Google Cloud Technology

Part 3: Computer Vision: Object Detection and No-Code AI with AutoML

Part 4: Computer Vision: Deploying Image Segmentation Models on Vertex AI

Part 5: Computer Vision: Generative Models and Conditional Image Synthesis
