Contributors: Christelle Xu, Business Analyst, Tobi Lipede, Senior Data Scientist, Matt Gela, Senior Data Scientist, Alex Thomas, Senior Data Scientist, Sofie Verrewaere, Senior Data Scientist, Daniel Miskell, Machine Learning Engineer
Computer Vision is a field of Artificial Intelligence that enables computers to derive meaningful information and data from digital images, videos and other visual inputs. This blog, the first in a series about Computer Vision, will cover a brief history of Computer Vision, and then look at some of the most exciting use cases right now. Our future blogs will explore the future of Computer Vision, how Google Cloud Platform can help deploy your Computer Vision models, and how to create your own Object Detection model!
Computer Vision has been used for decades, but its popularity surged in 2012 when AlexNet, a Convolutional Neural Network (CNN) architecture, won the ImageNet Large Scale Visual Recognition Challenge. AlexNet’s success established numerous best practices for CNNs, and in the last ten years Computer Vision has evolved rapidly thanks to greater computational power, particularly from modern GPUs. This has been accelerated by the creation of large labelled datasets, such as ImageNet, which comprises more than 14 million images.
Due to these advances, Machine Learning models are now often better and more efficient feature extractors than humans, to the extent that they can detect anomalies or differences that traditional, hand-written programs cannot perceive. As we will see, this has several practical applications.
To get a deeper understanding of Computer Vision, we spoke with Datatonic’s leading Machine Learning and Data Science experts to get their insights:
When did you first experience or hear about Computer Vision? What was the first project you worked on?
Christelle Xu, Business Analyst, former Data Scientist: I heard about Computer Vision very early on in my data science career. The first Computer Vision project I worked on was in graduate school where we used pixels to try to analyse what number was shown onscreen. It’s quite a common Computer Vision project. My first professional experience was an insurance Hackathon project with Datatonic. We were trying to develop a model that extracts relevant information in order to create a document for an insurance policy case.
Tobi Lipede, Senior Data Scientist: I think I first came across it a good few years ago now. My Master’s dissertation was about Image Segmentation for medical images, so I gained some experience there. The models were slightly different to what people are using now, but Computer Vision really caught my attention. Then, I taught myself a little bit more about it through an online Stanford University Computer Vision course.
|Image Segmentation: Partitioning a digital image into multiple image segments, and labelling the individual pixels that belong to each segment or object. This is used extensively in self-driving cars.|
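To make the idea of labelling individual pixels concrete, here is a minimal sketch. It assumes a tiny grayscale image stored as nested lists and uses a simple brightness threshold as a stand-in for a trained segmentation model; the function names and the threshold value are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel intensities, 0-255
Mask = List[List[int]]   # one label per pixel: 1 = object, 0 = background


def segment_by_threshold(image: Image, threshold: int = 128) -> Mask:
    """Label every pixel as object (1) or background (0).

    A real segmentation model learns this mapping; a fixed threshold
    is used here only to show the shape of the output.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def pixels_in_segment(mask: Mask, label: int) -> List[Tuple[int, int]]:
    """Return the (row, column) coordinates of all pixels with the given label."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == label
    ]


image = [
    [10, 12, 200, 210],
    [11, 180, 220, 15],
    [9, 14, 13, 12],
]
mask = segment_by_threshold(image)
for row in mask:
    print(row)
```

The key point is that the output has the same shape as the input: every pixel receives its own label, rather than the image receiving a single label as a whole.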
Matt Gela, Senior Data Scientist: Hearing about self-driving cars is the one that comes to mind before I started my career in Data Science! The first project I worked on was with an aviation security client. We created an object detection model using AutoML to help improve the detection rate of prohibited objects.
The third blog in this series will walk you through how to create your own object detection model using AutoML, without writing a single line of code!
|Object Detection: Detection of objects in an image and their bounding box. Commonly used for:|
+ Detecting faults such as cracks on a production line, automating quality inspection
+ Vehicle detection in warehouses
+ Inventory management
What gets you excited about Computer Vision?
Alex Thomas, Senior Data Scientist: Probably its wide variety of use cases. Traditionally, Data Science has huge business value. It can help businesses in terms of marketing such as targeting the right customers, but I think Computer Vision has a wider scope. There are so many different areas where we can use Computer Vision, and I think that’s the most exciting thing about it.
Sofie Verrewaere, Senior Data Scientist: I think probably its potential in the Healthcare industry. When you look at medical diagnosis, Computer Vision is reaching a point where it is better at detecting some diseases than doctors, because the models can pick up details that are hard to see with the human eye. It definitely has a lot of medical applications. I’ve seen Computer Vision being used to detect certain types of cancers on medical scans. I’m also excited about Computer Vision being used to combat climate change. Anything that’s “AI for good”, I would say. It can also be used to speed up recovery after natural disasters.
More than 200 million people are threatened by floods, hurricanes, wildfires and other natural disasters each year; this will likely increase due to climate change. However, Computer Vision is already being integrated with satellite imagery and drone cameras to monitor the amount of damage caused to different buildings and areas following natural disasters. This information can be combined with other data and used to identify the best places to set up temporary schools and medical tents, and where relief workers should prioritise reconstruction efforts.
Furthermore, homeowners can now photograph the damage to their homes after natural disasters and submit the photos to Computer Vision systems. This helps insurers predict repair costs more accurately, and homeowners receive insurance claim payouts months faster than before these systems were used.
Daniel Miskell, Machine Learning Engineer: With recent breakthroughs, we’re now seeing very exciting applications in Computer Vision. We can create photos of humans that have never existed and we can imagine a section of fictional writing through text-to-image models. I’m very excited about the many ways these techniques are going to benefit our lives!
Some of the most exciting applications of Computer Vision are in the healthcare industry. Already, there are several use cases where Computer Vision is improving healthcare, such as medical imaging, cancer detection, and medical training via simulation-based surgical platforms.
“Using Computer Vision, we can now identify tumours from images of patients’ lungs more accurately and faster than ever before” – Daniel Miskell, Machine Learning Engineer
In the near future, Computer Vision could lead to increased accuracy in disease detection, as well as increased automation of time-consuming tasks, leading to better access to healthcare for patients.
Tobi Lipede: We’re just scratching the surface of all the things we can do. A lot of companies haven’t necessarily explored all of the applications that there are for Computer Vision. When I think of use cases like the project we worked on with LUSH for customers to identify different products so that LUSH could eliminate packaging, I realise there are so many different applications. So many different companies that could benefit from Computer Vision, including any company with a physical store.
You can learn more about our work with LUSH here. This use case is an example of image classification. By training a Computer Vision model on multiple classes of objects, the model learns to recognise each one and assign new, unseen images to the correct class.
|Image Classification – classification of different groups of subjects in an image by assigning them to specific labels.|
Having heard a bit more about some of our team and their experiences with Computer Vision, let’s explore some of the use cases they have worked on, as well as some of the exciting upcoming developments in the field!
Powerful pre-trained Machine Learning (ML) models can be applied directly to Computer Vision tasks, or fine-tuned for classification tasks similar to those the network was originally trained on. In the next section, we’ll look at several types of Computer Vision techniques and how they are being applied today across different use cases that businesses can capitalise on.
Techniques and Use Cases
Computer Vision can be broken down into several different model types, with a wide range of innovative use cases. As we’ll see, we can also combine different types of models to satisfy a specific use case. One example is combining Object Detection with Optical Character Recognition (OCR) to create a pipeline that first detects an object within an image or video, and then extracts the text within that object. Self-driving cars use this process to read road signs.
|Optical Character Recognition (OCR) – extraction of text from images. Commonly used for:|
+ Translating signs, menus, and handwritten text into a typed document
+ Document extraction
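The two-stage idea of detecting an object first and then reading its text can be sketched in a few lines. This is only an illustration under loud assumptions: the "image" is a toy grid of characters, and `toy_detector` and `toy_ocr` are hypothetical stand-ins for a trained Object Detection model and a real OCR engine. Only the data flow (detect, crop, recognise) is meant to carry over to real systems.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[str]  # toy "image": each string is one row of character "pixels"


@dataclass
class Box:
    label: str
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by the bounding box out of the image."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def read_objects(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognise: Callable[[Image], str],
) -> List[Tuple[str, str]]:
    """Stage 1: detect objects. Stage 2: run OCR on each cropped object."""
    return [(box.label, recognise(crop(image, box))) for box in detect(image)]


def toy_detector(image: Image) -> List[Box]:
    """Hypothetical detector: finds a '#'-bordered sign and returns its interior."""
    rows = [r for r, row in enumerate(image) if "#" in row]
    cols = [c for row in image for c, ch in enumerate(row) if ch == "#"]
    if not rows:
        return []
    return [Box("road_sign", min(rows) + 1, min(cols) + 1, max(rows), max(cols))]


def toy_ocr(patch: Image) -> str:
    """Hypothetical OCR: the toy pixels are already characters, so just tidy them."""
    return " ".join(row.strip() for row in patch if row.strip())


scene = [
    "............",
    "..########..",
    "..# STOP #..",
    "..########..",
    "............",
]
results = read_objects(scene, toy_detector, toy_ocr)
print(results)
```

Passing the detector and the OCR step in as functions keeps the pipeline independent of any particular model, so either stage could be swapped for a real one without changing `read_objects`.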
What use cases have you seen or worked on with Computer Vision? Do you have a favourite?
Alex Thomas: There’s a use case that’s taken off in the last year called text-to-image generation. You might’ve seen stuff on social networks about a model called DALL-E. You can type in a description like “man sitting in a chair in the style of Picasso” and it’ll generate an image in a photo-realistic style. It’s a cool technology and I think it’s going to make things like generating art, assets for video games, animated movies, and websites easy and quick.
You can try out DALL-E Mini, an open-sourced text-to-image program, to automatically generate any image you like. Here is the example of “man sitting in a chair in the style of Picasso”. We can see how well it understood the input by looking at the images it generated:
Similar technology has already been used in some of the more recent Pixar movies to automatically generate high-resolution frames, drastically reducing the time required by graphic designers and animators. Another example is Hello Games’ survival game, No Man’s Sky, which uses AI for world generation, allowing players to explore 1.8×10^19 unique planets.
Tobi Lipede: My favourite practical applications would probably be anything related to robotics. For example, the Boston Dynamics robot, Spot. Just thinking about the AI and Computer Vision that goes into making that work is so cool. I’m also quite interested in action recognition applications. I think there’s a lot of different stuff that can be done with that. I’d like to work on my own hand gesture recognition project, like controlling my TV without a remote. Instead of classifying one image, it’s looking at a sequence of frames to work out what is happening. Other than that, I’d like to see fully autonomous self-driving cars; I don’t have my licence yet so I’m hoping they get on it soon.
|Action Recognition – detection of actions being performed in an image or video. Commonly used for:|
+ Detecting actions in manufacturing procedures to verify they have been done correctly
+ Monitoring the safety of workers
+ Machine interaction (by detecting hand signals)
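The difference between classifying one image and looking at a sequence of frames can be shown with a toy gesture recogniser. This is a rough sketch, not how production action recognition models work: it assumes each frame is a small binary grid in which 1 marks a "hand" pixel, and it decides the gesture purely from how far the hand moves between the first and last frame. All names and the `min_shift` value are hypothetical.

```python
from typing import List, Optional

Frame = List[List[int]]  # 1 = hand pixel, 0 = background


def horizontal_centre(frame: Frame) -> Optional[float]:
    """Average column index of the hand pixels, or None if no hand is visible."""
    cols = [c for row in frame for c, value in enumerate(row) if value]
    return sum(cols) / len(cols) if cols else None


def classify_gesture(frames: List[Frame], min_shift: float = 2.0) -> str:
    """Classify a gesture from the hand's movement across the whole sequence."""
    centres = [c for c in map(horizontal_centre, frames) if c is not None]
    if len(centres) < 2:
        return "none"  # a single frame cannot show movement
    shift = centres[-1] - centres[0]
    if shift >= min_shift:
        return "swipe_right"
    if shift <= -min_shift:
        return "swipe_left"
    return "hold"


def make_frame(position: int, width: int = 8) -> Frame:
    """Build a toy 2-row frame with the hand at the given column."""
    return [[1 if c == position else 0 for c in range(width)] for _ in range(2)]


swipe = [make_frame(p) for p in (1, 3, 5, 6)]
print(classify_gesture(swipe))
```

No single frame in `swipe` contains enough information to tell a swipe from a stationary hand; the label only emerges once the frames are considered together.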
Combining physical robotics with Computer Vision now means that robots have an even wider range of potential use cases. The Boston Dynamics robot, Spot, that Tobi mentioned, is now being used by the New York Fire Department to help with search and rescue missions.
Christelle Xu: I think from a business perspective, the biggest use cases are going to come down to working with massive enterprises to leverage the data that they have and support their processes. For me, the coolest use case that exists right now is probably the supermarkets where you can walk in, grab what you want, and just walk out…
These supermarkets use a combination of Computer Vision, deep learning algorithms, and sensors. Once you have scanned a QR code on your phone using a mobile app upon entering, you can pick up any items you want and simply leave the shop. You are then emailed your receipt and charged online for the items you “bought”. This completely eliminates the need to queue or scan your items, saving huge amounts of time for consumers.
…I have one near me and it’s incredible. It’s these services where you can have eyes and ears everywhere and human work is simplified and automated out that Computer Vision can be used. That is the future. Right now, it seems like a weird thing but they are going to be everywhere soon. 100%.
As we continue to build upon our experience of Computer Vision, even more innovative use cases, such as the ones discussed in this blog, will become more widespread. The next blog in this series, coming out next week, will discuss some of the likely future developments of Computer Vision, as well as what Google Cloud Platform is doing to make them happen!
As 4x Google Cloud Partner of the Year, Datatonic has a wealth of experience in both Computer Vision and a range of Google products and services. Get in touch to learn how your business can benefit from Computer Vision or other Machine Learning models.