Optimising News Article Detection for Leading Marketing Agency, HOME




Tech stack

Google Cloud


Computer vision


AI + Machine Learning

HOME wanted to develop a Machine Learning solution to extract stories from news websites and use these insights to inform marketing strategies and optimise bid management. Datatonic worked with them to improve their Object Detection model and create an end-to-end, fully automated, scalable Google Cloud solution to drive their centralised search engine.

Our impact

  • Developed an end-to-end solution combining object detection and natural language processing to meet the needs of HOME
  • Improved model performance from ~59% to ~75% mAP@.50
  • Reduced training time from several hours to <100 minutes


The challenge

HOME is a strategic marketing agency that grows brands by helping them gain and maintain attention. HOME does this by blending business consultancy know-how with marketing agency delivery.

As part of their ongoing efforts to bring state-of-the-art technology to their clients, HOME is developing a platform (called ‘Sensible’) which renders external data, such as media pages or real-world images, from multiple sources into a centralised search engine.

The original prototype for this platform focuses on extracting news stories, with their title and image content, from news websites via Computer Vision. The insights collected by this machine learning process will be consumed internally by the strategy teams and delivered directly to clients and to the DoubleClick media platform to optimise bid management. HOME wanted to build out the current prototype Computer Vision model into an automated, performant and scalable machine learning pipeline on Google Cloud Platform, which would allow them to move quickly towards a deployed MVP for Sensible.

HOME successfully built a draft TensorFlow Object Detection model, despite having no in-house ML expertise. However, this model was not production-ready, and none of the surrounding processes, from data quality and augmentation to model training and serving, had been defined. They needed to move fast from this R&D environment to an improved, production-ready solution to successfully drive the first release of Sensible.


The solution

HOME wanted to leverage our expertise in developing AI pipelines in Google Cloud to quickly create a state-of-the-art Object Detection solution which would require minimum maintenance, while simultaneously allowing for iterative improvements and additions.

To help HOME achieve their goals, the Datatonic team developed an end-to-end, fully automated model training and serving pipeline in 3 weeks. Training is orchestrated with Google Cloud Composer and serving with Google Cloud Functions.

The training framework enables the HOME team to automatically re-train the model every week on newly predicted data hosted on Google Cloud Storage.

The serving framework enables the HOME team to automatically label new screenshots of newspapers’ main pages as they land on Google Cloud Storage, and gathers all relevant information for serving in a BigQuery table. For each story, this information includes the detection bounding boxes, width, height, area, prominence score, prediction certainty value, origin website, image attached to the article header, text, keywords and sentiment.
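
As an illustration of the geometric fields listed above, the following is a minimal sketch of how one detection could be flattened into a record ready for a table row. It assumes boxes arrive as normalised [ymin, xmin, ymax, xmax] coordinates, which is the TensorFlow Object Detection API's usual output format. The function name, field names and example URI are hypothetical, and the prominence score is left out because its definition is not given here.

```python
from typing import Dict, Sequence


def story_row(
    box: Sequence[float],
    certainty: float,
    page_width: int,
    page_height: int,
    source_uri: str,
) -> Dict[str, object]:
    """Flatten one detected story into a record (hypothetical field names)."""
    ymin, xmin, ymax, xmax = box
    # Convert normalised coordinates into pixels on the screenshot.
    width_px = (xmax - xmin) * page_width
    height_px = (ymax - ymin) * page_height
    return {
        "source_uri": source_uri,
        "box": [ymin, xmin, ymax, xmax],
        "width_px": width_px,
        "height_px": height_px,
        "area_px": width_px * height_px,
        "certainty": certainty,
    }


# Example: a story covering the middle half of the page width.
row = story_row(
    box=[0.25, 0.25, 0.5, 0.75],
    certainty=0.92,
    page_width=1000,
    page_height=2000,
    source_uri="gs://example-bucket/front-page.png",  # hypothetical path
)
```

In a deployed pipeline a record like this would be written to BigQuery together with the fields derived from the Cloud APIs; that step is omitted so the sketch stays self-contained.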

The last five elements are derived with Cloud APIs: the Cloud Vision API is used to extract the origin website, text and presence of an image, while the Cloud Natural Language API is used to extract keywords and sentiment.

The key component of the solution is the fine-tuning of the TensorFlow Object Detection model. The performance boost from ~59% to ~75% mAP@.50 is mainly due to optimised training parameter selection and an improved data augmentation procedure. Both are products of extensive R&D, which also reduced training time from several hours to <100 minutes.
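
For readers unfamiliar with the metric, mAP@.50 is the mean average precision where a predicted box counts as correct only if its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below shows only that matching criterion, not the full mAP calculation. The function names are illustrative and boxes are assumed to be [ymin, xmin, ymax, xmax].

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    # Overlap along each axis, clamped to zero when the boxes are disjoint.
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Sequence[float], truth: Sequence[float],
             threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, truth) >= threshold


# A prediction covering two thirds of the ground-truth box passes the 0.5 bar.
print(is_match([0, 0, 2, 2], [0, 0, 2, 3]))
```

Average precision is then computed per class from the ranked list of matched and unmatched detections, and averaged across classes to give the mAP figure.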