Building a Future-Proof Data Lake for LKQ

Tech stack

Google Cloud


Data lake


Data + Analytics

LKQ wanted to build a unified data platform and establish a strong data management process capable of ingesting data from more than 20 European markets. Datatonic worked with LKQ to deliver one future-proof solution that automatically transforms data so that it is ready for human analysis and for machine learning use cases such as stock optimisation, pricing, and customer segmentation.

Our impact

  • Developed one future-proof solution across 22 countries
  • Enabled quick insights: analysts can query data with ease in seconds
  • Reduced the time needed for ingestion, transformation and loading of data by 99%


The challenge

LKQ is a provider of alternative and specialty parts to repair and accessorise cars and other vehicles. With over 45,000 employees globally and $11bn+ in revenue, it is the largest used car parts provider in Europe. Datatonic’s cutting-edge skills in data engineering and machine learning on Google Cloud met LKQ’s short-, mid-, and long-term analytics needs by offering:

  1. A reputation for turning around complex engineering projects in weeks
  2. A long-term partner who can bring cutting-edge skills in Machine Learning & Data Science to the business
  3. A future-proof technology stack (Google Cloud)
  4. Full knowledge transfer and code sharing


The solution

Datatonic delivered an automated, single-view solution for LKQ’s data management process that supports fast analytics and in-production machine learning models. The solution can scale up and add new markets when required, with automatic deployment of infrastructure. Using Google Cloud’s Dataflow and Composer, the time taken to ingest and transform data was reduced from 48 hours to just half an hour, and the data being queried was reorganised into new schemas for easier analysis, taking advantage of Google BigQuery’s nested field capabilities.
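To illustrate what reorganising data into a nested schema can look like, here is a minimal Python sketch. It groups flat rows (one per order line) into one record per order with a repeated "lines" field, which is the shape a nested warehouse table would hold. The field names (order_id, market, part, qty) and the function name are hypothetical and chosen purely for illustration; the actual LKQ schemas are not described here.

```python
def nest_order_lines(flat_rows):
    """Group flat rows (one per order line) into one record per order.

    Each output record carries a repeated "lines" field, mirroring
    the shape of a nested, repeated column. All field names are
    illustrative assumptions, not the real schema.
    """
    orders = {}
    for row in flat_rows:
        order = orders.setdefault(
            row["order_id"],
            {"order_id": row["order_id"], "market": row["market"], "lines": []},
        )
        order["lines"].append({"part": row["part"], "qty": row["qty"]})
    # Dicts preserve insertion order, so orders come out in first-seen order.
    return list(orders.values())


flat = [
    {"order_id": 1, "market": "UK", "part": "bumper", "qty": 2},
    {"order_id": 1, "market": "UK", "part": "mirror", "qty": 1},
    {"order_id": 2, "market": "NL", "part": "door", "qty": 1},
]
nested = nest_order_lines(flat)
```

With a nested layout like this, an analyst can read a whole order from one record instead of joining or re-grouping flat rows.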

The solution in place scans a Google Cloud Storage “bucket” for any data that has been dropped there (whether delivered on a schedule from an ERP system or uploaded manually by an analyst). Upon finding data, it matches the data to the appropriate transformation pipeline. A transformation pipeline moves data from A to B while transforming it according to the business requirements. For example, sales data will be transformed using the “sales pipeline”.
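The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the routing table, the file-name patterns and the pipeline names are hypothetical, and the sketch works on a plain list of object names rather than calling any storage API.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical routing table: object-name pattern -> pipeline name.
# Patterns and pipeline names are illustrative assumptions.
PIPELINE_ROUTES = [
    ("sales/*.csv", "sales_pipeline"),
    ("stock/*.csv", "stock_pipeline"),
    ("customers/*.json", "customer_pipeline"),
]


def match_pipeline(object_name: str) -> Optional[str]:
    """Return the first pipeline whose pattern matches, or None."""
    for pattern, pipeline in PIPELINE_ROUTES:
        if fnmatchcase(object_name, pattern):
            return pipeline
    return None


def route_dropped_files(object_names):
    """Group newly found files by pipeline; unmatched files go under None."""
    routed = {}
    for name in object_names:
        routed.setdefault(match_pipeline(name), []).append(name)
    return routed


routed = route_dropped_files(
    ["sales/uk_2020.csv", "stock/nl_2020.csv", "misc/readme.txt"]
)
```

Keeping the routes in one table means a new data type only needs a new entry, and files that match nothing are collected under None so they can be flagged rather than silently ignored.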

This transformation is entirely automated, with built-in validation techniques. It is also serverless, so the compute power scales up and down depending on the requirement. The output lands in a number of places, one being the data warehouse, where analysts can quickly query the data using Google BigQuery. Automation takes care of appending the data to existing tables, updating it, or adding entirely new tables.
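The three load outcomes (append, update, or create a table) can be modelled with a small in-memory Python sketch. The warehouse is represented here as a plain dictionary of table name to list of rows, and the function name, the key parameter and the sample fields are all hypothetical; a real deployment would issue the equivalent operations against the data warehouse instead.

```python
def load_rows(warehouse, table, rows, key=None):
    """Load rows into an in-memory 'warehouse' and report the action taken.

    - table missing: create it from the rows
    - key given:     update rows with a matching key, append the rest
    - no key:        append all rows
    This is a toy model with assumed names, not the actual loader.
    """
    if table not in warehouse:
        warehouse[table] = list(rows)
        return "created"

    existing = warehouse[table]
    if key is None:
        existing.extend(rows)
        return "appended"

    # Map each key value to its position so matches are replaced in place.
    index = {row[key]: i for i, row in enumerate(existing)}
    for row in rows:
        if row[key] in index:
            existing[index[row[key]]] = row
        else:
            index[row[key]] = len(existing)
            existing.append(row)
    return "updated"


warehouse = {}
load_rows(warehouse, "sales", [{"id": 1, "total": 10}])             # new table
load_rows(warehouse, "sales", [{"id": 2, "total": 20}])             # append
load_rows(warehouse, "sales", [{"id": 1, "total": 15}], key="id")   # update
```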

From here, analysts can query the data directly, or plug in their preferred BI tool to visualise it. If a new market wants to use the data lake, all the required Google Cloud resources can be deployed automatically from a single script, giving that market its own environment within one hour. The new market can then take advantage of existing data pipelines (ingest, transform and load), add its own, and harness the data. The solution took two months to get up and running.