Building a Future-Proof Data Lake for LKQ

Summary

LKQ wanted to build a unified data platform to establish a strong data management process capable of ingesting data from over 20 European markets. Datatonic worked with LKQ to deliver one future-proof solution that automatically transforms data ready for human analysis and machine learning use cases such as stock optimisation, pricing, and customer segmentation.

Impact
  • One future-proof solution across 22 countries
  • Quick insights from data: queries return in seconds
  • 99% reduction in time to ingest, transform, and load data
Challenge

LKQ is a provider of alternative and specialty parts to repair and accessorise cars and other vehicles. With over 45,000 employees globally and $11b+ in revenue, it is the largest used car parts provider in Europe. Datatonic’s cutting-edge skills in data engineering and machine learning on GCP matched what LKQ was looking for to meet its short-, mid-, and long-term analytics needs:

  1. A reputation for turning around complex engineering projects in weeks
  2. A long-term partner who can bring cutting-edge skills in Machine Learning & Data Science to the business
  3. A future-proof technology stack (GCP)
  4. Full knowledge transfer and code sharing

Our Solution

Datatonic delivered an automated, single-view solution for LKQ’s data management process that supports fast analytics and machine learning models in production. The solution can scale up and add new markets when required, with automatic deployment of infrastructure.

The ingestion and transformation of data was reduced from 48 hours to just 30 minutes (thanks to Cloud Dataflow and Cloud Composer), and the data being queried was re-organised into new schemas to allow easy analysis (taking advantage of BigQuery’s nested field capabilities).
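To give a flavour of how Composer can orchestrate a Dataflow run like this, the sketch below shows a minimal Airflow DAG that launches a templated Dataflow job. The DAG ID, template path, bucket, and table names are illustrative placeholders, not LKQ’s actual configuration.

```python
# Minimal Airflow DAG for Cloud Composer that launches a templated Dataflow job.
# All names and paths below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

with DAG(
    dag_id="sales_ingest_and_transform",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered when new data lands, or on a schedule
    catchup=False,
) as dag:
    # Launch a templated Dataflow job that reads the raw file, applies the
    # business transformations, and writes the result to BigQuery.
    run_sales_pipeline = DataflowTemplatedJobStartOperator(
        task_id="run_sales_pipeline",
        template="gs://example-templates/sales_transform",  # hypothetical template
        location="europe-west1",
        parameters={
            "inputFile": "gs://example-landing-bucket/sales/*.csv",
            "outputTable": "example-project:analytics.sales",
        },
    )
```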

Technical information

The solution scans a Google Cloud Storage “bucket” for any data that lands there (whether scheduled exports from an ERP system or files manually uploaded by an analyst). When it finds data, it matches it to the transformation pipeline it is suited to. A transformation pipeline moves data from A to B while transforming it based on the business requirements; for example, sales data is transformed by the “sales pipeline”. The transformation is entirely automated, with built-in validation techniques. It is also serverless, so compute power scales up and down depending on demand.
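As a rough illustration of what one of these transformation pipelines could look like, here is a minimal Apache Beam sketch (the SDK that runs on Dataflow) for a hypothetical “sales pipeline”. The project, bucket, field names, validation rules, and output table are all assumptions for illustration, not the actual pipeline code.

```python
# Sketch of a "sales pipeline": read raw CSV rows from a Cloud Storage bucket,
# validate and transform them, and append them to a BigQuery table.
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_and_validate(line: str):
    """Parse one CSV line and drop rows that fail basic validation."""
    parsed = list(csv.reader([line]))
    if not parsed or len(parsed[0]) != 3:
        return  # skip empty or malformed rows
    market, sku, amount = parsed[0]
    try:
        yield {"market": market, "sku": sku, "amount": float(amount)}
    except ValueError:
        return  # skip rows with a non-numeric amount


def run():
    options = PipelineOptions(
        runner="DataflowRunner",          # serverless, autoscaling execution
        project="example-project",        # hypothetical project
        region="europe-west1",
        temp_location="gs://example-temp-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText(
                "gs://example-landing-bucket/sales/*.csv", skip_header_lines=1
            )
            | "ParseAndValidate" >> beam.FlatMap(parse_and_validate)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.sales",
                schema="market:STRING,sku:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```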

The output lands in a number of places, one being the data warehouse, where analysts can quickly query the data using Google BigQuery. Automation takes care of appending the data to existing tables, updating it, or adding entirely new tables. From there, analysts can query the data directly, or plug in one of their favourite BI tools to visualise it.
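To illustrate the kind of query analysts can run against the nested schemas, here is a short example using the BigQuery Python client. The project, dataset, table, and nested field names are invented for the example.

```python
# Example of querying nested sales data with the BigQuery Python client.
# The project, dataset, table, and field names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

query = """
SELECT
  s.market,
  item.sku,
  SUM(item.quantity * item.unit_price) AS revenue
FROM `example-project.analytics.sales` AS s,
     UNNEST(s.line_items) AS item          -- flatten the nested line items
GROUP BY s.market, item.sku
ORDER BY revenue DESC
LIMIT 10
"""

for row in client.query(query).result():
    print(row.market, row.sku, row.revenue)
```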

If a new market wants to use the data lake, automated deployment of all GCP resources can be run from a single script, giving the market its own environment within an hour. The new market can then take advantage of existing data pipelines (ingest, transform, and load), add its own, and harness the data.
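In simplified form, a per-market deployment script of this kind might create the market’s landing bucket and BigQuery dataset, as in the hedged sketch below. The project, region, and naming conventions are hypothetical, and a real deployment would cover more resources (pipelines, permissions, and so on).

```python
# Simplified sketch of per-market provisioning: create a landing bucket and a
# BigQuery dataset for a new market. Project, region, and naming conventions
# are hypothetical; a full deployment would also set up pipelines and IAM.
from google.cloud import bigquery, storage

PROJECT = "example-project"   # hypothetical project
REGION = "europe-west1"


def provision_market(market_code: str) -> None:
    # Landing bucket where the market's ERP exports (or manual uploads) arrive.
    storage_client = storage.Client(project=PROJECT)
    storage_client.create_bucket(f"example-landing-{market_code}", location=REGION)

    # Market-specific dataset in the shared data warehouse.
    bq_client = bigquery.Client(project=PROJECT)
    dataset = bigquery.Dataset(f"{PROJECT}.market_{market_code}")
    dataset.location = REGION
    bq_client.create_dataset(dataset, exists_ok=True)


if __name__ == "__main__":
    provision_market("nl")  # e.g. onboard a new market by its country code
```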

The solution took two months to get up and running.