Ensuring Data Quality with Google Cloud’s Dataplex

Client: localsearch
Tech stack: Google Cloud
Solution: Dataplex
Service: Data + Analytics

localsearch helps Swiss businesses succeed in the digital world with innovative solutions to establish and grow their business, products, and services. As localsearch scaled, its Data Engineering team wanted to elevate its existing data quality testing. Datatonic collaborated with localsearch to build an enterprise-grade, production-ready data quality assurance solution using Google Cloud’s Dataplex.

Our impact

  • Increased efficiency by implementing a framework for monitoring and assessing data quality that can be managed, scaled, and deployed through automated solutions
  • Enabled the business to streamline and automate data quality tests across BigQuery datasets while keeping amendments and customization easy to make without extensive code changes
  • Upskilled and enabled the analytics team to take ownership of the solution and scale it in-house

The challenge

The localsearch Data Engineering team prioritizes data quality and reliability. Their data quality assurance approach on Cloud Composer, while robust and scalable, required manual intervention and extra planning. The team sought a turnkey Data Quality solution that is scalable, reliable, cost-effective, and natively integrated with Google Cloud.

The team also wanted an overall solution framework that could be managed, scaled, and deployed through automation, such as IaC (Infrastructure as Code) and CI/CD, and that supported custom data tests. This would allow testing and profiling to run more efficiently, reducing manual effort for its analytics team.

“At its core, data quality is just scheduling a series of tests, and you can achieve this with any orchestration tool, but we wanted more. We needed a robust framework, a repository where we could store the data quality rules for easy reuse and rapid deployment via Infrastructure as Code. Since most of our tech stack is in Google Cloud, Dataplex was a natural option, but we also evaluated other solutions such as Great Expectations before making a decision.”

Our solution

Datatonic collaborated with localsearch to implement Dataplex’s AutoDQ and Data Profiling services, following best practices, to ensure high data quality and reliable testing. The solution can be broken down into two main areas:

1. Rule Management

Governance

  • Enabled Data Quality (AutoDQ) tests and Data Profiling to be configured in simple formats, such as YAML, allowing a high degree of customization of data quality rules
  • Ensured that rules are version-controlled and managed in a Git repository for higher reliability and explainability
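
As an illustration of how rules kept in a Git repository might be checked before deployment, the following is a minimal sketch rather than the project's actual code. It assumes each rule has already been parsed from YAML into a dictionary; the field names are loosely modeled on Dataplex data quality rule fields and should be verified against the Dataplex documentation. The function name `validate_rule` is hypothetical.

```python
# Assumed (illustrative) vocabulary for rule definitions. Check the real
# Dataplex schema before relying on these names.
ALLOWED_DIMENSIONS = {
    "COMPLETENESS", "ACCURACY", "CONSISTENCY",
    "VALIDITY", "UNIQUENESS", "TIMELINESS",
}
RULE_TYPES = {
    "nonNullExpectation", "uniquenessExpectation",
    "rangeExpectation", "regexExpectation", "setExpectation",
}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in one parsed rule (empty if none)."""
    errors = []
    if not rule.get("column"):
        errors.append("missing 'column'")
    if rule.get("dimension") not in ALLOWED_DIMENSIONS:
        errors.append(f"unknown dimension: {rule.get('dimension')!r}")
    # Exactly one rule type key should be present in each rule.
    present = sorted(RULE_TYPES.intersection(rule))
    if len(present) != 1:
        errors.append(f"expected exactly one rule type, found {present}")
    # The threshold is the fraction of rows that must pass (defaults to 1.0).
    threshold = rule.get("threshold", 1.0)
    if not isinstance(threshold, (int, float)) or not 0.0 <= threshold <= 1.0:
        errors.append(f"threshold must be between 0 and 1, got {threshold!r}")
    return errors


# A rule as it might look after parsing a YAML file from the repository.
example_rule = {
    "column": "customer_id",
    "dimension": "COMPLETENESS",
    "nonNullExpectation": {},
    "threshold": 1.0,
}
print(validate_rule(example_rule))
```

Running a check like this on every commit keeps malformed rules out of the repository history, which supports the reliability and explainability goals described above.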

Application

  • Unlocked customizable rule configurations through a CI/CD process with a high degree of automation
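
One way a CI/CD step could decide what to deploy is to compare the rule configurations in the repository with what is already deployed. The sketch below is an assumption about how such a step might work, not a description of the delivered pipeline; `plan_changes` and the scan names are hypothetical.

```python
def plan_changes(desired: dict[str, dict], deployed: dict[str, dict]) -> dict[str, list[str]]:
    """Compare repository configs with deployed ones, keyed by scan name."""
    desired_names = set(desired)
    deployed_names = set(deployed)
    return {
        # In the repository but not yet deployed.
        "create": sorted(desired_names - deployed_names),
        # Present on both sides but with different configuration.
        "update": sorted(
            name for name in desired_names & deployed_names
            if desired[name] != deployed[name]
        ),
        # Deployed but no longer defined in the repository.
        "delete": sorted(deployed_names - desired_names),
    }


# Hypothetical scan names and thresholds, for illustration only.
repo = {"orders-dq": {"threshold": 0.99}, "customers-dq": {"threshold": 1.0}}
live = {"orders-dq": {"threshold": 0.95}, "legacy-dq": {"threshold": 1.0}}
print(plan_changes(repo, live))
```

Because the plan is derived from configuration alone, changing a rule means editing a config file and letting the pipeline apply the difference, with no change to the pipeline code itself.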

2. Rule Execution

Run AutoDQ & Profile Scans

  • Enabled Data Quality (AutoDQ) tests to be scheduled, triggered, and executed automatically
  • Provided new capabilities for data profiling to be scheduled, triggered, and executed 
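
To make the difference between scheduled and on-demand execution concrete, here is a hedged sketch that assembles a scan definition as a plain dictionary. The field names follow the Dataplex DataScan REST resource as best understood here and should be confirmed against the API reference; the project, dataset, and table names are placeholders, and `build_scan_body` is a hypothetical helper.

```python
from typing import Optional


def build_scan_body(project: str, dataset: str, table: str,
                    rules: list[dict], cron: Optional[str] = None) -> dict:
    """Build a data quality scan definition for a BigQuery table.

    With a cron expression the scan runs on a schedule; without one it is
    on-demand and must be triggered explicitly.
    """
    trigger = {"schedule": {"cron": cron}} if cron else {"onDemand": {}}
    resource = (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )
    return {
        "data": {"resource": resource},
        "executionSpec": {"trigger": trigger},
        "dataQualitySpec": {"rules": rules},
    }


# Placeholder names; a daily 06:00 schedule is just an example.
body = build_scan_body(
    "my-project", "sales", "orders",
    rules=[{"column": "order_id", "dimension": "COMPLETENESS",
            "nonNullExpectation": {}}],
    cron="0 6 * * *",
)
print(body["executionSpec"])
```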

Discoverability & Observability

  • Stored Data Quality and Data Profiling test outputs in BigQuery tables to maintain clear records
  • Reduced manual overhead by setting up alerts that fire when Data Quality (AutoDQ) tests or Data Profiling scans find issues
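
As a rough sketch of how alerting could be driven from stored results, the function below turns result rows into alert messages for failed rules. The row layout (`table`, `rule_name`, `passed`, `pass_ratio`) is a simplified assumption for illustration and does not reflect the actual schema of the exported BigQuery tables.

```python
def failed_rule_alerts(rows: list[dict]) -> list[str]:
    """Return one alert message per result row whose rule did not pass."""
    alerts = []
    for row in rows:
        if not row["passed"]:
            alerts.append(
                f"[DQ] {row['table']}: rule '{row['rule_name']}' failed "
                f"(pass ratio {row['pass_ratio']:.2%})"
            )
    return alerts


# Hypothetical result rows, as they might be read from a results table.
results = [
    {"table": "sales.orders", "rule_name": "order_id_not_null",
     "passed": True, "pass_ratio": 1.0},
    {"table": "sales.orders", "rule_name": "amount_in_range",
     "passed": False, "pass_ratio": 0.9731},
]
for message in failed_rule_alerts(results):
    print(message)
```

The messages could then be forwarded to whichever notification channel the team already uses.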

“Datatonic delivered what we requested, and that wasn’t an easy feat because we are very specific with our needs. Working with Datatonic was a pleasure; they were able to bring the right people with the right experience to the project, all while still maintaining a very personal connection.”