
Release Notes for Data Orchestration Platform, DOP

DOP v0.3.0

19th August 2021

Features

  • Support for “generic” Airflow operators: you can now use regular Python
    operators as part of your config files (a short operator sketch follows
    this list).
  • Support for the “dbt docs” command to generate documentation for all dbt
    tasks: users can now add “docs generate” as a target in their DOP
    configuration and additionally specify, with the --bucket and
    --bucket-path options, a GCS bucket to which the documents are copied
    (a docs-upload sketch follows this list).
  • Serve dbt docs: Documents generated by dbt can be served as a web page by
    deploying the provided app on GAE. Note that deploying is an additional step
    that needs to be carried out after docs have been generated. See
    infrastructure/dbt-docs/README.md for details.
  • dbt run_results artifacts saved to BigQuery: the run_results.json file
    created by dbt tasks contains information on completed dbt invocations and
    is saved to the BQ table “run_results” for analysis and debugging (a
    BigQuery load sketch follows this list).
  • Add support for Airflow v1.10.14 and v1.10.15 local environments:
    Users can specify which version they want to use by setting
    the AIRFLOW_VERSION environment variable.
  • Pre-commit linters: added pre-commit hooks to enforce consistent
    formatting and style for Python and YAML files (with partial support for
    plain-text files) throughout the DOP codebase.
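
The “generic” operator support means a DOP config entry ultimately resolves to an ordinary Airflow operator. As a rough illustration only (the DOP YAML schema itself is not shown, and the DAG id and callable below are made up), this is the kind of Airflow 1.10 PythonOperator such an entry maps onto:

```python
# Illustrative only: a plain Airflow 1.10-style PythonOperator of the kind a
# "generic" DOP config entry can now reference. DAG id, schedule and callable
# are hypothetical, not part of DOP itself.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_context(**kwargs):
    """Toy callable standing in for real task logic."""
    print("Running for execution date %s" % kwargs["ds"])


dag = DAG(
    dag_id="example_generic_operator",
    start_date=datetime(2021, 8, 1),
    schedule_interval="@daily",
)

print_task = PythonOperator(
    task_id="print_context",
    python_callable=print_context,
    provide_context=True,  # Airflow 1.10 needs this to pass context kwargs
    dag=dag,
)
```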
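For the “docs generate” target, the --bucket and --bucket-path options copy the generated files to GCS. The sketch below is only an approximation of what that step amounts to, not DOP's actual implementation; the project directory, bucket name and file list are placeholders. It runs dbt docs generate and uploads the resulting artifacts with the google-cloud-storage client:

```python
# Rough sketch of "docs generate then copy to GCS"; project dir, bucket name,
# bucket path and file list are placeholders, not DOP's implementation.
import subprocess
from pathlib import Path

from google.cloud import storage

PROJECT_DIR = Path("/dbt/my_project")   # hypothetical dbt project location
BUCKET_NAME = "my-dbt-docs-bucket"      # would come from --bucket
BUCKET_PATH = "dbt-docs"                # would come from --bucket-path

# Generate the static docs site into the project's target/ directory.
subprocess.run(["dbt", "docs", "generate"], cwd=str(PROJECT_DIR), check=True)

# Upload the files dbt uses to serve the docs site.
client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
for name in ("index.html", "catalog.json", "manifest.json", "run_results.json"):
    blob = bucket.blob("%s/%s" % (BUCKET_PATH, name))
    blob.upload_from_filename(str(PROJECT_DIR / "target" / name))
```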
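The run_results.json artifact is written by dbt to the project's target/ directory after each invocation; saving it to the “run_results” table makes it queryable. Below is a hedged sketch of what such a load can look like; the table id and the handful of flattened fields are assumptions, and DOP may persist the artifact in a different shape:

```python
# Hedged sketch: load a dbt run_results.json artifact into a BigQuery table.
# Table id and the flattened fields are assumptions, not DOP's exact schema.
import json

from google.cloud import bigquery

RUN_RESULTS = "/dbt/my_project/target/run_results.json"  # dbt artifact path
TABLE_ID = "my-gcp-project.dop_monitoring.run_results"   # hypothetical table

with open(RUN_RESULTS) as fh:
    artifact = json.load(fh)

# Flatten to one row per executed node so results are easy to query.
rows = [
    {
        "invocation_id": artifact["metadata"]["invocation_id"],
        "generated_at": artifact["metadata"]["generated_at"],
        "unique_id": result["unique_id"],
        "status": str(result["status"]),
        "execution_time": result.get("execution_time"),
    }
    for result in artifact["results"]
]

client = bigquery.Client()
errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert
if errors:
    raise RuntimeError("BigQuery insert failed: %s" % errors)
```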

Changes

  • Ensure DAGs using the same dbt project do not run concurrently: safety
    feature that allows selective execution of workflows by calling specific
    commands or tags (e.g. dbt run -m) within a single dbt project. Because
    such workflows share the same target location (within the dbt container),
    running them concurrently would let them override each other’s artifacts;
    see the illustrative sketch after this list.
  • Test time-partitioning: time-partitioning of datetime type is now properly
    validated as part of schema validation.
  • Use Python 3.7 and dbt 0.19.1 in Composer K8s Operator
  • Add Dataflow example task: with the introduction of “regular” Airflow
    operators in the YAML config, it is now possible to run compute-intensive
    Dataflow jobs. Check example_dataflow_template for an example of how to
    implement a Dataflow pipeline (a hedged operator sketch follows this
    list).
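
One conventional way to stop DAGs that share a dbt project from running their dbt tasks at the same time in Airflow is a pool with a single slot. The sketch below shows that pattern for illustration only; it is not necessarily the mechanism DOP uses, and the pool, DAG and selector names are made up:

```python
# Illustration only: serialising dbt tasks that share a project by assigning
# them to a single-slot Airflow pool. Names are hypothetical and this is not
# necessarily how DOP enforces the constraint internally.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Assumes a pool "dbt_my_project" with 1 slot has been created via the
# Airflow UI or CLI; tasks assigned to it then never run concurrently.
dag = DAG(
    dag_id="dbt_selective_run",
    start_date=datetime(2021, 8, 1),
    schedule_interval="@daily",
)

dbt_run_tagged = BashOperator(
    task_id="dbt_run_daily_models",
    bash_command="dbt run -m tag:daily --profiles-dir /dbt/profiles",
    pool="dbt_my_project",  # shared by every DAG using this dbt project
    dag=dag,
)
```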
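As a hedged illustration of launching a Dataflow job from a DAG with a “regular” operator (this is not the repository's example_dataflow_template verbatim; project, bucket and parameters are placeholders), the Airflow 1.10 contrib DataflowTemplateOperator can run one of Google's stock templates:

```python
# Hedged illustration of a Dataflow task via a regular Airflow 1.10 operator;
# project, bucket and parameters are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator

dag = DAG(
    dag_id="example_dataflow_wordcount",
    start_date=datetime(2021, 8, 1),
    schedule_interval=None,
)

wordcount = DataflowTemplateOperator(
    task_id="dataflow_wordcount",
    template="gs://dataflow-templates/latest/Word_Count",  # Google-provided template
    parameters={
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": "gs://my-bucket/wordcount/output",  # placeholder bucket
    },
    dataflow_default_options={
        "project": "my-gcp-project",                  # placeholder project
        "tempLocation": "gs://my-bucket/tmp",
    },
    dag=dag,
)
```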

For more information on DOP, have a look at our announcement: Data Orchestration Platform, DOP.

You can find the DOP project on Team Datatonic’s GitHub and apply it to your business: github.com/teamdatatonic/dop
