Upgrading to v1.9
What to know before upgrading
dbt Labs is committed to providing backward compatibility for all versions 1.x. Any behavior changes will be accompanied by a behavior change flag to provide a migration window for existing projects. If you encounter an error upon upgrading, please let us know by opening an issue.
dbt Cloud is now versionless. If you have selected "Versionless" in dbt Cloud, you already have access to all the features, fixes, and other functionality that is included in dbt Core v1.9.
For users of dbt Core, since v1.8 we recommend explicitly installing both dbt-core and dbt-<youradapter>. This may become required for a future version of dbt. For example:
python3 -m pip install dbt-core dbt-snowflake
New and changed features and functionality
Features and functionality new in dbt v1.9.
Microbatch incremental_strategy
If you use a custom microbatch macro, set the require_batched_execution_for_custom_microbatch_strategy behavior flag in your dbt_project.yml to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag, as dbt will handle microbatching automatically for any model using the microbatch strategy.
Incremental models are, and have always been, a performance optimization for datasets that are too large to be dropped and recreated from scratch every time you do a dbt run. Learn more about incremental models.
Historically, managing incremental models involved several manual steps and responsibilities, including:
- Add a snippet of dbt code (in an is_incremental() block) that uses the already-existing table (this) as a rough bookmark, so that only new data gets processed.
- Pick one of the strategies for smushing old and new data together (append, delete+insert, or merge).
- If anything goes wrong, or your schema changes, you can always "full-refresh" by running the same simple query that rebuilds the whole table from scratch.
While this works for many use-cases, there’s a clear limitation with this approach: Some datasets are just too big to fit into one query.
Starting in Core 1.9, you can use the new microbatch strategy to optimize your largest datasets by processing your event data in discrete periods, each with its own SQL query, rather than all at once. The benefits include:
- Simplified query design: Write your model query for a single batch of data. dbt will use your event_time, lookback, and batch_size configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details.
- Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified batch_size and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use dbt retry to load only the failed batches.
- Targeted reprocessing: To load a specific batch or batches, you can use the CLI arguments --event-time-start and --event-time-end.
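As a rough illustration of the configurations named above, a microbatch model might be declared in a properties file along these lines. The model name (page_views), the timestamp column (viewed_at), and the date are hypothetical. The begin config is not mentioned above; it is included on the assumption that the strategy needs a starting point for the first batch.

```yaml
# Sketch only: model name, column name, and dates are placeholders.
models:
  - name: page_views                   # hypothetical model
    config:
      materialized: incremental
      incremental_strategy: microbatch
      event_time: viewed_at            # hypothetical timestamp column used to filter each batch
      batch_size: day                  # one query per day of data
      lookback: 3                      # also reprocess the 3 batches before the latest one
      begin: "2024-01-01"              # assumed config: earliest date to build batches from
```

With a setup like this, the model's SQL is written as if it only ever sees one day of data, and dbt supplies the per-batch filters.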
Currently microbatch is supported on these adapters with more to come:
- postgres
- snowflake
- bigquery
- spark
Snapshots improvements
Beginning in dbt Core 1.9, we've streamlined snapshot configuration and added a handful of new configurations to make dbt snapshots easier to configure, run, and customize. These improvements include:
- New snapshot specification: Snapshots can now be configured in a YAML file, which provides a cleaner and more consistent setup.
- New snapshot_meta_column_names config: Allows you to customize the names of meta fields (for example, dbt_valid_from, dbt_valid_to, etc.) that dbt automatically adds to snapshots. This increases flexibility to tailor metadata to your needs.
- target_schema is now optional for snapshots: When omitted, snapshots will use the schema defined for the current environment.
- Standard schema and database configs supported: Snapshots will now be consistent with other dbt resource types. You can specify where environment-aware snapshots should be stored.
- Warning for incorrect updated_at data type: To ensure data integrity, you'll see a warning if the updated_at field specified in the snapshot configuration is not the proper data type or timestamp.
- Set a custom current indicator for the value of dbt_valid_to: Use the dbt_valid_to_current config to set a custom indicator for the value of dbt_valid_to in current snapshot records (like a future date). By default, this value is NULL. When configured, dbt will use the specified value instead of NULL for dbt_valid_to for current records in the snapshot table.
- Use the hard_deletes configuration to track hard deletes by adding a new record when a row becomes "deleted" in the source. This config replaces invalidate_hard_deletes to give you more control over how to handle rows deleted from the source. Supported values are ignore, invalidate, and new_record.
Read more about Snapshots meta fields.
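To make the snapshot configurations above more concrete, here is a minimal sketch of a YAML-defined snapshot. The snapshot name, source, column names, and renamed meta columns are all hypothetical, and the far-future date expression is just one possible value that may need adjusting for your warehouse's SQL dialect.

```yaml
# Sketch only: names and values are placeholders, not a recommended setup.
snapshots:
  - name: orders_snapshot                      # hypothetical snapshot name
    relation: source('jaffle_shop', 'orders')  # hypothetical source being snapshotted
    config:
      schema: snapshots                        # standard schema config; target_schema is omitted
      unique_key: id
      strategy: timestamp
      updated_at: updated_at                   # should be a timestamp column to avoid the new warning
      dbt_valid_to_current: "to_date('9999-12-31')"  # replaces NULL for current records
      hard_deletes: new_record                 # one of: ignore, invalidate, new_record
      snapshot_meta_column_names:
        dbt_valid_from: valid_from             # hypothetical custom meta column names
        dbt_valid_to: valid_to
```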
state:modified improvements
We’ve made improvements to state:modified behaviors to help reduce the risk of false positives and negatives. Read more about the state:modified behavior flag that unlocks this improvement:
- Added environment-aware enhancements for environments where the logic purposefully differs (for example, materializing as a table in prod but a view in dev).
Managing changes to legacy behaviors
dbt Core v1.9 has a handful of new flags for managing changes to legacy behaviors. You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting True / False values, respectively, for flags in dbt_project.yml.
You can read more about each of these behavior changes in the following links:
- (Introduced, disabled by default) state_modified_compare_more_unrendered_values. Set to True to start persisting unrendered_database and unrendered_schema configs during source parsing, and to compare unrendered values during state:modified checks. This reduces false positives caused by environment-aware logic when selecting state:modified.
- (Introduced, disabled by default) skip_nodes_if_on_run_start_fails project config flag. If the flag is set and any on-run-start hook fails, dbt marks all selected nodes as skipped.
  - on-run-start/end hooks are always run, regardless of whether they passed or failed last time.
- (Introduced, disabled by default) [Redshift] restrict_direct_pg_catalog_access. If the flag is set, the adapter will use the Redshift API (through the Python client) if available, or query Redshift's information_schema tables instead of using pg_ tables.
- (Introduced, disabled by default) require_nested_cumulative_type_params. If the flag is set to True, users will receive an error instead of a warning if they're not properly formatting cumulative metrics using the new cumulative_type_params nesting.
- (Introduced, disabled by default) require_batched_execution_for_custom_microbatch_strategy. Set to True if you use a custom microbatch macro to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag, as dbt will handle microbatching automatically for any model using the microbatch strategy.
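As a minimal sketch, opting into a few of these flags in dbt_project.yml could look like the following. Which flags to enable is an assumption made for illustration; set only the ones that apply to your project.

```yaml
# dbt_project.yml (excerpt): all of these flags are disabled by default.
flags:
  state_modified_compare_more_unrendered_values: true
  skip_nodes_if_on_run_start_fails: true
  # Only relevant if the project defines a custom microbatch macro.
  require_batched_execution_for_custom_microbatch_strategy: true
```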
Adapter-specific features and functionalities
Redshift
- Support IAM Role auth
Snowflake
- Iceberg Table Format support will be available on three out-of-the-box materializations: table, incremental, dynamic tables.
BigQuery
- Can cancel running queries on keyboard interrupt
- Auto-drop intermediate tables created by incremental models to save resources
Spark
- Support overriding the ODBC driver connection string, which enables you to provide custom connections
Quick hits
We also made some quality-of-life improvements in Core 1.9, enabling you to:
- Maintain data quality now that dbt returns an error (versioned models) or warning (unversioned models) when someone removes a contracted model by deleting, renaming, or disabling it.
- Document data tests.
- Use ref and source in foreign key constraints.
- Use dbt test with the --resource-type / --exclude-resource-type flag, making it possible to include or exclude data tests (test) or unit tests (unit_test).
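For the foreign key item above, a column-level constraint that points at another model through ref might be sketched as follows. The model names (orders, customers), column names, and data type are hypothetical, and whether the constraint is actually enforced depends on your data platform.

```yaml
# Sketch only: model and column names are placeholders.
models:
  - name: orders                      # hypothetical model
    config:
      contract:
        enforced: true                # constraints are applied as part of a model contract
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: foreign_key
            to: ref('customers')      # hypothetical referenced model
            to_columns: [id]          # hypothetical referenced column
```

Using ref here instead of a hard-coded relation name lets dbt resolve the referenced table per environment.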