
Release notes for Soda Core

[soda-core] 3.0.14

01 December 2022

New features and improvements

  • Core: Date format fixes by @vijaykiran in #1691
  • Core: Variables everywhere by @m1n0 in #1700
  • Core: Update docker by @vijaykiran in #1699
  • Core: Refactor duplicate check into two queries by @m1n0 in #1698
  • Core: Remove row-count derivation from dataset discovery by @bastienboutonnet in #1706
  • Core: Support variables in configuration by @m1n0 in #1705
  • Core: Fix schema checks with table filter by @m1n0 in #1704
  • Core: Update CI to test support for python 3.10 by @vijaykiran
  • SQL Server: Fix email regex, do not allow empty string by @m1n0 in #1688
  • Spark: Respect verbose setting when running a test query by @m1n0 in #1697
  • Spark: Remove unnecessary logging by @vijaykiran
  • Cloud: Updates to HTTP Sampler by @vijaykiran in #1702
  • Cloud: Ensure that profiling does not lowercase columns by @tituskx in #1687
  • Docs: Updates to list of compatible data sources. by @janet-can in #1694
  • New data source: Duckdb support (experimental) by @vijaykiran in #1709
  • New data source: Denodo support (experimental) by @vijaykiran in #1710

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.13

15 November 2022

New features and improvements

  • Core: Support Date type in freshness checks by @m1n0 in #1667
  • Core: Log current time to logs by @m1n0 in #1676
  • Core: Fixes to sampler, add logging by @vijaykiran in #1690
  • Core: Add file/check location to scan summary by @m1n0 in #1675
  • Core: Regex: support ‘+’ in email format by @m1n0 in #1677
  • Core: Generate passing query for built-in checks by @m1n0 in #1668
  • Core: Test statistical functions on all data sources by @m1n0 in #1678
  • Core: Add samples limit to queries by @m1n0 in #1685
  • Cloud: Add message configuration option to sampler by @vijaykiran in #1686
  • Oracle: Oracle DB Support by @vijaykiran in #1682
  • Scientific: feat: support sampling for distribution checks by @baturayo in #1666

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.12

03 November 2022

New features and improvements

  • Core: Duplicate percent check by @m1n0 in #1649
  • Core: Change over time - remove ‘same day last month’ by @m1n0 in #1648
  • Core: Failed rows exclude columns by @m1n0 in #1657
  • Core: Introduce http sampler by @vijaykiran in #1665
  • Core: Modify Test Column Names by @tdstark in #1652
  • Cloud: Do not send null file ref, when failed rows are disabled by @vijaykiran in #1650
  • Scientific: feat: Allow use of in-check filters for distribution checks by @tituskx in #1655
  • Trino: Update trino_data_source.py by @ScottAtDisney in #1658
  • MS SQL Server: Change count to big_count by @vijaykiran in #1660

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.11

19 October 2022

New features

  • Cloud: Change over time - add same day/month support by @m1n0 in #1645
  • Core: Verify data source connection command by @m1n0 in #1636

Enhancements and bug fixes

  • Core: Parse cli variables correctly, fix cli tests to actually assert result. by @m1n0 in #1634
  • Core: variable substitution in schema check query by @ceyhunkerti in #1628
  • Redshift: use SVV_COLUMNS to get table metadata by @m1n0 in #1635
  • Scientific: fix: limit the bin size and handle zero division for continuous DRO by @baturayo in #1624
  • Scientific: fix: handle DRO generation for columns with 0 rows by @baturayo in #1627
  • Scientific: chore: pin prophet to >=1.1 by @bastienboutonnet in #1629
  • Scientific: refactor: add bins and weights doc link to DRO exception handling logs by @baturayo in #1633
  • Scientific: (anomaly_check): only send outcomeReasons with severity “warn” or “error” by @tituskx in #1640
  • Snowflake: use upper case in table metadata query by @m1n0 in #1639
  • Trino: fix py310 type hints by @m1n0 in #1641
  • BigQuery: fixing bq separate compute storage project by @thiagodeschamps in #1638
  • BigQuery: fix distribution check by @m1n0 in #1647

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.10

05 October 2022

New features

  • Dremio: First version of Dremio support by @vijaykiran in #1618
  • Core: Sample size is configurable for all failed row checks by @m1n0 in #1608

Enhancements and bug fixes

  • Core: Skip change over time checks when historical measurements not available by @m1n0 in #1615
  • Core: Include psycopg2 requirement for redshift by @m1n0 in #1620
  • Core: Use correct dicts when building scan result by @m1n0 in #1612
  • Cloud/dbt: Add Check source field for cloud by @m1n0 in #1614
  • Scientific: feat: check historical metrics are not None or log helpful message by @bastienboutonnet in #1600
  • Scientific: fix: handle very large bin sizes by filtering out outliers for dro generation by @baturayo in #1616
  • Scientific: fix: ensure PSI and SWD can deal with decimal.Decimal type by @tituskx in #1611

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.9

28 September 2022

Enhancements and bug fixes

  • Limit failed rows sample limit to 1000 by @m1n0 in #1599
  • Add scan result getter by @m1n0 in #1602
  • BigQuery separate project for compute and storage. by @m1n0 in #1598
  • Scan results file argument by @vijaykiran in #1603
  • Chore/move snowflake account by @jmarien in #1607
  • Use filename in check identity by @m1n0 in #1606

Refer to the Soda Core Release Notes for details.

Troubleshoot

Problem: When you run a scan using Soda Core 3.0.9, you get an error message that reads:

  from google.protobuf.pyext import _message
  ImportError: dlopen(.../site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace

Solution: This is the result of a transitive dependency from OpenTelemetry, which gathers OSS usage statistics. To resolve:

  1. From the command line, in the directory in which you installed your soda-core package, run pip uninstall protobuf.
  2. Reinstall protobuf with the command pip install protobuf==3.19.4.
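
As a rough way to confirm the result of the steps above, the following Python sketch checks which protobuf version is installed in the active environment. It relies only on the standard library's importlib.metadata module. The helper names installed_version and protobuf_matches_pin are hypothetical and are not part of Soda Core.

```python
from importlib import metadata
from typing import Optional

# Version named in the troubleshooting steps above.
PINNED_PROTOBUF = "3.19.4"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def protobuf_matches_pin(found: Optional[str], pinned: str = PINNED_PROTOBUF) -> bool:
    """Return True only when the found version is exactly the pinned version."""
    return found == pinned


if __name__ == "__main__":
    found = installed_version("protobuf")
    if protobuf_matches_pin(found):
        print(f"protobuf {found} matches the pinned version")
    else:
        print(f"protobuf is {found!r}; expected {PINNED_PROTOBUF}")
```

Run the script with the same Python interpreter that runs Soda Core, so that it inspects the environment in which you reinstalled protobuf.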

[soda-core] 3.0.8

22 September 2022

  • Soda Core: Add variable resolution to queries/thresholds by @vijaykiran in #1597
  • Soda Core: Scan results dict API method by @m1n0 in #1595
  • Soda Core: Minor edits to CLI help messages. by @janet-can in #1590
  • Soda Cloud: Fix change-over-time checks with percentage with no extra config by @m1n0 in #1592
  • Soda Cloud: Prevent empty message in outcomeReasons by @bastienboutonnet in #1596
  • Soda Scientific: Raise more user-friendly log messages when importing sci library fails by @bastienboutonnet in #1584
  • dbt: Fix sending correct table name to Soda Cloud by @vijaykiran in #1587
  • BigQuery: Add context authentication and impersonation for BigQuery by @tooobsias in #1588
  • SQLServer: Basic Sqlserver regex support by @m1n0 in #1586
  • MySQL/MariaDB: Fix mysql/mariadb compatibility for regex by @vijaykiran in #1591

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.7

13 September 2022

  • Core: Update freshness value to be milliseconds and add measure by @vijaykiran in #1575
  • Core: Resolve variables in user defined queries by @vijaykiran in #1577
  • dbt: Add configurable API URL for dbt cloud by @vijaykiran in #1576
  • dbt: Add dbt: prefix to dbt check results in Soda Cloud by @vijaykiran in #1574
  • dbt: Fix dbt cloud ingest, improve logging. by @m1n0 in #1578
  • dbt: Fix dbt checks not being sent properly to Soda Cloud by @vijaykiran in #1580
  • MySQL: Fixed port option by @ScottAtDisney in #1579
  • MySQL: Fix regex tests for mysql by @vijaykiran in #1583

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.6

07 September 2022

  • Fixed: add identityB to add datasource name in identity by @vijaykiran in #1556
  • Databricks SQL support by @vijaykiran in #1559
  • Added application flag to snowflake connect by @tombaeyens in #1561
  • Added identities by @vijaykiran in #1569
  • Added support for custom sampler by @vijaykiran in #1570
  • Handle numerical column/table names by @m1n0 in #1572
  • dbt ingestion support by @m1n0 in #1552

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.5

24 August 2022

New features

  • Support for Trino data source by @ScottAtDisney in #1553

Enhancements and bug fixes

  • Fix ‘missing format’ in numeric metrics by @m1n0 in #1549
  • Fix duplicate query by @m1n0 in #1543
  • Refactor: turn no matching table error into a warning to avoid scan failing when all tables are excluded by @bastienboutonnet in #1533
  • Add comments explaining cloud payload by @m1n0 in #1545
  • Add data source contributing docs by @m1n0 in #1546
  • Feature, profiling: add support for extra numeric and text datatypes by @bastienboutonnet in #1534
  • Change spark installation to decouple dependencies for Hive and ODBC by @vijaykiran in #1554. Read more about installing the dependencies separately, as needed.

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.4

10 August 2022

  • Testing switch to 22.04 for GA by @jmarien in #1521
  • Log and trace Soda Cloud trace IDs by @m1n0 in #1520
  • Update docker image for sqlserver support by @vijaykiran in #1522
  • Add option to set scan datetime by @vijaykiran in #1531
  • Add MySQL Support by @vijaykiran in #1526

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.3

27 July 2022

New Features

  • MS SQLServer support by @vijaykiran in #1515
  • IBM DB2 support

Bug Fixes

  • Fix: better logging messages for profiling and discover datasets by @baturayo in #1498
  • Fix config file creation when first path is not writable by @m1n0 in #1504
  • fix: Failed rows don’t consider filter by @vijaykiran in #1505
  • Fix log message by @m1n0 in #1507
  • Fix reference check for null values in source column by @m1n0 in #1509
  • Attach sample rows to reference check by @m1n0 in #1508
  • Make sure results to sodacloud are sent when there is an exception by @vijaykiran in #1510
  • Fix for regex on collated columns in Snowflake by @ScottAtDisney in #1516

Enhancements

  • Check name refactor by @m1n0 in #1502
  • Set basic telemetry scan data even in case of exceptions by @m1n0 in #1512
  • Improve athena text fixture auth setup by @m1n0 in #1501
  • Publish data source packages for python 3.7 by @m1n0 in #1514
  • Inform about wrong check indentation in logs by @m1n0 in #1517
  • Feat: skip row count query during column profiling by @bastienboutonnet in #1518
  • Feat: support ‘text’ data type in column profiling by @bastienboutonnet in #1519

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.2

18 July 2022

Enhancements and New Features

  • IBM db2 support
  • Support cli --version to output core version
  • Warn users when quotes are present in include excludes identifiers
  • Add samples limit to failed rows checks
  • BQ expose remaining client params and auth methods
  • Enable Snowflake Tokens
  • Treat zero missing or invalid rows as zero percent

Bug Fixes

  • Make name optional for failed rows
  • Use exception rather than exc_info to render traceback in soda-core logger’s call of prophet model
  • Stored row count in cloud is wrong
  • Handle exceptions from scientific library and log them instead of letting them raise
  • Spark DF: update example api usage
  • Change default scan definition name
  • BQ: remove schema, use dataset only
  • Use default distribution comparison method when user has not provided one
  • Fix utc timezone handling
  • Improve profiling test for all tables and all columns
  • Set redshift host before trying to fetch credentials
  • Change unassigned min and max variables for profiling logs
  • Use check name in Metric checks
  • If anomaly detection fails other check results are not sent to cloud
  • Prevent empty table list from running all tables
  • Profile column parsing fails when user provides illegal column spec
  • Join check text with newlines instead of /n

Infra/CI

  • Async Docker image building through Actions and dispatch

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.1

29 June 2022

  • Re-introduce Spark for the Docker image by @jmarien in #1458
  • Build: require strict prophet v1.0.0 in scientific library by @bastienboutonnet in #1459
  • Comment for pinned prophet version by @m1n0 in #1460
  • Fix the e parameter by @jmarien in #1461

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0

28 June 2022

This is the general availability release for Soda Core with Soda CL.

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0rc3 Beta

27 June 2022

  • Doc: add comment about ordinal_position ordering by @bastienboutonnet in #1428
  • Refactor: use filesystem abstractions in distribution check by @baturayo in #1423
  • Fix: distribution check athena compatibility by @bastienboutonnet in #1429
  • Feat: profile and discover view tables by @baturayo in #1416
  • Code style section in contrib docs by @m1n0 in #1432
  • Unify data source api, remove redundant code. by @m1n0 in #1433
  • Fix: support athena in column profiling by @bastienboutonnet in #1430
  • Column profiling metadata fix by @tombaeyens in #1431
  • Feat: Support profile columns inclusion/exclusion behaviour for Spark by @baturayo in #1437
  • CORE-63 Added relative percentage change over time by @tombaeyens in #1435
  • Feat: Raise a MissingBinsAndWeights exception if soda scan runs without distribution_reference present by @tituskx in #1421
  • Flatten data source configuration schema by @m1n0 in #1441
  • Fix: Suppress prophet’s pandas: frame.append deprecation warning by @tituskx in #1440
  • Feat: send outcome reason to cloud for anomaly detection and schema checks by @baturayo in #1390
  • Add private key and other extra params to snowflake by @m1n0 in #1446
  • Feat: refer to DROs by name by @tituskx in #1422
  • Change: rename the update command to update-dro as it better describes what the command is used for by @tituskx in #1444
  • Feat/fix: ensure empty bins for integer columns are not created and fix bin width derivation by @baturayo in #1447
  • Do not quote table names in for-each block by @m1n0 in #1449
  • Feat: add env based option to run tests on views by @vijaykiran in #1442

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0rc2 Beta

22 June 2022

  • feat: add wasserstein distance and PSI methods to distribution checks by @tituskx in #1395
  • CORE-24 New freshness syntax by @tombaeyens in #1400
  • Verify that in spark-df arrays & structs don’t break anything by @tombaeyens in #1397
  • feat: add column exclusion to profile columns by @bastienboutonnet in #1396
  • feat: log no threshold error during parsing and provide more informative error during check summary by @tituskx in #1401
  • CORE-44 Fixed some extra timestamps to utc by @tombaeyens in #1405
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1407
  • SODA-23 table dataset rename by @tombaeyens in #1404
  • feat: send distribution check results to cloud so that they can be plotted by @tituskx in #1402
  • Update README to include support for Amazon Athena by @stuart-robinson in #1409
  • refactor: Refactor scan.py to remove code duplicates by @baturayo in #1391
  • Update CONTRIBUTING to stipulate that users fork the repo. by @janet-can in #1413
  • Core 70 clean test schemas by @tombaeyens in #1415
  • fix: hotfix for historic measurements having none values by @baturayo in #1418
  • CORE-26 Fix change over time results value parsing by @vijaykiran in #1419
  • CORE-57 improved exception handling when creating data source by @tombaeyens in #1411
  • Another approach for the Docker image for Soda Core by @jmarien in #1398
  • Added 5 random chars to CI schema names by @tombaeyens in #1424
  • Fix drop table statement in test suite by @m1n0 in #1425
  • SODA-44 Added Z to timestamps in soda cloud json by @tombaeyens in #1408
  • Added docs on running tests by @tombaeyens in #1426
  • Fix schema check title by @vijaykiran in #1427
  • fix: more useful profiling warnings by @bastienboutonnet in #1420
  • CORE-37 Fixed schema type comparison for BigQuery by @tombaeyens in #1410

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0rc1 Beta

08 June 2022

  • 1175 spark by @m1n0 in #1382

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b19 Beta

02 June 2022

  • fix: handle %.% in profile columns properly and other bugs by @bastienboutonnet in #1377
  • Fix: cope with cloud disabled samples. by @m1n0 in #1393
  • BQ: regex switch to ‘r’ instead of backslash escaping by @m1n0 in #1394

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b18 Beta

01 June 2022

  • Scientific package tests on Athena. by @m1n0 in #1374
  • Update OT with scan/check counts by @vijaykiran in #1386
  • feat: add ability to send dataset samples to soda cloud (SODA-284) by @baturayo in #1372
  • fix: typo in data source package import by @bastienboutonnet in #1387
  • 627 Added default sampler returning a sample that is not persistent by @tombaeyens in #1385
  • feat: cap distribution check to 1M rows by default by @tituskx in #1379
  • refactor: clean up logging for anomaly detection by @bastienboutonnet in #1389
  • fix: avoid parsing DRO name in distribution check until fully implemented by @bastienboutonnet in #1388
  • Downgrade markupsafe dependency by @m1n0 in #1392

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b17 Beta

26 May 2022

  • Pin versions in core by @vijaykiran in #1383

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b16 Beta

26 May 2022

  • Fixing suffix scanning of configuration and check files by @tombaeyens in #1365
  • Refactored to actual table and actual column names by @tombaeyens in #1370
  • Send Soda Cloud logs by @tombaeyens in #1380
  • Prevent upload when no sample rows are present by @tombaeyens in #1378
  • refactor: inform when columns are skipped in profiling via logs by @bastienboutonnet in #1375

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b15 Beta

23 May 2022

  • refactor: remove darts dependency by @bastienboutonnet in #1362
  • refactor: remove code duplication in sodacl_parser by @baturayo in #1361
  • SODA-248 fixed change over time checks by @tombaeyens in #1366
  • Athena support by @m1n0 in #1367
  • Added for each schema check by @tombaeyens in #1368

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b14 Beta

19 May 2022

  • Add defaultDataSource to cloud payload by @vijaykiran in #1359
  • fix: provide docker image with soda-scientific packaged by @bastienboutonnet in #1355
  • Fixing data source validity error message by @tombaeyens in #1357
  • Added cython to the setup.py file by @tituskx in #1360
  • #1353 SODA-494 Fixing recursive loading of files by @tombaeyens in #1356

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b13 Beta

18 May 2022

  • 1237 samples2 by @tombaeyens in #1340
  • Getting disable samples from cloud config by @tombaeyens in #1348
  • Test anomaly detection for numeric metrics by @baturayo in #1349
  • Updated contributing, fixed logs and added hint in comment how to add… by @tombaeyens in #1350
  • Fix: Automated monitoring revert issues SODA-489 by @baturayo in #1351
  • Soda 159 - Test anomaly detection for nested metrics by @baturayo in #1354

Refer to the Soda Core Release Notes for details.


[soda-core] 3.0.0b12 Beta

16 May 2022

  • Fix date eu/us formats. by @m1n0 in #1334
  • 1237 samples by @tombaeyens in #1328
  • feat: column profiling by @bastienboutonnet in #1322
  • Switch to latest prophet by @vijaykiran in #1335
  • throw log error and return empty string if histogram assumption broken by @bastienboutonnet in #1337
  • Freshness send microseconds to cloud. by @m1n0 in #1338
  • Cloud: timestamps use seconds resolution by @m1n0 in #1339
  • Deleted docs folder by @tombaeyens in #1343
  • feat: implement automated monitoring executor/runner by @baturayo in #1323
  • feat: add table discovery by @bastienboutonnet in #1341
  • fix(profiling): allow null results in text column aggregates by @bastienboutonnet in #1344
  • Fix: update check identity in case of automated monitoring by @baturayo in #1346

Refer to the Soda Core Release Notes for details.


[soda-core] 0.0.1 Beta

22 March 2022

This release marks the launch, or first beta release, of Soda Core and Soda Checks Language.

Refer to the Soda Core OSS and SodaCL documentation for information on how to use the new CLI tool and domain-specific language for reliability.


Last modified on 07-Dec-22