The missing link between data and code

Combine your code and data seamlessly and reliably

DDS is a lightweight Python package that lets you treat your datasets as if they were pieces of code, at any size and complexity and with little overhead. It transparently registers, caches, and recalculates datasets on demand, in single-user or collaborative settings.
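To make the idea of caching and recalculating on demand concrete, here is a minimal sketch of a cache keyed by a function's code. It is an illustration only, not the DDS implementation or its API: the names `cached_dataset`, `fingerprint`, `STORE`, and `EVALUATIONS` are hypothetical, the store is a plain in-memory dictionary, and the fingerprint only looks at the function's own bytecode and constants.

```python
import hashlib

STORE = {}        # hypothetical in-memory store: (path, code fingerprint) -> dataset
EVALUATIONS = []  # paths that were actually recomputed, kept for illustration


def fingerprint(fn):
    """Hash the function's bytecode and constants as a stand-in for 'the code'."""
    code = fn.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_dataset(path):
    """Decorator: recompute the dataset at `path` only when the code changes."""
    def decorate(fn):
        def wrapper():
            key = (path, fingerprint(fn))
            if key not in STORE:
                # First time this exact code is seen for this path: run it.
                EVALUATIONS.append(path)
                STORE[key] = fn()
            return STORE[key]
        return wrapper
    return decorate


@cached_dataset("/numbers")
def numbers():
    return [1, 2, 3]


numbers()  # computed and stored
numbers()  # served from the store; the function body does not run again
```

In this sketch, editing the body of `numbers` changes its fingerprint, so the next call recomputes the dataset, while unchanged code keeps reusing the stored result.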

Build modern AI systems

Modern AI systems blend computer code (models and algorithms) with data. The DDS paradigm keeps each dataset consistent with the code that produced it, so that the results your code generates can be trusted.

Out of the box, the DDS package handles a wide variety of common data representations (Apache Spark, pandas) and machine learning tools.


Collaborate fearlessly

DDS makes sharing and updating datasets as easy as sharing and updating code.

Just like code, your data changes are isolated to your branch and do not affect other people's work.


Understand your data dependencies

Without running your code, see in advance how your changes will impact other datasets.
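As a rough illustration of what such an impact analysis computes, the sketch below walks a dependency graph to find every dataset downstream of a change. It assumes the graph is already known and written by hand; how DDS derives dependencies from your code is not shown here, and the dataset paths, `DEPENDS_ON`, and `impacted_by` are hypothetical names chosen for the example.

```python
from collections import deque

# Hypothetical dependency graph: each dataset path maps to the datasets it reads.
DEPENDS_ON = {
    "/raw/events": [],
    "/clean/events": ["/raw/events"],
    "/features/daily": ["/clean/events"],
    "/models/churn": ["/features/daily"],
    "/reports/usage": ["/clean/events"],
}


def impacted_by(changed, depends_on):
    """Return, sorted, every dataset that transitively depends on `changed`."""
    # Invert the edges so we can walk from a dataset to its consumers.
    downstream = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            downstream.setdefault(dep, []).append(name)

    # Breadth-first walk over consumers; no dataset is computed along the way.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


affected = impacted_by("/clean/events", DEPENDS_ON)
```

Here `affected` lists the datasets that would need to be recalculated after a change to `/clean/events`, which is the kind of answer you want before committing to an expensive run.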


Lightweight and modular design

DDS is a lightweight package that only requires you to annotate your existing code. You can enjoy its benefits out of the box.

It also blends seamlessly with the modern data scientist's toolkit, including Jupyter and other notebooks, MLflow, and Databricks.


It is a lifesaver.

A data scientist