climate_ref.models.dataset
CMIP6Dataset
Bases: Dataset
Represents a CMIP6 dataset.
Fields that are not part of the CMIP6 Data Reference Syntax (DRS) are marked optional.
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
instance_id = mapped_column(index=True)
class-attribute
instance-attribute
Unique identifier for the dataset (including the version).
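As an illustration of what an identifier "including the version" looks like, the sketch below joins the CMIP6 DRS facets with dots, which is the form ESGF uses for a CMIP6 `instance_id`. The helper name `build_instance_id` and the example facet values are hypothetical; whether `climate_ref` constructs the value this way or reads it from existing metadata is an assumption.

```python
# Facet order of a CMIP6 instance_id as published on ESGF.
DRS_FACETS = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",
)


def build_instance_id(facets: dict) -> str:
    """Join the DRS facets with dots (hypothetical helper, for illustration only)."""
    return ".".join(facets[name] for name in DRS_FACETS)


# Example facet values chosen for illustration.
example_facets = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "NCAR",
    "source_id": "CESM2",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "table_id": "Amon",
    "variable_id": "tas",
    "grid_label": "gn",
    "version": "v20190308",
}

instance_id = build_instance_id(example_facets)

# Dropping the final facet gives the identifier without the version.
unversioned_id = instance_id.rsplit(".", 1)[0]
```

Because the version is the final facet, two versions of the same dataset produce different `instance_id` values and can coexist as separate rows.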
Dataset
Bases: Base
Represents a dataset.
A dataset is a collection of data files that is used as an input to the benchmarking process. Adding, removing, or updating a dataset will trigger a new diagnostic calculation.
A polymorphic association is used to capture the different types of datasets as each dataset type may have different metadata fields. This enables the use of a single table to store all datasets, but still allows for querying specific metadata fields for each dataset type.
created_at = mapped_column(server_default=func.now())
class-attribute
instance-attribute
When the dataset was added to the database
dataset_type = mapped_column(nullable=False, index=True)
class-attribute
instance-attribute
Type of dataset
finalised = mapped_column(default=True, nullable=False)
class-attribute
instance-attribute
Whether the complete set of metadata for the dataset has been finalised.
For CMIP6, ingestion may initially create unfinalised datasets (False) until all metadata is extracted. For other dataset types (e.g., obs4MIPs, PMP climatology), this should be True upon creation.
slug = mapped_column(unique=True)
class-attribute
instance-attribute
Globally unique identifier for the dataset.
In the case of CMIP6 datasets, this is the instance_id.
updated_at = mapped_column(server_default=func.now(), onupdate=func.now())
class-attribute
instance-attribute
When the dataset was last updated.
Updating a dataset will trigger a new diagnostic calculation.
DatasetFile
Bases: Base
Captures the metadata for a file in a dataset.
A dataset may have multiple files, but is represented as a single dataset in the database. Much of the metadata is duplicated for each file in the dataset, but this makes querying, filtering, and building a data catalog more efficient.
dataset_id = mapped_column(ForeignKey('dataset.id', ondelete='CASCADE'), nullable=False, index=True)
class-attribute
instance-attribute
Foreign key to the dataset table
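The effect of `ondelete='CASCADE'` can be sketched at the database level with the standard-library `sqlite3` module. The table name `dataset_file`, the index name, and the sample rows are assumptions for illustration. SQLite only enforces foreign keys when `PRAGMA foreign_keys` is enabled on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, slug TEXT UNIQUE)")
conn.execute(
    """
    CREATE TABLE dataset_file (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER NOT NULL REFERENCES dataset(id) ON DELETE CASCADE,
        path TEXT
    )
    """
)
# Mirrors index=True on the foreign key column.
conn.execute("CREATE INDEX ix_dataset_file_dataset_id ON dataset_file (dataset_id)")

conn.execute("INSERT INTO dataset (id, slug) VALUES (1, 'example-dataset')")
conn.executemany(
    "INSERT INTO dataset_file (dataset_id, path) VALUES (?, ?)",
    [(1, "example/file_1.nc"), (1, "example/file_2.nc")],
)

files_before = conn.execute("SELECT COUNT(*) FROM dataset_file").fetchone()[0]

# Deleting the parent dataset removes its file rows as well.
conn.execute("DELETE FROM dataset WHERE id = 1")
files_after = conn.execute("SELECT COUNT(*) FROM dataset_file").fetchone()[0]

conn.close()
```

With the cascade in place, removing a dataset cannot leave orphaned file rows behind, and `nullable=False` prevents a file row from existing without a parent dataset.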
end_time = mapped_column(nullable=True)
class-attribute
instance-attribute
End time of a given file
path = mapped_column()
class-attribute
instance-attribute
Prefix that describes where the dataset is stored relative to the data directory
start_time = mapped_column(nullable=True)
class-attribute
instance-attribute
Start time of a given file
Obs4MIPsDataset
Bases: Dataset
Represents an obs4MIPs dataset.
TODO: Should the metadata fields be part of the file or dataset?
instance_id = mapped_column()
class-attribute
instance-attribute
Unique identifier for the dataset.
PMPClimatologyDataset
Bases: Dataset
Represents a climatology dataset from PMP.
These data are similar to obs4MIPs datasets, but are post-processed.
instance_id = mapped_column()
class-attribute
instance-attribute
Unique identifier for the dataset.