climate_ref_core.dataset_registry
Data registries for non-published reference data
These data are placeholders until they have been added to obs4MIPs. The CMIP7 Assessment Fast Track REF requires that reference datasets are openly licensed before they are included in any published data catalogs.
DatasetRegistryManager
A collection of reference dataset registries
The REF requires reference datasets in addition to the obs4MIPs data, which can be downloaded via ESGF. Each provider may need a different set of reference data. These provider-specific datasets are either not yet available in obs4MIPs or are post-processed from obs4MIPs.
A dataset registry consists of a file that contains a list of files and checksums, in combination with a base URL that is used to fetch the files. Pooch is used within the DataRegistry to manage the caching, downloading and validation of the files.
All datasets that are registered here are expected to be openly licensed and freely available.
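The registry format described above (a list of files and checksums, plus a base URL) can be illustrated with a minimal standard-library sketch. It does not use Pooch or the REF code. The helper names `parse_registry`, `matches_checksum` and `file_url` are hypothetical, and the sketch assumes SHA-256 checksums written either as bare hex or with a `sha256:` prefix.

```python
import hashlib
from pathlib import Path


def parse_registry(text: str) -> dict[str, str]:
    """Parse 'filename checksum' lines, skipping blank lines and '#' comments."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        filename, checksum = line.split()[:2]
        registry[filename] = checksum
    return registry


def matches_checksum(path: Path, expected: str) -> bool:
    """Check a local file against 'sha256:<hex>' or a bare SHA-256 hex digest."""
    algorithm, _, digest = expected.rpartition(":")
    if algorithm not in ("", "sha256"):
        raise ValueError(f"Unsupported hash algorithm in this sketch: {algorithm}")
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == digest.lower()


def file_url(base_url: str, filename: str) -> str:
    """Join the common URL prefix with a registry filename."""
    return f"{base_url.rstrip('/')}/{filename}"
```

In the REF itself this work is delegated to Pooch, which also handles caching and downloading; the sketch only shows how a filename, a checksum and a base URL fit together.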
Source code in packages/climate-ref-core/src/climate_ref_core/dataset_registry.py
__getitem__(item)
keys()
register(name, base_url, package, resource, cache_name=None, version=None)
Register a new dataset registry
This will create a new Pooch registry and add it to the list of registries. This is typically used by a provider to register a new collection of datasets at runtime.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the registry. This is used to identify the registry. | required |
| `base_url` | `str` | Common URL prefix for the files. | required |
| `package` | `str` | Name of the package containing the registry resource. | required |
| `resource` | `str` | Name of the resource in the package that contains a list of files and checksums. This must be formatted in a way that is expected by pooch. | required |
| `version` | `str \| None` | The version of the data. Changing the version will invalidate the cache and force a re-download of the data. | `None` |
| `cache_name` | `str \| None` | Name to use to generate the cache directory. This defaults to the value of | `None` |
Source code in packages/climate-ref-core/src/climate_ref_core/dataset_registry.py
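The mapping-like interface documented above (`register`, `__getitem__`, `keys`) can be pictured with a small stand-in. This is not the REF implementation: the real `register` creates a Pooch registry, whereas the hypothetical `RegistryManagerSketch` below only records the arguments it was given. The provider name, URL, package and resource in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical record of a registration (stands in for a Pooch object)."""

    name: str
    base_url: str
    package: str
    resource: str
    cache_name: Optional[str] = None
    version: Optional[str] = None


class RegistryManagerSketch:
    """Stand-in with the same method names as the documented manager."""

    def __init__(self) -> None:
        self._registries: dict[str, RegistryEntry] = {}

    def register(self, name, base_url, package, resource, cache_name=None, version=None):
        # Store the registration under its name so it can be looked up later.
        self._registries[name] = RegistryEntry(
            name, base_url, package, resource, cache_name, version
        )

    def __getitem__(self, item: str) -> RegistryEntry:
        # In this sketch, an unknown name raises KeyError.
        return self._registries[item]

    def keys(self) -> list[str]:
        return list(self._registries)


# Usage with made-up values: a provider registers its datasets at import time.
manager = RegistryManagerSketch()
manager.register(
    name="example-provider",
    base_url="https://example.org/reference-data/",
    package="example_provider.dataset_registry",
    resource="reference_files.txt",
    version="v1",
)
```

Keeping registrations keyed by name lets each provider add its own collection at runtime without the core package needing to know about it in advance.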
fetch_all_files(registry, name, output_dir, symlink=False, verify=True)
Fetch all files associated with a pooch registry and write them to an output directory.
Pooch fetches, caches and validates the downloaded files. Subsequent calls to this function will not refetch any previously downloaded files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `registry` | `Pooch` | Pooch directory containing a set of files that should be fetched. | required |
| `name` | `str` | Name of the registry. | required |
| `output_dir` | `Path \| None` | The root directory to write the files to. The directory will be created if it doesn't exist, and matching files will be overwritten. If no directory is provided, the files will be fetched from the remote server, but not copied anywhere. | required |
| `symlink` | `bool` | If True, symlink all files to this directory. Otherwise, perform a copy. | `False` |
| `verify` | `bool` | If True, verify the checksums of the local files against the registry. | `True` |
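The `output_dir` and `symlink` behaviour described in the table can be sketched with the standard library alone. This is an illustration under stated assumptions, not the REF source: the hypothetical `place_files` helper takes files that are already in a local cache directory and skips the fetching and checksum verification that Pooch performs. The exact layout the real function uses under `output_dir` is not shown here.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def place_files(
    cache_dir: Path,
    relative_names: Iterable[str],
    output_dir: Optional[Path],
    symlink: bool = False,
) -> list[Path]:
    """Copy or symlink cached files into output_dir, keeping relative paths."""
    if output_dir is None:
        # Nothing to write: the files stay in the cache only.
        return []

    written = []
    for relative_name in relative_names:
        source = cache_dir / relative_name
        target = output_dir / relative_name
        # Create the output directory (and any subdirectories) on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Overwrite a matching file or stale symlink.
        if target.is_symlink() or target.exists():
            target.unlink()
        if symlink:
            target.symlink_to(source.resolve())
        else:
            shutil.copy2(source, target)
        written.append(target)
    return written
```

Symlinking avoids duplicating large reference files on disk, while copying produces an output directory that stays valid even if the cache is later cleared.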