Configuration#
The REF uses a tiered configuration model, where options can be sourced from different places.
Built-in default values apply first. Configuration is then loaded from a .toml file, which overrides any default values.
However, some configuration variables can be overridden at runtime using environment variables,
which always take precedence over any other configuration values set by default or found in a .toml file.
The default values for these environment variables are generally suitable,
but if you need to change them we recommend using a .env file
so that the changes are easier to reproduce in future.
Configuration File Discovery#
The REF will look for a configuration file in the following locations, taking the first one it finds:
* ${REF_CONFIGURATION}/ref.toml
* ~/.config/climate_ref/ref.toml (Linux)
* $XDG_CONFIG_HOME/climate_ref/ref.toml (Linux)
* ~/Library/Application Support/climate_ref/ref.toml (macOS)
* %USERPROFILE%\AppData\Local\climate_ref\ref.toml (Windows)
If no configuration file is found, the REF will use the default configuration.
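As a rough sketch of a first-match search over the locations listed above, the snippet below builds the candidate list and returns the first file that exists. The function names are hypothetical and the ordering simply follows the list as written; the REF's actual lookup may differ in detail.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def candidate_config_files(env: Mapping[str, str], home: Path, platform: str) -> list[Path]:
    """List candidate ref.toml locations in the order given in the text."""
    candidates = []
    if "REF_CONFIGURATION" in env:
        candidates.append(Path(env["REF_CONFIGURATION"]) / "ref.toml")
    if platform.startswith("linux"):
        candidates.append(home / ".config" / "climate_ref" / "ref.toml")
        if "XDG_CONFIG_HOME" in env:
            candidates.append(Path(env["XDG_CONFIG_HOME"]) / "climate_ref" / "ref.toml")
    elif platform == "darwin":
        candidates.append(home / "Library" / "Application Support" / "climate_ref" / "ref.toml")
    elif platform == "win32":
        # `home` stands in for %USERPROFILE% here.
        candidates.append(home / "AppData" / "Local" / "climate_ref" / "ref.toml")
    return candidates


def find_config_file(candidates: list[Path]) -> Optional[Path]:
    """Return the first candidate that exists, or None to signal 'use defaults'."""
    return next((path for path in candidates if path.is_file()), None)


found = find_config_file(candidate_config_files(os.environ, Path.home(), sys.platform))
print(found or "no ref.toml found; the default configuration applies")
```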
The REF_CONFIGURATION directory may contain significant amounts of data,
so on HPC systems we recommend setting the REF_CONFIGURATION environment variable to a directory on a scratch filesystem.
The default configuration is equivalent to the following:
log_level = "INFO"
[paths]
log = "${REF_CONFIGURATION}/log"
scratch = "${REF_CONFIGURATION}/scratch"
software = "${REF_CONFIGURATION}/software"
results = "${REF_CONFIGURATION}/results"
dimensions_cv = "${REF_INSTALLATION_DIR}/packages/climate-ref-core/src/climate_ref_core/pycmec/cv_cmip7_aft.yaml"
[db]
database_url = "sqlite:///${REF_CONFIGURATION}/db/climate_ref.db"
run_migrations = true
[executor]
executor = "climate_ref.executor.LocalExecutor"
[executor.config]
[[diagnostic_providers]]
provider = "climate_ref_esmvaltool:provider"
[diagnostic_providers.config]
[[diagnostic_providers]]
provider = "climate_ref_ilamb:provider"
[diagnostic_providers.config]
[[diagnostic_providers]]
provider = "climate_ref_pmp:provider"
[diagnostic_providers.config]
Additional Environment Variables#
Environment variables are used to control some aspects of the framework outside of the configuration file.
REF_DATASET_CACHE_DIR#
Path where any datasets that are fetched via the ref datasets fetch-data command are stored.
This directory will be several GB in size,
so it is recommended to set this to a directory on a scratch filesystem
rather than a directory on your home filesystem.
This is used to cache the datasets so that they are not downloaded multiple times.
Ingesting datasets directly from this directory is not recommended (see the --output-dir argument of ref datasets fetch-data).
This defaults to the following locations:
* ~/Library/Caches/climate_ref (macOS)
* ~/.cache/climate_ref, or $XDG_CACHE_HOME/climate_ref if the XDG_CACHE_HOME environment variable is defined (Linux)
* %USERPROFILE%\AppData\Local\climate_ref\Cache (Windows)
REF_TEST_DATA_DIR#
Override the location of the test data directory. If this is not set, the test data directory will be inferred from the location of the test suite.
If this is set, then the sample data won't be updated.
REF_TEST_OUTPUT#
Path where the test output is stored. This is used to store the output of the tests that are run in the test suite for later inspection.
Configuration Options#
Top-level#
Configuration that is used by the REF
cmip6_parser#
Parser to use for CMIP6 datasets
This can be either drs or complete.
* drs: Use the DRS parser, which parses the dataset based on the DRS naming conventions.
* complete: Use the complete parser, which parses the dataset based on all available metadata.
Default: 'complete'
Type: Literal
Environment Variable: 'REF_CMIP6_PARSER'
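A small sketch of how a two-valued option like this could be read from the environment and validated. The `read_cmip6_parser` helper is hypothetical; only the variable name, the two allowed values, and the default come from the description above.

```python
import os
from typing import Literal, Mapping, get_args

Cmip6Parser = Literal["drs", "complete"]


def read_cmip6_parser(env: Mapping[str, str] = os.environ) -> str:
    """Return the parser named by REF_CMIP6_PARSER, defaulting to 'complete'."""
    value = env.get("REF_CMIP6_PARSER", "complete")
    allowed = get_args(Cmip6Parser)  # ('drs', 'complete')
    if value not in allowed:
        raise ValueError(f"cmip6_parser must be one of {allowed}, got {value!r}")
    return value


print(read_cmip6_parser({"REF_CMIP6_PARSER": "drs"}))
```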
ignore_datasets_file#
Path to the file listing the datasets to ignore
This file is a YAML file that contains a list of facets to ignore per diagnostic.
The format is:
If this is not specified, a default ignore datasets file will be used. The default file is downloaded from the Climate-REF GitHub repository if it does not exist or is older than 6 hours.
Default: PosixPath('/home/docs/.cache/climate_ref/default_ignore_datasets.yaml')
Type: Path
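The six-hour refresh rule for the default ignore file can be pictured with a simple modification-time check. This is an illustrative sketch under the assumption that file age is measured from the last modification time; `needs_refresh` is a hypothetical helper and the download step itself is not shown.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 6 * 60 * 60  # six hours, as stated above


def needs_refresh(path: Path, now: Optional[float] = None) -> bool:
    """True if the cached file is missing or was last modified over six hours ago."""
    if not path.exists():
        return True
    now = time.time() if now is None else now
    return now - path.stat().st_mtime > MAX_AGE_SECONDS


print(needs_refresh(Path("default_ignore_datasets.yaml")))
```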
log_format#
Format of the log messages that are displayed by the REF via the CLI
Examples of the formatting options are available in the loguru documentation.
Default: '
Type: str
Environment Variable: 'REF_LOG_FORMAT'
log_level#
Log level of messages that are displayed by the REF via the CLI
This value is overridden if a value is specified via the CLI.
Default: 'INFO'
Type: str
db#
Database configuration
We support SQLite and PostgreSQL databases.
The default is to use SQLite, which is a file-based database that is stored in the
REF_CONFIGURATION directory.
This is a good option for testing and development, but not recommended for production use.
For production use, we recommend using PostgreSQL.
database_url#
Database URL that describes the connection to the database.
Defaults to sqlite:///$REF_CONFIGURATION/db/climate_ref.db.
This configuration value will be overridden by the REF_DATABASE_URL environment variable.
Schemas
The following schemas are supported:
postgresql://USER:PASSWORD@HOST:PORT/NAME
sqlite:///RELATIVE_PATH or sqlite:////ABS_PATH or sqlite:///:memory:
Default: 'sqlite:///$REF_CONFIGURATION/db/climate_ref.db'
Type: str
Environment Variable: 'REF_DATABASE_URL'
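To make the difference between the URL forms concrete, the sketch below splits a database URL with the standard library and reports what it points at. The `describe_database_url` helper and the example credentials are hypothetical; the REF itself hands the URL to its database layer rather than parsing it this way.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict[str, object]:
    """Summarise a sqlite:// or postgresql:// URL."""
    parts = urlsplit(url)
    if parts.scheme == "sqlite":
        # The first slash after the empty host is a separator; what remains
        # is a relative path, an absolute path (leading "/"), or ":memory:".
        location = parts.path[1:]
        return {
            "backend": "sqlite",
            "location": location,
            "in_memory": location == ":memory:",
            "absolute": location.startswith("/"),
        }
    if parts.scheme == "postgresql":
        return {
            "backend": "postgresql",
            "host": parts.hostname,
            "port": parts.port,
            "database": parts.path.lstrip("/"),
        }
    raise ValueError(f"unsupported database URL scheme: {parts.scheme!r}")


for example in ("sqlite:///db/climate_ref.db", "sqlite:////data/db/climate_ref.db", "sqlite:///:memory:"):
    print(describe_database_url(example))
```

Note that three slashes introduce a relative path while four introduce an absolute one.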
max_backups#
Maximum number of database backups to keep.
When running migrations for on-disk SQLite databases, a backup of the database is created. This setting controls how many of these backups are retained. The oldest backups are automatically removed when this limit is exceeded.
Default: 5
Type: int
Environment Variable: 'REF_MAX_BACKUPS'
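The retention behaviour described above amounts to keeping the newest files and deleting the rest. A minimal sketch, assuming backups are files matching `*.db` in a single directory and that age is judged by modification time; both are assumptions for illustration, and `prune_backups` is not a REF function.

```python
from pathlib import Path


def prune_backups(backup_dir: Path, max_backups: int = 5) -> list[Path]:
    """Delete the oldest backup files beyond max_backups and return those removed."""
    # Newest first, so everything past the first max_backups entries is surplus.
    backups = sorted(backup_dir.glob("*.db"), key=lambda p: p.stat().st_mtime, reverse=True)
    removed = backups[max_backups:]
    for path in removed:
        path.unlink()
    return removed
```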
run_migrations#
No description provided.
Default: True
Type: bool
diagnostic_providers#
Defining the diagnostic providers used by the REF.
Each diagnostic provider is a package that contains the logic for running a specific set of diagnostics. This configuration determines which diagnostic providers are loaded and used when solving.
Multiple diagnostic providers can be specified as shown in the example below.
[[diagnostic_providers]]
provider = "climate_ref_esmvaltool:provider"
[diagnostic_providers.config]
[[diagnostic_providers]]
provider = "climate_ref_ilamb:provider"
[diagnostic_providers.config]
[[diagnostic_providers]]
provider = "climate_ref_pmp:provider"
[diagnostic_providers.config]
config#
Additional configuration for the diagnostic provider.
See the documentation for the diagnostic package for the available configuration options.
Default: {}
Type: dict
provider#
Package that contains the diagnostic provider
This should be the fully qualified name of the diagnostic provider.
Default: 'climate_ref_example:provider'
Type: str
executor#
Configuration to define the executor to use for running diagnostics
config#
Additional configuration for the executor.
See the documentation for the executor for the available configuration options. These options will be passed to the executor class when it is created.
Default: {}
Type: dict
executor#
Executor class to use for running diagnostics
This should be the fully qualified name of the executor class
(e.g. climate_ref.executor.LocalExecutor).
The default is the local executor, which runs the executions locally and in parallel
using a process pool.
This class will be used for all executions of diagnostics.
Default: 'climate_ref.executor.LocalExecutor'
Type: str
Environment Variable: 'REF_EXECUTOR'
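Resolving a fully qualified class name generally means importing the module and looking up the attribute. The sketch below shows that general technique with a hypothetical `load_class` helper; it uses a standard-library class so that it runs without the REF installed, and it is not necessarily how the REF performs the lookup.

```python
import importlib


def load_class(qualified_name: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# With the REF installed, the name would be "climate_ref.executor.LocalExecutor".
executor_cls = load_class("concurrent.futures.ProcessPoolExecutor")
print(executor_cls.__name__)
```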
paths#
Common paths used by the REF application
Warning
These paths must be common across all systems on which the REF is run. Generally, this means that they should be mounted in the same location on all systems.
If any of these paths are specified as relative paths, they will be resolved to absolute paths. These absolute paths will be used for all operations in the REF.
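The effect of resolving relative paths can be illustrated with `pathlib`. This is a sketch only: `resolve_paths` is a hypothetical helper, and it assumes relative paths are resolved against the current working directory, which this page does not state.

```python
from pathlib import Path


def resolve_paths(paths: dict[str, str]) -> dict[str, Path]:
    """Turn every configured path into an absolute path."""
    return {name: Path(value).resolve() for name, value in paths.items()}


resolved = resolve_paths({"results": "results", "scratch": "./scratch"})
for name, path in resolved.items():
    print(name, path)
```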
dimensions_cv#
Path to a file containing the controlled vocabulary for the dimensions in a CMEC diagnostics bundle
This defaults to the controlled vocabulary for the CMIP7 Assessment Fast Track diagnostics,
which is included in the climate_ref_core package.
This controlled vocabulary is used to validate the dimensions in the diagnostics bundle. If custom diagnostics are implemented, this file may need to be extended to include any new dimensions.
Default: '$REF_INSTALL_DIRECTORY/cv_cmip7_aft.yaml'
Type: Path
Environment Variable: 'REF_DIMENSIONS_CV_PATH'
log#
Directory to store log files from the compute engine
This is not currently used by the REF, but is included for future use.
Default: '$REF_CONFIGURATION/log'
Type: Path
Environment Variable: 'REF_LOG_ROOT'
results#
Path to store the executions
Default: '$REF_CONFIGURATION/results'
Type: Path
Environment Variable: 'REF_RESULTS_ROOT'
scratch#
Shared scratch space for the REF.
This directory is used to write the intermediate outputs of a diagnostic execution. After the diagnostic has been run, the outputs are copied to the results directory.
This directory must be accessible by all the diagnostic services that are used to run the diagnostics, but does not need to be mounted in the same location on all the diagnostic services.
Default: '$REF_CONFIGURATION/scratch'
Type: Path
Environment Variable: 'REF_SCRATCH_ROOT'
software#
Shared software space for the REF.
This directory is used to store software environments.
This directory must be accessible by all the diagnostic services that are used to run the diagnostics, and should be mounted in the same location on all the diagnostic services.
Default: '$REF_CONFIGURATION/software'
Type: Path
Environment Variable: 'REF_SOFTWARE_ROOT'