API Reference

This page contains the API reference for the public classes in pyhanami.

Last update: Jul 21, 2026

class pyhanami.DataDiagnostics(datasets=None)[source]

Bases: object

Perform diagnostic comparisons between climate simulation ensembles.

This class provides functionality for computing and visualizing differences in climate variables between simulation ensembles. It includes methods for computing annual time series, absolute differences, effect sizes, and statistically significant differences at the grid-point level.

Parameters:: datasets (SimulationData or Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.

datasets

List of ensembles containing simulation data and metadata.

Type:: list[SimulationData]

variables

Configuration dictionary mapping variable names to display metadata.

Type:: dict

max_workers_grid

Number of parallel workers used for grid-level computations.

Type:: int

add_datasets(datasets)[source]

Add new datasets to the DataDiagnostics object.

Parameters:: datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.
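The datasets argument accepts either a single ensemble or an iterable of ensembles, while the datasets attribute is documented as a list. The following is a minimal sketch of how such an argument could be normalized into a list. The helper name as_dataset_list is hypothetical and is not part of pyhanami; the library's own handling may differ.

```python
from collections.abc import Iterable


def as_dataset_list(datasets):
    """Return a list from None, a single object, or an iterable of objects.

    Hypothetical helper for illustration only; not a pyhanami function.
    """
    if datasets is None:
        return []
    # Strings are iterable but would never be a collection of ensembles here.
    if isinstance(datasets, Iterable) and not isinstance(datasets, (str, bytes)):
        return list(datasets)
    return [datasets]
```

With this pattern, passing one ensemble and passing a one-element list lead to the same stored value.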

time_series_plot(var_name, data_names=None, output_path=None, obs=False, obs_paths=None, obs_names=None, time_freq='annual', start_year=None, end_year=None, plot_ens=False)[source]

Generate time series plot for the given datasets and variable for the selected period and time frequency.

Parameters:

var_name (str) – Climate variable name.
data_names (str or list[str], optional) – Name or list of names of simulation ensembles to plot. If None, all datasets in the DataDiagnostics object are used.
output_path (str, optional) – Path to save the time series plot.
obs (bool) – If True, also plot observational data if available (default: False).
obs_paths (str or list[str], optional) – Path to the observations database/s.
obs_names (str or list[str], optional) – Name of the observational dataset/s.
time_freq (str) – Resampling frequency (default: ‘annual’).
start_year (int) – Start year to plot.
end_year (int) – End year to plot.
plot_ens (bool) – Whether to plot individual ensemble member trajectories (default: False).

abs_diff_plot(var_name, data_names=None, output_path=None, start_year=None, end_year=None, clon=0)[source]

Generate absolute difference plot for the given datasets and variable.

Parameters:

var_name (str) – Climate variable name.
data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the diagnostics object are used.
output_path (str, optional) – Path to save the spatial plots.
start_year (int) – Start year to plot.
end_year (int) – End year to plot.
clon (int) – Central longitude for the spatial map (default: 0).

eff_size_plot(var_name, data_names=None, output_path=None, start_year=None, end_year=None, clon=0, alpha=0.05, stat=scipy.stats.ttest_ind)[source]

Generate effect size plot for the given datasets and variable marking grid points with statistically significant differences.

Parameters:

var_name (str) – Climate variable name.
data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the diagnostics object are used.
output_path (str, optional) – Path to save the spatial plots.
start_year (int) – Start year to plot.
end_year (int) – End year to plot.
clon (int) – Central longitude for the spatial map (default: 0).
alpha (float) – Significance level for the statistical test (default: 0.05).
stat (Callable) – Statistical test function to use for significance testing (default: ttest_ind).
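This page does not state which effect size definition eff_size_plot uses. As an illustration only, the sketch below computes Cohen's d with a pooled standard deviation for two samples at a single grid point. The choice of Cohen's d and the function name cohens_d are assumptions, not the documented behaviour.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d for two independent samples, using the pooled variance.

    Assumed definition for illustration; pyhanami may use another one.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the unbiased sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)
```

The sign follows the order of the arguments, so swapping the two ensembles flips the sign of the result.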

bias_plot(var_name, data_name=None, output_path=None, obs_path=None, obs_name=None, start_year=None, end_year=None, clon=0)[source]

Generate bias plot for the given dataset and variable comparing with observations.

Parameters:

var_name (str) – Climate variable name.
data_name (str, optional) – Name of the simulation ensemble to plot. If None, the first dataset in the DataDiagnostics object is used.
output_path (str, optional) – Path to save the spatial plot.
obs_path (str) – Path to the observations database.
obs_name (str) – Name of the observational dataset.
start_year (int) – Start year to plot.
end_year (int) – End year to plot.
clon (int) – Central longitude for the spatial map (default: 0).
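A usage sketch of the DataDiagnostics workflow, built only from the signatures listed above. The file paths, the ensemble names "ctrl" and "test", and the wrapper function compare_ensembles are placeholders chosen for illustration.

```python
def compare_ensembles(path_a, path_b, var_name, out_dir):
    """Plot a time series and an effect-size map for two ensembles."""
    # Imported lazily so this sketch can be loaded without pyhanami installed.
    from pyhanami import DataDiagnostics, SimulationData

    # The name given here is what data_names refers to later.
    ctrl = SimulationData(path_a, name="ctrl")
    test = SimulationData(path_b, name="test")

    diag = DataDiagnostics(datasets=[ctrl, test])
    diag.time_series_plot(
        var_name,
        data_names=["ctrl", "test"],
        output_path=out_dir,
        time_freq="annual",
        plot_ens=True,
    )
    diag.eff_size_plot(
        var_name,
        data_names=["ctrl", "test"],
        output_path=out_dir,
        alpha=0.05,
    )
    return diag
```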

class pyhanami.ObservationData(data_path, sim, name='obs', realization=0, regrid_method='bilinear')[source]

Bases: object

Retrieves and processes observational datasets for evaluation of simulations.

This class interfaces with an external observational data source to retrieve datasets that match the variables and time period of a given simulation dataset. Retrieved data are then regridded to match the spatial resolution of the input simulation data.

Parameters:

data_path (str) – Path to an observations database.
sim (xr.Dataset) – Input simulation dataset.
name (str) – Name of the observations instance (default: obs).
realization (int) – Realization number to select from the observations dataset if more than one member is present (default: 0).
regrid_method (str) – Regridding method (default: bilinear).

data_path

Path to the observations database.

Type:: Path

data

Processed observational data, regridded to match the input simulation.

Type:: xr.Dataset

name

Name of the observations instance (default: obs).

Type:: str

realization

Realization number to select from the observations dataset if more than one member is present (default: 0).

Type:: int

regrid_method

Regridding method (default: bilinear).

Type:: str

load_and_process(sim)[source]

Retrieve and regrid observational data for the variables and period available in the given simulation ensemble.

Parameters:: sim (xr.Dataset) – Input simulation dataset.
Returns:: data_new_grid – Regridded observational dataset matching the input simulation.
Return type:: xr.Dataset

class pyhanami.ReplicabilityTest(datasets=None, alpha=0.05, power=0.8)[source]

Bases: object

Perform replicability test between two climate simulation ensembles.

This class compares two climate simulation ensembles using a variety of metrics and statistical tests to assess whether both climates are statistically significantly different. The test is conducted over multiple variables, regions, seasons, and ensemble members. It also supports plotting results and generating summary reports.

Parameters:

datasets (Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.
alpha (float) – Significance level for the statistical tests (default: 0.05).
power (float) – Statistical power to compute minimum detectable effect size for the t-test (default: 0.8).
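To give a feel for how alpha and power relate to a minimum detectable effect size, here is a rough sketch based on the normal approximation to a two-sided, two-sample t-test with equal group sizes. This is a textbook approximation, not necessarily the computation pyhanami performs; the function name min_detectable_effect and the n_per_group argument are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Approximate minimum detectable standardized effect size.

    Normal approximation, two-sided test, equal group sizes (assumed).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)
```

Under this approximation, larger ensembles lower the detectable effect size in proportion to the inverse square root of the number of members.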

datasets

List of ensembles containing simulation data and metadata.

Type:: list[SimulationData]

variables

Configuration dictionary mapping variable names to display metadata.

Type:: dict

alpha

Significance level for the statistical tests.

Type:: float

power

Statistical power to compute minimum detectable effect size for the t-test.

Type:: float

max_workers_grid

Number of parallel workers used for variable-wise computations.

Type:: int

metrics

List of metrics with names and corresponding functions to compute scores.

Type:: list of dict

tests

Dictionary of statistical tests for comparing score distributions.

Type:: dict

seasons

List of seasons to compute scores over.

Type:: list of str

regions

Dictionary mapping region names to latitude bounds.

Type:: dict

effect_sizes

Dictionary to store effect sizes between the replicability test scores for each pair of datasets for all variables, seasons, regions, and metrics.

Type:: dict[str, xr.DataArray]

test_results

Dictionary to store results of the replicability test for each pair of datasets for all variables, seasons, regions, metrics, and statistical tests.

Type:: dict[str, xr.DataArray]

add_datasets(datasets)[source]

Add new datasets to the ReplicabilityTest object.

Parameters:: datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.

perform_rep_test(data_names=None, obs_path=None, start_year=None, end_year=None)[source]

Perform replicability test comparing the given simulation ensembles.

Parameters:

data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the ReplicabilityTest object are used.
obs_path (str) – Path to the observations database.
start_year (int) – Start year for the test.
end_year (int) – End year for the test.

get_effect_sizes(data_names, start_year=None, end_year=None)[source]

Return precomputed effect sizes between the replicability test scores for the given simulation ensembles.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.
start_year (int) – Start year for effect sizes.
end_year (int) – End year for effect sizes.

Returns:

effect_sizes_ds – Effect sizes for all variables, seasons, regions, and metrics.

Return type:

xr.DataArray

get_test_results(data_names, start_year=None, end_year=None)[source]

Return replicability test results for the given simulation ensembles.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.
start_year (int) – Start year for test results.
end_year (int) – End year for test results.

Returns:

test_results_ds – Results of the replicability test for all variables, seasons, regions, metrics, and tests.

Return type:

xr.DataArray

save_data(data_names, output_path, start_year=None, end_year=None)[source]

Save the computed effect sizes between the replicability test scores, together with the test results, to NetCDF files.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.
output_path (str) – Path to save the data files.
start_year (int) – Start year for test output.
end_year (int) – End year for test output.

matrix_plot(data_names, output_path=None, start_year=None, end_year=None)[source]

Generate matrix plot with effect sizes and replicability test results.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.
output_path (str, optional) – Path to save the matrix plot.
start_year (int) – Start year for test output.
end_year (int) – End year for test output.

report(output_path, time_series=False, spatial=False)[source]

Generate a summary report with the results of the replicability test and the selected plots.

Parameters:

output_path (str) – Path to save the report.
time_series (bool) – Whether to include time series plots in the report (default: False).
spatial (bool) – Whether to include spatial plots in the report (default: False).
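A usage sketch of the ReplicabilityTest methods in the order they would typically be called, using only the signatures listed above. The paths, the ensemble names, and the wrapper function run_replicability are placeholders.

```python
def run_replicability(path_a, path_b, obs_path, out_dir, start_year, end_year):
    """Run the replicability test for two ensembles and store the output."""
    # Imported lazily so this sketch can be loaded without pyhanami installed.
    from pyhanami import ReplicabilityTest, SimulationData

    names = ["ctrl", "test"]
    ensembles = [
        SimulationData(path_a, name=names[0]),
        SimulationData(path_b, name=names[1]),
    ]

    rep = ReplicabilityTest(datasets=ensembles, alpha=0.05, power=0.8)
    rep.perform_rep_test(
        data_names=names, obs_path=obs_path,
        start_year=start_year, end_year=end_year,
    )

    # Both getters are queried with the same pair of names and period.
    effect_sizes = rep.get_effect_sizes(names, start_year=start_year, end_year=end_year)
    results = rep.get_test_results(names, start_year=start_year, end_year=end_year)

    rep.matrix_plot(names, output_path=out_dir, start_year=start_year, end_year=end_year)
    rep.save_data(names, out_dir, start_year=start_year, end_year=end_year)
    return effect_sizes, results
```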

class pyhanami.ScientificEvaluation(datasets=None)[source]

Bases: object

Compute and plot scores for scientific model skill evaluation.

This class provides functionality for computing and visualizing metrics that evaluate how well a model reproduces several phenomena. Currently, it includes methods for bimodal ISO indices.

Parameters:: datasets (SimulationData or Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.

datasets

List of ensembles containing simulation data and metadata.

Type:: list[SimulationData]

variables

Configuration dictionary mapping variable names to display metadata.

Type:: dict

_general_scores

Dictionary to store general evaluation output for each dataset.

Type:: dict[str, general.GeneralEvaluation]

_iso_scores

Dictionary to store ISO evaluation output for each dataset.

Type:: dict[str, iso.ISOEvaluation]

_mjo_scores

Dictionary to store MJO evaluation output for each dataset.

Type:: dict[str, mjo.MJOEvaluation]

_tc_scores

Dictionary to store TCs evaluation output for each dataset.

Type:: dict[str, tc.TCEvaluation]

add_datasets(datasets)[source]

Add new datasets to the ScientificEvaluation object.

Parameters:: datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.

compute_general_scores(var_names=None, data_names=None, obs_name=None, obs_path=None, start_year=None, end_year=None, ensemble_mode='mean', member=None)[source]

Initialize and compute general model skill evaluation scores for selected dataset/s.

Parameters:

var_names (str or list[str], optional) – Climate variable/s name/s. If None, all variables in the simulated dataset will be used.
data_names (str or list[str], optional) – Name/s of simulation ensemble/s to use. If None, all datasets in the ScientificEvaluation object are used.
obs_name (str) – Name of the observational dataset to compare to (default: config_params.GEN_OBS_NAME).
obs_path (str) – Path to the observations database (default: config_params.GEN_OBS_PATH).
start_year (int) – Initial year of the period to compute the general scores for.
end_year (int) – End year of the period to compute the general scores for.
ensemble_mode (str) – Strategy to handle simulation ensembles (datasets with realization coordinate) either taking the ensemble mean over all members (“mean”) or selecting one specific member (“member”) (default: “mean”).
member (int) – Ensemble member to use when ensemble_mode="member" is selected.
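The two ensemble_mode strategies can be illustrated with plain Python lists standing in for a dataset with a realization coordinate. This is a conceptual sketch only: the function name reduce_ensemble is hypothetical, and treating member as a zero-based position is an assumption that this page does not confirm.

```python
def reduce_ensemble(members, ensemble_mode="mean", member=None):
    """Collapse a list of equally long series (one per realization).

    Illustrative stand-in; not how pyhanami handles xarray data internally.
    """
    if ensemble_mode == "mean":
        # Average across realizations at each time step.
        return [sum(values) / len(values) for values in zip(*members)]
    if ensemble_mode == "member":
        if member is None:
            raise ValueError("member is required when ensemble_mode='member'")
        return list(members[member])
    raise ValueError(f"unknown ensemble_mode: {ensemble_mode!r}")
```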

compute_iso_scores(data_names=None, start_year_eeof=None, end_year_eeof=None, start_year_pc=None, end_year_pc=None, obs=False, obs_path=None, correct_pc=False, iso_config=None, ensemble_mode='mean', member=None)[source]

Initialize and compute bimodal ISO indices, following K. Kikuchi (2020), and derive scalar scores, following M. Nakano et al. (2019), for selected datasets.

Parameters:

data_names (str or list[str], optional) – Name/s of simulation ensemble/s to use. If None, all datasets in the ScientificEvaluation object are used.
start_year_eeof (int) – Initial year of the Extended Empirical Orthogonal Function (EEOF) analysis period (not needed if obs=True).
end_year_eeof (int) – End year of the EEOF analysis period (not needed if obs=True).
start_year_pc (int) – Initial year of the period to compute Principal Components (PCs) for.
end_year_pc (int) – End year of the period to compute PCs for.
obs (bool) – If True, use EEOFs from observational data (default: False).
obs_path (str) – Path to the observational NOAA data file. As of now, only necessary if the resolution of the NOAA data (2.5°x2.5°) is higher than that of the simulation data.
correct_pc (bool) – Whether to adjust simulated PCs by dividing by alpha (default: False).
iso_config (ISOConfig) – Configuration dataclass with parameters necessary for the ISO evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.
ensemble_mode (str) – Strategy to handle simulation ensembles (datasets with realization coordinate) either taking the ensemble mean over all members (“mean”) or selecting one specific member (“member”) (default: “mean”).
member (int) – Ensemble member to use when ensemble_mode="member" is selected.

compute_mjo_scores(data_names=None, obs_path=None, start_year_mjo=None, end_year_mjo=None, start_year_ref=None, end_year_ref=None, threshold_active_days=None, mjo_config=None, mjo_vars=None, ensemble_mode='mean', member=None)[source]

Initialize and compute Real-Time Multivariate MJO (RMM) indices following M.C. Wheeler & H.H. Hendon (2004), MJO wavenumber-frequency power spectra following M.C. Wheeler & G.N. Kiladis (1999), and derived scalar scores following M.-S. Ahn et al. (2017) for the selected datasets.

Parameters:

data_names (str or list[str], optional) – Name/s of simulation ensemble/s to use. If None, all datasets in the ScientificEvaluation object are used.
obs_path (str) – Path to the observational data file with the necessary variables for the MJO analysis.
start_year_mjo (int) – Initial year of the period to perform the analysis for.
end_year_mjo (int) – End year of the period to perform the analysis for.
start_year_ref (int) – Initial year for computing the reference seasonal cycle. If None, the initial year of the whole MJO analysis is used.
end_year_ref (int) – End year for computing the reference seasonal cycle. If None, the end year of the whole MJO analysis is used.
threshold_active_days (float) – Threshold for the amplitude of the first two PCs to consider the MJO active at a given day. If None, the mean MJO amplitude over the entire period is used as a threshold.
mjo_config (MJOConfig) – Configuration dataclass with parameters necessary for the MJO evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.
mjo_vars (list[str]) – Variables to be used for the MJO analysis (default: [‘ua850’, ‘ua200’, ‘rlut’]).
ensemble_mode (str) – Strategy to handle simulation ensembles (datasets with realization coordinate) either taking the ensemble mean over all members (“mean”) or selecting one specific member (“member”) (default: “mean”).
member (int) – Ensemble member to use when ensemble_mode="member" is selected.
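The threshold_active_days rule can be sketched as follows: the daily amplitude is taken as the Euclidean norm of the first two PCs, and when no threshold is given the mean amplitude over the period is used. The function name active_days and the strict "greater than" comparison are assumptions; the page does not say whether days exactly at the threshold count as active.

```python
from math import hypot


def active_days(pc1, pc2, threshold=None):
    """Flag days on which the MJO amplitude exceeds the threshold.

    Illustrative sketch with plain lists; not the pyhanami implementation.
    """
    amplitude = [hypot(a, b) for a, b in zip(pc1, pc2)]
    if threshold is None:
        # Fall back to the mean amplitude over the whole period.
        threshold = sum(amplitude) / len(amplitude)
    return [value > threshold for value in amplitude]
```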

compute_tc_scores(data_names=None, start_year_tc=None, end_year_tc=None, obs=True, wind_factor=1.0, min_wind=10, basin=-1, bin_size=2.5, tc_config=None, ensemble_mode='mean', member=None)[source]

Compute tropical cyclone (TC) metrics, derive scalar scores following C.M. Zarzycki et al. (2021), and plot the results.

Parameters:

data_names (str or list[str], optional) – Name/s of simulation ensemble/s to use. If None, all datasets in the ScientificEvaluation object are used.
start_year_tc (int, optional) – Initial year of the period to compute the TC metrics for.
end_year_tc (int, optional) – End year of the period to compute the TC metrics for.
obs (bool) – If True, include observational data if available (default: True).
wind_factor (float) – Wind speed correction factor (to normalize the provided wind to 10 m wind) for simulations (default: 1.0).
min_wind (float) – Minimum 10 m wind speed in m/s for TCs detection (default: 10.0).
basin (int) –
Basin/hemisphere to consider for the analysis (default: -1). Codes are:
- <0 → GLOB (Global domain)
- 1 → NATL (North Atlantic)
- 2 → EPAC (Eastern Pacific)
- 3 → CPAC (Central Pacific)
- 4 → WPAC (Western Pacific)
- 5 → NIO (North Indian Ocean)
- 6 → SIO (South Indian Ocean)
- 7 → SPAC (South Pacific)
- 8 → SATL (South Atlantic)
- 9 → FLA (Florida)
- 20 → NHEMI (Northern Hemisphere)
- 21 → SHEMI (Southern Hemisphere)
- otherwise → NONE (unrecognized)
bin_size (float) – Size of the bins in degrees for computing the TCs metrics with CyMeP (default: 2.5).
tc_config (TCConfig) – Configuration dataclass with parameters necessary for the TC evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.
ensemble_mode (str) – Strategy to handle simulation ensembles (datasets with realization coordinate) either taking the ensemble mean over all members (“mean”) or selecting one specific member (“member”) (default: “mean”).
member (int) – Ensemble member to use when ensemble_mode="member" is selected.
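The basin codes listed for the basin parameter can be expressed as a small lookup, which may be handy for labelling plots or validating input before calling compute_tc_scores. The names BASIN_CODES and basin_name are hypothetical helpers, not part of pyhanami; the code-to-label mapping is copied from the list above.

```python
# Mapping of non-negative basin codes to labels, as listed for the basin parameter.
BASIN_CODES = {
    1: "NATL", 2: "EPAC", 3: "CPAC", 4: "WPAC", 5: "NIO",
    6: "SIO", 7: "SPAC", 8: "SATL", 9: "FLA",
    20: "NHEMI", 21: "SHEMI",
}


def basin_name(code):
    """Return the basin label for a code; negative codes mean the global domain."""
    if code < 0:
        return "GLOB"
    # Any code not listed (including 0) is treated as unrecognized.
    return BASIN_CODES.get(code, "NONE")
```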

general_scores(data_names=None)[source]

Access general evaluation output and its corresponding methods for the given dataset/s.

Parameters:: data_names (str or list[str]) – Name/s of the dataset/s to access the general evaluation output for.
Returns:: general_handlers – Instance of the ScientificEvaluationWrapper class containing the general evaluation output for the given dataset/s and providing access to its corresponding methods.
Return type:: ScientificEvaluationWrapper

iso_scores(data_names=None)[source]

Access ISO evaluation output and its corresponding methods for the given dataset/s.

Parameters:: data_names (str or list[str]) – Name/s of the dataset/s to access the ISO evaluation output for.
Returns:: iso_handlers – Instance of the ScientificEvaluationWrapper class containing the ISO evaluation output for the given dataset/s and providing access to its corresponding methods.
Return type:: ScientificEvaluationWrapper

mjo_scores(data_names=None)[source]

Access MJO evaluation output and its corresponding methods for the given dataset/s.

Parameters:: data_names (str or list[str]) – Name/s of the dataset/s to access the MJO evaluation output for.
Returns:: mjo_handlers – Instance of the ScientificEvaluationWrapper class containing the MJO evaluation output for the given dataset/s and providing access to its corresponding methods.
Return type:: ScientificEvaluationWrapper

tc_scores(data_names=None)[source]

Access TC evaluation output and its corresponding methods for the given dataset/s.

Parameters:: data_names (str or list[str]) – Name/s of the dataset/s to access the TC evaluation output for.
Returns:: tc_handlers – Instance of the ScientificEvaluationWrapper class containing the TC evaluation output for the given dataset/s and providing access to its corresponding methods.
Return type:: ScientificEvaluationWrapper
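A usage sketch that computes ISO scores for one ensemble and then retrieves the corresponding wrapper, using only the signatures listed above. The path, the ensemble name "exp", the choice to reuse one period for both the EEOF and PC steps, and the wrapper function evaluate_iso are placeholders. The methods available on the returned ScientificEvaluationWrapper are not described on this page, so the sketch simply returns it.

```python
def evaluate_iso(path, start_year, end_year):
    """Compute ISO scores for one ensemble and return the score wrapper."""
    # Imported lazily so this sketch can be loaded without pyhanami installed.
    from pyhanami import ISOConfig, ScientificEvaluation, SimulationData

    sim = SimulationData(path, name="exp")
    sci = ScientificEvaluation(datasets=sim)

    # Override selected fields; unspecified fields keep their defaults.
    cfg = ISOConfig(lat_range=(-30, 30), n_modes=2)

    sci.compute_iso_scores(
        data_names="exp",
        start_year_eeof=start_year, end_year_eeof=end_year,
        start_year_pc=start_year, end_year_pc=end_year,
        iso_config=cfg,
        ensemble_mode="mean",
    )
    return sci.iso_scores(data_names="exp")
```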

class pyhanami.SimulationData(data_source, name='sim')[source]

Bases: object

Loads and processes climate simulation data from a NetCDF file or a catalogue interface.

This class provides functionality to read input data, perform validation, and store metadata such as the simulation name and file path.

Parameters:

data_source (str or Path or xr.Dataset) – Path to a dataset file or catalogue interface, or an already loaded xarray.Dataset object.
name (str) – Name of the simulation instance (default: ‘sim’).

data_path

Path to the dataset file or catalogue interface if provided; None if dataset was passed directly.

Type:: Path or None

name

Name of the simulation instance.

Type:: str

data

Loaded dataset object with climate variables.

Type:: xr.Dataset

check_data()[source]

Check and correct provided data (available variables, units, coordinates names and format, …).

Returns:: data_sim – Checked simulation data.
Return type:: xr.Dataset

class pyhanami.ISOConfig(lat_range: tuple = (-30, 30), window_size: int = 141, low_freq: float = 0.011111111111111112, high_freq: float = 0.04, lag: int = 5, n_lags: int = 3, n_modes: int = 2)

Bases: object

high_freq: float = 0.04

lag: int = 5

lat_range: tuple = (-30, 30)

low_freq: float = 0.011111111111111112

n_lags: int = 3

n_modes: int = 2

window_size: int = 141
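The default low_freq and high_freq values are easier to read as periods. The sketch below mirrors the ISOConfig fields in a local stand-in dataclass (ISOConfigSketch, hypothetical) so that it runs without pyhanami, converts the band limits to periods, and shows how a single field can be overridden with dataclasses.replace. Reading the frequencies as cycles per day is an assumption; the page does not state the unit.

```python
from dataclasses import dataclass, replace


@dataclass
class ISOConfigSketch:
    """Local stand-in mirroring the documented ISOConfig defaults."""
    lat_range: tuple = (-30, 30)
    window_size: int = 141
    low_freq: float = 0.011111111111111112
    high_freq: float = 0.04
    lag: int = 5
    n_lags: int = 3
    n_modes: int = 2


def band_periods(cfg):
    """Return (shortest, longest) period of the frequency band."""
    return 1 / cfg.high_freq, 1 / cfg.low_freq


default = ISOConfigSketch()
shortest, longest = band_periods(default)

# Change one field and keep every other default untouched.
narrow = replace(default, lat_range=(-20, 20))
```

With the documented defaults, the band limits correspond to periods of roughly 25 and 90 time units.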

class pyhanami.MJOConfig(lat_range: tuple = (-15, 15), rolling_window_size: int = 120, n_harmonics: int = 3, normalize_std: bool = False, n_modes: int = 2, seg_size: int = 96, n_overlap: int = 60, mjo_freq_bounds: tuple = (0.0125, 0.03333333333333333), mjo_wavenum_bounds: tuple = (0, 4))

Bases: object

lat_range: tuple = (-15, 15)

mjo_freq_bounds: tuple = (0.0125, 0.03333333333333333)

mjo_wavenum_bounds: tuple = (0, 4)

n_harmonics: int = 3

n_modes: int = 2

n_overlap: int = 60

normalize_std: bool = False

rolling_window_size: int = 120

seg_size: int = 96
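The mjo_freq_bounds and mjo_wavenum_bounds defaults describe a rectangular region of a wavenumber-frequency spectrum. As a rough illustration, the helper below checks whether a spectral bin falls inside that region. The function name in_mjo_band and the inclusive treatment of both bounds are assumptions; the frequency unit is presumed to be cycles per day, which this page does not confirm.

```python
# Documented MJOConfig defaults, copied here so the sketch is self-contained.
MJO_FREQ_BOUNDS = (0.0125, 0.03333333333333333)
MJO_WAVENUM_BOUNDS = (0, 4)


def in_mjo_band(freq, wavenum,
                freq_bounds=MJO_FREQ_BOUNDS,
                wavenum_bounds=MJO_WAVENUM_BOUNDS):
    """Return True if (freq, wavenum) lies inside the band (bounds inclusive)."""
    low_f, high_f = freq_bounds
    low_k, high_k = wavenum_bounds
    return low_f <= freq <= high_f and low_k <= wavenum <= high_k
```

Expressed as periods, the default frequency bounds correspond to roughly 30 to 80 time units.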

class pyhanami.TCConfig(psl_delta: float = 200.0, psl_dist: float = 5.5, z_delta: float = -6.0, z_dist: float = 6.5, z_offset: float = 1.0, merge_dist: float = 6.0, traj_range: float = 8.0, traj_min_length: int = 10, traj_max_gap: int = 3, min_len: int = 10, max_lat: float = 50.0, max_topo: float = 1500.0, truncate_years: bool = True, do_defineMIbypres: bool = False, do_fill_missing_pw: bool = True, do_special_filter_obs: bool = False, threshold_ace_wind: float = -1.0, threshold_pace_pres: float = -100.0)

Bases: object

do_defineMIbypres: bool = False

do_fill_missing_pw: bool = True

do_special_filter_obs: bool = False

max_lat: float = 50.0

max_topo: float = 1500.0

merge_dist: float = 6.0

min_len: int = 10

psl_delta: float = 200.0

psl_dist: float = 5.5

threshold_ace_wind: float = -1.0

threshold_pace_pres: float = -100.0

traj_max_gap: int = 3

traj_min_length: int = 10

traj_range: float = 8.0

truncate_years: bool = True

z_delta: float = -6.0

z_dist: float = 6.5

z_offset: float = 1.0