API Reference

This page contains the API reference for the public classes in pyhanami.

Last update: Apr 19, 2026

class pyhanami.DataDiagnostics(datasets=None)[source]

Bases: object

Perform diagnostic comparisons between climate simulation ensembles.

This class provides functionality for computing and visualizing differences in climate variables between simulation ensembles. It includes methods for computing annual time series, absolute differences, effect sizes and statistically significant differences at grid-point level.

Parameters:

datasets (SimulationData or Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.

datasets

List of ensembles containing simulation data and metadata.

Type:

list[SimulationData]

variables

Configuration dictionary mapping variable names to display metadata.

Type:

dict

max_workers_grid

Number of parallel workers used for grid-level computations.

Type:

int

add_datasets(datasets)[source]

Add new datasets to the DataDiagnostics object.

Parameters:

datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.
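
The parameter accepts either a single SimulationData or an iterable of them, while the datasets attribute is documented as a list. A minimal sketch of how such an argument might be normalized is shown below. The helper as_list and the stand-in class FakeSim are hypothetical and are not part of pyhanami; the library's actual handling is not described in this reference.

```python
from collections.abc import Iterable


class FakeSim:
    """Hypothetical stand-in for SimulationData (it only carries a name)."""

    def __init__(self, name):
        self.name = name


def as_list(datasets):
    """Return a list whether given None, one object, or an iterable of objects."""
    if datasets is None:
        return []
    # Strings are iterable but would never be a collection of ensembles here.
    if isinstance(datasets, Iterable) and not isinstance(datasets, (str, bytes)):
        return list(datasets)
    return [datasets]


# Start from one ensemble, then add two more in a single call.
stored = as_list(FakeSim("control"))
stored.extend(as_list([FakeSim("exp1"), FakeSim("exp2")]))
names = [d.name for d in stored]
```

Under these assumptions, a single ensemble and a list of ensembles end up in the same flat list, which is what the documented list[SimulationData] type suggests.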

time_series_plot(var_name, data_names=None, output_path=None, obs=False, obs_paths=None, obs_names=None, time_freq='annual', start_year=None, end_year=None, plot_ens=False)[source]

Generate time series plot for the given datasets and variable for the selected period and time frequency.

Parameters:
  • var_name (str) – Climate variable name.

  • data_names (str or list[str], optional) – Name or list of names of simulation ensembles to plot. If None, all datasets in the DataDiagnostics object are used.

  • output_path (str, optional) – Path to save the time series plot.

  • obs (bool) – If True, also plot observational data if available (default: False).

  • obs_paths (str or list[str], optional) – Path(s) to the observations database(s).

  • obs_names (str or list[str], optional) – Name(s) of the observational dataset(s).

  • time_freq (str) – Resampling frequency (default: ‘annual’).

  • start_year (int) – Start year to plot.

  • end_year (int) – End year to plot.

  • plot_ens (bool) – Whether to plot the trajectories of individual ensemble members (default: False).
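
To make the 'annual' time frequency and the start_year/end_year selection concrete, the sketch below averages samples per calendar year for each ensemble member and then averages across members. It is a plain-Python illustration under stated assumptions: the sample data and the function annual_means are hypothetical, the averages are unweighted, and the reference does not say how pyhanami resamples or weights the data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples per ensemble member: {member: [(year, value), ...]}
members = {
    "r1": [(2000, 1.0), (2000, 3.0), (2001, 2.0), (2001, 4.0)],
    "r2": [(2000, 2.0), (2000, 4.0), (2001, 3.0), (2001, 5.0)],
}


def annual_means(samples, start_year=None, end_year=None):
    """Average all samples that fall in each calendar year of the period."""
    by_year = defaultdict(list)
    for year, value in samples:
        if start_year is not None and year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# One annual series per member, then the mean across members for each year.
per_member = {name: annual_means(s) for name, s in members.items()}
years = sorted(per_member["r1"])
ensemble_mean = {y: mean(m[y] for m in per_member.values()) for y in years}
```

The per_member dictionary corresponds to the individual trajectories that plot_ens=True would presumably draw, and ensemble_mean to a single summary line per ensemble.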

abs_diff_plot(var_name, data_names=None, output_path=None, start_year=None, end_year=None, clon=0)[source]

Generate absolute difference plot for the given datasets and variable.

Parameters:
  • var_name (str) – Climate variable name.

  • data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the diagnostics object are used.

  • output_path (str, optional) – Path to save the spatial plots.

  • start_year (int) – Start year to plot.

  • end_year (int) – End year to plot.

  • clon (int) – Central longitude for the spatial map (default: 0).

eff_size_plot(var_name, data_names=None, output_path=None, start_year=None, end_year=None, clon=0, alpha=0.05, stat=scipy.stats.ttest_ind)[source]

Generate effect size plot for the given datasets and variable marking grid points with statistically significant differences.

Parameters:
  • var_name (str) – Climate variable name.

  • data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the diagnostics object are used.

  • output_path (str, optional) – Path to save the spatial plots.

  • start_year (int) – Start year to plot.

  • end_year (int) – End year to plot.

  • clon (int) – Central longitude for the spatial map (default: 0).

  • alpha (float) – Significance level for the statistical test (default: 0.05).

  • stat (Callable) – Statistical test function to use for significance testing (default: ttest_ind).
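
This reference does not state which effect-size definition eff_size_plot uses. As an illustration only, the sketch below computes Cohen's d, one common standardized mean difference, for two ensembles at a single grid point. The function cohens_d and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical values of one variable at one grid point, one value per member.
ens_a = [14.1, 14.3, 13.9, 14.2, 14.0]
ens_b = [14.6, 14.8, 14.5, 14.9, 14.7]

# Negative d means the first ensemble is lower than the second at this point.
d = cohens_d(ens_a, ens_b)
```

With the documented default stat=scipy.stats.ttest_ind, the test takes the two samples and returns a result carrying a p-value; a grid point would presumably be marked as significant when that p-value falls below alpha.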

bias_plot(var_name, data_name=None, output_path=None, obs_path=None, obs_name=None, start_year=None, end_year=None, clon=0)[source]

Generate bias plot for the given dataset and variable comparing with observations.

Parameters:
  • var_name (str) – Climate variable name.

  • data_name (str, optional) – Name of the simulation ensemble to plot. If None, the first dataset in the DataDiagnostics object is used.

  • output_path (str, optional) – Path to save the spatial plot.

  • obs_path (str) – Path to the observations database.

  • obs_name (str) – Name of the observational dataset.

  • start_year (int) – Start year to plot.

  • end_year (int) – End year to plot.

  • clon (int) – Central longitude for the spatial map (default: 0).

class pyhanami.ObservationData(data_path, sim, name='obs', realization=0, regrid_method='bilinear')[source]

Bases: object

Retrieves and processes observational datasets for evaluation of simulations.

This class interfaces with an external observational data source to retrieve datasets that match the variables and time period of a given simulation dataset. Retrieved data are then regridded to match the spatial resolution of the input simulation data.

Parameters:
  • data_path (str) – Path to an observations database.

  • sim (xr.Dataset) – Input simulation dataset.

  • name (str) – Name of the observations instance (default: obs).

  • realization (int) – Realization number to select from the observations dataset if more than one member is present (default: 0).

  • regrid_method (str) – Regridding method (default: bilinear).

data_path

Path to the observations database.

Type:

Path

data

Processed observational data, regridded to match the input simulation.

Type:

xr.Dataset

name

Name of the observations instance (default: obs).

Type:

str

realization

Realization number to select from the observations dataset if more than one member is present (default: 0).

Type:

int

regrid_method

Regridding method (default: bilinear).

Type:

str

load_and_process(sim)[source]

Retrieve and regrid observational data for the variables and period available in the given simulation ensemble.

Parameters:

sim (xr.Dataset) – Input simulation dataset.

Returns:

data_new_grid – Regridded observational dataset matching the input simulation.

Return type:

xr.Dataset
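
The default regrid_method is bilinear. The sketch below shows the idea on a tiny regular grid in plain Python: each target point is a weighted average of the four surrounding source points. It is not pyhanami's implementation, which this reference does not describe; the function bilinear and the grids are hypothetical, and longitude wrap-around and spherical geometry are ignored.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field[i][j] (on ascending lats/lons) to one target point."""
    # Index of the grid cell that contains the target point.
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    # Fractional position inside the cell: 0 at the lower edge, 1 at the upper.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Weighted average of the four surrounding corners.
    return (
        field[i][j] * (1 - ty) * (1 - tx)
        + field[i][j + 1] * (1 - ty) * tx
        + field[i + 1][j] * ty * (1 - tx)
        + field[i + 1][j + 1] * ty * tx
    )


# Hypothetical coarse observational grid (2 x 2 points).
obs_lats = [0.0, 10.0]
obs_lons = [0.0, 10.0]
obs_field = [[1.0, 2.0], [3.0, 4.0]]

# Hypothetical finer simulation grid to interpolate onto.
sim_lats = [0.0, 5.0, 10.0]
sim_lons = [0.0, 5.0, 10.0]
regridded = [
    [bilinear(obs_lats, obs_lons, obs_field, la, lo) for lo in sim_lons]
    for la in sim_lats
]
```

The regridded field has one value per simulation grid point, so observations and simulation can be compared point by point.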

class pyhanami.ReplicabilityTest(datasets=None, obs_path=None, alpha=0.05)[source]

Bases: object

Perform replicability test between two climate simulation ensembles.

This class compares two climate simulation ensembles using a variety of metrics and statistical tests to assess whether both climates are statistically significantly different. The test is conducted over multiple variables, regions, seasons, and ensemble members. It also supports plotting results and generating summary reports.

Parameters:
  • datasets (Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.

  • obs_path (str) – Path to the observations database.

  • alpha (float) – Significance level for the statistical tests (default: 0.05).

datasets

List of ensembles containing simulation data and metadata.

Type:

list[SimulationData]

obs_path

Path to the observations database.

Type:

str

obs

Instance containing observational data for comparison.

Type:

ObservationData

variables

Configuration dictionary mapping variable names to display metadata.

Type:

dict

alpha

Significance level for the statistical tests.

Type:

float

max_workers_grid

Number of parallel workers used for variable-wise computations.

Type:

int

metrics

List of metrics with names and corresponding functions to compute scores.

Type:

list of dict

tests

Dictionary of statistical tests for comparing score distributions.

Type:

dict

seasons

List of seasons to compute scores over.

Type:

list of str

regions

Dictionary mapping region names to latitude bounds.

Type:

dict

eff_sizes

Dictionary to store effect sizes between the replicability test scores for each pair of datasets.

Type:

dict

test_results

Dictionary to store results of the replicability test for each pair of datasets.

Type:

dict

add_datasets(datasets)[source]

Add new datasets to the ReplicabilityTest object.

Parameters:

datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.

perform_rep_test(data_names=None)[source]

Perform replicability test comparing the given simulation ensembles.

Parameters:

data_names (list[str], optional) – List of names of two simulation ensembles to compare. If None, the first two datasets in the ReplicabilityTest object are used.

get_eff_sizes(data_names)[source]

Return precomputed effect sizes between the replicability test scores for the given simulation ensembles.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.

Returns:

eff_sizes – Effect sizes for all variables, seasons, regions and metrics.

Return type:

xr.DataArray

get_test_results(data_names)[source]

Return replicability test results for the given simulation ensembles.

Parameters:

data_names (list[str]) – List of names of two simulation ensembles to compare.

Returns:

test_results – Results of the replicability test for all variables, seasons, regions and tests.

Return type:

xr.DataArray

save_data(data_names, output_path)[source]

Save the computed effect sizes between the replicability test scores and the test results to NetCDF files.

Parameters:
  • data_names (list[str]) – List of names of two simulation ensembles to compare.

  • output_path (str) – Path to save the data files.

matrix_plot(data_names, output_path=None)[source]

Generate matrix plot with effect sizes and replicability test results.

Parameters:
  • data_names (list[str]) – List of names of two simulation ensembles to compare.

  • output_path (str, optional) – Path to save the matrix plot.

report(output_path, time_series=False, spatial=False)[source]

Generate a summary report with the results of the replicability test and the selected plots.

Parameters:
  • output_path (str) – Path to save the report.

  • time_series (bool) – Whether to include time series plots in the report (default: False).

  • spatial (bool) – Whether to include spatial plots in the report (default: False).

class pyhanami.ScientificEvaluation(datasets=None)[source]

Bases: object

Compute and plot scores for scientific model skill evaluation.

This class provides functionality for computing and visualizing metrics that evaluate how well a model reproduces several phenomena. Currently, it includes methods for general skill scores, bimodal ISO indices, MJO indices and power spectra, and tropical cyclone metrics.

Parameters:

datasets (SimulationData or Iterable[SimulationData], optional) – Ensemble or list of ensembles containing simulation data and metadata.

datasets

List of ensembles containing simulation data and metadata.

Type:

list[SimulationData]

variables

Configuration dictionary mapping variable names to display metadata.

Type:

dict

add_datasets(datasets)[source]

Add new datasets to the ScientificEvaluation object.

Parameters:

datasets (SimulationData or Iterable[SimulationData]) – Ensemble or list of ensembles containing simulation data and metadata to add.

compute_general_scores(var_names=None, data_name=None, obs_name=None, obs_path=None, start_year=None, end_year=None)[source]

Initialize and compute general model skill evaluation scores for a selected dataset.

Parameters:
  • var_names (str or list[str], optional) – Climate variable(s) name(s). If None, all variables in the simulated dataset will be used.

  • data_name (str, optional) – Name of simulation ensemble to use. If None, the first dataset in the ScientificEvaluation object is used.

  • obs_name (str) – Name of the observational dataset to compare to (default: config_params.GEN_OBS_NAME).

  • obs_path (str) – Path to the observations database (default: config_params.GEN_OBS_PATH).

  • start_year (int) – Initial year of the period to compute the general scores for.

  • end_year (int) – End year of the period to compute the general scores for.

Returns:

general_analysis – GeneralEvaluation object containing the computed general scientific skill scalar scores.

Return type:

GeneralEvaluation

compute_iso_scores(data_name=None, start_year_eeof=None, end_year_eeof=None, start_year_pc=None, end_year_pc=None, obs=False, obs_path=None, correct_pc=False, iso_config=None)[source]

Initialize and compute bimodal ISO indices (following (K. Kikuchi, 2020)) and derive scalar scores (following (M. Nakano et al., 2019)) for a selected dataset.

Parameters:
  • data_name (str, optional) – Name of simulation ensemble to use. If None, the first dataset in the ScientificEvaluation object is used.

  • start_year_eeof (int) – Initial year of the period for the Extended Empirical Orthogonal Function (EEOF) analysis (not needed if obs=True).

  • end_year_eeof (int) – End year of the period for the EEOF analysis (not needed if obs=True).

  • start_year_pc (int) – Initial year of the period to compute Principal Components (PCs) for.

  • end_year_pc (int) – End year of the period to compute PCs for.

  • obs (bool) – If True, use EEOFs from observational data (default: False).

  • obs_path (str) – Path to the observational NOAA data file. As of now, only necessary if the resolution of the NOAA data (2.5°x2.5°) is higher than that of the simulation data.

  • correct_pc (bool) – Whether to adjust simulated PCs by dividing by alpha (default: False).

  • iso_config (ISOConfig) – Configuration dataclass with parameters necessary for the ISO evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.

Returns:

iso_analysis – ISOEvaluation object containing the computed bimodal ISO indices and related scalar scores.

Return type:

ISOEvaluation

compute_mjo_scores(data_name=None, obs_path=None, start_year_mjo=None, end_year_mjo=None, start_year_ref=None, end_year_ref=None, threshold_active_days=None, mjo_config=None, mjo_vars=['ua850', 'ua200', 'rlut'])[source]

Initialize and compute Real-Time Multivariate MJO (RMM) indices following (M.C. Wheeler & H.H. Hendon, 2004) and MJO wavenumber-frequency power spectra following (M.C. Wheeler & G.N. Kiladis, 1999) and derived scalar scores following (M.-S. Ahn et al., 2017) for a selected dataset.

Parameters:
  • data_name (str, optional) – Name of simulation ensemble to use. If None, the first dataset in the ScientificEvaluation object is used.

  • obs_path (str) – Path to the observational data file with the necessary variables for the MJO analysis.

  • start_year_mjo (int) – Initial year of the period to perform the analysis for.

  • end_year_mjo (int) – End year of the period to perform the analysis for.

  • start_year_ref (int) – Initial year for computing the reference seasonal cycle. If None, the initial year of the whole MJO analysis is used.

  • end_year_ref (int) – End year for computing the reference seasonal cycle. If None, the end year of the whole MJO analysis is used.

  • threshold_active_days (float) – Threshold for the amplitude of the first two PCs to consider the MJO active at a given day. If None, the mean MJO amplitude over the entire period is used as a threshold.

  • mjo_config (MJOConfig) – Configuration dataclass with parameters necessary for the MJO evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.

  • mjo_vars (list[str]) – Variables to be used for the MJO analysis (default: [‘ua850’, ‘ua200’, ‘rlut’]).

Returns:

mjo_analysis – MJOEvaluation object containing the computed RMM MJO indices, power spectra and scalar scores.

Return type:

MJOEvaluation
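
The threshold_active_days parameter acts on the amplitude of the first two PCs, which for RMM indices is the Euclidean norm of (PC1, PC2). The sketch below flags active days and applies the documented fallback of using the mean amplitude when no threshold is given. The function active_days and the sample values are hypothetical, and the strict "greater than" comparison is an assumption.

```python
from math import hypot
from statistics import mean

# Hypothetical daily values of the first two principal components.
pc1 = [0.2, 1.5, -1.2, 0.1, 2.0]
pc2 = [0.1, 0.5, 1.6, -0.3, 0.0]


def active_days(pc1, pc2, threshold=None):
    """Return the amplitude series and a per-day flag for an active MJO."""
    amplitude = [hypot(a, b) for a, b in zip(pc1, pc2)]
    if threshold is None:
        # Documented fallback: mean amplitude over the entire period.
        threshold = mean(amplitude)
    return amplitude, [amp > threshold for amp in amplitude]


# Explicit threshold, then the fallback based on the mean amplitude.
amplitude, active = active_days(pc1, pc2, threshold=1.0)
_, active_default = active_days(pc1, pc2)
```

For this sample both choices flag the same three days, but on real data the mean-amplitude fallback adapts to each dataset while a fixed threshold does not.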

compute_tc_scores(data_name=None, start_year_tc=None, end_year_tc=None, obs=True, wind_factor=1.0, min_wind=10, basin=-1, bin_size=2.5, tc_config=None)[source]

Compute Tropical Cyclones (TCs) metrics and derive scalar scores following (C.M. Zarzycki et al., 2021) and plot results.

Parameters:
  • data_name (str, optional) – Name of simulation ensemble to use. If None, the first dataset in the ScientificEvaluation object is used.

  • start_year_tc (int, optional) – Initial year of the period to compute the TC metrics for.

  • end_year_tc (int, optional) – End year of the period to compute the TC metrics for.

  • obs (bool) – If True, include observational data if available (default: True).

  • wind_factor (float) – Wind speed correction factor (to normalize the provided wind to 10 m wind) for simulations (default: 1.0).

  • min_wind (float) – Minimum 10 m wind speed in m/s for TCs detection (default: 10.0).

  • basin (int) –

    Basin/hemisphere to consider for the analysis (default: -1). Codes are:
    • <0 → GLOB (Global domain)

    • 1 → NATL (North Atlantic)

    • 2 → EPAC (Eastern Pacific)

    • 3 → CPAC (Central Pacific)

    • 4 → WPAC (Western Pacific)

    • 5 → NIO (North Indian Ocean)

    • 6 → SIO (South Indian Ocean)

    • 7 → SPAC (South Pacific)

    • 8 → SATL (South Atlantic)

    • 9 → FLA (Florida)

    • 20 → NHEMI (Northern Hemisphere)

    • 21 → SHEMI (Southern Hemisphere)

    • otherwise → NONE (unrecognized)

  • bin_size (float) – Size of the bins in degrees for computing the TCs metrics with CyMeP (default: 2.5).

  • tc_config (TCConfig) – Configuration dataclass with parameters necessary for the TC evaluation. If None, default values from the configuration file pyhanami.config.scientific_evaluation_parameters.yaml will be used.

Returns:

tc_analysis – TCEvaluation object containing the computed TCs metrics and scalar scores.

Return type:

TCEvaluation
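
The basin codes listed for the basin parameter can be written as a small lookup table, which is handy for labelling results. The sketch below only restates that list; the names BASIN_NAMES and basin_name are hypothetical and are not part of pyhanami.

```python
# Basin codes as listed in the description of the basin parameter.
BASIN_NAMES = {
    1: "NATL",    # North Atlantic
    2: "EPAC",    # Eastern Pacific
    3: "CPAC",    # Central Pacific
    4: "WPAC",    # Western Pacific
    5: "NIO",     # North Indian Ocean
    6: "SIO",     # South Indian Ocean
    7: "SPAC",    # South Pacific
    8: "SATL",    # South Atlantic
    9: "FLA",     # Florida
    20: "NHEMI",  # Northern Hemisphere
    21: "SHEMI",  # Southern Hemisphere
}


def basin_name(code):
    """Translate an integer basin code into its short name."""
    if code < 0:
        return "GLOB"  # any negative code selects the global domain
    return BASIN_NAMES.get(code, "NONE")  # unlisted codes are unrecognized


default_basin = basin_name(-1)
```

Note that, following the list, a code of 0 is neither negative nor listed, so it maps to NONE rather than to the global domain.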

class pyhanami.SimulationData(data_source, name='sim')[source]

Bases: object

Loads and processes climate simulation data from a NetCDF file or a catalogue interface.

This class provides functionality to read input data, perform validation, and store metadata such as the simulation name and file path.

Parameters:
  • data_source (str or Path or xr.Dataset) – Path to a dataset file or catalogue interface, or an already loaded xarray.Dataset object.

  • name (str) – Name of the simulation instance (default: ‘sim’).

data_path

Path to the dataset file or catalogue interface if provided; None if dataset was passed directly.

Type:

Path or None

name

Name of the simulation instance.

Type:

str

data

Loaded dataset object with climate variables.

Type:

xr.Dataset

check_data()[source]

Check and correct the provided data (available variables, units, coordinate names and format, …).

Returns:

data_sim – Checked simulation data.

Return type:

xr.Dataset

class pyhanami.ISOConfig(lat_range: tuple = (-30, 30), window_size: int = 141, low_freq: float = 0.011111111111111112, high_freq: float = 0.04, lag: int = 5, n_lags: int = 3, n_modes: int = 2)

Bases: object

high_freq: float = 0.04
lag: int = 5
lat_range: tuple = (-30, 30)
low_freq: float = 0.011111111111111112
n_lags: int = 3
n_modes: int = 2
window_size: int = 141
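
The default low_freq and high_freq are easier to read as periods. Assuming they are frequencies in cycles per day (this reference does not state the unit), the sketch below converts them; the variable names are illustrative only.

```python
# Default band limits copied from the ISOConfig signature.
low_freq = 0.011111111111111112
high_freq = 0.04

# Assumption: frequencies are in cycles per day, so the period is 1 / frequency.
longest_period = 1.0 / low_freq
shortest_period = 1.0 / high_freq

# Rounded band limits in days: (shortest, longest).
band = (round(shortest_period), round(longest_period))
```

Under that assumption the defaults describe a band of roughly 25 to 90 days.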

class pyhanami.MJOConfig(lat_range: tuple = (-15, 15), rolling_window_size: int = 120, n_harmonics: int = 3, normalize_std: bool = False, n_modes: int = 2, seg_size: int = 96, n_overlap: int = 60, mjo_freq_bounds: tuple = (0.0125, 0.03333333333333333), mjo_wavenum_bounds: tuple = (0, 4))

Bases: object

lat_range: tuple = (-15, 15)
mjo_freq_bounds: tuple = (0.0125, 0.03333333333333333)
mjo_wavenum_bounds: tuple = (0, 4)
n_harmonics: int = 3
n_modes: int = 2
n_overlap: int = 60
normalize_std: bool = False
rolling_window_size: int = 120
seg_size: int = 96
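
The seg_size and n_overlap defaults suggest that the time axis is cut into overlapping segments before the wavenumber-frequency spectrum is computed. Assuming both values are counted in time steps (the reference does not say so), the sketch below lists where each segment would start. The function segment_starts is hypothetical and is not part of pyhanami.

```python
def segment_starts(n_samples, seg_size=96, n_overlap=60):
    """Start indices of fixed-length segments that overlap by n_overlap samples."""
    step = seg_size - n_overlap
    if step <= 0:
        raise ValueError("n_overlap must be smaller than seg_size")
    # Only keep segments that fit entirely inside the record.
    return list(range(0, n_samples - seg_size + 1, step))


# Hypothetical record of 360 daily samples with the documented defaults.
starts = segment_starts(360)
```

With the defaults, consecutive segments start 36 steps apart, so a longer overlap yields more segments from the same record at the cost of less independent spectra.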

class pyhanami.TCConfig(psl_delta: float = 200.0, psl_dist: float = 5.5, z_delta: float = -6.0, z_dist: float = 6.5, z_offset: float = 1.0, merge_dist: float = 6.0, traj_range: float = 8.0, traj_min_length: int = 10, traj_max_gap: int = 3, min_len: int = 10, max_lat: float = 50.0, max_topo: float = 1500.0, truncate_years: bool = True, do_defineMIbypres: bool = False, do_fill_missing_pw: bool = True, do_special_filter_obs: bool = False, threshold_ace_wind: float = -1.0, threshold_pace_pres: float = -100.0)

Bases: object

do_defineMIbypres: bool = False
do_fill_missing_pw: bool = True
do_special_filter_obs: bool = False
max_lat: float = 50.0
max_topo: float = 1500.0
merge_dist: float = 6.0
min_len: int = 10
psl_delta: float = 200.0
psl_dist: float = 5.5
threshold_ace_wind: float = -1.0
threshold_pace_pres: float = -100.0
traj_max_gap: int = 3
traj_min_length: int = 10
traj_range: float = 8.0
truncate_years: bool = True
z_delta: float = -6.0
z_dist: float = 6.5
z_offset: float = 1.0