diag_manager/diag_yaml_format.md

## Diag Table Yaml Format:

The purpose of this document is to explain the diag_table yaml format.

## Contents
- [1. Converting from legacy ascii diag_table format](diag_yaml_format.md#1-converting-from-legacy-ascii-diag_table-format)
- [2. Diag table yaml sections](diag_yaml_format.md#2-diag-table-yaml-sections)
- [2.1 Global Section](diag_yaml_format.md#21-global-section)
- [2.2 File Section](diag_yaml_format.md#22-file-section)
- [2.2.1 Flexible output timings](diag_yaml_format.md#221-flexible-output-timings)
- [2.2.2 Coupled Model Diag Files](diag_yaml_format.md#222-coupled-model-diag-files)
- [2.3 Variable Section](diag_yaml_format.md#23-variable-section)
- [2.4 Variable Metadata Section](diag_yaml_format.md#24-variable-metadata-section)
- [2.5 Global Meta Data Section](diag_yaml_format.md#25-global-meta-data-section)
- [2.6 Sub_region Section](diag_yaml_format.md#26-sub_region-section)
- [3. More examples](diag_yaml_format.md#3-more-examples)
- [4. Schema](diag_yaml_format.md#4-schema)
- [5. Ensemble and Nest Support](diag_yaml_format.md#5-ensemble-and-nest-support)

### 1. Converting from legacy ascii diag_table format

To convert the legacy ascii diag_table format to this yaml format, use the python script [**diag_table_to_yaml.py**](https://github.com/NOAA-GFDL/fms_yaml_tools/blob/aafc3293d45df2fc173d3c7afd8b8b0adc18fde4/fms_yaml_tools/diag_table/diag_table_to_yaml.py#L23-L26). To confirm that your diag_table.yaml was created correctly, use the python script [**is_valid_diag_table_yaml.py**](https://github.com/NOAA-GFDL/fms_yaml_tools/blob/aafc3293d45df2fc173d3c7afd8b8b0adc18fde4/fms_yaml_tools/diag_table/is_valid_diag_table_yaml.py#L24-L27).

### 2. Diag table yaml sections
The diag_table.yaml is organized by file. Each file has the required and optional key/value pairs for the file, an optional subsection defining any additional global metadata to add to the file, an optional subsection defining a subregion of the grid to output the data for, and a required subsection for all of the variables in the file. Each variable has the required and optional key/value pairs for the variable and an optional subsection defining any additional variable attributes to add to the file. The hierarchical structure looks like this:

```yaml
title:
base_date:
diag_files:
- file1
  - #key/value pairs for file1
  varlist:
  - var1
    - #key/value pairs for var1
    attributes:
    - #attributes for var1
  global_meta:
  - #global attributes for file1
  sub_region:
  - #subregion for file1
```

### 2.1 Global Section
The diag_yaml requires `title` and `base_date`.
- The **title** is a string that labels the diag yaml. The equivalent in the legacy diag_table would be the experiment. It is recommended that each diag_yaml have a separate title label that is descriptive of the experiment that is using it.
- The **base_date** is an array of 6 integers indicating the base date in the format [year month day hour minute second].

**Example:**

In the YAML format:
```yaml
title: ESM4_piControl
base_date: 2022 5 26 12 3 1
```

In the legacy ascii format:
```
ESM4_piControl
2022 5 26 12 3 1
```

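As a rough illustration of the `base_date` layout described above, the following Python sketch splits the six integers and builds a date from them. The helper name `parse_base_date` is hypothetical, and the sketch assumes a Gregorian calendar as implemented by Python's `datetime`; it is not how FMS itself reads the value.

```python
from datetime import datetime


def parse_base_date(value):
    """Parse a base_date value such as '2022 5 26 12 3 1' into a datetime.

    Hypothetical helper for illustration only; assumes a Gregorian calendar.
    """
    fields = [int(part) for part in str(value).split()]
    if len(fields) != 6:
        raise ValueError(f"base_date needs 6 integers, got {len(fields)}")
    year, month, day, hour, minute, second = fields
    return datetime(year, month, day, hour, minute, second)


base = parse_base_date("2022 5 26 12 3 1")
print(base.isoformat())
```
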
### 2.2 File Section
The files are listed under the diag_files section as a dashed array.

Below are the **required** keys needed to define each file.
- **file_name** is a string that defines the name of the file. Do not add ".nc" and "tileX" to the filename as this will be handled by FMS.
- **freq** defines the frequency, with its units, at which data will be written
  - The acceptable values for freq are:
    - =-1: output at the end of the run only
    - =0: output every timestep
    - \>0 units: output frequency and units (with a space between the frequency number and units, e.g. 24 hours)
  - Values of -1 or 0 do not require units.
  - The acceptable values for units are seconds, minutes, hours, days, months, years.
- **time_units** is a string that defines units for time. The acceptable values are seconds, minutes, hours, days, months, years.
- **unlimdim** is a string that defines the name of the unlimited dimension in the output netcdf file, usually "time".
- **varlist** is a subsection that lists all of the variables in the file

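The rules for `freq` listed above can be sketched as a small Python check. This is only an illustration of the documented rules; the function name `parse_freq` is hypothetical and the code is not taken from FMS or fms_yaml_tools.

```python
VALID_UNITS = {"seconds", "minutes", "hours", "days", "months", "years"}


def parse_freq(value):
    """Return (frequency, units) for a freq value; units is None for -1 and 0.

    Hypothetical helper that mirrors the rules listed in this section.
    """
    parts = str(value).split()
    if not parts or len(parts) > 2:
        raise ValueError(f"cannot parse freq: {value!r}")
    frequency = int(parts[0])
    units = parts[1] if len(parts) == 2 else None
    if frequency < -1:
        raise ValueError(f"freq must be -1, 0, or positive: {value!r}")
    if frequency in (-1, 0):
        # -1 (end of run) and 0 (every timestep) do not require units.
        return frequency, None
    if units not in VALID_UNITS:
        raise ValueError(f"freq needs one of {sorted(VALID_UNITS)}: {value!r}")
    return frequency, units


print(parse_freq("6 hours"))
print(parse_freq("-1 days"))
```
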
**Example:** The following creates a file with data written every 6 hours.

In the YAML format:
```yaml
diag_files:
- file_name: atmos_6hours
  freq: 6 hours
  time_units: hours
  unlimdim: time
  varlist:
  - varinfo
```

In the legacy ascii format:
```
"atmos_6hours",      6,  "hours", 1, "hours", "time"
```

**NOTE:** The fourth column (file_format) has been deprecated. Netcdf files will always be written.

Below are some *optional* keys that may be added.
- **write_file** is a logical that indicates if you want the file to be created (default is true). This is a new feature that is not supported by the legacy ascii diag_table.
- **new_file_freq** is a string that defines the frequency and the frequency units (with a space between the frequency number and units) for closing the existing file
- **start_time** is an array of 6 integers indicating when to start the file for the first time. It is in the format [year month day hour minute second]. Requires `new_file_freq`
- **filename_time** is the time used to set the name of new files when using new_file_freq. The acceptable values are begin (which will use the beginning of the file's time bounds), middle (which will use the middle of the file's time bounds), and end (which will use the end of the file's time bounds). The default is middle
- **reduction** is the reduction method that will be used for all the variables in the file. This is overridden if the reduction is specified at the variable level. The acceptable values are average, diurnalXX (where XX is the number of diurnal samples), powXX (where XX is the power level), min, max, none, rms, and sum.
- **kind** is a string that defines the type of variable as it will be written out in the file. This is overridden if the kind is specified at the variable level. Acceptable values are r4, r8, i4, and i8.
- **module** is a string that defines the module where the variable is registered in the model code. This is overridden if the module is specified at the variable level.

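The override rule for `reduction`, `kind`, and `module` (a value set on a variable wins over the value set on its file) can be sketched in Python as follows. The function name `effective_settings` and the dictionary layout are assumptions made for illustration; they do not describe the FMS internals.

```python
OVERRIDABLE_KEYS = ("reduction", "kind", "module")


def effective_settings(file_entry, var_entry):
    """Resolve reduction/kind/module for one variable.

    Hypothetical helper: a key set on the variable overrides the same key
    set on the file; keys set on neither are left out of the result.
    """
    resolved = {}
    for key in OVERRIDABLE_KEYS:
        if key in var_entry:
            resolved[key] = var_entry[key]
        elif key in file_entry:
            resolved[key] = file_entry[key]
    return resolved


file_entry = {"file_name": "atmos_6hours", "reduction": "average", "kind": "r4"}
var_entry = {"var_name": "precip", "module": "moist", "kind": "r8"}
print(effective_settings(file_entry, var_entry))
```

Here the variable keeps the file-level `reduction`, but its own `kind` replaces the file-level one.
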
**Example:** The following will create a new file every 6 hours starting at Jan 1 2020. Variable data will be written to the file every 6 hours.

In the YAML format:
```yaml
- file_name: ocn%4yr%2mo%2dy%2hr
  freq: 6 hours
  time_units: hours
  unlimdim: time
  new_file_freq: 6 hours
  start_time: 2020 1 1 0 0 0
```

In the legacy ascii format:
```
"ocn%4yr%2mo%2dy%2hr",      6,  "hours", 1, "hours", "time", 6, "hours", "2020 1 1 0 0 0"
```

Because this is using the default `filename_time` (middle), this example will create the files:
```
ocn_2020_01_01_03.nc for time_bnds [0,6]
ocn_2020_01_01_09.nc for time_bnds [6,12]
ocn_2020_01_01_15.nc for time_bnds [12,18]
ocn_2020_01_01_21.nc for time_bnds [18,24]
```

**NOTE:** When using new_file_freq, the file name must include something that distinguishes each file, such as the time wildcards (`%4yr%2mo%2dy%2hr`) used in the example above.

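To make the effect of `filename_time` concrete, the Python sketch below reproduces the file names from the example above. It assumes that each wildcard expands to an underscore followed by a zero-padded number, which is inferred from the file names shown in this section; the helper names are hypothetical and only hour-based `new_file_freq` values are handled.

```python
from datetime import datetime, timedelta


def expand_file_name(template, stamp):
    """Expand %4yr, %2mo, %2dy and %2hr in a file_name (assumed expansion)."""
    replacements = {
        "%4yr": f"_{stamp.year:04d}",
        "%2mo": f"_{stamp.month:02d}",
        "%2dy": f"_{stamp.day:02d}",
        "%2hr": f"_{stamp.hour:02d}",
    }
    for token, text in replacements.items():
        template = template.replace(token, text)
    return template + ".nc"


def file_names(template, start, new_file_freq_hours, count, filename_time="middle"):
    """Name `count` consecutive files, each covering new_file_freq_hours."""
    offsets = {"begin": 0.0, "middle": 0.5, "end": 1.0}
    step = timedelta(hours=new_file_freq_hours)
    names = []
    for index in range(count):
        window_start = start + index * step
        # Pick the begin, middle, or end of this file's time bounds.
        stamp = window_start + offsets[filename_time] * step
        names.append(expand_file_name(template, stamp))
    return names


for name in file_names("ocn%4yr%2mo%2dy%2hr", datetime(2020, 1, 1), 6, 4):
    print(name)
```

Passing `filename_time="begin"` to the same call would instead stamp each file with the start of its time bounds.
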
- **file_duration** is a string that defines how long the file should receive data after start_time, given as a number and its units (e.g. `12 hours`). This optional field can only be used if the start_time field is present. If this field is absent, then the file duration will be equal to the frequency for creating new files.
- **global_meta** is a subsection that lists any additional global metadata to add to the file. This is a new feature that is not supported by the legacy ascii diag_table.
- **sub_region** is a subsection that defines the four corners of a subregional section to capture.

### 2.2.1 Flexible output timings

In order to provide more flexibility in output timings, the diag_table yaml format allows for different file frequencies for the same file by allowing the `freq`, `new_file_freq`, and `file_duration` keys to accept a comma separated list.

For example,
```yaml
- file_name: flexible_timing%4yr%2mo%2dy%2hr
  freq: 1 hours, 1 hours, 1 hours
  time_units: hours
  unlimdim: time
  new_file_freq: 6 hours, 3 hours, 1 hours
  start_time: 2 1 1 0 0 0
  file_duration: 12 hours, 3 hours, 9 hours
  filename_time: begin
  varlist:
  - module: ocn_mod
    var_name: var1
    reduction: average
    kind: r4
```
This will create a file every 6 hours for 12 hours
```
flexible_timing_0002_01_01_00.nc - using hourly averaged data from hour 0 to hour 6
flexible_timing_0002_01_01_06.nc - using hourly averaged data from hour 6 to hour 12
```

Then it will create a file every 3 hours for 3 hours
```
flexible_timing_0002_01_01_12.nc - using hourly averaged data from hour 12 to hour 15
```

Then it will create a file every 1 hour for 9 hours.
```
flexible_timing_0002_01_01_15.nc - using data from hour 15 to hour 16
flexible_timing_0002_01_01_16.nc - using data from hour 16 to hour 17
flexible_timing_0002_01_01_17.nc - using data from hour 17 to hour 18
flexible_timing_0002_01_01_18.nc - using data from hour 18 to hour 19
flexible_timing_0002_01_01_19.nc - using data from hour 19 to hour 20
flexible_timing_0002_01_01_20.nc - using data from hour 20 to hour 21
flexible_timing_0002_01_01_21.nc - using data from hour 21 to hour 22
flexible_timing_0002_01_01_22.nc - using data from hour 22 to hour 23
flexible_timing_0002_01_01_23.nc - using data from hour 23 to hour 24
```

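The way the comma separated `new_file_freq` and `file_duration` lists pair up can be sketched in Python. This is a simplified illustration that assumes every entry is given in hours and that time is counted in hours from `start_time`; the function name `file_windows` is hypothetical and does not correspond to FMS code.

```python
def file_windows(new_file_freq, file_duration):
    """Return (start_hour, end_hour) for each file that would be created.

    Hypothetical helper: entry i of new_file_freq applies for the length of
    entry i of file_duration. All entries are assumed to be in hours.
    """
    freqs = [int(item.split()[0]) for item in new_file_freq.split(",")]
    durations = [int(item.split()[0]) for item in file_duration.split(",")]
    windows = []
    hour = 0
    for freq, duration in zip(freqs, durations):
        segment_end = hour + duration
        while hour < segment_end:
            windows.append((hour, min(hour + freq, segment_end)))
            hour += freq
    return windows


windows = file_windows("6 hours, 3 hours, 1 hours", "12 hours, 3 hours, 9 hours")
for start, end in windows:
    print(f"file covering hour {start} to hour {end}")
```

For the example above this yields two 6-hour files, one 3-hour file, and nine 1-hour files, matching the listing in this section.
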
### 2.2.2 Coupled Model Diag Files
In the *legacy ascii diag_table*, when running a coupled model (ATM + OCN) on separate PE lists:
  - The ATM PEs ignored the files in the diag_table that contain "OCEAN" in the filename
  - The OCN PEs ignored the files in the diag_table that did not contain "OCEAN" in the filename

In the *yaml diag_table*:
  - The ATM PEs will ignore the files in the diag_table.yaml that contain the key/value pair `is_ocean: true`
  - The OCN PEs will ignore the files in the diag_table.yaml that do not contain the key/value pair `is_ocean: true`

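The `is_ocean` rule for the yaml diag_table can be summarized with a short Python sketch. The function name `files_for_pes` and the list-of-dictionaries layout are assumptions for illustration only.

```python
def files_for_pes(diag_files, on_ocean_pes):
    """Select the file entries that a group of PEs would keep.

    Hypothetical helper: OCN PEs keep only entries with is_ocean: true,
    ATM PEs keep only entries without it.
    """
    return [
        entry
        for entry in diag_files
        if bool(entry.get("is_ocean", False)) == on_ocean_pes
    ]


diag_files = [
    {"file_name": "atmos_6hours"},
    {"file_name": "ocean_daily", "is_ocean": True},
]
print([entry["file_name"] for entry in files_for_pes(diag_files, on_ocean_pes=False)])
print([entry["file_name"] for entry in files_for_pes(diag_files, on_ocean_pes=True)])
```
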
### 2.3 Variable Section
The variables in each file are listed under the varlist section as a dashed array.

- **var_name:** is a string that defines the variable name as it is defined in the register_diag_field call in the model
- **reduction:** is a string that describes the data reduction method to perform prior to writing data to disk. Acceptable values are average, diurnalXX (where XX is the number of diurnal samples), powXX (where XX is the power level), min, max, none, rms, and sum.
- **module:** is a string that defines the module where the variable is registered in the model code
- **kind:** is a string that defines the type of variable as it will be written out in the file. Acceptable values are r4, r8, i4, and i8

**Example:**

In the YAML format:
```yaml
  varlist:
  - module: moist
    var_name: precip
    reduction: average
    kind: r4
```

In the legacy ascii format:
```
"moist",     "precip",                         "precip",           "atmos_8xdaily",   "all", .true.,  "none", 2
```
**NOTE:** The fifth column (time_sampling) has been deprecated. The reduction_method (`.true.`) has been replaced with `average`. The output name was not included in the yaml because it is the same as the var_name.

This corresponds to the following model code:
```F90
id_precip = register_diag_field ( 'moist', 'precip', axes, Time)
```
where:
- `moist` corresponds to the module key in the diag_table.yaml
- `precip` corresponds to the var_name key in the diag_table.yaml
- `axes` are the ids of the axes the variable is a function of
- `Time` is the model time

Below are some *optional* keys that may be added.
- **write_var:** is a logical that is set to false if the user doesn't want the variable to be written to the file (default: true).
- **out_name:** is a string that defines the name of the variable that will be written to the file (default same as var_name)
- **long_name:** is a string defining the long_name attribute of the variable. It overwrites the long_name in the variable's register_diag_field call
- **attributes:** is a subsection with any additional metadata to add to the variable in the netcdf file. This is a new feature that is not supported by the legacy ascii diag_table.
- **zbounds:** is a 2 member array of integers that define the bounds of the z axis (zmin, zmax). The default is no limits.

### 2.4 Variable Metadata Section
Any additional attributes for a variable can be listed under the attributes section as a dashed array. The key is the attribute name and the value is the attribute value.

**Example:**

```yaml
    attributes:
    - attribute_name: attribute_value
      attribute_name: attribute_value
```

Although this was not supported by the legacy ascii diag_table, with the legacy diag_manager, a call to `diag_field_add_attribute` could have been used to do the same thing.

```F90
call diag_field_add_attribute(diag_field_id, attribute_name, attribute_value)
```

### 2.5 Global Meta Data Section
Any additional global attributes for a file can be listed under the global_meta section as a dashed array. The key is the attribute name and the value is the attribute value.

```yaml
  global_meta:
  - attribute_name: attribute_value
    attribute_name: attribute_value
```

### 2.6 Sub_region Section
The sub region can be listed under the sub_region section as a dashed array. The legacy ascii diag_table only allowed regions to be defined using the latitude and longitude, and it only allowed rectangular sub regions. With the yaml diag_table, you can use indices to define the sub_region and you can define **any** four corner shape. Each file can only have 1 sub_region defined. These are the keys that can be used:
- **grid_type:** is a **required** string defining the method used to define the four sub_region corners. The acceptable values are "latlon" if using latitude/longitude or "index" if using the indices of the corners.
- **corner1:** is a **required** 2 member array of reals if using (grid_type="latlon") or integers if using (grid_type="index") defining the x and y points of the first corner of a sub_grid.
- **corner2:** is a **required** 2 member array of reals if using (grid_type="latlon") or integers if using (grid_type="index") defining the x and y points of the second corner of a sub_grid.
- **corner3:** is a **required** 2 member array of reals if using (grid_type="latlon") or integers if using (grid_type="index") defining the x and y points of the third corner of a sub_grid.
- **corner4:** is a **required** 2 member array of reals if using (grid_type="latlon") or integers if using (grid_type="index") defining the x and y points of the fourth corner of a sub_grid.
- **tile:** is an integer defining the tile number the sub_grid is on. It is **required** only if using (grid_type="index").

**Example:**

```yaml
  sub_region:
  - grid_type: latlon
    corner1: -80, 0
    corner2: -80, 75
    corner3: -60, 0
    corner4: -60, 75
```

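The requirements on the sub_region keys can be sketched as a small Python check. This is an illustration under stated assumptions: the function name `check_sub_region` is hypothetical, the corners are assumed to have already been split into lists of numbers, and the index-based grid type is assumed to be spelled `index`, as in the complete example in section 3.

```python
CORNER_KEYS = ("corner1", "corner2", "corner3", "corner4")


def check_sub_region(sub_region):
    """Return a list of problems found in a sub_region mapping (empty if none).

    Hypothetical helper; corners are assumed to be lists of numbers.
    """
    problems = []
    grid_type = sub_region.get("grid_type")
    if grid_type not in ("latlon", "index"):
        problems.append("grid_type must be latlon or index")
    for name in CORNER_KEYS:
        corner = sub_region.get(name)
        if corner is None:
            problems.append(f"{name} is required")
        elif len(corner) != 2:
            problems.append(f"{name} must have 2 members")
        elif grid_type == "index" and not all(isinstance(v, int) for v in corner):
            problems.append(f"{name} must hold integers when grid_type is index")
    if grid_type == "index" and "tile" not in sub_region:
        problems.append("tile is required when grid_type is index")
    return problems


region = {
    "grid_type": "latlon",
    "corner1": [-80, 0],
    "corner2": [-80, 75],
    "corner3": [-60, 0],
    "corner4": [-60, 75],
}
print(check_sub_region(region))
```
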
### 3. More examples
Below is a complete example of diag_table.yaml:
```yaml
title: test_diag_manager
base_date: 2 1 1 0 0 0
diag_files:
- file_name: wild_card_name%4yr%2mo%2dy%2hr
  freq: 6 hours
  time_units: hours
  unlimdim: time
  new_file_freq: 6 hours
  start_time: 2 1 1 0 0 0
  file_duration: 12 hours
  varlist:
  - module: test_diag_manager_mod
    var_name: sst
    reduction: average
    kind: r4
  global_meta:
  - is_a_file: true
- file_name: normal
  freq: 24 days
  time_units: hours
  unlimdim: records
  varlist:
  - module: test_diag_manager_mod
    var_name: sst
    reduction: average
    kind: r4
    write_var: true
    attributes:
    - do_sst: .true.
  sub_region:
  - grid_type: latlon
    corner1: -80, 0
    corner2: -80, 75
    corner3: -60, 0
    corner4: -60, 75
- file_name: normal2
  freq: -1 days
  time_units: hours
  unlimdim: records
  write_file: true
  varlist:
  - module: test_diag_manager_mod
    var_name: sstt
    reduction: average
    kind: r4
    long_name: S S T
  - module: test_diag_manager_mod
    var_name: sstt2
    reduction: average
    kind: r4
    write_var: false
  sub_region:
  - grid_type: index
    tile: 1
    corner1: 10, 15
    corner2: 20, 15
    corner3: 10, 25
    corner4: 20, 25
- file_name: normal3
  freq: -1 days
  time_units: hours
  unlimdim: records
  write_file: false
```

### 4. Schema
A formal specification of the file format, in the form of a JSON schema, can be
found in the [gfdl_msd_schemas](https://github.com/NOAA-GFDL/gfdl_msd_schemas)
repository on GitHub.

### 5. Ensemble and Nest Support
When using nests, it may be desirable for a nest to have a different file frequency or number of variables from the parent grid. This may allow users to save disk space and reduce simulation time. To support this, FMS allows each nest to have a different diag_table.yaml from the parent grid. For example, if running with 1 nest, FMS will use diag_table.yaml for the parent grid and diag_table.nest_01.yaml for the first nest. Similarly, each ensemble member can have its own diag_table (diag_table_ens_XX.yaml, where XX is the ensemble number). However, for the ensemble case, if both the diag_table.yaml and the diag_table_ens_* files are present, the code will crash as only 1 option is allowed.
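
The file naming described above can be sketched in Python. This is only an illustration: the helper names are hypothetical, the nest and ensemble numbers are assumed to be zero-padded to two digits (as in `nest_01`), and the error raised here merely stands in for the crash mentioned in the text.

```python
def nest_table_name(nest_number):
    """Name of the diag table for a nest (two-digit padding is an assumption)."""
    return f"diag_table.nest_{nest_number:02d}.yaml"


def ensemble_table_name(existing_files, ensemble_number):
    """Pick the diag table for an ensemble member from the files present.

    Hypothetical helper: having both diag_table.yaml and diag_table_ens_*
    files is treated as an error, since only 1 option is allowed.
    """
    ens_files = [name for name in existing_files if name.startswith("diag_table_ens_")]
    if ens_files and "diag_table.yaml" in existing_files:
        raise RuntimeError("both diag_table.yaml and diag_table_ens_* files are present")
    if ens_files:
        return f"diag_table_ens_{ensemble_number:02d}.yaml"
    return "diag_table.yaml"


print(nest_table_name(1))
print(ensemble_table_name(["diag_table_ens_01.yaml", "diag_table_ens_02.yaml"], 2))
```
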