Harnessing instrument simulators for model evaluation

In this 45 minute lecture, we will cover the basics of how to evaluate model simulations using instrument simulators. This course covers:

The basic principles of instrument simulators for models
Examining E3SM single column model data using xarray
Using the Earth Model Column Collaboratory to create simulated radar moments for evaluation
Loading ARM KAZR data at COMBLE for comparison against model simulated radar moments

Instrument simulators

Models do not typically output parameters that are directly comparable to atmospheric measurements. This complicates the evaluation of model performance using atmospheric observations, which are typically the only metric available for assessing the performance of such simulations. In order to assess how well a model simulation represents the target atmospheric phenomena – whether it be convection, cloud cover, or aerosols, instrument simulators provide a key bridge between the measurements and model simulations. They do this by simulating the instrument output from within the model simulation.

How do instrument simulators work?

This section, in particular, will focus on how instrument simulators for instruments designed to detect clouds and precipitation such as radars and lidars.

Bulk cloud microphysics parameterizationst will typically predict either one or two moments of the hydrometeor size distribution $N(D)$. The “classic” Morrison-Gettleman scheme will predict these two moments (usually number concentration $N$ and mass mixing ratio $Q$) for four hydrometeor categories: cloud liquid (cl), cloud ice (ci), precipitating liquid (pl), and precipitating ice (pi) [1]. However, $N$ and $Q$ are not measured by radars and lidars. Instead, radar and lidars first emit a beam of microwave (radar) or near-infrared (lidar) radiation. They then measure the in-phase and quadrature signals of the signal recieved by the transciever that recieves the backscattered signal from the hydrometers in the sample volume. These signals are then converted into Doppler spectra, then subsequently converted into momemts such as radar reflectivity factor $Z_{e}$ and radial velocity $v_{d}$.

The method by which we then have to create an apples-to-apples comparison of the model simulated cloud properties against those observed by the radar is to simulate the radar inside the model simulation. This is done by taking in the assumptions made by the model’s radiative transfer scheme with regards to forward scattering. For example, to calculate the reflectivity using Equation (1), we need to know the backscattering cross section of the hydrometeor $\beta_{hyd}$, the wavelength of the radar $\lambda$ and the dielectric constant of the scattering medium (ice or water) $K_{w}$. Then, once we know these, which are all inputs either assumed by the model or are a property of the instrument we are simulating, can calculate $Z_{e}$ as $$Z_{e, hyd} = \frac{\beta_{hyd} \lambda^{4}}{\pi^{5}K_{w}^2}$$

In this section of the course, we will utilize the Earth Column Model Collaboratory (EMC^2) to simulate the Ka-band Scanning Zenith Radar (KAZR) inside a DHARMA LES simulation. EMC^2 is an open source framework that will provide simulated radar moments from single column model (SCM) simulations, utilizing a sub-column generator to generate a distribution of realizations of the SCM column according to the model cloud fraction. This takes into account the potential uncertainty due to a vertically pointing radar only capturing a fraction of the SCM column. The simulated radar or lidar moments from the subcolumns can then be compared against the observed moments from the instrument. For more information on both the calculation of the simulated moments and sub-column generation, see [2].

In addition, EMC^2 has an LES mode where the model subcolumns are mapped directly to the x and y coordinates of the LES simulation. For this tutorial, we will focus on the LES simulations from DHARMA for a case observed during COMBLE.

References

Morrison, H., J. A. Milbrandt, G. Bryan, K. Ikeda, S. A. Tessendorf, and G. Thompson, 2015: Parameterization of cloud microphysics based on the prediction of bulk ice particle properties. Part II: Case study comparisons with observations and other schemes. J. Atmos. Sci., 72, 312–339, https://doi.org/10.1175/JAS-D-14-0066.1.
Silber, I., Jackson, R. C., Fridlind, A. M., Ackerman, A. S., Collis, S., Verlinde, J., and Ding, J.: The Earth Model Column Collaboratory (EMC2) v1.1: an open-source ground-based lidar and radar instrument simulator and subcolumn generator for large-scale models, Geosci. Model Dev., 15, 901–927, https://doi.org/10.5194/gmd-15-901-2022, 2022.

# For OLCF JupyterHub only
import os
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
import xarray as xr
import emc2
import numpy as np
import matplotlib.pyplot as plt
import radtraq

from dask.distributed import LocalCluster, Client
%matplotlib inline

Subsetting the data

Running EMC^2 over the entire simulation domain at once is computationally expensive. Therefore, we will subset the domain to include an along-wind track along the region of interest. In this case, we will select the region of interest to be where the highest precipitation ice mixing ratios are near the surface, indicating the heaviest precipitation bands in the simulation domain.

First, let us obtain the mean wind direction below 1 km.

u_mean = ds.u_wind.where(ds.z < 1000).mean()
v_mean = ds.v_wind.where(ds.z < 1000).mean()
v_mean

Then, we create a line whose slope is along the mean wind direction and goes through a point ($x_{0}$, $y_{0}$). We select the ($x_{0}$, $y_{0}$) = (-4 km, -3 km), since at this point in the domain the ice precipitation mixing ratio near the surface is the highest.

slope = v_mean/u_mean
x0 = -4400
y0 = -3000
y_line = slope * (ds.x - x0) + y0 
fig, ax = plt.subplots(1, 1, figsize=(4, 4))
ds["qpi"].T.isel(hgt=2).plot(ax=ax)
ax.plot(ds.x, y_line)

Earth Model Column Collaboratory

The Earth Model Column Collaboratory (EMC^2) is an open source Python instrument simulator that implements a Model class that contains the scattering characteristics of the hydrometeor species assumed by the model as well as the list of model variables that are needed to generate the simulated instrument parameters. In addition, EMC^2 has an Instrument class that stores the necessary characteristics for each instrument. Currently, EMC^2 only supports vertically pointing radars and lidars.

These two classes are then inputs to a subcolumn generator that will, in the case of general circulation model grid cells, simulate the sub-grid variability inside the grid cell. This allows for the comparison to account for uncertainties due to model sub-grid cell variability, since the instrument’s beamwidth is going to be much smaller than a GCM grid cell. In the case of Large Eddy Scale simulations, EMC^2 will disable this subcolumn generator if the user specifies only one subcolumn. Finally, the generated moments in each subcolumn are used to generate the simulated radar and lidar moments.

https://gmd.copernicus.org/articles/15/901/2022/gmd-15-901-2022-f01-web.png

Model sub classes:

E3SM (EAM)
CESM 2 (CAM6)
ModelE 3
WRF
DHARMA

Instrument sub-classes:

Radars:

C SAPR
X SACR
Ka SACR
KAZR
WACR
BASTA
NEXRAD

Lidars:

Raman
Micropulse lidar (MPL)
HSRL
Ceilometer
1064 nm (NIR) lidars
CALIOP (CALIPSO)

Loading into EMC^2

First, we will use EMC^2’s Model structure to load the DHARMA data.

dharma = emc2.core.model.DHARMA('/data/project/ARM_Summer_School_2024_Data/emc2/emc2_prep.nc', single_pi_class=False)

emc2_prep.nc is a regional output dataset; Stacking the time, lat, and lon dims for processing with EMC^2.

Using the subsetting code above, we will modify the .ds attribute containing the xarray dataset with the model simulation data. We will store only the along-wind subset of the data inside this attribute.

We are using a subset for this tutorial to save on computational resources so that the moment generation can complete within the tutorial time limit. We encourage the students to include larger subsets of the domain, building off of this code, to generate their simulated moments to increase the sample size for comparison against observations.

This will help reduce the statistical uncertainty in the comparison.

u_mean = dharma.ds.u_wind.where(dharma.ds.hgt < 1000).mean()
v_mean = dharma.ds.v_wind.where(dharma.ds.hgt < 1000).mean()
slope = v_mean/u_mean
x0 = -4000
y0 = -3000
y_line = slope * (dharma.ds.x - x0) + y0 
dharma.ds["xsection"] = ('time_lat_lon', xr.where(np.isclose(y_line, dharma.ds.y, atol=50), 1, 0))
dharma.ds = dharma.ds.where(dharma.ds["xsection"] == 1).dropna(dim="time_lat_lon")
dharma.ds["distance"] = np.sqrt(
    (dharma.ds["y"] - dharma.ds["y"].values[0])**2 + (dharma.ds["x"] - dharma.ds["x"].values[0]))
dharma.ds = dharma.ds.set_coords(["distance", "hgt"])

As you can see, we have successfully stored the along-wind subset of the data.

Generating simulated KAZR moments is as easy as these two lines of code. There are two options that EMC^2 supports for generating the simulated moments.

* The radiation approach uses lookup tables from the model's radiative transfer schame to calculate the scattering cross sections for quick generation of moments. However, this approach does not generate velocity or spectral width, just reflectivity. T
* The approach we will use here, the "microphysics approach," explictly integrates the PSD calculated from the model parameters to generate the simulated moments. This takes much more computational resources, but more accurately represents the model microphysics.

KAZR = emc2.core.instruments.KAZR('nsa')
subcols = emc2.simulator.main.make_simulated_data(dharma, KAZR, 1, use_rad_logic=False)

## Creating subcolumns...
No convective processing for DHARMA
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns frac fields to 1 for startiform cl and ci based on q > 0. kg/kg
Done! total processing time = 0.00s
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns frac fields to 1 for strat precip based on q > 0. kg/kg
Done! total processing time = 0.00s
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns q (and N micro logic) fields for cl equal to grid-cell mean
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns q (and N micro logic) fields for ci equal to grid-cell mean
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns q (and N micro logic) fields for pl equal to grid-cell mean
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns q (and N micro logic) fields for pi equal to grid-cell mean
num_subcolumns == 1 (subcolumn generator turned off); setting subcolumns q (and N micro logic) fields for pir equal to grid-cell mean
Generating radar moments...
Generating stratiform radar variables using microphysics logic (slowest processing)
Calculating moments for hydrometeor cl
Doing parallel radar calculations for cl
Processing column 100/107
Processing column 50/107
Processing column 0/107
Calculating moments for hydrometeor ci
Doing parallel radar calculation for ci
Processing column 100/107
Processing column 0/107
Calculating moments for hydrometeor pl
Doing parallel radar calculation for pl
Processing column 100/107
Processing column 0/107
Calculating moments for hydrometeor pi
Doing parallel radar calculation for pi
Processing column 100/107
Processing column 0/107
Calculating moments for hydrometeor pir
Doing parallel radar calculation for pir
Processing column 100/107
Processing column 0/107
Now calculating total spectral width (this may take some time)
Stratiform moment for class cl progress: 50/159
Stratiform moment for class cl progress: 0/159
Stratiform moment for class cl progress: 100/159
Stratiform moment for class progress: 100/107
Stratiform moment for class progress: 50/107
Stratiform moment for class progress: 0/107
Stratiform moment for class progress: 50/107
Stratiform moment for class progress: 100/107
Stratiform moment for class progress: 0/107
Stratiform moment for class progress: 50/107
Stratiform moment for class progress: 100/107
Stratiform moment for class progress: 0/107
Stratiform moment for class progress: 50/107
Stratiform moment for class progress: 0/107
Stratiform moment for class progress: 100/107
Done! total processing time = 54.84s

Plot Contour Frequency by Altitude Diagram

A common diagram for comparing the statistics from model simulations against observations is a Contour Frequency by Altitude Diagram (CFAD) [1]. The CFAD shows how the normalized frequency of values changes with height. For each height level, the frequency histogram is normalized by the number of data points available at each level. Therefore, this takes into account having an uneven number of observations at each height level.

We use a tool called Radar Tracking of Quality (RadTraQ), ARM’s open source toolkit for Quality Controlling ARM scanning radar data. This toolkit will automatically generate the CFAD values and plots for you with the radtraq.plotting.calc_cfad function.

References:

Yuter, S. E., and R. A. Houze , 1995: Three-Dimensional Kinematic and Microphysical Evolution of Florida Cumulonimbus. Part II: Frequency Distributions of Vertical Velocity, Reflectivity, and Differential Reflectivity. Mon. Wea. Rev., 123, 1941–1963, https://doi.org/10.1175/1520-0493(1995)123<1941:TDKAME>2.0.CO;2.

cfad_array = radtraq.plotting.calc_cfad(subcols.ds.sel(subcolumn=0), "sub_col_Ze_tot_strat",
                                        height_variable='hgt', xbins=np.linspace(-30., 30., 40))
cfad_array.plot(cmap='Spectral_r', cbar_kwargs=dict(label="Normalized Frequency"))
plt.xlim([-30, 30])
plt.xlabel('Reflectivity [dBZ]')
plt.ylabel('Height [m]')
plt.ylim([0, 5000])

(0.0, 5000.0)

../../_images/d03c54ed58bd1d7d4147b432efa115ea7b558b19045ae150322fb174fb1fbb46.png

Full domain comparison – dask requirement

In order to enable parallel processing in EMC^2 for quicker calclation of the full domain, you will need to run this line to enable the dask multiprocessing on the cluster before starting the above code. This creates a local dask cluster on the machine for parallel computation with three workers. For this example, you may need more workers to process the full domain.

from dask.distributed import LocalCluster, Client
client = Client(LocalCluster(n_workers=3))

In addition, another tip to process the full domain would be to subset the domain into 4-16 rectangular portions then concatenatethe results. To process only one subset of the domain, use the xarray .sel operator to do so in this fashion. This code will process the bottom left portion of the domain. Expanding this code and concatenating the results is left as an exercise for the student.

ds = xr.open_dataset('/data/project/ARM_Summer_School_2024_Data/emc2/emc2_prep.nc')
dharma = emc2.core.model.DHARMA('/data/project/ARM_Summer_School_2024_Data/emc2/emc2_prep.nc', single_pi_class=False)
dharma.ds = ds.sel(x=slice(-12500., 0), y=slice(-12500., 0.)) 
dharma.ds = dharma.ds.stack(time_lat_lon=["x", "y"])
KAZR = emc2.core.instruments.KAZR('nsa')
subcols_dharma = emc2.simulator.main.make_simulated_data(dharma, KAZR, 1, use_rad_logic=False)

Harnessing instrument simulators for model evaluation

Contents

Harnessing instrument simulators for model evaluation

Instrument simulators

How do instrument simulators work?

Preview DHARMA simulation

Subsetting the data

Earth Model Column Collaboratory

Loading into EMC^2

EMC^2 outputs

Plot Contour Frequency by Altitude Diagram

Plot KAZR observations

Second example – WRF output

Full domain comparison – dask requirement