Atmospheric Radiation Measurement (ARM) user facility
TRacking Aerosol Convection interactions ExpeRiment (TRACER)
Notebook for learning the basics of ACT with TRACER data
Corresponding Author: Adam Theisen (atheisen@anl.gov)
Overview#
The ARM TRACER campaign collected a wealth of interesting data in Houston, TX from October 1, 2021 to September 30, 2022. One event that stands out is a dust event that occurred from July 16 to July 19, 2022. This notebook gives an introduction to basic features in ACT, using one of the datastreams from this event.
Intro to ACT
Instrument Overview
Downloading and Reading in PSAP Data
Quality Controlling Data
Visualizing Data
Questions for the User to Explore
Prerequisites#
This notebook will rely heavily on Python and the Atmospheric data Community Toolkit (ACT). Don’t worry if you don’t have experience with either; this notebook will walk you through what you need to know.
You will also need an account and token to download data using the ARM Live webservice. Navigate to the webservice information page and log in to get your token. Your account username will be your ARM username.
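Rather than hard-coding your credentials into the notebook, you may prefer to read them from environment variables. A minimal sketch, assuming you have exported variables named `ARM_USERNAME` and `ARM_TOKEN` (these names are our own choice, not an ARM convention):

```python
import os

# Read ARM Live credentials from environment variables, falling back to
# placeholders if they are not set (variable names are arbitrary)
username = os.getenv('ARM_USERNAME', 'YourUserName')
token = os.getenv('ARM_TOKEN', 'YourToken')
```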
| Concepts | Importance | Notes |
| --- | --- | --- |
| Python | Helpful | |
Time to learn: 60 Minutes
System requirements:
Python 3.11 or later
ACT v2.0.0 or later
numpy
xarray
matplotlib
Intro to ACT#
The Atmospheric data Community Toolkit (ACT) is an open-source Python toolkit for exploring and analyzing atmospheric time-series datasets. Examples can be found in the ACT Example Gallery. The toolkit has modules for many different parts of the scientific process, including:
Data Discovery (act.discovery)#
The discovery module houses functions to download or access data from different groups. Currently it includes functions to get data from ARM, NOAA, EPA, NEON, and more!
Input/Output (act.io)#
io contains functions for reading and writing data from various sources and formats.
Visualization (act.plotting)#
plotting contains various routines, built on matplotlib, to help visualize and explore data, including time-series, distribution, geographic, and skew-T displays.
Corrections (act.corrections)#
corrections applies different corrections to data based on need. A majority of the existing corrections are for lidar data.
Quality Control (act.qc)#
The qc module has many functions for working with quality control information, applying new tests, or filtering data based on existing tests. We will explore some of that functionality in this notebook.
Retrievals (act.retrievals)#
There are many cases in which additional calculations are necessary to get more value from the instrument data. The retrievals module houses functions for performing these advanced calculations.
Utilities (act.utils)#
The utils module has many general utilities to help with the data. Some of these include adding a solar variable to indicate day/night (useful in filtering data), unit conversions, decoding WMO weather codes, performing weighted averaging, and more.
Instrument Overview#
Particle Soot Absorption Photometer (PSAP)#
The particle soot absorption photometer collects aerosol particles on a substrate (filter) and measures the change in light transmission relative to a reference filter. Bulk particle absorption is derived after correcting for scattering effects. Learn more
Single Particle Soot Photometer (SP2)#
The single-particle soot photometer (SP2) measures the soot (black carbon) mass of individual aerosol particles by laser-induced incandescence down to concentrations as low as ng/m^3. Learn more
Aerodynamic Particle Sizer (APS)#
The aerodynamic particle sizer (APS) is a particle size spectrometer that measures both the particle aerodynamic diameter based on particle time of flight and optical diameter based on scattered light intensity. The APS provides the number size distribution for particles with aerodynamic diameters from 0.5 to 20 micrometers and with optical diameters from 0.3 to 20 micrometers. Learn more
Doppler Lidar (DL)#
The Doppler lidar (DL) is an active remote-sensing instrument that provides range- and time-resolved measurements of the line-of-sight component of air velocity (i.e., radial velocity) and attenuated aerosol backscatter. The DL operates in the near-infrared and is sensitive to backscatter from atmospheric aerosols, which are assumed to be ideal tracers of atmospheric wind fields. Learn more
Aerosol Chemical Speciation Monitor (ACSM)#
The aerosol chemical speciation monitor is a thermal vaporization, electron impact, ionization mass spectrometer that measures bulk chemical composition of the rapidly evaporating component of sub-micron aerosol particles in real time. Standard measurements include mass concentrations of organics, sulfate, nitrate, ammonium, and chloride. Learn more
Scanning Mobility Particle Sizer (SMPS)#
The scanning mobility particle sizer (SMPS) is a particle size spectrometer that measures the aerosol number size distribution by sizing particles based on their electrical mobility diameter using a differential mobility analyzer (DMA) and by counting particles using a condensation particle counter (CPC). It measures aerosol concentration and aerosol particle size distribution. Learn more
Surface Meteorological Instrumentation (MET)#
The ARM Surface Meteorology Systems (MET) use mainly conventional in situ sensors to obtain 1-minute statistics of surface wind speed, wind direction, air temperature, relative humidity, barometric pressure, and rain rate. Learn more
Micropulse Lidar (MPL)#
The micropulse lidar (MPL) is a ground-based, optical, remote-sensing system designed primarily to determine the altitude of clouds; however, it is also used for detection of atmospheric aerosols. Learn more
Imports#
Let’s get started with some data! But first, we need to import some libraries.
import act
import numpy as np
import matplotlib.pyplot as plt
Downloading and Reading ARM’s NetCDF Data#
ARM’s standard file format is NetCDF (network Common Data Form) which makes it very easy to work with in Python! ARM data are available through a data portal called Data Discovery or through a webservice. If you didn’t get your username and token earlier, please go back and see the Prerequisites!
Let’s download some of the PSAP data first, starting with just one day.
# Set your username and token here!
username = 'YourUserName'
token = 'YourToken'
# Set the datastream and start/enddates
datastream = 'houaospsap3w1mM1.b1'
startdate = '2022-07-16'
enddate = '2022-07-16'
# Use ACT to easily download the data. Watch for the data citation! Show some support
# for ARM's instrument experts and cite their data if you use it in a publication
result = act.discovery.download_arm_data(username, token, datastream, startdate, enddate)
# Let's read in the data using ACT and check out the data
ds = act.io.read_arm_netcdf(result)
ds
# We're going to be focusing on the following variable, so let's get some more information about it
# We can do this by looking at its attributes
variable = 'Ba_B_Weiss'
ds[variable].attrs
# There's a lot of great functionality in ACT, but there's also a lot in the base xarray Dataset!
ds[variable].plot()
Quality Controlling Data#
ARM has multiple methods that it uses to communicate data quality information out to the users. One of these methods is through “embedded QC” variables. These are variables within the file that have information on automated tests that have been applied. Many times, they include Min, Max, and Delta tests but as is the case with the AOS instruments, there can be more complicated tests that are applied.
The results from all these different tests are stored in a single variable using bit-packed QC. We won’t get into the full details here, but it’s a way to communicate the results of multiple tests in a single integer value by utilizing binary and bits! You can learn more about bit-packed QC here but ACT also has many of the tools for working with ARM QC.
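As a quick illustration of how bit-packing works (a standalone sketch, not ACT's internal implementation): each test occupies one bit of the integer, so a QC value of 5 (binary 101) means tests 1 and 3 failed.

```python
# Decode a bit-packed QC integer into the list of failed test numbers.
# Test n corresponds to bit n-1, so qc_value = 5 = 0b101 -> tests 1 and 3.
def decode_qc(qc_value, n_tests=8):
    return [n for n in range(1, n_tests + 1) if qc_value & (1 << (n - 1))]

print(decode_qc(5))  # [1, 3]
print(decode_qc(0))  # [] -> no tests failed
```

ACT's qcfilter module handles this decoding for you; this is only to show what the integer values mean.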
Other Sources of Quality Control#
ARM also communicates problems with the data quality through Data Quality Reports (DQR). These reports are normally submitted by the instrument mentor when there’s been a problem with the instrument. The categories include:
Data Quality Report Categories
Missing: Data are not available or set to -9999
Suspect: The data are not definitively incorrect, but there are problems that increase the uncertainty of the values. Data should be used with caution.
Bad: The data are incorrect and should not be used.
Note: Data notes are a way to communicate information that would be useful to the end user but does not rise to the level of suspect or bad data.
Additionally, data quality information can be found in the Instrument Handbooks, which are included on most instrument pages. Here is an example of the PSAP handbook.
# We can see that there's some missing data in the plot above so let's take a look at the embedded QC!
# First, for many of the ACT QC features, we need to get the dataset more to CF standard and that
# involves cleaning up some of the attributes and ways that ARM has historically handled QC
ds.clean.cleanup()
# Next, let's take a look at visualizing the quality control information
# Create a plotting display object with 2 plots
display = act.plotting.TimeSeriesDisplay(ds, figsize=(15, 10), subplot_shape=(2,))
# Plot up the variable in the first plot
display.plot(variable, subplot_index=(0,))
# Plot up a day/night background
display.day_night_background(subplot_index=(0,))
# Plot up the QC variable in the second plot
display.qc_flag_block_plot(variable, subplot_index=(1,))
plt.show()
What do you observe?#
There are 5 tests being applied to the data. The main ones flagged are tests 1, 4, and 5. Tests 1 and 4 trip at the same time, and the description for test 4 shows that when it fails, the data are set to missing_value for us. That leaves test 5, which flags suspect data, so let's filter that data out as well and see what it looks like.
# Let's filter out test 5 using ACT. Yes, it's that simple!
ds.qcfilter.datafilter(variable, rm_tests=[5], del_qc_var=False)
# There are other ways we can filter data out as well. Using the
# rm_assessments will filter out by all Bad/Suspect tests that are failing
# ds.qcfilter.datafilter(variable, rm_assessments=['Bad', 'Suspect'], del_qc_var=False)
# Let's check out the attributes of the variable
# Whenever data are filtered out using the datafilter function
# a comment will be added to the variable history for provenance purposes
print(ds[variable].attrs['history'])
# And plot it all again!
# Create a plotting display object with 2 plots
display = act.plotting.TimeSeriesDisplay(ds, figsize=(15, 10), subplot_shape=(2,))
# Plot up the variable in the first plot
display.plot(variable, subplot_index=(0,))
# Plot up a day/night background
display.day_night_background(subplot_index=(0,))
# Plot up the QC variable in the second plot
display.qc_flag_block_plot(variable, subplot_index=(1,))
plt.show()
ARM Data Quality Reports (DQR)!#
ARM’s DQRs can be easily pulled in and added to the QC variables using ACT with the one-line command below. For this case there won’t be any DQRs on the data, but let’s visualize it just in case!
# Query the ARM DQR Webservice
ds = act.qc.add_dqr_to_qc(ds, variable=variable)
#And plot again!
# Create a plotting display object with 2 plots
display = act.plotting.TimeSeriesDisplay(ds, figsize=(15, 10), subplot_shape=(2,))
# Plot up the variable in the first plot
display.plot(variable, subplot_index=(0,))
# Plot up a day/night background
display.day_night_background(subplot_index=(0,))
# Plot up the QC variable in the second plot
display.qc_flag_block_plot(variable, subplot_index=(1,))
plt.show()
Visualizing Data#
We’ve already worked with visualizing the data in basic ways but what other options are there in ACT? This section will show you how to create a variety of different plots. More plotting examples can be found in ACT’s Documentation.
Distribution Display#
For the first example, we will go over some functions within ACT’s DistributionDisplay, such as the stacked bar plot, scatter plot, and group-by functionality.
# First, let's plot up a histogram of the data
# All the ACT plotting is very similar to what we
# did earlier, first we create a display object
display = act.plotting.DistributionDisplay(ds)
# And then we can plot the data! Note that we are passing a range into the
# histogram function to set the min/max range of the data
display.plot_stacked_bar(variable, hist_kwargs={'range': [0, 10]})
plt.show()
# We can create these plots in groups as well but we need to know
# how many there will be ahead of time for the shape
display = act.plotting.DistributionDisplay(ds, figsize=(12, 15), subplot_shape=(6, 4))
groupby = display.group_by('hour')
# And then we can plot the data in groups! The main issue is that it doesn't
# automatically annotate the group on the plot. We're also setting the title
# to blank to save space
groupby.plot_group('plot_stacked_bar', None, field=variable, set_title='', hist_kwargs={'range': [0, 10]})
# We want these graphs to have the same axes, so we can easily run through
# each plot and modify the axes. Right now, we can just hard code these in
for i in range(len(display.axes)):
    for j in range(len(display.axes[i])):
        display.axes[i, j].set_xlim([0, 10])
        display.axes[i, j].set_ylim([0, 50])
plt.show()
# Next up, let's do some scatter plots to compare some variables
# Scatter plots are also found in the DistributionDisplay module
display = act.plotting.DistributionDisplay(ds)
# And then we can plot the data!
display.plot_scatter(variable, 'transmittance_blue', m_field='time')
# You can adjust the x-range as you need
# display.set_xrng([0, 20])
plt.show()
# Sometimes these scatter plots hide the number of points there actually
# are in some areas so let's try a heatmap as well
display = act.plotting.DistributionDisplay(ds, figsize=(12, 5), subplot_shape=(1, 2))
# And then we can plot the data!
display.plot_scatter(variable, 'transmittance_blue', m_field='time', subplot_index=(0, 0))
# This can be used to adjust the axes limits
# display.set_xrng([0, 20], subplot_index=(0, 0))
# we can also pass in an array of values for the bins using np.arange(start, stop, step)
display.plot_heatmap(variable, 'transmittance_blue', subplot_index=(0, 1), x_bins=25, y_bins=25)
plt.show()
# Let's try one last plot type with this dataset
# Violin plots!
display = act.plotting.DistributionDisplay(ds)
# And then we can plot the data!
display.plot_violin(variable, positions=[1.0])
# And we can add more variables to it as well!
display.plot_violin('Ba_R_Weiss', positions=[2.0])
display.plot_violin('Ba_G_Weiss', positions=[3.0])
# Let's add some more information to the plots
# Update the tick information
display.axes[0].set_xticks([0.5, 1, 2, 3, 3.5])
display.axes[0].set_xticklabels(['',
'Blue Channel\nAbsorption',
'Red Channel\nAbsorption',
'Green Channel\nAbsorption',
'']
)
# Update the y-axis label
display.axes[0].set_ylabel('Aerosol Light Absorption Coefficient')
plt.show()
Pie Chart Display#
We can also do a quick visualization of aerosol data using the new pie chart display in ACT!
# Download the data as before
# Read an ARM AOS dataset
datastream = 'houaosacsmM1.b2'
startdate = '2022-07-16'
enddate = '2022-07-16'
result = act.discovery.download_arm_data(username, token, datastream, startdate, enddate)
# and read it in
ds_aos = act.io.read_arm_netcdf(result)
# Let us print out the fields in the dataset and see what it contains.
print(ds_aos.data_vars.keys())
# Knowing what fields the dataset contains, let's create a list of fields
# to use in the plot.
fields = ['sulfate', 'ammonium', 'nitrate', 'chloride']
# We also want to provide some keyword arguments to avoid invalid data such
# as negative values.
threshold = 0.0
fill_value = 0.0
# Create a DistributionDisplay object to compare fields
display = act.plotting.DistributionDisplay(ds_aos)
# We can set one of the slices to explode and give it a nice shadow.
explode = (0, 0.1, 0, 0)
shadow = True
# Create a pie chart using the fields list. The percentages of the
# fields will be calculated using act.utils.calculate_percentages.
display.plot_pie_chart(
fields,
threshold=threshold,
fill_value=fill_value,
explode=explode,
shadow=True,
)
plt.show()
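To make the thresholding explicit, here is a plain-Python sketch (not ACT's calculate_percentages implementation) of how values at or below the threshold can be replaced by the fill value before percentages are computed. The concentration values are hypothetical:

```python
# Hypothetical species concentrations; nitrate is negative and will be filled
conc = {'sulfate': 2.0, 'ammonium': 0.5, 'nitrate': -0.1, 'chloride': 0.6}
threshold, fill_value = 0.0, 0.0

# Replace invalid (<= threshold) values, then convert to percent of total
clean = {k: (v if v > threshold else fill_value) for k, v in conc.items()}
total = sum(clean.values())
percentages = {k: round(100 * v / total, 1) for k, v in clean.items()}
print(percentages)
```

Note that the invalid nitrate value contributes 0% rather than skewing the total, which is why the threshold and fill_value keywords matter for the pie chart.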
Questions for the User to Explore#
What does the data look like for the full month of July? Do we have to do more to properly visualize the data?
What do the scatter plots look like when plotted against the red channel absorption coefficient vs the transmittance?
Can you change the groupby plot to be for each day instead of by hour?
Next Steps#
The next notebook in this series will help users explore other datasets that can be used to analyze this dust event, including data from the instruments noted above.
Data Used in this Notebook#
Ermold, B., & Flynn, C. Particle Soot Absorption Photometer (AOSPSAP3W1M). Atmospheric Radiation Measurement (ARM) User Facility. https://doi.org/10.5439/1225037