
Dataset Construction

Authors
Affiliations
University of California Berkeley
British Antarctic Survey
University of California Berkeley
University of California Berkeley

This notebook demonstrates our software workflows for constructing tabular datasets of Antarctic AR events, including MERRA-2 streaming.

Setup

First, we’ll load up any packages and modules we might need.

# load external packages
import os
import pandas as pd
import xarray as xr
from pathlib import Path
import earthaccess
import ray
from tqdm import tqdm
from huggingface_hub import login, HfApi
import logging
import sys
# load artools
import artools
from artools.loading_utils import load_ais, load_cell_areas, EarthdataGatekeeper, load_catalog
from artools.display_utils import display_catalog
from artools.attribute_utils import *
from artools.compute_attributes_streaming import *
login()

Next, let’s load up the catalog. We provide a catalog of the first 250 storms in subset_storms.h5.

# load the catalog
storms = load_catalog('epsspace0.5_epstime12_minpts5_nreppts10_seed12345.h5')
# take only those that are landfalling
storms = storms[storms.is_landfalling]

Finally, we load up a mask of grid cells for Antarctica, as well as a mapping of grid cell to cell area.

ais_mask = load_ais()
cell_areas = load_cell_areas()

Catalog Overview

Before we get started with the masking workflow, let’s take a quick look at the catalog. The catalog is a clustering of AR pixels identified in the Wille (2021) catalog, which takes an Eulerian approach to identifying atmospheric rivers.

display_catalog(storms, 5)
Loading...

The storms object is a pandas.DataFrame, with rows corresponding to different identified AR storm events that made landfall. The data_array column contains binary-valued xarray.DataArray masks of the full spatiotemporal footprint of each storm. Note that we must use the display_catalog function from our local display_utils module to view the dataframe. If one attempts to print the dataframe as usual, the data_array column will instead be filled with string representations of the masks, which (1) take a long time to render and (2) are unpleasant to look at.

storms.loc[1].data_array
Loading...
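To see why a plain print is problematic, here is a minimal sketch of the idea behind hiding object columns during display. The summarize_objects helper is hypothetical (display_catalog's actual implementation may differ); it replaces each object-valued cell with a short type tag before rendering.

```python
import pandas as pd

# Hypothetical helper illustrating the display_catalog idea: replace
# object-valued cells with a compact tag so the frame renders quickly.
def summarize_objects(df, obj_cols):
    out = df.copy()
    for col in obj_cols:
        out[col] = out[col].apply(lambda x: f"<{type(x).__name__}>")
    return out

# Toy frame with an object column standing in for the mask column.
toy = pd.DataFrame({"storm_id": [0, 1],
                    "data_array": [[1, 0, 1], [0, 0, 1]]})
compact = summarize_objects(toy, ["data_array"])
```

The original frame is left untouched; only the displayed copy is summarized.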

Organizing the catalog in this way means we can appeal to pandas’ powerful and compact API to compute relevant storm quantities.
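As a minimal illustration of this pattern (using a plain NumPy array as a stand-in for the per-storm xarray.DataArray masks), each attribute is just a column-wise .apply of a scalar-valued function:

```python
import numpy as np
import pandas as pd

# Toy catalog: two "storms", each with a small binary (time, lat, lon) mask.
toy_storms = pd.DataFrame({
    "data_array": [np.ones((2, 2, 3)), np.zeros((1, 2, 3))],
})

# Same pattern as the attribute computations below: map each per-storm
# mask object to one scalar, yielding a new attribute column.
toy_storms["n_active_cells"] = toy_storms["data_array"].apply(
    lambda m: int(m.sum())
)
```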

Filling in Storm Attributes and Impacts

Now let’s put these masks to good use and start grabbing interesting quantities associated with each storm. There are two kinds of quantities we can compute: ones that don’t require reanalysis data (geographic attributes: when and where, etc.) and ones that do (atmospheric quantities). We’ll start with computing the quantities that don’t require reanalysis data and finish with a fully cloud-based workflow to compute the quantities that do.

Geographic Quantities

Let’s start with computing geographic attributes of each landfalling storm. This includes the following quantities:

  • max_area: the largest areal extent of the storm over its complete lifetime

  • mean_landfalling_area: the average landfalling areal extent of the storm

  • cumulative_landfalling_area: the cumulative time x space the storm spent over the ice sheet

  • duration: how long the storm lasted

  • start_date: when the storm first appeared

  • end_date: when it dissipated

  • max_south_extent: how far south the storm reached, in degrees latitude

  • region: the region of Antarctica it made landfall in (West, East 1, or East 2)

Helper functions to compute each of these attributes given a storm’s xarray.DataArray can be found in the attribute_utils module.

storms['max_area'] = storms['data_array'].apply(lambda x: compute_max_area(x, cell_areas)).astype(int)
storms['mean_landfalling_area'] = storms['data_array'].apply(lambda x: compute_mean_area(x, cell_areas, ais_mask)).astype(int)
storms['cumulative_landfalling_area'] = storms['data_array'].apply(lambda x: compute_cumulative_spacetime(x, cell_areas, ais_mask)).astype(int)
storms['duration'] = storms['data_array'].apply(compute_duration)
storms['start_date'] = storms['data_array'].apply(add_start_date)
storms['end_date'] = storms['data_array'].apply(add_end_date)
storms['max_south_extent'] = storms['data_array'].apply(compute_max_southward_extent)

# regions defined by ranges of longitudes
region_defs = {'West': [-150, -30], 
               'East 1': [-30, 75],
               'East 2': [75, -150]}
# use helper function to divide full AIS mask into sectors corresponding to each region
region_masks = find_region_masks(region_defs, ais_mask)
storms['region'] = storms['data_array'].apply(lambda x: find_landfalling_region(x, cell_areas, region_masks))
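Note that the 'East 2' sector, [75, -150], wraps across the antimeridian. Here is a sketch of the membership test such a convention implies; find_region_masks presumably handles this internally, and in_lon_range below is our own illustrative helper, not part of artools:

```python
def in_lon_range(lon, bounds):
    """Membership test for a [west, east] longitude range in degrees
    within [-180, 180), allowing wraparound across the antimeridian."""
    west, east = bounds
    if west <= east:
        return west <= lon < east
    # Wrapped range, e.g. [75, -150]: covers 75..180 plus -180..-150.
    return lon >= west or lon < east

region_defs = {'West': [-150, -30],
               'East 1': [-30, 75],
               'East 2': [75, -150]}
```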

Let’s take a look at what we were able to create.

display_catalog(storms, 5)
Loading...

Let’s save our intermediate output.

storms.to_hdf('../output/data_products/storms_setting.h5', key='df')
/tmp/ipykernel_8462/701074315.py:1: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block5_values] [items->Index(['data_array'], dtype='str')]

  storms.to_hdf('../output/data_products/storms_setting.h5', key='df')

Atmospheric Quantities

Now for the more interesting part: let’s compute atmospheric characteristics and attributes for each storm. This includes quantities like landfalling moisture content and wind speeds, as well as impacts, like cumulative snowfall on the ice sheet or maximum surface temperature anomaly. In this tutorial, we will focus on gathering three quantities related to AR impacts: maximum surface temperature anomaly underneath the footprint over the AIS, cumulative snowfall on the AIS, and cumulative rainfall on the AIS.

We’ll be using MERRA-2 reanalysis data to get these quantities. Of course, one could execute this workflow by loading the data from a local file system, but that’s not very reproducible, accessible, or open! Instead, we’ll stream MERRA-2 from AWS S3 buckets using earthaccess, a workflow that is much more reproducible, accessible, open, and lightweight!

For this tutorial, we’ll just be getting all of the variables from one MERRA-2 dataset. In our attempts with multiple datasets, we have run into difficulties with the parallelization (hanging processes, etc.), so we’ll demonstrate on a single dataset for now while we work out the kinks for using other datasets.

data_dois = {'climatology': '10.5067/5ESKGQTZG7FO',
             'T2M': '10.5067/3Z173KIE2TPD',
             'PRECIP': '10.5067/Q5GVUVUIVGO7'}

Climatology

To compute the surface temperature anomaly, we’ll need a climatology, which we’ll also compute via streaming. The climatology is computed by taking the monthly averaged dataset of our variable from MERRA-2 and then further averaging those monthly means across years. Since the original Wille (2021) catalog on which ours is based spans 1980 to 2022, we will compute the monthly averages over this range.
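The groupby logic can be sketched on a synthetic series (pandas here instead of xarray, purely for illustration): average each calendar month across all years.

```python
import numpy as np
import pandas as pd

# Two years of synthetic monthly means (month-start timestamps).
times = pd.date_range("1980-01-01", "1981-12-01", freq="MS")
monthly = pd.Series(np.arange(len(times), dtype=float), index=times)

# Group by calendar month and average across years -> a 12-entry climatology.
climatology = monthly.groupby(monthly.index.month).mean()
```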

# grab granules for monthly data
granule_lst = earthaccess.search_data(doi=data_dois['climatology'], 
                                  temporal=('1980-01-01', '2022-12-31'))
# open pointers to nc4 stored in AWS S3 buckets
pointers = earthaccess.open(granule_lst, show_progress=False)

monthly_means = xr.open_mfdataset(pointers)
monthly_means = monthly_means[['T2M']]
climatology_ds = monthly_means.groupby(monthly_means.time.dt.month).mean().compute()
climatology_ds = climatology_ds.assign_coords(lat=climatology_ds.lat.round(5), lon=climatology_ds.lon.round(5))
climatology_ds.to_netcdf('../output/data_products/climatology.nc4')
climatology_ds = xr.load_dataset('../output/data_products/climatology.nc4')

Streaming Setup

We will be streaming quantities from a MERRA-2 dataset and extracting summary statistics on a storm-by-storm basis, executing the loop by parallelizing using Ray. To ensure workers receive a roughly equal amount of work, we first sort the storms in order of decreasing duration, as storms with longer durations and larger spatiotemporal footprints require more time to compute their quantities.

sorted_storms = storms.sort_values(by='duration', ascending=False)
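To see why descending order helps, here is a toy greedy-scheduling model (our own illustration, not the actual Ray scheduler): each task goes to the least-loaded worker, and sorting large tasks first lets the small ones fill in the gaps, lowering the worst-case worker load.

```python
import heapq

def greedy_makespan(durations, n_workers):
    """Assign tasks in the given order to the least-loaded worker;
    return the maximum total load (a toy model of load balancing)."""
    loads = [0.0] * n_workers
    heapq.heapify(loads)
    for d in durations:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + d)
    return max(loads)

tasks = [9, 1, 1, 1, 8, 1, 1, 8]
in_order = greedy_makespan(tasks, n_workers=3)
longest_first = greedy_makespan(sorted(tasks, reverse=True), n_workers=3)
```

With these example durations, processing longest-first finishes strictly sooner than processing in arrival order.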

So that each worker has a sufficient amount of work in each iteration of the parallel loop, we also split the full list of storms into fixed-size chunks. Each worker then takes a chunk of storms to process. Below, we define global variables for the number of workers and the chunk size.

NUM_WORKERS = 4
CHUNK_SIZE = 10
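The chunking itself is the usual fixed-stride slicing pattern; a minimal standalone sketch:

```python
CHUNK_SIZE = 10

def make_chunks(seq, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size
    items (the last chunk may be smaller)."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]

chunks = make_chunks(list(range(25)), CHUNK_SIZE)
```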

The full streaming workflow may take a few hours to execute, depending on how many storms and how many quantities you are dealing with. We recommend using jupyter-keepalive to keep your Jupyter session alive even after exiting. Instructions on how to use this feature can be found here. This functionality is already included in our image environment, so there is no need to install it.

However, once you leave the notebook, even though the server is still alive and the computation is still running, output to track the computation will no longer be rendered in the notebook. To track the computation, we will set up a logger that will output the progress of our computations to /output/logs/.

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler('../output/logs/streaming_progress.log')
    ]
)
logging.info("Starting streaming workflow...")
2026-03-15 21:02:14,808 - INFO - Starting streaming workflow...

2m-Temperature Anomaly

In our workflow, we go through each storm and execute a function called compute_summaries on each storm. compute_summaries computes scalar summaries of all desired variables from the particular MERRA-2 dataset we are working with. To indicate which variables and which scalar summaries you would like, you must pass in a dictionary of dictionaries which we will call func_vars_dict. The keys of the outer dictionary are the variable names in the MERRA-2 dataset we’d like to compute aggregates of, and the values are themselves dictionaries where each item represents a desired aggregation function of that variable. The inner keys are the names under which each scalar summary will appear in the resulting dataset.

In this way, when we stream a set of days of data for a particular storm, we compute all relevant quantities on that storm before throwing the data away and re-streaming for the next storm.
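A stripped-down sketch of how such a dict-of-dicts can be consumed (the real compute_summaries takes storm masks and DataArrays; apply_summaries below is a hypothetical simplification operating on plain lists):

```python
# Outer keys: MERRA-2 variable names. Inner keys: output column names.
# Inner values: aggregation functions, each producing one scalar.
toy_func_vars = {
    "T2M": {
        "max_T2M": max,
        "mean_T2M": lambda xs: sum(xs) / len(xs),
    },
}

def apply_summaries(func_vars, data):
    """Flatten the nested spec into {summary_name: scalar} for one storm."""
    return {out_name: func(data[var])
            for var, summaries in func_vars.items()
            for out_name, func in summaries.items()}

row = apply_summaries(toy_func_vars, {"T2M": [260.0, 271.5, 268.0]})
```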

First, we initialize our Ray cluster with the number of CPUs we wish to parallelize over, 1 worker per CPU. For each MERRA-2 dataset we stream from, we will be reinitializing and then closing the cluster. This is to reduce the memory footprint of the computation, ensuring a fresh start after each computation is complete.

ray.init(num_cpus=NUM_WORKERS, logging_level='ERROR', 
         _metrics_export_port=-1, include_dashboard=False, 
         log_to_driver=False, runtime_env={'py_modules': [artools.attribute_utils, artools.loading_utils]})

# put the climatologies and cell areas in the ray object store
climatology_ref = ray.put(climatology_ds)
cell_areas_ref = ray.put(cell_areas)
# prevents parallel open requests from being made to NASA servers
gatekeeper = EarthdataGatekeeper.remote()
/srv/conda/envs/notebook/lib/python3.12/site-packages/ray/_private/worker.py:2052: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
func_vars_dict = {'T2M': {'max_T2M_anomaly_ais': lambda storm_da, var_da, area_da: 
                              compute_max_intensity(storm_da, var_da, area_da, ais_mask)}}

func_vars_dict_ref = ray.put(func_vars_dict)

data_doi = data_dois['T2M']
# list with the computed results
results_T2M = []

chunks = [sorted_storms[i:i + CHUNK_SIZE] for i in range(0, sorted_storms.shape[0], CHUNK_SIZE)]

logging.info(f"Starting T2M computations in chunks...")

chunk_refs = [ 
    compute_chunk_summaries.remote(
        chunk,
        func_vars_dict_ref,
        cell_areas_ref,
        data_doi,
        gatekeeper=gatekeeper,
        climatology_ds=climatology_ref
        ) for chunk in chunks ]

for j, ref in enumerate(chunk_refs):
    results = ray.get(ref)
    results_T2M.extend(results)
    # clean up the reference immediately so Ray empties the Object Store
    chunk_refs[j] = None    
    logging.info(f"T2M: Finished chunk {j + 1}/{len(chunks)}")

logging.info("T2M computations complete.")
2026-03-15 21:02:18,067 - INFO - Starting T2M computations in chunks...
2026-03-15 21:06:09,458 - INFO - T2M: Finished chunk 1/317
2026-03-15 21:06:09,460 - INFO - T2M: Finished chunk 2/317
...
2026-03-15 22:26:03,178 - INFO - T2M: Finished chunk 316/317
2026-03-15 22:26:03,180 - INFO - T2M: Finished chunk 317/317
2026-03-15 22:26:03,181 - INFO - T2M computations complete.
# construct dataframe to concatenate into the original dataframe
labels = [key for quantity_dict in func_vars_dict.values() for key in quantity_dict.keys()]
df_T2M = pd.DataFrame(results_T2M, columns=labels, index=sorted_storms.index)
df_T2M_original_order = df_T2M.loc[storms.index]
df_T2M_original_order.to_csv('../output/data_products/results_T2M.csv')
logging.info("T2M results saved!")
2026-03-15 22:26:03,211 - INFO - T2M results saved!
ray.shutdown()

Snowfall and Rainfall

Snowfall and rainfall come from the same MERRA-2 dataset as poleward integrated vapor transport. However, we treat them separately because precipitation quantities are computed not on the spatiotemporal footprint of the storm mask, as the previous quantities were, but on an augmented mask: even after the AR has left a pixel, that pixel is still considered part of the AR footprint for the next 24 hours. Because of this extra preprocessing, we pass precip_func, the aggregation function we wish to use, into a summary computation function specific to this case (compute_precip_summaries).
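The 24-hour augmentation described above can be thought of as a forward-in-time dilation of the binary storm mask. The sketch below is a hypothetical illustration (extend_mask_24h is not part of artools), assuming a regular time step along the time dimension:

```python
import numpy as np
import xarray as xr

def extend_mask_24h(mask: xr.DataArray, hours: int = 24) -> xr.DataArray:
    """Keep each pixel in the footprint for `hours` after the AR leaves it."""
    # infer the time step (in hours) from the first two timestamps
    step_h = float((mask.time[1] - mask.time[0]) / np.timedelta64(1, "h"))
    n_steps = int(round(hours / step_h))
    extended = mask.astype(bool)
    # OR in copies of the mask shifted forward in time, so a "1" at time t
    # propagates into the following n_steps time slices
    for shift in range(1, n_steps + 1):
        extended = extended | mask.shift(time=shift, fill_value=0).astype(bool)
    return extended.astype(mask.dtype)
```

Precipitation aggregated over this dilated footprint then credits rain or snow that falls shortly after the AR itself has moved on.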

ray.init(num_cpus=NUM_WORKERS, logging_level='ERROR', 
         _metrics_export_port=-1, include_dashboard=False, 
         log_to_driver=False, runtime_env={'py_modules': [artools.attribute_utils, artools.loading_utils]})

# put the climatologies and cell areas in the ray object store
climatology_ref = ray.put(climatology_ds)
cell_areas_ref = ray.put(cell_areas)
# prevents parallel open requests from being made to NASA servers
gatekeeper = EarthdataGatekeeper.remote()
data_doi = data_dois['PRECIP']

# storing the aggregation function for precip in the ray store
def precip_func(storm_da, var_da, area_da):
    return compute_cumulative(storm_da, var_da, area_da, ais_mask)
precip_func_ref = ray.put(precip_func)
results_precip = []

chunks = [sorted_storms[i:i + CHUNK_SIZE] for i in range(0, sorted_storms.shape[0], CHUNK_SIZE)]

logging.info("Starting Precipitation computations in chunks...")

chunk_refs = [
        compute_precip_chunk_summaries.remote(
            chunk, 
            cell_areas_ref, 
            precip_func_ref,
            data_doi,
            gatekeeper=gatekeeper
        ) for chunk in chunks
    ]

for j, ref in enumerate(chunk_refs):
    results = ray.get(ref)
    results_precip.extend(results)
    # clean up the reference immediately so Ray empties the Object Store
    chunk_refs[j] = None    
    logging.info(f"Precip: Finished chunk {j + 1}/{len(chunks)}")

logging.info("Precipitation computations complete.")
2026-03-15 22:26:07,809 - INFO - Starting Precipitation computations in chunks...
# construct dataframe to concatenate into the original dataframe
labels = ['cumulative_rainfall_ais', 'cumulative_snowfall_ais']
df_precip = pd.DataFrame(results_precip, columns=labels, index=sorted_storms.index)
df_precip_original_order = df_precip.loc[storms.index]
df_precip_original_order.to_csv('../output/data_products/results_precip.csv')
logging.info("Precip results saved!")
ray.shutdown()
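Both the T2M and precipitation cells build their result frames on sorted_storms.index and then select rows with storms.index. A toy illustration (with made-up cluster labels and values) of why .loc restores the catalog's original order:

```python
import pandas as pd

# results arrive in the order the storms were processed (sorted_storms)
sorted_idx = pd.Index([2, 0, 1], name="cluster")
results = pd.DataFrame({"mean_t2m": [273.1, 268.4, 270.9]}, index=sorted_idx)

# selecting with the original catalog index reorders rows by label,
# so the result frame lines up with the original storms dataframe
original_idx = pd.Index([0, 1, 2], name="cluster")
reordered = results.loc[original_idx]
```

Because .loc aligns on index labels rather than positions, the reordered frame can be concatenated column-wise with the original catalog without any row mismatch.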

Taking it all in

Let’s take a look at all of our hard work! If you used jupyter-keepalive to execute this notebook, the DataFrame variables from earlier cells will no longer be available in memory. Instead, we load the intermediate products we saved along the way, concatenate them, and view the result.

# re-load the modules needed to load dataframes and display them
from artools.display_utils import display_catalog
import pandas as pd
storms_setting = pd.read_hdf('../output/data_products/storms_setting.h5')
storms_temp = pd.read_csv('../output/data_products/results_T2M.csv', index_col='cluster')
storms_precip = pd.read_csv('../output/data_products/results_precip.csv', index_col='cluster')
full_df = pd.concat([storms_setting, storms_temp, storms_precip], axis=1)
display_catalog(full_df, 10)
Loading...
References
  1. Wille, J. (2020). jwille45/Antarctic-lab: v2.3. Zenodo. doi:10.5281/zenodo.4009663