nenya_umap

The nenya_umap module provides functionality for UMAP dimensionality reduction and analysis of latent spaces.

Functions

nenya.nenya_umap.DT_interval(inp)

Generate a DT (temperature difference) interval from the input.

Parameters:

inp (tuple or None) – DT central value and dDT, or None for all DT values

Returns:

Range of DT values as (min, max)

Return type:

tuple
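Here is a minimal sketch of the interval logic. It assumes that dDT is the half-width around the central DT value and that None maps to an unbounded range. The name dt_interval_sketch is hypothetical and is not part of nenya. Check the module source for the actual bounds it uses when inp is None.

```python
import math


def dt_interval_sketch(inp):
    """Hypothetical re-implementation of DT_interval, for illustration only."""
    if inp is None:
        # Assumption: "None for all" is modeled as an unbounded range
        return (-math.inf, math.inf)
    dt_center, ddt = inp
    # Assumption: dDT is the half-width around the central DT value
    return (dt_center - ddt, dt_center + ddt)


print(dt_interval_sketch((2.0, 0.5)))  # (1.5, 2.5)
```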

nenya.nenya_umap.load(model_name, DT=None, use_s3=False)

Load a UMAP model.

Parameters:

model_name (str) – Model name (‘LLC’, ‘LLC_local’, ‘CF’, ‘v4’, ‘v5’, ‘viirs_v1’)
DT (float, optional) – DT value (K). Defaults to None.
use_s3 (bool, optional) – Whether to use S3 storage. Defaults to False.

Returns:

Tuple of (UMAP model, table file path)

Return type:

tuple

Raises:

IOError – If the model name is invalid, or if S3 is requested but not configured
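The documented IOError conditions can be illustrated with a hypothetical validation helper. check_load_args and its s3_configured flag are illustrative assumptions, not part of nenya. Only the list of model names comes from the parameter description above.

```python
# Model names listed in the load() documentation
VALID_MODELS = ('LLC', 'LLC_local', 'CF', 'v4', 'v5', 'viirs_v1')


def check_load_args(model_name, use_s3=False, s3_configured=False):
    """Hypothetical check mirroring the documented IOError conditions."""
    if model_name not in VALID_MODELS:
        raise IOError(f"Invalid model name: {model_name!r}")
    if use_s3 and not s3_configured:
        # s3_configured is a hypothetical stand-in for whatever S3 setup
        # (credentials, endpoint) the real function actually checks
        raise IOError("S3 storage requested but not configured")
    return 's3' if use_s3 else 'local'
```

In Python 3, IOError is an alias of OSError, so callers can catch either name.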

nenya.nenya_umap.umap_subset(tbl, opt_path, outfile, DT_cut=None, alpha_cut=None, max_cloud_fraction=None, ntrain=200000, remove=True, DT_key='DT40', umap_savefile=None, train_umap=True, local=True, CF=False, debug=False)

Run UMAP on a subset of the data and write the first two UMAP dimensions to the table.

Parameters:

tbl (pandas.DataFrame) – Data table
opt_path (str) – Path to options file
outfile (str) – Output file path
DT_cut (str, optional) – DT cut to apply (e.g., ‘DT2’, ‘DT4’). Defaults to None.
alpha_cut (str, optional) – Alpha cut to apply (e.g., ‘a1’, ‘a2’). Defaults to None.
max_cloud_fraction (float, optional) – Maximum cloud fraction to include. Defaults to None.
ntrain (int, optional) – Number of samples to use for training UMAP. Defaults to 200000.
remove (bool, optional) – Whether to remove temporary files. Defaults to True.
DT_key (str, optional) – Key for DT values in the table. Defaults to ‘DT40’.
umap_savefile (str, optional) – File to save the UMAP model. Defaults to None.
train_umap (bool, optional) – Whether to train a new UMAP model. Defaults to True.
local (bool, optional) – Whether to use local files. Defaults to True.
CF (bool, optional) – Whether to use cloud-free dataset. Defaults to False.
debug (bool, optional) – Whether to run in debug mode. Defaults to False.
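The following is a rough sketch of the subset-then-embed flow. It rests on several assumptions:

- The latent vectors and cloud fractions are plain NumPy arrays rather than table columns.
- Rows at or below max_cloud_fraction are kept.
- The training rows are drawn at random.

PCA (via numpy.linalg.svd) stands in for UMAP only so that the example runs with NumPy alone. umap_subset itself trains or loads a UMAP model. embed_subset_sketch is a hypothetical name.

```python
import numpy as np


def embed_subset_sketch(latents, cloud_fraction, max_cloud_fraction=None,
                        ntrain=200000, seed=0):
    """Filter rows, fit a 2-D projection on <= ntrain rows, embed all kept rows.

    PCA is used here only as a stand-in for UMAP.
    """
    keep = np.ones(len(latents), dtype=bool)
    if max_cloud_fraction is not None:
        # Assumption: rows at or below the threshold are kept
        keep &= cloud_fraction <= max_cloud_fraction
    subset = latents[keep]

    # Train on a random subsample of at most ntrain rows
    rng = np.random.default_rng(seed)
    n = min(ntrain, len(subset))
    train = subset[rng.choice(len(subset), size=n, replace=False)]

    # PCA "fit": center on the training mean, take leading right-singular vectors
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)

    # "Transform" every kept row and keep only the first 2 dimensions
    embedding = (subset - mean) @ vt[:2].T
    return keep, embedding[:, 0], embedding[:, 1]
```

The returned U0 and U1 arrays play the role of the two dimensions that umap_subset writes to the table. Fitting on at most ntrain rows and then transforming every kept row presumably keeps training cost bounded on large tables.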

nenya.nenya_umap.grid_umap(U0, U1, nxy=16, percent=[0.05, 99.95], verbose=False)

Generate a grid on the UMAP domain.

Parameters:

U0 (numpy.ndarray) – First UMAP dimension coordinates
U1 (numpy.ndarray) – Second UMAP dimension coordinates
nxy (int, optional) – Number of grid cells in each dimension. Defaults to 16.
percent (list, optional) – Percentile range for grid boundaries. Defaults to [0.05, 99.95].
verbose (bool, optional) – Whether to print details. Defaults to False.

Returns:

Dictionary containing grid information

Return type:

dict
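Here is a minimal sketch of a percentile-bounded grid. It assumes the grid spans the percent range of each UMAP axis with nxy equal-width cells. The dictionary keys (xedges, yedges, dx, dy) are hypothetical, because the keys returned by grid_umap are not listed here.

```python
import numpy as np


def grid_umap_sketch(U0, U1, nxy=16, percent=(0.05, 99.95)):
    """Hypothetical grid of nxy x nxy cells spanning the given percentiles."""
    xmin, xmax = np.percentile(U0, percent)
    ymin, ymax = np.percentile(U1, percent)
    # nxy cells per axis need nxy + 1 edges
    xedges = np.linspace(xmin, xmax, nxy + 1)
    yedges = np.linspace(ymin, ymax, nxy + 1)
    return dict(xedges=xedges, yedges=yedges,
                dx=(xmax - xmin) / nxy, dy=(ymax - ymin) / nxy)
```

Bounding the grid by percentiles rather than the raw minimum and maximum keeps a handful of outlying embeddings from stretching the cells. The sketch uses a tuple default for percent. This avoids the shared mutable-default pitfall that a list default carries in Python.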

nenya.nenya_umap.cutouts_on_umap_grid(tbl, nxy, umap_keys, min_pts=1)

Generate a list of cutouts uniformly distributed on the UMAP grid.

Parameters:

tbl (pandas.DataFrame) – Data table
nxy (int) – Number of grid cells in each dimension
umap_keys (tuple) – Tuple of column names for UMAP coordinates
min_pts (int, optional) – Minimum points required in each grid cell. Defaults to 1.

Returns:

Tuple of (filtered table, cutouts, umap_grid)

Return type:

tuple
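One plausible way to pick cutouts uniformly on the grid works cell by cell. From every cell holding at least min_pts points, it takes the single point nearest the cell center. The sketch below assumes that selection rule and returns row indices from NumPy arrays rather than a table. The real function's rule and its return values (filtered table, cutouts, umap_grid) may differ. cutouts_on_grid_sketch is a hypothetical name.

```python
import numpy as np


def cutouts_on_grid_sketch(U0, U1, xedges, yedges, min_pts=1):
    """Pick one row index per populated grid cell (the point nearest the cell center)."""
    nx, ny = len(xedges) - 1, len(yedges) - 1
    # Cell index of each point; points below the first edge or at/above the
    # last edge get -1 or n and are never matched below
    ix = np.digitize(U0, xedges) - 1
    iy = np.digitize(U1, yedges) - 1
    picks = []
    for i in range(nx):
        for j in range(ny):
            in_cell = np.flatnonzero((ix == i) & (iy == j))
            if len(in_cell) < max(min_pts, 1):
                continue
            xc = 0.5 * (xedges[i] + xedges[i + 1])
            yc = 0.5 * (yedges[j] + yedges[j + 1])
            d2 = (U0[in_cell] - xc) ** 2 + (U1[in_cell] - yc) ** 2
            picks.append(int(in_cell[np.argmin(d2)]))
    return picks
```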

nenya.nenya_umap.regional_analysis(geo_region, tbl, nxy, umap_keys, min_counts=200)

Analyze the distribution of a geographic region in UMAP space.

Parameters:

geo_region (str) – Name of the geographic region (defined in defs.py)
tbl (pandas.DataFrame) – Data table
nxy (int) – Number of grid cells in each dimension
umap_keys (tuple) – Tuple of column names for UMAP coordinates
min_counts (int, optional) – Minimum counts for normalization. Defaults to 200.

Returns:

Tuple of (counts, counts_geo, tbl, grid, xedges, yedges)

Return type:

tuple
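The following sketches the regional comparison. It assumes the region is given as a boolean mask over the rows, for example from a longitude/latitude box. It also assumes that normalization means comparing the region's per-cell fraction with that of the full sample. Cells with fewer than min_counts total points are set to NaN. regional_ratio_sketch and its ratio output are hypothetical. The documented function returns (counts, counts_geo, tbl, grid, xedges, yedges).

```python
import numpy as np


def regional_ratio_sketch(U0, U1, in_region, xedges, yedges, min_counts=200):
    """Compare a region's distribution on the UMAP grid with the full sample."""
    counts, _, _ = np.histogram2d(U0, U1, bins=[xedges, yedges])
    counts_geo, _, _ = np.histogram2d(U0[in_region], U1[in_region],
                                      bins=[xedges, yedges])
    # Normalize each histogram to a fraction per cell
    frac_all = counts / counts.sum()
    frac_geo = counts_geo / counts_geo.sum()
    # NaN wherever the full sample has fewer than min_counts points
    ratio = np.full(counts.shape, np.nan)
    ok = counts >= min_counts
    ratio[ok] = frac_geo[ok] / frac_all[ok]
    return counts, counts_geo, ratio
```

A ratio above 1 marks a cell where the region is over-represented relative to the full sample. A ratio below 1 marks a cell where it is under-represented.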