nenya_umap

The nenya_umap module provides functionality for UMAP dimensionality reduction and analysis of latent spaces.

Functions

nenya.nenya_umap.DT_interval(inp)

Generate a DT (temperature difference) interval from the input.

Parameters:

inp (tuple or None) – Pair (DT, dDT) giving the central DT value and dDT, or None to include all DT values

Returns:

Range of DT values as (min, max)

Return type:

tuple
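The mapping from input to range can be illustrated with a minimal sketch. It is not the library's implementation: the function name dt_interval_sketch, the symmetric window (DT - dDT, DT + dDT), and the wide "all" range used for None are assumptions made for illustration only.

```python
def dt_interval_sketch(inp, all_range=(0.0, 1e9)):
    """Hypothetical stand-in for DT_interval (not the nenya code)."""
    if inp is None:
        # None means "no cut": return a range wide enough to cover any DT.
        # The specific bounds here are an assumption.
        return all_range
    dt, ddt = inp
    # Assumed convention: a symmetric window around the central value
    return (dt - ddt, dt + ddt)


print(dt_interval_sketch((2.0, 0.5)))  # (1.5, 2.5)
```

Check the source before relying on this behaviour; the real function may treat special values of dDT differently.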

nenya.nenya_umap.load(model_name, DT=None, use_s3=False)

Load a UMAP model.

Parameters:
  • model_name (str) – Model name (‘LLC’, ‘LLC_local’, ‘CF’, ‘v4’, ‘v5’, ‘viirs_v1’)

  • DT (float, optional) – DT value (K). Defaults to None.

  • use_s3 (bool, optional) – Whether to use S3 storage. Defaults to False.

Returns:

Tuple of (UMAP model, table file path)

Return type:

tuple

Raises:

IOError – If the model name is invalid, or if S3 is requested but not configured
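Because load raises IOError for an unknown model name, callers may want to guard the call. The sketch below shows one way to do that. The wrapper try_load and the stand-in fake_loader are hypothetical names introduced here, and the placeholder table path is not a real nenya file; only the exception type and the (model, table file) return shape come from the description above.

```python
def try_load(loader, model_name, **kwargs):
    """Call loader(model_name, **kwargs); return (None, None) on IOError."""
    try:
        return loader(model_name, **kwargs)
    except IOError as err:
        print(f"Could not load UMAP model {model_name!r}: {err}")
        return None, None


def fake_loader(model_name, DT=None, use_s3=False):
    """Hypothetical stand-in with the same call shape as load()."""
    valid = ('LLC', 'LLC_local', 'CF', 'v4', 'v5', 'viirs_v1')
    if model_name not in valid:
        raise IOError(f"Invalid UMAP model name: {model_name}")
    # Placeholder return values, for illustration only
    return object(), "placeholder_table_path"


model, table_file = try_load(fake_loader, 'not_a_model')
```

With nenya installed, nenya.nenya_umap.load could presumably be passed as the loader argument in place of fake_loader.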

nenya.nenya_umap.umap_subset(tbl, opt_path, outfile, DT_cut=None, alpha_cut=None, max_cloud_fraction=None, ntrain=200000, remove=True, DT_key='DT40', umap_savefile=None, train_umap=True, local=True, CF=False, debug=False)

Run UMAP on a subset of the data. The first two UMAP dimensions are written to the table.

Parameters:
  • tbl (pandas.DataFrame) – Data table

  • opt_path (str) – Path to options file

  • outfile (str) – Output file path

  • DT_cut (str, optional) – DT cut to apply (e.g., ‘DT2’, ‘DT4’). Defaults to None.

  • alpha_cut (str, optional) – Alpha cut to apply (e.g., ‘a1’, ‘a2’). Defaults to None.

  • max_cloud_fraction (float, optional) – Maximum cloud fraction to include. Defaults to None.

  • ntrain (int, optional) – Number of samples to use for training UMAP. Defaults to 200000.

  • remove (bool, optional) – Whether to remove temporary files. Defaults to True.

  • DT_key (str, optional) – Key for DT values in the table. Defaults to ‘DT40’.

  • umap_savefile (str, optional) – File to save the UMAP model. Defaults to None.

  • train_umap (bool, optional) – Whether to train a new UMAP model. Defaults to True.

  • local (bool, optional) – Whether to use local files. Defaults to True.

  • CF (bool, optional) – Whether to use the cloud-free dataset. Defaults to False.

  • debug (bool, optional) – Whether to run in debug mode. Defaults to False.
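The selection and training-sample steps implied by these parameters can be sketched as follows. This is a simplified illustration under several assumptions, not the function's actual logic: the helper names select_subset and training_indices are hypothetical, the column name 'cloud_fraction' is invented, and the DT cut is passed as a numeric (min, max) range rather than a string such as 'DT2'. The UMAP fit itself is omitted.

```python
import numpy as np
import pandas as pd


def select_subset(tbl, dt_range=None, max_cloud_fraction=None,
                  dt_key='DT40', cf_key='cloud_fraction'):
    """Keep rows inside the DT range and below the cloud-fraction limit."""
    keep = np.ones(len(tbl), dtype=bool)
    if dt_range is not None:
        dt = tbl[dt_key].values
        keep &= (dt >= dt_range[0]) & (dt < dt_range[1])
    if max_cloud_fraction is not None:
        # cf_key is a hypothetical column name
        keep &= tbl[cf_key].values <= max_cloud_fraction
    return tbl[keep].copy()


def training_indices(n_rows, ntrain=200000, seed=0):
    """Draw up to ntrain distinct row indices for fitting UMAP."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_rows, size=min(ntrain, n_rows), replace=False)


tbl = pd.DataFrame({'DT40': [0.5, 1.5, 2.5, 3.5],
                    'cloud_fraction': [0.0, 0.01, 0.2, 0.0]})
sub = select_subset(tbl, dt_range=(1.0, 4.0), max_cloud_fraction=0.05)
idx = training_indices(len(sub))
```

Sampling without replacement caps the training cost at ntrain rows while the fitted model can still be applied to the full subset afterwards.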

nenya.nenya_umap.grid_umap(U0, U1, nxy=16, percent=[0.05, 99.95], verbose=False)

Generate a grid on the UMAP domain.

Parameters:
  • U0 (numpy.ndarray) – First UMAP dimension coordinates

  • U1 (numpy.ndarray) – Second UMAP dimension coordinates

  • nxy (int, optional) – Number of grid cells in each dimension. Defaults to 16.

  • percent (list, optional) – Percentile range for grid boundaries. Defaults to [0.05, 99.95].

  • verbose (bool, optional) – Whether to print details. Defaults to False.

Returns:

Dictionary containing grid information

Return type:

dict
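A grid of this kind can be built from percentiles of the two coordinate arrays. The following is a sketch under assumptions: the function name grid_sketch and the dictionary keys (xmin, xmax, ymin, ymax, xedges, yedges) are chosen here for illustration and may not match the keys grid_umap returns.

```python
import numpy as np


def grid_sketch(U0, U1, nxy=16, percent=(0.05, 99.95)):
    """Build nxy x nxy cell edges spanning the given percentile range."""
    # Percentile clipping keeps a few outliers from stretching the grid
    xmin, xmax = np.percentile(U0, percent)
    ymin, ymax = np.percentile(U1, percent)
    return dict(
        xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax,
        xedges=np.linspace(xmin, xmax, nxy + 1),  # nxy cells need nxy+1 edges
        yedges=np.linspace(ymin, ymax, nxy + 1),
    )


rng = np.random.default_rng(0)
grid = grid_sketch(rng.normal(size=1000), rng.normal(size=1000), nxy=16)
```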

nenya.nenya_umap.cutouts_on_umap_grid(tbl, nxy, umap_keys, min_pts=1)

Generate a list of cutouts uniformly distributed on the UMAP grid.

Parameters:
  • tbl (pandas.DataFrame) – Data table

  • nxy (int) – Number of grid cells in each dimension

  • umap_keys (tuple) – Tuple of column names for UMAP coordinates

  • min_pts (int, optional) – Minimum points required in each grid cell. Defaults to 1.

Returns:

Tuple of (filtered table, cutouts, umap_grid)

Return type:

tuple
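One way to draw cutouts uniformly over the grid is to pick a single random point from every sufficiently populated cell. The sketch below illustrates that idea on plain arrays. The name pick_per_cell and the dict return value are assumptions and differ from the tuple documented above.

```python
import numpy as np


def pick_per_cell(U0, U1, xedges, yedges, min_pts=1, seed=0):
    """Return {(ix, iy): row index} with one random row per populated cell."""
    rng = np.random.default_rng(seed)
    # digitize gives 1-based bin numbers; subtract 1 for 0-based cell indices
    ix = np.digitize(U0, xedges) - 1
    iy = np.digitize(U1, yedges) - 1
    picks = {}
    for i in range(len(xedges) - 1):
        for j in range(len(yedges) - 1):
            members = np.flatnonzero((ix == i) & (iy == j))
            # Skip empty cells and cells with fewer than min_pts points
            if len(members) >= max(min_pts, 1):
                picks[(i, j)] = int(rng.choice(members))
    return picks


edges = np.array([0.0, 0.5, 1.0])
U0 = np.array([0.1, 0.2, 0.9])
U1 = np.array([0.1, 0.2, 0.9])
picks = pick_per_cell(U0, U1, edges, edges, min_pts=1)
```

Points outside the grid edges fall into no cell and are never picked.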

nenya.nenya_umap.regional_analysis(geo_region, tbl, nxy, umap_keys, min_counts=200)

Analyze the distribution of a geographic region in UMAP space.

Parameters:
  • geo_region (str) – Name of the geographic region (defined in defs.py)

  • tbl (pandas.DataFrame) – Data table

  • nxy (int) – Number of grid cells in each dimension

  • umap_keys (tuple) – Tuple of column names for UMAP coordinates

  • min_counts (int, optional) – Minimum counts for normalization. Defaults to 200.

Returns:

Tuple of (counts, counts_geo, tbl, grid, xedges, yedges)

Return type:

tuple
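Comparing a region with the full sample comes down to two 2D histograms on the same grid. The sketch below assumes that the comparison is a ratio of normalized fractions and that min_counts masks sparsely populated cells. Both are guesses at the intent rather than the documented behaviour, and the name region_ratio is hypothetical.

```python
import numpy as np


def region_ratio(U0, U1, in_region, xedges, yedges, min_counts=200):
    """Histogram all points and the regional points; compare fractions."""
    bins = [xedges, yedges]
    counts, _, _ = np.histogram2d(U0, U1, bins=bins)
    counts_geo, _, _ = np.histogram2d(U0[in_region], U1[in_region], bins=bins)

    ratio = np.full(counts.shape, np.nan)
    if counts_geo.sum() == 0:
        return counts, counts_geo, ratio  # no regional points on the grid

    frac_all = counts / counts.sum()
    frac_geo = counts_geo / counts_geo.sum()
    # Only compare cells with enough points overall (and never divide by 0)
    ok = (counts >= min_counts) & (counts > 0)
    ratio[ok] = frac_geo[ok] / frac_all[ok]
    return counts, counts_geo, ratio


edges = np.array([0.0, 0.5, 1.0])
U0 = np.array([0.25] * 4 + [0.75] * 4)
U1 = U0.copy()
in_region = np.array([True] * 4 + [False] * 4)
counts, counts_geo, ratio = region_ratio(U0, U1, in_region, edges, edges,
                                         min_counts=1)
```

A ratio above 1 marks cells where the region is over-represented relative to the full sample. Cells below min_counts stay NaN.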