nenya_umap
The nenya_umap module provides functionality for UMAP dimensionality reduction and analysis of latent spaces.
Functions
- nenya.nenya_umap.DT_interval(inp)
Generate a DT (temperature difference) interval from the input.
- nenya.nenya_umap.load(model_name, DT=None, use_s3=False)
Load a UMAP model.
- Parameters:
- Returns:
Tuple of (UMAP model, table file path)
- Return type:
tuple
- Raises:
IOError – If the model name is invalid, or if S3 is requested but not configured
- nenya.nenya_umap.umap_subset(tbl, opt_path, outfile, DT_cut=None, alpha_cut=None, max_cloud_fraction=None, ntrain=200000, remove=True, DT_key='DT40', umap_savefile=None, train_umap=True, local=True, CF=False, debug=False)
Run UMAP on a subset of the data. The first two UMAP dimensions are written to the table.
- Parameters:
tbl (pandas.DataFrame) – Data table
opt_path (str) – Path to options file
outfile (str) – Output file path
DT_cut (str, optional) – DT cut to apply (e.g., ‘DT2’, ‘DT4’). Defaults to None.
alpha_cut (str, optional) – Alpha cut to apply (e.g., ‘a1’, ‘a2’). Defaults to None.
max_cloud_fraction (float, optional) – Maximum cloud fraction to include. Defaults to None.
ntrain (int, optional) – Number of samples to use for training UMAP. Defaults to 200000.
remove (bool, optional) – Whether to remove temporary files. Defaults to True.
DT_key (str, optional) – Key for DT values in the table. Defaults to ‘DT40’.
umap_savefile (str, optional) – File to save the UMAP model. Defaults to None.
train_umap (bool, optional) – Whether to train a new UMAP model. Defaults to True.
local (bool, optional) – Whether to use local files. Defaults to True.
CF (bool, optional) – Whether to use cloud-free dataset. Defaults to False.
debug (bool, optional) – Whether to run in debug mode. Defaults to False.
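The subset-then-train idea can be illustrated without the package itself. The following is a minimal sketch, not the nenya implementation: the helper `select_training_rows`, the numeric `dt_range` interval, and the toy arrays are all hypothetical, and the real function resolves `DT_cut` strings such as ‘DT2’ through its own logic. It only shows how a DT cut, a cloud-fraction limit, and an `ntrain` cap might combine to pick training rows.

```python
import numpy as np

def select_training_rows(dt, cloud_fraction, dt_range=None,
                         max_cloud_fraction=None, ntrain=200000, seed=0):
    """Return (subset_idx, train_idx) as integer row indices.

    Hypothetical helper: dt_range is an assumed (low, high) interval,
    not the string codes accepted by umap_subset.
    """
    dt = np.asarray(dt, dtype=float)
    cloud_fraction = np.asarray(cloud_fraction, dtype=float)

    # Start with every row and narrow down with each requested cut.
    keep = np.ones(dt.size, dtype=bool)
    if dt_range is not None:
        low, high = dt_range
        keep &= (dt >= low) & (dt < high)
    if max_cloud_fraction is not None:
        keep &= cloud_fraction <= max_cloud_fraction
    subset_idx = np.flatnonzero(keep)

    # Train on at most ntrain rows, drawn without replacement.
    rng = np.random.default_rng(seed)
    n = min(ntrain, subset_idx.size)
    train_idx = rng.choice(subset_idx, size=n, replace=False)
    return subset_idx, train_idx

# Toy data standing in for the DT and cloud-fraction columns of a table.
dt = np.array([0.5, 1.2, 1.8, 2.5, 3.1, 1.5])
cf = np.array([0.00, 0.01, 0.10, 0.00, 0.02, 0.03])
subset_idx, train_idx = select_training_rows(
    dt, cf, dt_range=(1.0, 2.0), max_cloud_fraction=0.05, ntrain=2)
```

In this sketch the model would be fitted on `train_idx` only and then applied to every row in `subset_idx`, which is one plausible reading of why `ntrain` is separate from the cuts.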
- nenya.nenya_umap.grid_umap(U0, U1, nxy=16, percent=[0.05, 99.95], verbose=False)
Generate a grid on the UMAP domain.
- Parameters:
U0 (numpy.ndarray) – First UMAP dimension coordinates
U1 (numpy.ndarray) – Second UMAP dimension coordinates
nxy (int, optional) – Number of grid cells in each dimension. Defaults to 16.
percent (list, optional) – Percentile range for grid boundaries. Defaults to [0.05, 99.95].
verbose (bool, optional) – Whether to print details. Defaults to False.
- Returns:
Dictionary containing grid information
- Return type:
dict
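A percentile-bounded grid of this kind can be sketched with NumPy alone. This is an assumption-laden illustration rather than the library's code: the function name `percentile_grid` and the dictionary keys are hypothetical, and `nxy` is treated as the number of cells, so each axis gets `nxy + 1` edges.

```python
import numpy as np

def percentile_grid(U0, U1, nxy=16, percent=(0.05, 99.95)):
    """Build a regular grid whose bounds are percentiles of the data.

    Hypothetical helper; the keys of the returned dict are illustrative.
    """
    U0 = np.asarray(U0, dtype=float)
    U1 = np.asarray(U1, dtype=float)

    # Percentile bounds clip the sparse outer tails of the embedding.
    xmin, xmax = np.percentile(U0, percent)
    ymin, ymax = np.percentile(U1, percent)

    # nxy cells per axis means nxy + 1 edges per axis.
    xedges = np.linspace(xmin, xmax, nxy + 1)
    yedges = np.linspace(ymin, ymax, nxy + 1)
    return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax,
                xedges=xedges, yedges=yedges,
                dx=xedges[1] - xedges[0], dy=yedges[1] - yedges[0])

# Toy embedding: 10,000 points drawn from a 2D normal distribution.
rng = np.random.default_rng(0)
U0 = rng.normal(size=10000)
U1 = rng.normal(size=10000)
grid = percentile_grid(U0, U1, nxy=16)
```

Using percentiles instead of the raw minimum and maximum keeps a handful of outliers from stretching the grid over mostly empty space.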
- nenya.nenya_umap.cutouts_on_umap_grid(tbl, nxy, umap_keys, min_pts=1)
Generate a list of cutouts uniformly distributed on the UMAP grid.
- Parameters:
- Returns:
Tuple of (filtered table, cutouts, umap_grid)
- Return type:
tuple
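One way to obtain cutouts spread uniformly over the grid is to pick a single representative per occupied cell. The sketch below assumes that approach and chooses the point nearest the cell centre; the helper `pick_one_per_cell` is hypothetical, and the library may select differently (for example at random).

```python
import numpy as np

def pick_one_per_cell(U0, U1, xedges, yedges, min_pts=1):
    """Map (i, j) cell indices to the row index of one representative.

    Hypothetical helper. Cells with fewer than min_pts points are skipped.
    """
    U0 = np.asarray(U0, dtype=float)
    U1 = np.asarray(U1, dtype=float)
    nx, ny = len(xedges) - 1, len(yedges) - 1

    # Cell index per point; points outside the grid (including those
    # exactly on the upper edge) are excluded by the `inside` mask.
    ix = np.digitize(U0, xedges) - 1
    iy = np.digitize(U1, yedges) - 1
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    picks = {}
    for i in range(nx):
        for j in range(ny):
            members = np.flatnonzero(inside & (ix == i) & (iy == j))
            if members.size < max(min_pts, 1):
                continue
            # Assumed rule: take the member closest to the cell centre.
            cx = 0.5 * (xedges[i] + xedges[i + 1])
            cy = 0.5 * (yedges[j] + yedges[j + 1])
            d2 = (U0[members] - cx) ** 2 + (U1[members] - cy) ** 2
            picks[(i, j)] = int(members[np.argmin(d2)])
    return picks

# Toy 2 x 2 grid with six points, one of them outside the grid.
edges = np.array([0.0, 1.0, 2.0])
U0 = np.array([0.5, 0.9, 1.5, 5.0, 1.5, 1.4])
U1 = np.array([0.5, 0.9, 0.2, 5.0, 1.5, 1.6])
picks = pick_one_per_cell(U0, U1, edges, edges, min_pts=1)
picks_strict = pick_one_per_cell(U0, U1, edges, edges, min_pts=2)
```

Raising `min_pts` drops sparsely populated cells, so the selected cutouts come only from well-sampled parts of the UMAP domain.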
- nenya.nenya_umap.regional_analysis(geo_region, tbl, nxy, umap_keys, min_counts=200)
Analyze the distribution of a geographic region in UMAP space.
- Parameters:
- Returns:
Tuple of (counts, counts_geo, tbl, grid, xedges, yedges)
- Return type:
tuple
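The returned `counts` and `counts_geo` suggest a pair of 2D histograms on a shared grid, one for all points and one for the points inside the region. The sketch below assumes exactly that and adds a relative-frequency ratio as an illustrative follow-up step. The helper `region_vs_all`, the boolean `in_region` mask, the min/max grid bounds, and the way `min_counts` masks cells are all assumptions, not the library's documented behaviour.

```python
import numpy as np

def region_vs_all(U0, U1, in_region, nxy=16, min_counts=200):
    """Histogram all points and region points on one shared grid.

    Hypothetical helper; in_region is an assumed boolean mask that marks
    rows falling inside the geographic region.
    """
    U0 = np.asarray(U0, dtype=float)
    U1 = np.asarray(U1, dtype=float)
    in_region = np.asarray(in_region, dtype=bool)

    # Shared edges so the two histograms are directly comparable.
    xedges = np.linspace(U0.min(), U0.max(), nxy + 1)
    yedges = np.linspace(U1.min(), U1.max(), nxy + 1)
    counts, _, _ = np.histogram2d(U0, U1, bins=[xedges, yedges])
    counts_geo, _, _ = np.histogram2d(U0[in_region], U1[in_region],
                                      bins=[xedges, yedges])

    # Assumed use of min_counts: ignore poorly sampled cells (NaN).
    ok = counts >= max(min_counts, 1)
    ratio = np.full(counts.shape, np.nan)
    if counts_geo.sum() > 0:
        ratio[ok] = ((counts_geo[ok] / counts_geo.sum())
                     / (counts[ok] / counts.sum()))
    return counts, counts_geo, ratio, xedges, yedges

# Toy data: uniform points, with the "region" defined as U0 > 7.
rng = np.random.default_rng(1)
U0 = rng.uniform(0, 10, 5000)
U1 = rng.uniform(0, 10, 5000)
in_region = U0 > 7.0
counts, counts_geo, ratio, xedges, yedges = region_vs_all(
    U0, U1, in_region, nxy=4, min_counts=50)
```

A ratio above 1 in a cell would indicate that the region is over-represented there relative to the full sample, and a ratio below 1 the opposite.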