UMAP Analysis
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique used in Nenya to visualize high-dimensional latent spaces in 2D. This enables exploration of patterns and relationships in satellite imagery data.
Loading UMAP Models
To load a pre-trained UMAP model:
from nenya import nenya_umap
# Load UMAP model for a specific Nenya model
umap_model, table_file = nenya_umap.load('v5', DT=2.5)
# Available models: 'LLC', 'LLC_local', 'CF', 'v4', 'v5', 'viirs_v1'
# Optional DT parameter (in K) selects the UMAP built for cutouts in the matching DT (temperature difference) interval
The load function returns:
A trained UMAP model that can project new data
Path to a table file with pre-computed UMAP coordinates for the dataset
Creating a UMAP Model
To create a new UMAP model from latent vectors:
import pandas as pd
from nenya import nenya_umap
# Load table with metadata
tbl = pd.read_parquet('path/to/table.parquet')
# Run UMAP on the data
nenya_umap.umap_subset(
tbl=tbl,
opt_path='path/to/opts.json',
outfile='output_table.parquet',
DT_cut='DT2', # Filter by DT value
ntrain=200000, # Number of samples to use for training
umap_savefile='umap_model.pkl' # Where to save the UMAP model
)
This function:
Filters the data based on specified criteria (DT, alpha, etc.)
Loads latent vectors for the selected data
Trains a UMAP model on a random subset
Projects all data to the 2D UMAP space
Saves the results to a new table file
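The steps above can be outlined in plain NumPy/pandas. This is a hypothetical sketch, not Nenya's implementation. `umap_subset_sketch` is an illustrative name, and the latent vectors are passed in as an array aligned with the table rows rather than loaded from disk. `reducer` is any object with scikit-learn style `fit`/`transform` methods, such as `umap.UMAP(n_components=2)` from umap-learn.

```python
import numpy as np
import pandas as pd


def umap_subset_sketch(tbl, latents, mask, reducer, ntrain=200_000, seed=0):
    """Hypothetical outline of the umap_subset steps (not Nenya's code).

    tbl     : DataFrame with one row per cutout
    latents : array of shape (len(tbl), n_features), aligned with tbl rows
    mask    : boolean array selecting rows (e.g. a DT cut)
    reducer : object with fit/transform methods, e.g. umap.UMAP(n_components=2)
    """
    mask = np.asarray(mask, dtype=bool)

    # Step 1: keep only the rows that pass the selection criteria
    sub_tbl = tbl.loc[mask].copy()
    sub_lat = np.asarray(latents)[mask]

    # Step 2: train on a random subset of at most ntrain rows
    rng = np.random.default_rng(seed)
    n = min(ntrain, len(sub_tbl))
    train_idx = rng.choice(len(sub_tbl), size=n, replace=False)
    reducer.fit(sub_lat[train_idx])

    # Step 3: project every selected row into 2D and store as US0/US1
    emb = reducer.transform(sub_lat)
    sub_tbl['US0'] = emb[:, 0]
    sub_tbl['US1'] = emb[:, 1]
    return sub_tbl, reducer
```

Saving is left out of the sketch. With pyarrow or fastparquet installed, `sub_tbl.to_parquet('output_table.parquet')` writes the result. The design choice is to fit on a subset and then call `transform` on all rows, which keeps training cost bounded for large tables.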
UMAP DT Filtering
Separate UMAP models can be built for cutouts in a given DT (temperature difference) range, to focus on specific oceanic features:
# DT intervals are defined in nenya.defs
umap_DT = {
'DT0': (0.25, 0.25), # DT around 0.25K (±0.25)
'DT1': (0.75, 0.25), # DT around 0.75K (±0.25)
'DT15': (1.25, 0.25), # DT around 1.25K (±0.25)
'DT2': (2.0, 0.5), # DT around 2K (±0.5)
'DT4': (3.25, 0.75), # DT around 3.25K (±0.75)
'DT5': (4.0, -1), # DT >= 4K
'all': None # No DT filtering
}
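The (center, half-width) pairs can be turned into a row selection. The sketch below makes three assumptions: intervals are half-open, `[center - half_width, center + half_width)`, so adjacent bins do not overlap; a negative half-width means an open-ended bin; and DT values come from the table as an array. Nenya's own boundary handling may differ. `dt_mask` is a hypothetical helper.

```python
import numpy as np

# (center, half-width) in K, copied from the umap_DT table above;
# a negative half-width marks an open-ended bin, None means no filtering
umap_DT = {
    'DT0': (0.25, 0.25),
    'DT1': (0.75, 0.25),
    'DT15': (1.25, 0.25),
    'DT2': (2.0, 0.5),
    'DT4': (3.25, 0.75),
    'DT5': (4.0, -1),
    'all': None,
}


def dt_mask(dt_values, key, intervals=umap_DT):
    """Boolean mask of cutouts whose DT falls in the named interval.

    Assumes half-open bins [center - half_width, center + half_width).
    """
    dt = np.asarray(dt_values, dtype=float)
    spec = intervals[key]
    if spec is None:                 # 'all': keep every row
        return np.ones(dt.shape, dtype=bool)
    center, half_width = spec
    if half_width < 0:               # open-ended bin, e.g. DT5 means DT >= 4 K
        return dt >= center
    lo, hi = center - half_width, center + half_width
    return (dt >= lo) & (dt < hi)
```

With half-open bins, DT0 through DT5 cover every DT >= 0 with no gaps or double counting. For example, 0.5, 1.0, 1.5, 2.5 and 4.0 K each land in exactly one bin. Typical use would be `tbl[dt_mask(tbl['DT'], 'DT2')]`, assuming the table stores DT in a column named `DT`.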
To apply a DT filter:
# For DT around 2K (±0.5)
umap_model, table_file = nenya_umap.load('v5', DT=2.0)
Working with UMAP Coordinates
The resulting table contains UMAP coordinates as columns ‘US0’ and ‘US1’:
import matplotlib.pyplot as plt
import pandas as pd
# Load table with UMAP coordinates (table_file is returned by nenya_umap.load)
umap_tbl = pd.read_parquet(table_file)
# Access UMAP coordinates
u0 = umap_tbl.US0.values
u1 = umap_tbl.US1.values
# Plot UMAP coordinates with small, translucent markers
plt.figure(figsize=(10, 8))
plt.scatter(u0, u1, s=1, alpha=0.5)
plt.xlabel('U0')
plt.ylabel('U1')
plt.title('UMAP Embedding')
plt.show()
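With hundreds of thousands of points, a scatter plot saturates. A 2D histogram of the same coordinates shows density more clearly. The following is a minimal sketch using `np.histogram2d`. `umap_density` and the percentile clipping window are illustrative choices, not part of Nenya.

```python
import numpy as np


def umap_density(u0, u1, nbins=100, percent=(0.5, 99.5)):
    """2D histogram of UMAP coordinates, clipped to a percentile window."""
    u0 = np.asarray(u0, dtype=float)
    u1 = np.asarray(u1, dtype=float)
    # Clip the plotting range so a few outliers do not squash the bulk
    xlim = np.percentile(u0, percent)
    ylim = np.percentile(u1, percent)
    # Points outside the range are dropped by histogram2d
    counts, xedges, yedges = np.histogram2d(u0, u1, bins=nbins, range=[xlim, ylim])
    return counts, xedges, yedges
```

To plot the result, call `plt.pcolormesh(xedges, yedges, counts.T)`. The transpose is needed because the first axis of the `histogram2d` output runs along x.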
Creating a UMAP Grid
To create a regular grid in UMAP space for analysis:
# Create a grid with 16x16 cells
umap_grid = nenya_umap.grid_umap(
umap_tbl.US0.values,
umap_tbl.US1.values,
nxy=16,
percent=[0.05, 99.95] # Percentile range to use for boundaries
)
# The grid contains:
# - xmin, xmax, ymin, ymax: Boundaries
# - xval, yval: Grid edge coordinates
# - dxv, dyv: Cell dimensions
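The grid fields listed above can be reconstructed from percentile bounds and `nxy` equal-width cells per axis. This is an assumed reconstruction of what `grid_umap` returns, with field names copied from the comments above; it is not the function's actual source, which may, for instance, pad or square the bounds.

```python
import numpy as np


def grid_umap_sketch(u0, u1, nxy=16, percent=(0.05, 99.95)):
    """Hypothetical re-implementation of the UMAP grid fields."""
    # Percentile bounds ignore extreme outliers in each UMAP axis
    xmin, xmax = np.percentile(u0, percent)
    ymin, ymax = np.percentile(u1, percent)
    # Equal-width cells: nxy cells means nxy + 1 edges per axis
    dxv = (xmax - xmin) / nxy
    dyv = (ymax - ymin) / nxy
    xval = xmin + dxv * np.arange(nxy + 1)
    yval = ymin + dyv * np.arange(nxy + 1)
    return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax,
                xval=xval, yval=yval, dxv=dxv, dyv=dyv)
```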
Selecting Cutouts with UMAP
To select representative cutouts across UMAP space:
# Select cutouts uniformly distributed in UMAP space
filtered_tbl, cutouts, umap_grid = nenya_umap.cutouts_on_umap_grid(
tbl=umap_tbl,
nxy=16,
umap_keys=('US0', 'US1'),
min_pts=1 # Minimum points required in each grid cell
)
# cutouts is a list of rows from tbl, one for each grid cell (or None if empty)
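One way to pick a representative cutout for each grid cell is to collect the rows whose (US0, US1) fall inside the cell and, if there are at least `min_pts`, choose one at random. The random choice and the half-open cell boundaries are assumptions for illustration. `cutouts_on_umap_grid` may select rows differently (for example, nearest the cell center), and `cutouts_on_grid_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def cutouts_on_grid_sketch(tbl, xval, yval, umap_keys=('US0', 'US1'),
                           min_pts=1, seed=0):
    """Return one random row per grid cell, or None for sparse cells."""
    rng = np.random.default_rng(seed)
    x = tbl[umap_keys[0]].to_numpy()
    y = tbl[umap_keys[1]].to_numpy()
    picks = []
    # Cells are visited x-major: cell (i, j) is picks[i * (len(yval) - 1) + j]
    for i in range(len(xval) - 1):
        for j in range(len(yval) - 1):
            in_cell = ((x >= xval[i]) & (x < xval[i + 1]) &
                       (y >= yval[j]) & (y < yval[j + 1]))
            idx = np.flatnonzero(in_cell)
            if len(idx) < min_pts:
                picks.append(None)
            else:
                picks.append(tbl.iloc[rng.choice(idx)])
    return picks
```

With half-open cells, points lying exactly on the outermost top or right edge are not assigned to any cell. A real implementation would need to decide how to handle them.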
Regional Analysis with UMAP
To analyze geographic regions in UMAP space:
# Analyze a specific geographic region
counts, counts_geo, tbl, grid, xedges, yedges = nenya_umap.regional_analysis(
geo_region='eqpacific', # Name of region defined in defs.py
tbl=umap_tbl,
nxy=16,
umap_keys=('US0', 'US1'),
min_counts=200
)
# counts: Histogram of all points
# counts_geo: Histogram of points in the region
# grid: Grid information
# xedges, yedges: Histogram bin edges
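One way to compare `counts` and `counts_geo` is as a per-cell fraction: the share of each cell's cutouts that come from the region. The sketch below computes that ratio and masks cells with fewer than `min_counts` cutouts. Treating `min_counts` this way is an assumption about the parameter's role, and `regional_fraction` is a hypothetical helper.

```python
import numpy as np


def regional_fraction(u0, u1, in_region, xedges, yedges, min_counts=200):
    """Per-cell fraction of cutouts that lie in a geographic region."""
    u0 = np.asarray(u0, dtype=float)
    u1 = np.asarray(u1, dtype=float)
    in_region = np.asarray(in_region, dtype=bool)
    # Histogram of all points and of the regional subset on the same bins
    counts, _, _ = np.histogram2d(u0, u1, bins=[xedges, yedges])
    counts_geo, _, _ = np.histogram2d(u0[in_region], u1[in_region],
                                      bins=[xedges, yedges])
    # NaN where a cell has too few cutouts for a stable ratio
    frac = np.full(counts.shape, np.nan)
    ok = counts >= min_counts
    frac[ok] = counts_geo[ok] / counts[ok]
    return counts, counts_geo, frac
```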
Geographic regions are defined in defs.py as longitude/latitude boxes in degrees:
geo_regions = {
'coastalcali': {'lons': [-128, -118], 'lats': [32, 40]},
'eqpacific': {'lons': [-140, -90], 'lats': [-5, 5]},
'eqindian': {'lons': [60, 90], 'lats': [-5, 5]},
# And others...
}
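The region boxes translate directly into a boolean row mask. This sketch assumes the table stores longitude and latitude in degrees in columns named `lon` and `lat` (hypothetical names, so adjust them to your table), with longitudes in the -180 to 180 convention. `in_geo_region` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Subset of the region boxes shown above
geo_regions = {
    'coastalcali': {'lons': [-128, -118], 'lats': [32, 40]},
    'eqpacific': {'lons': [-140, -90], 'lats': [-5, 5]},
    'eqindian': {'lons': [60, 90], 'lats': [-5, 5]},
}


def in_geo_region(tbl, region, regions=geo_regions, lon_key='lon', lat_key='lat'):
    """Boolean mask of rows whose (lon, lat) lie inside the named box."""
    box = regions[region]
    lon = tbl[lon_key].to_numpy()
    lat = tbl[lat_key].to_numpy()
    return ((lon >= box['lons'][0]) & (lon <= box['lons'][1]) &
            (lat >= box['lats'][0]) & (lat <= box['lats'][1]))
```

If longitudes are stored as 0 to 360, convert them first with `((lon + 180) % 360) - 180`. Boxes that cross the antimeridian would need a separate test.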
Embedding New Images
To embed a new image in an existing UMAP space:
from nenya import analyze_image
# Embed a single image in UMAP space; `image` is the cutout to analyze
embedding, pp_img, table_file, DT, latents = analyze_image.umap_image('v5', image)
# embedding contains the UMAP coordinates (U0, U1) for the image
This function:
Loads the appropriate Nenya model
Extracts latent vectors from the image
Calculates DT (temperature difference)
Projects the latent vector to UMAP space
Returns the UMAP coordinates and other information
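The DT step can be sketched as a percentile spread of the temperatures in the cutout. This assumes DT is the 90th minus the 10th percentile of the cutout's SST values. That definition is not stated in this section, so check it against `analyze_image` before relying on it. `cutout_DT` is a hypothetical name.

```python
import numpy as np


def cutout_DT(img):
    """Assumed DT: 90th minus 10th percentile of the cutout temperatures (K)."""
    arr = np.asarray(img, dtype=float)
    # nanpercentile skips NaN pixels (e.g. masked land or cloud)
    t10, t90 = np.nanpercentile(arr, [10, 90])
    return float(t90 - t10)
```

Because `np.nanpercentile` ignores NaN pixels, partially masked cutouts still return a value. The resulting DT is the quantity compared against the `umap_DT` intervals.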
Visualizing UMAP Embeddings
For interactive visualization, use the portal functionality described in Interactive Visualization.
Tips for UMAP Analysis
Training Size: Training UMAP on a random subset (e.g., 200,000 samples) is usually sufficient; the full dataset is then projected with the trained model
Filtering: Consider filtering by DT or other criteria to focus on specific phenomena
Normalization: Normalize latent vectors before UMAP if not already done
Parameters: Experiment with UMAP parameters (n_neighbors, min_dist) if needed
Geographic Analysis: Compare UMAP patterns with geographic distributions
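For the normalization tip, one option is to scale each latent vector to unit length. This sketch assumes L2 normalization is what you want. The section does not say which normalization Nenya applies, and standardizing each feature is another option. `l2_normalize` is a hypothetical helper.

```python
import numpy as np


def l2_normalize(latents, eps=1e-12):
    """Scale each latent vector (row) to unit Euclidean length."""
    X = np.asarray(latents, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows
    return X / np.maximum(norms, eps)
```

Alternatively, umap-learn can be told to ignore vector length by using a cosine distance, `umap.UMAP(metric='cosine')`.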