Core Concepts
This page explains the key concepts and methodologies used in Nenya.
Self-Supervised Learning
Nenya uses self-supervised learning (SSL) techniques to learn representations from satellite imagery without explicit labels. Specifically, it implements contrastive learning approaches:
SimCLR
SimCLR (Simple Framework for Contrastive Learning of Visual Representations) works by:
Taking an input image and creating two augmented views
Passing both through an encoder network to get feature representations
Applying a projection head to map the representations into the space where the contrastive loss is computed
Training the network to bring positive pairs (augmentations of the same image) closer while pushing negative pairs (augmentations of different images) apart
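The four steps above end in a contrastive loss. The sketch below shows one common form of that loss, the normalized temperature-scaled cross-entropy (NT-Xent) used in the SimCLR paper, written in plain NumPy. It is an illustration only and is not taken from the Nenya code base: the function name `nt_xent_loss`, the default `temperature`, and the array layout are assumptions made for this example.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for a batch of paired projections (hypothetical helper).

    z_a[i] and z_b[i] are the projection-head outputs for two augmented
    views of image i. Both arrays have shape (N, D).
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit length, so dot = cosine
    sim = (z @ z.T) / temperature                       # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # exclude each view's self-similarity

    # Row i's positive partner is the other view of the same image.
    positive = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive partner.
    return -log_prob[np.arange(2 * n), positive].mean()
```

The loss is small when each view is most similar to its own partner and grows when views of different images are just as similar, which is the "pull positives together, push negatives apart" behavior described above.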
SupCon
Supervised Contrastive Learning (SupCon) extends SimCLR by using label information, when available, so that images sharing a label are also treated as positive pairs. Nenya primarily uses the self-supervised approach.
Latent Space
The “latent space” refers to the compact representation of images learned by the model. In Nenya:
Each image is encoded into a feature vector (typically 128 or 512 dimensions)
These vectors capture meaningful patterns and structures in the data
Similar ocean features should have similar latent representations
UMAP Dimensionality Reduction
Uniform Manifold Approximation and Projection (UMAP) is used to reduce the high-dimensional latent space to 2D for visualization:
UMAP aims to preserve local neighborhood structure while retaining much of the global structure of the data
The 2D coordinates (U0, U1) allow for visual exploration of the latent space
Points that lie close together in the UMAP visualization generally represent similar ocean patterns
DT (Temperature Difference)
DT is a key metric in Nenya representing the temperature difference within an image:
Calculated as the difference between the 90th and 10th percentile temperatures in the image
Reflects the temperature gradient or contrast in the oceanic region
Higher DT values typically indicate boundaries between water masses or frontal regions
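As a minimal sketch of the definition above, DT can be computed from a 2D temperature array with NumPy percentiles. The function name `temperature_difference` and the choice to ignore NaN pixels (for example, masked pixels) are assumptions for illustration, not Nenya's actual implementation.

```python
import numpy as np


def temperature_difference(image, lower=10.0, upper=90.0):
    """Return DT = T_upper - T_lower for one image (hypothetical helper).

    NaN pixels are ignored, which is an assumption about how masked
    values would be handled.
    """
    t_low, t_high = np.nanpercentile(image, [lower, upper])
    return float(t_high - t_low)


# A synthetic cutout with a smooth 10-to-20 degree ramp across each row.
cutout = np.tile(np.linspace(10.0, 20.0, 101), (64, 1))
dt = temperature_difference(cutout)
```

Using percentiles rather than the minimum and maximum makes the metric less sensitive to a few outlier pixels.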
Data Preprocessing
Before training or inference, images undergo several preprocessing steps:
Rotation: Random rotations for data augmentation
Flipping: Random horizontal and vertical flips
Jitter and Crop: Random spatial jittering and cropping
Normalization: Demeaning the image (subtracting its mean value from every pixel)
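The steps listed above could be chained roughly as follows. This is a sketch under stated assumptions rather than Nenya's pipeline: the function `augment`, its parameters `crop_size` and `max_jitter`, the restriction of rotations to multiples of 90 degrees, and the order of operations are all hypothetical choices made to keep the example dependency-free.

```python
import numpy as np


def augment(image, crop_size, max_jitter, rng):
    """Return a rotated, flipped, jitter-cropped, demeaned copy of a 2D image."""
    # Rotation (assumption: 90-degree steps only, to avoid interpolation).
    out = np.rot90(image, k=int(rng.integers(0, 4)))

    # Flipping: each axis is flipped with probability 0.5.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    if rng.random() < 0.5:
        out = np.flipud(out)

    # Jitter and crop: shift a centered crop window by up to max_jitter pixels,
    # clipping so the window always stays inside the image.
    height, width = out.shape
    top = (height - crop_size) // 2 + int(rng.integers(-max_jitter, max_jitter + 1))
    left = (width - crop_size) // 2 + int(rng.integers(-max_jitter, max_jitter + 1))
    top = int(np.clip(top, 0, height - crop_size))
    left = int(np.clip(left, 0, width - crop_size))
    out = out[top:top + crop_size, left:left + crop_size]

    # Normalization: subtract the mean of the cropped image.
    return out - out.mean()


rng = np.random.default_rng(1)
image = rng.normal(loc=15.0, size=(64, 64))
view = augment(image, crop_size=48, max_jitter=5, rng=rng)
```

Calling `augment` twice on the same image yields the two views that a contrastive method such as SimCLR treats as a positive pair.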
Data Organization
Nenya stores data in two main file formats:
HDF5 files: Store preprocessed images and extracted latent vectors
Parquet tables: Store metadata and UMAP coordinates for efficient querying
Visualization Portal
The interactive portal provides tools for exploring the latent space:
UMAP Plot: Shows the 2D embedding of all images
Image Gallery: Displays actual images corresponding to points in the UMAP space
Geographic View: Shows the geographic distribution of selected points
Matching: Finds similar images based on proximity in latent space
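To make the Matching idea concrete, here is a minimal nearest-neighbor sketch over an array of latent vectors. The function name `find_matches` and the use of cosine similarity as the distance measure are assumptions for illustration; the portal may use a different metric or search in UMAP space instead.

```python
import numpy as np


def find_matches(latents, query, k=5):
    """Return indices of the k latent vectors most similar to `query`.

    latents has shape (N, D) and query has shape (D,). Similarity is
    cosine similarity (an assumption), with the most similar index first.
    """
    unit = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    similarity = unit @ q
    return np.argsort(-similarity)[:k]


# Random stand-in for a table of 128-dimensional latent vectors.
rng = np.random.default_rng(2)
latents = rng.normal(size=(100, 128))
matches = find_matches(latents, latents[3], k=5)
```

When the query is itself one of the stored vectors, it is returned as its own best match, so a gallery would usually skip the first index.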
Models
Nenya includes several pre-trained models:
v4: MODIS imagery model (earlier version)
v5: MODIS imagery model (improved version)
viirs_v1: VIIRS imagery model
LLC: MIT General Circulation Model (MITgcm) imagery model
CF: Model specific to cloud-free imagery
Each model was trained on a specific dataset (satellite imagery or, for LLC, numerical model output) and may be specialized for different oceanic phenomena.