Core Concepts

This page explains the key concepts and methodologies used in Nenya.

Self-Supervised Learning

Nenya uses self-supervised learning (SSL) techniques to learn representations from satellite imagery without explicit labels. Specifically, it implements contrastive learning approaches:

SimCLR

SimCLR (Simple Framework for Contrastive Learning of Visual Representations) works by:

Taking an input image and creating two augmented views
Passing both through an encoder network to get feature representations
Applying a projection head to map representations to a space where contrastive loss is applied
Training the network to bring positive pairs (augmentations of the same image) closer while pushing negative pairs (augmentations of different images) apart
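
The contrastive objective in the last step is commonly the NT-Xent loss from the SimCLR paper. The sketch below is a minimal NumPy version for illustration only; the function name, the temperature value, and the use of NumPy are assumptions, not Nenya's actual training code.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for a batch of paired projections (illustrative sketch).

    z1, z2: arrays of shape (N, D). Row i of z1 and row i of z2 are the
    projections of two augmented views of the same image.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors: dot = cosine
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own negative
    # The positive for row i is row i + N, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is small when each view is most similar to its partner and large when the partner is indistinguishable from the other images in the batch.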

SupCon

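Supervised Contrastive Learning (SupCon) extends SimCLR by using label information when it is available: all images that share a label are treated as positives for one another, not only augmentations of the same image. Nenya primarily uses the self-supervised approach.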
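
To make the difference from SimCLR concrete, here is a minimal NumPy sketch of a supervised contrastive loss in which positives are defined by shared labels. The function name and temperature are assumptions for illustration, not Nenya's implementation.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over projections z of shape (N, D) (sketch)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-comparisons
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    labels = np.asarray(labels)
    pos_mask = labels[:, None] == labels[None, :]      # same label -> positive pair
    np.fill_diagonal(pos_mask, False)
    n_pos = pos_mask.sum(axis=1)
    has_pos = n_pos > 0                                # skip anchors with no positives

    # np.where avoids multiplying the -inf diagonal by zero.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return -(pos_log_prob[has_pos] / n_pos[has_pos]).mean()
```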

Latent Space

The “latent space” refers to the compact representation of images learned by the model. In Nenya:

Each image is encoded into a single feature vector (typically 128 or 512 dimensions)
These vectors capture meaningful patterns and structures in the data
Similar ocean features should have similar latent representations

UMAP Dimensionality Reduction

Uniform Manifold Approximation and Projection (UMAP) is used to reduce the high-dimensional latent space to 2D for visualization:

UMAP aims to preserve the local neighborhood structure of the data while retaining some of its global structure
The 2D coordinates (U0, U1) allow for visual exploration of the latent space
Points close in the UMAP visualization represent similar ocean patterns

DT (Temperature Difference)

DT is a key metric in Nenya representing the temperature difference within an image:

Calculated as the difference between the 90th and 10th percentile temperatures in the image
Reflects the overall temperature contrast in the oceanic region (a percentile range, not a spatial gradient)
Higher DT values typically indicate boundaries between water masses or frontal regions
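
The percentile definition above can be sketched in a few lines of NumPy. The function name is hypothetical, and ignoring NaN pixels (for example, masked values) is an assumption rather than documented Nenya behavior.

```python
import numpy as np

def temperature_difference(image):
    """Return DT = T90 - T10 for an array of temperatures, ignoring NaNs."""
    t10, t90 = np.nanpercentile(image, [10, 90])
    return float(t90 - t10)

# A synthetic "front": the left half is 10 degrees, the right half is 14 degrees.
front = np.full((64, 64), 10.0)
front[:, 32:] = 14.0
print(temperature_difference(front))  # 4.0
```

A spatially uniform image gives a DT of 0, while an image that straddles two water masses gives a DT close to their temperature offset.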

Data Preprocessing

Before training or inference, images undergo several preprocessing steps:

Rotation: Random rotations for data augmentation
Flipping: Random horizontal and vertical flips
Jitter and Crop: Random spatial jittering and cropping
Normalization: Demeaning the image (subtracting the mean)
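
The four steps above might be combined roughly as follows. This is a simplified sketch under stated assumptions: rotations are restricted to multiples of 90 degrees to avoid interpolation, and the function name, crop size, and jitter range are illustrative values, not Nenya's settings.

```python
import numpy as np

def augment(image, rng, crop_size=40, max_jitter=4):
    """Return one randomly augmented, demeaned view of a 2D image (sketch)."""
    # Rotation: simplified here to a random multiple of 90 degrees.
    view = np.rot90(image, k=int(rng.integers(0, 4)))
    # Flipping: independent horizontal and vertical flips.
    if rng.random() < 0.5:
        view = np.fliplr(view)
    if rng.random() < 0.5:
        view = np.flipud(view)
    # Jitter and crop: shift a central crop window by a random offset.
    h, w = view.shape
    if min(h, w) - crop_size < 2 * max_jitter:
        raise ValueError("image too small for this crop_size and max_jitter")
    dy, dx = rng.integers(-max_jitter, max_jitter + 1, size=2)
    top = (h - crop_size) // 2 + dy
    left = (w - crop_size) // 2 + dx
    view = view[top:top + crop_size, left:left + crop_size]
    # Normalization: subtract the mean of the cropped view.
    return view - view.mean()

rng = np.random.default_rng(0)
image = rng.normal(loc=15.0, scale=1.0, size=(64, 64))
view_a, view_b = augment(image, rng), augment(image, rng)  # a positive pair
```

Calling the function twice on the same image yields the two augmented views that contrastive training treats as a positive pair.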

Data Organization

Nenya organizes data in two main file formats:

HDF5 files: Store preprocessed images and extracted latent vectors
Parquet tables: Store metadata and UMAP coordinates for efficient querying

Visualization Portal

The interactive portal provides tools for exploring the latent space:

UMAP Plot: Shows the 2D embedding of all images
Image Gallery: Displays actual images corresponding to points in the UMAP space
Geographic View: Shows the geographic distribution of selected points
Matching: Finds similar images based on proximity in latent space
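
Matching by proximity in latent space amounts to a nearest-neighbor search over the latent vectors. The sketch below uses cosine similarity, which is an assumption; the metric the portal actually uses is not stated here, and the function name is hypothetical.

```python
import numpy as np

def find_matches(latents, query_index, k=5):
    """Return indices of the k latent vectors most similar to the query (sketch)."""
    unit = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    similarity = unit @ unit[query_index]      # cosine similarity to the query
    similarity[query_index] = -np.inf          # never return the query itself
    return np.argsort(similarity)[::-1][:k]    # most similar first
```

For large catalogs a brute-force search like this becomes slow, and an approximate nearest-neighbor index is the usual alternative.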

Models

Nenya includes several pre-trained models:

v4: MODIS imagery model (earlier version)
v5: MODIS imagery model (improved version)
viirs_v1: VIIRS imagery model
LLC: MIT General Circulation Model (MITgcm) imagery model
CF: Cloud-free imagery model

Each model has been trained on a specific dataset (satellite observations or ocean model output) and may be specialized for different oceanic phenomena.