Core Concepts

This page explains the key concepts and methodologies used in Nenya.

Self-Supervised Learning

Nenya uses self-supervised learning (SSL) techniques to learn representations from satellite imagery without explicit labels. Specifically, it implements contrastive learning approaches:

SimCLR

SimCLR (Simple Framework for Contrastive Learning of Visual Representations) works by:

  1. Taking an input image and creating two augmented views

  2. Passing both through an encoder network to get feature representations

  3. Applying a projection head to map representations to a space where contrastive loss is applied

  4. Training the network to bring positive pairs (augmentations of the same image) closer while pushing negative pairs (augmentations of different images) apart
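Step 4 is usually implemented with the NT-Xent (normalized temperature-scaled cross-entropy) loss from the SimCLR paper. The NumPy sketch below illustrates that loss under stated assumptions. It is not Nenya's training code, and the function name nt_xent_loss and the default temperature of 0.5 are choices made for this example.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for a batch of N positive pairs (z1[i], z2[i]).

    z1 and z2 are projection-head outputs for two augmented views of the same
    N images; every other view in the 2N batch serves as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors: dot product = cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never compared with itself

    # Row i's positive is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Cross-entropy -log(exp(s_pos) / sum_k exp(s_k)), using a stable log-sum-exp.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())


rng = np.random.default_rng(0)
views = rng.normal(size=(8, 128))
print(nt_xent_loss(views, views + 0.01 * rng.normal(size=views.shape)))  # lower: the two views agree
print(nt_xent_loss(views, rng.normal(size=(8, 128))))                    # higher: the "views" are unrelated
```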

SupCon

Supervised Contrastive Learning extends SimCLR by allowing the use of label information when available, though Nenya primarily uses the self-supervised approach.
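The only change SupCon makes is in which pairs count as positives: every other sample with the same label, rather than just the other augmented view of the same image. A hedged NumPy sketch of the loss as formulated in the SupCon paper follows. The function name supcon_loss, the temperature of 0.1, and the toy data are assumptions for illustration, not Nenya's implementation.

```python
import numpy as np


def supcon_loss(z: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive loss: every other sample with the anchor's label is a positive.

    Assumes each label appears at least twice in the batch.
    """
    n = z.shape[0]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, (z @ z.T) / temperature)  # exclude self-comparisons

    # Log-softmax of each anchor's similarities over all other samples (stable form).
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Positives: same label, excluding the anchor itself; average log-prob over them.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return float(-mean_log_prob_pos.mean())


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 3)                    # 4 classes, 3 samples each
centers = rng.normal(size=(4, 64))
clustered = centers[labels] + 0.05 * rng.normal(size=(12, 64))
print(supcon_loss(clustered, labels))                  # lower: same-label samples are close
print(supcon_loss(rng.normal(size=(12, 64)), labels))  # higher: labels ignore the geometry
```

If the labels only identify the two views of each image (each image index used as the label of both of its views), every anchor has exactly one positive and the loss reduces to SimCLR's NT-Xent objective.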

Latent Space

The “latent space” refers to the compact representation of images learned by the model. In Nenya:

  • Images are encoded into a feature vector (typically 128 or 512 dimensions)

  • These vectors capture meaningful patterns and structures in the data

  • Similar ocean features should have similar latent representations
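What "similar latent representations" means depends on the distance measure. A common choice for contrastive embeddings is cosine similarity; the sketch assumes that measure and uses random 512-dimensional vectors as stand-ins for encoder outputs, so it illustrates the idea rather than Nenya's own code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two latent vectors: 1 = same direction, 0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
latent_a = rng.normal(size=512)                   # stand-in for an encoder output
latent_b = latent_a + 0.1 * rng.normal(size=512)  # a slightly perturbed version of it
latent_c = rng.normal(size=512)                   # an unrelated vector
print(cosine_similarity(latent_a, latent_b))      # close to 1
print(cosine_similarity(latent_a, latent_c))      # close to 0
```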

UMAP Dimensionality Reduction

Uniform Manifold Approximation and Projection (UMAP) is used to reduce the high-dimensional latent space to 2D for visualization:

  • UMAP preserves local neighborhood structure well and retains some global structure, though distances between far-apart clusters are only approximate

  • The 2D coordinates (U0, U1) allow for visual exploration of the latent space

  • Points close together in the UMAP plot generally represent similar ocean patterns

DT (Temperature Difference)

DT is a key metric in Nenya representing the temperature difference within an image:

  • Calculated as the difference between the 90th and 10th percentile temperatures in the image

  • Reflects the temperature gradient or contrast in the oceanic region

  • Higher DT values typically indicate boundaries between water masses or frontal regions
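The percentile definition translates directly into NumPy. This is a minimal sketch: the function name compute_dt and the assumption that masked pixels (land, cloud) are stored as NaN are illustrative, not taken from Nenya.

```python
import numpy as np


def compute_dt(sst: np.ndarray) -> float:
    """DT = T90 - T10, the spread between the 90th and 10th percentile temperatures.

    np.nanpercentile skips NaN pixels; treating masked pixels as NaN is an assumption.
    """
    t90, t10 = np.nanpercentile(sst, [90, 10])
    return float(t90 - t10)


front = np.full((64, 64), 15.0)
front[:, 32:] = 20.0                        # a sharp front between 15 °C and 20 °C water
print(compute_dt(front))                    # 5.0
print(compute_dt(np.full((64, 64), 18.0)))  # 0.0, a uniform field has no contrast
```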

Data Preprocessing

Before training or inference, images are demeaned; during training, random transformations are also applied as data augmentation:

  • Rotation: Random rotations for data augmentation

  • Flipping: Random horizontal and vertical flips

  • Jitter and Crop: Random spatial jittering and cropping

  • Normalization: Demeaning the image (subtracting the mean)
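A minimal NumPy sketch of these four steps, for illustration only. Several details are assumptions: rotations are restricted to multiples of 90° (Nenya may use arbitrary angles), the crop size and jitter range are arbitrary, demeaning is applied last, and the function names demean and augment are hypothetical.

```python
import numpy as np


def demean(img: np.ndarray) -> np.ndarray:
    """Normalization: subtract the image mean so only relative temperature structure remains."""
    return img - img.mean()


def augment(img: np.ndarray, crop: int, max_jitter: int, rng: np.random.Generator) -> np.ndarray:
    """Random rotation, flips, and jitter-and-crop for a square image.

    Requires max_jitter <= (img.shape[0] - crop) // 2 so the crop stays inside the image.
    """
    img = np.rot90(img, k=rng.integers(4))  # rotation by 0, 90, 180 or 270 degrees
    if rng.random() < 0.5:
        img = img[:, ::-1]                  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                  # vertical flip
    # Jitter: shift the crop window randomly around the image center, then crop.
    top = (img.shape[0] - crop) // 2 + rng.integers(-max_jitter, max_jitter + 1)
    left = (img.shape[1] - crop) // 2 + rng.integers(-max_jitter, max_jitter + 1)
    return img[top:top + crop, left:left + crop]


rng = np.random.default_rng(0)
sst = 15.0 + rng.normal(size=(64, 64))      # stand-in 64x64 temperature image
view = demean(augment(sst, crop=56, max_jitter=4, rng=rng))
print(view.shape)                           # (56, 56)
```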

Data Organization

Nenya organizes data in several formats:

  • HDF5 files: Store preprocessed images and extracted latent vectors

  • Parquet tables: Store metadata and UMAP coordinates for efficient querying

Visualization Portal

The interactive portal provides tools for exploring the latent space:

  • UMAP Plot: Shows the 2D embedding of all images

  • Image Gallery: Displays actual images corresponding to points in the UMAP space

  • Geographic View: Shows the geographic distribution of selected points

  • Matching: Finds similar images based on proximity in latent space
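Matching can be understood as a nearest-neighbor search over latent vectors. The sketch below assumes Euclidean distance and a brute-force scan; the portal's actual metric and search method are not specified here, and find_matches is a hypothetical name.

```python
import numpy as np


def find_matches(latents: np.ndarray, query_idx: int, k: int = 5) -> np.ndarray:
    """Return indices of the k images whose latent vectors are closest (Euclidean) to the query."""
    dists = np.linalg.norm(latents - latents[query_idx], axis=1)
    dists[query_idx] = np.inf  # exclude the query image itself
    return np.argsort(dists)[:k]


latents = np.arange(6)[:, None] * np.ones((1, 3))  # toy data: row i is [i, i, i]
print(find_matches(latents, query_idx=0, k=3))     # [1 2 3]
```

A brute-force scan costs O(N) per query, which is fine for modest catalogs; very large ones usually call for a spatial or approximate nearest-neighbor index.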

Models

Nenya includes several pre-trained models:

  • v4: MODIS imagery model (earlier version)

  • v5: MODIS imagery model (improved version)

  • viirs_v1: VIIRS imagery model

  • LLC: MIT General Circulation Model imagery

  • CF: Model specific to cloud-free imagery

Each model was trained on a specific dataset (satellite imagery or, for LLC, simulation output) and may be specialized for different oceanic phenomena.