.. _concepts:

Core Concepts
=============

This page explains the key concepts and methodologies used in Nenya.

Self-Supervised Learning
------------------------

Nenya uses self-supervised learning (SSL) techniques to learn representations from satellite imagery without explicit labels. Specifically, it implements contrastive learning approaches:

SimCLR
~~~~~~

SimCLR (Simple Framework for Contrastive Learning of Visual Representations) works by:

1. Taking an input image and creating two augmented views
2. Passing both through an encoder network to get feature representations
3. Applying a projection head that maps the representations to the space where the contrastive loss is computed
4. Training the network to bring positive pairs (augmentations of the same image) closer while pushing negative pairs (augmentations of different images) apart
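
The four steps above end in a contrastive loss. The NumPy sketch below shows the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR, to illustrate how positive and negative pairs enter the objective. It is a minimal illustration, not Nenya's training code: the function name ``nt_xent_loss``, the ``temperature`` default, and the random stand-in projections are all assumptions made for this example.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two batches of projections, each of shape (N, D).

    Row i of z1 and row i of z2 are the two augmented views of image i.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit length, so dot product = cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never compared with itself

    n = z1.shape[0]
    # Index of the positive partner for each row: i <-> i + N
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Row-wise log-softmax, computed in a numerically stable way
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()


# Hypothetical projections: random vectors standing in for projection-head outputs
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 128))
z2_aligned = z1 + 0.01 * rng.normal(size=(8, 128))    # views that agree
z2_random = rng.normal(size=(8, 128))                 # views that do not
print(nt_xent_loss(z1, z2_aligned), nt_xent_loss(z1, z2_random))
```

The loss is lower when the two views of each image map to nearby projections, which is the behavior that training encourages.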

SupCon
~~~~~~

Supervised Contrastive Learning (SupCon) extends SimCLR by using label information when it is available: samples that share a label are treated as positives, in addition to augmented views of the same image. Nenya primarily uses the self-supervised approach.

Latent Space
------------

The "latent space" refers to the compact representation of images learned by the model. In Nenya:

- Images are encoded into a feature vector (typically 128 or 512 dimensions)
- These vectors capture meaningful patterns and structures in the data
- Similar ocean features should have similar latent representations

UMAP Dimensionality Reduction
-----------------------------

Uniform Manifold Approximation and Projection (UMAP) is used to reduce the high-dimensional latent space to 2D for visualization:

- UMAP aims to preserve the local neighborhood structure of the data while retaining much of its global structure
- The 2D coordinates (U0, U1) allow for visual exploration of the latent space
- Points close in the UMAP visualization represent similar ocean patterns

DT (Temperature Difference)
---------------------------

DT is a key metric in Nenya representing the temperature difference within an image:

- Calculated as the difference between the 90th and 10th percentile temperatures in the image
- Reflects the overall thermal contrast across the oceanic region covered by the image
- Higher DT values typically indicate boundaries between water masses or frontal regions
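
The percentile definition above can be written in a few lines of NumPy. This is a sketch under stated assumptions: the function name ``temperature_difference`` is hypothetical, the synthetic arrays are illustrative rather than real data, and the NaN handling (ignoring masked pixels) is a choice made for this example.

```python
import numpy as np


def temperature_difference(sst, low=10, high=90):
    """Return DT = T_high - T_low for one SST image, ignoring NaN pixels."""
    t_low, t_high = np.nanpercentile(sst, [low, high])
    return float(t_high - t_low)


# Synthetic 64x64 images in degrees Celsius (illustrative values only)
uniform = np.full((64, 64), 18.0)
front = np.full((64, 64), 18.0)
front[:, 32:] = 21.0          # a sharp 3 degree step halfway across the image

print(temperature_difference(uniform))   # 0.0
print(temperature_difference(front))     # 3.0
```

Using percentiles rather than the minimum and maximum makes the metric less sensitive to a few outlier pixels.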

Data Preprocessing
------------------

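Before training or inference, images undergo several preprocessing steps. The first three are random augmentations, which produce the augmented views used in contrastive training; the last is a normalization: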

- **Rotation**: Random rotations for data augmentation
- **Flipping**: Random horizontal and vertical flips
- **Jitter and Crop**: Random spatial jittering and cropping
- **Normalization**: Demeaning the image (subtracting the mean)
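
The steps listed above can be chained into a single function that produces one augmented view per call. The sketch below is illustrative only: ``augment``, ``crop``, and ``max_jitter`` are hypothetical names and values, and rotation is simplified to multiples of 90 degrees so that the example needs nothing beyond NumPy. Nenya's actual transforms may differ.

```python
import numpy as np


def augment(image, rng, crop=56, max_jitter=4):
    """Return one randomly augmented, demeaned view of a square 2D image."""
    if crop + 2 * max_jitter > image.shape[0]:
        raise ValueError("crop plus jitter does not fit inside the image")

    view = np.rot90(image, k=int(rng.integers(4)))   # rotate by 0, 90, 180, or 270 degrees
    if rng.random() < 0.5:
        view = np.fliplr(view)                       # horizontal flip
    if rng.random() < 0.5:
        view = np.flipud(view)                       # vertical flip

    # Jitter: shift the crop window away from the center by a few pixels
    center = (view.shape[0] - crop) // 2
    dy, dx = rng.integers(-max_jitter, max_jitter + 1, size=2)
    y0, x0 = center + dy, center + dx
    view = view[y0:y0 + crop, x0:x0 + crop]

    return view - view.mean()                        # demean


rng = np.random.default_rng(0)
image = 15.0 + rng.normal(size=(64, 64))             # synthetic stand-in for an SST image
view_a, view_b = augment(image, rng), augment(image, rng)   # a positive pair
print(view_a.shape, view_b.shape)
```

Calling the function twice on the same image yields the two views that form a positive pair in contrastive training.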

Data Organization
-----------------

Nenya organizes data in several formats:

- **HDF5 files**: Store preprocessed images and extracted latent vectors
- **Parquet tables**: Store metadata and UMAP coordinates for efficient querying

Visualization Portal
--------------------

The interactive portal provides tools for exploring the latent space:

- **UMAP Plot**: Shows the 2D embedding of all images
- **Image Gallery**: Displays actual images corresponding to points in the UMAP space
- **Geographic View**: Shows the geographic distribution of selected points
- **Matching**: Finds similar images based on proximity in latent space
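
Matching by proximity in latent space amounts to a nearest-neighbor search over the latent vectors. The sketch below assumes cosine similarity as the distance measure and uses a hypothetical ``find_matches`` helper with random stand-in vectors; the metric and interface used by the portal may differ.

```python
import numpy as np


def find_matches(latents, query_index, k=5):
    """Return indices of the k latent vectors most similar to the query (cosine similarity)."""
    unit = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    similarity = unit @ unit[query_index]
    similarity[query_index] = -np.inf        # exclude the query itself
    return np.argsort(similarity)[::-1][:k]  # highest similarity first


rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 128))                    # stand-in latent vectors
latents[42] = latents[7] + 0.01 * rng.normal(size=128)    # plant a near-duplicate of image 7
print(find_matches(latents, query_index=7, k=3))          # index 42 is ranked first
```

For large archives, a brute-force search like this is usually replaced by an approximate nearest-neighbor index.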

Models
------

Nenya includes several pre-trained models:

- **v4**: MODIS imagery model (earlier version)
- **v5**: MODIS imagery model (improved version)
- **viirs_v1**: VIIRS imagery model
- **LLC**: Model for MIT General Circulation Model (MITgcm) imagery
- **CF**: Model specific to cloud-free imagery

Each model was trained on a specific dataset and may be specialized for different oceanic phenomena.