.. _model_training:

Model Training
============

Nenya implements self-supervised contrastive learning for training models on satellite imagery. This page describes the training process, model architecture, and configuration options.

Training Process
--------------

The main training loop is implemented in ``train.py``:

.. code-block:: python

   from nenya.train import main as train_main
   
   # Train a model using options from a JSON file
   train_main("path/to/opts_file.json", debug=False)

The training function handles:

1. Loading and preprocessing options
2. Building the model and criterion (loss function)
3. Setting up the optimizer
4. Training for the specified number of epochs
5. Validating the model periodically
6. Saving checkpoints and final model

Model Architecture
---------------

The default model architecture is based on ResNet with a customized projection head for contrastive learning:

.. code-block:: python

   from nenya.models.resnet_big import SupConResNet
   
   # Create a model with a specific backbone and feature dimension
   model = SupConResNet(name='resnet50', feat_dim=128)

Nenya supports several ResNet variants ('resnet18', 'resnet34', 'resnet50', etc.) which can be specified in the options file.

Loss Function
-----------

Nenya uses a contrastive loss function implemented in ``losses.py``:

.. code-block:: python

   from nenya.losses import SupConLoss
   
   # Create a loss function with a specific temperature
   criterion = SupConLoss(temperature=0.07)

The contrastive loss encourages representations of different augmentations of the same image to be similar, while pushing representations of different images apart.

Option Configuration
-----------------

Training options are specified in a JSON file and loaded using the ``Params`` class:

.. code-block:: python

   from nenya import params
   
   # Load options from a JSON file
   opt = params.Params("path/to/opts_file.json")
   
   # Preprocess options (set derived values)
   params.option_preprocess(opt)

Key training options include:

.. code-block:: javascript

   {
     "ssl_method": "SimCLR",     // Training method (SimCLR or SupCon)
     "ssl_model": "resnet50",    // Backbone model
     "learning_rate": 0.05,      // Initial learning rate
     "batch_size_train": 64,     // Batch size for training
     "batch_size_valid": 64,     // Batch size for validation
     "epochs": 200,              // Number of epochs
     "feat_dim": 128,            // Feature dimension size
     "temp": 0.07,               // Temperature parameter for loss
     "weight_decay": 1e-4,       // Weight decay for optimizer
     "momentum": 0.9,            // Momentum for optimizer
     "cosine": true,             // Use cosine learning rate schedule
     "random_jitter": [5, 5],    // Jitter parameters for augmentation
     "model_root": "models/v5",  // Root directory for model output
     "train_key": "train",       // Dataset key for training
     "valid_key": "valid",       // Dataset key for validation
     "save_freq": 10,            // Save checkpoint every N epochs
     "valid_freq": 5             // Validate every N epochs
   }

Data Loaders
-----------

Training and validation data loaders are created using the ``nenya_loader`` function:

.. code-block:: python

   from nenya.train_util import nenya_loader
   
   # Create a training data loader
   train_loader = nenya_loader(opt, valid=False)
   
   # Create a validation data loader
   valid_loader = nenya_loader(opt, valid=True)

These loaders apply the appropriate transformations and augmentations to the input images.

Training Loop
-----------

The core training loop is implemented in ``train_model``:

.. code-block:: python

   from nenya.train_util import train_model
   
   # Train for one epoch
   loss, losses_step, losses_avg = train_model(
       train_loader, model, criterion, optimizer, epoch, opt, 
       cuda_use=opt.cuda_use)

For each batch, the function:

1. Loads images and applies augmentations
2. Forwards the augmented views through the model
3. Calculates the contrastive loss
4. Updates the model parameters through backpropagation

Learning Rate Schedule
-------------------

Nenya supports learning rate warmup and decay:

.. code-block:: python

   from nenya.util import adjust_learning_rate, warmup_learning_rate
   
   # Adjust learning rate according to epoch
   adjust_learning_rate(opt, optimizer, epoch)
   
   # Apply warmup to the learning rate within an epoch
   warmup_learning_rate(opt, epoch, idx, len(train_loader), optimizer)

Model Saving
----------

Models are saved periodically during training and at the end:

.. code-block:: python

   from nenya.util import save_model
   
   # Save model checkpoint
   save_file = os.path.join(opt.model_folder, f'ckpt_epoch_{epoch}.pth')
   save_model(model, optimizer, opt, epoch, save_file)

Monitoring Training
-----------------

Training progress is monitored using the ``AverageMeter`` class:

.. code-block:: python

   from nenya.util import AverageMeter
   
   # Create meters for tracking statistics
   batch_time = AverageMeter()
   data_time = AverageMeter()
   losses = AverageMeter()
   
   # Update meter with new values
   losses.update(loss.item(), bsz)

Learning curves (loss over time) are saved to HDF5 files for later analysis:

.. code-block:: python

   with h5py.File(losses_file_train, 'w') as f:
       f.create_dataset('loss_train', data=np.array(loss_train))
       f.create_dataset('loss_step_train', data=np.array(loss_step_train))
       f.create_dataset('loss_avg_train', data=np.array(loss_avg_train))

Multi-GPU Training
---------------

Nenya supports multi-GPU training through PyTorch's DataParallel:

.. code-block:: python

   if torch.cuda.is_available() and cuda_use:
       if torch.cuda.device_count() > 1:
           model.encoder = torch.nn.DataParallel(model.encoder)
       model = model.cuda()
       criterion = criterion.cuda()
       cudnn.benchmark = True

Training Tips
-----------

1. **Batch Size**: Larger batch sizes generally work better for contrastive learning. If GPU memory is limited, consider using gradient accumulation.
2. **Temperature**: The temperature parameter in the loss function controls the concentration of the distribution. Lower values (e.g., 0.07) typically work well.
3. **Learning Rate**: A cosine learning rate schedule with warmup often leads to better results.
4. **Augmentations**: Strong augmentations are crucial for contrastive learning. Experiment with different combinations of rotation, jitter, and flips.
5. **Feature Dimension**: A higher feature dimension (e.g., 128 or 256) generally captures more information but requires more GPU memory.