Latent Extraction
After training a Nenya model, the next step is to extract latent representations from images. These latent vectors capture meaningful features and patterns in the data.
Basic Latent Extraction
The main function for extracting latents is model_latents_extract in latents_extraction.py:
from nenya.latents_extraction import model_latents_extract
from nenya import io as nenya_io
# Load options and model path
opt, model_path = nenya_io.load_opt('v5')
# Extract latents from a data file
latent_dict = model_latents_extract(opt, data_file, model_path)
# Access latents for different partitions
valid_latents = latent_dict['valid']
train_latents = latent_dict['train']
Single Image Analysis
To extract latents from a single image, you can use functions from analyze_image.py:
from nenya import analyze_image
# Extract latents from a single image
latents, pp_img = analyze_image.get_latents(image, model_file, opt)
# Calculate DT (temperature difference) for the image
DT = analyze_image.calc_DT(image, opt.random_jitter)
# UMAP embed the image
embedding, pp_img, table_file, DT, latents = analyze_image.umap_image('v5', image)
Batch Processing
For batch processing of multiple images, you can use the main function in latents_extraction.py:
from nenya.latents_extraction import main as extract_main
# Path to options file
opt_path = "path/to/opts_file.json"
# List of preprocessed files to process
pp_files = [
    's3://bucket/PreProc/data_preproc_1.h5',
    's3://bucket/PreProc/data_preproc_2.h5',
]
# Extract latents from all files
extract_main(opt_path, pp_files, clobber=False, debug=False)
This will:
Download each preprocessed file
Extract latents for all images in the file
Save the latents to a new file with “_latents” suffix
Upload the results to S3 (if configured)
Data Loaders for Latent Extraction
Nenya provides custom data loaders for latent extraction:
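import h5py
import numpy as np
import torch

class HDF5RGBDataset(torch.utils.data.Dataset):
    """Dataset for loading HDF5 data for latent extraction"""
    def __init__(self, file_path, partition, allowed_indices=None):
        self.file_path = file_path
        self.partition = partition
        self.meta_dset = partition + '_metadata'
        self.h5f = h5py.File(file_path, 'r')
        # Default to every sample in the partition; test against None because
        # `or` raises ValueError when allowed_indices is a NumPy array
        if allowed_indices is None:
            allowed_indices = np.arange(self.h5f[self.partition].shape[0])
        self.allowed_indices = allowed_indices
    # Implementation details...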
The build_loader function creates a data loader for efficient batch processing:
def build_loader(data_file, dataset, batch_size=1, num_workers=1, allowed_indices=None):
    """Create a dataloader for latent extraction"""
    dset = HDF5RGBDataset(data_file, partition=dataset, allowed_indices=allowed_indices)
    loader = torch.utils.data.DataLoader(
        dset, batch_size=batch_size, shuffle=False,
        collate_fn=id_collate,
        drop_last=False, num_workers=num_workers)
    return dset, loader
Loading Models for Extraction
Models are loaded using functions from train_util.py:
import torch
from nenya.train_util import set_model
# Load the model
model, _ = set_model(opt, cuda_use=using_gpu)
# Load the model state from a file
if not using_gpu:
    model_dict = torch.load(model_path, map_location=torch.device('cpu'))
else:
    model_dict = torch.load(model_path)
# Load the model weights
if remove_module:
    # Remove 'module.' prefix from DataParallel models
    new_dict = {}
    for key in model_dict['model'].keys():
        new_dict[key.replace('module.','')] = model_dict['model'][key]
    model.load_state_dict(new_dict)
else:
    model.load_state_dict(model_dict['model'])
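The prefix handling above relies on key.replace('module.', ''), which would also rewrite a key that contains 'module.' somewhere other than its start. A minimal sketch of a stricter variant follows. The helper strip_module_prefix is hypothetical and not part of Nenya, and the example keys are invented for illustration:

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Hypothetical helper for illustration; not part of Nenya.
    """
    cleaned = {}
    for key, value in state_dict.items():
        # Only touch the start of the key, never the middle
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Plain integers stand in for weight tensors; the key names are made up
saved = {
    "module.encoder.conv1.weight": 1,
    "module.head.0.bias": 2,
    "submodule.fc.weight": 3,  # str.replace would turn this into "subfc.weight"
}
cleaned = strip_module_prefix(saved)
```

With a real checkpoint, the result could be passed on as model.load_state_dict(strip_module_prefix(model_dict['model'])).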
Computing Latents
The actual computation of latents is handled by the calc_latent function:
def calc_latent(model, image_tensor, using_gpu):
    """Calculate latent representation for an image"""
    # Call this inside a torch.no_grad() block: .numpy() raises an error
    # on a tensor that still requires grad
    model.eval()
    if using_gpu:
        latents_tensor = model(image_tensor.cuda())
        latents_numpy = latents_tensor.cpu().numpy()
    else:
        latents_tensor = model(image_tensor)
        latents_numpy = latents_tensor.numpy()
    return latents_numpy
Latent Storage Format
Latents are stored in HDF5 files with the following structure:
File name: Original filename with “_latents” suffix
Datasets:
- train: Latent vectors for training set (if present)
- valid: Latent vectors for validation set
Shape: (n_samples, feat_dim), where feat_dim is typically 128 or 512
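Given the layout described above, a latents file can be read back with h5py. The following is a minimal sketch rather than Nenya's own reader: load_latents is a hypothetical helper, and the example file name and array contents are stand-ins created only so the snippet runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def load_latents(latents_file, partitions=("train", "valid")):
    """Read latent arrays from an HDF5 file into a dict of NumPy arrays.

    Hypothetical helper; partitions missing from the file are skipped.
    """
    latents = {}
    with h5py.File(latents_file, "r") as f:
        for name in partitions:
            if name in f:
                latents[name] = f[name][:]  # copy the dataset into memory
    return latents


# Build a small stand-in file (only a 'valid' partition, feat_dim = 128)
tmp_dir = tempfile.mkdtemp()
example_file = os.path.join(tmp_dir, "example_latents.h5")  # made-up name
with h5py.File(example_file, "w") as f:
    f.create_dataset("valid", data=np.zeros((4, 128), dtype=np.float32))

latents = load_latents(example_file)
print({name: arr.shape for name, arr in latents.items()})
```

Skipping absent partitions mirrors the note that the train dataset is only present in some files.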
Working with S3 Storage
When using S3 storage, the workflow typically involves:
Checking if the latent file already exists in S3
Downloading the preprocessed file locally
Extracting latents
Saving results locally
Uploading to S3
Cleaning up local files
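The six steps above can be sketched as a single function. This is an outline under stated assumptions, not Nenya's implementation: process_file and latents_name are hypothetical, the S3 operations are passed in as callables (exists, download, extract, upload) instead of calling any real S3 client, and the “_latents” suffix is assumed to go directly before the file extension.

```python
import os
import tempfile


def latents_name(pp_path):
    """Insert '_latents' before the file extension (assumed naming scheme)."""
    root, ext = os.path.splitext(pp_path)
    return root + "_latents" + ext


def process_file(pp_s3_path, exists, download, extract, upload,
                 workdir=".", clobber=False):
    """Run the workflow for one preprocessed file.

    Returns the S3 path of the latents file, or None if it was skipped.
    All four callables are hypothetical stand-ins supplied by the caller.
    """
    latents_s3_path = latents_name(pp_s3_path)
    # 1. Skip the file if its latents already exist in S3
    if not clobber and exists(latents_s3_path):
        return None
    local_pp = os.path.join(workdir, os.path.basename(pp_s3_path))
    local_latents = latents_name(local_pp)
    try:
        download(pp_s3_path, local_pp)          # 2. download the input
        extract(local_pp, local_latents)        # 3-4. extract and save locally
        upload(local_latents, latents_s3_path)  # 5. upload the result
    finally:
        # 6. Remove local copies even if an earlier step failed
        for path in (local_pp, local_latents):
            if os.path.exists(path):
                os.remove(path)
    return latents_s3_path


# Dry run with stand-ins that only touch a temporary directory
log = []

def fake_transfer(src, dst):
    log.append(("transfer", src, dst))
    if not dst.startswith("s3://"):
        open(dst, "w").close()

def fake_extract(src, dst):
    log.append(("extract", src, dst))
    open(dst, "w").close()

workdir = tempfile.mkdtemp()
result = process_file("s3://bucket/PreProc/data_preproc_1.h5",
                      exists=lambda path: False,
                      download=fake_transfer, extract=fake_extract,
                      upload=fake_transfer, workdir=workdir)
```

Passing the storage operations in as arguments keeps the control flow testable without network access. Putting the cleanup in a finally block means a failed upload does not leave large local files behind.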
Tips for Latent Extraction
Memory Management: Process files in batches to manage memory usage
GPU Acceleration: Use GPU for faster processing when available
Preprocessing: Ensure images are properly preprocessed before extraction
Batch Size: Adjust batch size based on available GPU memory
Model Selection: Choose appropriate model version based on your data (MODIS vs VIIRS)
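As an illustration of the memory and batch-size tips, the sketch below encodes an array in fixed-size chunks so that only one batch is passed through the encoder at a time. The function extract_in_batches and the stand-in encoder toy_encode are hypothetical. With a real model, the encode argument would wrap the forward pass.

```python
import numpy as np


def extract_in_batches(images, encode, batch_size=64):
    """Apply encode to images chunk by chunk and stack the results.

    encode is a hypothetical callable mapping an array of shape
    (batch, ...) to latents of shape (batch, feat_dim).
    """
    chunks = []
    for start in range(0, len(images), batch_size):
        chunks.append(encode(images[start:start + batch_size]))
    return np.concatenate(chunks, axis=0)


def toy_encode(batch):
    """Stand-in encoder: mean and standard deviation of each image."""
    flat = batch.reshape(len(batch), -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)


# Ten random single-channel 8x8 images, processed four at a time
images = np.random.default_rng(0).normal(size=(10, 1, 8, 8))
latents = extract_in_batches(images, toy_encode, batch_size=4)
```

Lowering batch_size reduces peak memory at the cost of more encoder calls. The last chunk may be smaller than the others, which matches drop_last=False in the loader.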
Exploring Extracted Latents
After extraction, you can explore latents through:
UMAP visualization (see UMAP Analysis)
PCA analysis
Clustering algorithms
Similarity search
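As one example of these options, a similarity search over a latent array can be written with NumPy alone. This is a minimal sketch assuming cosine similarity as the metric, which the documentation does not specify. The function most_similar and the toy two-dimensional latents are hypothetical.

```python
import numpy as np


def most_similar(latents, query_index, k=5):
    """Return indices and cosine similarities of the k rows closest to the query.

    The query row itself is excluded from the results.
    """
    norms = np.linalg.norm(latents, axis=1, keepdims=True)
    unit = latents / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf                   # never return the query
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Toy latents: row 1 points almost the same way as row 0, row 3 the opposite way
latents = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
idx, sims = most_similar(latents, query_index=0, k=2)
```

For real latents of shape (n_samples, feat_dim), the same call works unchanged, although a dedicated nearest-neighbour index may be preferable for very large n_samples.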