latents_extraction
The latents_extraction module extracts latent representations from preprocessed images using a Nenya model.
Classes
- class nenya.latents_extraction.HDF5RGBDataset(file_path, partition, allowed_indices=None)
A PyTorch dataset for HDF5 data used in latent extraction.
- Parameters:
file_path (str) – Path to the HDF5 file
partition (str) – Dataset name in the HDF5 file (e.g., ‘train’, ‘valid’)
allowed_indices (numpy.ndarray, optional) – Set of image indices to include. Defaults to None (all).
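To illustrate how allowed_indices can restrict a dataset to a subset of images, here is a minimal sketch. It is not the HDF5RGBDataset implementation: SubsetArrayDataset is a hypothetical class, an in-memory NumPy array stands in for the HDF5 partition, and sorting the indices is an assumption made here because HDF5-style access generally favors increasing order.

```python
import numpy as np


class SubsetArrayDataset:
    """Hypothetical stand-in for HDF5RGBDataset backed by an in-memory array."""

    def __init__(self, images, allowed_indices=None):
        self.images = images
        # With no restriction, every image in the partition is exposed.
        if allowed_indices is None:
            allowed_indices = np.arange(len(images))
        # Assumption: keep indices sorted so reads proceed in increasing order.
        self.indices = np.sort(np.asarray(allowed_indices))

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Position i in the dataset maps to the i-th allowed image.
        return self.images[self.indices[i]]


images = np.arange(5 * 4).reshape(5, 4)  # five toy "images" of four pixels each
subset = SubsetArrayDataset(images, allowed_indices=np.array([3, 0]))
```

Under this scheme the dataset length equals the number of allowed indices, so a loader built on top of it only ever sees the selected images.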
Functions
- nenya.latents_extraction.main(opt_path, pp_files, clobber=False, debug=False)
Main function for batch latent extraction.
- nenya.latents_extraction.build_loader(data_file, dataset, batch_size=1, num_workers=1, allowed_indices=None)
Create a data loader for latent extraction.
- Parameters:
data_file (str) – Path to the data file
dataset (str) – Dataset name in the file (e.g., ‘train’, ‘valid’)
batch_size (int, optional) – Batch size for data loading. Defaults to 1.
num_workers (int, optional) – Number of worker processes. Defaults to 1.
allowed_indices (numpy.ndarray, optional) – Set of image indices to include. Defaults to None (all).
- Returns:
Tuple of (dataset, data loader)
- Return type:
tuple
- nenya.latents_extraction.calc_latent(model, image_tensor, using_gpu)
Calculate latent representations for an image tensor.
- Parameters:
model (torch.nn.Module) – Nenya model
image_tensor (torch.Tensor) – Image tensor
using_gpu (bool) – Whether to use GPU
- Returns:
Latent vectors as numpy array
- Return type:
numpy.ndarray
- nenya.latents_extraction.prep(opt)
Prepare the environment for latent extraction.
- Parameters:
opt (nenya.params.Params) – Model options
- Returns:
Tuple of (model base name, list of existing latent files)
- Return type:
tuple
- nenya.latents_extraction.model_latents_extract(opt, data_file, model_path, remove_module=True, in_loader=None, partitions=('train', 'valid'), allowed_indices=None, debug=False)
Extract latents from a data file using a model.
- Parameters:
opt (nenya.params.Params) – Model options
data_file (str) – Path to the data file
model_path (str) – Path to the model file
remove_module (bool, optional) – Whether to remove ‘module.’ prefix from keys. Defaults to True.
in_loader (torch.utils.data.DataLoader, optional) – Optional pre-configured data loader. Defaults to None.
partitions (tuple, optional) – Dataset partitions to process. Defaults to (‘train’, ‘valid’).
allowed_indices (numpy.ndarray, optional) – Set of image indices to include. Defaults to None (all).
debug (bool, optional) – Whether to run in debug mode. Defaults to False.
- Returns:
Dictionary of latent vectors for each partition
- Return type:
dict
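The remove_module option refers to the 'module.' prefix that PyTorch adds to state-dict keys when a model is saved while wrapped in torch.nn.DataParallel or DistributedDataParallel. The sketch below shows one common way to strip that prefix; strip_module_prefix is a hypothetical helper written for illustration, not necessarily how model_latents_extract does it.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip at the start of the key; inner occurrences are left alone.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy checkpoint keys; in a real checkpoint the values would be tensors.
saved = OrderedDict(
    [("module.encoder.weight", 1), ("module.head.bias", 2), ("step", 3)]
)
cleaned = strip_module_prefix(saved)
```

The cleaned dictionary can then be passed to load_state_dict on an unwrapped model, whose parameter names carry no 'module.' prefix.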
Example Usage
from nenya.latents_extraction import model_latents_extract, main
from nenya import io as nenya_io
# Extract latents for specific files
pp_files = [
's3://bucket/PreProc/data_file1_preproc.h5',
's3://bucket/PreProc/data_file2_preproc.h5'
]
# Batch extraction
main("path/to/opts.json", pp_files, clobber=False)
# Individual extraction
opt, model_path = nenya_io.load_opt('v5')
latent_dict = model_latents_extract(opt, "data_file_preproc.h5", model_path)
# Access latents
valid_latents = latent_dict['valid']
train_latents = latent_dict['train']
Implementation Details
The latent extraction process:
- Loads the model and its weights
- Creates data loaders for each partition in the data file
- Passes batches of images through the model
- Collects the latent vectors
- Returns a dictionary with latent vectors for each partition
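The batching and collection steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: extract_latents and toy_encode are hypothetical names, NumPy arrays stand in for the HDF5 partitions, and toy_encode replaces the model's forward pass.

```python
import numpy as np


def extract_latents(encode, partitions, batch_size=2):
    """Run encode over each partition in batches and stack the results."""
    latent_dict = {}
    for name, images in partitions.items():
        chunks = []
        for start in range(0, len(images), batch_size):
            batch = images[start:start + batch_size]
            chunks.append(encode(batch))  # one (batch, latent_dim) array
        # Join the per-batch outputs into one array per partition.
        latent_dict[name] = np.concatenate(chunks, axis=0)
    return latent_dict


def toy_encode(batch):
    # Stand-in for the model forward pass: mean and max of each image.
    flat = batch.reshape(len(batch), -1)
    return np.stack([flat.mean(axis=1), flat.max(axis=1)], axis=1)


# Toy partitions shaped (n_images, channels, height, width).
partitions = {
    "train": np.ones((5, 3, 4, 4)),
    "valid": np.zeros((3, 3, 4, 4)),
}
latent_dict = extract_latents(toy_encode, partitions)
```

Each entry of latent_dict has one row per image, which mirrors how the returned dictionary is indexed by partition name in the usage example.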
When using the main function, the process also includes:
- Downloading files from S3 if necessary
- Checking for existing latent files to avoid duplicating work
- Saving extracted latents to HDF5 files
- Uploading results to S3
- Cleaning up temporary files
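The check for existing latent files, combined with the clobber flag, might look like the sketch below. The helper files_to_process, the rule that swaps 'preproc' for 'latents' in the file name, and the 's3://bucket/latents/' location are all assumptions made for illustration; the actual naming scheme used by main is not documented here.

```python
import os


def files_to_process(pp_files, existing_latents, clobber=False):
    """Return (pp_file, latents_name) pairs that still need extraction."""
    existing = {os.path.basename(path) for path in existing_latents}
    todo = []
    for pp_file in pp_files:
        # Hypothetical naming rule: swap 'preproc' for 'latents' in the basename.
        latents_name = os.path.basename(pp_file).replace("preproc", "latents")
        if latents_name in existing and not clobber:
            continue  # already extracted; skip unless clobber is set
        todo.append((pp_file, latents_name))
    return todo


pp_files = [
    "s3://bucket/PreProc/data_file1_preproc.h5",
    "s3://bucket/PreProc/data_file2_preproc.h5",
]
# Hypothetical listing of latent files that already exist.
existing = ["s3://bucket/latents/data_file1_latents.h5"]
todo = files_to_process(pp_files, existing)
```

With clobber=False only the second file remains to be processed; passing clobber=True returns both files so their latents are regenerated.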