latents_extraction

The latents_extraction module provides functionality for extracting latent representations from preprocessed images.

Classes

class nenya.latents_extraction.HDF5RGBDataset(file_path, partition, allowed_indices=None)

A PyTorch dataset for HDF5 data used in latent extraction.

Parameters:

file_path (str) – Path to the HDF5 file
partition (str) – Dataset name in the HDF5 file (e.g., ‘train’, ‘valid’)
allowed_indices (numpy.ndarray, optional) – Image indices to include. Defaults to None (all images).

__len__()

Return the number of samples in the dataset.

Returns: Number of samples
Return type: int

__getitem__(index)

Get a sample from the dataset.

Parameters: index (int) – Index of the sample
Returns: Tuple of (data, metadata)
Return type: tuple
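
To illustrate how an index filter such as allowed_indices can interact with __len__ and __getitem__, here is a minimal sketch that uses an in-memory list in place of an HDF5 file. The class name FilteredDataset and the metadata layout (a dict holding the source index) are assumptions made for illustration; this is not the nenya implementation.

```python
class FilteredDataset:
    """Hypothetical stand-in for an index-filtered dataset (not the nenya class)."""

    def __init__(self, images, allowed_indices=None):
        self.images = images
        # With no filter, expose every image in storage order.
        if allowed_indices is None:
            self.indices = list(range(len(images)))
        else:
            self.indices = [int(i) for i in allowed_indices]

    def __len__(self):
        # Number of samples visible through the filter.
        return len(self.indices)

    def __getitem__(self, index):
        # Map a position in the filtered view to a position in storage.
        source_index = self.indices[index]
        data = self.images[source_index]
        metadata = {"source_index": source_index}  # assumed metadata layout
        return data, metadata


images = ["img0", "img1", "img2", "img3"]
full = FilteredDataset(images)
subset = FilteredDataset(images, allowed_indices=[1, 3])
```

In this sketch the filtered view is re-indexed from zero, so subset[0] refers to the image stored at position 1.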

Functions

nenya.latents_extraction.main(opt_path, pp_files, clobber=False, debug=False)

Main function for batch latent extraction.

Parameters:

opt_path (str) – Path to options file
pp_files (list) – List of preprocessed file paths
clobber (bool, optional) – Whether to overwrite existing files. Defaults to False.
debug (bool, optional) – Whether to run in debug mode. Defaults to False.

nenya.latents_extraction.build_loader(data_file, dataset, batch_size=1, num_workers=1, allowed_indices=None)

Create a data loader for latent extraction.

Parameters:

data_file (str) – Path to the data file
dataset (str) – Dataset name in the file (e.g., ‘train’, ‘valid’)
batch_size (int, optional) – Batch size for data loading. Defaults to 1.
num_workers (int, optional) – Number of worker processes. Defaults to 1.
allowed_indices (numpy.ndarray, optional) – Set of image indices to include. Defaults to None (all).

Returns:

Tuple of (dataset, data loader)

Return type:

tuple

nenya.latents_extraction.calc_latent(model, image_tensor, using_gpu)

Calculate latent representations for an image tensor.

Parameters:

model (torch.nn.Module) – Nenya model
image_tensor (torch.Tensor) – Image tensor
using_gpu (bool) – Whether to use GPU

Returns:

Latent vectors as numpy array

Return type:

numpy.ndarray

nenya.latents_extraction.prep(opt)

Prepare the environment for latent extraction.

Parameters: opt (nenya.params.Params) – Model options
Returns: Tuple of (model base name, list of existing latent files)
Return type: tuple

nenya.latents_extraction.model_latents_extract(opt, data_file, model_path, remove_module=True, in_loader=None, partitions=('train', 'valid'), allowed_indices=None, debug=False)

Extract latents from a data file using a model.

Parameters:

opt (nenya.params.Params) – Model options
data_file (str) – Path to the data file
model_path (str) – Path to the model file
remove_module (bool, optional) – Whether to remove ‘module.’ prefix from keys. Defaults to True.
in_loader (torch.utils.data.DataLoader, optional) – Optional pre-configured data loader. Defaults to None.
partitions (tuple, optional) – Dataset partitions to process. Defaults to (‘train’, ‘valid’).
allowed_indices (numpy.ndarray, optional) – Set of image indices to include. Defaults to None (all).
debug (bool, optional) – Whether to run in debug mode. Defaults to False.

Returns:

Dictionary of latent vectors for each partition

Return type:

dict
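
The remove_module option refers to a 'module.' prefix on checkpoint keys. Such a prefix typically appears when a model was wrapped in torch.nn.DataParallel before its weights were saved. The sketch below shows one way the prefix could be stripped from a plain dictionary; the helper name strip_module_prefix is hypothetical and is not part of nenya.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = {}
    for key, value in state_dict.items():
        # Only strip the prefix at the start of the key; leave other keys alone.
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Toy checkpoint: plain numbers stand in for weight tensors.
saved = {"module.encoder.weight": 1.0, "module.head.bias": 0.5, "step": 10}
cleaned = strip_module_prefix(saved)
```

The original dictionary is left unchanged, so the same checkpoint can still be loaded into a wrapped model.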

Example Usage

from nenya.latents_extraction import model_latents_extract, main
from nenya import io as nenya_io

# Extract latents for specific files
pp_files = [
    's3://bucket/PreProc/data_file1_preproc.h5',
    's3://bucket/PreProc/data_file2_preproc.h5'
]

# Batch extraction
main("path/to/opts.json", pp_files, clobber=False)

# Individual extraction
opt, model_path = nenya_io.load_opt('v5')
latent_dict = model_latents_extract(opt, "data_file_preproc.h5", model_path)

# Access latents
valid_latents = latent_dict['valid']
train_latents = latent_dict['train']

Implementation Details

The latent extraction process:

Loads the model and its weights
Creates data loaders for each partition in the data file
Passes batches of images through the model
Collects the latent vectors
Returns a dictionary with latent vectors for each partition
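
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the nenya code: extract_latents and toy_model are hypothetical names, the "loaders" are plain lists of batches, and the "model" is an ordinary function rather than a torch.nn.Module.

```python
def extract_latents(model, loaders):
    """Run every batch of every partition through model and collect the outputs."""
    latents = {}
    for partition, loader in loaders.items():
        collected = []
        for batch in loader:
            # The model maps a batch of images to a batch of latent vectors.
            collected.extend(model(batch))
        latents[partition] = collected
    return latents


def toy_model(batch):
    # Stand-in "encoder": each image (a list of numbers) becomes [sum, count].
    return [[sum(image), len(image)] for image in batch]


loaders = {
    "train": [[[1, 2], [3, 4]], [[5, 6]]],  # two batches
    "valid": [[[7, 8]]],                    # one batch
}
latent_dict = extract_latents(toy_model, loaders)
```

The result has one entry per partition, mirroring the dictionary returned by model_latents_extract.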

When using the main function, the process also includes:

Downloading files from S3 if necessary
Checking for existing latent files to avoid duplicating work
Saving extracted latents to HDF5 files
Uploading results to S3
Cleaning up temporary files
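
The check for existing latent files, combined with the clobber flag, might look like the sketch below. The helper files_to_process and the naming rule (replacing "_preproc" with "_latents" in the file name) are assumptions made for illustration; the actual naming scheme used by nenya is not documented here.

```python
import posixpath


def files_to_process(pp_files, existing_latents, clobber=False):
    """Return (input file, latent file name) pairs that still need extraction."""
    todo = []
    for pp_file in pp_files:
        # Assumed naming rule: swap "_preproc" for "_latents" in the base name.
        latent_name = posixpath.basename(pp_file).replace("_preproc", "_latents")
        # Skip files whose latents already exist unless overwriting is requested.
        if latent_name in existing_latents and not clobber:
            continue
        todo.append((pp_file, latent_name))
    return todo


pp_files = [
    "s3://bucket/PreProc/data_file1_preproc.h5",
    "s3://bucket/PreProc/data_file2_preproc.h5",
]
existing = {"data_file1_latents.h5"}
pending = files_to_process(pp_files, existing)
forced = files_to_process(pp_files, existing, clobber=True)
```

With clobber left at False only the second file is queued; with clobber=True both files are processed again.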
