Available Modules¶

List of all files, classes and methods available in the library.

dataset.py¶

class keras_wrapper.dataset.Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True, temporally_linked=False, init_sample=-1, final_sample=-1)¶

Batch generator class. Retrieves batches of data.

generator()¶: Gets and processes the data :return: generator with the data
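As a rough illustration of what a batch generator of this kind does, the sketch below yields batches of sample indices indefinitely and reshuffles on every pass over the data. It is not the library's implementation: `batch_index_generator` and its `n_samples` and `seed` parameters are hypothetical names, and only `batch_size` and `shuffle` mirror the constructor arguments listed above.

```python
import numpy as np


def batch_index_generator(n_samples, batch_size=50, shuffle=True, seed=0):
    """Yield arrays of sample indices forever, one batch at a time (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    while True:
        # A new ordering is drawn at the start of every pass over the data.
        order = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for start in range(0, n_samples, batch_size):
            # The last batch of a pass may be smaller than batch_size.
            yield order[start:start + batch_size]


gen = batch_index_generator(5, batch_size=2, shuffle=False)
first_batches = [next(gen).tolist() for _ in range(4)]
```

With `shuffle=False` the fourth batch wraps around to the start of the data, which is the behaviour a training loop that requests a fixed number of iterations relies on.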

class keras_wrapper.dataset.Dataset(name, path, pad_symbol='<pad>', unk_symbol='<unk>', null_symbol='<null>', silence=False)¶

Class for defining instances of databases adapted for Keras. It includes several utility functions for easily managing data splits, image loading, mean calculation, etc.

apply_label_smoothing(y, discount, vocabulary_len, discount_type='uniform')¶

Applies label smoothing to a one-hot encoded vector.

Parameters:

y – Input to smooth.
discount – Discount to apply.
vocabulary_len – Length of the one-hot vectors.
discount_type – Type of smoothing. Supported types: ‘uniform’: subtracts ‘label_smoothing_discount’ from the label and distributes it uniformly among all labels.

Returns:
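The ‘uniform’ discount described above can be sketched with NumPy as follows. This is a minimal illustration under the assumption that the input rows are one-hot vectors; `smooth_labels_uniform` is a hypothetical name and the code is not taken from the library.

```python
import numpy as np


def smooth_labels_uniform(y, discount, vocabulary_len):
    """Uniform label smoothing for one-hot rows (illustrative sketch)."""
    y = np.asarray(y, dtype=np.float64)
    # Take `discount` away from the active label of each one-hot row ...
    smoothed = y * (1.0 - discount)
    # ... and spread that mass evenly over all `vocabulary_len` labels.
    smoothed += discount / vocabulary_len
    return smoothed


y = np.eye(4)[[2, 0]]  # two one-hot rows over a 4-label vocabulary
smoothed = smooth_labels_uniform(y, discount=0.1, vocabulary_len=4)
```

Each smoothed row still sums to 1, so the result remains a valid probability distribution.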

build_bpe(codes, merges=-1, separator='@@', vocabulary=None, glossaries=None)¶

Constructs a BPE encoder instance. Currently, the vocabulary and glossaries options are not implemented.

Parameters:

codes – File with BPE codes (created by learn_bpe.py).
separator – Separator between non-final subword units (default: ‘@@’).
vocabulary – Vocabulary file. If provided, this script reverts any merge operations that produce an OOV.
glossaries – The strings provided in glossaries will not be affected by the BPE (i.e. they will neither be broken into subwords, nor concatenated with other subwords).

Returns:

None

build_moses_detokenizer(language='en')¶

Constructs a Moses detokenizer instance.

Parameters:

language – Detokenizer language.

Returns:

None

build_moses_tokenizer(language='en')¶: Constructs a Moses tokenizer instance. :param language: Tokenizer language. :return: None

build_vocabulary(captions, data_id, do_split=True, min_occ=0, n_words=0, split_symbol=' ', use_extra_words=True, use_unk_class=False, is_val=False)¶

Vocabulary builder for data of type ‘text’

Parameters:

use_extra_words –
captions – Corpus sentences
data_id – Dataset id of the text
do_split – Split sentence by words or use the full sentence as a class.
split_symbol – symbol used for separating the elements in each sentence
min_occ – Minimum occurrences of each word to be included in the dictionary.
n_words – Maximum number of words to include in the dictionary.
is_val – Set to True if the input ‘captions’ are values and we want to keep them sorted

Returns:

None.
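The roles of min_occ, n_words and split_symbol can be illustrated with a small frequency-based vocabulary builder. This is a sketch under assumptions, not the library's code: `build_vocab` is a hypothetical helper, and placing the special symbols (taken from the Dataset constructor defaults) at the first indices is an assumption made for the example.

```python
from collections import Counter


def build_vocab(sentences, min_occ=0, n_words=0, split_symbol=' ',
                extra_words=('<pad>', '<unk>', '<null>')):
    """Map words to indices, most frequent first (hypothetical helper)."""
    counts = Counter(w for s in sentences for w in s.split(split_symbol))
    # Keep words seen at least `min_occ` times, ordered by frequency.
    kept = [w for w, c in counts.most_common() if c >= min_occ]
    if n_words > 0:
        # Cap the vocabulary at the `n_words` most frequent words.
        kept = kept[:n_words]
    # Assumption: special symbols occupy the first indices.
    vocab = {w: i for i, w in enumerate(extra_words)}
    for w in kept:
        vocab.setdefault(w, len(vocab))
    return vocab


vocab = build_vocab(['a dog', 'a cat', 'a dog runs'], min_occ=2)
```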

calculateTrainMean(data_id)¶: Calculates the mean of the data belonging to the training set split in each channel.

convert_3DLabels_to_bboxes(predictions, original_sizes, threshold=0.5, idx_3DLabel=0, size_restriction=0.001)¶

Converts a set of predictions of type 3DLabel to their corresponding bounding boxes.

Parameters:

idx_3DLabel –
size_restriction –
predictions – 3DLabels predicted by Model_Wrapper. If type is list it will be assumed that position 0 corresponds to 3DLabels
original_sizes – original sizes of the predicted images (width and height)
threshold – minimum overlapping threshold for considering a prediction valid

Returns:

predicted_bboxes, predicted_Y, predicted_scores for each image

static convert_GT_3DLabels_to_bboxes(gt)¶

Converts a GT list of 3DLabels to a set of bboxes.

Parameters:	gt – list of Dataset output of type 3DLabels
Returns:	[out_list, original_sizes], where out_list contains a list of samples with the following info [GT_bboxes, GT_Y], and original_sizes contains the original width and height for each image

static detokenize_bpe(caption, separator='@@')¶: Reverts BPE segmentation (https://github.com/rsennrich/subword-nmt) :param caption: Caption to detokenize. :param separator: BPE separator. :return: Detokenized version of caption.
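Reverting BPE segmentation amounts to removing the separator together with the space that follows it. The sketch below shows the idea with a regular expression modelled on the `sed` command suggested in the subword-nmt README; `revert_bpe` is a hypothetical name and may differ from the library's own implementation.

```python
import re


def revert_bpe(caption, separator='@@'):
    """Join subword units back into words (illustrative sketch)."""
    sep = re.escape(separator)
    # Remove "<separator> " between units, and a dangling separator at the end.
    pattern = r'(' + sep + r' )|(' + sep + r' ?$)'
    return re.sub(pattern, '', caption)


restored = revert_bpe('the new@@ est mo@@ del')
```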

detokenize_moses(caption, language='en', lowercase=False, return_str=True, unescape=True)¶

Applies Moses detokenization, relying on the sacremoses implementation of the Moses detokenizer.

Parameters:

caption – Sentence to detokenize
language – Language (will build the detokenizer for this language)
lowercase – Whether to lowercase the sentence or not
return_str – Return string or list
unescape – Unescape HTML special chars

Returns:

static detokenize_none(caption)¶: Dummy function: Keeps the caption as it is. :param caption: String to de-tokenize. :return: Same caption.

static detokenize_none_char(caption)¶

Character-level detokenization. Respects all symbols. Joins chars into words. Words are delimited by the <space> token. If a special character is found, it is converted to the escaped char.

List of escaped chars (by the Moses tokenizer):

& -> &amp;
| -> &#124;
< -> &lt;
> -> &gt;
' -> &apos;
" -> &quot;
[ -> &#91;
] -> &#93;

Parameters:	caption – String to de-tokenize.
Returns:	Detokenized version of caption.
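The joining step can be sketched as follows, assuming the input is a string of space-separated characters in which the `<space>` token marks word boundaries. `join_chars` is a hypothetical helper that covers only the joining, not any handling of escaped characters.

```python
def join_chars(caption):
    """Rebuild words from space-separated characters (illustrative sketch)."""
    # Every token is one character, except "<space>", which becomes a real space.
    return ''.join(' ' if tok == '<space>' else tok for tok in caption.split())


sentence = join_chars('h i <space> y o u')
```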

getClassID(class_name, data_id)¶

Returns:	the class data_id (int) for a given class string.

getFramesPaths(idx_videos, data_id, set_name, max_len, data_augmentation)¶: Recovers the paths from the selected video frames.

getImageFromPrediction_3DSemanticLabel(img, n_classes)¶

Get the segmented image from the prediction of the model using the semantic classes of the dataset together with their corresponding colours.

Parameters:	img – Prediction of the model. n_classes – Number of semantic classes.
Returns:	out_img: The segmented image with the class colours.

getX(set_name, init, final, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)¶

Gets all the data samples stored between the positions init and final.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
init – initial position in the corresponding set split. Must be greater than or equal to 0 and smaller than final.
final – final position in the corresponding set split.
normalization – indicates if we want to normalize the data (‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’ data).
normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.
meanSubstraction – indicates if we want to subtract the training mean from the returned images (‘raw-image’ and ‘video’ data; only applicable if normalization=True).
dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)

Returns:

X, list of input data variables from sample ‘init’ to ‘final’ belonging to the chosen ‘set_name’

getXY(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)¶

Gets the [X,Y] pairs for the next ‘k’ samples in the desired set.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
k – number of consecutive samples retrieved from the corresponding set.
normalization – indicates if we want to normalize the data (‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’ data).
normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.
meanSubstraction – indicates if we want to subtract the training mean from the returned images (‘raw-image’ and ‘video’ data; only applicable if normalization=True).
dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)

Returns:

[X,Y], list of input and output data variables of the next ‘k’ consecutive samples belonging to the chosen ‘set_name’

getXY_FromIndices(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)¶

Gets the [X,Y] pairs for the samples in positions ‘k’ in the desired set.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
k – positions of the desired samples
normalization – indicates if we want to normalize the data (‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’ data).
normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.
meanSubstraction – indicates if we want to subtract the training mean from the returned images (‘raw-image’ and ‘video’ data; only applicable if normalization=True).
dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)

Returns:

[X,Y], list of input and output data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’

getX_FromIndices(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)¶

Gets the X inputs for the samples in positions ‘k’ in the desired set.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
k – positions of the desired samples
normalization – indicates if we want to normalize the data (‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’ data).
normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.
meanSubstraction – indicates if we want to subtract the training mean from the returned images (‘raw-image’ and ‘video’ data; only applicable if normalization=True).
dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)

Returns:

X, list of input data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’

getY(set_name, init, final, dataAugmentation=False, get_only_ids=False)¶

Gets the [Y] samples between the positions init and final of the chosen set.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
init – initial position in the corresponding set split. Must be greater than or equal to 0 and smaller than final.
final – final position in the corresponding set split.

Returns:

Y, list of output data variables from sample ‘init’ to ‘final’ belonging to the chosen ‘set_name’

getY_FromIndices(set_name, k, dataAugmentation=False, return_mask=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)¶

Gets the [Y] outputs for the samples in positions ‘k’ in the desired set.

Parameters:

set_name – ‘train’, ‘val’ or ‘test’ set
k – positions of the desired samples
dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)

Returns:

Y, list of output data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’

keepTopOutputs(set_name, id_out, n_top)¶: Keep the most frequent outputs from a set_name. :param set_name: Set name to modify. :param id_out: Id. :param n_top: Number of elements to keep. :return:

static load3DLabels(bbox_list, nClasses, dataAugmentation, daRandomParams, img_size, size_crop, image_list)¶

Loads a set of outputs of the type 3DLabel (used for detection)

Parameters:

bbox_list – list of bboxes, labels and original sizes
nClasses – number of different classes to be detected
dataAugmentation – are we applying data augmentation?
daRandomParams – random parameters applied on data augmentation (vflip, hflip and random crop)
img_size – resize applied to input images
size_crop – crop size applied to input images
image_list – list of input images used as identifiers to ‘daRandomParams’

Returns:

3DLabels with shape (batch_size, width*height, classes)

load3DSemanticLabels(labeled_images_list, nClasses, classes_to_colour, dataAugmentation, daRandomParams, img_size, size_crop, image_list)¶

Loads a set of outputs of the type 3DSemanticLabel (used for semantic segmentation TRAINING)

Parameters:

labeled_images_list – list of labeled images
nClasses – number of different classes to be detected
classes_to_colour – dictionary relating each class id to their corresponding colour in the labeled image
dataAugmentation – are we applying data augmentation?
daRandomParams – random parameters applied on data augmentation (vflip, hflip and random crop)
img_size – resize applied to input images
size_crop – crop size applied to input images
image_list – list of input images used as identifiers to ‘daRandomParams’

Returns:

3DSemanticLabels with shape (batch_size, width*height, classes)

loadBinary(y_raw, data_id)¶: Load a binary vector. May be of type ‘sparse’ :param y_raw: Vector to load. :param data_id: Id to load. :return:

static loadCategorical(y_raw, nClasses)¶: Converts a class vector (integers) to binary class matrix. From utils. :param y_raw: class vector to be converted into a matrix (integers from 0 to num_classes). :param nClasses: total number of classes. :return:
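Converting an integer class vector into a binary class matrix is a one-hot encoding. A minimal NumPy sketch of that conversion follows; `to_categorical` is written here for illustration and is not copied from the library.

```python
import numpy as np


def to_categorical(y_raw, n_classes):
    """Turn integer class ids into one row of 0s and a single 1 per sample."""
    y_raw = np.asarray(y_raw, dtype=int).ravel()
    out = np.zeros((y_raw.shape[0], n_classes), dtype=np.float32)
    # Set the column given by each class id to 1 in the matching row.
    out[np.arange(y_raw.shape[0]), y_raw] = 1.0
    return out


one_hot = to_categorical([0, 2, 1], n_classes=3)
```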

loadFeatures(X, feat_len, normalization_type='L2', normalization=False, loaded=False, external=False, data_augmentation=True)¶

Loads and normalizes features.

Parameters:

X – Features to load.
feat_len – Length of the features.
normalization_type – Normalization to perform to the features (see: self.__available_norm_feat)
normalization – Whether to normalize or not the features.
loaded – Flag that indicates if these features have been already loaded.
external – Boolean indicating if the paths provided in ‘X’ are absolute paths to external images
data_augmentation – Perform data augmentation (with mean=0.0, std_dev=0.01)

Returns:

Loaded features as numpy array
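The default normalization_type is ‘L2’. Assuming this means dividing each feature vector by its Euclidean norm (an assumption, since the page does not define it), the operation can be sketched as below. `l2_normalize` and its `eps` guard are hypothetical.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale every feature vector (last axis) to unit Euclidean length."""
    features = np.asarray(features, dtype=np.float32)
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    # `eps` avoids a division by zero for all-zero vectors.
    return features / np.maximum(norms, eps)


unit = l2_normalize([[3.0, 4.0], [0.0, 0.0]])
```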

loadImages(images, data_id, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, daRandomParams=None, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, useBGR=False, external=False, loaded=False)¶

Loads a set of images from disk.

Parameters:

images – list of image string names or list of matrices representing images (only if loaded==True)
data_id – identifier in the Dataset object of the data we are loading
normalization_type – type of normalization applied
normalization – whether we are applying a ‘0-1’ or ‘(-1)-1’ normalization to the images
meanSubstraction – whether we are removing the training mean
dataAugmentation – whether we are applying data augmentation (random cropping and horizontal flip)
daRandomParams – dictionary with results of random data augmentation provided by self.getDataAugmentationRandomParams()
external – if True the images will be loaded from an external database; in this case the list of images must be absolute paths
loaded – set this option to True if images is a list of matrices instead of a list of strings
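The two normalization names mentioned for images, ‘0-1’ and ‘(-1)-1’, suggest rescaling pixel values into [0, 1] or [-1, 1]. The sketch below shows that reading for 8-bit images. The 0 to 255 input range and the `normalize_pixels` helper are assumptions, and the library may apply these steps differently.

```python
import numpy as np


def normalize_pixels(img, normalization_type='(-1)-1'):
    """Rescale 8-bit pixel values (assumed range 0..255)."""
    img = np.asarray(img, dtype=np.float32)
    if normalization_type == '0-1':
        return img / 255.0          # maps 0..255 to 0..1
    if normalization_type == '(-1)-1':
        return img / 127.5 - 1.0    # maps 0..255 to -1..1
    raise ValueError('Unknown normalization type: %s' % normalization_type)


pixels = np.array([0, 255])
zero_one = normalize_pixels(pixels, '0-1')
minus_one_one = normalize_pixels(pixels, '(-1)-1')
```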

loadMapping(path_list)¶: Loads a mapping of Source – Target words. :param path_list: Pickle object with the mapping :return: None

loadText(X, vocabularies, max_len, offset, fill, pad_on_batch, words_so_far, loading_X=False)¶

Text encoder: Transforms samples from a text representation into a numerical one. It also masks the text.

Parameters:

X – Text to encode.
vocabularies – Mapping word -> index
max_len – Maximum length of the text.
offset – Shifts the text to the right, adding null symbol at the start
fill – ‘start’: the resulting vector will be filled with 0s at the beginning. ‘end’: it will be filled with 0s at the end. ‘center’: the vector will be surrounded by 0s, both at beginning and end.
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
words_so_far – Experimental feature. Use with caution.
loading_X – Whether we are loading an input or an output of the model

Returns:

Text as sequence of numbers. Mask for each sentence.
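The encoding, padding and masking described above can be sketched as follows. This is an illustration only: `encode_text` is a hypothetical helper that handles the ‘start’ and ‘end’ fill modes, maps unknown words to an assumed `<unk>` entry, and ignores offset, pad_on_batch and words_so_far.

```python
import numpy as np


def encode_text(sentences, vocab, max_len, fill='end', unk_symbol='<unk>'):
    """Encode sentences as index rows padded with 0s, plus a 0/1 mask."""
    ids = np.zeros((len(sentences), max_len), dtype=np.int64)
    mask = np.zeros((len(sentences), max_len), dtype=np.int8)
    for row, sentence in enumerate(sentences):
        # Unknown words fall back to the index of the unknown symbol.
        idx = [vocab.get(w, vocab[unk_symbol]) for w in sentence.split()][:max_len]
        n = len(idx)
        if n == 0:
            continue
        if fill == 'end':        # zeros after the text
            ids[row, :n] = idx
            mask[row, :n] = 1
        elif fill == 'start':    # zeros before the text
            ids[row, max_len - n:] = idx
            mask[row, max_len - n:] = 1
        else:
            raise ValueError('Unsupported fill mode: %s' % fill)
    return ids, mask


vocab = {'<pad>': 0, '<unk>': 1, 'a': 2, 'dog': 3}
ids, mask = encode_text(['a dog', 'a cat barks'], vocab, max_len=4)
```

The mask marks which positions hold real words, so padded positions can be excluded downstream.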

loadTextFeatures(X, max_len, pad_on_batch, offset)¶

Text encoder: Transforms samples from a text representation into a numerical one. It also masks the text.

Parameters:

X – Encoded text.
max_len – Maximum length of the text.
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.

Returns:

Text as sequence of numbers. Mask for each sentence.

loadTextFeaturesOneHot(X, vocabulary_len, max_len, pad_on_batch, offset, sample_weights=False, label_smoothing=0.0)¶

Text encoder: Transforms samples from a text representation into a one-hot representation. It also masks the text.

Parameters:

X – Encoded text.
vocabulary_len – Length of the vocabulary (size of the one-hot vector)
sample_weights – If True, we also return the mask of the text.
max_len – Maximum length of the text.
offset – Shifts the text to the right, adding null symbol at the start
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.

Returns:

Text as sequence of one-hot vectors. Mask for each sentence.

loadTextOneHot(X, vocabularies, vocabulary_len, max_len, offset, fill, pad_on_batch, words_so_far, sample_weights=False, loading_X=False, label_smoothing=0.0)¶

Text encoder: Transforms samples from a text representation into a one-hot representation. It also masks the text.

Parameters:

vocabulary_len –
sample_weights –
X – Text to encode.
vocabularies – Mapping word -> index
max_len – Maximum length of the text.
offset – Shifts the text to the right, adding null symbol at the start
fill – ‘start’: the resulting vector will be filled with 0s at the beginning. ‘end’: it will be filled with 0s at the end. ‘center’: the vector will be surrounded by 0s, both at beginning and end.
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
words_so_far – Experimental feature. Use with caution.
loading_X – Whether we are loading an input or an output of the model

Returns:

Text as sequence of one-hot vectors. Mask for each sentence.

loadVideoFeatures(idx_videos, data_id, set_name, max_len, normalization_type, normalization, feat_len, external=False, data_augmentation=True)¶

Parameters:

idx_videos – indices of the videos in the complete list of the current set_name
data_id – identifier of the input/output that we are loading
set_name – ‘train’, ‘val’ or ‘test’
max_len – maximum video length (number of frames)
normalization_type – type of data normalization applied
normalization – Switch on/off data normalization
feat_len – length of the features about to load
external – Switch on/off data loading from external dataset (not sharing self.path)
data_augmentation – Switch on/off data augmentation

Returns:

loadVideos(n_frames, data_id, last, set_name, max_len, normalization_type, normalization, meanSubstraction, dataAugmentation)¶

Loads a set of videos from disk. (Untested!)

Parameters:

n_frames – Number of frames per video
data_id – Id to load
last – Last video loaded
set_name – ‘train’, ‘val’, ‘test’
max_len – Maximum length of videos
normalization_type – Type of normalization applied
normalization – Whether we apply a 0-1 normalization to the images
meanSubstraction – Whether we are removing the training mean
dataAugmentation – Whether we are applying data augmentation (random cropping and horizontal flip)

loadVideosByIndex(n_frames, data_id, indices, set_name, max_len, normalization_type, normalization, meanSubstraction, dataAugmentation)¶: Get videos by indices. :param n_frames: Indices of the frames to load from each video. :param data_id: Data id to be processed. :param indices: Indices of the videos to load. :param set_name: Set name to be processed. :param max_len: Maximum length of each video. :param normalization_type: Normalization type applied to the frames. :param normalization: Normalization applied to the frames. :param meanSubstraction: Mean subtraction applied to the frames. :param dataAugmentation: Whether apply data augmentation. :return:

load_GT_3DSemanticLabels(gt, data_id)¶

Loads a GT list of 3DSemanticLabels in a 2D matrix and reshapes them to an Nx1 array (EVALUATION)

Parameters:

gt – list of Dataset output of type 3DSemanticLabels
data_id – id of the input/output we are processing

Returns:

out_list: containing a list of label images reshaped as an Nx1 array

merge_vocabularies(ids)¶

Merges the vocabularies from a set of text inputs/outputs into a single one.

Parameters:	ids – identifiers of the inputs/outputs whose vocabularies will be merged
Returns:	None
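Merging vocabularies can be pictured as building one word-to-index mapping in which each word appears once. The sketch below is a simplified illustration with plain dictionaries. `merge_vocabs` is hypothetical and does not reproduce how the Dataset object stores or updates its vocabularies.

```python
def merge_vocabs(vocabs):
    """Merge several word -> index mappings into a single one."""
    merged = {}
    for vocab in vocabs:
        # Visit words in their original index order so the result is stable.
        for word in sorted(vocab, key=vocab.get):
            # A word already present keeps the index it was first given.
            merged.setdefault(word, len(merged))
    return merged


merged = merge_vocabs([{'a': 0, 'b': 1}, {'b': 0, 'c': 1}])
```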

preprocess3DSemanticLabel(path_list, data_id, associated_id_in, num_poolings)¶: Preprocess 3D Semantic labels

preprocessBinary(labels_list, data_id, sparse)¶

Preprocesses binary classes.

Parameters:

data_id –
labels_list – Binary label list given as an instance of the class list.
sparse – indicates if the data is stored as a list of lists with class indices, e.g. [[4, 234],[87, 222, 4568],[3],…]

Returns:

Preprocessed labels.

preprocessCategorical(labels_list, data_id, sample_weights=False)¶

Preprocesses categorical data.

Parameters:

data_id –
sample_weights –
labels_list – Label list. Given as a path to a file or as an instance of the class list.

Returns:

Preprocessed labels.

preprocessFeatures(path_list, data_id, set_name, feat_len)¶

Preprocesses features. We should give a path to a text file where each line must contain a path to a .npy file storing a feature vector. Alternatively “path_list” can be an instance of the class list.

Parameters:

path_list – Path to a text file where each line must contain a path to a .npy file storing a feature vector. Alternatively, instance of the class list.
data_id – Dataset id
set_name – Used?
feat_len – Length of features. If all features have the same length, given as a number. Otherwise, list.

Returns:

Preprocessed features

static preprocessIDs(path_list, data_id, set_name)¶: Preprocess ID outputs: Strip and put each ID in a line.

preprocessImages(path_list, data_id, set_name, img_size, img_size_crop, use_RGB)¶: Image preprocessing function. :param path_list: Path to the images. :param data_id: Data id. :param set_name: Set name. :param img_size: Size of the images to process. :param img_size_crop: Size of the image crops. :param use_RGB: Whether use RGB color encoding. :return:

static preprocessReal(labels_list)¶

Preprocesses real classes.

Parameters:	labels_list – Label list. Given as a path to a file or as an instance of the class list.
Returns:	Preprocessed labels.

preprocessText(annotations_list, data_id, set_name, tokenization, build_vocabulary, max_text_len, max_words, offset, fill, min_occ, pad_on_batch, words_so_far, bpe_codes=None, separator='@@', use_unk_class=False)¶

Preprocess ‘text’ data type: Builds vocabulary (if necessary) and preprocesses the sentences. Also sets Dataset parameters.

Parameters:

annotations_list – Path to the sentences to process.
data_id – Dataset id of the data.
set_name – Name of the current set (‘train’, ‘val’, ‘test’)
tokenization – Tokenization to perform.
build_vocabulary – Whether we should build a vocabulary for this text or not.
max_text_len – Maximum length of the text. If max_text_len == 0, we treat the full sentence as a class.
max_words – Maximum number of words to include in the dictionary.
offset – Text shifting.
fill – Whether we pad with zeros at the beginning or at the end of the sentences.
min_occ – Minimum occurrences of each word to be included in the dictionary.
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
words_so_far – Experimental feature. Should be ignored.
bpe_codes – Codes used for applying BPE encoding.
separator – BPE encoding separator.
use_unk_class – Add a special class for the unknown word when max_text_len == 0.

Returns:

Preprocessed sentences.

preprocessTextFeatures(annotations_list, data_id, set_name, tokenization, build_vocabulary, max_text_len, max_words, offset, fill, min_occ, pad_on_batch, words_so_far, bpe_codes=None, separator='@@', use_unk_class=False)¶

Preprocess ‘text’ data type: Builds vocabulary (if necessary) and preprocesses the sentences. Also sets Dataset parameters.

Parameters:

annotations_list – Path to the sentences to process.
data_id – Dataset id of the data.
set_name – Name of the current set (‘train’, ‘val’, ‘test’)
tokenization – Tokenization to perform.
build_vocabulary – Whether we should build a vocabulary for this text or not.
max_text_len – Maximum length of the text. If max_text_len == 0, we treat the full sentence as a class.
max_words – Maximum number of words to include in the dictionary.
offset – Text shifting.
fill – Whether we pad with zeros at the beginning or at the end of the sentences.
min_occ – Minimum occurrences of each word to be included in the dictionary.
pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
words_so_far – Experimental feature. Should be ignored.
bpe_codes – Codes used for applying BPE encoding.
separator – BPE encoding separator.

Returns:

Preprocessed sentences.

preprocessVideoFeatures(path_list, data_id, set_name, max_video_len, img_size, img_size_crop, feat_len)¶: Preprocess already extracted features from video frames. :param path_list: path to all features in all videos :param data_id: Data id to be processed. :param set_name: Set name to be processed. :param max_video_len: Maximum number of subsampled video features. :param img_size: Size of each frame. :param img_size_crop: Size of each image crop. :param feat_len: Length of each feature. :return:

preprocessVideos(path_list, data_id, set_name, max_video_len, img_size, img_size_crop)¶: Preprocess videos. Subsample and crop frames. :param path_list: path to all images in all videos :param data_id: Data id to be processed. :param set_name: Set name to be processed. :param max_video_len: Maximum number of subsampled video frames. :param img_size: Size of each frame. :param img_size_crop: Size of each image crop. :return:

removeInput(set_name, id='label', type='categorical')¶: Deletes an input from the dataset. :param set_name: Set name to remove. :param id: Input to remove id. :param type: Type of the input to remove. :return:

removeOutput(set_name, id='label', type='categorical')¶: Deletes an output from the dataset. :param set_name: Set name to remove. :param id: Output to remove id. :param type: Type of the output to remove. :return:

replaceInput(data, set_name, data_type, data_id)¶: Replaces the data in a certain set_name and for a given data_id

resetCounters(set_name='all')¶: Resets some basic counter indices for the next samples to read.

resize_semantic_output(predictions, ids_out)¶: Resize semantic output.

setClasses(path_classes, data_id)¶

Loads the list of classes of the dataset. Each line must contain a unique identifier of the class.

Parameters:

path_classes – Path to a text file with the classes or an instance of the class list.
data_id – Dataset id

Returns:

None

setInput(path_list, set_name, type='raw-image', id='image', repeat_set=1, required=True, overwrite_split=False, normalization_types=None, data_augmentation_types=None, add_additional=False, img_size=None, img_size_crop=None, use_RGB=True, max_text_len=35, tokenization='tokenize_none', offset=0, fill='end', min_occ=0, pad_on_batch=True, build_vocabulary=False, max_words=0, words_so_far=False, bpe_codes=None, separator='@@', use_unk_class=False, feat_len=1024, max_video_len=26, sparse=False)¶

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:

use_RGB –
path_list – can either be a path to a text file containing the paths to the images or a python list of paths
set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
type – identifier of the type of input we are loading (accepted types can be seen in self.__accepted_types_inputs)
id – identifier of the input data loaded
repeat_set – repeats the inputs given (useful when we have more outputs than inputs). Int or array of ints.
required – flag for optional inputs
overwrite_split – indicates that we want to overwrite the data with id that was already declared in the dataset
normalization_types – type of normalization applied to the current input if we activate the data normalization while loading
data_augmentation_types – type of data augmentation applied to the current input if we activate the data augmentation while loading
add_additional – adds additional data to an already existent input ID

# ‘raw-image’-related parameters

Parameters:

img_size – size of the input images (any input image will be resized to this)
img_size_crop – size of the cropped zone (when dataAugmentation=False the central crop will be used)

# ‘text’-related parameters

Parameters:

tokenization – type of tokenization applied (must be declared as a method of this class) (only applicable when type==’text’).
build_vocabulary – whether a new vocabulary will be built from the loaded data or not (only applicable when type==’text’). A previously calculated vocabulary will be used if build_vocabulary is an ‘id’ from a previously loaded input/output
max_text_len – maximum text length, the rest of the data will be padded with 0s (only applicable if the output data is of type ‘text’).
max_words – a maximum of ‘max_words’ words from the whole vocabulary will be chosen by number of occurrences
offset – number of timesteps that the text is shifted to the right (for sequential conditional models, which take as input the previous output)
fill – selects whether padding is added before or after the sequence
min_occ – minimum number of occurrences allowed for the words in the vocabulary. (default = 0)
pad_on_batch – if True, the number of timesteps of each batch is set to the length of its longest sample + 1; otherwise, max_len is used as a fixed length
words_so_far – if True, each sample will be represented as the complete set of words until the point defined by the timestep dimension (e.g. t=0 ‘a’, t=1 ‘a dog’, t=2 ‘a dog is’, etc.)
bpe_codes – Codes used for applying BPE encoding.
separator – BPE encoding separator.
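
The interplay of max_text_len, offset and fill can be illustrated with a minimal pure-Python sketch. The helper pad_and_shift is hypothetical and not part of keras_wrapper; it assumes a padding index of 0, that the shift is applied before truncation, and that fill accepts 'start' or 'end'. The library's actual behaviour may differ.

```python
def pad_and_shift(token_ids, max_text_len, offset=0, fill='end', pad_id=0):
    """Hypothetical helper: shift and pad a list of word indices."""
    # Shift the sequence `offset` timesteps to the right by prepending padding.
    shifted = [pad_id] * offset + list(token_ids)
    # Truncate to the maximum length (assumption: truncation drops the tail).
    shifted = shifted[:max_text_len]
    n_pad = max_text_len - len(shifted)
    if fill == 'end':
        return shifted + [pad_id] * n_pad
    if fill == 'start':
        return [pad_id] * n_pad + shifted
    raise ValueError("fill must be 'start' or 'end'")


print(pad_and_shift([5, 8, 3], max_text_len=6))
print(pad_and_shift([5, 8, 3], max_text_len=6, offset=1))
print(pad_and_shift([5, 8, 3], max_text_len=6, fill='start'))
```

With offset=1 the sequence starts one timestep later, which is the layout a conditional model would need when it receives the previous output as input.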

# ‘image-features’ and ‘video-features’- related parameters

Parameters:	feat_len – size of the feature vectors for each dimension. We must provide a list if the features are not vectors.

# ‘video’-related parameters

Parameters:	max_video_len – maximum video length, the rest of the data will be padded with 0s (only applicable if the input data is of type ‘video’ or ‘video-features’).

setLabels(labels_list, set_name, type='categorical', id='label')¶: DEPRECATED

setList(path_list, set_name, type='raw-image', id='image')¶: DEPRECATED

setListGeneral(path_list, split=None, shuffle=True, type='raw-image', id='image')¶: DEPRECATED

setOutput(path_list, set_name, type='categorical', id='label', repeat_set=1, overwrite_split=False, add_additional=False, sample_weights=False, label_smoothing=0.0, tokenization='tokenize_none', max_text_len=0, offset=0, fill='end', min_occ=0, pad_on_batch=True, words_so_far=False, build_vocabulary=False, max_words=0, bpe_codes=None, separator='@@', use_unk_class=False, associated_id_in=None, num_poolings=None, sparse=False)¶

Loads a set of output data.

# General parameters

Parameters:

path_list – can either be a path to a text file containing the labels or a python list of labels.
set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’).
type – identifier of the type of output we are loading (accepted types can be seen in self.__accepted_types_outputs).
id – identifier of the output data loaded.
repeat_set – repeats the outputs given (useful when we have more inputs than outputs). Int or array of ints.
overwrite_split – indicates that we want to overwrite the data with id that was already declared in the dataset
add_additional – adds additional data to an already existent output ID
sample_weights – switch on/off sample weights usage for the current output
label_smoothing – epsilon value for label smoothing. See arxiv.org/abs/1512.00567.

# ‘text’-related parameters

Parameters:

tokenization – type of tokenization applied (must be declared as a method of this class) (only applicable when type==’text’).
build_vocabulary – whether a new vocabulary will be built from the loaded data or not (only applicable when type==’text’).
max_text_len – maximum text length, the rest of the data will be padded with 0s (only applicable if the output data is of type ‘text’). Set to 0 if the whole sentence will be used as an output class.
max_words – a maximum of ‘max_words’ words from the whole vocabulary will be chosen by number of occurrences
offset – number of timesteps that the text is shifted to the right (for sequential conditional models, which take as input the previous output)
fill – selects whether padding is added before or after the sequence
min_occ – minimum number of occurrences allowed for the words in the vocabulary. (default = 0)
pad_on_batch – if True, the number of timesteps of each batch is set to the length of its longest sample + 1; otherwise, max_len is used as a fixed length
words_so_far – if True, each sample will be represented as the complete set of words until the point defined by the timestep dimension (e.g. t=0 ‘a’, t=1 ‘a dog’, t=2 ‘a dog is’, etc.)
bpe_codes – Codes used for applying BPE encoding.
separator – BPE encoding separator.

# ‘3DLabel’ or ‘3DSemanticLabel’-related parameters

Parameters:

associated_id_in – id of the input ‘raw-image’ associated to the inputted 3DLabels or 3DSemanticLabel
num_poolings – number of pooling layers used in the model (used for calculating output dimensions)

# ‘binary’-related parameters

Parameters:	sparse – indicates if the data is stored as a list of lists with class indices, e.g. [[4, 234],[87, 222, 4568],[3],…]
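
The sparse layout described for ‘binary’ outputs (one list of class indices per sample) can be expanded into dense multi-hot rows. The sketch below only illustrates that data layout; sparse_to_multi_hot is a hypothetical helper, not a keras_wrapper function, and the number of classes is assumed to be known in advance.

```python
def sparse_to_multi_hot(samples, n_classes):
    """Hypothetical helper: expand lists of class indices into 0/1 rows."""
    rows = []
    for class_indices in samples:
        row = [0] * n_classes
        for idx in class_indices:
            # Mark every class that is active for this sample.
            row[idx] = 1
        rows.append(row)
    return rows


dense = sparse_to_multi_hot([[0, 2], [1], []], n_classes=4)
for row in dense:
    print(row)
```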

setRawInput(path_list, set_name, type='file-name', id='raw-text', overwrite_split=False)¶

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:

path_list – Path to a text file containing the paths to the images or a python list of paths
set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
type – identifier of the type of input we are loading (see self.__accepted_types_inputs for accepted types)
id – identifier of the input data loaded
overwrite_split –

setRawOutput(path_list, set_name, type='file-name', id='raw-text', overwrite_split=False, add_additional=False)¶

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:

path_list – can either be a path to a text file containing the paths to the images or a python list of paths
set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
type – identifier of the type of input we are loading (accepted types can be seen in self.__accepted_types_inputs)
id – identifier of the input data loaded
overwrite_split –
add_additional –

setSemanticClasses(path_classes, data_id)¶

Loads the list of semantic classes of the dataset together with their corresponding colours in the GT image. Each line must contain a unique identifier of the class and its associated RGB colour representation, separated by commas.

Parameters:

path_classes – Path to a text file with the classes and their colours.
data_id – input/output id

Returns:	None
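
As an illustration of the file layout described above, the following sketch parses lines of the form identifier,R,G,B. The exact column order and the helper name parse_semantic_classes are assumptions made for this example; they are not taken from the library.

```python
def parse_semantic_classes(lines):
    """Hypothetical parser: map each class identifier to an (R, G, B) tuple."""
    classes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        name, r, g, b = [field.strip() for field in line.split(',')]
        if name in classes:
            raise ValueError('Duplicated class identifier: ' + name)
        classes[name] = (int(r), int(g), int(b))
    return classes


example_lines = ['background,0,0,0', 'road, 128, 64, 128', 'car,0,0,142']
semantic_classes = parse_semantic_classes(example_lines)
print(semantic_classes)
```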

setSilence(silence)¶: Changes the silence mode of the ‘Dataset’ instance.

setTrainMean(mean_image, data_id, normalization=False)¶

Loads a pre-calculated training mean image, ‘mean_image’ can either be:

numpy.array (complete image)

list with a value per channel

string with the path to the stored image.

Parameters:

mean_image –
normalization –
data_id – identifier of the type of input whose train mean is being introduced.

shuffleTraining()¶: Applies a random shuffling to the training samples.

static tokenize_CNN_sentence(caption)¶

Tokenization employed in the CNN_sentence package (https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py#L97).

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption

static tokenize_aggressive(caption, lowercase=True)¶

Aggressive tokenizer for the input/output data of type ‘text’:

Removes punctuation
Optional lowercasing

Parameters:

caption – String to tokenize
lowercase – Whether to lowercase the caption or not

Returns:	Tokenized version of caption
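
The two steps listed for the aggressive tokenizer can be sketched with the standard library alone. This is not the library's implementation: tokenize_aggressive_sketch is a hypothetical name, and treating only ASCII punctuation (string.punctuation) as punctuation is an assumption.

```python
import string

# Translation table mapping every ASCII punctuation character to a space
# (assumption: the real tokenizer may handle a different character set).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def tokenize_aggressive_sketch(caption, lowercase=True):
    """Hypothetical sketch: remove punctuation and optionally lowercase."""
    cleaned = caption.translate(_PUNCT_TO_SPACE)
    if lowercase:
        cleaned = cleaned.lower()
    # Collapse the whitespace left behind by the removed characters.
    return ' '.join(cleaned.split())


print(tokenize_aggressive_sketch('Hello, World!'))
print(tokenize_aggressive_sketch('Hello, World!', lowercase=False))
```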

static tokenize_basic(caption, lowercase=True)¶

Basic tokenizer for the input/output data of type ‘text’:

Splits punctuation
Optional lowercasing

Parameters:

caption – String to tokenize
lowercase – Whether to lowercase the caption or not

Returns:	Tokenized version of caption

tokenize_bpe(caption)¶

Applies BPE segmentation (https://github.com/rsennrich/subword-nmt).

Parameters:	caption – Caption to tokenize.
Returns:	Encoded version of caption.

static tokenize_icann(caption)¶

Tokenization used for the icann paper:

Removes some punctuation (. , “)
Lowercasing

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption

static tokenize_montreal(caption)¶

Similar to tokenize_icann

Removes some punctuation
Lowercase

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption

tokenize_moses(caption, language='en', lowercase=False, aggressive_dash_splits=False, return_str=True, escape=False)¶

Applies the Moses tokenization, relying on sacremoses’ implementation of the Moses tokenizer.

Parameters:

caption – Sentence to tokenize
language – Language (will build the tokenizer for this language)
lowercase – Whether to lowercase or not the sentence
aggressive_dash_splits – Option to trigger dash split rules.
return_str – Return string or list
escape – Escape HTML special chars

Returns:

static tokenize_none(caption)¶

Does not tokenize the sentences. Only performs stripping.

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption

static tokenize_none_char(caption)¶

Character-level tokenization. Respects all symbols. Separates chars. Inserts <space> symbol for spaces. If an escaped char is found (e.g. the “&apos;” symbol), it is converted to the original one.

# List of escaped chars (by moses tokenizer)

& -> &amp;
| -> &#124;
< -> &lt;
> -> &gt;
' -> &apos;
" -> &quot;
[ -> &#91;
] -> &#93;

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption
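
A minimal sketch of the character-level idea follows: characters are separated by spaces and every original space becomes a <space> token. The function name tokenize_char_sketch is hypothetical, and the handling of Moses-escaped characters mentioned above is deliberately left out.

```python
def tokenize_char_sketch(caption, space_symbol='<space>'):
    """Hypothetical sketch: split a caption into space-separated characters."""
    tokens = []
    for ch in caption.strip():
        # Spaces get an explicit symbol so they survive the later split.
        tokens.append(space_symbol if ch == ' ' else ch)
    return ' '.join(tokens)


print(tokenize_char_sketch('a dog'))
```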

static tokenize_questions(caption)¶

Basic tokenizer for VQA questions:

Lowercasing
Splits contractions
Removes punctuation
Numbers to digits

Parameters:	caption – String to tokenize
Returns:	Tokenized version of caption

static tokenize_soft(caption, lowercase=True)¶

Tokenization used for the icann paper:

Removes very little punctuation
Lowercase

Parameters:

caption – String to tokenize
lowercase – Whether to lowercase the caption or not

Returns:	Tokenized version of caption

class keras_wrapper.dataset.Homogeneous_Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, joint_batches=20, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True)¶

Batch generator class. Retrieves batches of data.

generator()¶: Gets and processes the data :return: generator with the data

reset()¶: Resets the counters.

retrieve_maxibatch()¶: Gets a maxibatch of self.params[‘joint_batches’] * self.batch_size samples.
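
One common way to use a maxibatch is to load joint_batches * batch_size samples at once, sort them by length, and then cut them into batches of similar-length samples. The sketch below illustrates that idea only; sorting by length is an assumption about what “homogeneous” means here, and homogeneous_batches is a hypothetical helper rather than the class's actual logic.

```python
def homogeneous_batches(samples, batch_size, joint_batches):
    """Hypothetical sketch: yield batches of similar-length samples."""
    maxibatch_size = joint_batches * batch_size
    for start in range(0, len(samples), maxibatch_size):
        # Take one maxibatch and sort it so that neighbours have similar length.
        maxibatch = sorted(samples[start:start + maxibatch_size], key=len)
        for i in range(0, len(maxibatch), batch_size):
            yield maxibatch[i:i + batch_size]


sentences = ['aaaa', 'b', 'ccc', 'dd']
batches = list(homogeneous_batches(sentences, batch_size=2, joint_batches=2))
print(batches)
```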

class keras_wrapper.dataset.Parallel_Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True, temporally_linked=False, init_sample=-1, final_sample=-1, n_parallel_loaders=1)¶

Batch generator class. Retrieves batches of data.

generator()¶: Gets and processes the data :return: generator with the data

keras_wrapper.dataset.dataLoad(process_name, net, dataset, max_queue_len, queues)¶

Parallel data loader. Risky and untested!

Parameters:	process_name – net – dataset – max_queue_len – queues –
Returns:

keras_wrapper.dataset.loadDataset(dataset_path)¶

Loads a previously saved Dataset object.

Parameters:	dataset_path – Path to the stored Dataset to load
Returns:	Loaded Dataset object

keras_wrapper.dataset.saveDataset(dataset, store_path)¶

Saves a backup of the current Dataset object.

Parameters:

dataset – Dataset object to save
store_path – Saving path

Returns:	None
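
The save/load round trip can be pictured with a pickle-based sketch. Whether the library stores Dataset objects with pickle is an assumption (this page does not state the storage format), and save_object / load_object are hypothetical stand-ins that operate on a plain dictionary instead of a real Dataset.

```python
import os
import pickle
import tempfile


def save_object(obj, store_path):
    """Hypothetical stand-in for a save routine, using pickle."""
    with open(store_path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Hypothetical stand-in for a load routine, using pickle."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# A plain dict plays the role of the Dataset object in this sketch.
toy_dataset = {'name': 'toy', 'splits': ['train', 'val', 'test']}
toy_path = os.path.join(tempfile.mkdtemp(), 'Dataset_toy.pkl')
save_object(toy_dataset, toy_path)
restored = load_object(toy_path)
print(restored == toy_dataset)
```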

cnn_model.py¶

callbacks_keras_wrapper.py¶

beam_search_ensemble.py¶

utils.py¶

class keras_wrapper.utils.MultiprocessQueue(manager, multiprocess_type='Queue')¶

Wrapper class for encapsulating the behaviour of some multiprocessing communication structures.

See how Queues and Pipes work in the following link https://docs.python.org/2/library/multiprocessing.html#multiprocessing-examples

keras_wrapper.utils.bbox(img, mode='max')¶

Returns a bounding box covering all the non-zero area in the image.

Parameters:

img – Image for which the bounding box is computed
mode – “width_height” returns width in [2] and height in [3], “max” returns xmax in [2] and ymax in [3]

Returns:
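
The following dependency-free sketch shows one way to compute such a bounding box on an image given as a list of rows. It assumes positions [0] and [1] hold the minimum x and y, and that width and height are computed as xmax - x and ymax - y; neither detail is stated on this page, and bbox_sketch is a hypothetical name.

```python
def bbox_sketch(img, mode='max'):
    """Hypothetical sketch: bounding box of the non-zero area of a 2D image."""
    # Rows and columns that contain at least one non-zero pixel.
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:
        raise ValueError('The image has no non-zero pixels')
    x, xmax = cols[0], cols[-1]
    y, ymax = rows[0], rows[-1]
    if mode == 'width_height':
        return x, y, xmax - x, ymax - y
    if mode == 'max':
        return x, y, xmax, ymax
    raise ValueError("mode must be 'max' or 'width_height'")


image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]
print(bbox_sketch(image, mode='max'))
print(bbox_sketch(image, mode='width_height'))
```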

keras_wrapper.utils.build_OneVsAllECOC_Stage(n_classes_ecoc, input_shape, ds, stage1_lr)¶

Parameters:	n_classes_ecoc – input_shape – ds – stage1_lr –
Returns:

keras_wrapper.utils.build_OneVsOneECOC_Stage(n_classes_ecoc, input_shape, ds, stage1_lr=0.01, ecoc_version=2)¶

Parameters:	n_classes_ecoc – input_shape – ds – stage1_lr – ecoc_version –
Returns:

keras_wrapper.utils.build_Specific_OneVsOneECOC_Stage(pairs, input_shape, ds, lr, ecoc_version=2)¶

Parameters:	pairs – input_shape – ds – lr – ecoc_version –
Returns:

keras_wrapper.utils.build_Specific_OneVsOneECOC_loss_Stage(net, input_net, input_shape, classes, ecoc_version=3, pairs=None, functional_api=False, activations=None)¶

Parameters:	net – input_net – input_shape – classes – ecoc_version – pairs – functional_api – activations –
Returns:

keras_wrapper.utils.build_Specific_OneVsOneVsRestECOC_Stage(pairs, input_shape, ds, lr, ecoc_version=2)¶

Parameters:	pairs – input_shape – ds – lr – ecoc_version –
Returns:

keras_wrapper.utils.checkParameters(input_params, default_params, hard_check=False)¶

Validates a set of input parameters and uses the default ones if not specified.

Parameters:

input_params – Input parameters.
default_params – Default parameters.
hard_check – If True, raise exception if a parameter is not valid.

Returns:
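
The validation described above can be sketched as a merge of user-supplied values over a dictionary of defaults. This is an illustration under assumptions: check_parameters_sketch is a hypothetical name, a parameter is considered valid when its key exists in default_params, and unknown keys are silently dropped when hard_check is False.

```python
def check_parameters_sketch(input_params, default_params, hard_check=False):
    """Hypothetical sketch: validate parameters and fall back to defaults."""
    params = dict(default_params)  # start from the defaults
    for key, value in input_params.items():
        if key in default_params:
            params[key] = value  # a valid parameter overrides its default
        elif hard_check:
            raise ValueError("Parameter '%s' is not a valid parameter." % key)
        # Assumption: with hard_check=False unknown keys are ignored.
    return params


defaults = {'lr': 0.01, 'batch_size': 50}
merged = check_parameters_sketch({'lr': 0.1, 'unknown': 1}, defaults)
print(merged)
```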

keras_wrapper.utils.decode_categorical(preds, index2word, verbose=0)¶

Decodes predictions.

Parameters:

preds – Predictions codified as the output of a softmax activation function.
index2word – Mapping from word indices into word characters.

Returns:	List of decoded predictions.

keras_wrapper.utils.decode_multilabel(preds, index2word, min_val=0.5, get_probs=False, verbose=0)¶

Decodes predictions.

Parameters:

preds – Predictions codified as the output of a softmax activation function.
index2word – Mapping from word indices into word characters.
min_val – Minimum value needed for considering a positive prediction.
get_probs – additionally return probability for each predicted label
verbose – Verbosity level, by default 0.

Returns:	List of decoded predictions.
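
Multilabel decoding by thresholding can be sketched as follows: every class whose score reaches min_val is reported as a positive label. The name decode_multilabel_sketch is hypothetical, and using >= rather than > for the comparison is an assumption.

```python
def decode_multilabel_sketch(preds, index2word, min_val=0.5):
    """Hypothetical sketch: keep the labels whose score reaches min_val."""
    decoded = []
    for scores in preds:
        labels = [index2word[i] for i, p in enumerate(scores) if p >= min_val]
        decoded.append(labels)
    return decoded


vocab = {0: 'cat', 1: 'dog', 2: 'bird'}
scores = [[0.9, 0.2, 0.7], [0.1, 0.3, 0.2]]
print(decode_multilabel_sketch(scores, vocab))
```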

keras_wrapper.utils.decode_predictions(preds, temperature, index2word, sampling_type, verbose=0)¶

Decodes predictions.

Parameters:

preds – Predictions codified as the output of a softmax activation function.
temperature – Temperature for sampling.
index2word – Mapping from word indices into word characters.
sampling_type – ‘max_likelihood’ or ‘multinomial’.
verbose – Verbosity level, by default 0.

Returns:	List of decoded predictions.

keras_wrapper.utils.decode_predictions_beam_search(preds, index2word, glossary=None, alphas=None, heuristic=0, x_text=None, unk_symbol='<unk>', pad_sequences=False, mapping=None, verbose=0)¶

Decodes predictions from the BeamSearch method.

Parameters:

preds – Predictions codified as word indices.
index2word – Mapping from word indices into word characters.
alphas – Attention model weights: Float matrix with shape (I, J) (I: number of target items; J: number of source items).
heuristic – Replace unknown words heuristic (0, 1 or 2)
x_text – Source text (for unk replacement)
unk_symbol – Unknown words symbol
pad_sequences – Whether we should make a zero-pad on the input sequence.
mapping – Source-target dictionary (for unk_replace heuristics 1 and 2)
verbose – Verbosity level, by default 0.

Returns:

List of decoded predictions

keras_wrapper.utils.decode_predictions_one_hot(preds, index2word, pad_sequences=True, verbose=0)¶

Decodes predictions following a one-hot codification.

Parameters:

preds – Predictions codified as one-hot vectors.
index2word – Mapping from word indices into word characters.
verbose – Verbosity level, by default 0.

Returns:	List of decoded predictions

keras_wrapper.utils.flatten(l)¶

Flattens a list (more general than flatten_list_of_lists, but also less efficient).

Parameters:	l –
Returns:

keras_wrapper.utils.flatten_list_of_lists(list_of_lists)¶

Flattens a list of lists.

Parameters:	list_of_lists – List of lists
Returns:	Flattened list of lists
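
The difference between the two flattening helpers can be seen in a small sketch: a recursive version handles arbitrary nesting, while a single comprehension is enough (and cheaper) when the input is exactly one level deep. Both function names are hypothetical, and treating only lists and tuples as nested containers is an assumption.

```python
def flatten_sketch(nested):
    """Hypothetical sketch: recursively flatten arbitrarily nested lists."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat


def flatten_list_of_lists_sketch(list_of_lists):
    """Hypothetical sketch: flatten exactly one level of nesting."""
    return [item for sublist in list_of_lists for item in sublist]


print(flatten_sketch([1, [2, [3, 4]], 5]))
print(flatten_list_of_lists_sketch([[1, 2], [3], []]))
```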

keras_wrapper.utils.indices_2_one_hot(indices, n)¶

Converts a list of indices into one hot codification

Parameters:

indices – list of indices
n – integer. Size of the vocabulary

Returns:	numpy array with shape (len(indices), n)
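
The conversion can be sketched without NumPy to keep the example dependency-free. Note the difference from the documented function: this hypothetical indices_to_one_hot_sketch returns a list of lists with len(indices) rows and n columns instead of a numpy array.

```python
def indices_to_one_hot_sketch(indices, n):
    """Hypothetical sketch: one-hot encode indices as a list of 0/1 rows."""
    one_hot = []
    for idx in indices:
        if not 0 <= idx < n:
            raise ValueError('Index %d is outside the vocabulary size %d' % (idx, n))
        row = [0] * n
        row[idx] = 1
        one_hot.append(row)
    return one_hot


encoded = indices_to_one_hot_sketch([2, 0], n=3)
print(encoded)
```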

keras_wrapper.utils.key_with_max_val(d)¶

Creates a list of the dict’s keys and values and returns the key with the max value.
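
In modern Python the same result can be obtained directly with the built-in max. The sketch below is an equivalent formulation under the assumption that the dictionary is non-empty; key_with_max_val_sketch is a hypothetical name.

```python
def key_with_max_val_sketch(d):
    """Hypothetical sketch: return the key whose value is the largest."""
    # max iterates over the keys and compares them by their values.
    return max(d, key=d.get)


counts = {'cat': 3, 'dog': 7, 'bird': 1}
print(key_with_max_val_sketch(counts))
```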

keras_wrapper.utils.loadGoogleNetForFood101(nClasses=101, load_path='/media/HDD_2TB/CNN_MODELS/GoogleNet')¶

Parameters:	nClasses – load_path –
Returns:

keras_wrapper.utils.one_hot_2_indices(preds, pad_sequences=True, verbose=0)¶

Converts a one-hot codification into an index-based one.

Parameters:

preds – Predictions codified as one-hot vectors.
pad_sequences – Whether we should pad sequence or not
verbose – Verbosity level, by default 0.

Returns:	List of converted predictions

keras_wrapper.utils.prepareECOCLossOutputs(net, ds, ecoc_table, input_name, output_names, splits=None)¶

Parameters:	net – ds – ecoc_table – input_name – output_names – splits –
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101(model_wrapper)¶

Prepares the GoogleNet model after its conversion from Caffe.

Parameters:	model_wrapper –
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101_ECOC_loss(model_wrapper)¶

Prepares the GoogleNet model for inserting an ECOC structure after removing the last part of the net.

Parameters:	model_wrapper –
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101_Stage1(model_wrapper)¶

Prepares the GoogleNet model for serving as the first Stage of a Staged_Network.

Parameters:	model_wrapper –
Returns:

keras_wrapper.utils.prepareGoogleNet_Stage2(stage1, stage2)¶

Removes the second part of the GoogleNet for inserting it into the second stage.

Parameters:	stage1 – stage2 –
Returns:

keras_wrapper.utils.print_dict(d, header='')¶

Formats a dictionary for printing.

Parameters:	d – Dictionary to print.
Returns:	String containing the formatted dictionary.

keras_wrapper.utils.replace_unknown_words(src_word_seq, trg_word_seq, hard_alignment, unk_symbol, glossary=None, heuristic=0, mapping=None, verbose=0)¶

Replaces unknown words from the target sentence according to some heuristic. Borrowed from: https://github.com/sebastien-j/LV_groundhog/blob/master/experiments/nmt/replace_UNK.py

Parameters:

src_word_seq – Source sentence words
trg_word_seq – Hypothesis words
hard_alignment – Target-Source alignments
glossary – Hard-coded substitutions.
unk_symbol – Symbol in trg_word_seq to replace
heuristic – Heuristic (0, 1, 2)
mapping – External alignment dictionary
verbose – Verbosity level

Returns:	trg_word_seq with replaced unknown words
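
The general idea of alignment-based unknown-word replacement can be sketched as follows: each unknown target word is replaced by the source word it is aligned to, optionally translated through a dictionary. This is not the library's code, and it makes no claim about which behaviour corresponds to heuristic 0, 1 or 2. The name replace_unk_sketch is hypothetical, and hard_alignment is assumed to hold one source position per target position.

```python
def replace_unk_sketch(src_words, trg_words, hard_alignment,
                       unk_symbol='<unk>', mapping=None):
    """Hypothetical sketch: replace unknown target words via alignments."""
    replaced = []
    for trg_pos, word in enumerate(trg_words):
        if word != unk_symbol:
            replaced.append(word)
            continue
        # Source word aligned with this target position (assumed layout).
        src_word = src_words[hard_alignment[trg_pos]]
        if mapping is not None and src_word in mapping:
            replaced.append(mapping[src_word])  # dictionary translation
        else:
            replaced.append(src_word)  # plain copy of the source word
    return replaced


source = ['ich', 'wohne', 'in', 'Valencia']
hypothesis = ['i', 'live', 'in', '<unk>']
alignment = [0, 1, 2, 3]
print(replace_unk_sketch(source, hypothesis, alignment))
```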

keras_wrapper.utils.sampling(scores, sampling_type='max_likelihood', temperature=1.0)¶

Samples words (each sample is drawn from a categorical distribution) or picks the words that maximize the likelihood.

Parameters:

scores – array of size #samples x #classes; every entry determines a score for sample i having class j
sampling_type –
temperature – Predictions temperature. The higher, the flatter the probabilities, and hence the more random the outputs.

Returns:	set of indices chosen as output, a vector of size #samples
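
The two strategies can be sketched with the standard library. Under ‘max_likelihood’ the arg-max of each row is taken; under ‘multinomial’ the scores are rescaled by the temperature (log-scores divided by the temperature, then renormalised) before drawing one index per row. The function sampling_sketch is hypothetical, and the exact rescaling formula is an assumption consistent with "the higher, the flatter".

```python
import math
import random


def sampling_sketch(scores, sampling_type='max_likelihood', temperature=1.0, rng=random):
    """Hypothetical sketch: choose one class index per row of scores."""
    chosen = []
    for row in scores:
        classes = range(len(row))
        if sampling_type == 'max_likelihood':
            chosen.append(max(classes, key=row.__getitem__))
        elif sampling_type == 'multinomial':
            # Temperature scaling in log space; higher values flatten the weights.
            logits = [math.log(max(p, 1e-12)) / temperature for p in row]
            top = max(logits)
            weights = [math.exp(l - top) for l in logits]
            chosen.append(rng.choices(classes, weights=weights, k=1)[0])
        else:
            raise ValueError('Unknown sampling_type: %s' % sampling_type)
    return chosen


probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(sampling_sketch(probs))
print(sampling_sketch(probs, sampling_type='multinomial', temperature=2.0))
```

Passing a seeded random.Random instance as rng makes the multinomial draws reproducible.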

keras_wrapper.utils.simplifyDataset(ds, id_classes, n_classes=50)¶

Parameters:	ds – id_classes – n_classes –
Returns:

keras_wrapper.utils.to_categorical(y, num_classes=None)¶

Converts a class vector (integers) to binary class matrix.

E.g. for use with categorical_crossentropy.

# Arguments

y: class vector to be converted into a matrix: (integers from 0 to num_classes).

num_classes: total number of classes.

# Returns

A binary matrix representation of the input.