Available Modules

List of all files, classes and methods available in the library.

dataset.py

class keras_wrapper.dataset.Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True, temporally_linked=False, init_sample=-1, final_sample=-1)

Batch generator class. Retrieves batches of data.

generator()

Gets and processes the data.

Returns:

Generator with the data.
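As a rough illustration of what a batch generator does, the sketch below yields batches of at most batch_size samples and reshuffles on every pass. It is not the library's implementation: the function name, the seed argument and the endless looping are assumptions made for the example.

```python
import random


def batch_generator(samples, batch_size=50, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` samples, looping over the data forever."""
    indices = list(range(len(samples)))
    rng = random.Random(seed)  # hypothetical seed argument, for reproducibility
    while True:
        if shuffle:
            rng.shuffle(indices)  # new order on every pass over the data
        for start in range(0, len(indices), batch_size):
            batch_ids = indices[start:start + batch_size]
            yield [samples[i] for i in batch_ids]


gen = batch_generator(list(range(7)), batch_size=3, shuffle=False)
first_epoch = [next(gen) for _ in range(3)]
```

The last batch of a pass may be smaller than batch_size when the number of samples is not a multiple of it.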

class keras_wrapper.dataset.Dataset(name, path, pad_symbol='<pad>', unk_symbol='<unk>', null_symbol='<null>', silence=False)

Class for defining instances of databases adapted for Keras. It includes several utility functions for easily managing data splits, image loading, mean calculation, etc.

apply_label_smoothing(y, discount, vocabulary_len, discount_type='uniform')

Applies label smoothing to a one-hot codified vector.

Parameters:
  • y – Input to smooth
  • discount – Discount to apply
  • vocabulary_len – Length of the one-hot vectors
  • discount_type – Type of smoothing. Types supported: ‘uniform’ (subtract a ‘label_smoothing_discount’ from the label and distribute it uniformly among all labels).
Returns:
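One common reading of the ‘uniform’ discount type is sketched below in plain Python: the discount is taken from every entry in proportion to its value and then shared equally among all labels. The function name and the exact formula are assumptions, not taken from the library.

```python
def smooth_uniform(y, discount, vocabulary_len):
    """Move `discount` of the probability mass and spread it over all labels."""
    share = discount / vocabulary_len  # mass each label receives
    return [value * (1.0 - discount) + share for value in y]


smoothed = smooth_uniform([0.0, 1.0, 0.0, 0.0], discount=0.1, vocabulary_len=4)
```

With this formula the smoothed vector still sums to 1, so it remains a valid probability distribution.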
build_bpe(codes, merges=-1, separator='@@', vocabulary=None, glossaries=None)

Constructs a BPE encoder instance. Currently, the vocabulary and glossaries options are not implemented.

Parameters:
  • codes – File with BPE codes (created by learn_bpe.py)
  • separator – Separator between non-final subword units (default: ‘@@’)
  • vocabulary – Vocabulary file. If provided, this script reverts any merge operations that produce an OOV.
  • glossaries – The strings provided in glossaries will not be affected by the BPE (i.e. they will neither be broken into subwords, nor concatenated with other subwords).
Returns:None
build_moses_detokenizer(language='en')

Constructs a Moses detokenizer instance.

Parameters:language – Detokenizer language.
Returns:None
build_moses_tokenizer(language='en')

Constructs a Moses tokenizer instance.

Parameters:language – Tokenizer language.
Returns:None

build_vocabulary(captions, data_id, do_split=True, min_occ=0, n_words=0, split_symbol=' ', use_extra_words=True, use_unk_class=False, is_val=False)

Vocabulary builder for data of type ‘text’

Parameters:
  • use_extra_words
  • captions – Corpus sentences
  • data_id – Dataset id of the text
  • do_split – Split sentence by words or use the full sentence as a class.
  • split_symbol – symbol used for separating the elements in each sentence
  • min_occ – Minimum occurrences of each word to be included in the dictionary.
  • n_words – Maximum number of words to include in the dictionary.
  • is_val – Set to True if the input ‘captions’ are values and we want to keep them sorted
Returns:

None.
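The interaction between min_occ and n_words can be pictured with the following self-contained sketch. It is not the library's code: the function name, the tie-breaking rule, the reading of n_words=0 as "no limit" and the placement of the extra symbols at the first indices are all assumptions.

```python
from collections import Counter


def build_word_index(captions, min_occ=0, n_words=0, split_symbol=' ',
                     extra_words=('<pad>', '<unk>', '<null>')):
    """Map each kept word to an integer index, most frequent words first."""
    counts = Counter(word for sentence in captions
                     for word in sentence.split(split_symbol))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = [word for word, count in ranked if count >= min_occ]
    if n_words > 0:
        kept = kept[:n_words]  # assumption: n_words=0 means "no limit"
    return {word: idx for idx, word in enumerate(list(extra_words) + kept)}


vocab = build_word_index(['a dog runs', 'a cat sleeps', 'a dog sleeps'], min_occ=2)
```

Here "runs" and "cat" occur only once, so they are left out of the dictionary when min_occ=2.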

calculateTrainMean(data_id)

Calculates the mean of the data belonging to the training set split in each channel.
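A per-channel mean can be sketched as follows, using nested Python lists in place of image arrays. The data layout and the function name are assumptions made for the example.

```python
def channel_means(images):
    """Average each channel over every pixel of every image.

    `images` is a list of images, each a list of rows, each row a list of
    per-pixel channel tuples such as (r, g, b).
    """
    n_channels = len(images[0][0][0])
    totals = [0.0] * n_channels
    n_pixels = 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(n_channels):
                    totals[c] += pixel[c]
                n_pixels += 1
    return [total / n_pixels for total in totals]


train_images = [
    [[(0, 10, 20), (2, 10, 40)]],   # one 1x2 image
    [[(4, 10, 60), (6, 10, 80)]],   # another 1x2 image
]
means = channel_means(train_images)
```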

convert_3DLabels_to_bboxes(predictions, original_sizes, threshold=0.5, idx_3DLabel=0, size_restriction=0.001)

Converts a set of predictions of type 3DLabel to their corresponding bounding boxes.

Parameters:
  • idx_3DLabel
  • size_restriction
  • predictions – 3DLabels predicted by Model_Wrapper. If type is list it will be assumed that position 0 corresponds to 3DLabels
  • original_sizes – original sizes of the predicted images width and height
  • threshold – minimum overlapping threshold for considering a prediction valid
Returns:

predicted_bboxes, predicted_Y, predicted_scores for each image

static convert_GT_3DLabels_to_bboxes(gt)

Converts a GT list of 3DLabels to a set of bboxes.

Parameters:gt – list of Dataset output of type 3DLabels
Returns:[out_list, original_sizes], where out_list contains a list of samples with the following info [GT_bboxes, GT_Y], and original_sizes contains the original width and height for each image
static detokenize_bpe(caption, separator='@@')

Reverts BPE segmentation (https://github.com/rsennrich/subword-nmt).

Parameters:
  • caption – Caption to detokenize.
  • separator – BPE separator.
Returns:

Detokenized version of caption.
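Reverting BPE segmentation amounts to deleting each separator together with the space that follows it. The sketch below shows the idea with a regular expression; the function name is hypothetical and the library's own pattern may differ.

```python
import re


def undo_bpe(caption, separator='@@'):
    """Join subword units that end with `separator` to the unit that follows."""
    sep = re.escape(separator)
    # Remove "separator + space" inside the sentence and a trailing separator.
    return re.sub('(' + sep + ' )|(' + sep + ' ?$)', '', caption).strip()


restored = undo_bpe('the new@@ est mod@@ el')
```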

detokenize_moses(caption, language='en', lowercase=False, return_str=True, unescape=True)

Applies the Moses detokenization, relying on the sacremoses implementation of the Moses detokenizer.

Parameters:
  • caption – Sentence to detokenize
  • language – Language (will build the detokenizer for this language)
  • lowercase – Whether to lowercase or not the sentence
  • agressive_dash_splits – Option to trigger dash split rules.
  • return_str – Return string or list
  • unescape – Unescape HTML special chars
Returns:

static detokenize_none(caption)

Dummy function: keeps the caption as it is.

Parameters:caption – String to de-tokenize.
Returns:Same caption.

static detokenize_none_char(caption)

Character-level detokenization. Respects all symbols. Joins chars into words. Words are delimited by the <space> token. If a special character is found, it is converted to the escaped char.

List of escaped chars (by the Moses tokenizer):

  • & -> &amp;
  • | -> &#124;
  • < -> &lt;
  • > -> &gt;
  • ‘ -> &apos;
  • ” -> &quot;
  • [ -> &#91;
  • ] -> &#93;
Parameters:caption – String to de-tokenize.
Returns:Detokenized version of caption.
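The character-level scheme can be pictured with the sketch below: characters arrive separated by spaces, the <space> token marks a word boundary, and special characters are replaced by their escaped form. The function name is hypothetical, and straight quotes stand in for the quote characters in the list above.

```python
# Escaped forms, following the list of characters escaped by the Moses tokenizer.
ESCAPES = {'&': '&amp;', '|': '&#124;', '<': '&lt;', '>': '&gt;',
           "'": '&apos;', '"': '&quot;', '[': '&#91;', ']': '&#93;'}


def join_chars(caption):
    """Rebuild words from space-separated characters."""
    pieces = []
    for token in caption.split():
        if token == '<space>':
            pieces.append(' ')              # word boundary
        else:
            pieces.append(ESCAPES.get(token, token))
    return ''.join(pieces)


sentence = join_chars('a <space> d o g <space> & <space> c a t')
```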
getClassID(class_name, data_id)
Returns:the class data_id (int) for a given class string.
getFramesPaths(idx_videos, data_id, set_name, max_len, data_augmentation)

Recovers the paths from the selected video frames.

getImageFromPrediction_3DSemanticLabel(img, n_classes)

Get the segmented image from the prediction of the model using the semantic classes of the dataset together with their corresponding colours.

Parameters:
  • img – Prediction of the model.
  • n_classes – Number of semantic classes.
Returns:

out_img: The segmented image with the class colours.

getX(set_name, init, final, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)

Gets all the data samples stored between the positions init and final.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • init – initial position in the corresponding set split. Must be greater than or equal to 0 and smaller than final.
  • final – final position in the corresponding set split.

# ‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization – indicates if we want to normalize the data.

# ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.

# ‘raw-image’ and ‘video’-related parameters

Parameters:
  • meanSubstraction – indicates if we want to subtract the training mean from the returned images (only applicable if normalization=True)
  • dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)
Returns:X, list of input data variables from sample ‘init’ to ‘final’ belonging to the chosen ‘set_name’
getXY(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)

Gets the [X,Y] pairs for the next ‘k’ samples in the desired set.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • k – number of consecutive samples retrieved from the corresponding set.

# ‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization – indicates if we want to normalize the data.

# ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.

# ‘raw-image’ and ‘video’-related parameters

Parameters:
  • meanSubstraction – indicates if we want to subtract the training mean from the returned images (only applicable if normalization=True)
  • dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)
Returns:[X,Y], list of input and output data variables of the next ‘k’ consecutive samples belonging to the chosen ‘set_name’
getXY_FromIndices(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)

Gets the [X,Y] pairs for the samples in positions ‘k’ in the desired set.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • k – positions of the desired samples

# ‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization – indicates if we want to normalize the data.

# ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.

# ‘raw-image’ and ‘video’-related parameters

Parameters:
  • meanSubstraction – indicates if we want to subtract the training mean from the returned images (only applicable if normalization=True)
  • dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)
Returns:[X,Y], list of input and output data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’
getX_FromIndices(set_name, k, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)

Gets the [X] samples in positions ‘k’ in the desired set.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • k – positions of the desired samples

# ‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization – indicates if we want to normalize the data.

# ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.

# ‘raw-image’ and ‘video’-related parameters

Parameters:
  • meanSubstraction – indicates if we want to subtract the training mean from the returned images (only applicable if normalization=True)
  • dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)
Returns:X, list of input data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’
getY(set_name, init, final, dataAugmentation=False, get_only_ids=False)

Gets the [Y] samples for the FULL dataset.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • init – initial position in the corresponding set split. Must be greater than or equal to 0 and smaller than final.
  • final – final position in the corresponding set split.
Returns:Y, list of output data variables from sample ‘init’ to ‘final’ belonging to the chosen ‘set_name’
getY_FromIndices(set_name, k, dataAugmentation=False, return_mask=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, get_only_ids=False)

Gets the [Y] samples in positions ‘k’ in the desired set.

Parameters:
  • set_name – ‘train’, ‘val’ or ‘test’ set
  • k – positions of the desired samples

# ‘raw-image’, ‘video’, ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization – indicates if we want to normalize the data.

# ‘image-features’ and ‘video-features’-related parameters

Parameters:normalization_type – indicates the type of normalization applied. See available types in self.__available_norm_im_vid for ‘raw-image’ and ‘video’ and self.__available_norm_feat for ‘image-features’ and ‘video-features’.

# ‘raw-image’ and ‘video’-related parameters

Parameters:
  • meanSubstraction – indicates if we want to subtract the training mean from the returned images (only applicable if normalization=True)
  • dataAugmentation – indicates if we want to apply data augmentation to the loaded images (random flip and cropping)
Returns:Y, list of output data variables of the samples identified by the indices in ‘k’ belonging to the chosen ‘set_name’
keepTopOutputs(set_name, id_out, n_top)

Keeps the most frequent outputs from a set_name.

Parameters:
  • set_name – Set name to modify.
  • id_out – Id.
  • n_top – Number of elements to keep.
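One plausible reading of "keeping the most frequent outputs" is sketched below: count how often each output value occurs, then keep only the samples whose value is among the n_top most frequent. Whether the library filters samples this way is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def keep_top(outputs, n_top):
    """Keep only the samples whose label is among the n_top most frequent."""
    top = {label for label, _ in Counter(outputs).most_common(n_top)}
    kept_ids = [i for i, label in enumerate(outputs) if label in top]
    return kept_ids, [outputs[i] for i in kept_ids]


kept_ids, kept = keep_top(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'], n_top=2)
```

Returning the kept positions as well makes it possible to apply the same selection to the matching inputs.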

static load3DLabels(bbox_list, nClasses, dataAugmentation, daRandomParams, img_size, size_crop, image_list)

Loads a set of outputs of the type 3DLabel (used for detection)

Parameters:
  • bbox_list – list of bboxes, labels and original sizes
  • nClasses – number of different classes to be detected
  • dataAugmentation – are we applying data augmentation?
  • daRandomParams – random parameters applied on data augmentation (vflip, hflip and random crop)
  • img_size – resize applied to input images
  • size_crop – crop size applied to input images
  • image_list – list of input images used as identifiers to ‘daRandomParams’
Returns:

3DLabels with shape (batch_size, width*height, classes)

load3DSemanticLabels(labeled_images_list, nClasses, classes_to_colour, dataAugmentation, daRandomParams, img_size, size_crop, image_list)

Loads a set of outputs of the type 3DSemanticLabel (used for semantic segmentation TRAINING)

Parameters:
  • labeled_images_list – list of labeled images
  • nClasses – number of different classes to be detected
  • classes_to_colour – dictionary relating each class id to their corresponding colour in the labeled image
  • dataAugmentation – are we applying data augmentation?
  • daRandomParams – random parameters applied on data augmentation (vflip, hflip and random crop)
  • img_size – resize applied to input images
  • size_crop – crop size applied to input images
  • image_list – list of input images used as identifiers to ‘daRandomParams’
Returns:

3DSemanticLabels with shape (batch_size, width*height, classes)

loadBinary(y_raw, data_id)

Loads a binary vector. May be of type ‘sparse’.

Parameters:
  • y_raw – Vector to load.
  • data_id – Id to load.
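When binary labels are stored in sparse form, as a list of lists with class indices, loading them amounts to expanding each list into a fixed-length 0/1 vector. The sketch below shows that expansion; the function name and the explicit n_classes argument are assumptions.

```python
def sparse_to_multi_hot(y_raw, n_classes):
    """Turn lists of active class indices into fixed-length 0/1 vectors."""
    dense = []
    for active in y_raw:
        row = [0] * n_classes
        for class_idx in active:
            row[class_idx] = 1  # mark every class present in this sample
        dense.append(row)
    return dense


labels = sparse_to_multi_hot([[0, 3], [1], []], n_classes=4)
```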

static loadCategorical(y_raw, nClasses)

Converts a class vector (integers) to a binary class matrix. From utils.

Parameters:
  • y_raw – class vector to be converted into a matrix (integers from 0 to num_classes).
  • nClasses – total number of classes.
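The conversion from a class vector to a binary class matrix is ordinary one-hot encoding. A minimal pure-Python sketch, with a hypothetical function name, could look like this:

```python
def to_categorical(y_raw, n_classes):
    """Return one row per label, with a single 1 at the label's index."""
    return [[1 if idx == label else 0 for idx in range(n_classes)]
            for label in y_raw]


matrix = to_categorical([0, 2, 1], n_classes=3)
```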

loadFeatures(X, feat_len, normalization_type='L2', normalization=False, loaded=False, external=False, data_augmentation=True)

Loads and normalizes features.

Parameters:
  • X – Features to load.
  • feat_len – Length of the features.
  • normalization_type – Normalization to perform to the features (see: self.__available_norm_feat)
  • normalization – Whether to normalize or not the features.
  • loaded – Flag that indicates if these features have been already loaded.
  • external – Boolean indicating if the paths provided in ‘X’ are absolute paths to external images
  • data_augmentation – Perform data augmentation (with mean=0.0, std_dev=0.01)
Returns:

Loaded features as numpy array

loadImages(images, data_id, normalization_type='(-1)-1', normalization=False, meanSubstraction=False, dataAugmentation=False, daRandomParams=None, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, useBGR=False, external=False, loaded=False)

Loads a set of images from disk.

Parameters:
  • images – list of image string names or list of matrices representing images (only if loaded==True)
  • data_id – identifier in the Dataset object of the data we are loading
  • normalization_type – type of normalization applied
  • normalization – whether we are applying a ‘0-1’ or ‘(-1)-1’ normalization to the images
  • meanSubstraction – whether we are removing the training mean
  • dataAugmentation – whether we are applying data augmentation (random cropping and horizontal flip)
  • daRandomParams – dictionary with results of random data augmentation provided by self.getDataAugmentationRandomParams()
  • external – if True the images will be loaded from an external database, in this case the list of images must be absolute paths
  • loaded – set this option to True if images is a list of matrices instead of a list of strings
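The two image normalization types named here, ‘0-1’ and ‘(-1)-1’, can be read as rescaling pixel intensities to the ranges [0, 1] and [-1, 1]. The sketch below assumes 8-bit inputs in 0..255; the function name and the exact formulas are assumptions rather than the library's code.

```python
def normalize_pixels(values, normalization_type='(-1)-1'):
    """Rescale 8-bit intensities (0..255) to the requested range."""
    if normalization_type == '0-1':
        return [v / 255.0 for v in values]
    if normalization_type == '(-1)-1':
        return [v / 127.5 - 1.0 for v in values]
    raise ValueError('unknown normalization type: ' + normalization_type)


unit_range = normalize_pixels([0, 255], '0-1')
signed_range = normalize_pixels([0, 255], '(-1)-1')
```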

loadMapping(path_list)

Loads a mapping of Source – Target words.

Parameters:path_list – Pickle object with the mapping
Returns:None

loadText(X, vocabularies, max_len, offset, fill, pad_on_batch, words_so_far, loading_X=False)

Text encoder: Transforms samples from a text representation into a numerical one. It also masks the text.

Parameters:
  • X – Text to encode.
  • vocabularies – Mapping word -> index
  • max_len – Maximum length of the text.
  • offset – Shifts the text to the right, adding null symbol at the start
  • fill – ‘start’: the resulting vector will be filled with 0s at the beginning. ‘end’: it will be filled with 0s at the end. ‘center’: the vector will be surrounded by 0s, both at beginning and end.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
  • words_so_far – Experimental feature. Use with caution.
  • loading_X – Whether we are loading an input or an output of the model
Returns:

Text as sequence of numbers. Mask for each sentence.
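The roles of max_len, offset and fill can be sketched as follows. This is an illustration only: the function name, the index values chosen for the padding, unknown and null symbols, and the decision to include the shifted-in null symbols in the mask are all assumptions.

```python
def encode_text(sentences, vocabulary, max_len, offset=0, fill='end',
                unk_id=1, null_id=2):
    """Encode sentences as fixed-length index rows plus 0/1 masks."""
    rows, masks = [], []
    for sentence in sentences:
        ids = [vocabulary.get(word, unk_id) for word in sentence.split()]
        ids = ([null_id] * offset + ids)[:max_len]  # shift right, then truncate
        n_pad = max_len - len(ids)
        if fill == 'start':
            left = n_pad           # all padding before the text
        elif fill == 'center':
            left = n_pad // 2      # padding on both sides
        else:                      # 'end': all padding after the text
            left = 0
        right = n_pad - left
        rows.append([0] * left + ids + [0] * right)
        masks.append([0] * left + [1] * len(ids) + [0] * right)
    return rows, masks


vocab = {'a': 3, 'dog': 4, 'runs': 5}
rows, masks = encode_text(['a dog', 'a cat runs'], vocab, max_len=5, offset=1)
```

In the example, "cat" is not in the vocabulary and is therefore encoded with the unknown-word index.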

loadTextFeatures(X, max_len, pad_on_batch, offset)

Text encoder: Transforms samples from a text representation into a numerical one. It also masks the text.

Parameters:
  • X – Encoded text.
  • max_len – Maximum length of the text.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
Returns:

Text as sequence of numbers. Mask for each sentence.

loadTextFeaturesOneHot(X, vocabulary_len, max_len, pad_on_batch, offset, sample_weights=False, label_smoothing=0.0)

Text encoder: Transforms samples from a text representation into a one-hot. It also masks the text.

Parameters:
  • X – Encoded text.
  • vocabulary_len – Length of the vocabulary (size of the one-hot vector)
  • sample_weights – If True, we also return the mask of the text.
  • vocabularies – Mapping word -> index
  • max_len – Maximum length of the text.
  • offset – Shifts the text to the right, adding null symbol at the start
  • fill – ‘start’: the resulting vector will be filled with 0s at the beginning. ‘end’: it will be filled with 0s at the end. ‘center’: the vector will be surrounded by 0s, both at beginning and end.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
  • words_so_far – Experimental feature. Use with caution.
  • loading_X – Whether we are loading an input or an output of the model
Returns:

Text as sequence of one-hot vectors. Mask for each sentence.

loadTextOneHot(X, vocabularies, vocabulary_len, max_len, offset, fill, pad_on_batch, words_so_far, sample_weights=False, loading_X=False, label_smoothing=0.0)

Text encoder: Transforms samples from a text representation into a one-hot. It also masks the text.

Parameters:
  • vocabulary_len
  • sample_weights
  • X – Text to encode.
  • vocabularies – Mapping word -> index
  • max_len – Maximum length of the text.
  • offset – Shifts the text to the right, adding null symbol at the start
  • fill – ‘start’: the resulting vector will be filled with 0s at the beginning. ‘end’: it will be filled with 0s at the end. ‘center’: the vector will be surrounded by 0s, both at beginning and end.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
  • words_so_far – Experimental feature. Use with caution.
  • loading_X – Whether we are loading an input or an output of the model
Returns:

Text as sequence of one-hot vectors. Mask for each sentence.

loadVideoFeatures(idx_videos, data_id, set_name, max_len, normalization_type, normalization, feat_len, external=False, data_augmentation=True)
Parameters:
  • idx_videos – indices of the videos in the complete list of the current set_name
  • data_id – identifier of the input/output that we are loading
  • set_name – ‘train’, ‘val’ or ‘test’
  • max_len – maximum video length (number of frames)
  • normalization_type – type of data normalization applied
  • normalization – Switch on/off data normalization
  • feat_len – length of the features about to load
  • external – Switch on/off data loading from external dataset (not sharing self.path)
  • data_augmentation – Switch on/off data augmentation
Returns:

loadVideos(n_frames, data_id, last, set_name, max_len, normalization_type, normalization, meanSubstraction, dataAugmentation)
Loads a set of videos from disk. (Untested!)
Parameters:
  • n_frames – Number of frames per video
  • data_id – Id to load
  • last – Last video loaded
  • set_name – ‘train’, ‘val’, ‘test’
  • max_len – Maximum length of videos
  • normalization_type – Type of normalization applied
  • normalization – Whether we apply a 0-1 normalization to the images
  • meanSubstraction – Whether we are removing the training mean
  • dataAugmentation – Whether we are applying data augmentation (random cropping and horizontal flip)
loadVideosByIndex(n_frames, data_id, indices, set_name, max_len, normalization_type, normalization, meanSubstraction, dataAugmentation)

Gets videos by indices.

Parameters:
  • n_frames – Indices of the frames to load from each video.
  • data_id – Data id to be processed.
  • indices – Indices of the videos to load.
  • set_name – Set name to be processed.
  • max_len – Maximum length of each video.
  • normalization_type – Normalization type applied to the frames.
  • normalization – Normalization applied to the frames.
  • meanSubstraction – Mean subtraction applied to the frames.
  • dataAugmentation – Whether to apply data augmentation.

load_GT_3DSemanticLabels(gt, data_id)

Loads a GT list of 3DSemanticLabels in a 2D matrix and reshapes them to an Nx1 array (EVALUATION)

Parameters:
  • gt – list of Dataset output of type 3DSemanticLabels
  • data_id – id of the input/output we are processing
Returns:

out_list: containing a list of label images reshaped as an Nx1 array

merge_vocabularies(ids)

Merges the vocabularies from a set of text inputs/outputs into a single one.

Parameters:ids – identifiers of the inputs/outputs whose vocabularies will be merged
Returns:None
preprocess3DSemanticLabel(path_list, data_id, associated_id_in, num_poolings)

Preprocess 3D Semantic labels

preprocessBinary(labels_list, data_id, sparse)

Preprocesses binary classes.

Parameters:
  • data_id
  • labels_list – Binary label list given as an instance of the class list.
  • sparse – indicates if the data is stored as a list of lists with class indices, e.g. [[4, 234],[87, 222, 4568],[3],…]
Returns:

Preprocessed labels.

preprocessCategorical(labels_list, data_id, sample_weights=False)

Preprocesses categorical data.

Parameters:
  • data_id
  • sample_weights
  • labels_list – Label list. Given as a path to a file or as an instance of the class list.
Returns:

Preprocessed labels.

preprocessFeatures(path_list, data_id, set_name, feat_len)

Preprocesses features. We should give a path to a text file where each line must contain a path to a .npy file storing a feature vector. Alternatively “path_list” can be an instance of the class list.

Parameters:
  • path_list – Path to a text file where each line must contain a path to a .npy file storing a feature vector. Alternatively, instance of the class list.
  • data_id – Dataset id
  • set_name – Used?
  • feat_len – Length of features. If all features have the same length, given as a number. Otherwise, list.
Returns:

Preprocessed features

static preprocessIDs(path_list, data_id, set_name)

Preprocess ID outputs: Strip and put each ID in a line.

preprocessImages(path_list, data_id, set_name, img_size, img_size_crop, use_RGB)

Image preprocessing function.

Parameters:
  • path_list – Path to the images.
  • data_id – Data id.
  • set_name – Set name.
  • img_size – Size of the images to process.
  • img_size_crop – Size of the image crops.
  • use_RGB – Whether to use RGB color encoding.

static preprocessReal(labels_list)

Preprocesses real classes.

Parameters:labels_list – Label list. Given as a path to a file or as an instance of the class list.
Returns:Preprocessed labels.
preprocessText(annotations_list, data_id, set_name, tokenization, build_vocabulary, max_text_len, max_words, offset, fill, min_occ, pad_on_batch, words_so_far, bpe_codes=None, separator='@@', use_unk_class=False)

Preprocess ‘text’ data type: Builds vocabulary (if necessary) and preprocesses the sentences. Also sets Dataset parameters.

Parameters:
  • annotations_list – Path to the sentences to process.
  • data_id – Dataset id of the data.
  • set_name – Name of the current set (‘train’, ‘val’, ‘test’)
  • tokenization – Tokenization to perform.
  • build_vocabulary – Whether we should build a vocabulary for this text or not.
  • max_text_len – Maximum length of the text. If max_text_len == 0, we treat the full sentence as a class.
  • max_words – Maximum number of words to include in the dictionary.
  • offset – Text shifting.
  • fill – Whether we pad with zeros at the beginning or at the end of the sentences.
  • min_occ – Minimum occurrences of each word to be included in the dictionary.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
  • words_so_far – Experimental feature. Should be ignored.
  • bpe_codes – Codes used for applying BPE encoding.
  • separator – BPE encoding separator.
  • use_unk_class – Add a special class for the unknown word when max_text_len == 0.
Returns:

Preprocessed sentences.

preprocessTextFeatures(annotations_list, data_id, set_name, tokenization, build_vocabulary, max_text_len, max_words, offset, fill, min_occ, pad_on_batch, words_so_far, bpe_codes=None, separator='@@', use_unk_class=False)

Preprocess ‘text’ data type: Builds vocabulary (if necessary) and preprocesses the sentences. Also sets Dataset parameters.

Parameters:
  • annotations_list – Path to the sentences to process.
  • data_id – Dataset id of the data.
  • set_name – Name of the current set (‘train’, ‘val’, ‘test’)
  • tokenization – Tokenization to perform.
  • build_vocabulary – Whether we should build a vocabulary for this text or not.
  • max_text_len – Maximum length of the text. If max_text_len == 0, we treat the full sentence as a class.
  • max_words – Maximum number of words to include in the dictionary.
  • offset – Text shifting.
  • fill – Whether we pad with zeros at the beginning or at the end of the sentences.
  • min_occ – Minimum occurrences of each word to be included in the dictionary.
  • pad_on_batch – Whether we get sentences with length of the maximum length of the minibatch or sentences with a fixed (max_text_length) length.
  • words_so_far – Experimental feature. Should be ignored.
  • bpe_codes – Codes used for applying BPE encoding.
  • separator – BPE encoding separator.
Returns:

Preprocessed sentences.

preprocessVideoFeatures(path_list, data_id, set_name, max_video_len, img_size, img_size_crop, feat_len)

Preprocesses already extracted features from video frames.

Parameters:
  • path_list – path to all features in all videos
  • data_id – Data id to be processed.
  • set_name – Set name to be processed.
  • max_video_len – Maximum number of subsampled video features.
  • img_size – Size of each frame.
  • img_size_crop – Size of each image crop.
  • feat_len – Length of each feature.

preprocessVideos(path_list, data_id, set_name, max_video_len, img_size, img_size_crop)

Preprocesses videos. Subsamples and crops frames.

Parameters:
  • path_list – path to all images in all videos
  • data_id – Data id to be processed.
  • set_name – Set name to be processed.
  • max_video_len – Maximum number of subsampled video frames.
  • img_size – Size of each frame.
  • img_size_crop – Size of each image crop.

removeInput(set_name, id='label', type='categorical')

Deletes an input from the dataset.

Parameters:
  • set_name – Set name to remove.
  • id – Id of the input to remove.
  • type – Type of the input to remove.

removeOutput(set_name, id='label', type='categorical')

Deletes an output from the dataset.

Parameters:
  • set_name – Set name to remove.
  • id – Id of the output to remove.
  • type – Type of the output to remove.

replaceInput(data, set_name, data_type, data_id)

Replaces the data in a certain set_name and for a given data_id

resetCounters(set_name='all')

Resets some basic counter indices for the next samples to read.

resize_semantic_output(predictions, ids_out)

Resize semantic output.

setClasses(path_classes, data_id)

Loads the list of classes of the dataset. Each line must contain a unique identifier of the class.

Parameters:
  • path_classes – Path to a text file with the classes or an instance of the class list.
  • data_id – Dataset id
Returns:

None
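A class list with one identifier per line lends itself to a pair of lookup tables, which is also what a method such as getClassID needs in order to return the integer id for a class string. The sketch below is an assumption about the general shape, not the library's internals; the function and variable names are hypothetical.

```python
def load_classes(lines):
    """Build class-name -> id and id -> class-name lookups, one class per line."""
    names = [line.strip() for line in lines if line.strip()]
    class_to_id = {name: idx for idx, name in enumerate(names)}
    id_to_class = dict(enumerate(names))
    return class_to_id, id_to_class


# `lines` may come from an open text file or, as here, from a Python list.
class_to_id, id_to_class = load_classes(['cat\n', 'dog\n', 'horse\n'])
dog_id = class_to_id['dog']
```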

setInput(path_list, set_name, type='raw-image', id='image', repeat_set=1, required=True, overwrite_split=False, normalization_types=None, data_augmentation_types=None, add_additional=False, img_size=None, img_size_crop=None, use_RGB=True, max_text_len=35, tokenization='tokenize_none', offset=0, fill='end', min_occ=0, pad_on_batch=True, build_vocabulary=False, max_words=0, words_so_far=False, bpe_codes=None, separator='@@', use_unk_class=False, feat_len=1024, max_video_len=26, sparse=False)

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:
  • use_RGB
  • path_list – can either be a path to a text file containing the paths to the images or a python list of paths
  • set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
  • type – identifier of the type of input we are loading (accepted types can be seen in self.__accepted_types_inputs)
  • id – identifier of the input data loaded
  • repeat_set – repeats the inputs given (useful when we have more outputs than inputs). Int or array of ints.
  • required – flag for optional inputs
  • overwrite_split – indicates that we want to overwrite the data with id that was already declared in the dataset
  • normalization_types – type of normalization applied to the current input if we activate the data normalization while loading
  • data_augmentation_types – type of data augmentation applied to the current input if we activate the data augmentation while loading
  • add_additional – adds additional data to an already existent input ID

# ‘raw-image’-related parameters

Parameters:
  • img_size – size of the input images (any input image will be resized to this)
  • img_size_crop – size of the cropped zone (when dataAugmentation=False the central crop will be used)

# ‘text’-related parameters

Parameters:
  • tokenization – type of tokenization applied (must be declared as a method of this class) (only applicable when type==’text’).
  • build_vocabulary – whether a new vocabulary will be built from the loaded data or not (only applicable when type==’text’). A previously calculated vocabulary will be used if build_vocabulary is an ‘id’ from a previously loaded input/output
  • max_text_len – maximum text length, the rest of the data will be padded with 0s (only applicable if the output data is of type ‘text’).
  • max_words – a maximum of ‘max_words’ words from the whole vocabulary will be chosen by number of occurrences
  • offset – number of timesteps that the text is shifted to the right (for sequential conditional models, which take as input the previous output)
  • fill – whether to pad before or after the sequence
  • min_occ – minimum number of occurrences allowed for the words in the vocabulary. (default = 0)
  • pad_on_batch – if True, the batch timesteps size will be set to the length of the largest sample + 1; otherwise, max_len will be used as the fixed length
  • words_so_far – if True, each sample will be represented as the complete set of words until the point defined by the timestep dimension (e.g. t=0 ‘a’, t=1 ‘a dog’, t=2 ‘a dog is’, etc.)
  • bpe_codes – Codes used for applying BPE encoding.
  • separator – BPE encoding separator.

# ‘image-features’ and ‘video-features’- related parameters

Parameters:feat_len – size of the feature vectors for each dimension. We must provide a list if the features are not vectors.

# ‘video’-related parameters

Parameters:max_video_len – maximum video length, the rest of the data will be padded with 0s (only applicable if the input data is of type ‘video’ or ‘video-features’).
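
The interplay of max_text_len, offset and fill described above can be sketched in a few lines. This is a minimal illustration, not the library's code: pad_sequence and pad_id are hypothetical names, and 'start' as the counterpart of the default fill='end' is an assumption.

```python
def pad_sequence(token_ids, max_text_len, offset=0, fill='end', pad_id=0):
    """Pad (or truncate) a list of token ids to exactly max_text_len items.

    Hypothetical helper that mirrors the documented meaning of
    max_text_len, offset and fill; it is not the library's implementation.
    """
    # Shift the text `offset` timesteps to the right by prepending padding.
    shifted = [pad_id] * offset + list(token_ids)
    # Drop anything beyond the maximum length.
    shifted = shifted[:max_text_len]
    padding = [pad_id] * (max_text_len - len(shifted))
    # fill='end' pads after the sequence; any other value pads before it.
    return shifted + padding if fill == 'end' else padding + shifted


print(pad_sequence([5, 8, 2], max_text_len=6))                # [5, 8, 2, 0, 0, 0]
print(pad_sequence([5, 8, 2], max_text_len=6, fill='start'))  # [0, 0, 0, 5, 8, 2]
print(pad_sequence([5, 8, 2], max_text_len=6, offset=1))      # [0, 5, 8, 2, 0, 0]
```
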
setLabels(labels_list, set_name, type='categorical', id='label')

DEPRECATED

setList(path_list, set_name, type='raw-image', id='image')

DEPRECATED

setListGeneral(path_list, split=None, shuffle=True, type='raw-image', id='image')

DEPRECATED

setOutput(path_list, set_name, type='categorical', id='label', repeat_set=1, overwrite_split=False, add_additional=False, sample_weights=False, label_smoothing=0.0, tokenization='tokenize_none', max_text_len=0, offset=0, fill='end', min_occ=0, pad_on_batch=True, words_so_far=False, build_vocabulary=False, max_words=0, bpe_codes=None, separator='@@', use_unk_class=False, associated_id_in=None, num_poolings=None, sparse=False)

Loads a set of output data.

# General parameters

Parameters:
  • path_list – can either be a path to a text file containing the labels or a python list of labels.
  • set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’).
  • type – identifier of the type of output we are loading (accepted types can be seen in self.__accepted_types_outputs).
  • id – identifier of the output data loaded.
  • repeat_set – repeats the outputs given (useful when we have more inputs than outputs). Int or array of ints.
  • overwrite_split – indicates that we want to overwrite the data with id that was already declared in the dataset
  • add_additional – adds additional data to an already existent output ID
  • sample_weights – switch on/off sample weights usage for the current output
  • label_smoothing – epsilon value for label smoothing. See arxiv.org/abs/1512.00567.

# ‘text’-related parameters

Parameters:
  • tokenization – type of tokenization applied (must be declared as a method of this class) (only applicable when type==’text’).
  • build_vocabulary – whether a new vocabulary will be built from the loaded data or not (only applicable when type==’text’).
  • max_text_len – maximum text length, the rest of the data will be padded with 0s (only applicable if the output data is of type ‘text’). Set to 0 if the whole sentence will be used as an output class.
  • max_words – a maximum of ‘max_words’ words from the whole vocabulary will be chosen by number of occurrences
  • offset – number of timesteps that the text is shifted to the right (for sequential conditional models, which take as input the previous output)
  • fill – whether to pad before or after the sequence
  • min_occ – minimum number of occurrences allowed for the words in the vocabulary. (default = 0)
  • pad_on_batch – if True, the batch timesteps size will be set to the length of the largest sample + 1; otherwise, max_len will be used as the fixed length
  • words_so_far – if True, each sample will be represented as the complete set of words until the point defined by the timestep dimension (e.g. t=0 ‘a’, t=1 ‘a dog’, t=2 ‘a dog is’, etc.)
  • bpe_codes – Codes used for applying BPE encoding.
  • separator – BPE encoding separator.

# ‘3DLabel’ or ‘3DSemanticLabel’-related parameters

Parameters:

  • associated_id_in – id of the input ‘raw-image’ associated to the inputted 3DLabels or 3DSemanticLabel
  • num_poolings – number of pooling layers used in the model (used for calculating output dimensions)

# ‘binary’-related parameters

Parameters:

  • sparse – indicates if the data is stored as a list of lists with class indices, e.g. [[4, 234],[87, 222, 4568],[3],…]
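
To make the sparse storage format concrete, the sketch below expands lists of class indices into fixed-length 0/1 vectors. It only illustrates the data layout: sparse_to_multi_hot and n_classes are hypothetical names, and how the library itself expands sparse ‘binary’ outputs is not shown here.

```python
def sparse_to_multi_hot(samples, n_classes):
    """Expand lists of class indices into fixed-length 0/1 vectors.

    Hypothetical helper for illustration only.
    """
    vectors = []
    for class_indices in samples:
        row = [0] * n_classes
        for idx in class_indices:
            row[idx] = 1  # mark every class present in this sample
        vectors.append(row)
    return vectors


sparse_labels = [[0, 3], [1], []]
print(sparse_to_multi_hot(sparse_labels, n_classes=4))
# [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
```
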
setRawInput(path_list, set_name, type='file-name', id='raw-text', overwrite_split=False)

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:
  • path_list – Path to a text file containing the paths to the images or a python list of paths
  • set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
  • type – identifier of the type of input we are loading (see self.__accepted_types_inputs for accepted types)
  • id – identifier of the input data loaded
  • overwrite_split
setRawOutput(path_list, set_name, type='file-name', id='raw-text', overwrite_split=False, add_additional=False)

Loads a list which can contain all samples from either the ‘train’, ‘val’, or ‘test’ set splits (specified by set_name).

# General parameters

Parameters:
  • path_list – can either be a path to a text file containing the paths to the images or a python list of paths
  • set_name – identifier of the set split loaded (‘train’, ‘val’ or ‘test’)
  • type – identifier of the type of input we are loading (accepted types can be seen in self.__accepted_types_inputs)
  • id – identifier of the input data loaded
  • overwrite_split
  • add_additional
setSemanticClasses(path_classes, data_id)

Loads the list of semantic classes of the dataset together with their corresponding colours in the GT image. Each line must contain a unique identifier of the class and its associated RGB colour representation, separated by commas.
Parameters:
  • path_classes – Path to a text file with the classes and their colours.
  • data_id – input/output id
Returns:

None

setSilence(silence)

Changes the silence mode of the ‘Dataset’ instance.

setTrainMean(mean_image, data_id, normalization=False)

Loads a pre-calculated training mean image, ‘mean_image’ can either be:

  • numpy.array (complete image)
  • list with a value per channel
  • string with the path to the stored image.
Parameters:
  • mean_image
  • normalization
  • data_id – identifier of the type of input whose train mean is being introduced.
shuffleTraining()

Applies a random shuffling to the training samples.

static tokenize_CNN_sentence(caption)

Tokenization employed in the CNN_sentence package (https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py#L97).

Parameters:caption – String to tokenize
Returns:Tokenized version of caption

static tokenize_aggressive(caption, lowercase=True)

Aggressive tokenizer for the input/output data of type ‘text’:
  • Removes punctuation
  • Optional lowercasing

Parameters:
  • caption – String to tokenize
  • lowercase – Whether to lowercase the caption or not
Returns:

Tokenized version of caption

static tokenize_basic(caption, lowercase=True)
Basic tokenizer for the input/output data of type ‘text’:
  • Splits punctuation
  • Optional lowercasing
Parameters:
  • caption – String to tokenize
  • lowercase – Whether to lowercase the caption or not
Returns:

Tokenized version of caption
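
The two steps listed for tokenize_basic (splitting punctuation and optional lowercasing) can be approximated with a regular expression. This is a rough sketch under stated assumptions, not the library's implementation: split_punctuation is a hypothetical name, and treating every non-word, non-space character as punctuation and returning a space-joined string are both assumptions.

```python
import re


def split_punctuation(caption, lowercase=True):
    """Separate punctuation from words and optionally lowercase the result."""
    # Surround every character that is neither a word character nor
    # whitespace with spaces, so that it becomes a token of its own.
    spaced = re.sub(r'([^\w\s])', r' \1 ', caption)
    if lowercase:
        spaced = spaced.lower()
    # Collapse the repeated whitespace introduced above.
    return ' '.join(spaced.split())


print(split_punctuation("Hello, world! It's fine."))
# hello , world ! it ' s fine .
```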

tokenize_bpe(caption)

Applies BPE segmentation (https://github.com/rsennrich/subword-nmt).

Parameters:caption – Caption to tokenize.
Returns:Encoded version of caption.

static tokenize_icann(caption)

Tokenization used for the icann paper:
  • Removes some punctuation (. , “)
  • Lowercasing

Parameters:caption – String to tokenize
Returns:Tokenized version of caption
static tokenize_montreal(caption)
Similar to tokenize_icann
  • Removes some punctuation
  • Lowercase
Parameters:caption – String to tokenize
Returns:Tokenized version of caption
tokenize_moses(caption, language='en', lowercase=False, aggressive_dash_splits=False, return_str=True, escape=False)

Applies the Moses tokenization. Relying on sacremoses’ implementation of the Moses tokenizer.

Parameters:
  • caption – Sentence to tokenize
  • language – Language (will build the tokenizer for this language)
  • lowercase – Whether to lowercase the sentence or not
  • aggressive_dash_splits – Option to trigger dash split rules.
  • return_str – Return string or list
  • escape – Escape HTML special chars
Returns:

static tokenize_none(caption)

Does not tokenize the sentences; only strips them.

Parameters:caption – String to tokenize
Returns:Tokenized version of caption
static tokenize_none_char(caption)

Character-level tokenization. Respects all symbols. Separates chars. Inserts the <space> symbol for spaces. If an escaped char is found (e.g. the “&apos;” symbol), it is converted to the original one.

List of escaped chars (by the Moses tokenizer):
  • & -> &amp;
  • | -> &#124;
  • < -> &lt;
  • > -> &gt;
  • ‘ -> &apos;
  • ” -> &quot;
  • [ -> &#91;
  • ] -> &#93;
Parameters:caption – String to tokenize
Returns:Tokenized version of caption
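
As an illustration of the behaviour described for tokenize_none_char, the sketch below first reverts the Moses escapes and then separates characters. It is a minimal approximation under stated assumptions, not the library's code: char_tokenize, MOSES_UNESCAPES and space_symbol are hypothetical names, and the quote characters are assumed to be the plain ASCII ' and ".

```python
# Moses escapes mapped back to the original characters. '&amp;' is
# reverted last so that '&amp;lt;' becomes '&lt;' rather than '<'.
MOSES_UNESCAPES = [
    ('&#124;', '|'), ('&lt;', '<'), ('&gt;', '>'), ('&apos;', "'"),
    ('&quot;', '"'), ('&#91;', '['), ('&#93;', ']'), ('&amp;', '&'),
]


def char_tokenize(caption, space_symbol='<space>'):
    """Split a caption into characters, marking spaces explicitly."""
    for escaped, original in MOSES_UNESCAPES:
        caption = caption.replace(escaped, original)
    chars = [space_symbol if ch == ' ' else ch for ch in caption.strip()]
    return ' '.join(chars)


print(char_tokenize('a dog &amp; cat'))
# a <space> d o g <space> & <space> c a t
```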

static tokenize_questions(caption)
Basic tokenizer for VQA questions:
  • Lowercasing
  • Splits contractions
  • Removes punctuation
  • Numbers to digits
Parameters:caption – String to tokenize
Returns:Tokenized version of caption
static tokenize_soft(caption, lowercase=True)
Tokenization used for the icann paper:
  • Removes very little punctuation
  • Lowercase
Parameters:
  • caption – String to tokenize
  • lowercase – Whether to lowercase the caption or not
Returns:

Tokenized version of caption

class keras_wrapper.dataset.Homogeneous_Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, joint_batches=20, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True)

Batch generator class. Retrieves batches of data.

generator()

Gets and processes the data.

Returns:generator with the data

reset()

Resets the counters.

retrieve_maxibatch()

Gets a maxibatch of self.params[‘joint_batches’] * self.batch_size samples.
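
The idea of a maxibatch can be sketched as follows: joint_batches * batch_size samples are taken at once and then cut into batches. Sorting the maxibatch by sample length so that each batch holds samples of similar size is an assumption about what "homogeneous" means here, and split_maxibatch is a hypothetical name, not part of the library.

```python
def split_maxibatch(samples, batch_size, joint_batches):
    """Take one maxibatch from `samples` and cut it into length-sorted batches."""
    maxibatch = samples[:joint_batches * batch_size]
    # Assumption: grouping samples of similar length reduces padding per batch.
    maxibatch = sorted(maxibatch, key=len)
    return [maxibatch[i:i + batch_size]
            for i in range(0, len(maxibatch), batch_size)]


sentences = [s.split() for s in
             ['a b c d', 'a', 'a b c', 'a b', 'a b c d e', 'a b']]
for batch in split_maxibatch(sentences, batch_size=2, joint_batches=3):
    print([len(sentence) for sentence in batch])
# [1, 2]
# [2, 3]
# [4, 5]
```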

class keras_wrapper.dataset.Parallel_Data_Batch_Generator(set_split, net, dataset, num_iterations, batch_size=50, normalization=False, normalization_type=None, data_augmentation=True, wo_da_patch_type='whole', da_patch_type='resize_and_rndcrop', da_enhance_list=None, mean_substraction=False, predict=False, random_samples=-1, shuffle=True, temporally_linked=False, init_sample=-1, final_sample=-1, n_parallel_loaders=1)

Batch generator class. Retrieves batches of data.

generator()

Gets and processes the data.

Returns:generator with the data

keras_wrapper.dataset.dataLoad(process_name, net, dataset, max_queue_len, queues)

Parallel data loader. Risky and untested!

Parameters:
  • process_name
  • net
  • dataset
  • max_queue_len
  • queues
Returns:

keras_wrapper.dataset.loadDataset(dataset_path)

Loads a previously saved Dataset object.

Parameters:dataset_path – Path to the stored Dataset to load
Returns:Loaded Dataset object
keras_wrapper.dataset.saveDataset(dataset, store_path)

Saves a backup of the current Dataset object.

Parameters:
  • dataset – Dataset object to save
  • store_path – Saving path
Returns:

None

cnn_model.py

callbacks_keras_wrapper.py

beam_search_ensemble.py

utils.py

class keras_wrapper.utils.MultiprocessQueue(manager, multiprocess_type='Queue')

Wrapper class for encapsulating the behaviour of some multiprocessing communication structures.

See how Queues and Pipes work in the following link https://docs.python.org/2/library/multiprocessing.html#multiprocessing-examples

keras_wrapper.utils.bbox(img, mode='max')

Returns a bounding box covering all the non-zero area in the image.

Parameters:
  • img – Image on which to compute the bounding box
  • mode – “width_height” returns width in [2] and height in [3], “max” returns xmax in [2] and ymax in [3]
Returns:
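
A pure-Python sketch of the two modes may help. It assumes that positions [0] and [1] of the result hold the minimum x and y coordinates, and that width and height are the distances between the first and last non-zero column and row; neither detail is stated above. bounding_box is a hypothetical name and works on nested lists rather than on image arrays.

```python
def bounding_box(img, mode='max'):
    """Bounding box of the non-zero area of a 2-D list of pixel values."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:
        return None  # all-zero image: no box (hypothetical convention)
    xmin, xmax, ymin, ymax = cols[0], cols[-1], rows[0], rows[-1]
    if mode == 'width_height':
        # Extent between the first and last non-zero column and row.
        return [xmin, ymin, xmax - xmin, ymax - ymin]
    return [xmin, ymin, xmax, ymax]


image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]
print(bounding_box(image))                       # [1, 1, 2, 2]
print(bounding_box(image, mode='width_height'))  # [1, 1, 1, 1]
```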

keras_wrapper.utils.build_OneVsAllECOC_Stage(n_classes_ecoc, input_shape, ds, stage1_lr)
Parameters:
  • n_classes_ecoc
  • input_shape
  • ds
  • stage1_lr
Returns:

keras_wrapper.utils.build_OneVsOneECOC_Stage(n_classes_ecoc, input_shape, ds, stage1_lr=0.01, ecoc_version=2)
Parameters:
  • n_classes_ecoc
  • input_shape
  • ds
  • stage1_lr
  • ecoc_version
Returns:

keras_wrapper.utils.build_Specific_OneVsOneECOC_Stage(pairs, input_shape, ds, lr, ecoc_version=2)
Parameters:
  • pairs
  • input_shape
  • ds
  • lr
  • ecoc_version
Returns:

keras_wrapper.utils.build_Specific_OneVsOneECOC_loss_Stage(net, input_net, input_shape, classes, ecoc_version=3, pairs=None, functional_api=False, activations=None)
Parameters:
  • net
  • input_net
  • input_shape
  • classes
  • ecoc_version
  • pairs
  • functional_api
  • activations
Returns:

keras_wrapper.utils.build_Specific_OneVsOneVsRestECOC_Stage(pairs, input_shape, ds, lr, ecoc_version=2)
Parameters:
  • pairs
  • input_shape
  • ds
  • lr
  • ecoc_version
Returns:

keras_wrapper.utils.checkParameters(input_params, default_params, hard_check=False)

Validates a set of input parameters and uses the default ones if not specified.

Parameters:
  • input_params – Input parameters.
  • default_params – Default parameters
  • hard_check – If True, raise exception if a parameter is not valid.
Returns:
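
The validation described above can be sketched as a merge of the input parameters over the defaults. This is a minimal illustration, not the library's code: check_parameters is a hypothetical name, and both the exception type and the choice to silently drop unknown parameters when hard_check is False are assumptions.

```python
def check_parameters(input_params, default_params, hard_check=False):
    """Return default_params overridden by the valid entries of input_params."""
    valid = set(default_params)
    unknown = sorted(key for key in input_params if key not in valid)
    if unknown and hard_check:
        raise ValueError('Unknown parameters: %s' % unknown)
    params = dict(default_params)
    # Only keys that exist in the defaults are accepted (assumption).
    params.update({k: v for k, v in input_params.items() if k in valid})
    return params


defaults = {'batch_size': 50, 'shuffle': True}
print(check_parameters({'batch_size': 32}, defaults))
# {'batch_size': 32, 'shuffle': True}
```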

keras_wrapper.utils.decode_categorical(preds, index2word, verbose=0)

Decodes predictions.

Parameters:
  • preds – Predictions codified as the output of a softmax activation function.
  • index2word – Mapping from word indices into word characters.
Returns:

List of decoded predictions.

keras_wrapper.utils.decode_multilabel(preds, index2word, min_val=0.5, get_probs=False, verbose=0)

Decodes predictions.

Parameters:
  • preds – Predictions codified as the output of a softmax activation function.
  • index2word – Mapping from word indices into word characters.
  • min_val – Minimum value needed for considering a positive prediction.
  • get_probs – additionally return probability for each predicted label
  • verbose – Verbosity level, by default 0.
Returns:

List of decoded predictions.
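
The role of min_val and get_probs can be illustrated with a small thresholding sketch. It is not the library's implementation: threshold_decode is a hypothetical name, and both the use of >= for the comparison and the (word, score) pair format returned when get_probs is True are assumptions.

```python
def threshold_decode(preds, index2word, min_val=0.5, get_probs=False):
    """Keep, for each sample, the labels whose score reaches min_val."""
    decoded = []
    for scores in preds:
        kept = [(index2word[i], score)
                for i, score in enumerate(scores) if score >= min_val]
        decoded.append(kept if get_probs else [word for word, _ in kept])
    return decoded


index2word = {0: 'cat', 1: 'dog', 2: 'bird'}
preds = [[0.9, 0.2, 0.7], [0.1, 0.6, 0.3]]
print(threshold_decode(preds, index2word))
# [['cat', 'bird'], ['dog']]
```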

keras_wrapper.utils.decode_predictions(preds, temperature, index2word, sampling_type, verbose=0)

Decodes predictions.

Parameters:
  • preds – Predictions codified as the output of a softmax activation function.
  • temperature – Temperature for sampling.
  • index2word – Mapping from word indices into word characters.
  • sampling_type – ‘max_likelihood’ or ‘multinomial’.
  • verbose – Verbosity level, by default 0.
Returns:

List of decoded predictions.

Decodes predictions from the BeamSearch method.

Parameters:
  • preds – Predictions codified as word indices.
  • index2word – Mapping from word indices into word characters.
  • alphas – Attention model weights: Float matrix with shape (I, J) (I: number of target items; J: number of source items).
  • heuristic – Replace unknown words heuristic (0, 1 or 2)
  • x_text – Source text (for unk replacement)
  • unk_symbol – Unknown words symbol
  • pad_sequences – Whether we should make a zero-pad on the input sequence.
  • mapping – Source-target dictionary (for unk_replace heuristics 1 and 2)
  • verbose – Verbosity level, by default 0.
Returns:

List of decoded predictions

keras_wrapper.utils.decode_predictions_one_hot(preds, index2word, pad_sequences=True, verbose=0)

Decodes predictions following a one-hot codification.

Parameters:
  • preds – Predictions codified as one-hot vectors.
  • index2word – Mapping from word indices into word characters.
  • verbose – Verbosity level, by default 0.
Returns:

List of decoded predictions

keras_wrapper.utils.flatten(l)

Flattens a list (more general than flatten_list_of_lists, but also less efficient).

Parameters:l
Returns:

keras_wrapper.utils.flatten_list_of_lists(list_of_lists)

Flattens a list of lists.

Parameters:list_of_lists – List of lists
Returns:Flattened list of lists
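
The difference between the two flattening helpers can be sketched as follows: one removes exactly one level of nesting, the other recurses to any depth at a higher cost. The function names are hypothetical and the bodies are only one plausible way to write each behaviour, not the library's code.

```python
from itertools import chain


def flatten_one_level(list_of_lists):
    """Concatenate the inner lists; deeper nesting is left untouched."""
    return list(chain.from_iterable(list_of_lists))


def flatten_any_depth(items):
    """Recursively flatten lists and tuples nested to any depth."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_any_depth(item))
        else:
            flat.append(item)
    return flat


print(flatten_one_level([[1, 2], [3], []]))      # [1, 2, 3]
print(flatten_any_depth([1, [2, [3, [4]]], 5]))  # [1, 2, 3, 4, 5]
```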

keras_wrapper.utils.indices_2_one_hot(indices, n)

Converts a list of indices into one hot codification

Parameters:
  • indices – list of indices
  • n – integer. Size of the vocabulary
Returns:

numpy array with shape (len(indices), n)
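
A small sketch of the conversion and its inverse is shown here. It uses nested Python lists instead of the numpy array mentioned above so that it has no dependencies; to_one_hot and from_one_hot are hypothetical names.

```python
def to_one_hot(indices, n):
    """One row per index, with a single 1 in the column given by the index."""
    return [[1 if col == idx else 0 for col in range(n)] for idx in indices]


def from_one_hot(rows):
    """Recover the indices by locating the maximum of every row."""
    return [row.index(max(row)) for row in rows]


encoded = to_one_hot([2, 0, 1], n=3)
print(encoded)                # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(from_one_hot(encoded))  # [2, 0, 1]
```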

keras_wrapper.utils.key_with_max_val(d)
  1. create a list of the dict’s keys and values;
  2. return the key with the max value
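
The two steps above translate almost literally into Python. The sketch uses the hypothetical name key_with_max_value and also shows an equivalent one-liner based on the built-in max.

```python
def key_with_max_value(d):
    """Return the key whose value is the largest in the dictionary."""
    # Step 1: lists of keys and values, in matching order.
    keys = list(d.keys())
    values = list(d.values())
    # Step 2: the key at the position of the maximum value.
    return keys[values.index(max(values))]


counts = {'a': 3, 'dog': 7, 'is': 5}
print(key_with_max_value(counts))   # dog
print(max(counts, key=counts.get))  # dog
```
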
keras_wrapper.utils.loadGoogleNetForFood101(nClasses=101, load_path='/media/HDD_2TB/CNN_MODELS/GoogleNet')
Parameters:
  • nClasses
  • load_path
Returns:

keras_wrapper.utils.one_hot_2_indices(preds, pad_sequences=True, verbose=0)

Converts a one-hot codification into an index-based one.

Parameters:
  • preds – Predictions codified as one-hot vectors.
  • pad_sequences – Whether we should pad sequence or not
  • verbose – Verbosity level, by default 0.
Returns:

List of converted predictions

keras_wrapper.utils.prepareECOCLossOutputs(net, ds, ecoc_table, input_name, output_names, splits=None)
Parameters:
  • net
  • ds
  • ecoc_table
  • input_name
  • output_names
  • splits
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101(model_wrapper)

Prepares the GoogleNet model after its conversion from Caffe.

Parameters:model_wrapper
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101_ECOC_loss(model_wrapper)

Prepares the GoogleNet model for inserting an ECOC structure after removing the last part of the net.

Parameters:model_wrapper
Returns:

keras_wrapper.utils.prepareGoogleNet_Food101_Stage1(model_wrapper)

Prepares the GoogleNet model for serving as the first Stage of a Staged_Network.

Parameters:model_wrapper
Returns:

keras_wrapper.utils.prepareGoogleNet_Stage2(stage1, stage2)

Removes the second part of the GoogleNet for inserting it into the second stage.

Parameters:
  • stage1
  • stage2
Returns:

keras_wrapper.utils.print_dict(d, header='')

Formats a dictionary for printing.

Parameters:d – Dictionary to print.
Returns:String containing the formatted dictionary.

keras_wrapper.utils.replace_unknown_words(src_word_seq, trg_word_seq, hard_alignment, unk_symbol, glossary=None, heuristic=0, mapping=None, verbose=0)

Replaces unknown words from the target sentence according to some heuristic. Borrowed from: https://github.com/sebastien-j/LV_groundhog/blob/master/experiments/nmt/replace_UNK.py

Parameters:
  • src_word_seq – Source sentence words
  • trg_word_seq – Hypothesis words
  • hard_alignment – Target-Source alignments
  • glossary – Hard-coded substitutions.
  • unk_symbol – Symbol in trg_word_seq to replace
  • heuristic – Heuristic (0, 1, 2)
  • mapping – External alignment dictionary
  • verbose – Verbosity level
Returns:

trg_word_seq with replaced unknown words
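
The general mechanism can be sketched as below. The meaning given to the heuristics is an assumption, not taken from the text above: heuristic 0 copies the aligned source word, and heuristic 1 prefers a translation from mapping and falls back to copying. Heuristic 2 and the glossary are not covered. replace_unk is a hypothetical name, and hard_alignment is assumed to hold, for each target position, the index of its aligned source word.

```python
def replace_unk(src_words, trg_words, hard_alignment, unk_symbol='<unk>',
                heuristic=0, mapping=None):
    """Replace every unk_symbol in trg_words using the aligned source word."""
    mapping = mapping or {}
    result = []
    for position, word in enumerate(trg_words):
        if word != unk_symbol:
            result.append(word)
            continue
        src_word = src_words[hard_alignment[position]]
        if heuristic == 1:
            # Prefer the dictionary translation, fall back to copying.
            result.append(mapping.get(src_word, src_word))
        else:
            # Heuristic 0: copy the aligned source word as-is.
            result.append(src_word)
    return result


print(replace_unk(['ich', 'sehe', 'Berlin'], ['i', 'see', '<unk>'], [0, 1, 2]))
# ['i', 'see', 'Berlin']
print(replace_unk(['das', 'Haus'], ['the', '<unk>'], [0, 1],
                  heuristic=1, mapping={'Haus': 'house'}))
# ['the', 'house']
```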

keras_wrapper.utils.sampling(scores, sampling_type='max_likelihood', temperature=1.0)

Samples words (each sample is drawn from a categorical distribution), or picks the words that maximize the likelihood.

Parameters:
  • scores – array of size #samples x #classes; every entry determines a score for sample i having class j
  • sampling_type
  • temperature – Predictions temperature. The higher the temperature, the flatter the probabilities and hence the more random the outputs.
Returns:

set of indices chosen as output, a vector of size #samples
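
The effect of the temperature can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: sample_indices and rng are hypothetical names, the scores are assumed to be probabilities, and the temperature is assumed to divide the log-probabilities before renormalising. ‘multinomial’ as the alternative to ‘max_likelihood’ is taken from the decode_predictions entry.

```python
import math
import random


def sample_indices(scores, sampling_type='max_likelihood', temperature=1.0,
                   rng=random):
    """Pick one class index per row of scores."""
    chosen = []
    for row in scores:
        if sampling_type == 'max_likelihood':
            chosen.append(row.index(max(row)))
            continue
        # Divide log-probabilities by the temperature: values above 1
        # flatten the distribution, values below 1 sharpen it.
        logits = [math.log(max(p, 1e-12)) / temperature for p in row]
        top = max(logits)
        weights = [math.exp(logit - top) for logit in logits]
        chosen.append(rng.choices(range(len(row)), weights=weights, k=1)[0])
    return chosen


scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(sample_indices(scores))  # [1, 0]
print(sample_indices(scores, sampling_type='multinomial', temperature=2.0,
                     rng=random.Random(0)))
```

With a very low temperature the multinomial branch behaves almost like ‘max_likelihood’, because nearly all the weight ends up on the highest-scoring class.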

keras_wrapper.utils.simplifyDataset(ds, id_classes, n_classes=50)
Parameters:
  • ds
  • id_classes
  • n_classes
Returns:

keras_wrapper.utils.to_categorical(y, num_classes=None)

Converts a class vector (integers) to binary class matrix.

E.g. for use with categorical_crossentropy.

Parameters:
  • y – class vector to be converted into a matrix (integers from 0 to num_classes).
  • num_classes – total number of classes.
Returns:

A binary matrix representation of the input.