Models subpackage `melusine.models`

TODO : ADD DESCRIPTION OF THE SUBPACKAGE

List of submodules

NeuralArchitectures melusine.models.neural_architectures
Train melusine.models.train

NeuralArchitectures `melusine.models.neural_architectures`

melusine.models.neural_architectures.bert_model(ntargets=18, seq_max=100, nb_meta=134, loss='categorical_crossentropy', activation='softmax', bert_model='jplu/tf-camembert-base')

Pre-defined architecture of a pre-trained Bert model.

Parameters

ntargets (int, optional): Dimension of model output. Default value, 18.
seq_max (int, optional): Maximum input length. Default value, 100.
nb_meta (int, optional): Dimension of metadata input. Default value, 134.
loss (str, optional): Loss function for training. Default value, 'categorical_crossentropy'.
activation (str, optional): Activation function. Default value, 'softmax'.
bert_model (str, optional): Model name from the HuggingFace library or path to a local model. Only Camembert and Flaubert are supported. Default value, 'jplu/tf-camembert-base'.

Returns

Model instance

melusine.models.neural_architectures.cnn_model(embedding_matrix_init, ntargets, seq_max, nb_meta, loss='categorical_crossentropy', activation='softmax')

Pre-defined architecture of a CNN model.

Parameters

embedding_matrix_init (np.array): Pretrained embedding matrix.
ntargets (int): Dimension of model output.
seq_max (int): Maximum input length.
nb_meta (int): Dimension of metadata input.
loss (str, optional): Loss function for training. Default value, 'categorical_crossentropy'.
activation (str, optional): Activation function. Default value, 'softmax'.

Returns

Model instance

melusine.models.neural_architectures.rnn_model(embedding_matrix_init, ntargets=18, seq_max=100, nb_meta=252, loss='categorical_crossentropy', activation='softmax')

Pre-defined architecture of an RNN model.

Parameters

embedding_matrix_init (np.array): Pretrained embedding matrix.
ntargets (int, optional): Dimension of model output. Default value, 18.
seq_max (int, optional): Maximum input length. Default value, 100.
nb_meta (int, optional): Dimension of metadata input. Default value, 252.
loss (str, optional): Loss function for training. Default value, 'categorical_crossentropy'.
activation (str, optional): Activation function. Default value, 'softmax'.

Returns

Model instance

melusine.models.neural_architectures.transformers_model(embedding_matrix_init, ntargets=18, seq_max=100, nb_meta=134, loss='categorical_crossentropy', activation='softmax')

Pre-defined architecture of a Transformer model.

Parameters

embedding_matrix_init (np.array): Pretrained embedding matrix.
ntargets (int, optional): Dimension of model output. Default value, 18.
seq_max (int, optional): Maximum input length. Default value, 100.
nb_meta (int, optional): Dimension of metadata input. Default value, 134.
loss (str, optional): Loss function for training. Default value, 'categorical_crossentropy'.
activation (str, optional): Activation function. Default value, 'softmax'.

Returns

Model instance
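
All four architecture functions share the same `loss` and `activation` defaults. As a rough illustration of what `'softmax'` and `'categorical_crossentropy'` compute, here is a minimal pure-Python sketch. It is independent of Melusine and Keras; the function names and the 3-class example are chosen for illustration only (the documented default is `ntargets=18`).

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-likelihood of the true class under the predicted distribution."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_target, probabilities))


# Hypothetical 3-class example; the true class is the first one.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
target = [1, 0, 0]
loss = categorical_crossentropy(target, probs)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.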

Train `melusine.models.train`

class melusine.models.train.NeuralModel

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Generic class for neural models.

It is compatible with the scikit-learn API (i.e. it provides fit and predict methods).

Parameters

neural_architecture_function (function): Function which returns a Model instance from Keras. Implemented model functions are: cnn_model, rnn_model, transformers_model, bert_model.
pretrained_embedding (np.array): Pretrained embedding matrix.
text_input_column (str): Input text column to consider for the model.
meta_input_list (list, optional): List of the names of the columns containing the metadata. If it is an empty list or None, the model is used without metadata. Default value, ['extension', 'dayofweek', 'hour', 'min'].
vocab_size (int, optional): Size of the vocabulary for the neural network model. Default value, 25000.
seq_size (int, optional): Maximum size of input for the neural model. Default value, 100.
loss (str, optional): Loss function for training. Default value, 'categorical_crossentropy'.
activation (str, optional): Activation function. Default value, 'softmax'.
batch_size (int, optional): Size of batches for the training of the neural network model. Default value, 4096.
n_epochs (int, optional): Number of epochs for the training of the neural network model. Default value, 15.
bert_tokenizer (str, optional): Tokenizer name from the HuggingFace library or path to a local tokenizer. Only Camembert and Flaubert are supported. Default value, 'camembert-base'.
bert_model (str, optional): Model name from the HuggingFace library or path to a local model. Only Camembert and Flaubert are supported. Default value, 'camembert-base'.
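
The `seq_size` parameter fixes the length of every input sequence, so shorter inputs must be padded and longer ones cut. The following is a minimal sketch of that idea, not the library's code: the helper name `pad_or_truncate`, the padding value 0, and the choice to pad on the left and cut on the right are all assumptions made for illustration.

```python
def pad_or_truncate(indices, seq_size=100, pad_value=0):
    """Return a list of exactly seq_size indices.

    Assumption: long sequences keep their first seq_size items and short
    ones are left-padded with pad_value. The library may do this differently.
    """
    trimmed = indices[:seq_size]
    return [pad_value] * (seq_size - len(trimmed)) + trimmed


short = pad_or_truncate([1, 2, 3], seq_size=5)
long = pad_or_truncate(list(range(10)), seq_size=4)
```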

Examples

>>> from melusine.models.train import NeuralModel
>>> from melusine.models.neural_architectures import cnn_model
>>> from melusine.nlp_tools.embedding import Embedding
>>> pretrained_embedding = Embedding.load()
>>> list_meta = ['extension', 'dayofweek', 'hour']
>>> nn_model = NeuralModel(cnn_model, pretrained_embedding, meta_input_list=list_meta)  #noqa
>>> nn_model.fit(X_train, y_train)  #noqa
>>> y_res = nn_model.predict(X_test)  #noqa

Attributes

architecture_function, pretrained_embedding, text_input_column, meta_input_list, vocab_size, seq_size, loss, batch_size, n_epochs
model: Model instance from Keras.
tokenizer: Tokenizer instance from Melusine.
embedding_matrix (np.array): Embedding matrix used as input for the neural network model.
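
To make the role of the embedding matrix concrete, here is a hedged sketch of how such a matrix can be assembled from a vocabulary and a table of pretrained word vectors. It uses plain lists instead of `np.array` to stay dependency-free. The helper `build_embedding_matrix`, the reserved padding row at index 0, and the zero vector for unknown words are assumptions, not Melusine's actual implementation.

```python
def build_embedding_matrix(vocabulary, word_vectors, dim):
    """Build one row per vocabulary word, in vocabulary order.

    Assumptions: row 0 is reserved for padding, and words without a
    pretrained vector get an all-zero row.
    """
    matrix = [[0.0] * dim]  # row 0: padding
    for word in vocabulary:
        vector = word_vectors.get(word, [0.0] * dim)
        matrix.append(list(vector))
    return matrix


# Hypothetical 2-dimensional vectors for a 3-word vocabulary.
vectors = {"ma": [0.1, 0.2], "carte_verte": [0.3, 0.4]}
matrix = build_embedding_matrix(["ma", "carte_verte", "inconnu"], vectors, dim=2)
```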

fit(X_train, y_train, tensorboard_log_dir=None, validation_data=None, **kwargs)

Fit the neural network model on X and y. If meta_input_list is an empty list or None, the model is used without metadata.

Compatible with scikit-learn API.

Parameters

X_train (pd.DataFrame)
y_train (pd.Series)
tensorboard_log_dir (str): If not None, used as the path where TensorBoard logs are written. TensorBoard callback parameters can be changed in the config file.
validation_data (tuple): Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. This could be a tuple (x_val, y_val). validation_data will override validation_split. Default value, None.

Returns

self (object): Returns the instance.
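
Returning the instance from `fit` is the scikit-learn convention, and it allows `fit` and `predict` to be chained. The toy estimator below illustrates the pattern only; `MajorityClassifier` is a hypothetical class and has nothing to do with how `NeuralModel` trains.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def fit(self, X_train, y_train):
        # Learned state is stored on the instance, with a trailing underscore
        # by scikit-learn convention.
        self.majority_ = Counter(y_train).most_common(1)[0][0]
        return self  # returning self enables chaining

    def predict(self, X):
        return [self.majority_ for _ in X]


predictions = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "b"]).predict([[3], [4]])
```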

load_nn_model(filepath): Load model from json and load weights from .h5.

predict(X, **kwargs)

Returns the predicted class.

Parameters

X (pd.DataFrame)

Returns

int

predict_proba(X, **kwargs)

Returns the probabilities associated with each class. If meta_input_list is an empty list or None, the model is used without metadata.

Parameters

X (pd.DataFrame)
prediction_interval (float, optional): Confidence level of the interval, in [0, 1]. Only available with tensorflow-probability models.

Returns

score (np.array): The estimated probability for each category.
inf (np.array, optional): The lower bound of the estimated probability. Only provided if prediction_interval is set.
sup (np.array, optional): The upper bound of the estimated probability. Only provided if prediction_interval is set.
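
A common way to go from per-class probabilities to a single predicted class is to take the index of the highest score. The sketch below assumes that relationship between `predict_proba` and `predict`, which this page does not confirm. It also checks that a pair of interval bounds brackets the score. Both helper names and the sample numbers are hypothetical.

```python
def most_probable_class(score):
    """Index of the highest probability in a one-dimensional score vector."""
    return max(range(len(score)), key=lambda i: score[i])


def bounds_bracket_score(lower, score, upper):
    """True if lower <= score <= upper holds for every class."""
    return all(lo <= s <= hi for lo, s, hi in zip(lower, score, upper))


# Hypothetical output for one email and three classes.
score = [0.1, 0.7, 0.2]
lower = [0.05, 0.6, 0.1]
upper = [0.2, 0.8, 0.3]
label = most_probable_class(score)
```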

prepare_email_to_predict(X)

Returns the email in a compatible shape, which depends on the type of neural model.

Parameters

X (pd.DataFrame)

Returns

list: List of the inputs to the neural model:
[X_seq] if no metadata,
[X_seq, X_meta] if metadata,
[X_seq, X_attention, X_meta] if Bert model.
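
The three documented input layouts can be sketched as a small helper. This is an illustration under assumptions, not the method's source: `assemble_inputs` is a hypothetical name, and the case of a Bert model without metadata is not described on this page, so the `[x_seq, x_attention]` result for that case is a guess.

```python
def assemble_inputs(x_seq, x_meta=None, x_attention=None):
    """Order the model inputs as [seq], [seq, meta] or [seq, attention, meta]."""
    inputs = [x_seq]
    if x_attention is not None:
        # Bert-style models also take an attention mask.
        inputs.append(x_attention)
    if x_meta is not None:
        # Metadata always comes last in the documented layouts.
        inputs.append(x_meta)
    return inputs


text_only = assemble_inputs([1, 2])
with_meta = assemble_inputs([1, 2], x_meta=[0, 1])
bert_style = assemble_inputs([1, 2], x_meta=[0, 1], x_attention=[1, 1])
```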

save_nn_model(filepath): Save model to pickle, json and save weights to .h5.

tokens_to_indices(tokens): Input: list of tokens ["ma", "carte_verte", …]. Output: list of indices [46, 359, …].
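
The mapping from tokens to indices can be pictured as a dictionary lookup. The sketch below reuses the tokens and indices from the description above. The function name `tokens_to_indices_sketch`, the explicit `vocabulary` argument, and the choice to skip unknown tokens are assumptions, since the real method presumably relies on the model's own vocabulary.

```python
def tokens_to_indices_sketch(tokens, vocabulary):
    """Map each known token to its index.

    Assumption: tokens missing from the vocabulary are skipped.
    """
    return [vocabulary[token] for token in tokens if token in vocabulary]


# Hypothetical vocabulary built from the documented example values.
vocabulary = {"ma": 46, "carte_verte": 359}
indices = tokens_to_indices_sketch(["ma", "carte_verte", "inconnu"], vocabulary)
```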
