Models subpackage melusine.models
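The melusine.models subpackage provides pre-defined neural network architectures and a generic, scikit-learn compatible NeuralModel class used to train them.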
List of submodules
NeuralArchitectures melusine.models.neural_architectures
- melusine.models.neural_architectures.bert_model(ntargets=18, seq_max=100, nb_meta=134, loss='categorical_crossentropy', activation='softmax', bert_model='jplu/tf-camembert-base')
Pre-defined architecture of a pre-trained BERT model.
- Parameters
- ntargets : int, optional
Dimension of the model output. Default value, 18.
- seq_max : int, optional
Maximum input length. Default value, 100.
- nb_meta : int, optional
Dimension of the metadata input. Default value, 134.
- loss : str, optional
Loss function for training. Default value, ‘categorical_crossentropy’.
- activation : str, optional
Activation function. Default value, ‘softmax’.
- bert_model : str, optional
Model name from the HuggingFace library or path to a local model. Only CamemBERT and FlauBERT are supported. Default value, ‘jplu/tf-camembert-base’.
- Returns
- Model instance
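All four architectures default to loss='categorical_crossentropy' and activation='softmax'. As a rough illustration of what these two defaults compute for a single email with ntargets output categories, here is a plain-Python sketch (not Keras, and not Melusine code); the helper names and the clipping epsilon are assumptions made for the example.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip the probabilities so that log(0) is never evaluated (eps is an assumption).
    return -sum(t * math.log(min(max(p, eps), 1 - eps)) for t, p in zip(y_true, y_pred))


# One email, ntargets=3 hypothetical categories; the true category is the first one.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = categorical_crossentropy([1, 0, 0], probs)
print([round(p, 3) for p in probs], round(loss, 3))
```

With a one-hot target, the loss reduces to minus the log of the probability given to the true category, so it shrinks as the softmax output concentrates on the right class.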
- melusine.models.neural_architectures.cnn_model(embedding_matrix_init, ntargets, seq_max, nb_meta, loss='categorical_crossentropy', activation='softmax')
Pre-defined architecture of a CNN model.
- Parameters
- embedding_matrix_init : np.array
Pretrained embedding matrix.
- ntargets : int
Dimension of the model output.
- seq_max : int
Maximum input length.
- nb_meta : int
Dimension of the metadata input.
- loss : str, optional
Loss function for training. Default value, ‘categorical_crossentropy’.
- activation : str, optional
Activation function. Default value, ‘softmax’.
- Returns
- Model instance
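The seq_max parameter fixes the length of the token sequence fed to the model, so every email has to be brought to exactly seq_max token ids. The sketch below shows one way to do that; pad_or_truncate and the padding id 0 are hypothetical, and whether Melusine pads and truncates at the start or at the end of a sequence is not stated here (this version keeps the first tokens and pads at the end).

```python
def pad_or_truncate(token_ids, seq_max=100, pad_id=0):
    """Return exactly seq_max token ids (hypothetical helper)."""
    # Keep at most the first seq_max tokens.
    ids = list(token_ids[:seq_max])
    # Fill the remaining positions with the padding id.
    return ids + [pad_id] * (seq_max - len(ids))


short_email = [12, 7, 431]
long_email = list(range(1, 151))
print(pad_or_truncate(short_email, seq_max=5))  # [12, 7, 431, 0, 0]
print(len(pad_or_truncate(long_email)))  # 100
```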
- melusine.models.neural_architectures.rnn_model(embedding_matrix_init, ntargets=18, seq_max=100, nb_meta=252, loss='categorical_crossentropy', activation='softmax')
Pre-defined architecture of an RNN model.
- Parameters
- embedding_matrix_init : np.array
Pretrained embedding matrix.
- ntargets : int, optional
Dimension of the model output. Default value, 18.
- seq_max : int, optional
Maximum input length. Default value, 100.
- nb_meta : int, optional
Dimension of the metadata input. Default value, 252.
- loss : str, optional
Loss function for training. Default value, ‘categorical_crossentropy’.
- activation : str, optional
Activation function. Default value, ‘softmax’.
- Returns
- Model instance
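The CNN, RNN and Transformer architectures all receive embedding_matrix_init, a pretrained matrix whose rows are word vectors. Assuming it is used to initialise an embedding layer (the usual role of such a matrix; the description above does not go into detail), each token id in the padded input simply selects one row. The toy matrix and the embed helper below are hypothetical.

```python
def embed(token_ids, embedding_matrix):
    """Map each token id to its row of the embedding matrix."""
    # Row i of embedding_matrix holds the vector for token id i.
    return [embedding_matrix[i] for i in token_ids]


# Toy matrix: vocabulary of 4 token ids, embedding dimension 3.
embedding_matrix_init = [
    [0.0, 0.0, 0.0],  # id 0, e.g. padding
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
vectors = embed([2, 1, 0], embedding_matrix_init)
print(vectors)
```

A sequence of seq_max ids therefore becomes a seq_max x embedding_dim matrix, which is what the convolutional, recurrent or attention layers then process.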
- melusine.models.neural_architectures.transformers_model(embedding_matrix_init, ntargets=18, seq_max=100, nb_meta=134, loss='categorical_crossentropy', activation='softmax')
Pre-defined architecture of a Transformer model.
- Parameters
- embedding_matrix_init : np.array
Pretrained embedding matrix.
- ntargets : int, optional
Dimension of the model output. Default value, 18.
- seq_max : int, optional
Maximum input length. Default value, 100.
- nb_meta : int, optional
Dimension of the metadata input. Default value, 134.
- loss : str, optional
Loss function for training. Default value, ‘categorical_crossentropy’.
- activation : str, optional
Activation function. Default value, ‘softmax’.
- Returns
- Model instance
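The internals of transformers_model are not documented here. As background only, the central operation of a Transformer layer is scaled dot-product attention; the plain-Python sketch below shows it for a single head without masking. It is not Melusine's implementation, and the toy vectors are made up.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors, without masking."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs


# Three token positions with 2-dimensional representations (toy numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: queries, keys and values all come from the same sequence.
print(scaled_dot_product_attention(x, x, x))
```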
Train melusine.models.train
- class melusine.models.train.NeuralModel
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
Generic class for neural models.
It is compatible with the scikit-learn API (i.e. it implements fit and predict methods).
- Parameters
- neural_architecture_function : function
Function which returns a Model instance from Keras. Implemented model functions are: cnn_model, rnn_model, transformers_model and bert_model.
- pretrained_embedding : np.array
Pretrained embedding matrix.
- text_input_column : str
Input text column to consider for the model.
- meta_input_list : list, optional
List of the names of the columns containing the metadata. If it is an empty list or None, the model is used without metadata. Default value, [‘extension’, ‘dayofweek’, ‘hour’, ‘min’].
- vocab_size : int, optional
Size of the vocabulary for the neural network model. Default value, 25000.
- seq_size : int, optional
Maximum size of the input for the neural model. Default value, 100.
- loss : str, optional
Loss function for training. Default value, ‘categorical_crossentropy’.
- activation : str, optional
Activation function. Default value, ‘softmax’.
- batch_size : int, optional
Size of the batches used to train the neural network model. Default value, 4096.
- n_epochs : int, optional
Number of epochs for the training of the neural network model. Default value, 15.
- bert_tokenizer : str, optional
Tokenizer name from the HuggingFace library or path to a local tokenizer. Only CamemBERT and FlauBERT are supported. Default value, ‘camembert-base’.
- bert_model : str, optional
Model name from the HuggingFace library or path to a local model. Only CamemBERT and FlauBERT are supported. Default value, ‘camembert-base’.
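vocab_size bounds the number of distinct words the model can embed. A common way to enforce such a bound, shown here as an assumption about what the Melusine tokenizer does rather than a description of it, is to keep only the most frequent words of the training corpus. build_vocabulary and texts_to_ids are hypothetical helpers, and reserving id 0 and dropping unknown words are choices made for this sketch.

```python
from collections import Counter


def build_vocabulary(texts, vocab_size=25000):
    """Map the vocab_size - 1 most frequent words to ids 1..vocab_size-1."""
    counts = Counter(word for text in texts for word in text.split())
    # Id 0 is kept free (e.g. for padding), so only vocab_size - 1 words get an id.
    most_common = counts.most_common(vocab_size - 1)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}


def texts_to_ids(texts, vocab):
    # Words outside the vocabulary are dropped in this sketch.
    return [[vocab[w] for w in text.split() if w in vocab] for text in texts]


texts = ["bonjour contrat", "contrat resiliation", "contrat"]
vocab = build_vocabulary(texts, vocab_size=3)
print(vocab)
print(texts_to_ids(["contrat resiliation bonjour"], vocab))
```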
Examples
>>> from melusine.models.train import NeuralModel
>>> from melusine.models.neural_architectures import cnn_model
>>> from melusine.nlp_tools.embedding import Embedding
>>> pretrained_embedding = Embedding.load()
>>> list_meta = ['extension', 'dayofweek', 'hour']
>>> # Pass the metadata columns by keyword: the third positional argument is text_input_column
>>> nn_model = NeuralModel(cnn_model, pretrained_embedding, meta_input_list=list_meta)  # noqa
>>> nn_model.fit(X_train, y_train)  # noqa
>>> y_res = nn_model.predict(X_test)  # noqa
- Attributes
- architecture_function, pretrained_embedding, text_input_column, meta_input_list, vocab_size, seq_size, loss, batch_size, n_epochs
- model : Model instance from Keras
- tokenizer : Tokenizer instance from Melusine
- embedding_matrix : np.array
Embedding matrix used as input for the neural network model.
- fit(X_train, y_train, tensorboard_log_dir=None, validation_data=None, **kwargs)
Fit the neural network model on X_train and y_train. If meta_input_list is an empty list or None, the model is used without metadata.
Compatible with the scikit-learn API.
- Parameters
- X_train : pd.DataFrame
- y_train : pd.Series
- tensorboard_log_dir : str, optional
If not None, used as the path where TensorBoard logs are written. TensorBoard callback parameters can be changed in the config file. Default value, None.
- validation_data : tuple, optional
Data on which to evaluate the loss and any model metrics at the end of each epoch, for example a tuple (x_val, y_val). The model is not trained on this data. validation_data overrides validation_split. Default value, None.
- Returns
- self : object
Returns the instance.
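With the defaults batch_size=4096 and n_epochs=15, the number of weight updates performed by fit depends on the number of rows in X_train. Assuming these values are passed on to Keras training, which performs one update per batch and lets the last batch of an epoch be smaller, the arithmetic can be sketched as follows; training_steps is a hypothetical helper.

```python
import math


def training_steps(n_samples, batch_size=4096, n_epochs=15):
    """Return (batches per epoch, total weight updates)."""
    # The last batch of each epoch may hold fewer than batch_size emails.
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * n_epochs


print(training_steps(100_000))  # (25, 375)
```

Rows passed through validation_data are only evaluated, so they do not add to this count.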
- predict_proba(X, **kwargs)
Returns the probabilities associated with each class. If meta_input_list is an empty list or None, the model is used without metadata.
- Parameters
- X : pd.DataFrame
- prediction_interval : float, optional
Confidence level of the interval, between 0 and 1. Only available with tensorflow-probability models.
- Returns
- score : np.array
The estimated probability for each category.
- inf : np.array, optional
The lower bound of the estimated probability. Only returned if prediction_interval is provided.
- sup : np.array, optional
The upper bound of the estimated probability. Only returned if prediction_interval is provided.
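predict_proba returns one probability per category for each email. The sketch below takes a single row of scores and picks the most probable category (argmax, which is presumably what predict relies on, though that is not stated here) and checks it against interval bounds, assuming inf is the lower bound and sup the upper bound, as their names suggest. The category labels and both helpers are hypothetical.

```python
def predicted_class(score_row, labels):
    """Return the label whose probability is highest (argmax)."""
    best = max(range(len(score_row)), key=lambda i: score_row[i])
    return labels[best]


def within_interval(inf_row, score_row, sup_row):
    """Check inf <= score <= sup for every category."""
    return all(lo <= s <= hi for lo, s, hi in zip(inf_row, score_row, sup_row))


labels = ["vehicule", "habitation", "sinistre"]  # hypothetical categories
score_row = [0.2, 0.7, 0.1]
print(predicted_class(score_row, labels))  # habitation
print(within_interval([0.1, 0.6, 0.0], score_row, [0.3, 0.8, 0.2]))  # True
```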
- prepare_email_to_predict(X)
Returns the email in an input format compatible with the neural model. The format depends on the type of neural model.
- Parameters
- X : pd.DataFrame
- Returns
- list
List of the inputs to the neural model: [X_seq] if there is no metadata, [X_seq, X_meta] if there is metadata, or [X_seq, X_attention, X_meta] for a BERT model.
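The three input layouts listed for prepare_email_to_predict can be summarised as a small dispatch. assemble_inputs is a hypothetical helper working on placeholder values rather than real arrays; the case of a BERT model without metadata is not covered by the description above, so this sketch rejects it instead of guessing.

```python
def assemble_inputs(x_seq, x_meta=None, x_attention=None):
    """Order model inputs following the documented layouts (hypothetical helper)."""
    if x_attention is not None:
        # BERT model: token ids, attention mask, then metadata.
        if x_meta is None:
            raise ValueError("the documented BERT layout includes X_meta")
        return [x_seq, x_attention, x_meta]
    if x_meta is None:
        # No metadata: the token sequence is the only input.
        return [x_seq]
    # With metadata: token sequence first, then metadata.
    return [x_seq, x_meta]


print(assemble_inputs("X_seq"))
print(assemble_inputs("X_seq", x_meta="X_meta"))
print(assemble_inputs("X_seq", x_meta="X_meta", x_attention="X_attention"))
```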