Utils subpackage melusine.utils

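Helper modules used across Melusine: a scikit-learn-compatible transformer scheduler (transformer_scheduler), multiprocessing helpers for pandas DataFrames (multiprocessing), a streamer (streamer) and colored-printing helpers for tagged mails (printer).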

List of submodules

TransformerScheduler melusine.utils.transformer_scheduler

A helper class for building a custom transformer from a list of functions that are applied, in a specified order, along each row of a DataFrame (axis=1).

It is compatible with the scikit-learn API (i.e. it implements the fit and transform methods).

class melusine.utils.transformer_scheduler.TransformerScheduler(functions_scheduler, mode='apply', n_jobs=1, progress_bar=True, copy=True, verbose=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

This class provides a way to define a custom transformer. It takes a list of functions, given in a specific order, to apply along each row of a DataFrame (axis=1). The resulting transformer is compatible with the scikit-learn API (i.e. it implements the fit and transform methods).

Parameters
functions_scheduler : list of tuples (function, tuple, list)

List of functions to apply, in order. Each element of the list must be defined as follows: (function, argument(s) used by the function (optional), column name(s) returned (optional)).

mode : str {‘apply’, ‘apply_by_multiprocessing’}, optional

How the functions are applied along the rows (axis=1). Default: ‘apply’. If set to ‘apply_by_multiprocessing’, the computation is parallelized with multiprocessing.

n_jobs : int, optional

Number of cores used for computation. Default: 1.

progress_bar : boolean, optional

Whether to display a progress bar from the tqdm package. Default: True. Only used when mode is set to ‘apply_by_multiprocessing’.

copy : boolean, optional

Whether to work on a copy of the DataFrame instead of the input itself. Default: True.

verbose : int, optional

Verbosity mode (logging output). Default: 0.

Examples

>>> from melusine.utils.transformer_scheduler import TransformerScheduler
>>> # my_function_1..3, argument1 and argument2 are user-defined placeholders
>>> MelusineTransformer = TransformerScheduler(
...     functions_scheduler=[
...         (my_function_1, (argument1, argument2), ['return_col_A']),
...         (my_function_2, None, ['return_col_B', 'return_col_C']),
...         (my_function_3, (), ['return_col_D'])
...     ])
Attributes
functions_scheduler, mode, n_jobs, progress_bar
static apply_dict(X_, func_, args_=None, cols_=None, **kwargs)

Apply a function on a dictionary.

Parameters
X_ : dict

Data on which transformations are applied.

args_ : list or tuple

Arguments passed to the function to apply.

cols_ : list or tuple

Columns created by the transformation.

func_ : function

Function to apply.

Returns
dict
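A minimal sketch of what a dictionary-based step might look like. It assumes (this page does not confirm it) that func_ receives the dictionary followed by the unpacked args_, and that its output is written back according to cols_: the whole dictionary is replaced when cols_ is None, a single key is set when cols_ holds one name, and the items of a returned sequence are spread over several keys otherwise. apply_dict_sketch and split_email are hypothetical names, not part of Melusine.

```python
def apply_dict_sketch(X_, func_, args_=None, cols_=None):
    """Apply func_ to the dict X_ and store its output (assumed semantics)."""
    args_ = tuple(args_) if args_ else ()
    result = func_(X_, *args_)
    if cols_ is None:
        # Assumption: the function returns the full, updated dictionary.
        return result
    if len(cols_) == 1:
        # One output column: store the returned value as-is.
        X_[cols_[0]] = result
    else:
        # Several output columns: one returned value per column name.
        for col, value in zip(cols_, result):
            X_[col] = value
    return X_


def split_email(row, separator):
    """Hypothetical row function returning two values."""
    user, _, domain = row["from"].partition(separator)
    return user, domain


row = {"from": "jane.doe@example.com"}
row = apply_dict_sketch(row, split_email, ("@",), ["user", "domain"])
print(row)
# {'from': 'jane.doe@example.com', 'user': 'jane.doe', 'domain': 'example.com'}
```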
static apply_pandas(X_, func_, args_=None, cols_=None, **kwargs)

Apply a function on a pandas DataFrame.

Parameters
X_ : pandas.DataFrame

Data on which transformations are applied.

args_ : list or tuple

Arguments passed to the function to apply.

cols_ : list or tuple

Columns created by the transformation.

func_ : function

Function to apply.

Returns
pandas.DataFrame
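The same idea on a DataFrame can be sketched with pandas.DataFrame.apply along axis=1, assuming the row function receives a pandas Series followed by args_. This is an illustration under stated assumptions, not the library's code: apply_pandas_sketch, count_chars and split_email are hypothetical, and the actual method may expand multi-column results differently (here result_type="expand" is used).

```python
import pandas as pd


def apply_pandas_sketch(X_, func_, args_=None, cols_=None):
    """Apply func_ row by row (axis=1) and store the result (assumed semantics)."""
    args_ = tuple(args_) if args_ else ()
    if cols_ is None:
        # Assumption: the function returns the whole updated row.
        return X_.apply(func_, args=args_, axis=1)
    if len(cols_) == 1:
        # One output column: each call returns a scalar.
        X_[cols_[0]] = X_.apply(func_, args=args_, axis=1)
    else:
        # Several output columns: each call returns a sequence, expanded
        # into one temporary column per element, then copied by position.
        expanded = X_.apply(func_, args=args_, axis=1, result_type="expand")
        for position, col in enumerate(cols_):
            X_[col] = expanded.iloc[:, position]
    return X_


def count_chars(row):
    return len(row["from"])


def split_email(row, separator):
    user, _, domain = row["from"].partition(separator)
    return user, domain


df = pd.DataFrame({"from": ["jane.doe@example.com", "bob@test.org"]})
df = apply_pandas_sketch(df, count_chars, cols_=["length"])
df = apply_pandas_sketch(df, split_email, ("@",), ["user", "domain"])
print(df)
```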
static apply_pandas_multiprocessing(X_, func_, args_=None, cols_=None, n_jobs=1, progress_bar=False, **kwargs)

Apply a function on a pandas DataFrame using multiprocessing.

fit(X, y=None)

Unused method. Defined only for compatibility with scikit-learn API.

transform(X)

Apply the functions defined in the functions_scheduler parameter.

Parameters
X : pandas.DataFrame

Data on which transformations are applied.

Returns
pandas.DataFrame
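To make the scheduling idea concrete, here is a simplified, hypothetical stand-in (MiniScheduler) that mirrors only the documented behaviour of the ‘apply’ mode: each (function, args, cols) step is applied row-wise in list order, optional elements may be omitted, and copy controls whether the input is copied first. It avoids importing scikit-learn and defines fit_transform by hand; multiprocessing, progress bars and logging are left out. The row functions are illustrative assumptions.

```python
import pandas as pd


class MiniScheduler:
    """Hypothetical, simplified stand-in for TransformerScheduler ('apply' mode only)."""

    def __init__(self, functions_scheduler, copy=True):
        self.functions_scheduler = functions_scheduler
        self.copy = copy

    def fit(self, X, y=None):
        # Nothing to learn: kept for scikit-learn API compatibility.
        return self

    def transform(self, X):
        X_ = X.copy() if self.copy else X
        for step in self.functions_scheduler:
            # Arguments and returned columns are optional in each step.
            func = step[0]
            args = tuple(step[1]) if len(step) > 1 and step[1] else ()
            cols = step[2] if len(step) > 2 else None
            if cols is None:
                X_ = X_.apply(func, args=args, axis=1)
            elif len(cols) == 1:
                X_[cols[0]] = X_.apply(func, args=args, axis=1)
            else:
                expanded = X_.apply(func, args=args, axis=1, result_type="expand")
                for position, col in enumerate(cols):
                    X_[col] = expanded.iloc[:, position]
        return X_

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def clean_body(row):
    # Returns the whole updated row (no returned columns declared).
    row = row.copy()
    row["body"] = row["body"].strip().lower()
    return row


def count_words(row):
    return len(row["body"].split())


def split_email(row, separator):
    user, _, domain = row["from"].partition(separator)
    return user, domain


scheduler = MiniScheduler([
    (clean_body,),
    (count_words, None, ["n_words"]),
    (split_email, ("@",), ["user", "domain"]),
])
emails = pd.DataFrame({
    "from": ["jane.doe@example.com", "bob@test.org"],
    "body": ["  Hello World ", "Thanks for your reply"],
})
result = scheduler.fit_transform(emails)
print(result)
```

Because copy=True, the emails DataFrame is left untouched and all new columns appear only in result.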

Multiprocessing melusine.utils.multiprocessing

melusine.utils.multiprocessing.apply_by_multiprocessing(df, func, **kwargs)

Apply a function along an axis of the DataFrame using multiprocessing.

Parameters
df : pd.DataFrame

DataFrame on which the function is applied.

func : function

Function to apply.

Returns
pd.DataFrame

Returns the DataFrame with the function applied.
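One common way to parallelize a row-wise apply is to split the DataFrame into contiguous row blocks, apply the function to each block in a multiprocessing.Pool, and concatenate the partial results. The sketch below shows that pattern; it is not taken from Melusine. The workers keyword, split_frame and apply_by_multiprocessing_sketch are hypothetical names. With a pool, the applied function must be picklable (defined at module level, not a lambda), and process creation should sit under an if __name__ == "__main__": guard on platforms that spawn processes.

```python
import multiprocessing

import pandas as pd


def split_frame(df, n_chunks):
    """Split df into at most n_chunks contiguous, nearly equal row blocks."""
    n_chunks = max(1, min(n_chunks, len(df)))
    size, extra = divmod(len(df), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(df.iloc[start:stop])
        start = stop
    return chunks


def _apply_chunk(task):
    """Worker: apply func row-wise (axis=1) on one chunk."""
    chunk, func, kwargs = task
    return chunk.apply(func, axis=1, **kwargs)


def apply_by_multiprocessing_sketch(df, func, workers=1, **kwargs):
    tasks = [(chunk, func, kwargs) for chunk in split_frame(df, workers)]
    if workers <= 1:
        # Sequential fallback: same logic without starting processes.
        results = [_apply_chunk(task) for task in tasks]
    else:
        with multiprocessing.Pool(processes=workers) as pool:
            results = pool.map(_apply_chunk, tasks)
    # Chunks keep their original index, so concatenation restores row order.
    return pd.concat(results)


def count_chars(row):
    return len(row["from"])


if __name__ == "__main__":
    emails = pd.DataFrame({"from": ["jane.doe@example.com", "bob@test.org", "x@y.fr"]})
    print(apply_by_multiprocessing_sketch(emails, count_chars, workers=2).tolist())
    # [20, 12, 6]
```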

melusine.utils.multiprocessing.apply_df(input_args)

Streamer melusine.utils.streamer

PrintParts melusine.utils.printer

melusine.utils.printer.print_color(text, part=None)

Select, according to the tag, the color to use when printing the text.

melusine.utils.printer.print_color_mail(structured_body)

Highlight the tagged sentences.

Parameters
structured_body : structured body produced by process_sent_tag

Returns
Nothing; the mail is printed sentence by sentence.
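A minimal sketch of tag-based colored printing with ANSI escape codes. The tag names, the tag-to-color mapping and the input shape are all assumptions for illustration: the real structured body returned by process_sent_tag is not described on this page, so here a mail is modeled as a hypothetical list of (sentence, tag) pairs, and colorize, print_color_sketch and print_color_mail_sketch are hypothetical names.

```python
# Illustrative tag -> ANSI color mapping (tags and colors are assumptions).
COLORS = {
    "HELLO": "\033[92m",      # green
    "BODY": "\033[94m",       # blue
    "GREETINGS": "\033[93m",  # yellow
}
RESET = "\033[0m"


def colorize(text, part=None):
    """Wrap text in the color associated with its tag (no color if unknown)."""
    color = COLORS.get(part)
    return f"{color}{text}{RESET}" if color else text


def print_color_sketch(text, part=None):
    print(colorize(text, part))


def print_color_mail_sketch(tagged_sentences):
    """Print a mail sentence by sentence, each in the color of its tag."""
    for sentence, tag in tagged_sentences:
        print_color_sketch(sentence, tag)


mail = [
    ("Hello,", "HELLO"),
    ("I would like to update my contract.", "BODY"),
    ("Best regards,", "GREETINGS"),
]
print_color_mail_sketch(mail)
```

Falling back to uncolored text for unknown tags keeps every sentence visible even when the mapping is incomplete.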