evomap.preprocessing

Useful transformations for data preprocessing.

Module Contents

Functions

diss2sim(diss_mat[, transformation, eps])

Transform a dissimilarity matrix to a similarity matrix.

sim2diss(sim_mat[, transformation, eps])

Transform a similarity matrix to a dissimilarity matrix.

coocc2sim(coocc_mat)

Transform a matrix with co-occurrence counts to a similarity matrix.

edgelist2matrix(df, score_var, id_var_i, id_var_j[, ...])

Transform an edgelist to a relationship matrix.

edgelist2matrices(df, score_var, id_var_i, id_var_j, ...)

Transform a time-indexed edgelist to a sequence of relationship matrices.

normalize_diss_mat(D)

normalize_diss_mats(D_ts)

Normalize a sequence of dissimilarity matrices by a common factor.

expand_matrices(Xts, names_t)

Expand a list of similarity matrices to equal shape and calculate inclusion vectors.

calc_distances(X[, metric])

Calculate a matrix of pairwise distances among the rows of an input matrix.

evomap.preprocessing.diss2sim(diss_mat, transformation='inverse', eps=0.001)

Transform a dissimilarity matrix to a similarity matrix.

Parameters
  • diss_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise dissimilarities.

  • transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’

  • eps (float, optional) – Small constant added to avoid division by zero, by default 1e-3

Returns

Matrix of pairwise similarities.

Return type

ndarray of shape (n_samples, n_samples)
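A minimal sketch of what the two transformations could look like, assuming ‘inverse’ means 1 / (d + eps) and ‘mirror’ means reflecting every value around the largest dissimilarity. Neither formula is stated above, so `diss2sim_sketch` is a hypothetical stand-in, not evomap's implementation.

```python
import numpy as np


def diss2sim_sketch(diss_mat, transformation="inverse", eps=1e-3):
    """Hypothetical stand-in for diss2sim; evomap's formulas may differ."""
    D = np.asarray(diss_mat, dtype=float)
    if transformation == "inverse":
        # Assumed: large dissimilarities become small similarities;
        # eps keeps zero entries (e.g. the diagonal) from dividing by zero.
        return 1.0 / (D + eps)
    if transformation == "mirror":
        # Assumed: reflect every entry around the largest dissimilarity.
        return D.max() - D
    raise ValueError("transformation must be 'inverse' or 'mirror'")


D = np.array([[0.0, 1.0], [1.0, 0.0]])
S_mirror = diss2sim_sketch(D, "mirror")     # ones on the diagonal, zeros elsewhere
S_inverse = diss2sim_sketch(D, "inverse")   # diagonal is 1 / eps, roughly 1000
```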

evomap.preprocessing.sim2diss(sim_mat, transformation='inverse', eps=0.0001)

Transform a similarity matrix to a dissimilarity matrix.

Parameters
  • sim_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise similarities.

  • transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’

  • eps (float, optional) – Small constant added to avoid division by zero, by default 1e-4

Returns

Matrix of pairwise dissimilarities.

Return type

ndarray of shape (n_samples, n_samples)
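Under assumed formulas (‘inverse’ as 1 / (s + eps), ‘mirror’ as max(s) - s; neither is stated above), the hypothetical `sim2diss_sketch` below shows one practical difference between the two options: when the diagonal holds the largest similarity, ‘mirror’ maps every self-dissimilarity to exactly zero, whereas ‘inverse’ does not.

```python
import numpy as np


def sim2diss_sketch(sim_mat, transformation="inverse", eps=1e-4):
    """Hypothetical stand-in for sim2diss; evomap's formulas may differ."""
    S = np.asarray(sim_mat, dtype=float)
    if transformation == "inverse":
        # Assumed: 1 / (s + eps); eps guards against zero similarities.
        return 1.0 / (S + eps)
    if transformation == "mirror":
        # Assumed: reflect every entry around the largest similarity.
        return S.max() - S
    raise ValueError("transformation must be 'inverse' or 'mirror'")


S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
D_mirror = sim2diss_sketch(S, "mirror")    # zero diagonal, 1 - s off the diagonal
D_inverse = sim2diss_sketch(S, "inverse")  # diagonal is 1 / (1 + eps), not zero
```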

evomap.preprocessing.coocc2sim(coocc_mat)

Transform a matrix with co-occurrence counts to a similarity matrix.

Parameters

coocc_mat (ndarray of shape (n_samples, n_samples)) – Matrix of co-occurrence counts.

Returns

Matrix of pairwise similarities.

Return type

ndarray of shape (n_samples, n_samples)
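The description does not say how counts become similarities. One common choice, used here purely as an assumed stand-in, is cosine-style normalization s_ij = c_ij / sqrt(c_ii * c_jj), where the diagonal entry c_ii is taken to be the number of occurrences of item i. `coocc2sim_cosine` is a hypothetical name, not part of evomap.

```python
import numpy as np


def coocc2sim_cosine(coocc_mat):
    """Hypothetical: cosine-normalized co-occurrences (not confirmed as evomap's method)."""
    C = np.asarray(coocc_mat, dtype=float)
    occ = np.diag(C)  # assumed: diagonal = total occurrences of each item
    denom = np.sqrt(np.outer(occ, occ))
    # Items that never occur get similarity 0 instead of a division-by-zero NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, C / denom, 0.0)


C = np.array([[4, 2],
              [2, 9]])
S = coocc2sim_cosine(C)  # off-diagonal: 2 / sqrt(4 * 9) = 1/3
```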

evomap.preprocessing.edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)

Transform an edgelist to a relationship matrix.

Parameters
  • df (DataFrame) – Data containing the edgelist, one pair per row. Must include two id columns and a score column, and may also include a time column.

  • score_var (string) – Name of the score column.

  • id_var_i (string) – Name of the first id column.

  • id_var_j (string) – Name of the second id column.

  • time_var (string, optional) – Name of the (integer-valued) time column, by default None

  • time_selected (int, optional) – The time period to select, by default None

Returns

  • S (ndarray of shape (n_samples, n_samples)) – A matrix of pairwise relationships.

  • ids (ndarray of shape (n_samples, )) – Identifiers for each element of the matrix.
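A minimal sketch of the edgelist-to-matrix step, assuming pandas input, symmetric relationships, sorted identifiers, and a score of zero for pairs that do not appear in the edgelist. `edgelist2matrix_sketch` and the example column names (`firm_i`, `firm_j`, `sim`, `year`) are hypothetical.

```python
import numpy as np
import pandas as pd


def edgelist2matrix_sketch(df, score_var, id_var_i, id_var_j,
                           time_var=None, time_selected=None):
    """Hypothetical stand-in for edgelist2matrix."""
    if time_var is not None and time_selected is not None:
        # Keep only the rows of the selected period.
        df = df[df[time_var] == time_selected]
    # Sorted union of both id columns defines the row/column order.
    ids = np.unique(np.concatenate([df[id_var_i].to_numpy(),
                                    df[id_var_j].to_numpy()]))
    pos = {label: k for k, label in enumerate(ids)}
    S = np.zeros((len(ids), len(ids)))  # assumed: missing pairs score 0
    for a, b, score in zip(df[id_var_i], df[id_var_j], df[score_var]):
        # Assumed symmetric relationship: fill both (i, j) and (j, i).
        S[pos[a], pos[b]] = S[pos[b], pos[a]] = score
    return S, ids


df = pd.DataFrame({
    "firm_i": ["A", "A", "B"],
    "firm_j": ["B", "C", "C"],
    "sim": [0.9, 0.1, 0.4],
    "year": [2000, 2000, 2001],
})
S, ids = edgelist2matrix_sketch(df, "sim", "firm_i", "firm_j")
S_2001, ids_2001 = edgelist2matrix_sketch(df, "sim", "firm_i", "firm_j",
                                          time_var="year", time_selected=2001)
```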

evomap.preprocessing.edgelist2matrices(df, score_var, id_var_i, id_var_j, time_var)

Transform a time-indexed edgelist to a sequence of relationship matrices.

Parameters
  • df (DataFrame) – Data containing the edgelist, one pair per row. Must include two id columns, a score column, and a time column.

  • score_var (string) – Name of the score column.

  • id_var_i (string) – Name of the first id column.

  • id_var_j (string) – Name of the second id column.

  • time_var (string) – Name of the (integer-valued) time column.

Returns

  • S_t (list of length n_periods of ndarrays, each of shape (n_samples, n_samples)) – A sequence of relationship matrices.

  • ids_t (ndarray of shape (n_samples, )) – Identifiers for each element of the matrices.
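For the time-indexed version, the single `ids_t` array in the return value suggests one shared set of identifiers across all periods. The sketch below assumes exactly that, builds one symmetric matrix per period in sorted order, and fills pairs absent from a period with zero. `edgelist2matrices_sketch` and the example column names are hypothetical.

```python
import numpy as np
import pandas as pd


def edgelist2matrices_sketch(df, score_var, id_var_i, id_var_j, time_var):
    """Hypothetical stand-in for edgelist2matrices."""
    # Assumed: one shared, sorted id set across all periods.
    ids = np.unique(np.concatenate([df[id_var_i].to_numpy(),
                                    df[id_var_j].to_numpy()]))
    pos = {label: k for k, label in enumerate(ids)}
    S_t = []
    for period in sorted(df[time_var].unique()):
        rows = df[df[time_var] == period]
        S = np.zeros((len(ids), len(ids)))  # assumed: absent pairs score 0
        for a, b, score in zip(rows[id_var_i], rows[id_var_j], rows[score_var]):
            S[pos[a], pos[b]] = S[pos[b], pos[a]] = score
        S_t.append(S)
    return S_t, ids


df = pd.DataFrame({
    "year": [2000, 2000, 2001],
    "firm_i": ["A", "A", "B"],
    "firm_j": ["B", "C", "C"],
    "sim": [0.9, 0.1, 0.4],
})
S_t, ids_t = edgelist2matrices_sketch(df, "sim", "firm_i", "firm_j", "year")
```

Because every matrix shares the same rows and columns, entry (i, j) refers to the same pair in every period, which keeps the sequence directly comparable.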

evomap.preprocessing.normalize_diss_mat(D)
evomap.preprocessing.normalize_diss_mats(D_ts)

Normalize a sequence of dissimilarity matrices by a common factor (the maximum dissimilarity within the sequence).

Parameters

D_ts (list of ndarrays, each of shape (n_samples, n_samples)) – Sequence of dissimilarity matrices.

Returns

D_ts – Sequence of dissimilarity matrices, normalized by the maximum dissimilarity within the input sequence.

Return type

list of ndarrays, each of shape (n_samples, n_samples)
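The normalization described above can be sketched directly: find the largest entry across all matrices in the sequence and divide every matrix by it. `normalize_diss_mats_sketch` is a hypothetical helper, and the zero-maximum check is an added assumption.

```python
import numpy as np


def normalize_diss_mats_sketch(D_ts):
    """Hypothetical stand-in: divide every matrix by the sequence-wide maximum."""
    common_max = max(np.max(D) for D in D_ts)
    if common_max == 0:
        raise ValueError("all dissimilarities are zero; cannot normalize")
    return [np.asarray(D, dtype=float) / common_max for D in D_ts]


D_ts = [np.array([[0.0, 2.0], [2.0, 0.0]]),
        np.array([[0.0, 4.0], [4.0, 0.0]])]
D_norm = normalize_diss_mats_sketch(D_ts)  # off-diagonals become 0.5 and 1.0
```

Dividing by one common factor keeps the matrices comparable across periods; normalizing each matrix by its own maximum would make every period peak at 1 and hide changes in overall scale.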

evomap.preprocessing.expand_matrices(Xts, names_t)

Expand a list of similarity matrices to equal shape and calculate inclusion vectors.

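Parameters
  • Xts (list of ndarrays) – Sequence of similarity matrices, possibly of different shapes.

  • names_t (list) – Labels of the elements in each matrix of Xts.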

Returns

A list of similarity matrices (all of equal size), a list of inclusion vectors (0/1), and a list of all labels.

Return type

(list, list, list)
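One plausible reading of "expand to equal shape", sketched under stated assumptions: take the union of all labels in order of first appearance, place each period's matrix into a zero-padded matrix over that union, and mark with 1 the labels present in that period. The zero padding, the label order, and the name `expand_matrices_sketch` are assumptions, not evomap's documented behavior.

```python
import numpy as np


def expand_matrices_sketch(Xts, names_t):
    """Hypothetical stand-in for expand_matrices."""
    # Union of all labels, in order of first appearance (assumed ordering).
    all_names = list(dict.fromkeys(name for names in names_t for name in names))
    pos = {name: k for k, name in enumerate(all_names)}
    n = len(all_names)
    Xs_expanded, inclusions = [], []
    for X, names in zip(Xts, names_t):
        idx = [pos[name] for name in names]
        X_full = np.zeros((n, n))      # assumed: zero-padding for absent labels
        X_full[np.ix_(idx, idx)] = X   # copy this period's block into place
        inc = np.zeros(n, dtype=int)
        inc[idx] = 1                   # 1 = label present in this period
        Xs_expanded.append(X_full)
        inclusions.append(inc)
    return Xs_expanded, inclusions, all_names


X1 = np.array([[1.0, 0.5], [0.5, 1.0]])  # labels A, B
X2 = np.array([[1.0, 0.3], [0.3, 1.0]])  # labels B, C
Xs, incs, names = expand_matrices_sketch([X1, X2], [["A", "B"], ["B", "C"]])
```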

evomap.preprocessing.calc_distances(X, metric='euclidean')

Calculate a matrix of pairwise distances among the rows of an input matrix.

Parameters
  • X (ndarray of shape (n_samples, n_dims)) – Input matrix.

  • metric (string, optional) –

    The distance metric to use, by default ‘euclidean’. Can be any of ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

Returns

Matrix of pairwise distances.

Return type

ndarray of shape (n_samples, n_samples)
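The metric names listed above appear to match those accepted by SciPy's `scipy.spatial.distance.pdist`, which suggests (without confirming) that the function wraps it. A minimal sketch under that assumption, with `calc_distances_sketch` as a hypothetical name:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def calc_distances_sketch(X, metric="euclidean"):
    """Hypothetical stand-in for calc_distances, assuming it wraps SciPy's pdist."""
    # pdist returns the condensed upper triangle; squareform expands it
    # to a symmetric (n_samples, n_samples) matrix with a zero diagonal.
    return squareform(pdist(np.asarray(X, dtype=float), metric=metric))


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = calc_distances_sketch(X)                    # Euclidean: 5, 10, 5
D_city = calc_distances_sketch(X, "cityblock")  # Manhattan: 7, 14, 7
```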