evomap.preprocessing

Useful transformations for data pre-processing.

Module Contents

Functions

dist2sim(dist_mat[, transformation, eps])

Transform a distance matrix to a similarity matrix.

sim2dist(sim_mat[, transformation, eps])

Transform a similarity matrix to a distance matrix.

coocc2sim(coocc_mat)

Transform a matrix with co-occurrence counts to a similarity matrix.

edgelist2matrix(df, score_var, id_var_i, id_var_j[, ...])

Transform an edgelist to a relationship matrix.

normalize_dist_mat(D)

normalize_dist_mats(D_ts)

Normalize a sequence of distance matrices by a common factor.

expand_matrices(Xts, names_t)

Expand a list of similarity matrices to equal shape and calculate inclusion vectors.

calc_distances(X[, metric])

Calculate the matrix of pairwise distances among the rows of an input matrix.

evomap.preprocessing.dist2sim(dist_mat, transformation='inverse', eps=0.001)

Transform a distance matrix to a similarity matrix.

Parameters
  • dist_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise distances.

  • transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’

  • eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-3

Returns

Matrix of pairwise similarities.

Return type

ndarray of shape (n_samples, n_samples)
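The two transformation names can be illustrated with a small NumPy sketch. It is not evomap's implementation: the function name `dist2sim_sketch` is hypothetical, and the formulas are assumptions, namely that 'inverse' means `1 / (d + eps)` (consistent with `eps` guarding against division by zero) and that 'mirror' reflects each distance around the largest distance in the matrix.

```python
import numpy as np

def dist2sim_sketch(dist_mat, transformation="inverse", eps=1e-3):
    """Hypothetical stand-in for illustration, not evomap's code."""
    dist_mat = np.asarray(dist_mat, dtype=float)
    if transformation == "inverse":
        # eps keeps the result finite where a distance is exactly zero
        return 1.0 / (dist_mat + eps)
    if transformation == "mirror":
        # Assumed definition: reflect distances around the largest one
        return dist_mat.max() - dist_mat
    raise ValueError(f"unknown transformation: {transformation!r}")

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
S_inverse = dist2sim_sketch(D, "inverse")
S_mirror = dist2sim_sketch(D, "mirror")
```

Under these assumptions, both variants map small distances to large similarities; the inverse form compresses large distances, while the mirror form keeps differences between distances linear.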

evomap.preprocessing.sim2dist(sim_mat, transformation='inverse', eps=0.0001)

Transform a similarity matrix to a distance matrix.

Parameters
  • sim_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise similarities

  • transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’

  • eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-4

Returns

Matrix of pairwise distances.

Return type

ndarray of shape (n_samples, n_samples)

evomap.preprocessing.coocc2sim(coocc_mat)

Transform a matrix with co-occurrence counts to a similarity matrix.

Parameters

coocc_mat (ndarray of shape (n_samples, n_samples)) – Matrix of co-occurrence counts.

Returns

Matrix of pairwise similarities.

Return type

ndarray of shape (n_samples, n_samples)

evomap.preprocessing.edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)

Transform an edgelist to a relationship matrix.

Parameters
  • df (DataFrame) – Data containing the edgelist. Each row should include a pair. Needs to include two id variables and a score variable. Can also include a time variable.

  • score_var (string) – The score variable.

  • id_var_i (string) – The first id variable.

  • id_var_j (string) – The second id variable.

  • time_var (string, optional) – The time variable (int), by default None

  • time_selected (int, optional) – The selected time, by default None

Returns

  • S (ndarray of shape (n_samples, n_samples)) – A matrix of pairwise relationships.

  • ids (ndarray of shape (n_samples, )) – Identifiers for each element of the matrix.
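The idea of turning an edgelist into a matrix plus an id vector can be sketched as follows. This is an illustration only: `edgelist_to_matrix_sketch` is a hypothetical helper, it takes a plain list of dict records rather than a DataFrame, it omits the time filter, and it assumes that relationships are undirected and that unobserved pairs are zero.

```python
import numpy as np

def edgelist_to_matrix_sketch(rows, score_var, id_var_i, id_var_j):
    """Hypothetical helper: build a symmetric matrix from edge records."""
    # The sorted union of both id columns fixes the row/column order
    ids = sorted({r[id_var_i] for r in rows} | {r[id_var_j] for r in rows})
    index = {name: k for k, name in enumerate(ids)}
    S = np.zeros((len(ids), len(ids)))
    for r in rows:
        i, j = index[r[id_var_i]], index[r[id_var_j]]
        # Assumption: relationships are undirected, so mirror each score
        S[i, j] = r[score_var]
        S[j, i] = r[score_var]
    return S, np.array(ids)

# Column names below are made up for the example
edges = [
    {"firm_a": "A", "firm_b": "B", "score": 0.8},
    {"firm_a": "A", "firm_b": "C", "score": 0.1},
    {"firm_a": "B", "firm_b": "C", "score": 0.5},
]
S, ids = edgelist_to_matrix_sketch(edges, "score", "firm_a", "firm_b")
```

Row and column `k` of `S` both refer to `ids[k]`, which is why the matrix and the id vector are returned together.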

evomap.preprocessing.normalize_dist_mat(D)

evomap.preprocessing.normalize_dist_mats(D_ts)

Normalize a sequence of distance matrices by a common factor (the max. distance within the sequence).

Parameters

D_ts (list of ndarrays, each of shape (n_samples, n_samples)) – Sequence of distance matrices.

Returns

D_ts – Sequence of distance matrices, normalized by the maximum distance within the input sequence.

Return type

list of ndarrays, each of shape (n_samples, n_samples)
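Normalizing by a common factor, as described above, amounts to dividing every matrix by the largest distance found anywhere in the sequence. A minimal NumPy sketch of that idea, using the hypothetical name `normalize_sequence_sketch` rather than the library function:

```python
import numpy as np

def normalize_sequence_sketch(D_ts):
    """Hypothetical helper: scale all matrices by one shared maximum."""
    # A single factor for the whole sequence, not one per matrix
    common_max = max(D.max() for D in D_ts)
    return [D / common_max for D in D_ts]

D_1 = np.array([[0.0, 2.0], [2.0, 0.0]])
D_2 = np.array([[0.0, 4.0], [4.0, 0.0]])
D_norm = normalize_sequence_sketch([D_1, D_2])
```

Because the factor is shared, distances stay comparable across periods: here the first matrix peaks at 0.5 and the second at 1.0, whereas normalizing each matrix separately would make both peak at 1.0.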

evomap.preprocessing.expand_matrices(Xts, names_t)

Expand a list of similarity matrices to equal shape and calculate inclusion vectors.

Args:

Returns

list of similarity matrices (equal size), list of inclusion vectors (0/1) and list of all labels.

Return type

(list, list, list)
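Since the parameters are not documented here, the following is only one plausible reading of the return description. It assumes that `Xts` is a list of square matrices, that `names_t` holds one list of labels per matrix, that labels are ordered alphabetically in the output, and that entries for absent objects are filled with zeros. The name `expand_matrices_sketch` is hypothetical.

```python
import numpy as np

def expand_matrices_sketch(Xts, names_t):
    """Hypothetical helper: pad matrices to the union of all labels."""
    # Union of labels across all periods, in sorted order
    all_names = sorted(set().union(*names_t))
    position = {name: k for k, name in enumerate(all_names)}
    n = len(all_names)
    expanded, inclusions = [], []
    for X, names in zip(Xts, names_t):
        idx = [position[name] for name in names]
        X_full = np.zeros((n, n))
        # Copy the observed block into place; absent objects stay zero
        X_full[np.ix_(idx, idx)] = X
        inc = np.zeros(n, dtype=int)
        inc[idx] = 1  # 1 = object present in this period
        expanded.append(X_full)
        inclusions.append(inc)
    return expanded, inclusions, all_names

X_1 = np.array([[1.0, 0.3], [0.3, 1.0]])  # objects A and B
X_2 = np.array([[1.0, 0.6], [0.6, 1.0]])  # objects B and C
X_exp, inc_ts, labels = expand_matrices_sketch(
    [X_1, X_2], [["A", "B"], ["B", "C"]]
)
```

The inclusion vectors let downstream code tell padded entries apart from similarities that were actually observed.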

evomap.preprocessing.calc_distances(X, metric='euclidean')

Calculate the matrix of pairwise distances among the rows of an input matrix.

Parameters
  • X (ndarray of shape (n_samples, n_dims)) – Input matrix.

  • metric (string) –

    The distance metric to use. Can be any of ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

Returns

Matrix of pairwise distances.

Return type

ndarray of shape (n_samples, n_samples)
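For the default 'euclidean' metric, the shape of the result can be illustrated with plain NumPy broadcasting. This is a sketch of the computation, not the library's code, and `pairwise_euclidean_sketch` is a hypothetical name. The metric names listed above coincide with those accepted by SciPy's `scipy.spatial.distance.pdist`; whether the function delegates to SciPy is an assumption.

```python
import numpy as np

def pairwise_euclidean_sketch(X):
    """Hypothetical helper: Euclidean distances between all row pairs."""
    X = np.asarray(X, dtype=float)
    # diff[i, j, :] holds X[i] - X[j] for every pair of rows
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = pairwise_euclidean_sketch(X)
```

An input with `n_samples` rows always yields a square, symmetric `(n_samples, n_samples)` matrix with zeros on the diagonal, regardless of `n_dims`.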