evomap.preprocessing
Useful transformations for data preprocessing.
Module Contents
Functions
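| dist2sim(dist_mat[, transformation, eps]) | Transform a distance matrix to a similarity matrix. |
| sim2dist(sim_mat[, transformation, eps]) | Transform a similarity matrix to a distance matrix. |
| coocc2sim(coocc_mat) | Transform a matrix with co-occurrence counts to a similarity matrix. |
| edgelist2matrix(df, score_var, id_var_i, id_var_j[, ...]) | Transform an edgelist to a relationship matrix. |
| normalize_dist_mats(D_ts) | Normalize a sequence of distance matrices by a common factor. |
| expand_matrices(Xts, names_t) | Expand a list of similarity matrices to equal shape and calculate inclusion vectors. |
| calc_distances(X[, metric]) | Calculate a matrix of pairwise distances among the rows of an input matrix. |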
- evomap.preprocessing.dist2sim(dist_mat, transformation='inverse', eps=0.001)
Transform a distance matrix to a similarity matrix.
- Parameters
dist_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise distances.
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-3
- Returns
Matrix of pairwise similarities.
- Return type
ndarray of shape (n_samples, n_samples)
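The docstring does not spell out the two transformations. Below is a minimal NumPy sketch under the assumption that 'inverse' means s_ij = 1 / (d_ij + eps) (eps is documented as guarding against division by zero) and 'mirror' means s_ij = max(D) - d_ij. The helper name `dist2sim_sketch` is hypothetical; this is not evomap's implementation.

```python
import numpy as np


def dist2sim_sketch(dist_mat, transformation="inverse", eps=1e-3):
    """Hypothetical stand-in for dist2sim; both formulas are assumptions."""
    dist_mat = np.asarray(dist_mat, dtype=float)
    if transformation == "inverse":
        # Assumed: similarity decays with distance; eps keeps zero distances finite.
        return 1.0 / (dist_mat + eps)
    if transformation == "mirror":
        # Assumed: reflect distances around the largest distance in the matrix.
        return dist_mat.max() - dist_mat
    raise ValueError(f"unknown transformation: {transformation!r}")


D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
S_mirror = dist2sim_sketch(D, "mirror")  # rows: [4, 3, 0], [3, 4, 2], [0, 2, 4]
S_inverse = dist2sim_sketch(D)           # diagonal: 1 / eps = 1000
```

Under either assumed mapping, closer pairs receive higher similarity. The 'inverse' variant grows to 1/eps on the diagonal, while 'mirror' stays within [0, max(D)].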
- evomap.preprocessing.sim2dist(sim_mat, transformation='inverse', eps=0.0001)
Transform a similarity matrix to a distance matrix.
- Parameters
sim_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise similarities.
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-4
- Returns
Matrix of pairwise distances.
- Return type
ndarray of shape (n_samples, n_samples)
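Under the same kind of assumptions (a hypothetical `sim2dist_sketch`, with 'inverse' as d_ij = 1 / (s_ij + eps) and 'mirror' as d_ij = max(S) - s_ij), the sketch below illustrates a consequence of those assumed definitions: mirroring a non-negative distance matrix with a zero diagonal into similarities and back recovers it exactly, while the inverse round trip only approximates it because of eps.

```python
import numpy as np


def sim2dist_sketch(sim_mat, transformation="inverse", eps=1e-4):
    """Hypothetical stand-in for sim2dist; both formulas are assumptions."""
    sim_mat = np.asarray(sim_mat, dtype=float)
    if transformation == "inverse":
        # Assumed: distance shrinks as similarity grows; eps avoids 1/0.
        return 1.0 / (sim_mat + eps)
    if transformation == "mirror":
        # Assumed: reflect similarities around the largest similarity.
        return sim_mat.max() - sim_mat
    raise ValueError(f"unknown transformation: {transformation!r}")


D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])

# Mirror round trip: D -> S = max(D) - D -> max(S) - S, which equals D
# whenever D is non-negative with a zero diagonal.
S_mirror = D.max() - D
D_back_mirror = sim2dist_sketch(S_mirror, "mirror")

# Inverse round trip using the documented defaults (1e-3 one way, 1e-4 back).
S_inv = 1.0 / (D + 1e-3)
D_back_inv = sim2dist_sketch(S_inv, "inverse")

print(np.array_equal(D_back_mirror, D))  # True
print(np.abs(D_back_inv - D).max())      # small but non-zero
```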
- evomap.preprocessing.coocc2sim(coocc_mat)
Transform a matrix with co-occurrence counts to a similarity matrix.
- Parameters
coocc_mat (ndarray of shape (n_samples, n_samples)) – Matrix of co-occurrence counts.
- Returns
Matrix of pairwise similarities.
- Return type
ndarray of shape (n_samples, n_samples)
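The docstring does not say how the counts are normalized. As an illustration only, the sketch below uses one common choice from co-occurrence analysis, the cosine (Salton) index s_ij = c_ij / sqrt(c_ii * c_jj), which assumes the diagonal holds each item's total occurrence count. Both that convention and the helper name `coocc2sim_sketch` are assumptions, not evomap's method.

```python
import numpy as np


def coocc2sim_sketch(coocc_mat):
    """Cosine (Salton) normalization of co-occurrence counts (assumed method)."""
    C = np.asarray(coocc_mat, dtype=float)
    occ = np.diag(C)  # assumed: diagonal = total occurrences of each item
    if np.any(occ <= 0):
        raise ValueError("every item needs a positive occurrence count on the diagonal")
    # Divide each count by the geometric mean of the two items' occurrences.
    return C / np.sqrt(np.outer(occ, occ))


# Item 0 occurs 4 times, item 1 occurs 9 times, and they co-occur 3 times.
C = np.array([[4, 3],
              [3, 9]])
S = coocc2sim_sketch(C)
print(S[0, 1])  # 3 / sqrt(4 * 9) = 0.5
```

Dividing by the geometric mean of the occurrence counts keeps frequent items from looking similar to everything merely because they appear often, and places 1 on the diagonal.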
- evomap.preprocessing.edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)
Transform an edgelist to a relationship matrix.
- Parameters
df (DataFrame) – Data containing the edgelist. Each row should describe one pair and must include two id variables and a score variable; it can also include a time variable.
score_var (string) – The score variable.
id_var_i (string) – The first id variable.
id_var_j (string) – The second id variable.
time_var (string, optional) – The time variable (int), by default None
time_selected (int, optional) – The selected time, by default None
- Returns
S (ndarray of shape (n_samples, n_samples)) – A matrix of pairwise relationships.
ids (ndarray of shape (n_samples, )) – Identifiers for each element of the matrix.
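To make the expected input concrete, here is a minimal pandas sketch of the documented behavior. It assumes that relationships are symmetric, that pairs missing from the edgelist get a score of 0, and that `ids` are sorted; the helper `edgelist2matrix_sketch` and the column names in the example are hypothetical.

```python
import numpy as np
import pandas as pd


def edgelist2matrix_sketch(df, score_var, id_var_i, id_var_j,
                           time_var=None, time_selected=None):
    """Hypothetical stand-in for edgelist2matrix (assumptions noted inline)."""
    if time_var is not None:
        # Keep only the edges observed in the selected period.
        df = df[df[time_var] == time_selected]
    # Every id that appears on either side of an edge gets a row/column.
    ids = np.unique(np.concatenate([df[id_var_i].to_numpy(),
                                    df[id_var_j].to_numpy()]))
    pos = {label: k for k, label in enumerate(ids)}
    S = np.zeros((len(ids), len(ids)))  # assumed: unobserved pairs score 0
    for a, b, score in zip(df[id_var_i], df[id_var_j], df[score_var]):
        S[pos[a], pos[b]] = score
        S[pos[b], pos[a]] = score  # assumed: relationships are symmetric
    return S, ids


edges = pd.DataFrame({
    "firm_a": ["A", "A", "B", "A"],
    "firm_b": ["B", "C", "C", "B"],
    "sim":    [0.8, 0.2, 0.5, 0.6],
    "year":   [2020, 2020, 2020, 2021],
})
S, ids = edgelist2matrix_sketch(edges, "sim", "firm_a", "firm_b",
                                time_var="year", time_selected=2020)
print(ids)  # ['A' 'B' 'C']
print(S)
```

Filtering by `time_var` before building the matrix means each period yields its own relationship matrix, which can then be fed into the sequence-oriented functions of this module.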
- evomap.preprocessing.normalize_dist_mats(D_ts)
Normalize a sequence of distance matrices by a common factor (the maximum distance within the sequence).
- Parameters
D_ts (list of ndarrays, each of shape (n_samples, n_samples)) – Sequence of distance matrices.
- Returns
D_ts – Sequence of distance matrices, normalized by the maximum distance within the input sequence.
- Return type
list of ndarrays, each of shape (n_samples, n_samples)
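The normalization described above can be sketched in a few lines of NumPy. Dividing every matrix by one shared maximum, rather than by each matrix's own maximum, keeps distances comparable across periods. The helper name `normalize_dist_mats_sketch` is hypothetical, and returning a list is an assumption.

```python
import numpy as np


def normalize_dist_mats_sketch(D_ts):
    """Divide each distance matrix by the largest distance in the whole sequence."""
    common_max = max(np.max(D) for D in D_ts)
    return [np.asarray(D, dtype=float) / common_max for D in D_ts]


D_ts = [np.array([[0.0, 2.0], [2.0, 0.0]]),
        np.array([[0.0, 8.0], [8.0, 0.0]])]
D_norm = normalize_dist_mats_sketch(D_ts)
print([float(D[0, 1]) for D in D_norm])  # [0.25, 1.0]
```

Because the first period's largest distance (2) is divided by the sequence-wide maximum (8), the growth in distance between the two periods survives normalization; scaling each matrix by its own maximum would have mapped both to 1.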
- evomap.preprocessing.expand_matrices(Xts, names_t)
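Expand a list of similarity matrices to equal shape and calculate inclusion vectors.
- Parameters
Xts (list of ndarrays) – Sequence of similarity matrices, possibly of different shapes.
names_t (list) – Labels for the rows and columns of each matrix in Xts.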
- Returns
A list of similarity matrices (all of equal size), a list of binary (0/1) inclusion vectors, and a list of all labels.
- Return type
(list, list, list)
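The documentation leaves the expansion rule implicit. A minimal sketch, assuming that the full label set is the sorted union of all labels, that rows and columns for objects absent in a period are filled with zeros, and that an inclusion vector marks present objects with 1. The helper name `expand_matrices_sketch` is hypothetical.

```python
import numpy as np


def expand_matrices_sketch(Xts, names_t):
    """Hypothetical stand-in for expand_matrices (assumptions noted inline)."""
    # Assumed: the common label set is the sorted union of all periods' labels.
    all_names = sorted(set().union(*names_t))
    pos = {name: k for k, name in enumerate(all_names)}
    n = len(all_names)
    X_expanded, inclusions = [], []
    for X, names in zip(Xts, names_t):
        idx = np.array([pos[name] for name in names])
        X_full = np.zeros((n, n))      # assumed: absent objects get zeros
        X_full[np.ix_(idx, idx)] = X   # place the observed block
        inc = np.zeros(n, dtype=int)
        inc[idx] = 1                   # 1 = object present in this period
        X_expanded.append(X_full)
        inclusions.append(inc)
    return X_expanded, inclusions, all_names


X1 = np.array([[1.0, 0.4], [0.4, 1.0]])  # period 1: objects A, B
X2 = np.array([[1.0, 0.7], [0.7, 1.0]])  # period 2: objects B, C
X_exp, inc, labels = expand_matrices_sketch([X1, X2], [["A", "B"], ["B", "C"]])
print(labels)                     # ['A', 'B', 'C']
print([v.tolist() for v in inc])  # [[1, 1, 0], [0, 1, 1]]
```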
- evomap.preprocessing.calc_distances(X, metric='euclidean')
Calculate a matrix of pairwise distances among the rows of an input matrix.
- Parameters
X (ndarray of shape (n_samples, n_dims)) – Input matrix.
metric (string, optional) – The distance metric to use, by default ‘euclidean’. Can be any of ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
- Returns
Matrix of pairwise distances.
- Return type
ndarray of shape (n_samples, n_samples)
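The metric names listed above correspond to metric names of SciPy's `scipy.spatial.distance.pdist`. Assuming calc_distances behaves like that function (the documentation does not confirm this), a plausible equivalent is `pdist` followed by `squareform`. The helper name `calc_distances_sketch` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def calc_distances_sketch(X, metric="euclidean"):
    """Pairwise distances among the rows of X, returned as a square matrix."""
    # pdist returns the condensed upper triangle; squareform expands it to
    # a symmetric (n_samples, n_samples) matrix with a zero diagonal.
    return squareform(pdist(np.asarray(X, dtype=float), metric=metric))


X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D_euc = calc_distances_sketch(X)                # off-diagonal: 5, 10, 5
D_city = calc_distances_sketch(X, "cityblock")  # off-diagonal: 7, 14, 7
```

The condensed form stores each pair once, which is why it needs `squareform` to match the documented (n_samples, n_samples) return shape.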