evomap.preprocessing
Useful transformations for data pre-processing.
Module Contents
Functions
diss2sim | Transform a dissimilarity matrix to a similarity matrix.
sim2diss | Transform a similarity matrix to a dissimilarity matrix.
coocc2sim | Transform a matrix with co-occurrence counts to a similarity matrix.
edgelist2matrix | Transform an edgelist to a relationship matrix.
edgelist2matrices | Transform a time-indexed edgelist to a sequence of relationship matrices.
normalize_diss_mats | Normalize a sequence of dissimilarity matrices by a common factor.
expand_matrices | Expand list of similarity matrices to equal shape and calculate inclusion vectors.
calc_distances | Calculate matrix of pairwise distances among the rows of an input matrix.
- evomap.preprocessing.diss2sim(diss_mat, transformation='inverse', eps=0.001)
Transform a dissimilarity matrix to a similarity matrix.
- Parameters
diss_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise dissimilarities.
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-3
- Returns
Matrix of pairwise similarities.
- Return type
ndarray of shape (n_samples, n_samples)
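As a rough illustration of the two transformation options, the sketch below assumes that 'inverse' means 1 / (d + eps) and that 'mirror' means 1 - d. The documentation above does not spell out either formula, so both are assumptions, and `diss2sim_sketch` is a hypothetical stand-in rather than the evomap implementation.

```python
import numpy as np

def diss2sim_sketch(diss_mat, transformation="inverse", eps=1e-3):
    """Hypothetical stand-in for diss2sim; both formulas are assumptions."""
    diss_mat = np.asarray(diss_mat, dtype=float)
    if transformation == "inverse":
        # eps keeps the zero diagonal from causing a division by zero
        return 1.0 / (diss_mat + eps)
    if transformation == "mirror":
        # assumes the dissimilarities are already scaled to [0, 1]
        return 1.0 - diss_mat
    raise ValueError("transformation must be 'inverse' or 'mirror'")

D = np.array([[0.0, 0.2, 0.8],
              [0.2, 0.0, 0.5],
              [0.8, 0.5, 0.0]])
S_inv = diss2sim_sketch(D)
S_mir = diss2sim_sketch(D, transformation="mirror")
```

Under either assumed formula, the pair with the smaller dissimilarity ends up with the larger similarity.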
- evomap.preprocessing.sim2diss(sim_mat, transformation='inverse', eps=0.0001)
Transform a similarity matrix to a dissimilarity matrix.
- Parameters
sim_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise similarities
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-4
- Returns
Matrix of pairwise dissimilarities.
- Return type
ndarray of shape (n_samples, n_samples)
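To see why `eps` matters, consider a pair with zero similarity. The sketch below assumes the 'inverse' option computes 1 / (s + eps), which the documentation above does not confirm; `inverse_diss` is a hypothetical helper, not part of evomap.

```python
import numpy as np

def inverse_diss(sim_mat, eps=1e-4):
    # Hypothetical helper: assumed form of the 'inverse' transformation
    return 1.0 / (np.asarray(sim_mat, dtype=float) + eps)

S = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # the off-diagonal pair has zero similarity

D = inverse_diss(S)          # finite everywhere thanks to eps
# A smaller eps maps zero similarity to a larger dissimilarity
D_small_eps = inverse_diss(S, eps=1e-6)
```

Under this assumption, the choice of `eps` effectively sets the dissimilarity assigned to completely unrelated pairs.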
- evomap.preprocessing.coocc2sim(coocc_mat)
Transform a matrix with co-occurrence counts to a similarity matrix.
- Parameters
coocc_mat (ndarray of shape (n_samples, n_samples)) – Matrix of co-occurrence counts.
- Returns
Matrix of pairwise similarities.
- Return type
ndarray of shape (n_samples, n_samples)
- evomap.preprocessing.edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)
Transform an edgelist to a relationship matrix.
- Parameters
df (DataFrame) – Data containing the edgelist. Each row should include a pair. Needs to include two id variables and a score variable. Can also include a time variable.
score_var (string) – The score variable.
id_var_i (string) – The first id variable.
id_var_j (string) – The second id variable.
time_var (string, optional) – The time variable (int), by default None
time_selected (int, optional) – The selected time, by default None
- Returns
S (ndarray of shape (n_samples, n_samples)) – A matrix of pairwise relationships.
ids (ndarray of shape (n_samples, )) – Identifiers for each element of the matrix.
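The conversion from an edgelist to a matrix can be sketched as follows. To stay dependency-light, this sketch takes plain `(id_i, id_j, score, time)` tuples instead of a DataFrame with named columns, and it assumes relationships are undirected and that unlisted pairs get a score of zero. Neither assumption is stated in the documentation above, and `edgelist_to_matrix` is a hypothetical name.

```python
import numpy as np

def edgelist_to_matrix(edges, time_selected=None):
    """Hypothetical sketch: edges are (id_i, id_j, score, time) tuples."""
    if time_selected is not None:
        # keep only the pairs observed in the selected period
        edges = [e for e in edges if e[3] == time_selected]
    id_list = sorted({e[0] for e in edges} | {e[1] for e in edges})
    index = {name: k for k, name in enumerate(id_list)}
    S = np.zeros((len(id_list), len(id_list)))
    for id_i, id_j, score, _ in edges:
        # assumption: undirected relationships, so fill both triangles
        S[index[id_i], index[id_j]] = score
        S[index[id_j], index[id_i]] = score
    return S, np.array(id_list)

edges = [("a", "b", 3.0, 1), ("a", "c", 1.0, 1), ("b", "c", 2.0, 2)]
S, ids = edgelist_to_matrix(edges, time_selected=1)
```

Row and column `k` of `S` correspond to `ids[k]`, which mirrors the role of the `ids` return value described above.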
- evomap.preprocessing.edgelist2matrices(df, score_var, id_var_i, id_var_j, time_var)
Transform a time-indexed edgelist to a sequence of relationship matrices.
- Parameters
df (DataFrame) – Data containing the edgelist. Each row should include a pair. Needs to include two id variables, a score variable, and a time variable.
score_var (string) – The score variable.
id_var_i (string) – The first id variable.
id_var_j (string) – The second id variable.
time_var (string) – The time variable (int)
- Returns
S_t (list of ndarrays of shape (n_samples, n_samples) with length (n_periods)) – A sequence of relationship matrices.
ids_t (ndarray of shape (n_samples, )) – Identifiers for each element of the matrix.
- evomap.preprocessing.normalize_diss_mats(D_ts)
Normalize a sequence of dissimilarity matrices by a common factor (the maximum dissimilarity within the sequence).
- Parameters
D_ts (list of ndarrays, each of shape (n_samples, n_samples)) – Sequence of dissimilarity matrices.
- Returns
D_ts – Sequence of dissimilarity matrices, normalized by the maximum dissimilarity within the input sequence.
- Return type
list of ndarrays, each of shape (n_samples, n_samples)
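The normalization described above can be sketched in a few lines: find the largest entry across all matrices in the sequence and divide every matrix by it. `normalize_by_common_max` is a hypothetical helper written for illustration, not the evomap source.

```python
import numpy as np

def normalize_by_common_max(D_ts):
    # Hypothetical helper: one shared factor, the largest entry across all periods
    common_max = max(D.max() for D in D_ts)
    return [D / common_max for D in D_ts]

D_ts = [np.array([[0.0, 2.0], [2.0, 0.0]]),
        np.array([[0.0, 4.0], [4.0, 0.0]])]
D_norm = normalize_by_common_max(D_ts)
```

Because a single factor is shared by all periods, relative magnitudes between periods are preserved, whereas normalizing each matrix by its own maximum would erase them.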
- evomap.preprocessing.expand_matrices(Xts, names_t)
Expand list of similarity matrices to equal shape and calculate inclusion vectors.
Args:
- Returns
List of similarity matrices (equal size), list of inclusion vectors (0/1), and list of all labels.
- Return type
(list, list, list)
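Since the parameters are not documented above, the following is only a sketch of what expanding to a common shape could look like. It assumes `names_t` holds one list of labels per period, that the common label set is the sorted union of those lists, and that missing rows and columns are padded with zeros. `expand_to_common_shape` is a hypothetical name.

```python
import numpy as np

def expand_to_common_shape(X_ts, names_t):
    """Hypothetical sketch; zero padding and label order are assumptions."""
    all_names = sorted(set().union(*names_t))
    pos = {name: k for k, name in enumerate(all_names)}
    n = len(all_names)
    expanded, inclusions = [], []
    for X, names in zip(X_ts, names_t):
        idx = [pos[name] for name in names]
        X_full = np.zeros((n, n))
        X_full[np.ix_(idx, idx)] = X   # copy the observed block into place
        inc = np.zeros(n, dtype=int)
        inc[idx] = 1                   # 1 = object is present in this period
        expanded.append(X_full)
        inclusions.append(inc)
    return expanded, inclusions, all_names

X_ts = [np.array([[1.0, 0.5], [0.5, 1.0]]),
        np.array([[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]])]
names_t = [["a", "c"], ["a", "b", "c"]]
X_exp, inc_t, labels = expand_to_common_shape(X_ts, names_t)
```

Here the first period lacks object "b", so its inclusion vector marks that position with 0 and the corresponding row and column of the expanded matrix stay empty.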
- evomap.preprocessing.calc_distances(X, metric='euclidean')
Calculate matrix of pairwise distances among the rows of an input matrix.
- Parameters
X (ndarray of shape (n_samples, n_dims)) – Input matrix.
metric (string, optional) – The distance metric to use, by default ‘euclidean’. Can be any of ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
- Returns
Matrix of pairwise distances.
- Return type
ndarray of shape (n_samples, n_samples)
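For the default 'euclidean' metric, the pairwise distance matrix can be sketched with NumPy broadcasting. `pairwise_euclidean` is a hypothetical helper that covers only this one metric. The metric names listed above resemble those accepted by SciPy's distance routines, but whether evomap delegates to SciPy is an assumption not confirmed here.

```python
import numpy as np

def pairwise_euclidean(X):
    # Hypothetical helper covering only the default 'euclidean' metric
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]   # shape (n_samples, n_samples, n_dims)
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = pairwise_euclidean(X)
```

The result is symmetric with a zero diagonal, matching the (n_samples, n_samples) shape given for the return value.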