evomap.preprocessing

Module for data pre-processing, including transformations between different data formats.

Functions

`diss2sim`(diss_mat[, transformation, eps])	Transform a dissimilarity matrix to a similarity matrix.
`sim2diss`(sim_mat[, transformation, eps])	Transform a similarity matrix to a dissimilarity matrix.
`coocc2sim`(coocc_mat)	Transform a matrix with co-occurrence counts to a similarity matrix.
`edgelist2matrix`(df, score_var, id_var_i, id_var_j[, ...])	Transform an edgelist to a relationship matrix.
`edgelist2matrices`(df, score_var, id_var_i, id_var_j, ...)	Transform a time-indexed edgelist into a sequence of relationship matrices.
`normalize_diss_mat`(D)	Normalize a dissimilarity matrix by the maximum dissimilarity observed in the matrix.
`normalize_diss_mats`(D_ts)	Normalize a sequence of dissimilarity matrices by a common factor (the maximum dissimilarity within the sequence).
`expand_matrices`(X_ts, labels_ts)	Expand a list of similarity matrices (X_ts) to equal shape and calculate inclusion vectors.
`calc_distances`(X[, metric])	Calculate matrix of pairwise distances among the rows of an input matrix.

Module Contents

evomap.preprocessing.diss2sim(diss_mat, transformation='inverse', eps=0.001)

Transform a dissimilarity matrix to a similarity matrix.

Parameters:

diss_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise dissimilarities.
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-3

Returns:

Matrix of pairwise similarities.

Return type:

ndarray of shape (n_samples, n_samples)
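
The role of eps is easiest to see on a matrix with a zero diagonal. The sketch below is not the library's code: it assumes that ‘inverse’ means 1 / (d + eps), which the parameter description suggests but does not state. It uses plain nested lists instead of ndarrays to stay dependency-free, and the helper name inverse_similarity is hypothetical.

```python
def inverse_similarity(diss_mat, eps=1e-3):
    """Reciprocal transform with eps added to every entry (assumed form)."""
    return [[1.0 / (d + eps) for d in row] for row in diss_mat]


# A dissimilarity matrix has a zero diagonal, so a plain reciprocal
# would divide by zero there.
diss = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
]

sim = inverse_similarity(diss)

# Larger dissimilarities map to smaller similarities;
# the diagonal becomes 1 / eps.
for row in sim:
    print([round(s, 3) for s in row])
```

Under this assumption, a smaller eps inflates the diagonal and any other near-zero dissimilarity, so the choice of eps affects the scale of the largest similarities.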

evomap.preprocessing.sim2diss(sim_mat, transformation='inverse', eps=0.0001)

Transform a similarity matrix to a dissimilarity matrix.

Parameters:

sim_mat (ndarray of shape (n_samples, n_samples)) – Matrix of pairwise similarities.
transformation (str, optional) – Transformation function, either ‘inverse’ or ‘mirror’, by default ‘inverse’. ‘inverse’ - Transforms by taking the reciprocal of the similarity scores. ‘mirror’ - Transforms by reflecting the similarity scores about 0.5 (1 - similarity).
eps (float, optional) – Incremental constant to avoid division by zero, by default 1e-4

Returns:

Matrix of pairwise dissimilarities.

Return type:

ndarray of shape (n_samples, n_samples)
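
To compare the two options side by side, here is a minimal sketch on nested lists. ‘mirror’ follows the documented formula (1 - similarity). For ‘inverse’, adding eps to the score before taking the reciprocal is an assumption. The helper name sim_to_diss_sketch is hypothetical and is not part of evomap.

```python
def sim_to_diss_sketch(sim_mat, transformation="inverse", eps=1e-4):
    """Illustrative restatement of the two documented options."""
    if transformation == "inverse":
        # Assumed form: eps is added before taking the reciprocal.
        return [[1.0 / (s + eps) for s in row] for row in sim_mat]
    if transformation == "mirror":
        # As documented: 1 - similarity.
        return [[1.0 - s for s in row] for row in sim_mat]
    raise ValueError(f"Unknown transformation: {transformation!r}")


sim = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

mirrored = sim_to_diss_sketch(sim, "mirror")
inverted = sim_to_diss_sketch(sim, "inverse")

print(mirrored[0])  # stays within [0, 1] for inputs in [0, 1]
print(inverted[0])  # a similarity of 0 maps to 1 / eps
```

In this sketch ‘mirror’ only yields non-negative dissimilarities when the similarities are scaled to [0, 1], while ‘inverse’ accepts any non-negative scale but produces very large values for similarities near zero.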

evomap.preprocessing.coocc2sim(coocc_mat)

Transform a matrix with co-occurrence counts to a similarity matrix.

Parameters:

coocc_mat (ndarray of shape (n_samples, n_samples)) – Matrix of co-occurrence counts. Assumes a non-negative, symmetric matrix where the diagonal can be ignored (usually representing self-co-occurrence).

Returns:

Matrix of pairwise similarities, normalized such that each element is a proportion of the maximum co-occurrence for that row.

Return type:

ndarray of shape (n_samples, n_samples)

Notes

The function sets the diagonal to zero to prevent self-similarity from skewing the results. If a row’s total co-occurrence is zero, it sets the entire row’s similarity to zero to avoid division by zero.
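
The behaviour described here can be sketched as follows. This is an illustration built only from the text above, not the library's implementation: it divides each row by that row's largest off-diagonal count, as the Returns description states, and leaves all-zero rows at zero. The helper name cooccurrence_to_similarity_sketch is hypothetical.

```python
def cooccurrence_to_similarity_sketch(coocc_mat):
    """Row-wise normalization following the description above."""
    n = len(coocc_mat)
    sim = []
    for i in range(n):
        # Ignore self-co-occurrence by treating the diagonal as zero.
        row = [0.0 if i == j else float(coocc_mat[i][j]) for j in range(n)]
        row_max = max(row)
        # A row without any co-occurrence stays all zero
        # instead of triggering a division by zero.
        sim.append([v / row_max if row_max > 0 else 0.0 for v in row])
    return sim


counts = [
    [9, 4, 2, 0],
    [4, 7, 1, 0],
    [2, 1, 5, 0],
    [0, 0, 0, 3],  # co-occurs with nothing but itself
]

sim = cooccurrence_to_similarity_sketch(counts)
for row in sim:
    print(row)
```

Under this reading, a row-wise normalization does not in general preserve symmetry: the first and third objects get a similarity of 0.5 in one direction and 1.0 in the other.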

evomap.preprocessing.edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)

Transform an edgelist to a relationship matrix.

Parameters:

df (DataFrame) – Data containing the edgelist. Each row should include a pair. Must contain two id variables and a score variable. Can optionally include a time variable.
score_var (string) – The score variable.
id_var_i (string) – The first id variable.
id_var_j (string) – The second id variable.
time_var (string, optional) – The time variable, by default None.
time_selected (int, optional) – The selected time, by default None.

Returns:

S (ndarray of shape (n_samples, n_samples)) – A matrix of pairwise relationships.
ids (ndarray) – Identifiers for each element of the matrix.

Raises:

ValueError – If required columns are missing in the DataFrame.
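
The idea of turning an edgelist into a square matrix can be sketched without pandas. The example below works on a list of dict records rather than a DataFrame. It makes three assumptions that the documentation does not confirm: identifiers are ordered by sorting, each score is written to both (i, j) and (j, i), and unlisted pairs are set to zero. The column names firm_i, firm_j, score and year and the helper edgelist_to_matrix_sketch are hypothetical.

```python
def edgelist_to_matrix_sketch(rows, score_var, id_var_i, id_var_j,
                              time_var=None, time_selected=None):
    """Pivot edgelist records into a square matrix (symmetric fill assumed)."""
    if time_var is not None:
        # Keep only the records of the selected period.
        rows = [r for r in rows if r[time_var] == time_selected]
    ids = sorted({r[id_var_i] for r in rows} | {r[id_var_j] for r in rows})
    index = {name: k for k, name in enumerate(ids)}
    S = [[0.0] * len(ids) for _ in ids]
    for r in rows:
        i, j = index[r[id_var_i]], index[r[id_var_j]]
        S[i][j] = S[j][i] = float(r[score_var])
    return S, ids


edges = [
    {"firm_i": "A", "firm_j": "B", "score": 0.9, "year": 2020},
    {"firm_i": "A", "firm_j": "C", "score": 0.2, "year": 2020},
    {"firm_i": "B", "firm_j": "C", "score": 0.4, "year": 2021},
]

S, ids = edgelist_to_matrix_sketch(
    edges, "score", "firm_i", "firm_j", time_var="year", time_selected=2020
)
print(ids)
for row in S:
    print(row)
```

Row and column k of S belong to ids[k], which is why the identifiers are returned alongside the matrix.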

evomap.preprocessing.edgelist2matrices(df, score_var, id_var_i, id_var_j, time_var)

Transform a time-indexed edgelist into a sequence of relationship matrices.

Parameters:

df (DataFrame) – Data containing the edgelist. Each row should include a pair and must contain two id variables, a score variable, and a time variable.
score_var (string) – The score variable used to assign values in the matrix.
id_var_i (string) – The first id variable corresponding to rows in the matrix.
id_var_j (string) – The second id variable corresponding to columns in the matrix.
time_var (string) – The time variable used to split the data into different matrices.

Returns:

S_t (list of ndarray) – A list of relationship matrices, each corresponding to a different time period.
ids_t (ndarray) – Array of identifiers for each element of the matrices.

Raises:

ValueError – If the DataFrame is missing any required columns or if there are no valid entries for any time period.

evomap.preprocessing.normalize_diss_mat(D)

Normalize a dissimilarity matrix by the maximum dissimilarity observed in the matrix.

Parameters:

D (ndarray of shape (n_samples, n_samples)) – A dissimilarity matrix.

Returns:

Normalized dissimilarity matrix.

Return type:

ndarray of shape (n_samples, n_samples)

Raises:

ValueError – If the input matrix is not square or if the maximum dissimilarity is zero.

evomap.preprocessing.normalize_diss_mats(D_ts)

Normalize a sequence of dissimilarity matrices by a common factor (the maximum dissimilarity within the sequence).

Parameters:

D_ts (list of ndarray, each of shape (n_samples, n_samples)) – Sequence of dissimilarity matrices.

Returns:

Sequence of dissimilarity matrices, normalized by the maximum dissimilarity within the input sequence.

Return type:

list of ndarray

Raises:

ValueError – If any matrix is not square or if the list is empty.
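
A short sketch of why a common factor matters: dividing every matrix by the same sequence-wide maximum keeps the periods on a comparable scale. The code uses nested lists and is an illustration, not the library's implementation. The helper name normalize_by_common_max is hypothetical, and the guard against a zero maximum is added for the sketch.

```python
def normalize_by_common_max(D_ts):
    """Divide every matrix by the largest entry found in the whole sequence."""
    if not D_ts:
        raise ValueError("Sequence of matrices is empty.")
    common_max = max(max(max(row) for row in D) for D in D_ts)
    if common_max == 0:
        # Guard added for this sketch to avoid dividing by zero.
        raise ValueError("Maximum dissimilarity is zero.")
    return [[[d / common_max for d in row] for row in D] for D in D_ts]


# Two periods: the same pair of objects drifts apart over time.
D_1 = [[0.0, 2.0], [2.0, 0.0]]
D_2 = [[0.0, 4.0], [4.0, 0.0]]

scaled = normalize_by_common_max([D_1, D_2])
print(scaled[0][0][1], scaled[1][0][1])
```

Had each matrix been divided by its own maximum instead, both off-diagonal entries would equal 1.0 and the doubling of the dissimilarity between the two periods would no longer be visible.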

evomap.preprocessing.expand_matrices(X_ts, labels_ts)

Expand a list of similarity matrices (X_ts) to equal shape and calculate inclusion vectors.

Parameters:

X_ts (list of ndarray) – List of similarity matrices for each time point.
labels_ts (list of list) – List of labels corresponding to each matrix in X_ts.

Returns:

Contains a list of expanded similarity matrices, inclusion vectors, and all labels.

Return type:

tuple
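
One plausible reading of "expanding to equal shape" is sketched below on nested lists. It assumes that the expanded matrices cover the union of all labels, that rows and columns of objects absent in a period are filled with zeros, and that an inclusion vector holds 1 for present and 0 for absent objects. None of these details is confirmed by the documentation, and the helper name expand_matrices_sketch is hypothetical.

```python
def expand_matrices_sketch(X_ts, labels_ts):
    """Pad each matrix to the union of labels and record who is present."""
    all_labels = sorted({label for labels in labels_ts for label in labels})
    position = {label: k for k, label in enumerate(all_labels)}
    n = len(all_labels)

    expanded, inclusions = [], []
    for X, labels in zip(X_ts, labels_ts):
        big = [[0.0] * n for _ in range(n)]  # absent objects stay at zero
        inc = [0] * n
        for a, label_a in enumerate(labels):
            inc[position[label_a]] = 1
            for b, label_b in enumerate(labels):
                big[position[label_a]][position[label_b]] = X[a][b]
        expanded.append(big)
        inclusions.append(inc)
    return expanded, inclusions, all_labels


X_1 = [[1.0, 0.3],
       [0.3, 1.0]]                 # objects A and B
X_2 = [[1.0, 0.6, 0.1],
       [0.6, 1.0, 0.4],
       [0.1, 0.4, 1.0]]            # objects A, B and C

expanded, inclusions, labels = expand_matrices_sketch(
    [X_1, X_2], [["A", "B"], ["A", "B", "C"]]
)
print(labels)
print(inclusions)
```

With this layout, every matrix in the sequence has the same shape, and the inclusion vectors tell downstream code which rows carry real data in each period.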

evomap.preprocessing.calc_distances(X, metric='euclidean')

Calculate matrix of pairwise distances among the rows of an input matrix.

Parameters:

X (ndarray of shape (n_samples, n_dims)) – Input matrix containing samples for which pairwise distances will be calculated.
metric (str, optional) – The distance metric to use. Can be any of those supported by scipy.spatial.distance.pdist, such as ‘euclidean’, ‘cityblock’, ‘cosine’, etc. Defaults to ‘euclidean’.

Returns:

A matrix of pairwise distances, where each element (i, j) is the distance between the i-th and j-th rows of the input matrix X according to the specified metric.

Return type:

ndarray of shape (n_samples, n_samples)

Raises:

ValueError – If the metric specified is not supported by scipy.spatial.distance.pdist.
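
To make the output layout concrete, the following sketch computes the Euclidean case with the standard library only. It is an illustration of the returned square matrix, not the library's implementation, and the helper name pairwise_euclidean is hypothetical. With SciPy, the same layout results from passing the condensed output of scipy.spatial.distance.pdist to scipy.spatial.distance.squareform.

```python
import math


def pairwise_euclidean(X):
    """Full (n_samples, n_samples) matrix of Euclidean distances between rows."""
    return [[math.dist(a, b) for b in X] for a in X]


# Three samples with two dimensions each.
X = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]

D = pairwise_euclidean(X)
for row in D:
    print(row)
```

The result is symmetric with a zero diagonal, which is the form the dissimilarity-based functions in this module expect.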