evomap.mapping._tsne

t-Distributed Stochastic Neighbor Embedding (t-SNE), as proposed in:

Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).

Attributes

EPSILON

Classes

TSNE

Functions

_calc_p_matrix(X, included, input_type, perplexity)

Calculate the probability matrix (P-matrix) for t-SNE.

_check_prepare_tsne(model, X)

Check and prepare the input data for t-SNE.

sqeuclidean_dist(Y)

Calculate the pairwise squared Euclidean distance matrix.

calc_q_matrix(Y, inclusions)

Calculate the Q-matrix of joint probabilities in low-dimensional space.

_kl_divergence(Y, P[, compute_error, compute_grad])

Calculate the KL-divergence between high-dimensional and low-dimensional joint probabilities.

_kl_divergence_grad(Y, P, Q, dist)

Calculate the gradient of the KL-divergence with respect to the low-dimensional coordinates.

cond_to_joint(P)

Convert a conditional probability matrix to a symmetric joint probability matrix.

Module Contents

evomap.mapping._tsne.EPSILON = 1e-12
class evomap.mapping._tsne.TSNE(n_dims=2, perplexity=15, n_iter=2000, stop_lying_iter=250, early_exaggeration=4, initial_momentum=0.5, final_momentum=0.8, eta='auto', n_iter_check=50, init=None, verbose=0, input_type='distance', max_halves=5, tol=0.001, n_inits=1, step_size=1)
n_dims = 2
perplexity = 15
n_iter = 2000
stop_lying_iter = 250
early_exaggeration = 4
initial_momentum = 0.5
final_momentum = 0.8
eta = 'auto'
n_iter_check = 50
init = None
verbose = 0
input_type = 'distance'
max_halves = 5
tol = 0.001
n_inits = 1
step_size = 1
method_str = 'TSNE'
__str__()

Return a string representation of the TSNE instance with key parameters and user-modified values.

fit(X)

Fit the TSNE model to the input data, without returning the transformed coordinates.

Parameters:

X (np.ndarray of shape (n_samples, n_features) or (n_samples, n_samples)) – The input data. If input_type is ‘vector’, X should be the feature vectors of the samples. If input_type is ‘distance’, X should be the pairwise distance matrix.

Returns:

self – Returns the instance of the TSNE class with the configuration matrix Y_ stored as an attribute.

Return type:

TSNE

fit_transform(X)

Fit the TSNE model and return the transformed coordinates.

Parameters:

X (np.ndarray of shape (n_samples, n_features) or (n_samples, n_samples)) – The input data. If input_type is ‘vector’, X should be the feature vectors of the samples. If input_type is ‘distance’, X should be the pairwise distance matrix.

Returns:

The transformed coordinates in the reduced-dimensional space.

Return type:

np.array of shape (n_samples, n_dims)

Raises:

ValueError – If the input_type is not ‘distance’ or ‘vector’.

evomap.mapping._tsne._calc_p_matrix(X, included, input_type, perplexity)

Calculate the probability matrix (P-matrix) for t-SNE.

The P-matrix is a joint probability distribution over pairwise similarities. Depending on input_type, this function computes it from feature vectors, a distance matrix, or a similarity matrix. Samples marked as excluded in included are left out of the calculation.

Parameters:
  • X (ndarray of shape (n_samples, n_features) or (n_samples, n_samples)) – The input data, which can either be feature vectors (if input_type is ‘vector’), a distance matrix (if input_type is ‘distance’), or a similarity matrix (if input_type is ‘similarity’).

  • included (ndarray of shape (n_samples,), optional) – A binary array (0/1) indicating whether each sample should be included in the P-matrix calculation. If None, all samples are included by default.

  • input_type (str) – Specifies the type of input. Should be one of {‘vector’, ‘distance’, ‘similarity’}.

  • perplexity (float) – The desired perplexity, used for tuning the distribution of the P-matrix. Perplexity determines the effective number of neighbors for each point.

Returns:

P – The joint probability distribution matrix over pairwise similarities.

Return type:

ndarray of shape (n_samples, n_samples)

Raises:
  • AssertionError – If any rows of the input matrix contain only zeros, which would indicate invalid data for the calculation.

  • ValueError – If the input type is not recognized or if the P-matrix contains invalid values.
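For distance input, the standard t-SNE recipe of van der Maaten & Hinton (2008) calibrates a Gaussian bandwidth for each point by binary search, so that each conditional distribution p_{·|i} reaches the requested perplexity, and then symmetrizes the result. The NumPy sketch below illustrates that recipe only; it is not evomap's implementation. The helper names conditional_p_sketch and joint_p_sketch, the tolerance and iteration limit, and the choice to feed the entries of D directly into exp(-beta * d) (rather than squaring them first) are assumptions, and the included mask is not handled.

```python
import numpy as np


def conditional_p_sketch(D, perplexity, tol=1e-5, max_iter=100):
    """Conditional probabilities p_{j|i} whose row-wise perplexity matches the target.

    Assumption: D is an (n, n) dissimilarity matrix with zeros on the diagonal,
    and its entries are used directly in the kernel exp(-beta * d).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    target = np.log(perplexity)               # target entropy in nats: H = log(Perp)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        d = D[i, others]                      # dissimilarities from point i to all others
        beta, lo, hi = 1.0, 0.0, np.inf       # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta) # shifting by the minimum avoids underflow
            p = w / w.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:              # distribution too flat: increase beta
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                             # distribution too peaked: decrease beta
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, others] = p
    return P


def joint_p_sketch(D, perplexity):
    """Symmetric joint P-matrix: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    P_cond = conditional_p_sketch(D, perplexity)
    return (P_cond + P_cond.T) / (2.0 * P_cond.shape[0])


# Usage on a small random point cloud with squared Euclidean distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
P = joint_p_sketch(D, perplexity=5.0)
```

Searching over beta rather than sigma keeps the update monotone: a larger beta always sharpens the distribution and lowers its entropy.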

evomap.mapping._tsne._check_prepare_tsne(model, X)

Check and prepare the input data for t-SNE.

This function validates the input data, determines the learning rate (eta), and computes the P-matrix from X.

Parameters:
  • model (TSNE or EvoMap) – The t-SNE or EvoMap model instance. The function will check and set the learning rate (eta) and other parameters from this model.

  • X (ndarray of shape (n_samples, n_features) or (n_samples, n_samples)) – The input data, which can be feature vectors or a distance matrix.

Returns:

P – The prepared P-matrix representing pairwise similarities.

Return type:

ndarray of shape (n_samples, n_samples)

Raises:

ValueError – If the learning rate (eta) is invalid.
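The learning-rate part of this check might look like the sketch below. The helper resolve_eta_sketch and its arguments are hypothetical. The 'auto' rule shown is the heuristic used by scikit-learn's TSNE (n_samples / early_exaggeration / 4, floored at 50); whether evomap's eta='auto' follows the same rule is an assumption.

```python
def resolve_eta_sketch(eta, n_samples, early_exaggeration):
    """Validate a learning rate, resolving 'auto' with scikit-learn's heuristic.

    Hypothetical helper: evomap's own 'auto' rule may differ.
    """
    if isinstance(eta, str):
        if eta == "auto":
            # scikit-learn's rule: n_samples / early_exaggeration / 4, but at least 50
            return max(n_samples / early_exaggeration / 4.0, 50.0)
        raise ValueError(f"Unknown learning rate option: {eta!r}")
    if isinstance(eta, (int, float)) and not isinstance(eta, bool) and eta > 0:
        return float(eta)
    raise ValueError(f"Learning rate must be 'auto' or a positive number, got {eta!r}")


print(resolve_eta_sketch("auto", n_samples=4000, early_exaggeration=4))  # 250.0
```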

evomap.mapping._tsne.sqeuclidean_dist(Y)

Calculate the pairwise squared Euclidean distance matrix.

Parameters:

Y (np.ndarray of shape (n_samples, n_dims)) – The coordinates of the points in the low-dimensional space.

Returns:

D – The squared Euclidean distance matrix.

Return type:

np.ndarray of shape (n_samples, n_samples)
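A common vectorized way to build this matrix expands ||y_i - y_j||^2 = ||y_i||^2 + ||y_j||^2 - 2 y_i·y_j, which needs one matrix product instead of an explicit double loop. The function below is a hypothetical stand-in (sqeuclidean_dist_sketch), not evomap's code; clipping small negative values caused by floating-point cancellation is a design choice of this sketch.

```python
import numpy as np


def sqeuclidean_dist_sketch(Y):
    """Pairwise squared Euclidean distances between the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = np.sum(Y ** 2, axis=1)
    # ||y_i - y_j||^2 = ||y_i||^2 + ||y_j||^2 - 2 <y_i, y_j>
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    np.maximum(D, 0.0, out=D)   # rounding can leave tiny negative values
    np.fill_diagonal(D, 0.0)    # distance of each point to itself is exactly 0
    return D


Y = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
D = sqeuclidean_dist_sketch(Y)
# D[0, 1] == 25.0, D[0, 2] == 1.0, D[1, 2] == 20.0
```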

evomap.mapping._tsne.calc_q_matrix(Y, inclusions)

Calculate the Q-matrix of joint probabilities in low-dimensional space.

The Q-matrix holds the joint probabilities in the low-dimensional space. Following t-SNE, they are obtained by applying a Student-t kernel with one degree of freedom to the pairwise squared Euclidean distances between points and normalizing over all pairs. A small constant is added to avoid division by zero. Points can also be excluded from the calculation via inclusions.

Parameters:
  • Y (np.ndarray of shape (n_samples, n_dims)) – Array of map coordinates in the low-dimensional space.

  • inclusions (np.ndarray of shape (n_samples,), optional) – A binary array where 1 indicates the point is included and 0 indicates the point is excluded from the probability matrix. If None, all points are included by default.

Returns:

  • Q (np.ndarray of shape (n_samples, n_samples)) – The joint probability matrix in the low-dimensional space.

  • dist (np.ndarray of shape (n_samples, n_samples)) – The squared Euclidean distance matrix used to compute Q.
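Assuming the standard t-SNE definition q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k≠l} (1 + ||y_k - y_l||^2)^-1, the computation can be sketched as follows. The documentation does not say how evomap applies inclusions or where exactly it adds the small constant, so zeroing every pair that touches an excluded point and adding EPSILON to the normalizer are assumptions; calc_q_matrix_sketch is a hypothetical name.

```python
import numpy as np

EPSILON = 1e-12  # same value as evomap.mapping._tsne.EPSILON


def calc_q_matrix_sketch(Y, inclusions=None):
    """Student-t joint probabilities q_ij and squared distances for map coordinates Y."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = np.sum(Y ** 2, axis=1)
    dist = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T), 0.0)
    np.fill_diagonal(dist, 0.0)
    kernel = 1.0 / (1.0 + dist)          # (1 + ||y_i - y_j||^2)^-1, heavy-tailed
    np.fill_diagonal(kernel, 0.0)        # q_ii is defined as 0
    if inclusions is not None:
        # Assumption: drop every pair that involves an excluded point
        mask = np.asarray(inclusions, dtype=float)
        kernel *= np.outer(mask, mask)
    Q = kernel / (np.sum(kernel) + EPSILON)  # EPSILON guards against a zero sum
    return Q, dist


Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q, dist = calc_q_matrix_sketch(Y, inclusions=np.array([1, 1, 0]))
```

With the third point excluded, only the pair (0, 1) keeps a non-zero weight, so Q[0, 1] and Q[1, 0] each end up close to 0.5.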

evomap.mapping._tsne._kl_divergence(Y, P, compute_error=True, compute_grad=True)

Calculate the KL-divergence between high-dimensional and low-dimensional joint probabilities.

This function computes the Kullback-Leibler (KL) divergence between the joint probability distribution in the high-dimensional space (P) and the low-dimensional space (Q). Optionally, it also computes the gradient of the KL-divergence with respect to the low-dimensional coordinates.

Parameters:
  • Y (np.ndarray of shape (n_samples, n_dims)) – Array of map coordinates in the low-dimensional space.

  • P (np.ndarray of shape (n_samples, n_samples)) – The joint probability matrix in the high-dimensional space.

  • compute_error (bool, optional) – Whether to compute the KL-divergence value, by default True.

  • compute_grad (bool, optional) – Whether to compute the gradient of the KL-divergence, by default True.

Returns:

  • error (float or None) – The KL-divergence value, or None if compute_error is False.

  • grad (np.ndarray or None) – The gradient of the KL-divergence, or None if compute_grad is False.
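The error term is KL(P || Q) = sum_{i≠j} p_ij log(p_ij / q_ij). The sketch below evaluates it for a given map Y. kl_divergence_sketch is a hypothetical name, it computes only the error (not the gradient), and guarding the logarithm with EPSILON is an assumption about how the module constant is used.

```python
import numpy as np

EPSILON = 1e-12  # same value as evomap.mapping._tsne.EPSILON


def kl_divergence_sketch(Y, P):
    """KL(P || Q) for map coordinates Y and a symmetric joint P-matrix P."""
    Y = np.asarray(Y, dtype=float)
    diff = Y[:, None, :] - Y[None, :, :]       # pairwise differences, shape (n, n, n_dims)
    dist = np.sum(diff ** 2, axis=-1)          # squared Euclidean distances
    kernel = 1.0 / (1.0 + dist)                # Student-t kernel
    np.fill_diagonal(kernel, 0.0)              # q_ii is defined as 0
    Q = kernel / np.sum(kernel)
    # Terms with p_ij = 0 contribute nothing; EPSILON keeps the logarithm finite.
    return float(np.sum(P * np.log((P + EPSILON) / (Q + EPSILON))))


rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
P = np.ones((5, 5))
np.fill_diagonal(P, 0.0)
P /= P.sum()                                   # uniform joint distribution over pairs
error = kl_divergence_sketch(Y, P)             # positive unless Q matches P
```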

evomap.mapping._tsne._kl_divergence_grad(Y, P, Q, dist)

Calculate the gradient of the KL-divergence with respect to the low-dimensional coordinates.

Parameters:
  • Y (np.ndarray of shape (n_samples, n_dims)) – Array of map coordinates in the low-dimensional space.

  • P (np.ndarray of shape (n_samples, n_samples)) – The joint probability matrix in the high-dimensional space.

  • Q (np.ndarray of shape (n_samples, n_samples)) – The joint probability matrix in the low-dimensional space.

  • dist (np.ndarray of shape (n_samples, n_samples)) – The squared Euclidean distance matrix in the low-dimensional space.

Returns:

dY – The gradient of the KL-divergence with respect to the map coordinates.

Return type:

np.ndarray of shape (n_samples, n_dims)
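Van der Maaten & Hinton (2008) give the t-SNE gradient as dC/dy_i = 4 sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1, which holds when P is symmetric and sums to one. The vectorized NumPy sketch below takes the same inputs as _kl_divergence_grad; kl_divergence_grad_sketch is a hypothetical name, and the code is not evomap's implementation.

```python
import numpy as np


def kl_divergence_grad_sketch(Y, P, Q, dist):
    """Gradient of KL(P || Q) with respect to the map coordinates Y."""
    Y = np.asarray(Y, dtype=float)
    kernel = 1.0 / (1.0 + dist)        # (1 + ||y_i - y_j||^2)^-1
    W = (P - Q) * kernel               # pairwise weights (p_ij - q_ij) / (1 + d_ij)
    np.fill_diagonal(W, 0.0)
    # sum_j W_ij (y_i - y_j) = y_i * sum_j W_ij - sum_j W_ij y_j
    return 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)


# Usage: build dist and Q for a random map, and a random symmetric P.
rng = np.random.default_rng(0)
n = 6
Y = rng.normal(size=(n, 2))
dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
kernel = 1.0 / (1.0 + dist)
np.fill_diagonal(kernel, 0.0)
Q = kernel / kernel.sum()
A = rng.random((n, n))
P = A + A.T
np.fill_diagonal(P, 0.0)
P /= P.sum()
dY = kl_divergence_grad_sketch(Y, P, Q, dist)   # shape (n, 2)
```

Rewriting the sum as a row-sum term minus a matrix product avoids materializing the (n, n, n_dims) array of pairwise differences.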

evomap.mapping._tsne.cond_to_joint(P)

Convert a conditional probability matrix to a symmetric joint probability matrix.

This function takes an asymmetric conditional probability matrix (P) and converts it into a symmetric joint probability matrix by averaging P with its transpose and normalizing the result so that all entries sum to one.

Parameters:

P (np.ndarray of shape (n_samples, n_samples)) – The conditional probability matrix.

Returns:

P – The symmetric joint probability matrix.

Return type:

np.ndarray of shape (n_samples, n_samples)
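Van der Maaten & Hinton (2008) form the joint probabilities as p_ij = (p_{j|i} + p_{i|j}) / (2n). The sketch below averages the conditional matrix with its transpose and then renormalizes by the total; when every row of the conditional matrix sums to one, this matches the 1/(2n) scaling from the paper. cond_to_joint_sketch is a hypothetical name.

```python
import numpy as np


def cond_to_joint_sketch(P_cond):
    """Symmetrize a conditional probability matrix and renormalize it to sum to one."""
    P_cond = np.asarray(P_cond, dtype=float)
    P = (P_cond + P_cond.T) / 2.0      # average p_{j|i} and p_{i|j}
    return P / P.sum()                 # equals (P_cond + P_cond.T) / (2n) if rows sum to 1


P_cond = np.array([[0.0, 0.9, 0.1],
                   [0.5, 0.0, 0.5],
                   [0.2, 0.8, 0.0]])
P = cond_to_joint_sketch(P_cond)
# P[0, 1] == (0.9 + 0.5) / 6, i.e. about 0.2333
```

Dividing by the actual total rather than by 2n keeps the output a valid distribution even if some rows of the input do not sum exactly to one.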