evomap.mapping._tsne
====================

.. py:module:: evomap.mapping._tsne

.. autoapi-nested-parse::

   t-Distributed Stochastic Neighbor Embedding (t-SNE), as proposed in:

   Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. *Journal of Machine Learning Research*, 9(11).


Attributes
----------

.. autoapisummary::

   evomap.mapping._tsne.EPSILON


Classes
-------

.. autoapisummary::

   evomap.mapping._tsne.TSNE


Functions
---------

.. autoapisummary::

   evomap.mapping._tsne._calc_p_matrix
   evomap.mapping._tsne._check_prepare_tsne
   evomap.mapping._tsne.sqeuclidean_dist
   evomap.mapping._tsne.calc_q_matrix
   evomap.mapping._tsne._kl_divergence
   evomap.mapping._tsne._kl_divergence_grad
   evomap.mapping._tsne.cond_to_joint


Module Contents
---------------

.. py:data:: EPSILON
   :value: 1e-12


.. py:class:: TSNE(n_dims=2, perplexity=15, n_iter=2000, stop_lying_iter=250, early_exaggeration=4, initial_momentum=0.5, final_momentum=0.8, eta='auto', n_iter_check=50, init=None, verbose=0, input_type='distance', max_halves=5, tol=0.001, n_inits=1, step_size=1)

   .. py:attribute:: n_dims
      :value: 2


   .. py:attribute:: perplexity
      :value: 15


   .. py:attribute:: n_iter
      :value: 2000


   .. py:attribute:: stop_lying_iter
      :value: 250


   .. py:attribute:: early_exaggeration
      :value: 4


   .. py:attribute:: initial_momentum
      :value: 0.5


   .. py:attribute:: final_momentum
      :value: 0.8


   .. py:attribute:: eta
      :value: 'auto'


   .. py:attribute:: n_iter_check
      :value: 50


   .. py:attribute:: init
      :value: None


   .. py:attribute:: verbose
      :value: 0


   .. py:attribute:: input_type
      :value: 'distance'


   .. py:attribute:: max_halves
      :value: 5


   .. py:attribute:: tol
      :value: 0.001


   .. py:attribute:: n_inits
      :value: 1


   .. py:attribute:: step_size
      :value: 1


   .. py:attribute:: method_str
      :value: 'TSNE'


   .. py:method:: __str__()

      Return a string representation of the TSNE instance with key parameters and user-modified values.


   .. py:method:: fit(X)

      Fit the TSNE model to the input data, without returning the transformed coordinates.

      :param X: The input data. If `input_type` is 'vector', `X` should be the feature
                vectors of the samples. If `input_type` is 'distance', `X` should be
                the pairwise distance matrix.
      :type X: np.ndarray of shape (n_samples, n_features) or (n_samples, n_samples)

      :returns: **self** -- The fitted TSNE instance, with the configuration matrix `Y_`
                stored as an attribute.
      :rtype: TSNE


   .. py:method:: fit_transform(X)

      Fit the TSNE model and return the transformed coordinates.

      :param X: The input data. If `input_type` is 'vector', `X` should be the feature
                vectors of the samples. If `input_type` is 'distance', `X` should be
                the pairwise distance matrix.
      :type X: np.ndarray of shape (n_samples, n_features) or (n_samples, n_samples)

      :returns: The transformed coordinates in the reduced-dimensional space.
      :rtype: np.ndarray of shape (n_samples, n_dims)

      :raises ValueError: If the `input_type` is not 'distance' or 'vector'.
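
      As a minimal sketch (plain NumPy, not part of evomap), both accepted input
      types can be prepared from the same feature matrix. The helper
      `pairwise_euclidean` is hypothetical, and Euclidean distance is only one
      possible dissimilarity, chosen here for illustration.

      .. code-block:: python

         import numpy as np


         def pairwise_euclidean(X):
             """Return the (n_samples, n_samples) matrix of Euclidean distances between rows of X."""
             diff = X[:, None, :] - X[None, :, :]
             return np.sqrt(np.sum(diff ** 2, axis=-1))


         rng = np.random.default_rng(0)
         X_vec = rng.normal(size=(100, 10))  # feature vectors, for input_type='vector'
         D = pairwise_euclidean(X_vec)       # distance matrix, for input_type='distance'

      Either array can then be passed to `fit_transform` with the matching
      `input_type`, for example ``TSNE(input_type='distance').fit_transform(D)``,
      which returns coordinates of shape (n_samples, n_dims).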


.. py:function:: _calc_p_matrix(X, included, input_type, perplexity)

   Calculate the probability matrix (P-matrix) for t-SNE.

   The P-matrix is a joint probability distribution over pairs of samples that
   encodes their similarities. Depending on `input_type`, this function computes it
   from feature vectors, a distance matrix, or a similarity matrix. Samples marked
   as excluded in `included` are left out of the calculation.

   :param X: The input data, which can either be feature vectors (if `input_type` is 'vector'),
             a distance matrix (if `input_type` is 'distance'), or a similarity matrix
             (if `input_type` is 'similarity').
   :type X: ndarray of shape (n_samples, n_features) or (n_samples, n_samples)
   :param included: A binary array (0/1) indicating whether each sample should be included in the
                    P-matrix calculation. If None, all samples are included by default.
   :type included: ndarray of shape (n_samples,), optional
   :param input_type: Specifies the type of input. Should be one of {'vector', 'distance', 'similarity'}.
   :type input_type: str
   :param perplexity: The desired perplexity, which controls the spread of the distributions
                      that make up the P-matrix. Perplexity can be interpreted as a smooth
                      measure of the effective number of neighbors of each point.
   :type perplexity: float

   :returns: **P** -- The joint probability distribution matrix over pairwise similarities.
   :rtype: ndarray of shape (n_samples, n_samples)

   :raises AssertionError: If any rows of the input matrix contain only zeros, which would indicate
       invalid data for the calculation.
   :raises ValueError: If the input type is not recognized or if the P-matrix contains invalid values.
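
   The perplexity calibration behind the P-matrix can be sketched as follows,
   following the procedure described in Van der Maaten & Hinton (2008). This is a
   NumPy sketch rather than evomap's implementation: the helper `conditional_p` is
   hypothetical, it expects a matrix of squared distances, and it ignores
   `included`. For each row, a binary search over the precision
   :math:`\beta_i = 1 / (2\sigma_i^2)` finds the Gaussian whose entropy equals
   :math:`\log(\mathrm{perplexity})`.

   .. code-block:: python

      import numpy as np


      def conditional_p(sq_dist, perplexity, tol=1e-5, max_iter=100):
          """Row-wise Gaussian conditional probabilities p_{j|i} matched to a target perplexity."""
          n = sq_dist.shape[0]
          target = np.log(perplexity)  # entropy target in nats (same as 2**H with H in bits)
          P = np.zeros((n, n))
          for i in range(n):
              d = np.delete(sq_dist[i], i)  # squared distances from point i to all others
              beta, lo, hi = 1.0, 0.0, np.inf
              for _ in range(max_iter):
                  w = np.exp(-(d - d.min()) * beta)  # shifting by d.min() avoids underflow
                  p = w / w.sum()
                  entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
                  if abs(entropy - target) < tol:
                      break
                  if entropy > target:  # too flat: increase the precision
                      lo = beta
                      beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
                  else:                 # too peaked: decrease the precision
                      hi = beta
                      beta = (beta + lo) / 2.0
              P[i, np.arange(n) != i] = p
          return P

   Each row of the result sums to one and has a zero diagonal entry;
   symmetrizing it (compare `cond_to_joint`) yields a joint P-matrix.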


.. py:function:: _check_prepare_tsne(model, X)

   Check and prepare the input data for t-SNE.

   This function validates and prepares the input data for t-SNE by calculating the
   appropriate learning rate (`eta`) and generating the P-matrix based on the input data.

   :param model: The t-SNE or EvoMap model instance. The function checks the learning
                 rate (`eta`) and other parameters of this model and sets them where needed.
   :type model: TSNE or EvoMap
   :param X: The input data, which can be feature vectors or a distance matrix.
   :type X: ndarray of shape (n_samples, n_features) or (n_samples, n_samples)

   :returns: **P** -- The prepared P-matrix representing pairwise similarities.
   :rtype: ndarray of shape (n_samples, n_samples)

   :raises ValueError: If the learning rate (`eta`) is invalid.
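
   This reference does not specify how ``eta='auto'`` is resolved. As an
   illustration only, the sketch below borrows the heuristic scikit-learn uses
   for ``learning_rate='auto'`` (N / early_exaggeration / 4, floored at 50) and
   adds a simple validity check. The helper `resolve_eta` and both rules are
   assumptions, not evomap's API.

   .. code-block:: python

      def resolve_eta(eta, n_samples, early_exaggeration):
          """Return a numeric learning rate, or raise ValueError for invalid input."""
          if eta == "auto":
              # Heuristic borrowed from scikit-learn; evomap's rule may differ.
              return max(n_samples / early_exaggeration / 4.0, 50.0)
          if isinstance(eta, (int, float)) and not isinstance(eta, bool) and eta > 0:
              return float(eta)
          raise ValueError(f"eta must be 'auto' or a positive number, got {eta!r}")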


.. py:function:: sqeuclidean_dist(Y)

   Calculate the pairwise squared Euclidean distance matrix.

   :param Y: The coordinates of the points in the low-dimensional space.
   :type Y: np.ndarray of shape (n_samples, n_dims)

   :returns: **D** -- The squared Euclidean distance matrix.
   :rtype: np.ndarray of shape (n_samples, n_samples)
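
   A vectorized NumPy sketch of this computation (not necessarily how evomap
   implements it) expands
   :math:`\lVert y_i - y_j \rVert^2 = \lVert y_i \rVert^2 + \lVert y_j \rVert^2 - 2\, y_i^\top y_j`,
   so no (n_samples, n_samples, n_dims) intermediate is needed. The name
   `sqeuclidean_dist_sketch` is hypothetical.

   .. code-block:: python

      import numpy as np


      def sqeuclidean_dist_sketch(Y):
          """Pairwise squared Euclidean distances between the rows of Y."""
          sq_norms = np.sum(Y ** 2, axis=1)
          D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
          np.maximum(D, 0.0, out=D)  # clip small negatives from floating-point cancellation
          np.fill_diagonal(D, 0.0)   # a point's distance to itself is exactly zero
          return D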


.. py:function:: calc_q_matrix(Y, inclusions)

   Calculate the Q-matrix of joint probabilities in low-dimensional space.

   The Q-matrix represents the joint probabilities in the low-dimensional
   space, obtained by applying a Student-t kernel (one degree of freedom) to
   the pairwise squared Euclidean distances between points. A small constant
   is added to avoid division by zero. Points can also be excluded from the
   calculation via `inclusions`.

   :param Y: Array of map coordinates in the low-dimensional space.
   :type Y: np.ndarray of shape (n_samples, n_dims)
   :param inclusions: A binary array where 1 indicates the point is included and 0 indicates
                      the point is excluded from the probability matrix. If None, all points
                      are included by default.
   :type inclusions: np.ndarray of shape (n_samples,), optional

   :returns: * **Q** (*np.ndarray of shape (n_samples, n_samples)*) -- The joint probability matrix in the low-dimensional space.
             * **dist** (*np.ndarray of shape (n_samples, n_samples)*) -- The squared Euclidean distance matrix used to compute Q.
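
   The standard t-SNE definition is
   :math:`q_{ij} = (1 + \lVert y_i - y_j \rVert^2)^{-1} / \sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}`.
   The NumPy sketch below follows that definition. How evomap treats excluded
   points is an assumption here (their rows and columns are set to zero), and
   `calc_q_matrix_sketch` and `EPS` are hypothetical names, with `EPS` standing
   in for `EPSILON`.

   .. code-block:: python

      import numpy as np

      EPS = 1e-12  # stand-in for the module-level EPSILON


      def calc_q_matrix_sketch(Y, inclusions=None):
          """Student-t joint probabilities Q and squared distances for map coordinates Y."""
          diff = Y[:, None, :] - Y[None, :, :]
          dist = np.sum(diff ** 2, axis=-1)  # squared Euclidean distances
          kernel = 1.0 / (1.0 + dist)        # Student-t kernel with one degree of freedom
          np.fill_diagonal(kernel, 0.0)      # q_ii is defined as zero
          if inclusions is not None:
              keep = np.asarray(inclusions, dtype=bool)
              kernel = kernel * np.outer(keep, keep)  # zero out pairs involving excluded points
          Q = kernel / (kernel.sum() + EPS)  # small constant guards against division by zero
          return Q, dist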


.. py:function:: _kl_divergence(Y, P, compute_error=True, compute_grad=True)

   Calculate the KL-divergence between high-dimensional and low-dimensional joint probabilities.

   This function computes the Kullback-Leibler (KL) divergence between the
   joint probability distribution in the high-dimensional space (P) and
   the low-dimensional space (Q). Optionally, it also computes the gradient
   of the KL-divergence with respect to the low-dimensional coordinates.

   :param Y: Array of map coordinates in the low-dimensional space.
   :type Y: np.ndarray of shape (n_samples, n_dims)
   :param P: The joint probability matrix in the high-dimensional space.
   :type P: np.ndarray of shape (n_samples, n_samples)
   :param compute_error: Whether to compute the KL-divergence value, by default True.
   :type compute_error: bool, optional
   :param compute_grad: Whether to compute the gradient of the KL-divergence, by default True.
   :type compute_grad: bool, optional

   :returns: * **error** (*float or None*) -- The KL-divergence value, or None if `compute_error` is False.
             * **grad** (*np.ndarray or None*) -- The gradient of the KL-divergence, or None if `compute_grad` is False.
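
   The error term is
   :math:`C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log (p_{ij} / q_{ij})`.
   A minimal NumPy sketch of the value alone follows (the gradient is the subject
   of `_kl_divergence_grad`). The helper `kl_value` is hypothetical; it skips
   entries with :math:`p_{ij} = 0`, which contribute nothing to the sum.

   .. code-block:: python

      import numpy as np

      EPS = 1e-12  # stand-in for the module-level EPSILON


      def kl_value(P, Q):
          """KL(P || Q) summed over entries with p_ij > 0 (zero entries contribute nothing)."""
          mask = P > 0
          return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], EPS))))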


.. py:function:: _kl_divergence_grad(Y, P, Q, dist)

   Calculate the gradient of the KL-divergence with respect to the low-dimensional coordinates.

   :param Y: Array of map coordinates in the low-dimensional space.
   :type Y: np.ndarray of shape (n_samples, n_dims)
   :param P: The joint probability matrix in the high-dimensional space.
   :type P: np.ndarray of shape (n_samples, n_samples)
   :param Q: The joint probability matrix in the low-dimensional space.
   :type Q: np.ndarray of shape (n_samples, n_samples)
   :param dist: The squared Euclidean distance matrix in the low-dimensional space.
   :type dist: np.ndarray of shape (n_samples, n_samples)

   :returns: **dY** -- The gradient of the KL-divergence with respect to the map coordinates.
   :rtype: np.ndarray of shape (n_samples, n_dims)
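
   For the Student-t kernel and a symmetric P, the gradient given by Van der
   Maaten & Hinton (2008) is
   :math:`\partial C / \partial y_i = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}`.
   The NumPy sketch below evaluates it in matrix form. `kl_grad_sketch` is a
   hypothetical name that mirrors the documented signature and assumes `dist`
   holds squared distances.

   .. code-block:: python

      import numpy as np


      def kl_grad_sketch(Y, P, Q, dist):
          """Gradient of KL(P || Q) with respect to the map coordinates Y."""
          W = (P - Q) / (1.0 + dist)  # pairwise weights (p_ij - q_ij) / (1 + d_ij)
          np.fill_diagonal(W, 0.0)
          # sum_j W_ij (y_i - y_j) = (row sum of W)_i * y_i - (W @ Y)_i
          return 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)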


.. py:function:: cond_to_joint(P)

   Convert a conditional probability matrix to a symmetric joint probability matrix.

   This function takes an asymmetric conditional probability matrix (P) and
   converts it into a symmetric joint probability matrix by averaging each pair
   of entries P[i, j] and P[j, i] and normalizing the result to sum to one.

   :param P: The conditional probability matrix.
   :type P: np.ndarray of shape (n_samples, n_samples)

   :returns: **P** -- The symmetric joint probability matrix.
   :rtype: np.ndarray of shape (n_samples, n_samples)
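
   A minimal NumPy sketch of this conversion, assuming each row of the input
   holds :math:`p_{j|i}` and sums to one; `cond_to_joint_sketch` is a
   hypothetical name. Under that assumption the result equals
   :math:`(P + P^\top) / (2n)`, the symmetrization used in the t-SNE paper.

   .. code-block:: python

      import numpy as np


      def cond_to_joint_sketch(P_cond):
          """Average p_{j|i} and p_{i|j}, then normalize so all entries sum to one."""
          P = (P_cond + P_cond.T) / 2.0
          return P / P.sum()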