:py:mod:`evomap.preprocessing`
==============================

.. py:module:: evomap.preprocessing

.. autoapi-nested-parse::

   Useful transformations for data preprocessing.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   evomap.preprocessing.diss2sim
   evomap.preprocessing.sim2diss
   evomap.preprocessing.coocc2sim
   evomap.preprocessing.edgelist2matrix
   evomap.preprocessing.edgelist2matrices
   evomap.preprocessing.normalize_diss_mat
   evomap.preprocessing.normalize_diss_mats
   evomap.preprocessing.expand_matrices
   evomap.preprocessing.calc_distances


.. py:function:: diss2sim(diss_mat, transformation='inverse', eps=0.001)

   Transform a dissimilarity matrix to a similarity matrix.

   :param diss_mat: Matrix of pairwise dissimilarities.
   :type diss_mat: ndarray of shape (n_samples, n_samples)
   :param transformation: Transformation function, either 'inverse' or 'mirror', by default 'inverse'
   :type transformation: str, optional
   :param eps: Incremental constant to avoid division by zero, by default 1e-3
   :type eps: float, optional

   :returns: Matrix of pairwise similarities.
   :rtype: ndarray of shape (n_samples, n_samples)
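
   The docstring does not spell out the formulas behind ``'inverse'`` and ``'mirror'``. The sketch below assumes the common choices ``1 / (D + eps)`` and ``max(D) - D``. ``diss2sim_sketch`` is a hypothetical stand-in, not evomap's implementation, so check the source before relying on exact values.

   .. code-block:: python

      import numpy as np

      def diss2sim_sketch(diss_mat, transformation="inverse", eps=1e-3):
          # Hypothetical sketch: both formulas are assumptions, not evomap's code.
          diss_mat = np.asarray(diss_mat, dtype=float)
          if transformation == "inverse":
              # Larger dissimilarity -> smaller similarity; eps avoids 1 / 0 on the diagonal.
              return 1.0 / (diss_mat + eps)
          if transformation == "mirror":
              # Reflect values around the largest dissimilarity.
              return diss_mat.max() - diss_mat
          raise ValueError("transformation must be 'inverse' or 'mirror'")

      D = np.array([[0.0, 1.0], [1.0, 0.0]])
      S = diss2sim_sketch(D)
      S_mirror = diss2sim_sketch(D, transformation="mirror")

   Under these assumptions, ``'inverse'`` gives zero dissimilarities a very large similarity (``1 / eps``), while ``'mirror'`` keeps values on the original scale.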


.. py:function:: sim2diss(sim_mat, transformation='inverse', eps=0.0001)

   Transform a similarity matrix to a dissimilarity matrix.

   :param sim_mat: Matrix of pairwise similarities.
   :type sim_mat: ndarray of shape (n_samples, n_samples)
   :param transformation: Transformation function, either 'inverse' or 'mirror', by default 'inverse'
   :type transformation: str, optional
   :param eps: Incremental constant to avoid division by zero, by default 1e-4
   :type eps: float, optional

   :returns: Matrix of pairwise dissimilarities.
   :rtype: ndarray of shape (n_samples, n_samples)
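
   As with ``diss2sim``, the exact formulas are not documented here. The hypothetical ``sim2diss_sketch`` below assumes ``1 / (S + eps)`` for ``'inverse'`` and ``max(S) - S`` for ``'mirror'``, using the default ``eps=1e-4`` from the signature.

   .. code-block:: python

      import numpy as np

      def sim2diss_sketch(sim_mat, transformation="inverse", eps=1e-4):
          # Hypothetical sketch: both formulas are assumptions, not evomap's code.
          sim_mat = np.asarray(sim_mat, dtype=float)
          if transformation == "inverse":
              # Higher similarity -> lower dissimilarity; eps guards against 1 / 0.
              return 1.0 / (sim_mat + eps)
          if transformation == "mirror":
              # Reflect around the largest similarity, so entries equal to it map to 0.
              return sim_mat.max() - sim_mat
          raise ValueError("transformation must be 'inverse' or 'mirror'")

      S = np.array([[1.0, 0.8], [0.8, 1.0]])
      D_mirror = sim2diss_sketch(S, transformation="mirror")
      D_inverse = sim2diss_sketch(S)

   With a unit diagonal, the assumed ``'mirror'`` rule yields a zero diagonal, as expected of a dissimilarity matrix.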


.. py:function:: coocc2sim(coocc_mat)

   Transform a matrix with co-occurrence counts to a similarity matrix.

   :param coocc_mat: Matrix of co-occurrence counts.
   :type coocc_mat: ndarray of shape (n_samples, n_samples)

   :returns: Matrix of pairwise similarities.
   :rtype: ndarray of shape (n_samples, n_samples)


.. py:function:: edgelist2matrix(df, score_var, id_var_i, id_var_j, time_var=None, time_selected=None)

   Transform an edgelist to a relationship matrix.

   :param df: Data containing the edgelist, one pair per row. Must include
              two id variables and a score variable; can also include a time variable.
   :type df: DataFrame
   :param score_var: The score variable.
   :type score_var: string
   :param id_var_i: The first id variable.
   :type id_var_i: string
   :param id_var_j: The second id variable.
   :type id_var_j: string
   :param time_var: Name of the time variable (integer-valued), by default None
   :type time_var: string, optional
   :param time_selected: The selected time, by default None
   :type time_selected: int, optional

   :returns: * **S** (*ndarray of shape (n_samples, n_samples)*) -- A matrix of pairwise relationships.
             * **ids** (*ndarray of shape (n_samples, )*) -- Identifiers for each element of the matrix.
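
   A minimal pandas/NumPy sketch of the edgelist-to-matrix conversion. It assumes that pairs are undirected (each score is written to both ``(i, j)`` and ``(j, i)``), that unlisted pairs get 0, and that ``ids`` is the sorted union of both id columns. The function ``edgelist2matrix_sketch`` and the column names ``firm_i``, ``firm_j``, ``score`` and ``year`` are hypothetical.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      def edgelist2matrix_sketch(df, score_var, id_var_i, id_var_j,
                                 time_var=None, time_selected=None):
          # Hypothetical sketch; symmetry and zero-fill are assumptions.
          if time_var is not None and time_selected is not None:
              # Keep only the edges observed in the selected period.
              df = df[df[time_var] == time_selected]
          # Sorted union of all ids that appear on either side of a pair.
          ids = np.unique(np.concatenate([df[id_var_i].to_numpy(),
                                          df[id_var_j].to_numpy()]))
          index = {name: k for k, name in enumerate(ids)}
          S = np.zeros((len(ids), len(ids)))
          for i, j, score in zip(df[id_var_i], df[id_var_j], df[score_var]):
              # Fill both triangles so the matrix is symmetric.
              S[index[i], index[j]] = score
              S[index[j], index[i]] = score
          return S, ids

      edges = pd.DataFrame({
          "firm_i": ["A", "A", "B"],
          "firm_j": ["B", "C", "C"],
          "score": [0.5, 0.2, 0.9],
          "year": [2020, 2020, 2020],
      })
      S, ids = edgelist2matrix_sketch(edges, "score", "firm_i", "firm_j",
                                      time_var="year", time_selected=2020)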


.. py:function:: edgelist2matrices(df, score_var, id_var_i, id_var_j, time_var)

   Transform a time-indexed edgelist to a sequence of relationship matrices.

   :param df: Data containing the edgelist, one pair per row. Must include
              two id variables, a score variable, and a time variable.
   :type df: DataFrame
   :param score_var: The score variable.
   :type score_var: string
   :param id_var_i: The first id variable.
   :type id_var_i: string
   :param id_var_j: The second id variable.
   :type id_var_j: string
   :param time_var: Name of the time variable (integer-valued).
   :type time_var: string

   :returns: * **S_t** (*list of length n_periods of ndarrays of shape (n_samples, n_samples)*) -- A sequence of relationship matrices.
             * **ids_t** (*ndarray of shape (n_samples, )*) -- Identifiers for each element of the matrix.
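
   The same idea can be extended to every period. The sketch below builds one matrix per value of ``time_var`` (in ascending order) over a common, sorted id set, so that all matrices share one shape and one id array, matching the return types above. The symmetry and zero-fill conventions, the function name ``edgelist2matrices_sketch``, and the column names are assumptions.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      def edgelist2matrices_sketch(df, score_var, id_var_i, id_var_j, time_var):
          # Hypothetical sketch; the common id set, symmetry and zero-fill are assumptions.
          ids = np.unique(np.concatenate([df[id_var_i].to_numpy(),
                                          df[id_var_j].to_numpy()]))
          index = {name: k for k, name in enumerate(ids)}
          S_t = []
          for period in sorted(df[time_var].unique()):
              S = np.zeros((len(ids), len(ids)))
              subset = df[df[time_var] == period]
              for i, j, score in zip(subset[id_var_i], subset[id_var_j],
                                     subset[score_var]):
                  S[index[i], index[j]] = score
                  S[index[j], index[i]] = score
              S_t.append(S)
          return S_t, ids

      edges = pd.DataFrame({
          "firm_i": ["A", "A", "B"],
          "firm_j": ["B", "C", "C"],
          "score": [0.5, 0.2, 0.9],
          "year": [2020, 2021, 2021],
      })
      S_t, ids = edgelist2matrices_sketch(edges, "score", "firm_i", "firm_j", "year")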


.. py:function:: normalize_diss_mat(D)


.. py:function:: normalize_diss_mats(D_ts)

   Normalize a sequence of dissimilarity matrices by a common factor
   (the maximum dissimilarity within the sequence).

   :param D_ts: Sequence of dissimilarity matrices.
   :type D_ts: list of ndarrays, each of shape (n_samples, n_samples)

   :returns: **D_ts** -- Sequence of dissimilarity matrices, normalized by the maximum dissimilarity within
             the input sequence.
   :rtype: list of ndarrays, each of shape (n_samples, n_samples)
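
   Dividing every matrix by the single maximum across the sequence, rather than normalizing each period on its own, keeps dissimilarities comparable over time. A minimal NumPy sketch of that rule follows. ``normalize_diss_mats_sketch`` is a hypothetical name, not evomap's code.

   .. code-block:: python

      import numpy as np

      def normalize_diss_mats_sketch(D_ts):
          # Hypothetical sketch: one common factor, the largest entry in any period.
          max_diss = max(np.max(D) for D in D_ts)
          return [D / max_diss for D in D_ts]

      D_ts = [np.array([[0.0, 2.0], [2.0, 0.0]]),
              np.array([[0.0, 4.0], [4.0, 0.0]])]
      D_norm = normalize_diss_mats_sketch(D_ts)
      # D_norm[0][0, 1] -> 0.5, D_norm[1][0, 1] -> 1.0

   Normalizing each matrix separately would instead map both off-diagonal entries in this example to 1.0. That would hide the fact that the pair became twice as dissimilar.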


.. py:function:: expand_matrices(Xts, names_t)

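   Expand a list of similarity matrices to equal shape and calculate inclusion vectors.

   :param Xts: Similarity matrices, one per period; their shapes may differ.
   :type Xts: list of ndarrays
   :param names_t: Labels of the elements in each matrix, one sequence per period.
   :type names_t: list

   :returns: list of similarity matrices (equal size), list of inclusion vectors (0/1), and list of all labels.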
   :rtype: (list, list, list)
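
   Following the description above, the expansion can be sketched as shown below. Two details are assumptions not stated in the docstring: the combined label list is taken as the sorted union of all per-period labels, and entries for elements absent in a period are filled with 0. ``expand_matrices_sketch`` is hypothetical.

   .. code-block:: python

      import numpy as np

      def expand_matrices_sketch(Xts, names_t):
          # Hypothetical sketch: label order and fill value are assumptions.
          all_names = sorted(set().union(*[list(names) for names in names_t]))
          position = {name: k for k, name in enumerate(all_names)}
          n = len(all_names)
          Xts_expanded, inclusions = [], []
          for X, names in zip(Xts, names_t):
              idx = [position[name] for name in names]
              X_full = np.zeros((n, n))            # absent elements stay 0
              X_full[np.ix_(idx, idx)] = X         # place the observed block
              inclusion = np.zeros(n, dtype=int)
              inclusion[idx] = 1                   # 1 = element present in this period
              Xts_expanded.append(X_full)
              inclusions.append(inclusion)
          return Xts_expanded, inclusions, all_names

      X1 = np.array([[1.0, 0.3], [0.3, 1.0]])  # labels A, B
      X2 = np.array([[1.0, 0.7], [0.7, 1.0]])  # labels B, C
      Xs, incl, names = expand_matrices_sketch([X1, X2], [["A", "B"], ["B", "C"]])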


.. py:function:: calc_distances(X, metric='euclidean')

   Calculate a matrix of pairwise distances among the rows of an input
   matrix.

   :param X: Input matrix.
   :type X: ndarray of shape (n_samples, n_dims)
   :param metric: The distance metric to use, by default 'euclidean'. Can be any of
                  'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation',
                  'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon',
                  'kulsinski', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski',
                  'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener',
                  'sokalsneath', 'sqeuclidean', 'yule'.
   :type metric: string, optional

   :returns: Matrix of pairwise distances.
   :rtype: ndarray of shape (n_samples, n_samples)
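
   The accepted metric names coincide with those of ``scipy.spatial.distance.pdist``. This suggests, but does not confirm, that ``calc_distances`` delegates to SciPy. Under that assumption, an equivalent computation is a ``pdist`` call expanded to a square matrix with ``squareform``. ``calc_distances_sketch`` is a hypothetical name.

   .. code-block:: python

      import numpy as np
      from scipy.spatial.distance import pdist, squareform

      def calc_distances_sketch(X, metric="euclidean"):
          # Hypothetical sketch assuming a SciPy backend.
          # pdist returns the condensed upper triangle; squareform expands it to n x n.
          return squareform(pdist(X, metric=metric))

      X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
      D = calc_distances_sketch(X)
      # D[0, 1] -> 5.0, D[0, 2] -> 10.0
      D_city = calc_distances_sketch(X, metric="cityblock")
      # D_city[0, 1] -> 7.0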