Dynamic Mapping#

In many cases, data for creating maps can be retrieved at regular intervals or are available historically. In such cases, dynamic mapping can uncover additional insights beyond what static maps can offer.

Dynamic mapping, however, is more challenging than simply creating a (static) map at each point in time (see [1]). The dynamic mapping framework EvoMap provides one solution to this problem. EvoMap integrates various static mapping methods and lets the analyst use them to create dynamic market maps. It also provides the analyst with a high degree of control over its output (e.g., by letting the analyst set certain constraints on the resultant maps, such as the degree of smoothing).

For methodological background and a detailed description, see the original paper [1].

This tutorial demonstrates how to use EvoMap to create dynamic market maps from time-evolving relationship data. The tutorial covers data preparation, running the method, evaluating its results, tuning its hyperparameters, and several useful functions provided as part of this package (e.g., for drawing dynamic market maps).


  1. Data Preparation

  2. Running the Method

  3. Output Exploration

  4. Quantitative Evaluation

  5. Hyperparameter Tuning

  6. Choosing Different Mapping Methods

  7. Special Cases and Extensions


Last updated: September 2022

This tutorial is based on the following paper. If you use this package or parts of its code, please cite our work.


[1] Matthe, M., Ringel, D. M., Skiera, B. (2022), "Mapping Market Structure Evolution", Marketing Science, forthcoming.

Read the full paper here (open access): https://doi.org/10.1287/mksc.2022.1385

Contact: In case of questions, problems or for feedback, please get in touch.

Data Preparation#

First, load all required imports for this tutorial and set the seed to ensure reproducibility.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

np.random.seed(0)  # fix the seed for reproducibility (any fixed value works)


For this tutorial, we use a subsample of the ‘Text-Based Network Industry Classification’ (TNIC) data also used in [1]. The original data are provided at https://hobergphillips.tuck.dartmouth.edu/. The data consist of a time-indexed edgelist, that is, one similarity score per firm pair and year.

For more background on TNIC data, see:

[2] Hoberg, G. & Phillips, G. (2016), "Text-Based Network Industries and Endogenous Product Differentiation", Journal of Political Economy, 124 (5), 1423-1465.

If you intend to use these data, make sure to cite these authors’ original work!

We augment the TNIC edgelist with further firm information. Moreover, we limit our analysis to a subsample of firms (i.e., those which are present throughout the whole observation period of 20 years). Our augmented data includes the following variables:

  • Fiscal year

  • Firm identifier

  • Similarity score

  • Firm name

  • Firm size (synthetic variable, correlated with market capitalization)

  • SIC Code
In the resultant dataset, each row corresponds to a single firm-firm pair at a specific point in time. Thus, each firm variable appears twice in each row (once for each firm). We provide this dataset in the datasets module.

from evomap.datasets import load_tnic_sample_small
df_sample = load_tnic_sample_small()
year gvkey1 gvkey2 score name1 name2 sic1 sic2 size1 size2
0 1998 1078 1602 0.0274 ABBOTT LABORATORIES AMGEN INC 3845 2836 74.211937 36.866437
1 1999 1078 1602 0.0352 ABBOTT LABORATORIES AMGEN INC 3845 2836 87.854384 48.541222
2 2000 1078 1602 0.0348 ABBOTT LABORATORIES AMGEN INC 3845 2836 70.098508 93.428689
3 2001 1078 1602 0.0218 ABBOTT LABORATORIES AMGEN INC 3845 2836 110.299430 34.410965
4 2002 1078 1602 0.0366 ABBOTT LABORATORIES AMGEN INC 3845 2836 40.140853 42.840198

The original sample includes more than 1,000 different firms. To simplify our demonstration (and accelerate the runtime of this notebook), we pick a smaller subsample from these data focused on the biggest firms (by market value) and their competitors.

print("The subsample includes {0} firms and {1} years".format(df_sample.name1.nunique(), df_sample.year.nunique()))
The subsample includes 273 firms and 20 years

We need to turn this edgelist into a sequence of relationship matrices. Specifically, the expected input for EvoMap is a list of numpy ndarrays, each containing the pairwise relationships among all firms at a specific point in time.

Important: Make sure that each matrix has the same size and that its rows & columns are ordered consistently! If you do not observe certain firms at some points in time (e.g., due to entry or exit), you should provide inclusion vectors (covered later).
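A quick programmatic check of this requirement can save debugging later. The helper below is a minimal sketch (plain numpy, not part of evomap) that verifies a list of distance matrices is consistently sized and symmetric:

```python
import numpy as np

def check_matrix_sequence(Ds):
    """Verify all matrices are square, symmetric, and share the same shape."""
    n = Ds[0].shape[0]
    for t, D in enumerate(Ds):
        assert D.shape == (n, n), f"Matrix {t} has shape {D.shape}, expected ({n}, {n})"
        assert np.allclose(D, D.T), f"Matrix {t} is not symmetric"
    return True

# Example: two consistent 3x3 distance matrices
D0 = np.array([[0., 1., 2.], [1., 0., 3.], [2., 3., 0.]])
D1 = np.array([[0., 2., 1.], [2., 0., 1.], [1., 1., 0.]])
check_matrix_sequence([D0, D1])
```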

evomap.preprocessing provides helpful functions for the required transformations:

from evomap.preprocessing import edgelist2matrix, sim2dist

def build_distance_matrices(df):
    """Build a sequence of distance matrices from the TNIC edgelist.

    As the TNIC data consist of pairwise similarities, we apply the 'sim2dist' 
    function to each similarity matrix to yield distance matrices.

    df : pd.DataFrame
        TNIC data in edgelist format. 

    Ds, names, years
        Ds: list of distance matrices, each as a numpy ndarray
        names: array of firm labels
        years: array of year labels
    """
    years = df['year'].sort_values().unique()
    Ds = []
    for year in years:
        S_t, names = edgelist2matrix(
            df = df,
            score_var = 'score',
            id_var_i = 'name1',
            id_var_j = 'name2',
            time_var = 'year',
            time_selected = year)
        D_t = sim2dist(S_t, transformation = 'inverse')
        Ds.append(D_t)
    return Ds, names, years

Ds, names, years = build_distance_matrices(df_sample)
n_samples = Ds[0].shape[0]
n_periods = len(Ds)

Running EvoMap#

EvoMap is part of the evomap.mapping module.

Before we run the method, we first fix a random starting initialization. While you do not need to provide a starting initialization explicitly, doing so is helpful when comparing the derived maps to solutions from other approaches.

Y_init = np.random.normal(0.0,1.0,size=(n_samples,2))
Y_inits = [Y_init]*n_periods

Then, choose a value for the hyperparameter alpha, initialize the model accordingly, and fit it to the data. We will cover how to identify appropriate values for EvoMap’s hyperparameters later.

from evomap.mapping.evomap import EvoTSNE
model = EvoTSNE(
    verbose = 2,
    alpha = 0.001,
    p = 1,
    init = Y_inits)

Ys = model.fit_transform(Ds)
[EvoTSNE] Initialization 1/1
[EvoTSNE] Gradient descent with Momentum: 0.5
[EvoTSNE] Iteration 50 -- Cost: 234.07 -- Gradient Norm: 0.1705
[EvoTSNE] Iteration 100 -- Cost: 228.16 -- Gradient Norm: 0.0807
[EvoTSNE] Iteration 150 -- Cost: 227.12 -- Gradient Norm: 0.0448
[EvoTSNE] Iteration 200 -- Cost: 226.96 -- Gradient Norm: 0.0522
[EvoTSNE] Iteration 250 -- Cost: 226.94 -- Gradient Norm: 0.0697
[EvoTSNE] Gradient descent with Momentum: 0.8
[EvoTSNE] Iteration 300 -- Cost: 17.11 -- Gradient Norm: 0.0053
[EvoTSNE] Iteration 350 -- Cost: 16.87 -- Gradient Norm: 0.0025
[EvoTSNE] Iteration 388: gradient norm vanished.

The output is a list of map coordinates, each stored in an ndarray of shape (n_samples, d), where d is typically 2.

This package provides multiple tools to explore these results. Here, we look at the first 4 maps as a sequence. We cover more advanced alternatives for exploration in Section 3: Exploring Model Output.

from evomap.printer import draw_map_sequence
draw_map_sequence(Ys[:4], n_cols = 4, time_labels = years[:4])

Exploring Model Output#

This package provides three ways to explore EvoMap’s output:

  1. Draw a static map

  2. Draw a sequence of static maps

  3. Draw a dynamic map (i.e., an overlay of subsequent maps)

All necessary functions are included in the printer module and offer considerable flexibility to adjust the maps’ aesthetics.

Draw a Static Map#

To draw a single static map, simply use the corresponding function draw_map().

from evomap.printer import draw_map
draw_map(Ys[0])


Aesthetics of the map can easily be adjusted via additional arguments. For instance, if class labels are available (e.g., obtained via clustering or additional metadata), they can be added as colors. Here, we can use SIC codes for coloring:

sic_codes = (pd.DataFrame({'name1': names})
    .merge(df_sample[['name1', 'sic1']], on = 'name1', how = 'left')
    .drop_duplicates('name1')['sic1']
    .map(lambda x: str(x)[:1]).values)
draw_map(Ys[0], c = sic_codes)

sizes = (pd.DataFrame({'name1': names})
    .merge(df_sample[['name1', 'size1']], on = 'name1', how = 'left')
    .drop_duplicates('name1')['size1'].values)
draw_map(Ys[0], c = sic_codes, size = sizes)

You can further annotate the map, using clusters

draw_map(Ys[0], c = sic_codes, size = sizes, annotate = 'clusters')

or labels. Note that you can also use additional keyword arguments to adjust the plot and its labels further.

draw_map(
    Ys[0],
    c = sic_codes, 
    size = sizes,
    annotate = 'labels', 
    labels = names, 
    highlight_labels = ['APPLE INC', 'ADOBE INC'],
    fig_size = (10,10),
    fontdict = {'size': 8},
    scatter_kws = {'s' : 20})

Draw a Sequence of Maps#

Rather than drawing a single map, you can also draw a sequence via draw_map_sequence().

The function takes a list of ndarrays (each containing the map coordinates for one period) and creates a map for each of them:

from evomap.printer import draw_map_sequence
draw_map_sequence(Ys[:4], n_cols = 4, time_labels = years)

Map aesthetics can be adjusted analogously to drawing a static map. To do so, simply provide arguments of draw_map() as keyword arguments to draw_map_sequence():

draw_map_sequence(Ys[:4], time_labels = years, 
                  c = sic_codes, labels = names, highlight_labels = ['APPLE INC', 'INTUIT INC'])

Draw a Dynamic Map#

The third option - and often the most interesting one - is to explore all periods jointly via a dynamic map (i.e., an overlay of multiple subsequent maps). To do so, use draw_dynamic_map() as follows:

from evomap.printer import draw_dynamic_map

draw_dynamic_map() also provides some options to reveal the individual trajectories of each firm.

All arguments for draw_map() can also be passed to draw_dynamic_map() as keyword arguments to control the resultant map’s aesthetics. Naturally, the arguments of both functions can be combined arbitrarily (for instance, highlighting labels for some objects, while highlighting the trajectories of others):

draw_dynamic_map(Ys, show_arrows = True, show_last_positions_only= True, highlight_trajectories = ['WALMART INC', 'AT&T INC'],
                 labels = names, highlight_labels = ['PFIZER INC', 'MORGAN STANLEY'])

Besides drawing the full map, you can also focus on individual firms and their trajectories. To do so, the dedicated function draw_trajectories() is available.

Besides map coordinates and labels (e.g., firm names), this function expects you to provide a list (or array) of focus firms, for which the trajectories should be displayed:

from evomap.printer import draw_trajectories
focus_firms = [
    'APPLE INC']    
draw_trajectories(Ys, labels = names, selected_labels = focus_firms)

To ease interpretation, you can add annotations for all periods:

draw_trajectories(Ys, labels = names, selected_labels = focus_firms, title_str = "Selected Trajectories", period_labels = years)

Evaluating Model Output#

How “good” are these maps? For example, how well do they fit the input data, and how well do they reveal underlying changes?

To answer this question, the following metrics are available. All functions are located within the ‘metrics’ module. For more background on these metrics, see [1].

  • Hitrate: nearest neighbor recovery (in %). Computed for: single map.

  • Adjusted Hitrate: hitrate, adjusted for random agreement. Computed for: single map.

  • Avg. Hitrate: hitrate, averaged across subsequent maps. Computed for: sequence of maps.

  • Avg. Adjusted Hitrate: adjusted hitrate, averaged across subsequent maps. Computed for: sequence of maps.

  • Misalignment: average distance of subsequent map positions. Computed for: sequence of maps.

  • Alignment: cosine similarity of subsequent map positions. Computed for: sequence of maps.

  • Persistence: autocorrelation coefficient of first differences of objects’ subsequent map positions. Computed for: sequence of maps.
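To build intuition for the sequence-level metrics, a misalignment-style statistic can be computed directly from consecutive coordinate arrays. The sketch below uses plain numpy (the package's misalign_score may differ in detail, e.g., in normalization) and averages the Euclidean distance each object moves between periods:

```python
import numpy as np

def misalignment_sketch(Ys):
    """Average per-object movement between consecutive maps."""
    moves = []
    for Y_prev, Y_next in zip(Ys[:-1], Ys[1:]):
        # Euclidean distance each object travels from t to t+1
        moves.append(np.linalg.norm(Y_next - Y_prev, axis=1).mean())
    return float(np.mean(moves))

# Two-period toy example: every object shifts by (1, 0)
Y0 = np.array([[0., 0.], [1., 1.]])
Y1 = Y0 + np.array([1., 0.])
misalignment_sketch([Y0, Y1])  # every object moves distance 1
```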

from evomap.metrics import avg_hitrate_score, misalign_score, persistence_score
df_eval = pd.DataFrame({
    'Avg Hitrate': avg_hitrate_score(Ys,Ds, input_type = 'distance'),
    'Misalignment': misalign_score(Ys),
    'Persistence': persistence_score(Ys)
}, index = ['EvoMap'])

Avg Hitrate Misalignment Persistence
EvoMap 0.645311 0.01496 0.571098

To benchmark these values, we apply t-SNE independently to each distance matrix.

evomap.mapping also includes the respective static variant for each method:

from evomap.mapping import TSNE
tsne_model = TSNE(init = Y_init)

Ys_indep = []
for t in range(n_periods):
    Ys_indep.append(tsne_model.fit_transform(Ds[t]))
df_eval = pd.concat((df_eval, pd.DataFrame({
    'Avg Hitrate': avg_hitrate_score(Ys_indep, Ds, input_type = 'distance'),
    'Misalignment': misalign_score(Ys_indep),
    'Persistence': persistence_score(Ys_indep)
}, index = ['Indep. TSNE'])), axis = 0)
Avg Hitrate Misalignment Persistence
EvoMap 0.645311 0.014960 0.571098
Indep. TSNE 0.659579 0.308769 -0.440110
df_eval.T.plot(kind = 'bar')

Hyperparameter Selection#

The metrics introduced in the last section are well suited to tune EvoMap’s hyperparameters.

Specifically, there are two hyperparameters one needs to set when applying EvoMap:

  • alpha (float): Controls the degree of alignment

  • p (int): Controls the degree of smoothing

‘Good’ values for these parameters naturally depend on the given input data. Therefore, one should always test multiple values and compare the results both visually and quantitatively (i.e., via visual inspection together with the metrics introduced in the previous section).

To make such comparisons as easy as possible, EvoMap features a grid_search() function. Given some input data and a grid of parameter values, this function creates a map sequence for each parameter combination and summarizes the results.
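Conceptually, grid_search() does something like the following simplified sketch (a hypothetical illustration with dummy stand-ins; the actual implementation fits a full map sequence per combination and also reports the static cost):

```python
from itertools import product

def manual_grid_search(fit_fn, param_grid, eval_fns):
    """Evaluate every parameter combination and collect the metric values."""
    results = []
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        Ys = fit_fn(**params)  # fit one map sequence for this combination
        row = dict(params)
        row.update({fn.__name__: fn(Ys) for fn in eval_fns})
        results.append(row)
    return results

# Dummy stand-ins, for illustration only
dummy_fit = lambda alpha, p: [[alpha * p]]   # pretend "map sequence"
dummy_metric = lambda Ys: Ys[0][0]           # pretend metric
dummy_metric.__name__ = 'dummy_metric'

grid = {'alpha': [0.001, 0.01], 'p': [1, 2]}
res = manual_grid_search(dummy_fit, grid, [dummy_metric])
len(res)  # 4 combinations evaluated
```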

First, define the parameter grid as a Python dictionary. We use a rather narrow one here for demonstration purposes (and to speed up runtime). We recommend, however, always starting with a broad range and narrowing it down later.

param_grid = {
    'alpha': [0.00001, 0.0001, 0.001], 
    'p': [1,2]}

Next, define which metrics should be used to evaluate each combination from the grid:

metrics = [misalign_score, persistence_score, avg_hitrate_score]
metric_labels = ['Misalignment', 'Persistence', 'Hitrate']

Then, initialize the model and start the grid search (available for each implementation of EvoMap):

model = EvoTSNE(verbose = 2, init = Y_inits)

df_grid_results = model.grid_search(
    Xs = Ds, 
    param_grid = param_grid, 
    eval_functions =  metrics,
    eval_labels = metric_labels)
[EvoTSNE] Evaluating parameter grid..
[EvoTSNE] .. evaluating parameter combination: {'alpha': 1e-05, 'p': 1}
[EvoTSNE] .. evaluating parameter combination: {'alpha': 1e-05, 'p': 2}
[EvoTSNE] .. evaluating parameter combination: {'alpha': 0.0001, 'p': 1}
[EvoTSNE] .. evaluating parameter combination: {'alpha': 0.0001, 'p': 2}
[EvoTSNE] .. evaluating parameter combination: {'alpha': 0.001, 'p': 1}
[EvoTSNE] .. evaluating parameter combination: {'alpha': 0.001, 'p': 2}
[EvoTSNE] Grid Search Completed.
alpha p cost_static_avg Misalignment Persistence Hitrate
0 0.00001 1 0.797206 0.074205 -0.214033 0.658114
1 0.00001 2 0.799421 0.058934 0.005587 0.656630
2 0.00010 1 0.805572 0.037886 0.109157 0.655568
3 0.00010 2 0.815860 0.029201 0.538365 0.650385
4 0.00100 1 0.831836 0.014704 0.558148 0.641777
5 0.00100 2 0.834073 0.011675 0.831560 0.640055
fig, ax = plt.subplots(1,3, figsize = (20,7))

sns.barplot(x = 'alpha', y = 'cost_static_avg', hue = 'p', data = df_grid_results, ax = ax[0])
sns.barplot(x = 'alpha', y = 'Misalignment', hue = 'p', data = df_grid_results, ax = ax[1])
sns.barplot(x = 'alpha', y = 'Persistence', hue = 'p', data = df_grid_results, ax = ax[2])

Based on this evaluation, you can select suitable parameter combinations and inspect them further visually.

Here, alpha = 0.001 and p = 2 seem reasonable, as this combination decreases misalignment and increases persistence without substantially increasing static cost. To use these values, either create a new model instance or use the set_params() function to override the parameters of an existing model instance.

Y_t = model.set_params({'alpha': 0.001, 'p': 2}).fit_transform(Ds)
[EvoTSNE] Initialization 1/1
[EvoTSNE] Gradient descent with Momentum: 0.5
[EvoTSNE] Iteration 50 -- Cost: 235.02 -- Gradient Norm: 0.2881
[EvoTSNE] Iteration 100 -- Cost: 229.98 -- Gradient Norm: 0.2061
[EvoTSNE] Iteration 150 -- Cost: 228.48 -- Gradient Norm: 0.1105
[EvoTSNE] Iteration 200 -- Cost: 227.98 -- Gradient Norm: 0.0354
[EvoTSNE] Iteration 250 -- Cost: 227.79 -- Gradient Norm: 0.0557
[EvoTSNE] Gradient descent with Momentum: 0.8
[EvoTSNE] Iteration 300 -- Cost: 17.85 -- Gradient Norm: 0.1069
[EvoTSNE] Iteration 350 -- Cost: 17.33 -- Gradient Norm: 0.0118
[EvoTSNE] Iteration 400 -- Cost: 17.20 -- Gradient Norm: 0.0077
[EvoTSNE] Iteration 450 -- Cost: 17.14 -- Gradient Norm: 0.0054
[EvoTSNE] Iteration 500 -- Cost: 17.11 -- Gradient Norm: 0.0049
[EvoTSNE] Iteration 550 -- Cost: 17.09 -- Gradient Norm: 0.0047
[EvoTSNE] Iteration 600 -- Cost: 17.02 -- Gradient Norm: 0.0071
[EvoTSNE] Iteration 650 -- Cost: 17.00 -- Gradient Norm: 0.0043
[EvoTSNE] Iteration 700 -- Cost: 16.99 -- Gradient Norm: 0.0035
[EvoTSNE] Iteration 750 -- Cost: 16.95 -- Gradient Norm: 0.0031
[EvoTSNE] Iteration 800 -- Cost: 16.95 -- Gradient Norm: 0.0019
[EvoTSNE] Iteration 850 -- Cost: 16.94 -- Gradient Norm: 0.0018
[EvoTSNE] Iteration 900 -- Cost: 16.93 -- Gradient Norm: 0.0026
[EvoTSNE] Iteration 950 -- Cost: 16.89 -- Gradient Norm: 0.0037
[EvoTSNE] Iteration 995: gradient norm vanished.
draw_dynamic_map(Y_t, c = sic_codes)
focus_firms = [
    'APPLE INC']

draw_trajectories(Y_t, labels = names, selected_labels = focus_firms, 
                  title_str = "Selected Trajectories (after tuning)", period_labels = years)

As with any unsupervised learning technique, tuning these maps is both science and art. As there is typically no known ground truth, it is impossible to objectively identify a single best solution. Instead, the grid results should serve as a starting point for identifying suitable values, from which you should always compare different candidate solutions.

Further note that the grid here was kept small to make this tutorial computationally inexpensive. In practical applications, it is advisable to test a more extensive grid to reliably identify suitable hyperparameter values.

Choosing a Different Mapping Method#

Thus far, this tutorial only used a single mapping method (t-SNE). While t-SNE works particularly well for large datasets, it has certain properties that make it less suitable for smaller datasets (e.g., map positions are estimated based on nearest neighborhood probabilities, rather than the actual input distances).

In such (and other) cases, Multidimensional Scaling (MDS) and its variants might be the preferred choice. Therefore, we also provide an implementation of EvoMap for MDS: EvoMDS()

For this demonstration, let’s first pick a smaller sample:

sample_firms = [
    'AT&T INC', 
    'EBAY INC', 
    'INTUIT INC', 
    'APPLE INC'] 

df_sample = df_sample.query('name1 in @sample_firms').query('name2 in @sample_firms')

Ds, names, years = build_distance_matrices(df_sample)
n_samples = Ds[0].shape[0]
n_periods = len(Ds)

Note that MDS does not transform the input distances in any way, but rather tries to fit the map distances to them as closely as possible. Therefore, always make sure that your input distances are on a reasonable scale. If input distances are very large, for instance, gradient norms can quickly explode and it becomes challenging to reach a good solution. Normalizing the input distances to a smaller range helps avoid such cases and does not affect the resultant maps beyond scaling their coordinate system. Make sure, however, to normalize each distance matrix in the sequence by the same factor! You can do so via the preprocessing module.
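One common normalization scheme is to divide every matrix in the sequence by the same global maximum, which preserves all relative distances across periods. The sketch below illustrates the idea with plain numpy (normalize_dist_mats may use a different factor internally):

```python
import numpy as np

def normalize_by_global_max(Ds):
    """Scale all distance matrices by the largest distance in the sequence."""
    global_max = max(D.max() for D in Ds)
    return [D / global_max for D in Ds]

D0 = np.array([[0., 10.], [10., 0.]])
D1 = np.array([[0., 20.], [20., 0.]])
Ds_norm = normalize_by_global_max([D0, D1])
# All distances now lie in [0, 1]; relative scale across periods is preserved
```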

from evomap.preprocessing import normalize_dist_mats
Ds_norm = normalize_dist_mats(Ds)

Running EvoMap for MDS follows the same syntax as running it for t-SNE. Note, however, that some arguments for both classes differ, as they use different optimization routines.

Note: MDS optimizes a different static cost function than t-SNE (Stress, rather than KL divergence). As the outputs of these functions are scaled differently, ‘good’ values for the hyperparameters can (and will) differ. Thus, make sure to run the grid search for MDS separately.

from evomap.mapping import EvoMDS

param_grid = {
    'alpha': [0.1,1,10], 
    'p': [1,2]}

model_MDS = EvoMDS()

df_res = model_MDS.grid_search(
    Xs = Ds_norm, 
    param_grid = param_grid, 
    eval_functions =  metrics,
    eval_labels = metric_labels)
[EvoMDS] Iteration 107: gradient norm vanished.
[EvoMDS] Iteration 24: gradient norm vanished.
[EvoMDS] Iteration 96: gradient norm vanished.
[EvoMDS] Diverging gradient norm at iteration 71
[EvoMDS] Adjusting step sizes..
[EvoMDS] Iteration 497: gradient norm vanished.
[EvoMDS] Diverging gradient norm at iteration 22
[EvoMDS] Adjusting step sizes..
[EvoMDS] Diverging gradient norm at iteration 74
[EvoMDS] Adjusting step sizes..
[EvoMDS] Iteration 672: gradient norm vanished.
[EvoMDS] Diverging gradient norm at iteration 11
[EvoMDS] Adjusting step sizes..
[EvoMDS] Diverging gradient norm at iteration 13
[EvoMDS] Adjusting step sizes..
[EvoMDS] Diverging gradient norm at iteration 19
[EvoMDS] Adjusting step sizes..
[EvoMDS] Diverging gradient norm at iteration 40
[EvoMDS] Adjusting step sizes..
[EvoMDS] Iteration 1999: gradient norm vanished.

For (very) high values of alpha, lower step sizes are required to ensure convergence. Otherwise, the large temporal gradient can explode and the optimization diverges. EvoMap implements a set of controls to avoid such behavior. For instance, it automatically lowers the step sizes if the gradient norm starts to diverge. In most cases, these controls suffice to find good solutions. If not, consider lowering alpha or decreasing the step sizes manually.
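The divergence control can be illustrated with a generic gradient descent loop (a simplified sketch, not EvoMap's actual optimizer): whenever the gradient norm grows instead of shrinking, the step size is halved before the next update.

```python
import numpy as np

def descent_with_backoff(grad_fn, x0, step=1.0, max_iter=100, tol=1e-6):
    """Gradient descent that halves the step size when the gradient norm grows."""
    x = np.asarray(x0, dtype=float)
    prev_norm = np.inf
    for _ in range(max_iter):
        g = grad_fn(x)
        norm = np.linalg.norm(g)
        if norm < tol:          # gradient norm vanished: converged
            break
        if norm > prev_norm:    # diverging gradient norm: adjust step size
            step *= 0.5
        x = x - step * g
        prev_norm = norm
    return x

# Minimize f(x) = x^2 (gradient 2x); the initial step is too large and
# overshoots, so the backoff kicks in and the iterates then converge to 0
x_opt = descent_with_backoff(lambda x: 2 * x, x0=[5.0], step=1.2)
```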

fig, ax = plt.subplots(1,3, figsize = (20,7))

sns.barplot(x = 'alpha', y = 'cost_static_avg', hue = 'p', data = df_res, ax = ax[0])
sns.barplot(x = 'alpha', y = 'Misalignment', hue = 'p', data = df_res, ax = ax[1])
sns.barplot(x = 'alpha', y = 'Persistence', hue = 'p', data = df_res, ax = ax[2])

From this graph, alpha = 1 and p = 2 seem reasonable. For higher values of alpha, static cost rises substantially.

model_MDS.set_params({'alpha': 1, 'p': 2})
Ys_MDS = model_MDS.fit_transform(Ds_norm)
[EvoMDS] Diverging gradient norm at iteration 71
[EvoMDS] Adjusting step sizes..
[EvoMDS] Iteration 423: gradient norm vanished.
draw_dynamic_map(Ys_MDS, show_arrows = True)
draw_trajectories(Ys_MDS, labels = names, title_str = "Selected Trajectories (MDS; after tuning)", period_labels = years)

Special Cases, Extensions and Troubleshooting#