Quickstart#

This tutorial gives a quick overview of the tools available in evomap.

In general, input data is expected either as higher-dimensional feature vectors or as pairwise relationships.

Given such data, evomap provides a flexible set of tools to process and manipulate the data, map it to lower-dimensional space, and to evaluate and explore the resultant maps.
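To illustrate how the two input forms relate, the following sketch turns a small, made-up set of feature vectors into a matrix of pairwise Euclidean distances. It uses plain NumPy rather than evomap, and the array `features` is a hypothetical toy example, not data from the package.

```python
import numpy as np

# Hypothetical toy data: 3 objects described by 3 features each.
features = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 2.0],
                     [4.0, 4.0, 0.0]])

# Pairwise differences via broadcasting, then Euclidean norms.
diff = features[:, None, :] - features[None, :, :]
pairwise_dist = np.sqrt((diff ** 2).sum(axis=-1))

print(pairwise_dist.round(2))
```

The result is a square, symmetric matrix with zeros on the diagonal, which is the same general shape as the relationship data used in the rest of this tutorial.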

Background#

Last updated: September 2023

This quickstart guide is based on the following paper. If you use this package or parts of its code, please cite our work.

References

[1] Matthe, M., Ringel, D. M., Skiera, B. (2022), "Mapping Market Structure Evolution", Marketing Science, forthcoming.

Read the full paper here (open access): https://doi.org/10.1287/mksc.2022.1385

Contact: For questions or feedback, please get in touch.

Module Overview#

evomap comprises the following main modules:

evomap.preprocessing: Tools for preprocessing input data.
evomap.mapping: Tools for mapping input data to lower-dimensional space.
evomap.printer: Tools for drawing and annotating maps.
evomap.metrics: Tools for evaluating maps quantitatively.

It also includes a few additional modules (such as evomap.datasets, which provides the example datasets used in these tutorials).

Example Application#

For a high-level overview of how these modules work together, we generate a market structure map from the ‘Text-Based Network Industry Classification’ (TNIC) data provided by Hoberg & Phillips. The original data is available at https://hobergphillips.tuck.dartmouth.edu/. If you use these data, please cite their work.

Step 1: Loading the Relationship Data#

We use a small subsample of these data. The sample is included in the evomap.datasets module.

from evomap.datasets import load_tnic_sample_small
df_tnic_sample = load_tnic_sample_small()

df_tnic_sample.head()

	year	gvkey1	gvkey2	score	name1	name2	sic1	sic2	size1	size2
0	1998	1078	1602	0.0274	ABBOTT LABORATORIES	AMGEN INC	3845	2836	74.211937	36.866437
1	1999	1078	1602	0.0352	ABBOTT LABORATORIES	AMGEN INC	3845	2836	87.854384	48.541222
2	2000	1078	1602	0.0348	ABBOTT LABORATORIES	AMGEN INC	3845	2836	70.098508	93.428689
3	2001	1078	1602	0.0218	ABBOTT LABORATORIES	AMGEN INC	3845	2836	110.299430	34.410965
4	2002	1078	1602	0.0366	ABBOTT LABORATORIES	AMGEN INC	3845	2836	40.140853	42.840198

The data consists of a time-indexed edge list. That is, each row corresponds to a pair of firms in a given year. The ‘score’ variable captures each pair’s similarity.

To build a small subsample, we first select a handful of firms:

firms = ['APPLE INC', 'AT&T INC', 'COMCAST CORP', 'HP INC',
       'INTUIT INC', 'MICROSOFT CORP', 'ORACLE CORP', 'US CELLULAR CORP',
       'WESTERN DIGITAL CORP']

We then collect these firms’ pairwise relationships at a single point in time:

df_tnic_sample = df_tnic_sample.query('year == 2000').query('name1 in @firms').query('name2 in @firms')
df_tnic_sample.head()

	year	gvkey1	gvkey2	score	name1	name2	sic1	sic2	size1	size2
4796	2000	1690	5606	0.0314	APPLE INC	HP INC	3663	3570	60.079253	190.637477
4852	2000	1690	11399	0.0813	APPLE INC	WESTERN DIGITAL CORP	3663	3572	10.652736	15.988003
4884	2000	1690	12141	0.0930	APPLE INC	MICROSOFT CORP	3663	7372	44.120740	619.890226
4904	2000	1690	12142	0.0096	APPLE INC	ORACLE CORP	3663	7370	33.605576	79.457232
10644	2000	3226	14369	0.0143	COMCAST CORP	US CELLULAR CORP	4841	4812	40.733093	9.311580

To process these data with mapping methods, we first need to transform the edge list into a square matrix:

from evomap.preprocessing import edgelist2matrix
sim_mat, labels = edgelist2matrix(
    df_tnic_sample, score_var = 'score', id_var_i= 'name1', id_var_j= 'name2')

sim_mat.round(2)

array([[0.  , 0.  , 0.  , 0.03, 0.  , 0.09, 0.01, 0.  , 0.08],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.02, 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.  ],
       [0.03, 0.  , 0.  , 0.  , 0.  , 0.06, 0.1 , 0.  , 0.04],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.04, 0.  , 0.  , 0.  ],
       [0.09, 0.  , 0.  , 0.06, 0.04, 0.  , 0.08, 0.  , 0.06],
       [0.01, 0.  , 0.  , 0.1 , 0.  , 0.08, 0.  , 0.  , 0.02],
       [0.  , 0.02, 0.01, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.08, 0.  , 0.  , 0.04, 0.  , 0.06, 0.02, 0.  , 0.  ]])

As a result, we obtain a symmetric matrix of pairwise similarities.

import numpy as np 
print("Smallest matrix entry: {0:.2f} \n Largest matrix entry: {1:.2f}".format(np.min(sim_mat), np.max(sim_mat)))
print("Similarity between {0} and {1}: {2:.2f}".format(labels[5], labels[6], sim_mat[5,6]))
print("Similarity between {0} and {1}: {2:.2f}".format(labels[0], labels[3], sim_mat[0,3]))

Smallest matrix entry: 0.00 
 Largest matrix entry: 0.10
Similarity between MICROSOFT CORP and ORACLE CORP: 0.08
Similarity between APPLE INC and HP INC: 0.03
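To make the conversion from an edge list to a square matrix more concrete, here is a minimal sketch in plain Python and NumPy. It is not evomap's implementation of edgelist2matrix; the helper `edges_to_matrix` and the toy edges are hypothetical. It assumes that labels are sorted alphabetically (which matches the label order seen above) and that pairs missing from the edge list are filled with zero.

```python
import numpy as np

def edges_to_matrix(edges):
    """Build a symmetric matrix from (label_i, label_j, score) tuples."""
    # Collect every label that appears on either side of an edge.
    labels = sorted({name for i, j, _ in edges for name in (i, j)})
    position = {name: k for k, name in enumerate(labels)}
    mat = np.zeros((len(labels), len(labels)))
    for i, j, score in edges:
        mat[position[i], position[j]] = score
        mat[position[j], position[i]] = score  # mirror entry keeps the matrix symmetric
    return mat, labels

# Made-up toy edges, for illustration only.
toy_edges = [("A", "B", 0.3), ("A", "C", 0.1), ("C", "D", 0.7)]
toy_mat, toy_labels = edges_to_matrix(toy_edges)
print(toy_labels)
print(toy_mat)
```

Pairs that do not appear in the edge list (such as "B" and "D" here) stay at zero, which mirrors the many zero entries in the similarity matrix above.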

Step 2: Preprocessing#

Different mapping methods require different input data. Here, the input data consists of pairwise similarities. We will map them to 2D space via Classic Multidimensional Scaling (CMDS). CMDS, however, requires pairwise distances. Among other features, evomap.preprocessing provides transformations between these different types of relationship data.

One simple way to transform similarities to distances is by mirroring them:

from evomap.preprocessing import sim2diss
dist_mat = sim2diss(sim_mat, transformation= 'mirror')
print("Smallest matrix entry: {0:.2f} \n Largest matrix entry: {1:.2f}".format(np.min(dist_mat), np.max(dist_mat)))
print("Distance between {0} and {1}: {2:.2f}".format(labels[5], labels[6], dist_mat[5,6]))
print("Distance between {0} and {1}: {2:.2f}".format(labels[0], labels[3], dist_mat[0,3]))

Smallest matrix entry: 0.00 
 Largest matrix entry: 1.00
Distance between MICROSOFT CORP and ORACLE CORP: 0.92
Distance between APPLE INC and HP INC: 0.97
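The printed values are consistent with mirroring defined as one minus the similarity, with the diagonal set to zero (0.08 becomes 0.92, 0.03 becomes 0.97). The sketch below reproduces that arithmetic in NumPy. This reading is inferred from the output above, not taken from evomap's source, and the helper `mirror_similarities` is a hypothetical name.

```python
import numpy as np

def mirror_similarities(sim):
    """Turn similarities into dissimilarities as 1 - sim, with a zero diagonal."""
    diss = 1.0 - np.asarray(sim, dtype=float)  # new array, input is left untouched
    np.fill_diagonal(diss, 0.0)                # an object has zero distance to itself
    return diss

# Small illustrative similarity matrix.
sim_example = np.array([[0.00, 0.08, 0.03],
                        [0.08, 0.00, 0.00],
                        [0.03, 0.00, 0.00]])
diss_example = mirror_similarities(sim_example)
print(diss_example.round(2))
```

Under this assumption, pairs with zero similarity end up at the maximum distance of 1, which explains why the largest entry of the distance matrix is 1.00.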

Step 3: Mapping relationship data to lower-dimensional space#

With all input data in the right format, you can map it to lower-dimensional space. To do so, evomap.mapping provides implementations of several mapping methods.

Here, we apply (Classic) Multidimensional Scaling (also known as Principal Coordinates Analysis):

from evomap.mapping import CMDS
model = CMDS(n_dims = 2).fit(dist_mat)
map_coords = model.Y_

The resultant model output is a 2D array of shape (n_samples, 2) containing the map coordinates.

map_coords.shape

(9, 2)
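For readers who want to see what this method does, the following is a textbook sketch of classical MDS in NumPy: double-center the squared distances, take the leading eigenvectors, and scale them by the square roots of their eigenvalues. It is not evomap's CMDS class, and its output may differ from model.Y_ in sign, rotation, or scaling. The function `classical_mds` and the toy points are hypothetical.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Textbook classical MDS via double centering and eigendecomposition."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]         # largest eigenvalues first
    top_vals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy check: four points in the plane and their Euclidean distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist))
```

Because the toy distances come from points that really lie in two dimensions, the recovered map reproduces them exactly. Real dissimilarity data, such as the mirrored similarities above, will generally be reproduced only approximately.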

Step 4: Draw the map#

To visualize the estimated map coordinates, evomap.printer provides several functions (such as draw_map()), which can create highly customizable maps.

from evomap.printer import draw_map
draw_map(X = map_coords,
        label = labels,
        fig_size= (7,7))

[Figure: two-dimensional map of the nine sample firms, drawn with draw_map()]

Step 5: Evaluating maps#

Finally, evomap.metrics provides commonly used metrics to evaluate the resultant maps’ goodness-of-fit (such as the hitrate of nearest neighbor recovery):

from evomap.metrics import hitrate_score 
score = hitrate_score(
    D = dist_mat, X = map_coords, n_neighbors = 3, input_format = 'dissimilarity')

print("Hitrate of 3-nearest neighbor recovery (adjusted for random agreement): {0:.2f}".format(score))

Hitrate of 3-nearest neighbor recovery (adjusted for random agreement): 0.56
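The idea behind nearest-neighbor recovery can be sketched as follows: for each object, compare its k nearest neighbors in the input distances with its k nearest neighbors on the map, and average the share that overlaps. The sketch below computes only this raw overlap. It does not apply any adjustment for random agreement, so its values are not directly comparable to hitrate_score. The helpers `knn_indices` and `raw_hitrate` are hypothetical names.

```python
import numpy as np

def knn_indices(dist, k):
    """Indices of each row's k nearest neighbors, excluding the object itself."""
    d = np.array(dist, dtype=float)  # work on a copy
    np.fill_diagonal(d, np.inf)      # never count an object as its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def raw_hitrate(dist_input, coords, k):
    """Average share of input-space neighbors that are also map neighbors."""
    dist_map = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    nn_input = knn_indices(dist_input, k)
    nn_map = knn_indices(dist_map, k)
    hits = [len(set(a) & set(b)) / k for a, b in zip(nn_input, nn_map)]
    return float(np.mean(hits))

# Toy check: a map whose distances equal the input distances recovers all neighbors.
toy_coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [7.0, 2.0], [15.0, 5.0]])
toy_dist = np.linalg.norm(toy_coords[:, None, :] - toy_coords[None, :, :], axis=-1)
perfect = raw_hitrate(toy_dist, toy_coords, k=2)
print(perfect)
```

A raw hitrate of 1 means every object keeps all of its nearest neighbors on the map; lower values indicate that some neighborhoods were distorted by the mapping.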

Naturally, evomap becomes more useful beyond such a simple application.

For more complex use cases, check out the further examples.
