This document illustrates how to link Markit RED data with CRSP data
Identifiers from RED
Markit RED (RED) is the market standard for reference data in the credit markets. RED provides a unique 6-digit identifier, redcode
, for each entity in the database. In addition, it also carries several other common entity identifiers, such as 6-digit entity_cusip
, ticker
as well as company name strings.
Researchers can use these identifiers to link the CDS data with other data sources, such as CRSP and TRACE for equity and fixed income respectively.
Connecting with CRSP
We illustrate below how to link RED data to CRSP data using the new CIZ format of CRSP data. The logic would be the the same for the legacy SIZ format of CRSP data, just with different database syntax.
The primary linking key is through the 6-digit CUSIP. We also try to establish linkage through a secondary linking key, the ticker. However, it is important to emphasize here that the linking quality through ticker is fairly poor. As a result, we include an additional layer of quality check using the string comparison between the two databases' company names.
We strongly advise our users to carefully examine the linking output, and set their own quality criteria suitable for their individual research agenda.
SAS Illustration
/* ************************************* */;
/* Create Linking Table Between */;
/* Markit RED and CRSP */;
/* Author: Qingyi (Freda) Song Drechsler */;
/* Date: December 2022 */;
/* ************************************* */;
* No need to run the library definition below
* if you are running code on SAS studio;
libname markit '/wrds/markit/sasdata/red';
libname crspv2 ('/wrds/crsp/sasdata/a_stock_v2');
* Link with 6-digit CUSIP or Ticker;
* count how many cusip records per redcode;
proc sql;
create table _cnt as select distinct
redcode, count(*) as nrecords
from markit.redent
group by redcode;
quit;
proc means data = _cnt;
var nrecords;
run;
*** Finding: each redcode has only one record;
*** so it's a header record;
* First Route - Link with 6-digit cusip;
proc sql;
create table _cdscrsp1 as select distinct
a.redcode, a.entity_cusip, a.ticker, a.referenceentity,
a.shortname, a.entity_type,
b.permno, b.hdrCusip, b.ticker as crspTicker, b.permco, b.issuernm
from markit.redent as a
left join crspv2.stksecurityinfohdr as b
on a.entity_cusip = substr(b.hdrCusip, 1,6);
quit;
* store records linked through CUSIP;
data _cdscrsp_cusip;
set _cdscrsp1;
where permno ne .;
flg ='cusip';
run;
* keep the unmatched for another route;
data _cdscrsp2;
set _cdscrsp1;
where permno = .;
drop permno hdrcusip crspticker permco issuernm;
run;
* Second Route - Link with ticker;
* This yields much worse linking result;
proc sql;
create table _cdscrsp3 as select distinct a.*,
b.permno, b.hdrCusip, b.ticker as crspTicker, b.permco, b.issuernm
from _cdscrsp2 as a
left join crspv2.stksecurityinfohdr as b
on a.ticker = b.ticker;
quit;
* store linked output through Ticker;
data _cdscrsp_ticker;
set _cdscrsp3;
where permno ne .;
flg = 'ticker';
run;
* Combine the linked results through CUSIP and Ticker;
data cdscrsp;
length flg $10.;
set _cdscrsp_cusip _cdscrsp_ticker;
* calculate spelling distance of company names from CRSP and Red;
dist = spedis(upcase(shortname), upcase(issuernm));
run;
* Organize output;
data cdscrsp;
set cdscrsp;
if flg = 'cusip' and dist <= 40 then score = 0;
if flg = 'cusip' and dist > 40 then score = 1;
if flg = 'ticker' and dist <=20 then score = 2;
if flg = 'ticker' and dist >20 then score = 3;
run;
Sample SAS Code Output
As CUSIP is a fairly reliable linking key, we set a relatively relaxed requirement for spelling distance between paired company names (a low distance number implies two strings are highly similar). likewise, as a ticker linking often yields noisy results, we impose a stricter requirement of the company name comparison. As shown in the sample output below, there are quite a few records linked through ticker that yield very different company names.
We would like to emphasize here that the code above is simply for illustration, and we highly recommend users adapt the code and the criteria to meet their own research goals.
flg | redcode | entity_cusip | ticker | shortname | PERMNO | HdrCUSIP | crspTicker | IssuerNm | dist | score |
---|---|---|---|---|---|---|---|---|---|---|
cusip | 001AAV | 00191U | ASGNINC | ASGN Inc | 77917 | 00191U10 | ASGN | ASGN INC | 0 | 0 |
cusip | 001AEC | 001957 | T | AT&T Corp. | 10401 | 00195750 | T | A T & T CORP | 25 | 0 |
cusip | 002AA6 | 002824 | ABT | Abbott Labs | 20482 | 00282410 | ABT | ABBOTT LABORATORIES | 36 | 0 |
cusip | 002AF5 | 00287Y | ABBVINC | ABBVIE INC | 13721 | 00287Y10 | ABBV | ABBVIE INC | 0 | 0 |
cusip | 002AHF | 002896 | ABFC | ABERCROMBIE & FITCH CO | 10533 | 00289610 | ABERCROMBIE & FITCH CO | 0 | 0 | |
ticker | 004CC9 | 4845 | ACOM | ACOM CO LTD | 15449 | 04639010 | ACOM | ASTROCOM CORP | 54 | 3 |
ticker | 004DC3 | 4930 | ACT | Activision Inc | 10532 | 04257310 | ACT | ARNOLD CONSTABLE CORP | 96 | 3 |
ticker | 008B49 | 8318 | ATAC | Aftermarket Tech Corp | 84291 | 00211W10 | ATAC | A T C TECHNOLOGY CORP | 57 | 3 |
ticker | 014B98 | 13817 | AA | Alcoa Inc. | 16347 | 01387210 | AA | ALCOA CORP | 40 | 3 |
Python Illustration
## ######################################### ##
## Link between Markit RED and CRSP ##
## Relies on 6-digit CUSIP & Ticker ##
## Author: Qingyi (Freda) Song Drechsler ##
## Date: December 2022 ##
## ######################################### ##
import wrds
import os
import numpy as np
import pandas as pd
# display all columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import pickle as pkl
from fuzzywuzzy import fuzz
###################
# Connect to WRDS #
###################
conn=wrds.Connection()
### Get red entity information
redent = conn.get_table(library='markit', table='redent')
# Quick check to confirm that it is the header information
# i.e. each redcode is mapped to only one entity
# and doesn't contain historical records
redcnt = redent.groupby(['redcode'])['entity_cusip'].count().reset_index().rename(columns={'entity_cusip':'cusipCnt'})
redcnt.cusipCnt.max()
### Each redcode has only one entity_cusip mapped
### so it is indeed only header record
### Get information from CRSP header table
crspHdr = conn.raw_sql('select permno, permco, hdrcusip, ticker, issuernm from crsp.stksecurityinfohdr')
crspHdr['cusip6'] = crspHdr.hdrcusip.str[:6]
crspHdr = crspHdr.rename(columns={'ticker': 'crspTicker'})
### First Route - Link with 6-digit cusip
_cdscrsp1 = pd.merge(redent, crspHdr, how='left', left_on='entity_cusip', right_on='cusip6')
# store linked results through CUSIP
_cdscrsp_cusip = _cdscrsp1.loc[_cdscrsp1.permno.notna()].copy()
_cdscrsp_cusip['flg'] = 'cusip'
# continue to work with non-linked records
_cdscrsp2 = _cdscrsp1.loc[_cdscrsp1.permno.isna()].copy().drop(columns=['permno','permco','hdrcusip','crspTicker','issuernm','cusip6'])
### Second Route - Link with Ticker
_cdscrsp3 = pd.merge(_cdscrsp2, crspHdr, how = 'left', left_on = 'ticker', right_on='crspTicker')
_cdscrsp_ticker = _cdscrsp3.loc[_cdscrsp3.permno.notna()].copy()
_cdscrsp_ticker['flg']='ticker'
### Consolidate Output and Company Name Distance Check
cdscrsp = _cdscrsp_cusip.append(_cdscrsp_ticker)
# Check similarity ratio of company names
crspNameLst= cdscrsp.issuernm.str.upper().tolist()
redNameLst = cdscrsp.shortname.str.upper().tolist()
len(crspNameLst), len(redNameLst)
nameRatio = [] # blank list to store fuzzy ratio
for i in range(len(redNameLst)):
ratio = fuzz.partial_ratio(redNameLst[i], crspNameLst[i])
nameRatio.append(ratio)
cdscrsp['nameRatio']=nameRatio
Sample Python Code Output
Notice that in Python with the package we used, if two strings are completely the same, the nameRatio
yields 100, the opposite of the spelling distance from the SAS version.
As in the SAS version, ticker-linked pair tends to yield quite different company names, and hence lower nameRatio
. We encourage our users to explore the code and output to fine tune the criteria.
redcode | entity_cusip | ticker | shortname | permno | hdrcusip | crspTicker | issuernm | flg | nameRatio |
---|---|---|---|---|---|---|---|---|---|
9CCC54 | 939322 | WM | WA Mut Inc | 11955 | 94106L10 | WM | WASTE MANAGEMENT INC DEL | ticker | 60 |
04A85I | 04623F | ASMS | Asteroid Merger Sub LLC | 50219 | 4573510 | ASMS | ASSOCIATED MORTGAGE INVESTORS | ticker | 48 |
237CAC | 216648 | COOCS | Cooper Cos Inc | 65541 | 21664840 | COO | COOPER COMPANIES INC | cusip | 71 |
34AIF9 | 319963 | FDC | 1st Data Corp | 77546 | 31996310 | FDC | FIRST DATA CORP | cusip | 92 |
5A835G | 55305B | MIHOM | M I HOMES INC | 79909 | 55305B10 | MHO | M I HOMES INC | cusip | 100 |
NN7749 | N07045 | ASMIF | ASM Intl N V | 11077 | N0704510 | ASMI | A S M INTERNATIONAL N V | cusip | 67 |
DG6A9H | D3372F | HK | HECKLER KOCH GMBH | 87054 | 71649510 | HK | PETROHAWK ENERGY CORP | ticker | 36 |
0F886X | 05351W | AVANGIN | Avangrid Inc | 15859 | 05351W10 | AGR | AVANGRID INC | cusip | 100 |