This document illustrates how to link IBES to other Refinitiv databases
Identifiers within IBES
IBES is arguably the most widely used estimates data on WRDS platform. It covers the forecasts and actual earnings of both US and International companies. IBES carries two identifier tables:
ibes.idfrom the detail history package, and
ibes.idsumfrom the summary history package
These two tables contain the common identifiers used for companies, such as IBES Ticker, official ticker, CUSIP/SEDOL and company names. They can be found on WRDS server at:
While these two tables are very similar, there's slight difference in the coverage. Based on the 2022/09 data, there are 82,995 unique IBES Tickers in the
ibes.id table, compared to 73,920 unique IBES Tickers in the
ibes.idsum table. If we further analyze the overlapping coverage, there are 354 unique IBES Tickers that are only in
ibes.idsum but not in
ibes.id, compared to 9,429 unique IBES Tickers that are only in
ibes.id but not in
Therefore, for the purpose of this demonstration, we will be using
ibes.id table but users are welcome to explore using
ibes.idsum if they deem that approach more appropriate for their research task.
The table below shows sample records of the identifier columns from the
Parsing out CUSIP and SEDOL
Researchers can find in this table the IBES unique identifier, TICKER, as well as other common identifiers such as CUSIP, Official TICKER and company names. Here is an important caveat that researchers should be aware of: according to the IBES documentation, the variable CUSIP in the
ibes.id table is actually dubbed as CUSIP or SEDOL:
" For US companies, the value that appears in the CUSIP/SEDOL field will always be a CUSIP. SEDOLs are used for non-US companies."
Therefore, we can rely on this CUSIP column in the
ibes.id table to extract CUSIP information for US companies, and SEDOL information for international firms. Furthermore, IBES explains how one should parse out the SEDOL information from the the CUSIP column in
"The SEDOL field consists of a country code followed by the first six digits of the official SEDOL (the last digit of a SEDOL is a check digit and is not used by I/B/E/S)."
Hence, we can parse out SEDOL information by taking digit 3 through 8 of this particular column. Below is an example of how SEDOL is parsed out from the CUSIP column:
Alternatively, users may consider taking advantage of the COUNTRY column included in the
ibes.idsum table for the pre-parsed two digit country code when slicing out the SEDOL information from the CUSIP column.
Connecting with Refinitiv
To connect the universe of IBES companies with other Refinitiv data, researchers should use CUSIP as the linking key for US companies and SEDOL for international firms.
On the Refintiv front, users can rely on the
vw_securitymasterx table to look up company identifier information such as CUSIP and SEDOL. For a detailed discussion of the Refinitiv master tables and how to link within the Refinitiv data system, please refer to the Linking within Refinitiv documentation.
More on Parsing Out SEDOL
Before diving into the actual reference code, there is yet another caveat of IBES's CUSIP column from
ibes.id table that researchers should be aware of:
If we read in between the lines of the statement defining this variable listed above, while US companies always report CUSIP in this particular column, it is not necessarily the case that all international companies report SEDOL.
As a matter of fact, it is quite common for international companies (USFIRM = 0) to report CUSIP instead of SEDOL in this column (see table below for several examples). Hence, one can not naively rely on the dummy variable USFIRM to determine if the content of the CUSIP column is indeed CUSIP or SEDOL.
|A1G1||G0259M10||ALD||ALLIED GOLD PLC||0|
|AAC1||01855R10||AAC||ALLIANCE COM 'B'||0|
Given this data irregularity, when linking IBES with the rest of Refinitiv, we will opt to first treat all records in the CUSIP column as CUSIP: in other words, we first link all entities using the CUSIP column presuming it is indeed CUSIP. The logic is, given the fact that CUSIP is 8-digit long and a combination of letters and numbers, it is highly unlikely that a SEDOL with leading country code from
ibes.id table could accidentally be mapped with a valid CUSIP from Refinitiv's
Once all the records mapped with presumed CUSIP are taken out, we can then parse the remaining records following the SEDOL convention, assuming they meet the condition of having leading two-digit country code.
Reference SAS Code
Below is SAS code to illustrate the linking logic described above. It is meant as a reference for researchers to start their own linking procedure. Please feel free to modify the code to suite your research design.
/* *********************************************** */ /* Create Linking Table Between IBES and Refinitiv */ /* Relies on CUSIP and SEDOL identifiers */ /* IBES ID Table: id */ /* Refinitiv ID Table: vw_securitymasterx */ /* Author: Qingyi (Freda) Song Drechsler */ /* Date: October 2022 */ /* *********************************************** */ /* Link US Companies by CUSIP */; * Link through CUSIP; proc sql; create table out1 as select distinct a.*, b.*, b.cusip as RefCUSIP from ibes.id as a left join trcommon.vw_securitymasterx as b on a.cusip = b.cusip having a.cusip ne ''; quit; * Store records linked through CUSIP and flag accordingly; data ibestfn1; set out1; where seccode ne .; linkflag = 'CUSIP'; run; * Keep unlinked records; data out1; set out1; where seccode = .; keep ticker cusip oftic cname dilfac pdi ccopcf mscip uai usfirm sdates; run; * Link through PrevCUSIP; proc sql; create table out2 as select distinct a.*, b.*, b.cusip as RefCUSIP from out1 as a left join trcommon.vw_securitymasterx as b on a.cusip = b.prevcusip having a.cusip ne ''; quit; * Store records linked through PrevCUSIP and flag accordingly; data ibestfn2; set out2; where seccode ne .; linkflag = 'PrevCUSIP'; run; * Keep unlinked records; data out2; set out2; where seccode = .; keep ticker cusip oftic cname dilfac pdi ccopcf mscip uai usfirm sdates; run; /* Link with SEDOL */; * Parse SEDOL from the CUSIP column; * Records need to have two digit leading country code; * And SEDOL is the next six digits; data out2; set out2; if anyalpha(substr(CUSIP, 1, 2)) and anyalpha(substr(CUSIP, 2, 3)) then SEDOL = substr(CUSIP, 3); else SEDOL = ''; run; * Link through SEDOL; proc sql; create table out3 as select distinct a.*, b.*, b.cusip as RefCUSIP from out2 as a left join trcommon.vw_securitymasterx as b on a.SEDOL = b.SEDOL having a.SEDOL ne ''; quit; * Store records linked through SEDOL and flag accordingly; data ibestfn3; set out3; where seccode ne .; linkflag ='SEDOL'; run; * Keep unlinked records; data out3; set out3; where seccode = .; keep ticker cusip oftic cname dilfac pdi ccopcf mscip uai usfirm sdates SEDOL; run; * Link through PrevSEDOL; proc sql; create table out4 as select distinct a.*, b.*, b.cusip as RefCUSIP from out3 as a left join trcommon.vw_securitymasterx as b on a.SEDOL = b.PrevSEDOL having a.SEDOL ne ''; quit; * Store records linked through PrevSEDOL and flag accordingly; data ibestfn4; set out4; where seccode ne .; linkflag ='PrevSEDOL'; run; * Keep unlinked records; data out4; set out4; where seccode = .; keep ticker cusip oftic cname dilfac pdi ccopcf mscip uai usfirm sdates SEDOL; run; * Link through Sedol2 ; proc sql; create table out5 as select distinct a.*, b.*, b.cusip as RefCUSIP from out4 as a left join trcommon.vw_securitymasterx as b on a.SEDOL = b.SEDOL2 having a.SEDOL ne ''; quit; data ibestfn5; set out5; where seccode ne .; linkflag ='SEDOL2'; run; data out5; set out5; where seccode = .; keep ticker cusip oftic cname dilfac pdi ccopcf mscip uai usfirm sdates SEDOL; run; * Link through PrevSEDOL2; proc sql; create table out6 as select distinct a.*, b.*, b.cusip as RefCUSIP from out5 as a left join trcommon.vw_securitymasterx as b on a.SEDOL = b.prevSEDOL2 having a.SEDOL ne ''; quit; data ibestfn6; set out6; where seccode ne .; linkflag ='PrevSEDOL2'; run; /* Consolidate Output */; data ibestfn; length LinkFlag $12.; set ibestfn1 ibestfn2 ibestfn3 ibestfn4 ibestfn5 ibestfn6; label RefCUSIP = 'Refinitiv CUSIP'; run;
Discussion of Linking Outcome
After consolidating the linked outputs using CUSIP and SEDOL, we examine the linking output here and also provide a discussion on additional quality improvement.
/* Analyze Linking Results */; proc sql; create table idlnk as select distinct ticker from ibestfn; quit; proc sql; create table idtot as select distinct ticker from ibes.id; quit;
As both IBES and Refinitiv provide company names in the respective identifier table, we also calculate the spelling distance in SAS comparing the two linked entity names. One can calculate similarity measure in Python or R.
data ibestfn; set ibestfn; dist = spedis(upcase(CNAME), upcase(NAME)); run;
Below are some sample outputs showing different levels of similarity of linked entity names.
|CUSIP||0||87482X10||TLMR||TALMER BANCORP||1||11014856||...||TALMER BANCORP INC COM||28|
|CUSIP||11||G0702112||HELI||CHC GRP||1||11114286||...||CHC GROUP LTD SHS NEW||100|
|CUSIP||001C||4022120||ARGS||ARGOS||1||11009913||...||ARGOS THERAPEUTICS INC COM NEW||250|
|CUSIP||001K||45780R10||IBP||IBP||1||11129271||...||INSTALLED BLDG PRODS INC COM||416|
The case of company with official ticker TLMR yields relatively short spelling distance between the two versions of company names,
dist = 28, as both IBES and Refinitv spell out the two key components of the company name: "TALMER" and "BANCORP".
The three other cases listed above all yield quite large spelling distance at a first glance, but with closer examination of the company names, one can be fairly sure that they are indeed the same companies. The high value of spelling distance is caused by adoption of abbreviation in one but not the other, and listing the full company name versus only the initial of each word.
Historical versus Header
While IBES provides historical CUSIP/SEDOL with corresponding date ranges, as indicated by time periods in between consecutive SDATES, such data depth is not available in Refinitiv's
vw_securitymasterx table. In this linking exercise, we are agnostic to the date ranges, nor do we differentiate strictly between header and historical CUSIP/SEDOL. As long as one IBES's CUSIP/SEDOL finds an exact match in Refinitiv, whether it's the current or previous CUSIP/SEDOL, we deem this a successfully linked pair.
Of course, for more rigorous linking exercise, users may want to consider the additional dimension of date ranges provided by IBES data. We leave this to our users' own devices.
With the linking table built between IBES identifiers and Refinitiv identifier,
SecCode, users can now use
SecCode as intermediary to connect with other Refinitiv databases.
We illustrate below how to link IBES to Refinitiv's DataStream data using the linking table as a bridge.
/* Utilizing the Linked IDs */; proc sql; create table sample as select distinct a.ticker, a.oftic, a.cusip, a.seccode, a.name, b.vencode, c.DsCode, c.DsSecCode, c.DsSecName from ibestfn (where=(oftic = 'AAPL')) as a, /* select the case for Apple */ trcommon.vw_securitymappingx (where=(VenType=33)) as b, /* VenType = 33 is DataStream in Refinitiv */ tfn.wrds_ds_names as c where a.seccode = b.seccode and b.vencode = c.infocode; quit;
In this code,
- we extract the IBES identifiers, TICKER, OFTIC, CUSIP as well as the linked
SecCodeand company name from the linking table for the case of Apple;
- then join with Refinitiv's
vw_securitymappingxtable to look up DataStream specific
VenType= 33 (for details on VenTypes and their corresponding databases, please refer to the Refinitiv documentation here);
- lastly, link with DataStream specific tables, in this example,
wrds_ds_names, to extract DataStream identifiers,
DsSecCode, and etc.
The code snippet above generates the following output, containing identifier information from IBES, Refinitiv master file as well as DataStream.
|AAPL||AAPL||3783310||6027||APPLE INC COM||72990||992816||29765||APPLE|
Researchers can then use these DataStream identifiers to continue with data work with various DataStream data components.