This document illustrates how to connect IBES and FactSet data
Identifiers within IBES
IBES carries both CUSIP
and SEDOL
identifiers. Please refer to the Linking IBES to other Refinitiv databases documentation for a detailed discussion on understanding the IBES variable CUSIP
, and how to parse out SEDOL
information from this field.
Identifiers within FactSet
FactSet is a broad database that carries many different data components, and hence it maintains several different identifiers.
At the security level, there are two main identifier systems:
- current
FSYM_ID
, and - legacy
FS_PERM_SEC_ID
For a more detailed discussion on various identifiers within the FactSet system, please refer to the FactSet CRSP Linking Overview documentation.
Connecting with FactSet
Before diving into the actual reference code, we would like to first outline the flow on how to link IBES data with the FactSet, as it will make the next section's programs easier to follow.
To connect the universe of IBES companies with FactSet data, researchers should use CUSIP as linking key for US companies, and SEDOL for international firms.
On the FactSet front:
- For the legacy identifier
FS_PERM_SEC_ID
, users can look up its corresponding historical- CUSIP using table
h_security_cusip_id
- SEDOL using table
h_security_sedol_id
- CUSIP using table
- For the current identifier
FSYM_ID
, users can look up its corresponding header- CUSIP using table
sym_cusip
- SEDOL using
sym_sedol
- CUSIP using table
Here we would like to emphasize that linking through the current identifier, FSYM_ID
, yields only the current/header CUSIP and SEDOL matching.
Lastly, as some researchers might want to compare linked entity pairs using company names as sanity check, we will also extract FactSet's company names, proper_name
, from the following locations:
- legacy
FS_PERM_SEC_ID
:- table
h_security_cusip
for linking company names via historicalCUSIP
- table
h_security_sedol
for linking company names via historicalSEDOL
- table
- current
FSYM_ID
:- table
sym_coverage
for linking company names via headerCUSIP
orSEDOL
- table
Reference SAS Code
This section a sample SAS code illustrating the logic of linking IBES identifiers with FactSet IDs following the same linking logic as outlined above. We strongly encourage users read through the code to make sure the linking logic is clear. The sample code is meant for illustration purpose only, and users should adapt the code to suite their specific research design.
Please make sure to select "SAS" in the kernel list when trying to run the code in Jupyter Notebook environment.
Jupyter Notebook: ResearchApps/LinkIBESFactSet.ipynb Download
Reference Python Code
The Python code below performs the same task as the SAS code listed above. Again, the code is meant for illustration purpose only, and researchers should adapt the code to suite their specific research design.
Jupyter Notebook: ResearchApps/ibes_factset.ipynb Download
Discussion of Linking Outcome
Here we briefly discuss the linking outcome using the above sample code. For brevity, we will focus on the linking outcome when using the Python code.
Link Rate
Using the historical CUSIP/SEDOL matching approach to link IBES ID with FactSet's FS_PERM_SEC_ID
, as outlined in Part 1 of the sample code above, we are able to link 74,882 out of the 82,995 unique IBES TICKER with a FS_PERM_SEC_ID
, roughly 90% linking rate, using the data available as of Q3 2022.
Using the header CUSIP/SEDOL approach, as outlined in Part 2 of the sample codes, we are able to achieve a slightly higher matching percentage: linking 79,299 out of the 82,995 unique IBES TICKERs with a FSYM_ID
, over 95% linking rate, using the same Q3 2022 data vintage.
Company Name Comparison
A natural sanity check on the linking quality can be performed by comparing the linked pair's names in the respective database. In SAS code, we rely on the spelling distance function, spedis
. In Python code, we utilize a common package, fuzzywuzzy
, to compare the Levenshtein distance between the two company name strings. We use the partial_ratio
function but researchers can explore other variations if they need stricter or more relaxed distance calculation.
Below is a random sample of the linked output through historical CUSIP\SEDOL approach.
ticker | cusip | oftic | cname | usfirm | fs_perm_sec_id | flag | proper_name | nameRatio |
---|---|---|---|---|---|---|---|---|
@WNB | FCB3Q73G | 601058 | SAILUN CO LTD | 0 | RN2KBN-S-CN | sedol | Sailun Jinyu Group Co., Ltd. Class A | 62 |
@4CF | LCBGSHDB | IACSA | INVERSIONES AGRI | 0 | BCKLYK-S-CL | sedol | Inversiones Agricolas y Comerciales S.A. | 100 |
@54M | FC664994 | 600435 | NNCC | 0 | SYHFWG-S-CN | sedol | North Navigation Control Technology Co., Ltd. ... | 50 |
DLPHW | G2709G10 | DLPH | DELPHI TECHNO | 1 | T3776L-S-US | cusip | Delphi Technologies Plc | 100 |
@HXE | FC680268 | 900933 | HUAXINCEM | 0 | LLWJHJ-S-CN | sedol | Huaxin Cement Co., Ltd. Class B | 89 |
Note that a high fuzzywuzzy
similarity score, nameRatio
, indicates high similarity between the paired company names, which is the opposite of the SAS string comparison function spedis
.
Using the sample output above, company IBES ticker = @4CF reports cname
as "INVERSIONES AGRI", an abbreviated version in IBES. In FactSet, the proper_name
shows up as "Inversiones Agricolas y Comerciales S.A.", a more complete version. Since we opt to use fuzzywuzzy
's partial_ratio
function, these two strings return similarity score = 100 in spite of slight difference in the actual text string. Users are encouraged to explore other similarity functions to fine tune their company name comparisons.
Generally speaking, a similarity ratio of 50 means two strings are not that close, but in the case of our comparison, the company with IBES ticker = @54M reports cname
as "NNCC" in IBES, while in FactSet, the proper_name
shows up as "North Navigation Control Technology Co., Ltd. ...". The concatenated initial of the FactSet name is exactly the IBES company name. Hence even at low similarity ratio of 50, these two are still solid match.
Usage Case
With the linking table built between IBES identifiers and FactSet identifiers, fs_perm_sec_id
or fsym_id
, users can now use these FactSet identifiers as intermediary to connect with other FactSet databases.
We illustrate below how to link IBES to FactSet's Annual Fiscal V2 and V3 data using the linking table as a bridge.
Legacy Identifier fs_perm_sec_id
The legacy identifier fs_perm_sec_id
is used among the FactSet legacy fundamental data, commonly labeled as V2.
/* Using the legacy fs_perm_sec_id */;
proc sql;
create table sample as select distinct
a.ticker, a.oftic, a.cusip, a.proper_name, a.fs_perm_sec_id,
b.date, b.ff_assets, b.ff_sales
from ibessec (where=(oftic = 'AAPL')) as a,
/* select the case for Apple */
factset.wrds_fund_af_usc as b
/* link with legacy fundamental data (v2) */
where a.fs_perm_sec_id = b.fs_perm_sec_id;
quit;
The code snippet above connects the IBES identifiers (IBES ticker, official ticker and cusip) with FactSet's legacy annual fiscal data (V2) to extract Apple's annual total assets and sales. The table below reports a partial output of the code.
TICKER | OFTIC | CUSIP | PROPER_NAME | FS_PERM_SEC_ID | DATE | FF_ASSETS | FF_SALES |
---|---|---|---|---|---|---|---|
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S-US | 20100930 | 75183 | 65067 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S-US | 20110930 | 116371 | 108598 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S-US | 20120930 | 176064 | 155971 |
Current Identifier fsym_id
The current identifier fsym_id is used among the new FactSet fundamental data, commonly labeled as V3.
/* Using the current fsym_id */;
proc sql;
create table sample1 as select distinct
a.ticker, a.oftic, a.cusip, a.proper_name, a.fsym_id,
b.fsym_primary_listing_id,
c.date, c.ff_sales, c.ff_assets
from ibessym (where=(oftic = 'AAPL')) as a,
/* select the case for Apple */
factset.wrds_securities_v3 as b,
/* look up associated fsym_primary_listing_id for data extraction */
factset.wrds_fund_af_usc_v3 as c
/* actual annual fiscal data from fundamental data v3 */
where a.fsym_id = b.fsym_id
and b.fsym_primary_listing_id = c.fsym_id
;
quit;
The code snippet above is slightly more complicated than the legacy case, as it requires one additional step to look up the fsym_primary_listing_id
of a given fsym_id
in order to extract fiscal or pricing related data for the company. And the output is below.
TICKER | OFTIC | CUSIP | PROPER_NAME | FSYM_ID | FSYM_PRIMARY_LISTING_ID | DATE | FF_SALES | FF_ASSETS |
---|---|---|---|---|---|---|---|---|
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S | MH33D6-R | 20170930 | 228572 | 375319 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S | MH33D6-R | 20180930 | 265809 | 365725 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S | MH33D6-R | 20190930 | 259968 | 338516 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S | MH33D6-R | 20200930 | 274150 | 323888 |
AAPL | AAPL | 3783310 | Apple Inc. | R85KLC-S | MH33D6-R | 20210930 | 365817 | 351002 |