Linking IBES with FactSet

This document illustrates how to connect IBES and FactSet data

Identifiers within IBES

IBES carries both CUSIP and SEDOL identifiers. Please refer to the Linking IBES to other Refinitiv databases documentation for a detailed discussion on understanding the IBES variable CUSIP, and how to parse out SEDOL information from this field.

Top of Section

Identifiers within FactSet

FactSet is a broad database that carries many different data components, and hence it maintains several different identifiers.

At the security level, there are two main identifier systems:

  • current FSYM_ID, and
  • legacy FS_PERM_SEC_ID

For a more detailed discussion on various identifiers within the FactSet system, please refer to the FactSet CRSP Linking Overview documentation.

Top of Section

Connecting with FactSet

Before diving into the actual reference code, we would like to first outline the flow on how to link IBES data with the FactSet, as it will make the next section's programs easier to follow.

To connect the universe of IBES companies with FactSet data, researchers should use CUSIP as linking key for US companies, and SEDOL for international firms.

On the FactSet front:

  • For the legacy identifier FS_PERM_SEC_ID, users can look up its corresponding historical
    • CUSIP using table h_security_cusip_id
    • SEDOL using table h_security_sedol_id
  • For the current identifier FSYM_ID, users can look up its corresponding header
    • CUSIP using table sym_cusip
    • SEDOL using sym_sedol

Here we would like to emphasize that linking through the current identifier, FSYM_ID, yields only the current/header CUSIP and SEDOL matching.

Lastly, as some researchers might want to compare linked entity pairs using company names as sanity check, we will also extract FactSet's company names, proper_name, from the following locations:

  • legacy FS_PERM_SEC_ID:
    • table h_security_cusip for linking company names via historical CUSIP
    • table h_security_sedol for linking company names via historical SEDOL
  • current FSYM_ID:
    • table sym_coverage for linking company names via header CUSIP or SEDOL

Top of Section

Reference SAS Code

This section a sample SAS code illustrating the logic of linking IBES identifiers with FactSet IDs following the same linking logic as outlined above. We strongly encourage users read through the code to make sure the linking logic is clear. The sample code is meant for illustration purpose only, and users should adapt the code to suite their specific research design.

Please make sure to select "SAS" in the kernel list when trying to run the code in Jupyter Notebook environment.

Jupyter Notebook: ResearchApps/LinkIBESFactSet.ipynb Download

Top of Section

Reference Python Code

The Python code below performs the same task as the SAS code listed above. Again, the code is meant for illustration purpose only, and researchers should adapt the code to suite their specific research design.

Jupyter Notebook: ResearchApps/ibes_factset.ipynb Download

Top of Section

Discussion of Linking Outcome

Here we briefly discuss the linking outcome using the above sample code. For brevity, we will focus on the linking outcome when using the Python code.

Using the historical CUSIP/SEDOL matching approach to link IBES ID with FactSet's FS_PERM_SEC_ID, as outlined in Part 1 of the sample code above, we are able to link 74,882 out of the 82,995 unique IBES TICKER with a FS_PERM_SEC_ID, roughly 90% linking rate, using the data available as of Q3 2022.

Using the header CUSIP/SEDOL approach, as outlined in Part 2 of the sample codes, we are able to achieve a slightly higher matching percentage: linking 79,299 out of the 82,995 unique IBES TICKERs with a FSYM_ID, over 95% linking rate, using the same Q3 2022 data vintage.

Company Name Comparison

A natural sanity check on the linking quality can be performed by comparing the linked pair's names in the respective database. In SAS code, we rely on the spelling distance function, spedis. In Python code, we utilize a common package, fuzzywuzzy, to compare the Levenshtein distance between the two company name strings. We use the partial_ratio function but researchers can explore other variations if they need stricter or more relaxed distance calculation.

Below is a random sample of the linked output through historical CUSIP\SEDOL approach.

ticker cusip oftic cname usfirm fs_perm_sec_id flag proper_name nameRatio
@WNB FCB3Q73G 601058 SAILUN CO LTD 0 RN2KBN-S-CN sedol Sailun Jinyu Group Co., Ltd. Class A 62
@4CF LCBGSHDB IACSA INVERSIONES AGRI 0 BCKLYK-S-CL sedol Inversiones Agricolas y Comerciales S.A. 100
@54M FC664994 600435 NNCC 0 SYHFWG-S-CN sedol North Navigation Control Technology Co., Ltd. ... 50
DLPHW G2709G10 DLPH DELPHI TECHNO 1 T3776L-S-US cusip Delphi Technologies Plc 100
@HXE FC680268 900933 HUAXINCEM 0 LLWJHJ-S-CN sedol Huaxin Cement Co., Ltd. Class B 89

Note that a high fuzzywuzzy similarity score, nameRatio, indicates high similarity between the paired company names, which is the opposite of the SAS string comparison function spedis.

Using the sample output above, company IBES ticker = @4CF reports cname as "INVERSIONES AGRI", an abbreviated version in IBES. In FactSet, the proper_name shows up as "Inversiones Agricolas y Comerciales S.A.", a more complete version. Since we opt to use fuzzywuzzy's partial_ratio function, these two strings return similarity score = 100 in spite of slight difference in the actual text string. Users are encouraged to explore other similarity functions to fine tune their company name comparisons.

Generally speaking, a similarity ratio of 50 means two strings are not that close, but in the case of our comparison, the company with IBES ticker = @54M reports cname as "NNCC" in IBES, while in FactSet, the proper_name shows up as "North Navigation Control Technology Co., Ltd. ...". The concatenated initial of the FactSet name is exactly the IBES company name. Hence even at low similarity ratio of 50, these two are still solid match.

Usage Case

With the linking table built between IBES identifiers and FactSet identifiers, fs_perm_sec_id or fsym_id, users can now use these FactSet identifiers as intermediary to connect with other FactSet databases.

We illustrate below how to link IBES to FactSet's Annual Fiscal V2 and V3 data using the linking table as a bridge.

Legacy Identifier fs_perm_sec_id

The legacy identifier fs_perm_sec_id is used among the FactSet legacy fundamental data, commonly labeled as V2.

/* Using the legacy fs_perm_sec_id */;
proc sql;
 create table sample as select distinct 
 a.ticker, a.oftic, a.cusip, a.proper_name, a.fs_perm_sec_id,
 b.date, b.ff_assets, b.ff_sales
 from ibessec (where=(oftic = 'AAPL')) as a,
 /* select the case for Apple */

 factset.wrds_fund_af_usc as b
/* link with legacy fundamental data (v2) */
 where a.fs_perm_sec_id = b.fs_perm_sec_id;
quit;

The code snippet above connects the IBES identifiers (IBES ticker, official ticker and cusip) with FactSet's legacy annual fiscal data (V2) to extract Apple's annual total assets and sales. The table below reports a partial output of the code.

TICKER OFTIC CUSIP PROPER_NAME FS_PERM_SEC_ID DATE FF_ASSETS FF_SALES
AAPL AAPL 3783310 Apple Inc. R85KLC-S-US 20100930 75183 65067
AAPL AAPL 3783310 Apple Inc. R85KLC-S-US 20110930 116371 108598
AAPL AAPL 3783310 Apple Inc. R85KLC-S-US 20120930 176064 155971

Current Identifier fsym_id

The current identifier fsym_id is used among the new FactSet fundamental data, commonly labeled as V3.

/* Using the current fsym_id */;
proc sql;
 create table sample1 as select distinct 
 a.ticker, a.oftic, a.cusip, a.proper_name, a.fsym_id,
 b.fsym_primary_listing_id,
 c.date, c.ff_sales, c.ff_assets
 from ibessym (where=(oftic = 'AAPL')) as a,
 /* select the case for Apple */

 factset.wrds_securities_v3 as b,
 /* look up associated fsym_primary_listing_id for data extraction */

 factset.wrds_fund_af_usc_v3 as c
 /* actual annual fiscal data from fundamental data v3 */

 where a.fsym_id = b.fsym_id
 and b.fsym_primary_listing_id = c.fsym_id
 ;
quit;

The code snippet above is slightly more complicated than the legacy case, as it requires one additional step to look up the fsym_primary_listing_id of a given fsym_id in order to extract fiscal or pricing related data for the company. And the output is below.

TICKER OFTIC CUSIP PROPER_NAME FSYM_ID FSYM_PRIMARY_LISTING_ID DATE FF_SALES FF_ASSETS
AAPL AAPL 3783310 Apple Inc. R85KLC-S MH33D6-R 20170930 228572 375319
AAPL AAPL 3783310 Apple Inc. R85KLC-S MH33D6-R 20180930 265809 365725
AAPL AAPL 3783310 Apple Inc. R85KLC-S MH33D6-R 20190930 259968 338516
AAPL AAPL 3783310 Apple Inc. R85KLC-S MH33D6-R 20200930 274150 323888
AAPL AAPL 3783310 Apple Inc. R85KLC-S MH33D6-R 20210930 365817 351002

Top of Section

Top