Using Python on WRDS Platform

-- Connection and Basic Function

  • Qingyi (Freda) Song Drechsler, Ph.D.
  • August 2020

0. Setup WRDS Python API

  • To be done before running this notebook
  • https://pypi.org/project/wrds/
  • run "pip install wrds"
    • Mac: in terminal
    • PC: in Anaconda prompt
  • Spyder or Jupyter

1. Import pre installed WRDS package

In [ ]:
import wrds

2. Establish connection with WRDS server

  • log in using your WRDS username and password
  • set up a pgpass file to store the info
In [ ]:
conn = wrds.Connection()

3. List all libraries

  • "Library" refers to databases on WRDS: e.g. CRSP, Compustat
  • list_libraries() function to explore all subscribed databases
In [ ]:
conn.list_libraries().sort()
type(conn.list_libraries())

4. List all datasets within a given library

  • databases contain many sub datasets
  • list_tables() function to list all datasets
  • specify which "library/database"
In [ ]:
conn.list_tables(library='comp')

5. Query Data from WRDS Server

  • get_table() method
  • straightforward if getting data from one single dataset
  • specify which library/database and table/dataset to "get"
  • can slice data by:
    • number of rows
    • column names
In [ ]:
# Extract first 5 obs from comp.company

company = conn.get_table(library='comp', table='company', obs=5)
company.shape
In [ ]:
company
In [ ]:
# Narrow down the specific columns to extract

company_narrow = conn.get_table(library='comp', table='company', 
                                columns = ['conm', 'gvkey', 'cik'], obs=5)
company_narrow.shape
In [ ]:
company_narrow

6. Subsetting Dataset

  • raw_sql() method
  • when "conditioning" is needed
  • familiar SQL syntax
  • can pre-specify date column format
In [ ]:
# Select one stock's monthly price
# from 2019 onwards

apple = conn.raw_sql("""select permno, date, prc, ret, shrout 
                        from crsp.msf 
                        where permno = 14593
                        and date>='01/01/2019'""", 
                     date_cols=['date'])

apple 
In [ ]:
apple.dtypes

7. Join Multiple Datasets

  • again raw_sql() method
  • synatx similar to "proc sql" in SAS
  • handle conditioning statement
In [ ]:
apple_fund = conn.raw_sql("""select a.gvkey, a.iid, a.datadate, a.tic, a.conm,
                            a.at, b.prccm, b.cshoq 
                            
                            from comp.funda a 
                            inner join comp.secm b 
                            
                            on a.gvkey = b.gvkey
                            and a.iid = b.iid
                            and a.datadate = b.datadate
                        
                            where a.tic = 'AAPL' 
                            and a.datadate>='01/01/2010'
                            and a.datafmt = 'STD' 
                            and a.consol = 'C' 
                            and a.indfmt = 'INDL'
                            """, date_cols=['datadate'])

apple_fund.shape
In [ ]:
apple_fund 

8. Saving Your Output

  • Pandas support flexible output format
  • pickle for further python work
  • csv or excel
  • even SAS data format!
In [ ]:
import pandas as pd

# pickle the dataframe

apple_fund.to_pickle("/your local directory/apple_fund.pkl")
In [ ]:
# export the dataframe to csv format

apple_fund.to_csv('/your local directory/apple_fund.csv')

# export the dataframe to xlsx format

apple_fund.to_excel('/your local directory/apple_fund.xlsx')
In [ ]:
# export the dataframe to dta format for STATA

apple_fund.to_stata('/your local directory/apple_fund.dta')

The End

Thank you!