Introduction and Setup

Using Python to access WRDS data on the WRDS Cloud

Introduction to Python at WRDS

Python is a widely used high-level programming language that is both powerful and easy to use, and it is proving to be a major player in large-scale data analytics applications.

WRDS provides a direct interface for Python access, allowing native querying of WRDS data right within your Python program. All WRDS data is stored in a PostgreSQL database and is available in Python through our custom Python module, wrds.

Here is an example of a simple query against the Dow Jones Averages & Total Return Indexes using the wrds Python module:

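import wrds
# Open a connection to the WRDS PostgreSQL database
db = wrds.Connection()
# Run a SQL query; the result is returned as a pandas DataFrame
db.raw_sql('SELECT date,dji FROM djones.djdaily')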

Full usage instructions and many examples of using Python on the WRDS Cloud are given throughout this section.

Alternatively, if you are interested in using Python on your workstation, via Jupyter or Spyder, to access WRDS data remotely, please refer to Accessing WRDS Remotely via Python.

PostgreSQL vs SAS

In the past, WRDS data was only available as SAS flat files. In this format, these files were accessible to non-SAS clients (such as Python) only via a JDBC interface. While this generally worked well, it proved problematic for memory-intensive programs, and proved impossible for extremely large datasets such as TAQ. 

WRDS has now made all of our data additionally available as a series of PostgreSQL databases, which has significantly opened up research options for other, non-SAS programming languages such as Python, R, MATLAB, and Stata. Java and JDBC drivers are no longer required for Python connectivity, as a native Python wrds module exists to facilitate the connection. Compared to the previous JDBC connection method, the native method is faster, more robust, and capable of handling far larger queries.

Please let us know what you think of the new connection method, and how you're using it in your research!

Python 3 vs Python 2

The WRDS Cloud supports both Python 3 and Python 2; you may use whichever you prefer when working with WRDS data. This applies to both interactive and batch jobs on the WRDS Cloud.

However, while both are supported, WRDS always recommends the latest version. If you have a choice in your programming, you should use Python 3. If you are just starting out in Python, you should absolutely use Python 3.

All examples in this document are given in Python 3 code.

Across the board at WRDS, Python 3 may be accessed as python3 and Python 2 may be accessed as python2.
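If a program is meant to run only under Python 3, it can check the interpreter version at startup and stop early when launched with python2 by mistake. The following is a minimal sketch; the function name require_python3 is hypothetical and not part of the wrds module.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Return 'major.minor' for the interpreter, or raise under Python 2.

    require_python3 is a hypothetical helper written for illustration only.
    """
    if version_info[0] < 3:
        raise RuntimeError(
            "This program requires Python 3; run it with python3 instead of python2."
        )
    return "{}.{}".format(version_info[0], version_info[1])


# Check the running interpreter and report its version.
print("Running under Python " + require_python3())
```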


Initial Setup

The first step in connecting to WRDS data from within Python on the WRDS Cloud is setting up your .pgpass file in your WRDS Cloud home directory. This step only needs to be done once.

The .pgpass file includes your WRDS username and password so that you do not need to enter them each time you wish to connect to WRDS within Python.

To create your .pgpass file, first SSH to the WRDS Cloud as described in Using SSH to Connect to the WRDS Cloud, and create a new file named .pgpass directly in your WRDS Cloud home directory (you can use any editor to do this, such as nano or vi).

Your .pgpass file should contain the following:

wrds-pgdata.wharton.upenn.edu:9737:wrds:your_username:your_password

Where 'your_username' is your WRDS Username and 'your_password' is your WRDS Password.
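The line above has five colon-separated fields: host, port, database, username, and password. As a sketch of how such a line could be assembled in Python, the hypothetical helper pgpass_entry below (not part of the wrds module) joins the fields and backslash-escapes any colon or backslash inside a field, which is the escaping PostgreSQL expects in a password file. This matters mainly if your password contains one of those characters.

```python
def pgpass_entry(host, port, database, username, password):
    """Build one .pgpass line (hypothetical helper, for illustration only)."""

    def escape(field):
        # Escape backslashes first so the backslashes added for colons
        # are not escaped a second time.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")

    fields = (host, port, database, username, password)
    return ":".join(escape(field) for field in fields)


# Placeholder credentials; substitute your own WRDS username and password.
print(pgpass_entry("wrds-pgdata.wharton.upenn.edu", 9737, "wrds",
                   "your_username", "your_password"))
```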

Next, secure your .pgpass file with the following command:

chmod 600 ~/.pgpass

Failure to secure the .pgpass file in the above manner will cause connection attempts to fail with a warning message indicating the need to secure this file.
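The same file creation and permission check can also be done from Python. The sketch below assumes a POSIX system such as the WRDS Cloud; the helper names write_private_file and is_private are hypothetical, and the demonstration writes to a temporary directory so that it does not touch a real ~/.pgpass.

```python
import os
import stat
import tempfile


def write_private_file(path, content):
    """Create or overwrite path so that only the owner can read and write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Equivalent of `chmod 600`; needed if the file already existed
    # with looser permissions.
    os.chmod(path, 0o600)


def is_private(path):
    """Return True if neither group nor others have any access to path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstrate in a throwaway directory with placeholder credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".pgpass")
    write_private_file(
        demo_path,
        "wrds-pgdata.wharton.upenn.edu:9737:wrds:your_username:your_password\n",
    )
    print("secured:", is_private(demo_path))
```

To apply this to the real file, the path would be os.path.expanduser("~/.pgpass") instead of the temporary one.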

Example of .pgpass File Creation

Here is an example showing an SSH session to the WRDS Cloud, and the creation of the .pgpass file:

my_laptop$ ssh joe@wrds-cloud.wharton.upenn.edu
[joe@wrds-cloud-login2-h ~]$
[joe@wrds-cloud-login2-h ~]$ nano .pgpass
[joe@wrds-cloud-login2-h ~]$ cat .pgpass
wrds-pgdata.wharton.upenn.edu:9737:wrds:joe:mypassword
[joe@wrds-cloud-login2-h ~]$ chmod 600 ~/.pgpass
[joe@wrds-cloud-login2-h ~]$ ls -l .pgpass
-rw------- 1 joe wharton 60 Jan 1  2018 .pgpass


Next Steps

Now that your .pgpass file is created and secured, you won't need to follow these steps again.

Please proceed to the next section to learn how to write and submit Python jobs to the WRDS Cloud.

Next section: Submitting Python Programs.
