# MATLAB Programming

MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, WRDS subscribers can analyze data, develop algorithms, and create models and applications. Currently, MATLAB is only available on the WRDS Cloud, which is freely accessible to all subscribers of the WRDS platform. This allows current users to take advantage of MATLAB with any datasets they can access. The NYSE TAQ dataset is the sole exception to this, as its size poses a particular challenge in MATLAB.

## Before You Begin

Before using MATLAB, WRDS recommends reading the following documentation in order to become familiar with the WRDS Cloud and MATLAB.

### MATLAB Software Requirements

In order to run MATLAB, make sure you have the following:

• WRDS-compatible installation of MATLAB: Releases 2013a, 2013b, 2014a and 2014b have been tested against the WRDS infrastructure.
• An installation of the MATLAB Database Toolbox.
• An up-to-date MATLAB software license.

The Database Toolbox is the required MATLAB component for connecting to WRDS, and provides more advanced database connectivity methods than a standard MATLAB installation. The Database Toolbox may have been included with your version of MATLAB; otherwise you will need to license it separately from MathWorks. To check if the Database Toolbox is installed on your machine, in the MATLAB command window enter ver and look for "Database Toolbox" in the list.

To begin using MATLAB on WRDS, prepare your environment by first setting up the machine to properly connect to WRDS.

Your connection to WRDS through the MATLAB Database Toolbox requires loading a pair of Java drivers. These drivers provide a JDBC interface between the MATLAB Database Toolbox and the SAS/SHARE on WRDS Cloud. These .jar files are provided by SAS and are compatible with Windows and MacOS machines. They are available as ZIP files and can be downloaded directly here.

Unzip file after the download has completed and place the resulting two Java driver files in a safe location on your computer's hard drive. WRDS recommends a dedicated subfolder within the Documents folder in your home directory.

Do not change the names of the drivers; MATLAB requires these exact filenames. Remember where you have placed these drivers, as you will need the full path in the following steps.

For MacOS users, the home directory filepath looks like:

For PC users, the following filepath will work:

### Creating a Connection Script

After installing the drivers, you must create a connection script containing information required to connect to WRDS. Creating this script allows users to connect in the future with a single command. This file is created within MATLAB.

• Using the file browser pane on the left side of the MATLAB interface, select a directory to store the connection script. Connection scripts are commonly stored in a user's home directory. To change it to another location, use cd command.
• Enter edit WRDS_Connect in the Command window.
• Create the file when prompted by the application.
• Enter the following code in the new script window, making sure to replace "username" with your WRDS username. Make sure to include the quoted empty string as the first argument before the username. If you are running MATLAB on a Windows-compatible machine, use Windows-style paths (i.e. C:\path\to\sas.core.jar).

clear java;
WRDS = database('','username','password','com.sas.net.sharenet.ShareNetDriver','jdbc:sharenet://wrds-cloud.wharton.upenn.edu:8551/');
• Select "Save" from the menu bar, then close the Edit window.

Note: In the code above, clear java; resets the Java path declaration for the two drivers, permitting them to be set again without warning messages if WRDS_Connect is run multiple times per MATLAB session.

After creating the connection script, your WRDS password needs to be encrypted as the connection script file is stored as plaintext on the system. This step requires SAS and connecting to the WRDS Cloud via SSH using the command line interface (CLI). Refer to the "Introduction to the WRDS Cloud" document for additional information on using the WRDS Cloud.

To create the connection, enter the following command in the CLI, replacing username with your username on WRDS:

After you have successfully logged into the WRDS Cloud, enter the following commands to launch SAS and encode your password.

1. Begin an interactive grid job with the command qrsh.
2. Enter sas -nodms to start SAS without a graphical display (since this is a command-line session) after being assigned a job node.
4. At the end of the output, you will receive the encrypted version of your pass word. It will look similar to this: {SAS002}DBCC5712369DE1C65B19864C1564FB850F398DCF
5. Copy the entire string—including the {SAS002} part—and quit SAS using the endsas; command.

Not encrypting your password is a security risk, and WRDS does not recommend this practice. If you choose to keep your password unencrypted, users can replace 'password' in the connection script with their plaintext password. This will still work, but your password will be insecure.

### Testing & Troubleshooting Your Connection

Test your connection to ensure that it properly works by entering WRDS_Connect in the Command window. If the prompt returns with no errors and displays a variable called WRDS in the upper-right hand "Workspace" window, your connection has been properly set up. You are now connected to WRDS via MATLAB.

If you are unsuccessful, check the path you provided to the two .JAR files. Additionally, check your username and password. were properly entered. To edit your connection script, repeat the steps in "Creating the WRDS Connection Script".

## Working in MATLAB

### Connecting to WRDS

When you connect to WRDS via WRDS_Connect, the script generates a variable that represents that connection, WRDS. You may use this variable anywhere you would normally connect to a database, and easily run queries against it.

After connecting, view all datasets available on WRDS via MATLAB by entering schemas(WRDS).

Enter the following code to view all the databases within a selected dataset. In the sample code below, dataset is the dataset you have chosen to work with from Step 1:

my_databases = exec(WRDS,['select distinct memname from dictionary.columns where libname="dataset"']);
result = fetch(my_databases);
result.data

To determine the available variables in your selected database, enter the follwoing code, replacing dataset with the selected dataset (e.g. CRSP), and database for the specific database (e.g. DSF). The variables are returned in the order they appear in the database columns starting from the left, and moving to the right:

my_variables = exec(WRDS,['select name from dictionary.columns where libname="dataset" and memname="database"']);
result = fetch(my_variables);
result.data

### Querying Datasets

After choosing a dataset, database, and variables, you're ready to construct queries. Queries in MATLAB are constructed using SAS SQL, which is functionally identical to other types of SQL—except for the most advanced commands. In SQL query statements, databases are formatted as dataset.database; for example, the Daily Stock File from CRSP would be listed as "CRSP.DSF". For more on SAS SQL, consult the SAS SQL documentation.

The examples below show how to construct queries in MATLAB.

#### Gather All Data from A Specific Variable Value

query = exec(WRDS,['select * from DATASET.DATABASE where varaible1 >= 100']);
result = fetch(query);
result.data

#### Gather & Return Data of A Specific Variable

The following gathers the data of a specific variable, and only returns the values of variables that have been explicitly named:

query = exec(WRDS,['select variable1,variable2,variable3 from DATASET.DATABASE where variable1 >= 100']);
result = fetch(query);
result.data

#### Date & Time Formatting

When working with date formats, SAS datetime format conventions must be used, which are discussed in the SAS Datetime documentation. For example, January 1st, 2014 would look like "01jan2014"d, as used here:

query = exec(WRDS, ['select * from DATASET.DATABASE where date = "01jan2014"d']);
result = fetch(query);
result.data

### Performance Considerations

Each time exec(WRDS,['SQL']) is paired with the fetch() command, a request for data is sent to WRDS, and the results are downloaded. While this is relatively fast with the examples shown above, many WRDS datasets are quite large and would take a long time to download in their entirety. Most researchers choose to limit their SQL queries to small subsets of data—typically specific date ranges—in order to return faster results. WRDS recommends writing a MATLAB program that sequentially makes multiple small queries, as opposed to a few large ones.

MATLAB's "Techniques for Improving Performance" guide dives into further detail on this topic, and is a useful read for anyone considering working with large amounts of data.

## Sample Programs

The following programs provide various sample queries for downloading subsets of data from WRDS and for doing various tasks in MATLAB with that data. These examples build upon some of the basic techniques discussed in the "Working in MATLAB" section.

### Examining A Database & Saving Results as Variables

Before writing a query, sometimes one might want to take a high-level look at a given dataset to better understand its layout.

The example below pulls the first 100 lines of data from the CRSP Daily Stock File (DSF) without any filtering. You are able to limit the returned data using the (obs=100) command. Neglecting to include this command would cause the query to download the database in its entirety, making the query take a long time to run.

query = exec(WRDS, ['SELECT * FROM CRSP.DSF (obs=100)']);
result = fetch(query);
result.data

If you wanted to work further within the returned results, save the results to a variable first. The following example finds the mean value of the vol variable in CRSP.DSF for the first 100 observations of the dataset, then saves it as the variable WRDSCRSP_mean_vol.

query = exec(WRDS, ['SELECT * FROM CRSP.DSF (obs=100)']);
result = fetch(query);
WRDSCRSP = result.data;
WRDSCRSP_mean_vol = mean(WRDSCRSP{1, 11});
WRDSCRSP_mean_vol

### Common MATLAB Research Workflow

Researchers frequently start by exploring a dataset for available databases and variables, and gradually refine their code to find specific data.

Below is an example of what a typical MATLAB workflow looks like. In the steps that follow you'll explore the Dow Jones dataset to determine the layout of the data, decide and select variables, and construct a specific query around our intended research.

In MATLAB, begin by determining what datasets are available from WRDS by entering the schemas(WRDS) command.

To retrieve all the databases within DJONES use the following code:

query = exec(WRDS,['select distinct memname from dictionary.columns where libname="DJONES"']);
result = fetch(query);
result.data

The results indicate that there are two databases available: DJDAILY and DJMONTHLY. Select the DJDAILY database, and return a list of available variables:

query = exec(WRDS,['select name from dictionary.columns where libname="DJONES" and memname="DJDAILY"']);
result = fetch(query);
result.data

This returns a list of 9 variables in the order they appear in the dataset by column, moving from left to right. Viewing the first 100 observations as a sample allows us to gain a better sense of the database:

query = exec(WRDS, ['SELECT * FROM DJONES.DJDAILY (obs=100)']);
result = fetch(query);
result.data

After selecting your variables, narrow the results further. Here, we'll use the variables date and dji:

query = exec(WRDS, ['SELECT date,dji FROM DJONES.DJDAILY (obs=100)']);
result = fetch(query);
result.data

In this example, the results are limited by using the 'dji' variable with the following commands:

query = exec(WRDS, ['SELECT date,dji FROM DJONES.DJDAILY where dji >= 14000']);
result = fetch(query);
result.data

The following set of commands allow you to exclude specific months. In the example that follows, the Dow's performance in October 2007 will be treated as an anomaly. As mentioned in the previous section, you'll need to make sure your dates are formatted according to SAS's datetime format:

query = exec(WRDS, ['SELECT date,dji FROM DJONES.DJDAILY where dji >= 14000 and date not between "01oct2007"d and "30oct2007"d']);
result = fetch(query);
result.data

### Plotting Data

This example gathers and plots the Dow Jones Industrial Average over its existence (since 1896). The code below will supply and manipulate the data, then reformat it as a MATLAB-compatible dataset. In MATLAB 2014a the expected input type for timeseries() changed, thus versions previous to 2014a do not need the data manipulation and column header as shown in the first example.

Regardless of which MATLAB release you are using, the resulting plot should resemble the following image:

### MATLAB 2014a/2014b Sample Code

c = exec(WRDS, 'SELECT date,dji from DJONES.DJDAILY');
c = fetch(c);
close(c);
DJdata = c.Data;

% Create column headers for dataset

DJdata = cell2dataset(DJdata);
ts=timeseries(DJdata.dji,DJdata.date);
plot(ts);

% Add plot title and label axes
title('Dow Jones Industrial Average 1896 - 2008');
xlabel('Date');
ylabel('DJI');

### MATLAB 2013a/2013b Sample Code

c = exec(WRDS,['SELECT date,dji from DJONES.DJDAILY']);
c = fetch(c);
close(c);
DJdata= c.Data;
ts=timeseries(DJdata.dji,DJdata.date);
plot(ts);

% Add plot title and label axes
title('Dow Jones Industrial Average 1896 - 2008');
xlabel('Date');
ylabel('DJI');