SASPy Sample Program (1) Basic Functions

Eunji Oh (WRDS), June 2020

Requirements:




0. Select Python 3 Kernel (The default kernel on WRDS Jupyter Hub)



1. Import packages

In [1]:
import wrds
import saspy
from IPython.display import HTML

import pandas as pd

import datetime



2. SAS: Work with SASPy

2.1 Start a SAS Session

In [2]:
# Start a SAS session and check configuration information
sas = saspy.SASsession(cfgname='default') # Available configurations: 'default', 'sasu8'
sas 
SAS Connection established. Subprocess id is 21168

No encoding value provided. Will try to determine the correct encoding.
Setting encoding to latin_1 based upon the SAS session encoding value of latin1.

Out[2]:
Access Method         = STDIO
SAS Config name       = default
SAS Config file       = /usr/local/sas/jupyterhub/prod/venvs/20200506/lib/python3.8/site-packages/saspy/sascfg.py
WORK Path             = /sastemp/SAS_work7E43000052B0_wrds-sas30-w.wharton.private/
SAS Version           = 9.04.01M6P11072018
SASPy Version         = 3.3.5
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = latin1
Python Encoding value = latin_1
SAS process Pid value = 21168

2.2 Set a library reference

In [ ]:
# Set a library reference to home directory
home = sas.saslib(libref='home', path='~')
scratch_dir = '/scratch/' + 'your-instituion-nickname' # Please see: https://wrds-www.wharton.upenn.edu/pages/support/the-wrds-cloud/managing-data/storing-your-data-wrds-cloud/
scrt = sas.saslib(libref='scrt', path=scratch_dir)
In [4]:
# Check assinged libraries
assigned_librefs = sas.assigned_librefs() # Check all assigned libraries
print(assigned_librefs)
['CRLCMACR', 'SCRT', 'HOME', 'ZACKSAMP', 'ZACKS', 'WRDSAPPS', 'WRDSSEC', 'WQA', 'WCAI', 'WAPPSAMP', 'TWOIQSMP', 'TWOIQ', 'TRWS', 'TRSAMP', 'TROWN', 'TRACE', 'TOTALQ', 'TFN', 'TASS', 'TAQMSAMP', 'TAQSAMP', 'TAQMSEC', 'TAQ', 'SUSTSAMP', 'SUSTAIN', 'SPRAT', 'SNLSAMP', 'SNLBR', 'SNL', 'SNAPSAMP', 'SECSAMP', 'SDCSAMP', 'SDC', 'RPNASAMP', 'RPNA', 'RISKSAMP', 'RISK', 'REVERE', 'REPSAMP', 'REPRISK', 'RENT', 'PWT', 'PUBLIC', 'PREQSAMP', 'PREQIN', 'PPUBSAMP', 'PPUBLICA', 'PHLX', 'PACAP', 'OTC', 'OPTIONM', 'OMTRIAL', 'NASTRAQ', 'MSRBSAMP', 'MSRB', 'MFL', 'MRKTSAMP', 'MSFINST', 'MSFANLY', 'MARKIT', 'LSPDSAMP', 'LSPD', 'LVNSAMP', 'LEVIN', 'KTSAMP', 'KTMINE', 'KPISAMP', 'KLD', 'ISSM', 'ISS', 'IRI', 'IMSSAMP', 'IMS', 'IFGRSAMP', 'IFGR', 'IBESKPI', 'IBESCORP', 'IBES', 'HFRSAMP', 'HFR', 'HBASESMP', 'HBASE', 'HBASEPRM', 'GOVPXSMP', 'GOVPX', 'GMI', 'GINSIGHT', 'FRB', 'FTSESAMP', 'FTSE', 'FISDSAMP', 'FISD', 'FF', 'FDIC', 'FSSAMP', 'FACTSET', 'EURSAMP', 'EUREKA', 'ETFGSAMP', 'ETFG', 'EMDB', 'DOE', 'DMEF', 'DJONES', 'DEALSCAN', 'CUSIPM', 'CSMAR', 'CRSPSAMP', 'CRSP', 'CRSPM', 'CRSPQ', 'CRSPA', 'CONTRIB', 'COMSCORE', 'COMPSNAP', 'COMPSAMP', 'COMPH', 'COMP', 'COMPD', 'EXECCOMP', 'COMPSEGD', 'COMPDCUR', 'COMPNAD', 'COMPA', 'COMPGD', 'COMPG', 'COMPBD', 'COMPB', 'CLRVTSMP', 'CLRVT', 'CISDMSMP', 'CISDM', 'CIQSAMP', 'CIQ', 'CENTRIS', 'CBOE', 'CALCSAMP', 'CALCBNCH', 'BVDSAMP', 'BVD', 'BOARDSMP', 'BOARDEX', 'BLOCK', 'BANK', 'AUDIT', 'AHASAMP', 'AHA', 'VOLINDEX', 'SASHELP', 'MAPS', 'MAPSSAS', 'MAPSGFK', 'SASUSER', 'WORK']

2.3 Access SAS Datasets

2.3.1 Access a SAS dataset
In [5]:
# Access SAS Datasets
## Set dataset options (SAS datastep options)
### Available options: where (str), keep (str or list), drop (str or list), firstobs (int or str), obs (int or str), format (dict)

options = {'where'    : 'fyear = 2018 and indfmt="INDL" and datafmt="STD" and popsrc="D" and consol="C"',
           'keep'     : 'gvkey fyear fyr datadate indfmt datafmt popsrc consol at lt ni',
           'format'  : {'datadate':'YYMMDDN8.'}
}

sd = sas.sasdata(table='funda', libref='compa', dsopts=options)
2.3.2 SAS Data object methods
In [6]:
# Summarize 
sd.describe()
Out[6]:
Variable Label N NMiss Median Mean StdDev Min P25 P50 P75 Max
0 datadate Data Date 10147 0 21549.000 21537.901153 48.733448 21365 21549.000 21549.000 21549.000 21700
1 fyear Data Year - Fiscal 10147 0 2018.000 2018.000000 0.000000 2018 2018.000 2018.000 2018.000 2018
2 fyr Fiscal Year-end Month 10147 0 12.000 10.996649 2.477692 1 12.000 12.000 12.000 12
3 at Assets - Total 7059 3088 622.104 14895.579907 104435.243626 0 61.684 622.104 3640.800 3418318
4 lt Liabilities - Total 7053 3094 305.875 11995.001959 96308.824512 0 17.061 305.875 2267.116 3412078
5 ni Net Income (Loss) 7026 3121 4.096 317.629223 1744.255505 -22355 -8.722 4.096 108.080 59531
In [7]:
# First 5 obs
sd.head()
Out[7]:
gvkey datadate fyear indfmt consol popsrc datafmt fyr at lt ni
0 001045 2018-12-31 2018 INDL C D STD 12 60580.000 60749.000 1412.000
1 001050 2018-12-31 2018 INDL C D STD 12 392.582 214.022 -7.121
2 001062 2018-11-30 2018 INDL C D STD 11 196.072 1.238 -48.789
3 001072 2019-03-31 2018 INDL C D STD 3 2813.278 429.098 271.813
4 001075 2018-12-31 2018 INDL C D STD 12 17664.202 12315.497 511.047
In [8]:
# Last 5 obs
sd.tail()
Out[8]:
gvkey datadate fyear indfmt consol popsrc datafmt fyr at lt ni
0 327451 2018-12-31 2018 INDL C D STD 12 447.627 155.124 -20.640
1 328032 2018-12-31 2018 INDL C D STD 12 413.527 231.054 12.700
2 328692 2018-12-31 2018 INDL C D STD 12 76.962 26.414 -0.665
3 328795 2018-12-31 2018 INDL C D STD 12 2172.200 487.700 75.700
4 330227 2018-09-30 2018 INDL C D STD 9 9.624 0.149 0.981
In [9]:
# Histogram
sd.hist(var='ni', title = 'Net Income histogram (Fiscal year 2018)', label = 'NI')
SAS Output
The SGPlot Procedure
In [10]:
# To see the last SAS log
print(sas.lastlog())
290  ods listing close;ods html5 (id=saspy_internal) file=stdout
290! options(bitmap_mode='inline') device=svg style=HTMLBlue; ods graphics on /
290! outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
291  
292  proc sgplot data=compa.'funda'n (where=(fyear = 2018 and indfmt="INDL" and
292! datafmt="STD" and popsrc="D" and consol="C") keep=gvkey fyear fyr datadate
292! indfmt datafmt popsrc consol at lt ni );
293  	format datadate YYMMDDN8. ;;
294  	histogram 'ni'n / scale=count LegendLABEL='NI';
295  	title "Net Income histogram (Fiscal year 2018)";
296  	density 'ni'n;
297  run;
ERROR: The Java extension is unable to access SAS options.
NOTE: The Java locale will default to the host's, unless overridden.
NOTE: The Java timezone will default to the host's, unless overridden.
NOTE: There were 10147 observations read from the data set COMPA.FUNDA.
      WHERE (fyear=2018) and (indfmt='INDL') and (datafmt='STD') and 
      (popsrc='D') and (consol='C');
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           6.96 seconds
      cpu time            0.76 seconds
      
298  title;
299  
300  ods html5 (id=saspy_internal) close;ods listing;

301  
In [11]:
# Sort
sd.sort(by = 'gvkey fyear', out='funda_copy') # This code raises an error!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-124194c58377> in <module>
      1 # Sort
----> 2 sd.sort(by = 'gvkey fyear', out='funda_copy') # This code raises an error!

/usr/local/sas/jupyterhub/prod/venvs/20200506/lib/python3.8/site-packages/saspy/sasdata.py in sort(self, by, out, **kwargs)
    831                     elog.append(line)
    832             if len(elog):
--> 833                 raise RuntimeError("\n".join(elog))
    834         if out:
    835             if not isinstance(out, str):

RuntimeError: ERROR 180-322: Statement is not valid or it is used out of proper order.

By sd.sort, the syntax below runs in the SAS session.

  • proc sort data=compa.funda(where=...) ; format datadate yymmddn8. ; out=funda_2018;run;

which is invalide sort procedure syntax!

This is because SASPy's _dsopts() method generates dataset options that cannot be used properly in proc sort as shown below.

In [12]:
sd._dsopts()
Out[12]:
'(where=(fyear = 2018 and indfmt="INDL" and datafmt="STD" and popsrc="D" and consol="C") keep=gvkey fyear fyr datadate indfmt datafmt popsrc consol at lt ni );\n\tformat datadate YYMMDDN8. ;'

Thus, use sort method for a SASdata object without dsopts to avoid the error or use submit method instead. Please see section 4 below for 'submit' method.

In [13]:
# Sort
new_sd = sas.sasdata(table='funda', libref='compa')
new_sd.sort(by = 'gvkey fyear', out = 'funda_copy') # This code does not raise any error!
print(sas.lastlog())
350  
351  data _null_; e = exist("WORK.'funda_copy'n");
352  v = exist("WORK.'funda_copy'n", 'VIEW');
353   if e or v then e = 1;
354  put 'TABLE_EXISTS=' e 'TAB_EXTEND=';run;
TABLE_EXISTS=1 TAB_EXTEND=
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
355  
356  

2.4 Exchange data between SAS Dataset and Pandas Dataframe

2.4.1 SASPy SAS Data Object to Pandas DataFrame
In [14]:
df_from_sd = sd.to_df()
df_from_sd.head()
type(df_from_sd), type(sd)
Out[14]:
(pandas.core.frame.DataFrame, saspy.sasdata.SASdata)
2.4.2 From the dataset location (using table name and library reference name)
In [15]:
df_from_sas_ds = sas.sd2df(table = 'idx_index', libref = 'compa', method = 'memory') # method: [memory, CSV, DISK], DISK is recommended for large datasets
df_from_sas_ds.head()
Out[15]:
conm gvkeyx idx13key idxcstflg idxstat indexcat indexgeo indexid indextype indexval spii spmi tic tici
0 S&P Industrials-Wed 000001 0000000000000 N A S&P USA 500 LGCAP 000000000 NaN NaN I0001 I0001
1 S&P Industrials-Ltd 000002 0000000000425 N A S&P USA 500 COMPOSITE 105 NaN 10 I0002 I0002
2 S&P 500 Comp-Ltd 000003 0000000000500 Y A S&P USA 500 LGCAP 500 NaN 10 I0003 I0003
3 American Stock Exchange Ind 000004 0000000001010 N A EXCHG USA NONE NONE NONE NaN NaN I0004 I0004
4 Dow Jones Industrials-30 Stk 000005 0000000001020 Y A DOW USA NONE NONE NONE NaN NaN I0005 I0005
2.4.3 From Pandas Dataframe to a SAS dataset
In [16]:
db = wrds.Connection()
wrds_table = db.get_table(library = 'crsp', table = 'msi')

# Convert a pandas df to a SAS dataset
wrds_table_sd = sas.df2sd(df = wrds_table, table = 'wrds_table', libref='work')

# Check if 'wrds_table' is correctly created
print(sas.lastlog())
Loading library list...
Done

1631  
1632  data _null_; e = exist("work.'wrds_table'n");
1633  v = exist("work.'wrds_table'n", 'VIEW');
1634   if e or v then e = 1;
1635  put 'TABLE_EXISTS=' e 'TAB_EXTEND=';run;
TABLE_EXISTS=1 TAB_EXTEND=
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
1636  
1637  
In [17]:
wrds_table_sd.head()
Out[17]:
date vwretd vwretx ewretd ewretx sprtrn spindx totval totcnt usdval usdcnt
0 1925-12-31 NaN NaN NaN NaN NaN 12.46 27487487.2 503 NaN NaN
1 1926-01-30 0.000561 -0.001395 0.023174 0.021395 0.022472 12.74 27624240.8 506 27412916.2 496.0
2 1926-02-27 -0.033046 -0.036587 -0.053510 -0.055547 -0.043956 12.18 26752064.1 514 27600952.1 500.0
3 1926-03-31 -0.064002 -0.070021 -0.096824 -0.101404 -0.059113 11.46 25083173.4 519 26683758.1 507.0
4 1926-04-30 0.037019 0.034031 0.032946 0.030121 0.022688 11.72 25886743.8 521 24899755.6 512.0



3. Submit a SAS job

3.1 Submit SAS code

In [18]:
# Submit the job (Add code that gives both log and output)
sas_code = '''%let i=1; %let j=%eval(&i*100+3); 
               proc odstext;
                   title "SAS Output";
                   p "&j is calculated by the formula (i*100 + 3) where i is &i";
               run;
           '''
results = sas.submit(sas_code)
In [19]:
print(results['LOG'])
1704  ods listing close;ods html5 (id=saspy_internal) file=stdout
1704! options(bitmap_mode='inline') device=svg style=HTMLBlue; ods graphics on /
1704!  outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
1705  
1706  %let i=1; %let j=%eval(&i*100+3);
1707                 proc odstext;
1708                     title "SAS Output";
1709                     p "&j is calculated by the formula (i*100 + 3) where i
1709! is &i";
1710                 run;
NOTE: PROCEDURE ODSTEXT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      
1711  
1712  
1713  ods html5 (id=saspy_internal) close;ods listing;

1714  
In [20]:
HTML(results['LST'])
Out[20]:
SAS Output

SAS Output

103 is calculated by the formula (i*100 + 3) where i is 1

In [21]:
# Submit and print LOG
sas.submitLOG(sas_code)
1716  ods listing close;ods html5 (id=saspy_internal) file=stdout
1716! options(bitmap_mode='inline') device=svg style=HTMLBlue; ods graphics on /
1716!  outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
1717  
1718  %let i=1; %let j=%eval(&i*100+3);
1719                 proc odstext;
1720                     title "SAS Output";
1721                     p "&j is calculated by the formula (i*100 + 3) where i
1721! is &i";
1722                 run;
NOTE: PROCEDURE ODSTEXT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      
1723  
1724  
1725  ods html5 (id=saspy_internal) close;ods listing;

1726  
In [22]:
# Submit and show OUTPUT
sas_code = '''proc means data=compa.funda; var at lt; where fyear=2018;run; quit;'''
sas.submitLST(sas_code)
SAS Output

SAS Output

The MEANS Procedure

Variable Label N Mean Std Dev Minimum Maximum
at
lt
Assets - Total
Liabilities - Total
13990
8038
16622.72
17002.00
117786.70
123814.04
0
0
3418318.00
3412078.00

3.2 Submit from a sas file (SAS %include)

In [ ]:
# Let's run a WRDS SAS macro for Event Study (SAS macro: https://wrds-www.wharton.upenn.edu/pages/support/applications/event-studies/event-study-research-application/)
# Run your SAS macro in the SASPy SAS Sessioin by submitting %include code
sas.submitLOG('''%include "~/SASPy_Application/SAS_macro/event_study.sas";''')

3.3 Jupyter Magic: Work like you were using SAS

The first line of the cell must start with "%%SAS session_name"

In [24]:
%%SAS sas

/* Get S&P 500 companies*/
proc sql;
   create table crspd500 as
   select a.*, b.ret, b.date
   from crsp.dsp500list as a,
         crsp.dsf as b
   where a.permno=b.permno
   and b.date >= a.start and b.date<= a.ending
   order by date;
quit;
Out[24]:

2108  ods listing close;ods html5 (id=saspy_internal) file=stdout
2108! options(bitmap_mode='inline') device=svg style=HTMLBlue; ods graphics on /
2108! outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
2109
2110
2111 /* Get S&P 500 companies*/
2112 proc sql;
2113 create table crspd500 as
2114 select a.*, b.ret, b.date
2115 from crsp.dsp500list as a,
2116 crsp.dsf as b
2117 where a.permno=b.permno
2118 and b.date >= a.start and b.date<= a.ending
2119 order by date;
NOTE: Table WORK.CRSPD500 created, with 8712482 rows and 5 columns.

2120 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 11.66 seconds
cpu time 11.88 seconds

2121
2122
2123 ods html5 (id=saspy_internal) close;ods listing;

2124



4. Exchange macro variables

In [25]:
#From SAS to Python (int, float, str types are available)
sas.submitLST('''%let today = %sysfunc(today(),date9.); 
                 %let num_var=10; 
                 proc odstext;
                     title SAS Macro Variables;
                       mvar today num_var;
                       p "SAS Macro variable ''today'': "|| today;
                       p "SAS Macro variable ''num_var'': "|| num_var;                    
                  run;
                 ''')
SAS Output

SAS Macro Variables

SAS Macro variable ''today'': 30JUN2020

SAS Macro variable ''num_var'': 10

In [26]:
today = sas.symget('today')
num_var = sas.symget('num_var')

print ('Transfer SAS macro variables to Python session\n','today:', today, 'type:', type(today), 'num_var:', num_var, 'type:', type(num_var))
Transfer SAS macro variables to Python session
 today: 30JUN2020 type: <class 'str'> num_var: 10 type: <class 'int'>
In [27]:
# Create and assign value to a SAS macro var from Python
tomorrow = datetime.date.today() + datetime.timedelta(days = 1)

sas.symput(name='tomorrow_py', value=tomorrow)
sas.submitLST('''proc odstext;
                    title "Python to SAS";
                    p "&tomorrow_py from Python, &today from SAS";
                    run;
                 ''')
SAS Output

Python to SAS

2020-07-01 from Python, 30JUN2020 from SAS

In [28]:
# Change value of a SAS macro var
sas.symput(name='tomorrow_py', value='06262020')
sas.submitLST('''proc odstext;
                    title "Change value from Python";
                    p "&tomorrow_py from Python, &today from SAS";
                    run;
                 ''')
SAS Output

Change value from Python

06262020 from Python, 30JUN2020 from SAS



5. Connect to WRDS via PC-SAS Connect from your Local Computer (Skip if you work on WRDS Jupyter Hub)

In [ ]:
# PC-SAS connect
PC_SAS = '''%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=wrds;
signon username="&username" password="&pw";
'''
prompt = {'username':True, 'pw': True}

sas.submitLOG(code=PC_SAS, prompt=prompt)

# Submit SAS code by 'submit' method
main_code = '''your main sas code to run on WRDS server'''
main_code_remote = ' '.join['rsubmit;', main_code, 'endrsubmit;']
sas.submit(main_code_remote)

# Disconnect from the remote SAS session. Note that this does not end your LOCAL SASPY SAS session.
sas.submit('''signoff;''')



6. Finish the sas session and WRDS db connection

In [29]:
# Finish the SAS 
sas.endsas()
SAS Connection terminated. Subprocess id was 21168
In [30]:
db.close()