 
                 STATISTICAL INFERENCE ENGINE DOCUMENTATION

                         R. Ashley (ashleyr@vt.edu)
                              January 14, 2001
                         


I. OVERVIEW & USAGE

     A. This program implements the bootstrap-based test described in Ashley
     (1998) {"A New Technique for Postsample Model Selection and Validation"
     in Journal of Economic Dynamics And Control, vol. 22, pp. 647-665} for
     testing whether one postsample forecast error series is larger than
     another.  

     B. This simplified implementation of the test obtains the p-value
     (significance level) at which you can reject the null hypothesis that
     the ratio of the mean square of the first forecast error series you
     input to that of the second series you input is less than or equal to
     one, versus the alternative hypothesis that this ratio exceeds one. 
     This is fundamentally a one-tailed test, so you should double the
     p-value if you chose the direction in which to run the test after seeing
     which series has the smaller observed mean square, rather than a priori
     (as one would, for example, in a Granger causality test).
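     A minimal Python sketch of the statistic the test is about and of the
     doubling rule in point B. The function names (mean_square, mse_ratio,
     reported_p_value) are hypothetical and are not part of engine.exe.
     Capping the doubled p-value at 1 is an assumption of this sketch. The
     sketch does not perform the bootstrap itself.

```python
def mean_square(errors):
    """Mean of the squared forecast errors (not demeaned)."""
    if not errors:
        raise ValueError("empty series")
    return sum(e * e for e in errors) / len(errors)


def mse_ratio(first, second):
    """Sample ratio of the first series' mean square to the second's."""
    return mean_square(first) / mean_square(second)


def reported_p_value(one_tailed_p, direction_chosen_a_priori):
    """Double the one-tailed p-value when the direction of the test was
    picked after seeing which sample mean square is smaller.  Capping at
    1.0 is an assumption of this sketch, not documented program behavior."""
    if direction_chosen_a_priori:
        return one_tailed_p
    return min(1.0, 2.0 * one_tailed_p)


# Example with made-up numbers:
first = [1.0, -2.0, 3.0]
second = [1.0, 1.0, -1.0]
print(mse_ratio(first, second))        # 14/3, about 4.667
print(reported_p_value(0.03, False))   # 0.06
```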

     C. As with any statistical test, the null hypothesis is couched in terms
     of population quantities (here, the expected squared values of the two
     error series) and hence only makes sense if these population quantities
     are constant across the data set.  Consequently, the user is strongly
     advised to plot both series versus time.  This (or any other) test is
     inapplicable if there are large outliers in either series or if the mean
     and/or variance of either series seems to be varying grotesquely across
     the data set.  
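     The plots come first. As a rough numerical complement, one might
     compare the mean and variance of each series across its two halves and
     flag observations lying far from the median. This helper is
     hypothetical. The split-in-half comparison and the 5 x MAD cutoff are
     arbitrary choices of this sketch and are not part of the test or the
     program.

```python
from statistics import mean, median, pvariance


def half_sample_summary(series):
    """(mean, variance) of the first half and of the second half."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def flag_outliers(series, k=5.0):
    """Indices whose distance from the median exceeds k times the median
    absolute deviation (MAD).  k = 5.0 is an arbitrary default."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []  # too many identical values for a MAD-based rule
    return [t for t, x in enumerate(series) if abs(x - med) > k * mad]


errors = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 8.0, 0.0]
print(flag_outliers(errors))        # [6]
print(half_sample_summary(errors))
```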

     D. As will become evident below, all the user needs to do is read in
     both forecast error series and specify the order of an adequate AR(p)
     model for each series.  (In my experience the off-diagonal lag
     structures in the VAR model discussed in the paper are rarely if ever
     worth worrying about.)  This is easier than it sounds.  Once you have
     input the data into some kind of program to make the time plots
     mentioned in point C above, just estimate a regression model for each
     series against a constant and its own values lagged a few periods.
     Then choose p large enough to capture all of the significant terms in
     that regression model.
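     One way to run the regression described in point D is sketched below
     in Python with NumPy. It regresses the series by OLS on a constant and
     its first p_max lags, then takes p as the largest lag whose |t| exceeds
     about 2. The function names, the default p_max = 6, and the |t| > 2
     rule of thumb are assumptions of this sketch. Any regression package
     would do equally well.

```python
import numpy as np


def ar_regression(y, p):
    """OLS of y_t on a constant and y_{t-1}, ..., y_{t-p}.

    Returns (coefficients, t_statistics); index 0 is the constant and
    index j is lag j."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n - p <= p + 1:
        raise ValueError("too few observations for this many lags")
    X = np.ones((n - p, p + 1))
    for j in range(1, p + 1):
        X[:, j] = y[p - j:n - j]          # column j holds y_{t-j}
    target = y[p:]
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coefs
    sigma2 = resid @ resid / (n - p - (p + 1))   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coefs, coefs / np.sqrt(np.diag(cov))


def choose_order(y, p_max=6, t_crit=2.0):
    """Largest lag (up to p_max) whose |t| exceeds t_crit; 0 if none."""
    _, t_stats = ar_regression(y, p_max)
    significant = [j for j in range(1, p_max + 1) if abs(t_stats[j]) > t_crit]
    return max(significant, default=0)


# Simulated AR(1) series, for illustration only:
rng = np.random.default_rng(0)
e = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + e[t]
print(choose_order(y))   # usually 1 here, but a spurious lag can appear
```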


II. Files

A. enginzip.exe     self-extracting archive containing the rest of the
                    files
B. engine.exe       MSDOS executable
C. engine.inp       input file
D. asg.dat          an example of a data file
E. asg.out          an example of an output file
F. manuscript.pdf   PDF version of the paper

               


III. Format for data file

A. The first two lines are skipped -- you can use these to label the file.
B. The first 21 characters of the next line are the label for the first series.
C. Then put the data on the first series, in free format, one datum per line.
D. The first 21 characters of the next line are the label for the second series.
E. Then put the data on the second series, in free format, one datum per line.
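The data-file layout above can be written and read back with a short Python
sketch. The function names are hypothetical. Numbers are written in E format
(for example 1.25000000E+00) on the assumption that the free-format reader in
engine.exe accepts that form. The number of observations n must match line 4
of engine.inp, with a maximum of 250.

```python
def write_data_file(path, header_lines, label1, series1, label2, series2):
    """Write two forecast error series in the layout described above."""
    if len(header_lines) != 2:
        raise ValueError("exactly two header lines are expected")
    if len(series1) != len(series2):
        raise ValueError("both series must have the same number of observations")
    if len(series1) > 250:
        raise ValueError("at most 250 observations per series")
    with open(path, "w") as f:
        for line in header_lines:
            f.write(line + "\n")          # skipped by the program
        f.write(label1[:21] + "\n")       # only 21 characters are used
        for x in series1:
            f.write(f"{x:.8E}\n")         # one datum per line
        f.write(label2[:21] + "\n")
        for x in series2:
            f.write(f"{x:.8E}\n")


def read_data_file(path, n):
    """Read back the two labels and two series, n observations each."""
    with open(path) as f:
        lines = f.read().splitlines()
    label1 = lines[2][:21]
    series1 = [float(v) for v in lines[3:3 + n]]
    label2 = lines[3 + n][:21]
    series2 = [float(v) for v in lines[4 + n:4 + 2 * n]]
    return label1, series1, label2, series2
```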


IV. Format for input file (engine.inp), by line number:

 1. File name for data file
 2. File name for output file
 3. Title for job
 4. # of sample observations to be read in on each series. {It is assumed
      that the same number of observations are available for both
      series.  Max: 250}
 5. Largest lag in VAR model {max: 30}
 6. Skipped. This line is a label included to make the input file more 
    readable.
 7. Skipped. This line is a label included to make the input file more 
    readable.
 8. Skipped. This line is a label included to make the input file more 
    readable.
 9. The next set of lines specifies the VAR model.  In the example file,
    line 9 specifies that numterm = 1 lagged values of variable 1 appear in
    the equation for variable 1.  These integers are entered in free format
    -- it doesn't matter what columns they are in, but separate them with at
    least one space.
10. Since numterm > 0 on the previous line, this line lists the lags with
    which this explanatory variable appears in this equation, in free format.
    If this first equation had been an AR(2) instead of an AR(1), then numterm
    would have been 2 and this line would have a 1 and a 2 on it.  If
    numterm = 0, then no lags of this variable appear, so this line is
    omitted.
11. In the example file, line 11 specifies that numterm = 0 lagged values of
    variable 2 appear in the equation for variable one.
12. In the example file, line 12 specifies that numterm = 0 lagged values of
    variable 1 appear in the equation for variable two.
13. In the example file, line 13 specifies that numterm = 1 lagged values of
    variable 2 appear in the equation for variable two.
14. As with line 10, this line lists the numterm lags with which variable 2
    appears in the equation for variable two.
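    For the pure AR(p) case (zero off-diagonal terms, as in the example
    file), the lines described above could be generated with the Python
    sketch below. It is hypothetical. The wording of label lines 6-8 is
    invented, and the sketch assumes each numterm line holds only the
    numterm integer. If the engine.inp shipped in enginzip.exe carries
    other integers on those lines, copy its layout instead.

```python
def engine_inp_lines(data_file, output_file, title, n_obs, lags1, lags2):
    """Return the lines of an engine.inp for two AR models, no cross lags.

    lags1, lags2: lags of each series in its own equation, e.g. [1] for an
    AR(1) or [1, 2] for an AR(2); an empty list means no lags."""
    if n_obs > 250:
        raise ValueError("at most 250 observations")
    max_lag = max(list(lags1) + list(lags2), default=0)
    if max_lag > 30:
        raise ValueError("largest lag is limited to 30")

    def block(lags):
        # numterm, followed (only if numterm > 0) by the list of lags
        out = [str(len(lags))]
        if lags:
            out.append(" ".join(str(j) for j in lags))
        return out

    lines = [data_file, output_file, title, str(n_obs), str(max_lag),
             "label line (skipped)", "label line (skipped)",
             "label line (skipped)"]
    lines += block(lags1)   # variable 1 in the equation for variable 1
    lines += block([])      # variable 2 in the equation for variable 1
    lines += block([])      # variable 1 in the equation for variable 2
    lines += block(lags2)   # variable 2 in the equation for variable 2
    return lines


# 100 observations is a made-up value; use your own sample size.
print("\n".join(engine_inp_lines("asg.dat", "asg.out", "Example job",
                                 100, [1], [1])))
```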


V. Output file

 A. First the title, data read in, and VAR specification are echoed.
 B. The OLS estimates of the VAR model are then given.
 C. Next the sample MSE ratio is given, nicely labelled if you put labels in 
    the data file.
 D. Parameter estimates are bias corrected, as described in the paper.
    (The intercepts are not bias corrected; these are adjusted to force the
    sample mean of the implied residuals to be zero.)
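    The intercept adjustment in point D amounts to choosing each intercept
    so that the implied residuals average to zero, given the (already
    bias-corrected) slope coefficients. The Python sketch below does this
    for one AR(p) equation. It does not reproduce the bias correction
    itself, and the slope value in the example is made up.

```python
def adjusted_intercept(y, phis):
    """Intercept c making mean(y_t - c - sum_j phis[j-1] * y_{t-j}) zero."""
    p = len(phis)
    net_of_lags = [
        y[t] - sum(phi * y[t - j] for j, phi in enumerate(phis, start=1))
        for t in range(p, len(y))
    ]
    return sum(net_of_lags) / len(net_of_lags)


def implied_residuals(y, c, phis):
    """Residuals y_t - c - sum_j phis[j-1] * y_{t-j} for t = p, ..., n-1."""
    p = len(phis)
    return [
        y[t] - c - sum(phi * y[t - j] for j, phi in enumerate(phis, start=1))
        for t in range(p, len(y))
    ]


y = [1.0, 2.0, 4.0, 3.0, 5.0]
phis = [0.5]                    # hypothetical bias-corrected AR(1) slope
c = adjusted_intercept(y, phis)
print(c)                                     # 2.25
print(sum(implied_residuals(y, c, phis)))    # zero, up to rounding
```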
 E. Next the p-values are given.  "SMPL" is from applying the bootstrap
    directly to the data read in.  The next 100 p-values are from data sets
    generated by treating the sample data read in as if they were the
    original data.  See the paper: the dispersion of these p-values
    quantifies how imprecise the inference is due to (a) using the empirical
    distribution to approximate the population distribution (this is the
    bootstrap approximation) and (b) estimating the VAR parameters.
 F. I recommend quoting the median of these 100 p-values as the point estimate
    of the significance level for the test, and quantifying the (finite
    sample) imprecision in this estimate by quoting a 50% confidence interval
    whose lower endpoint is the 25% fractile (the decile labelled "2.5") and
    whose upper endpoint is the 75% fractile (the decile labelled "7.5").
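    Given the 100 p-values from the output file, the summary recommended in
    point F can be computed as shown below. This Python sketch uses
    statistics.quantiles with the "inclusive" method. The program's own
    fractile rule is not documented here and may interpolate differently, so
    the results may differ slightly from the entries labelled "2.5" and
    "7.5".

```python
from statistics import median, quantiles


def summarize_p_values(p_values):
    """Median plus a 50% interval from the 25% and 75% fractiles."""
    if len(p_values) < 2:
        raise ValueError("need at least two p-values")
    q25, _, q75 = quantiles(p_values, n=4, method="inclusive")
    return median(p_values), (q25, q75)


# Made-up p-values standing in for the 100 values in the output file:
p_values = [i / 100 for i in range(1, 101)]
point, (low, high) = summarize_p_values(p_values)
print(f"p-value {point:.4f}, 50% interval [{low:.4f}, {high:.4f}]")
# p-value 0.5050, 50% interval [0.2575, 0.7525]
```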
