University of Helsinki Department of Mathematics and Statistics
Faculty of Science
Faculty of Social Sciences

 

BSiZer MATLAB software

From this page you can download MATLAB functions for creating "Bayesian SiZer" (BSiZer) maps.

A few technical points:

  • The routines were written mainly for MATLAB version 5.x. The numerics in later versions are somewhat different (e.g. the matrix inverse seems to behave differently), so some caution is advised when using versions 6.x and later.
  • Routines (i) and (ii) use the MATLAB Statistics toolbox (ver 4). If you do not have this toolbox, you must supply your own code for the random number generators (gamma, inverse gamma, multivariate t, multivariate normal and inverse Wishart) and for the inverse of the beta cumulative distribution function. For writing random number generators for these distributions we refer to Gelman et al.: Bayesian Data Analysis.
  • Instructions for using the routines are displayed by typing "help function_name". Two examples of MATLAB commands for creating maps in the case of independent observations (i.i.d. errors) are provided below.
  • The author makes no claims as to the optimality of the code. Even some bugs might be found!

    Any comments and suggestions are warmly welcome!
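
For readers without the Statistics toolbox, some of the replacements mentioned above can be sketched in base MATLAB. This is a minimal sketch under stated assumptions, not the code used by the BSiZer routines: the helper names (chi2_draw, inv_chi2_draw, mvn_draw, mvt_draw, beta_inv) are hypothetical, and the chi-square construction assumes integer degrees of freedom. Gamma draws with non-integer shape and inverse Wishart draws are not covered here.

```matlab
% Hypothetical helpers built only from base functions (randn, chol, betainc, fzero).

% Chi-square draw with INTEGER degrees of freedom nu:
% the sum of nu squared standard normal variates.
chi2_draw = @(nu) sum(randn(nu, 1).^2);

% Scaled inverse chi-square draw with nu degrees of freedom and scale s2,
% i.e. an inverse gamma draw with shape nu/2 and scale nu*s2/2.
inv_chi2_draw = @(nu, s2) nu * s2 / chi2_draw(nu);

% Multivariate normal draw N(mu, Sigma) for a column vector mu.
% chol returns upper-triangular R with R'*R = Sigma, so R'*z has covariance Sigma.
mvn_draw = @(mu, Sigma) mu + chol(Sigma)' * randn(length(mu), 1);

% Multivariate t draw: a zero-mean normal draw divided by sqrt(chi2/nu), shifted by mu.
mvt_draw = @(mu, Sigma, nu) mu + ...
    (chol(Sigma)' * randn(length(mu), 1)) / sqrt(chi2_draw(nu) / nu);

% Inverse of the beta CDF for 0 < p < 1, found by root search on betainc over [0, 1].
beta_inv = @(p, a, b) fzero(@(x) betainc(x, a, b) - p, [0 1]);

% Small usage example.
sigma2 = inv_chi2_draw(100, 1.0);      % one draw for a variance parameter
mu     = zeros(3, 1);
Sigma  = [2 0.5 0; 0.5 1 0; 0 0 1];
y      = mvn_draw(mu, Sigma);          % 3-by-1 normal draw
z      = mvt_draw(mu, Sigma, 5);       % 3-by-1 t draw with 5 degrees of freedom
q      = beta_inv(0.95, 2, 3);         % 0.95 quantile of Beta(2, 3)
```

The root search is slower than a dedicated quantile routine, so it is best suited to occasional calls rather than tight loops.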

MATLAB functions

i) Standard BSiZer [1]: Independent observations and fixed design

  • bsizer.m. Software for fixed \lambda_0.
  • bsizer_rand_l.m. Software for random \lambda_0. The parameters \eta and \beta of the Gamma prior for \lambda_0 must be supplied.

ii) Extended BSiZer [2]: Dependent observations and errors in predictors

iii) Model within BSiZer [3]: Realizations from the posterior distributions provided by a problem-specific regression model are used as input for BSiZer.


Some tryout data

test_data.mat. Test data: a realization of X+17*sin(X)+20*sin(.5.*X)+15*randn(1,200), where X=[1:200].
test_data.txt. The same in text format; first column - the X values, second column - the response values (variable name in the examples: sini_data).
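
If the data files are not at hand, a comparable realization can be generated directly from the formula above. This is only a sketch: the random draws will differ from those in test_data.mat, and the row-vector orientation simply follows the formula rather than a check of the downloadable file.

```matlab
% Generate a new realization of the test signal described above.
X = 1:200;                                   % fixed design points
trend = X + 17*sin(X) + 20*sin(0.5.*X);      % deterministic part of the signal
sini_data = trend + 15*randn(1, 200);        % add Gaussian noise with standard deviation 15

% Quick visual check of the scatter plot against the noise-free curve.
plot(X, sini_data, '.', X, trend, '-');
```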


Some toy examples (in the i.i.d. fixed design case)

  • Fixed lambda_0:
     
    load test_data
    bsizer(sini_data,X,100,1.0,0.4,0,0.95,1)
         
    Produces the map1 with prior parameters 100 (degrees of freedom) and 1.0 (scale) for \sigma^2, \lambda_0=0.4, pointwise inference (0), \alpha=0.95, and a color map with shades of blue and red (1).

  • Random lambda_0:
    load test_data
    bsizer_rand_l(sini_data,X,100,1.0,0.2,2,1,0.80,0);
          
    Produces the map2 with prior parameters 100 (degrees of freedom) and 1.0 (scale) for \sigma^2, \eta=0.2, \beta=2, simultaneous inference (1), \alpha=0.80, and a color map without shades (0) (binary blue/red coloring).

BSiZer in practice

We have tried to make the BSiZer software as easy to use as possible. Of course, the methodology has various switches, and therefore the number of input parameters might seem daunting.

Potentially the most difficult part of a practical data analysis problem is the elicitation of the priors needed for routines (i) and (ii). We have written a separate page about prior selection in BSiZer, designed mainly for beginners who are not familiar with the prior distributions used in BSiZer.

Problems with a large n (number of data points) can be rather time consuming, especially with simultaneous inference and/or dependent observations and errors in predictors. For dependent observations with fixed design, or independent observations with errors in predictors, editing the program accordingly can be a good idea. Another option for computationally demanding problems is, of course, to carry out the preliminary analysis with only a fraction of the data and/or with fewer values of \lambda.
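
A preliminary run on a fraction of the data could be prepared along the following lines. This is a sketch only: the variable names step, X_sub and y_sub are hypothetical, the data are simulated from the test-data formula, and taking every step-th point is just one way to thin a fixed design while keeping it equally spaced.

```matlab
% Simulated stand-in for the full data set (same formula as the test data).
X = 1:200;
sini_data = X + 17*sin(X) + 20*sin(0.5.*X) + 15*randn(1, 200);

% Keep every 4th observation so that the thinned design stays equally spaced.
step  = 4;
idx   = 1:step:numel(X);
X_sub = X(idx);
y_sub = sini_data(idx);

% X_sub and y_sub can then replace X and sini_data in the example calls above,
% e.g. bsizer(y_sub, X_sub, 100, 1.0, 0.4, 0, 0.95, 1)
fprintf('Using %d of %d observations\n', numel(X_sub), numel(X));
```

Features finer than the thinned spacing cannot be detected in such a run, so it serves mainly to check the prior settings and the range of \lambda before the full analysis.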

The default grid for \lambda is just one possibility. We recommend using your own problem-specific grid. It can be found by trial and error or, if the desired range of smoothing parameter values in kernel smoothing is known, by combining this information with the approximate connection between kernel and spline smoothing (Green, Silverman: Nonparametric Regression and Generalized Linear Models).
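
As a starting point for trial and error, a logarithmically spaced grid is a common choice for smoothing parameters. The sketch below only builds such a grid; the endpoints, the number of grid points and the name lambda_grid are illustrative assumptions, and how a grid is supplied to the routines should be checked with "help bsizer".

```matlab
% Illustrative range for the smoothing parameter (assumed values, adjust per problem).
lambda_min = 1e-2;
lambda_max = 1e4;
n_lambda   = 30;        % fewer values give a faster preliminary analysis

% logspace takes base-10 exponents, so convert the endpoints with log10.
lambda_grid = logspace(log10(lambda_min), log10(lambda_max), n_lambda);

% A coarser grid for a first pass: every third value of the full grid.
lambda_coarse = lambda_grid(1:3:end);
```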

Disclaimer

This software is free for academic and personal use only. It must not be modified and distributed without prior permission of the author. The author is not responsible for any consequences of the use of this software. The user agrees that any reports, publications, or other disclosure of results obtained with this software will attribute its use by an appropriate citation:

  1. P. Erästö and L. Holmström. Bayesian multiscale smoothing for making inferences about features in scatter plots. Journal of Computational and Graphical Statistics, 14(3):569-589, 2005.
  2. P. Erästö and L. Holmström. Bayesian analysis of features in a scatter plot with dependent observations and errors in predictors. Journal of Statistical Computation and Simulation, online version, DOI:10.1080/10629360600711988, June 22, 2006.
  3. P. Erästö and L. Holmström. Selection of Prior Distributions and Multiscale Analysis in Bayesian Temperature Reconstructions Based on Fossil Assemblages. Journal of Paleolimnology, 36(1):69-80, 2006.

Last update 18 Mar 2010.

Panu Erästö (panu.erasto'at'helsinki.fi)