
Function for Fitting Univariate Bayesian Conjugate Spatial Regression Models
spConjNNGP.RdThe function spConjNNGP fits Gaussian univariate Bayesian conjugate spatial
regression models using Nearest Neighbor Gaussian Processes (NNGP).
Usage
spConjNNGP(formula, data = parent.frame(), coords, knots, n.neighbors = 15,
theta.alpha, sigma.sq.IG, cov.model = "exponential",
k.fold = 5, score.rule = "crps",
X.0, coords.0, n.omp.threads = 1, search.type = "cb",
ord, return.neighbor.info = TRUE,
neighbor.info, fit.rep = FALSE, n.samples, verbose = TRUE, ...)Arguments
- formula
a symbolic description of the regression model to be fit. See example below.
- data
an optional data frame containing the variables in the model. If not found in data, the variables are taken from
environment(formula), typically the environment from whichspConjNNGPis called.- coords
an \(n \times 2\) matrix of the observation coordinates in \(R^2\) (e.g., easting and northing), or if
datais a data frame thencoordscan be a vector of length two comprising coordinate column names or indices. There can be no duplicate locations.- knots
an \(r \times 2\) matrix of the observation coordinates in \(R^2\) (e.g., easting and northing). Adding the
knotsargument invokes SLGP, see Shin et al. (2019) below.- n.neighbors
number of neighbors used in the NNGP.
- theta.alpha
a vector or matrix of parameter values for
phi,nu, andalpha, where \(\alpha=\tau^2/\sigma^2\) andnuis only required ifcov.model="matern". A vector is passed to run the model using one set of parameters. The vector elements must be named and hold values forphi,nu, andalpha. If a matrix is passed, columns must be named and hold values forphi,nu, andalpha. Each row in the matrix defines a set of parameters for which the model will be run.- sigma.sq.IG
a vector of length two that holds the hyperparameters, shape and scale respectively, for the inverse-Gamma prior on \(\sigma^2\).
- cov.model
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are:
"exponential","matern","spherical", and"gaussian". See below for details.- k.fold
specifies the number of k folds for cross-validation. If
theta.alphais a vector then cross-validation is not performed andk-foldandscore.ruleare ignored. In k-fold cross-validation, the data specified inmodelis randomly partitioned into k equal sized subsamples. Of the k subsamples, k-1 subsamples are used to fit the model and the remaining k samples are used for prediction. The cross-validation process is repeated k times (the folds). Root mean squared prediction error (RMSPE) and continuous ranked probability score (CRPS; Gneiting and Raftery, 2007) rules are averaged over the k fold prediction results and reported for the parameter sets defined bytheta.alpha. The parameter set that yields the best performance based on the scoring rule defined byscore.ruleis used to fit the final model that uses all the data and make predictions ifX.0andcoords.0are supplied. Results from the k-fold cross-validation are returned in thek.fold.scoresmatrix.- score.rule
a quoted keyword
"rmspe"or"crps"that specifies the scoring rule used to select the best parameter set, see argument definition fork.foldfor more details.- X.0
the design matrix for prediction locations. An intercept should be provided in the first column if one is specified in
model.- coords.0
the spatial coordinates corresponding to
X.0.- n.omp.threads
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting
n.omp.threadsup to the number of hyperthreaded cores. Note,n.omp.threads> 1 might not work on some systems.- search.type
a quoted keyword that specifies type of nearest neighbor search algorithm. Supported method key words are:
"cb"and"brute". The"cb"should generally be much faster. If locations do not have identical coordinate values on the axis used for the nearest neighbor ordering (seeordargument) then"cb"and"brute"should produce identical neighbor sets. However, if there are identical coordinate values on the axis used for nearest neighbor ordering, then"cb"and"brute"might produce different, but equally valid, neighbor sets, e.g., if data are on a grid.- ord
an index vector of length \(n\) used for the nearest neighbor search. Internally, this vector is used to order
coords, i.e.,coords[ord,], and associated data. Nearest neighbor candidates for the i-th row in the orderedcoordsare rows1:(i-1), with then.neighborsnearest neighbors being those with the minimum euclidean distance to the location defined by orderedcoords[i,]. The default used whenordis not specified is x-axis ordering, i.e.,order(coords[,1]). This argument should typically be left blank. This argument will be ignored if theneighbor.infoargument is used.- return.neighbor.info
if
TRUE, a list calledneighbor.infocontaining several data structures used for fitting the NNGP model is returned. If there is no change in input data orn.neighbors, this list can be passed to subsequentspNNGPcalls via theneighbor.infoargument to avoid the neighbor search, which can be time consuming if \(n\) is large. In addition to the several cryptic data structures inneighbor.infothere is a list calledn.indxthat is of length \(n\). The i-th element inn.indxcorresponds to the i-th row incoords[ord,]and holds the vector of that location's nearest neighbor indices. This list can be useful for plotting the neighbor graph if desired.- neighbor.info
see the
return.neighbor.infoargument description above.- fit.rep
if
TRUE, regression fitted and replicate data will be returned. The argumentn.samplescontrols the number of fitted and replicated data samples.- n.samples
gives the number of posterior samples returned. Note, point and associated variance estimates for model parameters are not based on posterior samples. Only specify
n.samplesif you wish to generate samples from parameters' posteriors (this is an exact sampling algorithm). Iffit.repisTRUE, thenn.samplesalso controls the number of fitted and replicated data samples.- verbose
if
TRUE, model specification and progress is printed to the screen. Otherwise, nothing is printed to the screen.- ...
currently no additional arguments.
Value
An object of class NNGP and conjugate, and, if knots
is provided, SLGP. Among other elements, the object contains:
- theta.alpha
the input
theta.alphavector, or best (according to the selected scoring rule) set of parameters in thetheta.alphamatrix. All subsequent parameter estimates are based on this parameter set.- beta.hat
a matrix of regression coefficient estimates corresponding to the returned
theta.alpha.- beta.var
beta.hatvariance-covariance matrix.- sigma.sq.hat
estimate of \(\sigma^2\) corresponding to the returned
theta.alpha.- sigma.sq.var
sigma.sq.hatvariance.- k.fold.scores
results from the k-fold cross-validation if
theta.alphais a matrix.- y.0.hat
prediction if
X.0andcoords.0are specified.- y.0.var.hat
y.0.hatvariance.- n.neighbors
number of neighbors used in the NNGP.
- neighbor.info
returned if
return.neighbor.info=TRUEsee thereturn.neighbor.infoargument description above.- run.time
execution time for parameter estimation reported using
proc.time(). This time does not include nearest neighbor search time for building the neighbor set.
References
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091 .
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi:10.18637/jss.v103.i05 .
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924 .
Gneiting, T and A.E. Raftery. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, doi:10.1198/016214506000001437 .
Shirota, S., A.O. Finley, B.D. Cook, and S. Banerjee (2019) Conjugate Nearest Neighbor Gaussian Process models for efficient statistical interpolation of large spatial data. https://arxiv.org/abs/1907.10109.
Author
Andrew O. Finley finleya@msu.edu,
Abhirup Datta abhidatta@jhu.edu,
Sudipto Banerjee sudipto@ucla.edu
Examples
rmvn <- function(n, mu=0, V = matrix(1)){
p <- length(mu)
if(any(is.na(match(dim(V),p))))
stop("Dimension problem!")
D <- chol(V)
t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}
##Make some data
set.seed(1)
n <- 2000
coords <- cbind(runif(n,0,1), runif(n,0,1))
x <- cbind(1, rnorm(n))
B <- as.matrix(c(1,5))
sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5
D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))
ho <- sample(1:n, 1000)
y.ho <- y[ho]
x.ho <- x[ho,,drop=FALSE]
w.ho <- w[ho]
coords.ho <- coords[ho,]
y <- y[-ho]
x <- x[-ho,,drop=FALSE]
w <- w[-ho,,drop=FALSE]
coords <- coords[-ho,]
##Fit a Conjugate NNGP model and predict for the holdout
sigma.sq.IG <- c(2, sigma.sq)
cov.model <- "exponential"
g <- 10
theta.alpha <- cbind(seq(phi,30,length.out=g), seq(tau.sq/sigma.sq,5,length.out=g))
colnames(theta.alpha) <- c("phi", "alpha")
m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10,
X.0 = x.ho, coords.0 = coords.ho,
k.fold = 5, score.rule = "crps",
n.omp.threads = 1,
theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model)
#> ----------------------------------------
#> Starting k-fold
#> ----------------------------------------
#>
|
| | 0%
|
|****** | 20%
|
|************ | 40%
|
|****************** | 60%
|
|************************ | 80%
|
|******************************| 100%
#> ----------------------------------------
#> Model description
#> ----------------------------------------
#> NNGP Conjugate model fit with 1000 observations.
#>
#> Number of covariates 2 (including intercept if specified).
#>
#> Using the exponential spatial correlation model.
#>
#> Using 10 nearest neighbors.
#>
#> Source compiled with OpenMP support and model fit using 1 thread(s).
#> ------------
#> Priors and hyperpriors:
#> beta flat.
#> sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
#> ------------
#> Predicting at 1000 locations.
#> ------------
#> Estimation for parameter set(s)
#> Set phi=6.00000 and alpha=0.20000
m.c
#>
#> Call:
#> spConjNNGP(formula = y ~ x - 1, coords = coords, n.neighbors = 10,
#> theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model,
#> k.fold = 5, score.rule = "crps", X.0 = x.ho, coords.0 = coords.ho,
#> n.omp.threads = 1)
#>
#> Model class is NNGP, method conjugate, family gaussian.