
Function for Fitting Univariate Bayesian Spatial Regression Models
spNNGP.RdThe function spNNGP fits Gaussian and non-Gaussian univariate Bayesian spatial
regression models using Nearest Neighbor Gaussian Processes (NNGP).
Usage
spNNGP(formula, data = parent.frame(), coords, method = "response",
family="gaussian", weights, n.neighbors = 15,
starting, tuning, priors, cov.model = "exponential",
n.samples, n.omp.threads = 1, search.type = "cb", ord,
return.neighbor.info = FALSE, neighbor.info,
fit.rep = FALSE, sub.sample, verbose = TRUE, n.report = 100, ...)Arguments
- formula
a symbolic description of the regression model to be fit. See example below.
- data
an optional data frame containing the variables in the model. If not found in data, the variables are taken from
environment(formula), typically the environment from whichspNNGPis called.- coords
an \(n \times 2\) matrix of the observation coordinates in \(R^2\) (e.g., easting and northing), or if
datais a data frame thencoordscan be a vector of length two comprising coordinate column names or indices. There can be no duplicate locations.- method
a quoted keyword that specifies the NNGP sampling algorithm. Supported method keywords are:
"response"and"latent". When \(n\) is large, the"response"algorithm should be faster and provide finer control over Metropolis acceptance rate for covariance parameters. In general, unless estimates of spatial random effects are needed, the"response"algorithm should be used.- family
a quoted keyword that specifies the data likelihood. Choices are "gaussian" for continuous outcome and "binomial" for discrete outcome which assumes a logistic link modeled using Polya-Gamma latent variables.
- weights
specifies the number of trials for each observation when
family="binomial". The default is 1 trial for each observation. Valid arguments are a scalar value that specifies the number of trials used if all observations have the same number of trials, and a vector of length \(n\) that specifies the number of trials for each observation used there are differences in the number of trials.- n.neighbors
number of neighbors used in the NNGP.
- starting
a list with each tag corresponding to a parameter name. Valid tags are
beta,sigma.sq,tau.sq,phi, andnu.nuis only specified ifcov.model="matern". The value portion of each tag is the parameter's startingvalue.- tuning
a list with each tag corresponding to a parameter name. Valid tags are
sigma.sq,tau.sq,phi, andnu. Ifmethod="latent"then onlyphiandnuneed to be specified. The value portion of each tag defines the variance of the Metropolis sampler Normal proposal distribution.- priors
a list with each tag corresponding to a parameter name. Valid tags are
sigma.sq.ig,tau.sq.ig,phi.unif, andnu.unif. Variance parameters,simga.sqandtau.sq, are assumed to follow an inverse-Gamma distribution, whereas the spatial decayphiand smoothnessnuparameters are assumed to follow Uniform distributions. The hyperparameters of the inverse-Gamma are passed as a vector of length two, with the first and second elements corresponding to the shape and scale, respectively. The hyperparameters of the Uniform are also passed as a vector of length two with the first and second elements corresponding to the lower and upper support, respectively.- cov.model
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are:
"exponential","matern","spherical", and"gaussian". See below for details.- n.samples
the number of posterior samples to collect.
- n.omp.threads
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting
n.omp.threadsup to the number of hyperthreaded cores. Note,n.omp.threads> 1 might not work on some systems.- fit.rep
if
TRUE, regression fitted and replicate data will be returned. The argumentsub.samplecontrols which MCMC samples are used to generate the fitted and replicated data.- sub.sample
an optional list that specifies the samples used for
fit.rep. Valid tags arestart,end, andthin. Given the value associated with the tags, the sample subset is selected usingseq(as.integer(start), as.integer(end), by=as.integer(thin)). The default values arestart=floor(0.5*n.samples),end=n.samplesandthin=1.- verbose
if
TRUE, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.- search.type
a quoted keyword that specifies type of nearest neighbor search algorithm. Supported method key words are:
"cb"and"brute". The"cb"should generally be much faster. If locations do not have identical coordinate values on the axis used for the nearest neighbor ordering (seeordargument) then"cb"and"brute"should produce identical neighbor sets. However, if there are identical coordinate values on the axis used for nearest neighbor ordering, then"cb"and"brute"might produce different, but equally valid, neighbor sets, e.g., if data are on a grid.- ord
an index vector of length \(n\) used for the nearest neighbor search. Internally, this vector is used to order
coords, i.e.,coords[ord,], and associated data. Nearest neighbor candidates for the i-th row in the orderedcoordsare rows1:(i-1), with then.neighborsnearest neighbors being those with the minimum euclidean distance to the location defined by orderedcoords[i,]. The default used whenordis not specified is x-axis ordering, i.e.,order(coords[,1]). This argument should typically be left blank. This argument will be ignored if theneighbor.infoargument is used.- return.neighbor.info
if
TRUE, a list calledneighbor.infocontaining several data structures used for fitting the NNGP model is returned. If there is no change in input data orn.neighbors, this list can be passed to subsequentspNNGPcalls via theneighbor.infoargument to avoid the neighbor search, which can be time consuming if \(n\) is large. In addition to the several cryptic data structures inneighbor.infothere is a list calledn.indxthat is of length \(n\). The i-th element inn.indxcorresponds to the i-th row incoords[ord,]and holds the vector of that location's nearest neighbor indices. This list can be useful for plotting the neighbor graph if desired.- neighbor.info
see the
return.neighbor.infoargument description above.- n.report
the interval to report Metropolis sampler acceptance and MCMC progress.
- ...
currently no additional arguments.
Value
An object of class NNGP with additional class designations for
method and family. The return object is a list comprising:
- p.beta.samples
a
codaobject of posterior samples for the regression coefficients.- p.theta.samples
a
codaobject of posterior samples for covariance parameters.- p.w.samples
is a matrix of posterior samples for the spatial random effects, where rows correspond to locations in
coordsand columns hold then.samplesposterior samples. This is only returned ifmethod="latent".- y.hat.samples
if
fit.rep=TRUE, regression fitted values from posterior samples specified usingsub.sample. See additional details below.- y.hat.quants
if
fit.rep=TRUE, 0.5, 0.025, and 0.975 quantiles of they.hat.samples.- y.rep.samples
if
fit.rep=TRUE, replicated outcome from posterior samples specified usingsub.sample. See additional details below.- y.rep.quants
if
fit.rep=TRUE, 0.5, 0.025, and 0.975 quantiles of they.rep.samples.- s.indx
if
fit.rep=TRUE, the subset index specified withsub.sample.- neighbor.info
returned if
return.neighbor.info=TRUEsee thereturn.neighbor.infoargument description above.- run.time
execution time for parameter estimation reported using
proc.time(). This time does not include nearest neighbor search time for building the neighbor set.
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
Details
Model parameters can be fixed at their starting values by setting their
tuning values to zero.
The no nugget model is specified by setting tau.sq to zero
in the starting and tuning lists.
References
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi:10.18637/jss.v103.i05 .
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091 .
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924 .
Author
Andrew O. Finley finleya@msu.edu,
Abhirup Datta abhidatta@jhu.edu,
Sudipto Banerjee sudipto@ucla.edu
Examples
rmvn <- function(n, mu=0, V = matrix(1)){
p <- length(mu)
if(any(is.na(match(dim(V),p))))
stop("Dimension problem!")
D <- chol(V)
t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}
##Make some data
set.seed(1)
n <- 100
coords <- cbind(runif(n,0,1), runif(n,0,1))
x <- cbind(1, rnorm(n))
B <- as.matrix(c(1,5))
sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5
D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))
##Fit a Response and Latent NNGP model
n.samples <- 500
starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1)
tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5)
priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1))
cov.model <- "exponential"
m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10,
tuning=tuning, priors=priors, cov.model=cov.model,
n.samples=n.samples, n.omp.threads=1)
#> ----------------------------------------
#> Building the neighbor list
#> ----------------------------------------
#> ----------------------------------------
#> Building the neighbors of neighbors list
#> ----------------------------------------
#> ----------------------------------------
#> Model description
#> ----------------------------------------
#> NNGP Latent model fit with 100 observations.
#>
#> Number of covariates 2 (including intercept if specified).
#>
#> Using the exponential spatial correlation model.
#>
#> Using 10 nearest neighbors.
#>
#> Number of MCMC samples 500.
#>
#> Priors and hyperpriors:
#> beta flat.
#> sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
#> tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
#> phi Unif hyperpriors a=3.00000 and b=300.00000
#>
#> Source compiled with OpenMP support and model fit using 1 thread(s).
#> ----------------------------------------
#> Sampling
#> ----------------------------------------
#> Sampled: 100 of 500, 20.00%
#> Report interval Metrop. Acceptance rate: 52.00%
#> Overall Metrop. Acceptance rate: 52.00%
#> -------------------------------------------------
#> Sampled: 200 of 500, 40.00%
#> Report interval Metrop. Acceptance rate: 53.00%
#> Overall Metrop. Acceptance rate: 52.50%
#> -------------------------------------------------
#> Sampled: 300 of 500, 60.00%
#> Report interval Metrop. Acceptance rate: 52.00%
#> Overall Metrop. Acceptance rate: 52.33%
#> -------------------------------------------------
#> Sampled: 400 of 500, 80.00%
#> Report interval Metrop. Acceptance rate: 44.00%
#> Overall Metrop. Acceptance rate: 50.25%
#> -------------------------------------------------
#> Sampled: 500 of 500, 100.00%
#> Report interval Metrop. Acceptance rate: 56.00%
#> Overall Metrop. Acceptance rate: 51.40%
#> -------------------------------------------------
summary(m.s)
#>
#> Call:
#> spNNGP(formula = y ~ x - 1, coords = coords, method = "latent",
#> n.neighbors = 10, starting = starting, tuning = tuning, priors = priors,
#> cov.model = cov.model, n.samples = n.samples, n.omp.threads = 1)
#>
#> Model class is NNGP, method latent, family gaussian.
#>
#> Model object contains 500 MCMC samples.
#>
#> Chain sub.sample:
#> start = 250
#> end = 500
#> thin = 1
#> samples size = 251
#> 2.5% 25% 50% 75% 97.5%
#> x1 1.4532 1.9494 2.1688 2.4433 2.7806
#> x2 4.4950 4.6990 4.8426 4.9718 5.2373
#> sigma.sq 2.3670 3.5773 4.1840 4.8601 6.5904
#> tau.sq 0.2882 0.5759 0.7437 1.1290 2.0091
#> phi 7.3423 10.2011 12.2097 15.6045 25.8783
plot(apply(m.s$p.w.samples, 1, median), w)
m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
tuning=tuning, priors=priors, cov.model=cov.model,
n.samples=n.samples, n.omp.threads=1)
#> ----------------------------------------
#> Building the neighbor list
#> ----------------------------------------
#> ----------------------------------------
#> Model description
#> ----------------------------------------
#> NNGP Response model fit with 100 observations.
#>
#> Number of covariates 2 (including intercept if specified).
#>
#> Using the exponential spatial correlation model.
#>
#> Using 10 nearest neighbors.
#>
#> Number of MCMC samples 500.
#>
#> Priors and hyperpriors:
#> beta flat.
#> sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
#> tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
#> phi Unif hyperpriors a=3.00000 and b=300.00000
#>
#> Source compiled with OpenMP support and model fit using 1 thread(s).
#> ----------------------------------------
#> Sampling
#> ----------------------------------------
#> Sampled: 100 of 500, 20.00%
#> Report interval Metrop. Acceptance rate: 33.00%
#> Overall Metrop. Acceptance rate: 33.00%
#> -------------------------------------------------
#> Sampled: 200 of 500, 40.00%
#> Report interval Metrop. Acceptance rate: 28.00%
#> Overall Metrop. Acceptance rate: 30.50%
#> -------------------------------------------------
#> Sampled: 300 of 500, 60.00%
#> Report interval Metrop. Acceptance rate: 29.00%
#> Overall Metrop. Acceptance rate: 30.00%
#> -------------------------------------------------
#> Sampled: 400 of 500, 80.00%
#> Report interval Metrop. Acceptance rate: 19.00%
#> Overall Metrop. Acceptance rate: 27.25%
#> -------------------------------------------------
#> Sampled: 500 of 500, 100.00%
#> Report interval Metrop. Acceptance rate: 23.00%
#> Overall Metrop. Acceptance rate: 26.40%
#> -------------------------------------------------
summary(m.r)
#>
#> Call:
#> spNNGP(formula = y ~ x - 1, coords = coords, method = "response",
#> n.neighbors = 10, starting = starting, tuning = tuning, priors = priors,
#> cov.model = cov.model, n.samples = n.samples, n.omp.threads = 1)
#>
#> Model class is NNGP, method response, family gaussian.
#>
#> Model object contains 500 MCMC samples.
#>
#> Chain sub.sample:
#> start = 250
#> end = 500
#> thin = 1
#> samples size = 251
#> 2.5% 25% 50% 75% 97.5%
#> x1 1.1877 1.8342 2.1101 2.3642 2.9873
#> x2 4.4913 4.7032 4.8236 4.9365 5.2016
#> sigma.sq 1.7811 2.9784 3.3282 4.3400 7.5759
#> tau.sq 0.2571 0.7788 1.1362 1.6107 2.1387
#> phi 5.9495 8.0168 12.7328 15.3784 20.1438
##Fit with some user defined neighbor ordering
##ord <- order(coords[,2]) ##y-axis
ord <- order(coords[,1]+coords[,2]) ##x+y-axis
##ord <- sample(1:n, n) ##random
m.r.xy <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
tuning=tuning, priors=priors, cov.model=cov.model,
ord=ord, return.neighbor.info=TRUE,
n.samples=n.samples, n.omp.threads=1)
#> ----------------------------------------
#> Building the neighbor list
#> ----------------------------------------
#> ----------------------------------------
#> Model description
#> ----------------------------------------
#> NNGP Response model fit with 100 observations.
#>
#> Number of covariates 2 (including intercept if specified).
#>
#> Using the exponential spatial correlation model.
#>
#> Using 10 nearest neighbors.
#>
#> Number of MCMC samples 500.
#>
#> Priors and hyperpriors:
#> beta flat.
#> sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
#> tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
#> phi Unif hyperpriors a=3.00000 and b=300.00000
#>
#> Source compiled with OpenMP support and model fit using 1 thread(s).
#> ----------------------------------------
#> Sampling
#> ----------------------------------------
#> Sampled: 100 of 500, 20.00%
#> Report interval Metrop. Acceptance rate: 18.00%
#> Overall Metrop. Acceptance rate: 18.00%
#> -------------------------------------------------
#> Sampled: 200 of 500, 40.00%
#> Report interval Metrop. Acceptance rate: 19.00%
#> Overall Metrop. Acceptance rate: 18.50%
#> -------------------------------------------------
#> Sampled: 300 of 500, 60.00%
#> Report interval Metrop. Acceptance rate: 18.00%
#> Overall Metrop. Acceptance rate: 18.33%
#> -------------------------------------------------
#> Sampled: 400 of 500, 80.00%
#> Report interval Metrop. Acceptance rate: 32.00%
#> Overall Metrop. Acceptance rate: 21.75%
#> -------------------------------------------------
#> Sampled: 500 of 500, 100.00%
#> Report interval Metrop. Acceptance rate: 27.00%
#> Overall Metrop. Acceptance rate: 22.80%
#> -------------------------------------------------
summary(m.r.xy)
#>
#> Call:
#> spNNGP(formula = y ~ x - 1, coords = coords, method = "response",
#> n.neighbors = 10, starting = starting, tuning = tuning, priors = priors,
#> cov.model = cov.model, n.samples = n.samples, n.omp.threads = 1,
#> ord = ord, return.neighbor.info = TRUE)
#>
#> Model class is NNGP, method response, family gaussian.
#>
#> Model object contains 500 MCMC samples.
#>
#> Chain sub.sample:
#> start = 250
#> end = 500
#> thin = 1
#> samples size = 251
#> 2.5% 25% 50% 75% 97.5%
#> x1 0.7771 1.6841 2.0819 2.4073 2.8944
#> x2 4.4813 4.6922 4.8117 4.9609 5.1920
#> sigma.sq 1.9662 3.5184 4.0121 4.9067 6.8949
#> tau.sq 0.2933 0.6134 0.9782 1.4793 3.2605
#> phi 3.6888 6.3814 10.2711 13.9859 18.4597
if (FALSE) { # \dontrun{
##Visualize the neighbor sets and ordering constraint
n.indx <- m.r.xy$neighbor.info$n.indx
ord <- m.r.xy$neighbor.info$ord
##This is how the data are ordered internally for model fitting
coords.ord <- coords[ord,]
for(i in 1:n){
plot(coords.ord, cex=1, xlab="Easting", ylab="Northing")
points(coords.ord[i,,drop=FALSE], col="blue", pch=19, cex=1)
points(coords.ord[n.indx[[i]],,drop=FALSE], col="red", pch=19, cex=1)
readline(prompt = "Pause. Press <Enter> to continue...")
}
} # }