Save log likelihoods of k-fold cross-validation for sdmTMB models

sdmTMB_cv(
  formula,
  data,
  mesh_args,
  spde,
  time = NULL,
  k_folds = 8,
  fold_ids = NULL,
  parallel = TRUE,
  use_initial_fit = FALSE,
  ...
)

## Arguments

formula |
Model formula. |

data |
A data frame. |

mesh_args |
Arguments for `make_mesh()`. If supplied, the mesh will be
reconstructed for each fold. |

spde |
Output from `make_mesh()`. If supplied, the mesh will be constant
across folds. |

time |
The name of the time column. Leave as `NULL` if this is only
spatial data. |

k_folds |
Number of folds. |

fold_ids |
Optional vector containing user-defined fold IDs. Can also be a
single string, e.g. `"fold_id"`, giving the name of the fold ID column in
`data`. |

parallel |
If `TRUE` and a parallel `future::plan()` has been set, the folds
will be fit in parallel. |

use_initial_fit |
Fit the first fold and use those parameter values
as starting values for subsequent folds? Can be faster with many folds. |

... |
All other arguments required to run the `sdmTMB()` model, with the
exception of `weights`, which are used to define the folds. |
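
As a minimal sketch of how a `fold_ids` vector could be built by hand, the base R code below assigns rows to folds at random with near-equal fold sizes. The helper `make_fold_ids()` and the `toy` data frame are hypothetical names used only for illustration; they are not part of sdmTMB.

```r
# Hypothetical helper (not part of sdmTMB): randomly assign `n` rows to
# `k` folds so that fold sizes differ by at most one.
make_fold_ids <- function(n, k) {
  stopifnot(k >= 2, n >= k)
  # Repeat 1..k until the vector has length n, then shuffle the labels.
  sample(rep(seq_len(k), length.out = n))
}

set.seed(123) # make the random assignment reproducible
toy <- data.frame(x = rnorm(10)) # made-up data for illustration only
toy$fold_id <- make_fold_ids(nrow(toy), k = 3)
table(toy$fold_id) # number of rows in each fold
```

Following the argument description above, such a vector could then be passed directly as `fold_ids = toy$fold_id`, or by column name as `fold_ids = "fold_id"`.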

## Value

A list:

`data`

: Original data plus columns for fold ID, CV predicted value,
and CV log likelihood.

`models`

: A list of models; one per fold.

`fold_loglik`

: Sum of left-out log likelihoods per fold.

`fold_elpd`

: Expected log predictive density per fold on left-out data.

`sum_loglik`

: Sum of `fold_loglik` across all left-out data.

`elpd`

: Expected log predictive density across all left-out data.

`pdHess`

: Logical vector: was the Hessian invertible for each fold?

`converged`

: Logical: all `pdHess` `TRUE`?

`max_gradients`

: Max gradient per fold.
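
The relationship between the per-fold and overall elements described above can be sketched as follows. The list `cv_out` is a hypothetical stand-in for the returned object, filled with made-up numbers; it is not output from a real model.

```r
# Hypothetical stand-in for part of the list returned by sdmTMB_cv().
# All values are made up for illustration.
cv_out <- list(
  fold_loglik = c(-512.3, -498.7, -505.1),
  pdHess = c(TRUE, TRUE, FALSE)
)

# `sum_loglik` is described as the sum of `fold_loglik` over the folds.
sum_loglik <- sum(cv_out$fold_loglik)

# `converged` is described as whether every element of `pdHess` is TRUE.
converged <- all(cv_out$pdHess)

# Folds whose Hessian was not invertible; these may deserve a closer look.
failed_folds <- which(!cv_out$pdHess)
```

When `converged` is `FALSE`, inspecting `pdHess` and `max_gradients` fold by fold is one way to find which fits are questionable before comparing log likelihoods across models.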

## Examples

if (inla_installed()) {
  spde <- make_mesh(pcod, c("X", "Y"), cutoff = 25)

  # Set parallel processing if desired:
  # library(future)
  # plan(multisession)
  m_cv <- sdmTMB_cv(
    density ~ 0 + depth_scaled + depth_scaled2,
    data = pcod, spde = spde,
    family = tweedie(link = "log"), k_folds = 2
  )
  m_cv$fold_elpd
  m_cv$elpd
  m_cv$fold_loglik
  m_cv$sum_loglik
  head(m_cv$data)
  m_cv$models[[1]]
  m_cv$max_gradients

  # Create mesh each fold:
  m_cv2 <- sdmTMB_cv(
    density ~ 0 + depth_scaled + depth_scaled2,
    data = pcod, mesh_args = list(xy_cols = c("X", "Y"), cutoff = 20),
    family = tweedie(link = "log"), k_folds = 2
  )

  # Use fold_ids:
  m_cv3 <- sdmTMB_cv(
    density ~ 0 + depth_scaled + depth_scaled2,
    data = pcod, spde = spde,
    family = tweedie(link = "log"),
    fold_ids = rep(seq(1, 3), nrow(pcod))[seq(1, nrow(pcod))]
  )
}
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.