Title: | Bootstrap Inference for Multiple Imputation |
---|---|
Description: | Bootstraps and imputes incomplete datasets. Then performs inference on estimates obtained from analysing the imputed datasets as proposed by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>. |
Authors: | Jonathan Bartlett |
Maintainer: | Jonathan Bartlett <[email protected]> |
License: | GPL-3 |
Version: | 1.2.1 |
Built: | 2024-11-14 05:14:59 UTC |
Source: | https://github.com/jwb133/bootimpute |
Bootstraps an incomplete dataset and then imputes each bootstrap a number
of times. The resulting list of bootstrapped then imputed datasets can
be analysed with bootImputeAnalyse.
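The bootstrap-then-impute nesting described above can be pictured with a small base R sketch. This is a conceptual illustration only, not the package's source code: the data frame `toy` and the function `toyImpute` are hypothetical, and the imputation rule (drawing from observed values) is deliberately simplistic.

```r
# Conceptual sketch only: NOT the package's source code.
set.seed(1)

# Hypothetical toy data with missing values in x
toy <- data.frame(y = rnorm(20), x = rnorm(20))
toy$x[c(3, 8, 15)] <- NA

# Hypothetical imputation function: fills each NA in x with a random draw
# from the observed x values and returns a list of nImp completed data frames
toyImpute <- function(dat, nImp) {
  lapply(seq_len(nImp), function(m) {
    miss <- is.na(dat$x)
    dat$x[miss] <- sample(dat$x[!miss], sum(miss), replace = TRUE)
    dat
  })
}

nBoot <- 3
nImp <- 2
imps <- list()
for (b in seq_len(nBoot)) {
  # resample rows with replacement, keeping the missing values in place
  bootRows <- sample(nrow(toy), replace = TRUE)
  bootData <- toy[bootRows, ]
  # impute this bootstrap sample nImp times and append the results
  imps <- c(imps, toyImpute(bootData, nImp))
}

length(imps)  # nBoot * nImp completed data frames
```

The point of the sketch is the ordering: the incomplete data are resampled first, and imputation happens separately within each bootstrap sample.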
bootImpute( obsdata, impfun, nBoot = 200, nImp = 2, nCores = 1, seed = NULL, ... )
obsdata | The data frame to be imputed. |
impfun | A function which, when passed an incomplete dataset, returns a list of imputed data frames. |
nBoot | The number of bootstrap samples to take. A minimum of 200 is recommended. If you specify |
nImp | The number of times to impute each bootstrap sample. Two is recommended. |
nCores | The number of CPU cores to use. If specified greater than one, bootImpute will impute using the number of cores specified. |
seed | Random number seed. |
... | Other parameters that are to be passed through to impfun. |
The impfun must be a function which, when passed an incomplete dataset
and possibly additional arguments, returns a list of (e.g. 2) imputed datasets.
The number of imputed datasets that impfun returns should match the value
you specify for the argument nImp. Depending on what your imputation function
returns by default, you may need to write a small wrapper function that calls
the imputation procedure and returns the list of nImp datasets. See the
Example for an illustration with the mice package.
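As a rough illustration of the wrapper pattern, the base R sketch below wraps an imputation routine that returns a single completed data frame so that it returns a list of M data frames. The names `imputeOnce`, `impWrapper` and `toy` are hypothetical, and the regression-plus-noise imputation is chosen only to keep the example self-contained; it is not a recommendation of a particular imputation model.

```r
# Hypothetical imputation routine that returns ONE completed data frame per call
imputeOnce <- function(dat) {
  miss <- is.na(dat$x)
  fit <- lm(x ~ y, data = dat)  # fitted on the complete cases
  pred <- predict(fit, newdata = dat[miss, , drop = FALSE])
  # add residual noise so repeated imputations differ
  dat$x[miss] <- pred + rnorm(sum(miss), sd = summary(fit)$sigma)
  dat
}

# Wrapper with the required shape: incomplete data in, list of M data frames out
impWrapper <- function(inputData, M) {
  lapply(seq_len(M), function(i) imputeOnce(inputData))
}

# Hypothetical incomplete data
set.seed(2)
toy <- data.frame(y = rnorm(30))
toy$x <- 0.5 * toy$y + rnorm(30)
toy$x[1:5] <- NA

imps <- impWrapper(toy, M = 2)
length(imps)  # number of imputed datasets; should equal the nImp given to bootImpute
```

Following the pattern of the package's own mice example, such a wrapper would be passed as impfun, with M supplied as an additional argument alongside nImp.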
To improve computation times, bootImpute supports multiple cores through
the nCores argument, which uses the parallel package.
A list of imputed datasets.
#this example shows how you can use bootImpute to impute using the mice
#package. If you do want to impute using MICE you can instead use the
#bootMice function, which essentially contains the code below
library(mice)

#write a wrapper function to call mice generating M imputations
impM <- function(inputData,M) {
  miceImps <- mice::mice(inputData, m=M)
  imps <- vector("list", M)
  for (i in 1:M) {
    imps[[i]] <- mice::complete(miceImps,i)
  }
  imps
}

#bootstrap twice and impute each twice
#in practice you should bootstrap many more times, e.g. at least 200
#note you have to tell bootImpute how many imputations per bootstrap in
#nImp=2 and also pass through whatever your imp function argument is called
#for specifying number of imputations, which here is M=2.
imps <- bootImpute(ex_linquad, impM, nBoot=2, nImp=2, M=2, seed=564764)
Applies the user-specified analysis function to each imputed dataset contained
in imps, then calculates estimates, confidence intervals and p-values
for each parameter, as proposed by von Hippel and Bartlett (2021).
bootImputeAnalyse(imps, analysisfun, nCores = 1, quiet = FALSE, ...)
imps | The list of imputed datasets returned by bootImpute. |
analysisfun | A function which, when applied to a single dataset, returns the estimate of the parameter(s) of interest. The dataset to be analysed is passed to analysisfun as its first argument. |
nCores | The number of CPU cores to use. If specified greater than one, |
quiet | Specify whether to print a table of estimates, confidence intervals and p-values. |
... | Other parameters that are to be passed through to analysisfun. |
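To make the expected shape of an analysis function concrete, here is a minimal base R sketch. The data frame `completed` is a hypothetical stand-in for one imputed dataset, and `analyseImpFormula` assumes that extra arguments given in ... are forwarded to the analysis function.

```r
# Hypothetical completed dataset standing in for one element of imps
set.seed(3)
completed <- data.frame(z = rnorm(50), x = rnorm(50))
completed$y <- 1 + 0.5 * completed$z + completed$x + rnorm(50)

# Analysis function: takes one dataset, returns a named numeric vector of estimates
analyseImp <- function(inputData) {
  coef(lm(y ~ z + x, data = inputData))
}

# Variant with an extra argument (assumed to be supplied through ...)
analyseImpFormula <- function(inputData, modelFormula) {
  coef(lm(modelFormula, data = inputData))
}

est <- analyseImp(completed)
estReduced <- analyseImpFormula(completed, modelFormula = y ~ x)
names(est)  # one element per model coefficient
```

Returning a named vector is convenient because the names identify each parameter when several estimates are reported together.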
Multiple cores can be used through the nCores argument, which may be
useful for reducing computation times.
A vector containing the point estimate(s), variance estimates, and degrees of freedom.
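For intuition about how a point estimate, variance estimate and degrees of freedom can be derived from bootstrapped then imputed estimates, the sketch below applies a one-way ANOVA variance-components calculation to simulated estimates of a single parameter. It reflects one reading of the approach in von Hippel and Bartlett (2021) and is not copied from the package source; the object names (`ests`, `MSB`, `MSW`, `varBetween`) and the simulated values are assumptions, so the paper should be consulted for the authoritative formulas.

```r
# Sketch under stated assumptions; not copied from the package source.
set.seed(4)
nBoot <- 200
nImp <- 2

# Hypothetical estimates of one parameter: rows are bootstraps, columns are imputations
bootEffect <- rnorm(nBoot, sd = 0.3)  # between-bootstrap variation
ests <- 1 + bootEffect + matrix(rnorm(nBoot * nImp, sd = 0.1), nBoot, nImp)

# Point estimate: mean over all nBoot * nImp estimates
pointEst <- mean(ests)
bootMeans <- rowMeans(ests)

# One-way ANOVA mean squares (between and within bootstraps)
MSB <- nImp * sum((bootMeans - pointEst)^2) / (nBoot - 1)
MSW <- sum((ests - bootMeans)^2) / (nBoot * (nImp - 1))

# Variance components and variance of the point estimate
varBetween <- (MSB - MSW) / nImp
varEst <- (1 + 1 / nBoot) * varBetween + MSW / (nBoot * nImp)

# Satterthwaite-type degrees of freedom for the variance estimate
df <- varEst^2 / (
  ((nBoot + 1) / (nBoot * nImp))^2 * MSB^2 / (nBoot - 1) +
    MSW^2 / (nBoot * nImp^2 * (nImp - 1))
)

# t-based 95% confidence interval and two-sided p-value for H0: parameter = 0
ci <- pointEst + c(-1, 1) * qt(0.975, df) * sqrt(varEst)
pValue <- 2 * pt(-abs(pointEst / sqrt(varEst)), df)
```

In this decomposition the between-bootstrap component dominates when nBoot is large, which is consistent with the recommendation to use many bootstraps but only two imputations per bootstrap.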
von Hippel PT, Bartlett JW. Maximum likelihood multiple imputation: faster, more efficient imputation without posterior draws. Statistical Science, 2021, 36(3):400-420. doi:10.1214/20-STS793
library(mice)
set.seed(564764)

#bootstrap twice and impute each twice
#in practice you should bootstrap many more times, e.g. at least 200
imps <- bootMice(ex_linquad, nBoot=2, nImp=2)

#analyse estimates
#write a wrapper to analyse an imputed dataset
analyseImp <- function(inputData) {
  coef(lm(y~z+x+xsq,data=inputData))
}
ests <- bootImputeAnalyse(imps, analyseImp)
Bootstraps an incomplete dataset and then imputes each bootstrap a number
of times using the mice package. The resulting list of bootstrapped then
imputed datasets can be analysed with bootImputeAnalyse.
This function requires the mice package to be installed.
bootMice(obsdata, nBoot = 200, nImp = 2, nCores = 1, seed = NULL, ...)
obsdata | The data frame to be imputed. |
nBoot | The number of bootstrap samples to take. A minimum of 200 is recommended. If you specify |
nImp | The number of times to impute each bootstrap sample. Two is recommended. |
nCores | The number of CPU cores to use. If specified greater than one, bootImpute will impute using the number of cores specified. |
seed | Random number seed. |
... | Other arguments that are to be passed to mice. |
A list of imputed datasets.
library(mice)
head(ex_linquad)

#bootstrap 10 times and impute each twice
imps <- bootMice(ex_linquad, nBoot=10, nImp=2, seed=564764)
Bootstraps an incomplete dataset and then imputes each bootstrap a number
of times using the smcfcs package. The resulting list of bootstrapped then
imputed datasets can be analysed with bootImputeAnalyse.
This function requires the smcfcs package to be installed.
bootSmcfcs(obsdata, nBoot = 200, nImp = 2, nCores = 1, seed = NULL, ...)
obsdata | The data frame to be imputed. |
nBoot | The number of bootstrap samples to take. A minimum of 200 is recommended. If you specify |
nImp | The number of times to impute each bootstrap sample. Two is recommended. |
nCores | The number of CPU cores to use. If specified greater than one, bootImpute will impute using the number of cores specified. |
seed | Random number seed. |
... | Other arguments that are to be passed to smcfcs. |
A list of imputed datasets.
library(smcfcs)
head(ex_linquad)

#bootstrap twice and impute each twice
#in practice you should bootstrap many more times, e.g. at least 200
imps <- bootSmcfcs(ex_linquad, nBoot=2, nImp=2,
                   smtype="lm", smformula="y~z+x+xsq",
                   method=c("","","norm","x^2",""),
                   seed=564764)
A dataset containing simulated data where the outcome depends quadratically on a partially observed covariate.
ex_linquad
A data frame with 1000 rows and 5 variables:
y | Continuous outcome |
z | Fully observed covariate, with linear effect on outcome |
x | Partially observed normally distributed covariate, with quadratic effect on outcome |
xsq | The square of x, which thus also has missing values |
v | An auxiliary variable (i.e. not contained in the substantive model) |