Title: | Multiple Imputation for Informative Censoring |
---|---|
Description: | Multiple Imputation for Informative Censoring. This package implements two methods: Gamma Imputation, described in <DOI:10.1002/sim.6274>, and Risk Score Imputation, described in <DOI:10.1002/sim.3480>. |
Authors: | David Ruau [aut], Nikolas Burkoff [aut], Jonathan Bartlett [aut, cre], Dan Jackson [aut], Edmund Jones [aut], Martin Law [aut], Paul Metcalfe [aut] |
Maintainer: | Jonathan Bartlett <[email protected]> |
License: | GPL (>= 2) | file LICENSE |
Version: | 0.3.6 |
Built: | 2024-10-27 03:50:39 UTC |
Source: | https://github.com/jwb133/informativecensoring |
Perform multiple imputation for time to event data
See "Nonparametric comparison of two survival functions with dependent censoring via nonparametric multiple imputation", Hsu and Taylor, Statistics in Medicine (2009) 28:462-475, for Hsu's method.
See "Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation", Jackson et al., Statistics in Medicine (2014) 33:4681-4694, for Jackson's method.
Specify the columns of the data frame required by the risk score imputation method
col.headings(arm, has.event, time, Id, DCO.time, to.impute, censor.type = NULL)
arm |
column name which will contain the subject's treatment group |
has.event |
column name of the column indicating whether the subject has an event (1) or not (0) |
time |
column name of censoring/event time |
Id |
column name of subject Id |
DCO.time |
column name of the time at which the subject would have been censored had they not had an event before data cut off |
to.impute |
column name of the logical column indicating whether events should be imputed |
censor.type |
column name of the column containing the reason for censoring: 0 = had event, 1 = non-administrative censoring, 2 = administrative censoring. Only subjects with 1 in this column count as having an ‘event’ in the Cox model for censoring. This column is optional; if it is not used, then all censored subjects count as having an ‘event’ in that model |
A list containing the given arguments
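The censor.type rule can be illustrated with a short base R sketch. The helper censoring_indicator() and the toy data are hypothetical and not part of the package. For each subject, the function returns whether that subject would count as an ‘event’ in the Cox model for censoring under the rule stated for censor.type.

```r
# Hypothetical helper (not a package function): which subjects count as an
# 'event' in the Cox model for censoring?
censoring_indicator <- function(data, has.event, censor.type = NULL) {
  if (is.null(censor.type)) {
    # censor.type not supplied: every censored subject counts
    return(as.integer(data[[has.event]] == 0))
  }
  # censor.type supplied: only non-administrative censoring (code 1) counts
  as.integer(data[[censor.type]] == 1)
}

# Toy data: one event, one non-administrative and one administrative censoring
toy <- data.frame(event = c(1, 0, 0), reason = c(0, 1, 2))
censoring_indicator(toy, has.event = "event")                          # 0 1 1
censoring_indicator(toy, has.event = "event", censor.type = "reason")  # 0 1 0
```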
See the cox.zph function in the survival package
cox.zph(fit, transform = "km", global = TRUE, ...)
fit |
the result of fitting a Cox regression model, using the
|
transform |
a character string specifying how the survival times should be transformed
before the test is performed.
Possible values are |
global |
should a global chi-square test be done, in addition to the per-variable or per-term tests. |
... |
Additional arguments to cox.zph, for example |
Extract a single risk score/gamma imputed data set/model fit
## S3 method for class 'GammaImputedSet'
ExtractSingle(x, index)
## S3 method for class 'GammaStatList'
ExtractSingle(x, index)
ExtractSingle(x, index)
## S3 method for class 'ScoreImputedSet'
ExtractSingle(x, index)
## S3 method for class 'ScoreStatList'
ExtractSingle(x, index)
x |
The multiply imputed object |
index |
Integer: the index of the imputed data set/model fit to be returned |
The individual data set/model fit
This function performs the imputation described in "Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation", D. Jackson et al., Statistics in Medicine (2014) 33:4681-4694.
gammaImpute( formula, data, m, gamma, gamma.factor, bootstrap.strata = rep(1, nrow(data)), DCO.time, ..., parallel = c("no", "multicore", "snow")[1], ncpus = 1L, cl = NULL )
formula |
The model formula to be used when fitting the models to calculate
the cumulative hazard. A formula for coxph can include strata terms but not
cluster or tt and only right-censored |
data |
A time to event data set for which event times are to be imputed |
m |
The number of imputations to be created |
gamma |
Either column name containing the value of gamma or a vector of values giving the subject specific
size of the step change in the log hazard at censoring. If a subject has NA in this column then no imputation is performed
for this subject (i.e. the subject's censored time remains unchanged after imputation). If a subject has already had an
event then the value of gamma is ignored. If |
gamma.factor |
If used, a single numeric value. If no |
bootstrap.strata |
The strata argument for stratified bootstrap sampling, see argument |
DCO.time |
Either the column name containing the subject's data cutoff time, a vector of DCO.times for the subjects, or a single number to be used as the DCO.time for all subjects (if an imputed event time is greater than DCO.time, the subject is censored at DCO.time in the imputed data set) |
... |
Additional parameters to be passed into the model fit function |
parallel |
The type of parallel operation to be used (if any). |
ncpus |
integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs |
cl |
An optional parallel or snow cluster for use if |
See the Gamma Imputation vignette for further details
A GammaImputedSet.object
containing the imputed data sets
GammaImputedSet.object
GammaImputedData.object
## Not run:
library(survival)
library(InformativeCensoring)
data(nwtco)
nwtco <- nwtco[1:500, ]
# creating 2 imputed data sets (m = 2) for speed; normally one would create more
ans <- gammaImpute(formula = Surv(edrel, rel) ~ histol + instit,
                   data = nwtco, m = 2, gamma.factor = 1, DCO.time = 6209)
# subject specific gamma (multiplied by gamma.factor to give the jump);
# NA for subjects that are not to be imputed
jumps <- c(rep(NA, 10), rep(1, 490))
DCO.values <- rep(6209, 500)
ans.2 <- gammaImpute(formula = Surv(edrel, rel) ~ histol + instit + strata(stage),
                     data = nwtco, m = 2, bootstrap.strata = strata(nwtco$stage),
                     gamma = jumps, gamma.factor = 1, DCO.time = DCO.values)
# column names can also be used
nwtco$gamma <- jumps
nwtco$DCO.time <- DCO.values
ans.3 <- gammaImpute(formula = Surv(edrel, rel) ~ histol + instit + strata(stage),
                     data = nwtco, m = 2, bootstrap.strata = strata(nwtco$stage),
                     gamma = "gamma", DCO.time = "DCO.time")
## End(Not run)
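The core imputation step can be sketched in base R under a strong simplifying assumption. If the subject's hazard before censoring is a constant rate, then after censoring it becomes rate * exp(gamma * gamma.factor), and the residual time is exponential. The package itself works with the cumulative hazard of the fitted Cox model, so this only illustrates the idea of Jackson et al. (2014) and is not the package's implementation. The name impute_gamma_exp() is hypothetical. The function applies only to censored subjects, because gamma is ignored for subjects who already had an event.

```r
# Simplified sketch of a gamma-imputation draw for ONE censored subject.
# Assumption: constant (exponential) hazard `rate` before censoring, so the
# hazard after censoring is rate * exp(jump). The package itself uses the
# cumulative hazard of a fitted Cox model instead.
impute_gamma_exp <- function(cens.time, rate, gamma, gamma.factor, DCO.time) {
  if (is.na(gamma)) {
    # NA gamma: no imputation, the censored time is kept
    return(c(time = cens.time, event = 0))
  }
  jump <- gamma * gamma.factor  # step change in the log hazard at censoring
  # memoryless exponential residual time after the censoring time
  t.new <- cens.time + rexp(1, rate = rate * exp(jump))
  if (t.new > DCO.time) {
    # imputed event falls after data cut off: censor at DCO.time
    return(c(time = DCO.time, event = 0))
  }
  c(time = t.new, event = 1)
}

set.seed(1)
draws <- t(replicate(1000, impute_gamma_exp(cens.time = 2, rate = 0.1,
                                            gamma = 1, gamma.factor = 1,
                                            DCO.time = 10)))
mean(draws[, "event"])  # proportion of imputed events before DCO.time
```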
GammaImputedData object
An object which contains
data
A data frame containing the time to event data with 3 new columns: impute.time and impute.event, the imputed event/censoring times and event indicators (for subjects whose data are not imputed, these columns contain the unchanged event/censoring time and event indicator), and internal_gamma_val, the value of gamma used for each subject in this data set
default.formula
The default model formula which will be used when fitting the imputed data
GammaImputedSet object
An object which contains the set of gamma imputed data frames. Use the ExtractSingle function to extract a single GammaImputedData object. Use the ImputeStat function to fit models to the entire set of imputed data frames.
It contains the following:
data
A data frame containing the unimputed time to event data (along with a column internal_gamma_val which is the value of gamma used for each subject in this data set)
m
The number of imputed data sets
gamma.factor
The value of gamma.factor used with the imputation
impute.time
A matrix (1 column per imputed data set) containing the imputed times
impute.event
A matrix (1 column per imputed data set) containing the imputed event indicators
default.formula
The default model formula which will be used when fitting the imputed data
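To show how these fields fit together, the sketch below assembles the index-th imputed data frame from a hand-built mock that has the same fields, in the way ExtractSingle is described as extracting a single data set. The mock values and extract_single_sketch() are hypothetical. A real GammaImputedData object also carries default.formula, and its data include internal_gamma_val.

```r
# Hypothetical mock of the documented GammaImputedSet fields (not a real
# package object): unimputed data plus m columns of imputed times/events.
mock.set <- list(
  data = data.frame(Id = 1:3, time = c(5, 2, 7), event = c(1, 0, 0)),
  m = 2,
  impute.time  = cbind(c(5, 4.1, 7), c(5, 3.3, 9.5)),
  impute.event = cbind(c(1, 1, 0),   c(1, 1, 1))
)

# Build the index-th imputed data frame by attaching one column of each matrix
extract_single_sketch <- function(x, index) {
  stopifnot(index >= 1, index <= x$m)
  out <- x$data
  out$impute.time  <- x$impute.time[, index]
  out$impute.event <- x$impute.event[, index]
  out
}

extract_single_sketch(mock.set, 2)
```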
GammaStat object
An S3 object which contains the point estimate and test statistic after fitting a model to a GammaImputedData object. The function print.GammaStat has been implemented.
The object contains the following:
model
The model used to create the fit
method
The method used for the fit
estimate
A point estimate of the test statistic
var
The estimate for the variance of the test statistic
GammaStatList object
The object containing the results of fitting models to a GammaImputedSet object. A summary.GammaStatList function has been implemented which applies Rubin's multiple imputation rules.
The object contains the following:
fits
A list of GammaStat
objects containing the model fits for
the imputed data sets
statistics
A list with two elements, estimates and vars, which contain the coefficient estimates and their variances (one column per covariate, one row per imputed data set)
m
The number of model fits
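Rubin's rules, which summary.GammaStatList is described as applying, can be written out in a few lines of base R. The sketch uses the textbook formulas for a single coefficient: the pooled estimate, the within, between and total variances, and Rubin's degrees of freedom. rubin_pool() and the numbers are illustrative only, and the package's own summary may differ in detail.

```r
# Rubin's rules for pooling m point estimates and their variances
# (textbook formulas; the package's summary methods may differ in detail).
rubin_pool <- function(estimates, vars) {
  m <- length(estimates)
  q.bar <- mean(estimates)           # pooled point estimate
  w <- mean(vars)                    # within-imputation variance
  b <- var(estimates)                # between-imputation variance
  total <- w + (1 + 1 / m) * b       # total variance
  r <- (1 + 1 / m) * b / w           # relative increase in variance
  df <- (m - 1) * (1 + 1 / r)^2      # Rubin's degrees of freedom
  t.stat <- q.bar / sqrt(total)
  list(estimate = q.bar, var = total, df = df,
       p.value = 2 * pt(-abs(t.stat), df))
}

# e.g. one column of statistics$estimates and statistics$vars
est <- c(0.52, 0.47, 0.55, 0.49, 0.50)
v   <- c(0.010, 0.011, 0.009, 0.010, 0.010)
res <- rubin_pool(est, v)
```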
S3 generic to fit model(s) to risk score/gamma imputed objects
## S3 method for class 'GammaImputedData'
ImputeStat(object, method = c("Cox", "weibull", "exponential")[1], formula = NULL, ...)
## S3 method for class 'GammaImputedSet'
ImputeStat(object, method = c("Cox", "weibull", "exponential")[1], formula = NULL, ...,
  parallel = c("no", "multicore", "snow")[1], ncpus = 1L, cl = NULL)
ImputeStat(object, method = c("logrank", "Wilcoxon", "Cox", "weibull", "exponential")[1], formula, ...)
## S3 method for class 'ScoreImputedSet'
ImputeStat(object, method = c("logrank", "Wilcoxon", "Cox")[1], formula = NULL, ...,
  parallel = c("no", "multicore", "snow")[1], ncpus = 1L, cl = NULL)
object |
A |
method |
The type of statistical model to fit. There are three methods which can be performed when using risk score imputation. For gamma imputation the model can be "Cox" (using |
formula |
The model formula to fit.
If no formula argument is used, then object$default.formula will be used.
For risk score imputation this is For In all cases, only the right-hand side of the formula is required; the survival object on the left-hand side is created automatically. E.g. for a Cox model one could use formula=~arm + covar1. The cluster and tt options cannot be used. See the vignettes for further details |
... |
Additional arguments which are passed into the model fit function |
parallel |
The type of parallel operation to be used (if any), can be used for |
ncpus |
integer: number of processes to be used in parallel operation: typically one would choose this to be
the number of available CPUs, can be used for |
cl |
An optional parallel or snow cluster for use if |
ScoreStat.object
ScoreImputedData.object
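Because only the right-hand side of formula is supplied, a left-hand side has to be attached before fitting. One base R way to do this is update() on the formula. The example uses the impute.time and impute.event column names described for the imputed data. It is an illustration only and is not necessarily how ImputeStat builds its formula internally. Surv() is never evaluated here, so the survival package is not needed.

```r
# Attach a survival left-hand side to a right-hand-side-only formula.
# The "." in the new formula is replaced by the old right-hand side.
rhs <- ~ arm + covar1
full <- update(rhs, Surv(impute.time, impute.event) ~ .)
print(full)
```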
ScoreTD object
Create a valid ScoreTD object
MakeTimeDepScore(data, Id, time.start, time.end)
data |
A data frame of time dependent covariates |
Id |
The column name of the subject Id |
time.start |
The covariates are valid for the time interval [time.start, time.end], where time.start is the name of the column holding the start of the interval |
time.end |
The covariates are valid for the time interval [time.start, time.end], where time.end is the name of the column holding the end of the interval |
A ScoreTD
object
Create a list of options which control the nearest neighbour algorithm for risk score imputation
NN.options(NN = 5, w.censoring = 0.2, min.subjects = 20)
NN |
The (maximum) number of subjects to be included in the risk set |
w.censoring |
The weighting on the censoring risk score when
calculating distances for the nearest neighbour calculation
A weighting of |
min.subjects |
If using time dependent score imputation, include at least this number of subjects when fitting the Cox model (i.e. include some subjects who were censored or had an event earlier than the censored observation, if necessary) |
A list of options used within the ScoreImpute function
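The roles of NN and w.censoring can be sketched as a nearest-neighbour search in base R. Two assumptions are made, following the distance used by Hsu and Taylor (2009): the event and censoring risk scores are already standardised, and the event score receives weight 1 - w.censoring. nearest_neighbours() is a hypothetical helper. min.subjects and the time dependent variant are not modelled.

```r
# Sketch of a nearest-neighbour risk set (assumptions labelled):
#  - event.score / censor.score are already standardised risk scores
#  - distance weights: (1 - w.censoring) on the event score and
#    w.censoring on the censoring score
nearest_neighbours <- function(i, time, event.score, censor.score,
                               NN = 5, w.censoring = 0.2) {
  at.risk <- which(time > time[i])  # subjects still at risk after time[i]
  d <- sqrt((1 - w.censoring) * (event.score[at.risk] - event.score[i])^2 +
              w.censoring * (censor.score[at.risk] - censor.score[i])^2)
  # the (at most) NN closest subjects, nearest first
  at.risk[order(d)][seq_len(min(NN, length(at.risk)))]
}

time         <- c(1, 2, 3, 4, 5, 6)
event.score  <- c(0, 0.1, 2, -0.2, 0.05, 1)
censor.score <- rep(0, 6)
nn <- nearest_neighbours(1, time, event.score, censor.score, NN = 3)  # 5 2 4
```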
Perform the risk score multiple imputation method
ScoreImpute( data, event.model, censor.model = event.model, col.control, NN.control = NN.options(), time.dep = NULL, m, bootstrap.strata = rep(1, nrow(data)), ..., parallel = c("no", "multicore", "snow")[1], ncpus = 1L, cl = NULL )
data |
The data set for which imputation is required |
event.model |
The right hand side of the formula to be used for fitting the Cox model for calculating the time to event score e.g. ~Z1+Z2+Z3. |
censor.model |
The right hand side of the formula to be used for fitting the Cox model for calculating the time to
censoring score if not included then |
col.control |
A list of the column names of |
NN.control |
Parameters which control the nearest neighbour algorithm. See |
time.dep |
A ScoreTD object, to be included if the time dependent score imputation method is to be used, otherwise it should be NULL |
m |
The number of data sets to impute |
bootstrap.strata |
When performing the bootstrap procedure for fitting the models,
how should the data be stratified (see strata argument to |
... |
Additional arguments passed into the Cox model. Note that the subset and na.action arguments should not be used (na.fail will be used when fitting the Cox model) |
parallel |
The type of parallel operation to be used (if any). |
ncpus |
integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs |
cl |
An optional parallel or snow cluster for use if |
Note that coxph may fail to converge, in which case the following warning is output: Warning in fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge
It is possible to use ridge regression by including a ridge term in the model formula (e.g. ~Z1+ridge(Z2,theta=1)). See ridge for further details.
A ScoreImputedSet
object
library(InformativeCensoring)
data(ScoreInd)
col.control <- col.headings(has.event = "event", time = "time", Id = "Id", arm = "arm",
                            DCO.time = "DCO.time", to.impute = "to.impute")
## Not run:
ans <- ScoreImpute(data = ScoreInd, event.model = ~Z1 + Z2 + Z3 + Z4 + Z5,
                   col.control = col.control, m = 5,
                   bootstrap.strata = ScoreInd$arm,
                   NN.control = NN.options(NN = 5, w.censoring = 0.2))
## End(Not run)
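Once the nearest-neighbour risk set has been formed, Hsu and Taylor (2009) impute a time from the Kaplan-Meier curve of the neighbours. A simplified base R version of that single draw is sketched below. km_draw() is hypothetical and ties are processed row by row. The bootstrap resampling and the package's handling of DCO.time are not reproduced, so this is not ScoreImpute's actual code.

```r
# Hypothetical helper: draw one time from the Kaplan-Meier curve of the
# neighbours' (time, event) data. If the curve never falls to the drawn
# level, the subject is censored at the largest neighbour time.
km_draw <- function(time, event) {
  ord <- order(time)
  time <- time[ord]
  event <- event[ord]
  n.risk <- rev(seq_along(time))       # number at risk at each sorted row
  surv <- cumprod(1 - event / n.risk)  # KM survival just after each row
  u <- runif(1)
  hit <- which(surv <= u)              # first hit is always an event row
  if (length(hit) == 0) {
    return(c(time = max(time), event = 0))
  }
  c(time = time[hit[1]], event = 1)
}

set.seed(42)
nbr.time  <- c(3, 5, 7, 9)
nbr.event <- c(1, 0, 1, 0)
draws.km <- t(replicate(500, km_draw(nbr.time, nbr.event)))
table(draws.km[, "time"])
```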
ScoreImputedData object
An object which contains
data
A data frame containing the time to event data with 2 new columns, impute.time and impute.event: the imputed event/censoring times and event indicators (for subjects whose data are not imputed, these columns contain the unchanged event/censoring time and event indicator)
col.control
The list of column names the risk score imputation method requires (see col.headings for further details). If censor.type was not used, then col.control$censor.type="using_has.event_col"
default.formula
The default model formula which will be used when fitting the imputed data using a Cox model
ScoreImputedSet object
An object which contains the set of score imputed data frames. Use the ExtractSingle function to extract a single ScoreImputedData object. Use the ImputeStat function to fit models to the entire set of imputed data frames.
It contains the following:
data
A data frame containing the unimputed time to event data
col.control
The list of column names the score imputation method requires (see col.headings for further details)
m
The number of imputed data sets
impute.time
A matrix (1 column per imputed data set) containing the imputed times
impute.event
A matrix (1 column per imputed data set) containing the imputed event indicators
default.formula
The default model formula which will be used when fitting the imputed data using a Cox model
This dataset is inspired by the simulation described in Hsu and Taylor, Statistics in Medicine (2009) 28:462-475, with an additional DCO.time column.
A data.frame containing a row per subject with eleven columns:
Id
subject identifier
arm
factor for treatment group control=0, active=1
Z1
binary time independent covariate
Z2
continuous time independent covariate
Z3
binary time independent covariate
Z4
continuous time independent covariate
Z5
binary time independent covariate
event
event indicator (1 yes, 0 no)
time
subject censoring/event time (in years)
to.impute
logical, should an event time be imputed for this subject? (this is ignored if subject has event time)
DCO.time
The time the subject would have been censored if they had not had an event or been censored before the data cut off date
An S3 object which contains the point estimate and test statistic after fitting a model to a ScoreImputedData object. The functions print.ScoreStat and as.vector.ScoreStat have been included.
The object contains the following:
The test statistic should be approximately normally distributed; hence, for the logrank test, Z = (O_2 - E_2)/sqrt(V_2), i.e. the square root of the standard chi-squared statistic with the appropriate sign.
model
The model used to create the fit
method
The method used for the fit
estimate
A point estimate of the test statistic
var
The estimate for the variance of the test statistic
statistic
The test statistic given by estimate/sqrt(var)
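The signed logrank statistic Z = (O_2 - E_2)/sqrt(V_2) can be computed directly in base R, which makes the sign convention explicit. In this sketch group 2 is assumed to be the subjects with arm == 1, and V_2 is the usual hypergeometric variance. logrank_z() is a hypothetical helper, not a package function.

```r
# Signed logrank Z = (O_2 - E_2) / sqrt(V_2), with group 2 = (arm == 1).
logrank_z <- function(time, event, arm) {
  event.times <- sort(unique(time[event == 1]))
  o2 <- e2 <- v2 <- 0
  for (tt in event.times) {
    at.risk <- time >= tt
    n  <- sum(at.risk)                         # total at risk
    n2 <- sum(at.risk & arm == 1)              # at risk in group 2
    d  <- sum(time == tt & event == 1)         # total events at tt
    d2 <- sum(time == tt & event == 1 & arm == 1)
    o2 <- o2 + d2                              # observed in group 2
    e2 <- e2 + d * n2 / n                      # expected in group 2
    if (n > 1) {
      v2 <- v2 + d * (n2 / n) * (1 - n2 / n) * (n - d) / (n - 1)
    }
  }
  (o2 - e2) / sqrt(v2)
}

time  <- c(1, 2, 3, 4)
event <- c(1, 1, 1, 1)
arm   <- c(0, 1, 0, 1)
z <- logrank_z(time, event, arm)  # approximately -0.7845
```

Swapping the arm coding only flips the sign of Z, so z^2 equals the usual chi-squared logrank statistic.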
The object containing the results of fitting models to a ScoreImputedSet object. A summary.ScoreStatList function has been implemented.
The object contains the following:
fits
A list of ScoreStat
objects containing the model fits for
the imputed data sets
statistics
A ScoreStatSet
object containing the statistics
m
The number of model fits
ScoreStatSet.object
ScoreStat.object
ScoreStatSet object
S3 generic to create a ScoreStatSet object
ScoreStatSet(x)
x |
The object to convert into a |
A ScoreStatSet object
The object is an M x 3 matrix with one row per imputed data set and columns estimate (the point estimates), var (their variances) and Z (the test statistic). M must be greater than 4.
Note that Z should be approximately standard normal (so the chi-squared test statistic is not used; see ScoreStat.object for further details).
The summary.ScoreStatSet function applies the MI averaging procedures and produces estimates of the test statistic and p-value.
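A matrix with the documented ScoreStatSet layout can be built by hand for experimentation. make_score_stat_set() is a hypothetical helper. It enforces the M > 4 requirement and computes Z as estimate/sqrt(var), matching the definition given for the ScoreStat test statistic.

```r
# Hypothetical constructor for an M x 3 matrix with the documented columns.
make_score_stat_set <- function(estimate, var) {
  stopifnot(length(estimate) == length(var), length(estimate) > 4)
  cbind(estimate = estimate, var = var, Z = estimate / sqrt(var))
}

s <- make_score_stat_set(c(0.4, 0.5, 0.6, 0.3, 0.5), rep(0.04, 5))
s
```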
ScoreTD object
This data frame holds time dependent covariates for use with risk score imputation
The data frame contains the following columns:
'Id' for subject ID
'time.start' and 'time.end' the range of time for which
the covariate values are valid - i.e. [time.start,time.end]
Additional columns are the time dependent covariates
All data for a single subject should be stored in consecutive rows, sorted by time and the starting time of a row should match the ending time of the previous row
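The layout rules above can be checked with a small base R function before the data are passed to MakeTimeDepScore. check_score_td() is hypothetical. Its default column names follow the description above, and other names can be passed in, just as they are for the Id, time.start and time.end arguments of MakeTimeDepScore.

```r
# Hypothetical checker for the ScoreTD layout rules.
check_score_td <- function(df, Id = "Id", time.start = "time.start",
                           time.end = "time.end") {
  ids <- df[[Id]]
  # rows for a subject must be consecutive: each Id forms a single run
  if (anyDuplicated(rle(as.character(ids))$values)) return(FALSE)
  for (rows in split(seq_len(nrow(df)), factor(ids, levels = unique(ids)))) {
    s <- df[[time.start]][rows]
    e <- df[[time.end]][rows]
    if (is.unsorted(s)) return(FALSE)  # rows sorted by time
    # each row must start where the previous row ended
    if (length(rows) > 1 && any(s[-1] != e[-length(e)])) return(FALSE)
  }
  TRUE
}

td <- data.frame(Id = c(1, 1, 2), time.start = c(0, 1, 0),
                 time.end = c(1, 3, 2), W1 = c(0, 1, 1))
check_score_td(td)               # TRUE
check_score_td(td[c(1, 3, 2), ]) # FALSE: subject 1 rows not consecutive
```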
This data set contains time dependent covariates for the ScoreInd time to event data.
A data.frame containing 1 row per subject-visit
Id
The Subject Id
start
The covariates given in each row apply to the given subject from time 'start'...
end
... until time end
W1
The value of a (binary) time dependent variable for the subject with the given 'Id' between times 'start' and 'end'
W2
The value of a (continuous) time dependent variable for the subject with the given 'Id' between times 'start' and 'end'
ScoreStatSet object
This object contains the multiply imputed averages/p-values of a set of estimates from risk score imputed data sets. A print.summary.ScoreStatSet function has been implemented.
This object contains three lists: meth1, meth2 and methRubin. meth1 averages the point estimates to produce an F test statistic, meth2 averages the test statistics and produces a t test statistic, and methRubin follows Rubin's standard rules and is used for calculating confidence intervals.
See the vignette for further details.
meth1, meth2 and methRubin are lists with the following elements:
estimate: the average estimator for meth1 (note: for meth2 this is the average test statistic),
var: estimate of variance of "estimate" field
test.stat: test statistic
distribution: distribution of statistical test (i.e. F or t)
p.value: p-value of test