Package 'riskPredictClustData'

Title: Assessing Risk Predictions for Clustered Data
Description: Assessing and comparing risk prediction rules for clustered data. The method is based on the paper: Rosner B, Qiu W, and Lee MLT.(2013) <doi: 10.1007/s10985-012-9240-6>.
Authors: Bernard Rosner [aut, ctb], Weiliang Qiu [aut, cre], Meiling T. Lee [aut, ctb]
Maintainer: Weiliang Qiu <[email protected]>
License: GPL (>= 2)
Version: 0.2.6
Built: 2024-11-04 21:43:27 UTC
Source: https://github.com/cran/riskPredictClustData

Help Index


Generate simulated data from logistic mixed effects model based on the AMD data

Description

Generate simulated data from logistic mixed effects model based on the AMD data.

Usage

genSimDataGLMEM(
  nSubj = 131, 
  beta0 = -6, 
  sd.beta0i = 1.58, 
  beta1 = 1.58, 
  beta2 = -3.95, 
  beta3 = 3.15, 
  beta4 = 2.06, 
  beta5 = 0.51, 
  beta6 = 1.47, 
  beta7 = 3.11, 
  p.smkcur = 0.08, 
  p.inieye31 = 0.44, 
  p.inieye32 = 0.42, 
  p.inieye41 = 0.12, 
  p.inieye42 = 0.11, 
  sd.lncalorc = 0.33)

Arguments

nSubj

integer. Number of subjects. Each subject would have data for 2 eyes.

beta0

mean of intercept β0i\beta_{0i}, which is assumed random and follows normal distribution N(β0,σβ2)N(\beta_0, \sigma^2_{\beta})

sd.beta0i

standard deviation σβ2\sigma^2_{\beta} of the random intercept β0i\beta_{0i}.

beta1

slope for the binary covariate cursmkcursmk (current smoking status). cursmk=1cursmk=1 indicates current smokers. cursmk=0cursmk=0 indicates past smokers or never smokers.

beta2

slope for the continuous mean-centered covariate lncalorclncalor_c.

beta3

slope for the binary covariate inieye3inieye3 indicating if an eye of a subject has initial grade equal to 3. inieye3=1inieye3=1 indicates the eye has initial grade equal to 3.

beta4

slope for the binary covariate inieye4inieye4 indicating if an eye of a subject has initial grade equal to 4. inieye4=1inieye4=1 indicates the eye has initial grade equal to 4.

beta5

slope for the binary covariate rtotfat1rtotfat_1 indicating if the subject's total fat intake is in the 2nd quartile of total fat intake. rtotfat1=1rtotfat_1=1 indicates the subject is in the 2nd quartile.

beta6

slope for the binary covariate rtotfat2rtotfat_2 indicating if the subject's total fat intake is in the 3rd quartile of total fat intake. rtotfat2=1rtotfat_2=1 indicates the subject is in the 3rd quartile.

beta7

slope for the binary covariate rtotfat3rtotfat_3 indicating if the subject's total fat intake is in the 4th quartile of total fat intake. rtotfat3=1rtotfat_3=1 indicates the subject is in the 4th quartile.

p.smkcur

proportion of current smokers.

p.inieye31

proportion of left eye having inital grade equal to 3.

p.inieye32

proportion of right eye having inital grade equal to 3.

p.inieye41

proportion of left eye having inital grade equal to 4.

p.inieye42

proportion of right eye having inital grade equal to 4.

sd.lncalorc

standard deviation for lncalorclncalor_c.

Details

We generate simulated data set from the following generalized linear mixed effects model:

log(pij(1pij))=β0i+β1smkcuri+β2lncalorci+β3inieye3ij+β4inieye4ij+β5rtotfat1i+β6rtotfat2i+β7rtotfat3i,\log\left(\frac{p_{ij}}{(1-p_{ij})}\right)=\beta_{0i}+\beta_1 smkcur_i+ \beta_2 lncalor_{ci} + \beta_3 inieye3_{ij} + \beta_4 inieye4_{ij} +\beta_5 rtotfat_{1i} +\beta_6 rtotfat_{2i} + \beta_7 rtotfat_{3i},

i=1,,N,j=1,2i=1,\ldots, N, j=1, 2, β0iN(β0,σβ2).\beta_{0i}\sim N\left(\beta_0, \sigma^2_{\beta}\right).

Value

A data frame with 8 columns: cid, subuid, prog, smkcur, lncalorc, inieye3, inieye4, and rtotfat, where cid is the subject id, subuid is the unit id, and prog is the progression status. prog=1prog=1 indicates the eye is progressed. prog=0prog=0 indicates the eye is not progressed. There are nSubj*2 rows. The first nSubj rows are for the left eyes and the second nSubj rows are for the right eyes.

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)
datFrame = genSimDataGLMEM(nSubj = 30, beta0 = -6, sd.beta0i = 1.58, 
                          beta1 = 1.58, beta2 = -3.95, beta3 = 3.15, beta4 = 2.06,
                          beta5 = 0.51, beta6 = 1.47, beta7 = 3.11, 
                          p.smkcur = 0.08, p.inieye31 = 0.44, p.inieye32 = 0.42,
                          p.inieye41 = 0.12, p.inieye42 = 0.11, sd.lncalorc = 0.33)
print(dim(datFrame))
print(datFrame[1:2,])

Get data frame for the function riskPredict

Description

Get data frame for the function riskPredict.

Usage

getScore(fmla, cidVar, subuidVar, statusVar, datFrame, mycorstr = "exchangeable",
  verbose = FALSE)

Arguments

fmla

A formula object for the function gee

cidVar

character. Phenotype variable name for cluster id

subuidVar

character. Phenotype variable name for unit id

statusVar

character. Phenotype variable name for progression status

datFrame

A data frame with at least 3 columns corresponding to cid (indicated by cidVar), subuid (indicated by subuidVar), status (indicated by statusID). cid indicates cluster id; subuid indicates unit ID within a cluster; status=1 indicates an eye is progressed; status=0 indicates an eye is not progressed.

mycorstr

character. indicates correlation structure. see the manual for the function gee in the R library gee

verbose

logical. indicating if summary of gee results should be printed out.

Value

A list with two elements: frame and gee.obj. frame is a data frame with at least 4 columns: cid, subuid, status, and score. cid indicates cluster id; subuid indicates unit ID within a cluster; status=1 indicates an eye is progressed; status=0 indicates an eye is not progressed; score represents the risk score.

gee.obj is the object returned by gee function.

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)
datFrame = genSimDataGLMEM(nSubj = 30, beta0 = -6, sd.beta0i = 1.58, 
                          beta1 = 1.58, beta2 = -3.95, beta3 = 3.15, beta4 = 2.06,
                          beta5 = 0.51, beta6 = 1.47, beta7 = 3.11, 
                          p.smkcur = 0.08, p.inieye31 = 0.44, p.inieye32 = 0.42,
                          p.inieye41 = 0.12, p.inieye42 = 0.11, sd.lncalorc = 0.33)

print(dim(datFrame))
print(datFrame[1:2,])

tt1 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4+factor(rtotfat), 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe1=tt1$frame

gee.obj=tt1$gee.obj
print(summary(gee.obj))

print(dim(myframe1))
print(myframe1[1:3,])

Calculate the power for testing δ=0\delta=0

Description

Calculate the power for testing δ=0\delta=0.

Usage

powerCal(
  nSubj, 
  mu1, 
  triangle, 
  rho, 
  rho11, 
  rho22, 
  rho12, 
  p11, 
  p10, 
  p01, 
  alpha = 0.05)

Arguments

nSubj

integer. number of subjects to be generated. Assume each subject has two observations.

mu1

μ1=H(Y)H(Yc)\mu_1=H(Y)-H(Y_c) is the difference between probit transformation H(Y)H(Y) and probit-shift alternative H(Yc)H(Y_c), where YY is the prediction score of a randomly selected progressing subunit, and YcY_c is the counterfactual random variable obtained if each subunit that had progressed actually had not progressed.

triangle

the difference of the expected value the the extended Mann-Whitney U statistics between two prediction rules, i.e., =ηc(1)ηc(2)\triangle = \eta^{(1)}_c - \eta^{(2)}_c

rho

ρ=corr(H(Zij),H(Zk))\rho=corr\left(H\left(Z_{ij}\right), H\left(Z_{k\ell}\right)\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho11

ρ11=corr(Hij(1),Hi(1))\rho_{11}=corr\left(H_{ij}^{(1)}, H_{i\ell}^{(1)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho22

ρ22=corr(Hij(2),Hi(2))\rho_{22}=corr\left(H_{ij}^{(2)}, H_{i\ell}^{(2)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho12

ρ12=corr(Hij(1),Hi(2))\rho_{12}=corr\left(H_{ij}^{(1)}, H_{i\ell}^{(2)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

p11

p11=Pr(δi1=1&δi2=1)p_{11}=Pr(\delta_{i1}=1 \& \delta_{i2}=1), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

p10

p10=Pr(δi1=1&δi2=0)p_{10}=Pr(\delta_{i1}=1 \& \delta_{i2}=0), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

p01

p01=Pr(δi1=0&δi2=1)p_{01}=Pr(\delta_{i1}=0 \& \delta_{i2}=1), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

alpha

type I error rate

Value

the power

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)
mu1 = 0.8

power = powerCal(nSubj = 30, mu1 = mu1, 
  triangle = 0.05, rho = 0.93, rho11 = 0.59, rho22 = 0.56, rho12 = 0.52,
  p11 = 0.115, p10 = 0.142, p01 = 0.130, alpha = 0.05)

print(power)

Calculate the power for testing δ=0\delta=0 based on a dataset

Description

Calculate the power for testing δ=0\delta=0 based on a dataset.

Usage

powerCalData(
  nSubj, 
  triangle, 
  frame,
  alpha = 0.05)

Arguments

nSubj

integer. number of subjects to be generated. Assume each subject has two observations.

triangle

the difference of the expected value the the extended Mann-Whitney U statistics between two prediction rules, i.e., =ηc(1)ηc(2)\triangle = \eta^{(1)}_c - \eta^{(2)}_c

frame

A data frame with 5 columns: cid, subuid, status, score1, and score2. cid indicates cluster id; subuid indicates unit ID within a cluster; status=1 indicates an eye is progressed; status=0 indicates an eye is not progressed; score1 represents the score based on prediction rule 1. score2 represents the score based on prediction rule 2.

alpha

type I error rate

Value

A list with 11 elements.

power

the esstimated power

rho

ρ=corr(H(Zij),H(Zk))\rho=corr\left(H\left(Z_{ij}\right), H\left(Z_{k\ell}\right)\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho11

ρ11=corr(Hij(1),Hi(1))\rho_{11}=corr\left(H_{ij}^{(1)}, H_{i\ell}^{(1)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho22

ρ22=corr(Hij(2),Hi(2))\rho_{22}=corr\left(H_{ij}^{(2)}, H_{i\ell}^{(2)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

rho12

ρ12=corr(Hij(1),Hi(2))\rho_{12}=corr\left(H_{ij}^{(1)}, H_{i\ell}^{(2)}\right), where H=Φ1H=\Phi^{-1} is the probit transformation.

p11

p11=Pr(δi1=1&δi2=1)p_{11}=Pr(\delta_{i1}=1 \& \delta_{i2}=1), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

p10

p10=Pr(δi1=1&δi2=0)p_{10}=Pr(\delta_{i1}=1 \& \delta_{i2}=0), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

p01

p01=Pr(δi1=0&δi2=1)p_{01}=Pr(\delta_{i1}=0 \& \delta_{i2}=1), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

p00

p00=Pr(δi1=0&δi2=0)p_{00}=Pr(\delta_{i1}=0 \& \delta_{i2}=0), where δij=1\delta_{ij}=1 if the jj-th subunit of the ii-th cluster has progressed.

mu1

μ1=H(Y)H(Yc)\mu_1=H(Y)-H(Y_c) is the difference between probit transformation H(Y)H(Y) and probit-shift alternative H(Yc)H(Y_c) for the first prediction score, where YY is the prediction score of a randomly selected progressing subunit, and YcY_c is the counterfactual random variable obtained if each subunit that had progressed actually had not progressed.

mu2

μ2=H(Y)H(Yc)\mu_2=H(Y)-H(Y_c) is the difference between probit transformation H(Y)H(Y) and probit-shift alternative H(Yc)H(Y_c) for the second prediction score, where YY is the prediction score of a randomly selected progressing subunit, and YcY_c is the counterfactual random variable obtained if each subunit that had progressed actually had not progressed.

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)

datFrame = genSimDataGLMEM(nSubj = 30, beta0 = -6, sd.beta0i = 1.58, 
                          beta1 = 1.58, beta2 = -3.95, beta3 = 3.15, beta4 = 2.06,
                          beta5 = 0.51, beta6 = 1.47, beta7 = 3.11, 
                          p.smkcur = 0.08, p.inieye31 = 0.44, p.inieye32 = 0.42,
                          p.inieye41 = 0.12, p.inieye42 = 0.11, sd.lncalorc = 0.33)

print(dim(datFrame))
print(datFrame[1:2,])

# prediction rule 1
tt1 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4+factor(rtotfat), 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe1=tt1$frame

print(dim(myframe1))
print(myframe1[1:3,])

####
# prediction rule 2
tt2 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4, 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe2=tt2$frame

print(dim(myframe2))
print(myframe2[1:3,])

# combine scores from two prediction rules
myframe12=myframe1[, c("cid", "subuid", "status")]
myframe12$score1=myframe1$score
myframe12$score2=myframe2$score
print(dim(myframe12))
print(myframe12[1:3,])


res = powerCalData(nSubj = 30, triangle = 0.05, frame=myframe12, alpha = 0.05)

print(res)

Assessing risk prediction performance for clustered data

Description

Assessing risk prediction performance for clustered data.

Usage

riskPredict(frame, alpha=0.05)

Arguments

frame

A data frame with 4 columns: cid, subuid, status, and score. cid indicates cluster id; subuid indicates unit ID within a cluster; status=1 indicates an eye is progressed; status=0 indicates an eye is not progressed; score represents the risk score.

alpha

numeric. confidence level for etaceta_c.

Details

To obtain 95% confidence interval of ηc\eta_c, we first obtain 95% confidence interval [c1,c2][c_1, c_2] for Φ1(ηc)\Phi^{-1}(\eta_c), then transform back: [Φ(c1),Φ(c2)][\Phi(c_1), \Phi(c_2)].

Value

A list of 6 elements:

stat

the test statistics η^c(1)\hat{\eta}_c^{(1)} hateta_c^(1) based on the prediction rule.

se.stat

standard error of the test statistic under the null hypothesis.

z

z score z=(stat - 0.5)/se.stat

pval

p-value of the test

rho

correlation between H(Zij)andH(Zi)H(Z_{ij}) and H(Z_{i \ell})

mu.hat

estimated μ\mu.

theta.hat

estimated θ\theta.

theta.c.hat

estimated θc\theta_c.

E.stat.Ha

expectation of η^c\hat{\eta}_c under the alternative hypothesis.

se.stat.Ha

standard error for η^c\hat{\eta}_c under the alternative hypothesis.

CIlow

lower confidence limit for ηc\eta_c.

CIupp

upper confidence limit for ηc\eta_c.

datHk

A nSubj by 2 matrix of probit transformed risk scores by using only the first 2 observations of each subject.

ci

the vector of cic_i, the number of progressing subunits for the ii-th subject.

di

the vector of did_i, the number of non-progressing subunits for the ii-th subject.

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)
datFrame = genSimDataGLMEM(nSubj = 30, beta0 = -6, sd.beta0i = 1.58, 
                          beta1 = 1.58, beta2 = -3.95, beta3 = 3.15, beta4 = 2.06,
                          beta5 = 0.51, beta6 = 1.47, beta7 = 3.11, 
                          p.smkcur = 0.08, p.inieye31 = 0.44, p.inieye32 = 0.42,
                          p.inieye41 = 0.12, p.inieye42 = 0.11, sd.lncalorc = 0.33)

print(dim(datFrame))
print(datFrame[1:2,])

tt1 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4+factor(rtotfat), 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe1=tt1$frame

print(dim(myframe1))
print(myframe1[1:3,])

res1 = riskPredict(myframe1)
print(names(res1))
print(res1)

Difference of two risk prediction rules for clustered data

Description

Difference of two risk prediction rules for clustered data.

Usage

riskPredictDiff(frame, alpha = 0.05)

Arguments

frame

A data frame with 5 columns: cid, subuid, status, score1, and score2. cid indicates cluster id; subuid indicates unit ID within a cluster; status=1 indicates an eye is progressed; status=0 indicates an eye is not progressed; score1 represents the score based on prediction rule 1. score2 represents the score based on prediction rule 2.

alpha

numeric. The confidence level.

Value

A list of 7 elements:

diff

the difference of test statistics η^c(1)η^c(2)\hat{\eta}_c^{(1)}-\hat{\eta}_c^{(2)} hateta_c^(1)-hateta_c^(2) based on the 2 prediction rules.

se.diff

standard error of the difference under the null hypothesis.

z

z score z=diff/se.diff

pval

p-value of the test

res1

output object of the function riskPredict for prediction rule 1.

res2

output object of the function riskPredict for prediction rule 2.

rhoVec

A vector of 4 correlations: ρ=cov(Hij(1),Hij(2))\rho=cov(H_{ij}^{(1)}, H_{ij}^{(2)}), ρ11=cov(Hij(1),Hit(1))\rho_{11}=cov(H_{ij}^{(1)}, H_{it}^{(1)}), ρ22=cov(Hij(2),Hit(2))\rho_{22}=cov(H_{ij}^{(2)}, H_{it}^{(2)}), and ρ12=cov(Hij(1),Hit(2))\rho_{12}=cov(H_{ij}^{(1)}, H_{it}^{(2)})

E.diff.Ha

expectation of the difference under the alternative hypothesis.

se.diff.Ha

standard error of the difference under the alternative hypothesis.

CIlow.diff

Lower confidence limit.

CIup.diff

Upper confidence limit.

Author(s)

Bernard Rosner <[email protected]>, Weiliang Qiu <[email protected]>, Meiling Ting Lee <[email protected]>

References

Rosner B, Qiu W, and Lee MLT. Assessing Discrimination of Risk Prediction Rules in a Clustered Data Setting. Lifetime Data Anal. 2013 Apr; 19(2): 242-256.

Examples

set.seed(1234567)
datFrame = genSimDataGLMEM(nSubj = 30, beta0 = -6, sd.beta0i = 1.58, 
                          beta1 = 1.58, beta2 = -3.95, beta3 = 3.15, beta4 = 2.06,
                          beta5 = 0.51, beta6 = 1.47, beta7 = 3.11, 
                          p.smkcur = 0.08, p.inieye31 = 0.44, p.inieye32 = 0.42,
                          p.inieye41 = 0.12, p.inieye42 = 0.11, sd.lncalorc = 0.33)

print(dim(datFrame))
print(datFrame[1:2,])

# prediction rule 1
tt1 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4+factor(rtotfat), 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe1=tt1$frame

print(dim(myframe1))
print(myframe1[1:3,])

####
# prediction rule 2
tt2 = getScore(fmla = prog~smkcur+lncalorc+inieye3+inieye4, 
  cidVar = "cid", subuidVar = "subuid", statusVar = "prog", 
  datFrame = datFrame, mycorstr = "exchangeable",
  verbose = FALSE)
myframe2=tt2$frame

print(dim(myframe2))
print(myframe2[1:3,])

# combine scores from two prediction rules
myframe12=myframe1[, c("cid", "subuid", "status")]
myframe12$score1=myframe1$score
myframe12$score2=myframe2$score
print(dim(myframe12))
print(myframe12[1:3,])

####
resDiff = riskPredictDiff(frame=myframe12)
print(names(resDiff))
print(resDiff)