Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Biostats-Practicum/BIST5092_LogRegCode.Rmd
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
957 lines (819 sloc)
45.3 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "BIST 5092 Outliers" | |
author: "Christine Guerette" | |
date: "8/5/2022" | |
output: | |
pdf_document: default | |
word_document: default | |
number_sections: true | |
--- | |
\section{Outliers} | |
First, we are only interested in three variables, all categorical. So, technically no outliers in this set of data. The one thing we need to worry about is calibration and discrimination of the model. I.e., we need to make sure that the observed and predicted probabilities based on the model are reasonably close and that the distribution of the predicted probabilities for success (being a responder) and failure (being a non-responder) are easily distinguishable from each other. So for testing calibration we can use a couple of tests (Pearson chi-square GoF test, Hosmer-Lemeshow GoF test, and deviance GoF test) and we can assess the discrimination of the model using the c-stat (concordance statistic). However, let's check our data to see if there are any noteworthy discrepancies in some continuous variables. | |
```{r} | |
getwd() | |
setwd("C:/Users/chrgu/OneDrive/Desktop/BIST 5092/Background materials and data for the course project/Background materials and data for the course project/Data/Phase3-data/data_phase 3") | |
#load in data | |
ogADPA.dat <- read.csv(file="ADPA.csv", sep = ",") | |
names(ogADPA.dat) | |
#does not look like there any variables like age, gender, or weight in OG ADPA | |
#dataset so let's look at ADSL | |
ogADSL.dat <- read.csv(file="ADSL.csv", sep = ",") | |
summary(ogADSL.dat) | |
hist(ogADSL.dat$WEIGHT) #in kg | |
box.wt <- boxplot(ogADSL.dat$WEIGHT) | |
box.wt$out #some of these weights are super weird and very low, i.e., this person | |
#would be super underweight or overweight; overweight in my opinion does not seem | |
#as intense as how underweight some of these participants appear to be. However, | |
#this is simulated data so how far can we really go with that kind of speculation? | |
wt.out <- subset(ogADSL.dat, ogADSL.dat$AGE <= 17) | |
summary(ogADSL.dat$AGE) | |
hist(ogADSL.dat$AGE) | |
box.age <- boxplot(ogADSL.dat$AGE) | |
box.age$out #children should most likely not be participating: 9, 8, 11, 5, 10, 3 are all noted | |
#again, we have simulated data so is that not the nature of simulation? | |
#how should we approach this data? | |
#Checking how many participants are listed as younger than 18 | |
young.17 <- subset(ogADSL.dat, ogADSL.dat$AGE <= 17) | |
dim(young.17) #37 participants are younger than 18 so by Inclusion/Exclusion criteria, | |
#these participants would not be included in the study (pg 27 of protocol) | |
summary(young.17$AGE) | |
#there is no inclusion/exclusion criteria regarding weight so how do we approach those outliers? | |
old.75 <- subset(ogADSL.dat, ogADSL.dat$AGE > 75) | |
dim(old.75) #37 participants are younger than 18 so by Inclusion/Exclusion criteria, | |
#these participants would not be included in the study (pg 27 of protocol) | |
summary(old.75$AGE) | |
length(old.75$AGE) | |
#there is no inclusion/exclusion criteria regarding weight so how do we approach those outliers? | |
``` | |
\section{Checking Assumptions} | |
Since there are no real outliers we can check for in PASI75 response, sex, and treatment, let's take a look at the assumptions of the logistic regression model and see if they are verified. We need to check independence and check if the log odds of the binary of PASI75 response is linear and double check if there are other assumptions | |
\section{Original Dataset: Part C for All} | |
```{r} | |
# we're using the ITT principle so make sure to use TRT01P for treatment as planned | |
#also note that the binary PASI75 response is the PCHGCA1N variable | |
#we need to pare down the datasets such that we have one row of data per person | |
#which will include but is not limited to sex, treatment arm assigned, and if they are recorded as a PASI75 responder | |
#note that we can assume independence now that our data is pared down. | |
#However, have repeated measures with all visits and thus dependent data | |
library(car) | |
library(pROC) | |
#let's look at whole dataset (PART C for assignment) | |
setwd("C:/Users/chrgu/OneDrive/Documents/GitHub/Biostats-Practicum") | |
ogADPAS.dat <- read.csv(file="ADPA + SEX.csv", sep = ",") | |
ogADPAS.v6dat <- subset(ogADPAS.dat, ogADPAS.dat$AVISITN==6) | |
ogADPAS.respdat <- subset(ogADPAS.v6dat, ogADPAS.v6dat$PARAMN==10) | |
ogADPAS.v2dat <- subset(ogADPAS.dat, ogADPAS.dat$AVISITN==2) | |
summary(ogADPAS.v2dat) | |
#let's set up our model | |
ogADPAS.respdat$PCHGCA1N <- as.factor(ogADPAS.respdat$PCHGCA1N) | |
ogADPAS.respdat$SEXN <- as.factor(ogADPAS.respdat$SEXN) | |
ogADPAS.respdat$TRTPN <- as.factor(ogADPAS.respdat$TRTPN) | |
OGdata.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=ogADPAS.respdat) | |
#placebo and male are the reference groups | |
summary(OGdata.mod) | |
car::vif(OGdata.mod) #doesn't make too much sense to look at VIF values as they | |
#are based on correlation and our vars of interest are categorical | |
#check predictive power | |
roc(ogADPAS.respdat$PCHGCA1N, fitted(OGdata.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .78, we have acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(OGdata.mod,"deviance") | |
ogADPAS.respdat$phat.comp <- OGdata.mod$fitted.values | |
plot(ogADPAS.respdat$phat.comp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- OGdata.mod$null.deviance - OGdata.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
#LET"S CHECK if we can remove sex from the model and see if that helps with the goodness of fit at all | |
red.mod <- glm(PCHGCA1N ~ TRTPN, family=binomial, data=ogADPAS.respdat) | |
summary(red.mod) | |
G2 <- red.mod$deviance - OGdata.mod$deviance | |
dfs <- red.mod$df.residual - OGdata.mod$df.residual | |
qchisq(.95, dfs) | |
G2 #G2 = 2.436 < 3.841 so fail to reject H_0: coeff of SEXN2 = 0 | |
#if we use alpha = .04 | |
qchisq(.96, dfs) | |
G2 #same conclusion as it is more conservative that alpha=.05 | |
``` | |
\subsection{Identifying potential influential cases} | |
prediction: placebo will be very influential based on how few were responders | |
```{r} | |
#Get hat diagonals | |
hii <- influence(OGdata.mod)$hat | |
#calculate 2p/n, p=2+1=3, n=1831 | |
lowbnd <- (2*3)/1831 | |
x.outl <- subset(hii, hii >= lowbnd) | |
names(x.outl) | |
length(x.outl) | |
out.CooksD <- cooks.distance(OGdata.mod) | |
out.dfits <- dffits(OGdata.mod) | |
out.dbetas <- dfbetas(OGdata.mod) | |
#Rule of thumb for DFFITs and identifying outliers | |
bndfits <- (2*sqrt(3/1831)) | |
x.outl.fits <- subset(out.dfits, abs(out.dfits) >= bndfits) | |
names(x.outl.fits) | |
length(x.outl.fits) | |
#Rule of thumb for DFBETAs and identifying outliers | |
bndbet <- 2/sqrt(1831) | |
#for intercept | |
int.dfb <- out.dbetas[,1] | |
x.outl.intdb <- subset(int.dfb, abs(int.dfb) >= bndbet) | |
names(x.outl.intdb) | |
length(x.outl.intdb) | |
#for SEXN=2 (female) | |
x1.dfb <- out.dbetas[,2] | |
x1.outl <- subset(x1.dfb, abs(x1.dfb) >= bndbet) | |
names(x1.outl) | |
length(x1.outl) | |
#for TRT=2 (active) | |
x2.dfb <- out.dbetas[,3] | |
x2.outl <- subset(x2.dfb, abs(x2.dfb) >= bndbet) | |
names(x2.outl) | |
length(x2.outl) | |
#for TRT=3 (test 140mg) | |
x3.dfb <- out.dbetas[,4] | |
x3.outl <- subset(x3.dfb, abs(x3.dfb) >= bndbet) | |
names(x3.outl) | |
length(x3.outl) | |
#for TRT=4 (test 210) | |
x4.dfb <- out.dbetas[,5] | |
x4.outl <- subset(x4.dfb, abs(x4.dfb) >= bndbet) | |
names(x4.outl) | |
length(x4.outl) | |
#Rule of Thumb for Cook's Distance | |
cdbnd <- .8 | |
cd.outl <- subset(out.CooksD, out.CooksD >= cdbnd) | |
names(cd.outl) | |
``` | |
\section{10\% MCAR 2A, 2B, 2C} | |
```{r} | |
#read in data and pare down to visit 6 | |
setwd("C:/Users/chrgu/OneDrive/Documents/GitHub/Biostats-Practicum") | |
MCAR10.dat <- read.csv(file="MCAR10%.csv", sep = ",", na.strings=c('',NA)) | |
MCAR10.v6dat <- subset(MCAR10.dat, MCAR10.dat$AVISITN==6) | |
MCAR10.respdat <- subset(MCAR10.v6dat, MCAR10.v6dat$PARAMN==10) | |
#let's clean the data | |
MCAR10.respdat$PCHGCA1N <- as.factor(MCAR10.respdat$PCHGCA1N) | |
MCAR10.respdat$SEXN <- as.factor(MCAR10.respdat$SEXN) | |
MCAR10.respdat$TRTPN <- as.factor(MCAR10.respdat$TRTPN) | |
``` | |
\subsection{Part A} | |
```{r} | |
#using AVAL as an indicator of missingness, create a completer dataset | |
compMCAR10.dat <- subset(MCAR10.respdat, !is.na(MCAR10.respdat$AVAL)) | |
#time to set up model and then evaluate | |
compMCAR10.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR10.dat) | |
#placebo and male are the reference groups | |
summary(compMCAR10.mod) | |
#car::vif(compMCAR10.mod) #doesn't make too much sense to look at VIF values as they | |
#are based on correlation and our vars of interest are categorical | |
#check predictive power | |
roc(compMCAR10.dat$PCHGCA1N, fitted(compMCAR10.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .777, we have acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(compMCAR10.mod,"deviance") | |
compMCAR10.dat$phat.10cmpl <- compMCAR10.mod$fitted.values | |
plot(compMCAR10.dat$phat.10cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- compMCAR10.mod$null.deviance - compMCAR10.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
``` | |
\subsection{Part B} | |
```{r} | |
#create imputed dataset | |
impMCAR10.dat <- MCAR10.respdat | |
i <- 1 | |
for(i in 1:length(impMCAR10.dat$USUBJID)){ | |
if(is.na(MCAR10.respdat$AVAL[i])==TRUE){ | |
impMCAR10.dat$PCHGCA1N[i] <- 0 | |
i <- i+1 | |
} | |
} | |
#time to set up model and then evaluate | |
impMCAR10.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR10.dat) | |
#placebo and male are the reference groups | |
summary(impMCAR10.mod) | |
#car::vif(impMCAR10.mod) #doesn't make too much sense to look at VIF values as they | |
#are based on correlation and our vars of interest are categorical | |
#check predictive power | |
roc(impMCAR10.dat$PCHGCA1N, fitted(impMCAR10.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .741, we have acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(impMCAR10.mod,"deviance") | |
impMCAR10.dat$phat.10imp <- impMCAR10.mod$fitted.values | |
plot(impMCAR10.dat$phat.10imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- impMCAR10.mod$null.deviance - impMCAR10.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
``` | |
\section{20\% MCAR 2C} | |
refer to first section about original dataset | |
\subsection{Logistic Regression Model} | |
```{r} | |
#MCAR 20: read in data and clean | |
MCAR20data <- read.csv(file="MCAR20%.csv", sep = ",", na.strings=c('',NA)) | |
MCAR20data$PCHGCA1N <- as.factor(MCAR20data$PCHGCA1N) | |
MCAR20data$SEXN <- as.factor(MCAR20data$SEXN) | |
MCAR20data$TRTPN <- as.factor(MCAR20data$TRTPN) | |
MCAR20data.v6dat <- subset(MCAR20data, MCAR20data$AVISITN==6) | |
MCAR20data.respdat <- subset(MCAR20data.v6dat, MCAR20data.v6dat$PARAMN==10) | |
#COMPLETERS | |
#using AVAL as an indicator of missingness, create a completer dataset | |
compMCAR20.dat <- subset(MCAR20data.respdat, !is.na(MCAR20data.respdat$AVAL)) | |
#time to set up model and then evaluate | |
compMCAR20.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR20.dat) | |
compMCAR20.dat$phat.20cmpl <- compMCAR20.mod$fitted.values | |
summary(compMCAR20.mod) | |
#check predictive power | |
roc(compMCAR20.dat$PCHGCA1N, fitted(compMCAR20.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .78, we have acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(compMCAR20.mod,"deviance") | |
plot(compMCAR20.dat$phat.20cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- compMCAR20.mod$null.deviance - compMCAR20.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
#IMPUTED AS NON-RESPONDER | |
#create imputed dataset | |
impMCAR20.dat <- MCAR20data.respdat | |
i <- 1 | |
for(i in 1:length(impMCAR20.dat$USUBJID)){ | |
if(is.na(MCAR20data.respdat$AVAL[i])==TRUE){ | |
impMCAR20.dat$PCHGCA1N[i] <- 0 | |
i <- i+1 | |
} | |
} | |
#time to set up model and then evaluate | |
impMCAR20.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR20.dat) | |
impMCAR20.dat$phat.20imp <- impMCAR20.mod$fitted.values | |
summary(impMCAR20.mod) | |
#check predictive power | |
roc(impMCAR20.dat$PCHGCA1N, fitted(impMCAR20.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .713, we have somewhat acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(impMCAR20.mod,"deviance") | |
plot(impMCAR20.dat$phat.20imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- impMCAR20.mod$null.deviance - impMCAR20.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
``` | |
\section{30\% MCAR 2C} | |
refer to first section about original dataset | |
\subsection{Logisitic Regression Model} | |
```{r} | |
#MCAR 30: read in data and clean | |
MCAR30data <- read.csv(file="MCAR30%.csv", sep = ",", na.strings=c('',NA)) | |
MCAR30data$PCHGCA1N <- as.factor(MCAR30data$PCHGCA1N) | |
MCAR30data$SEXN <- as.factor(MCAR30data$SEXN) | |
MCAR30data$TRTPN <- as.factor(MCAR30data$TRTPN) | |
MCAR30data.v6dat <- subset(MCAR30data, MCAR30data$AVISITN==6) | |
MCAR30data.respdat <- subset(MCAR30data.v6dat, MCAR30data.v6dat$PARAMN==10) | |
#COMPLETERS | |
#using AVAL as an indicator of missingness, create a completer dataset | |
compMCAR30.dat <- subset(MCAR30data.respdat, !is.na(MCAR30data.respdat$AVAL)) | |
#time to set up model and then evaluate | |
compMCAR30.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR30.dat) | |
compMCAR30.dat$phat.30cmpl <- compMCAR30.mod$fitted.values | |
summary(compMCAR30.mod) | |
#check predictive power | |
roc(compMCAR30.dat$PCHGCA1N, fitted(compMCAR30.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .787, we have acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(compMCAR30.mod,"deviance") | |
plot(compMCAR30.dat$phat.30cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- compMCAR30.mod$null.deviance - compMCAR30.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
#IMPUTED AS NON-RESPONDER | |
#create imputed dataset | |
impMCAR30.dat <- MCAR30data.respdat | |
i <- 1 | |
for(i in 1:length(impMCAR30.dat$USUBJID)){ | |
if(is.na(MCAR30data.respdat$AVAL[i])==TRUE){ | |
impMCAR30.dat$PCHGCA1N[i] <- 0 | |
i <- i+1 | |
} | |
} | |
#time to set up model and then evaluate | |
impMCAR30.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR30.dat) | |
impMCAR30.dat$phat.30imp <- impMCAR30.mod$fitted.values | |
summary(impMCAR30.mod) | |
#check predictive power | |
roc(impMCAR30.dat$PCHGCA1N, fitted(impMCAR30.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .699, we have borderline acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(impMCAR30.mod,"deviance") | |
plot(impMCAR30.dat$phat.30imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- impMCAR30.mod$null.deviance - impMCAR30.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
``` | |
\section{MAR 2C} | |
refer to first section about original dataset | |
```{r} | |
MARdata <- read.csv(file="MAR_Data_V1.csv", sep = ",", na.strings=c('',NA)) | |
MARdata$PCHGCA1N <- as.factor(MARdata$PCHGCA1N) | |
MARdata$SEXN <- as.factor(MARdata$SEXN) | |
MARdata$TRTPN <- as.factor(MARdata$TRTPN) | |
MARdata.v6dat <- subset(MARdata, MARdata$AVISITN==6) | |
MARdata.respdat <- subset(MARdata.v6dat, MARdata.v6dat$PARAMN==10) | |
#COMPLETERS | |
#using AVAL as an indicator of missingness, create a completer dataset | |
compMAR.dat <- subset(MARdata.respdat, !is.na(MARdata.respdat$AVAL)) | |
#time to set up model and then evaluate | |
compMAR.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMAR.dat) | |
compMAR.dat$phat.cmpl <- compMAR.mod$fitted.values | |
summary(compMAR.mod) | |
#check predictive power | |
roc(compMAR.dat$PCHGCA1N, fitted(compMAR.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .712, we have somewhat acceptable discrimination | |
#let's look at deviance residuals | |
devres <- residuals(compMAR.mod,"deviance") | |
plot(compMAR.dat$phat.cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- compMAR.mod$null.deviance - compMAR.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
#IMPUTED AS NON-RESPONDER | |
#create imputed dataset | |
impMAR.dat <- MARdata.respdat | |
i <- 1 | |
for(i in 1:length(impMAR.dat$USUBJID)){ | |
if(is.na(MARdata.respdat$AVAL[i])==TRUE){ | |
impMAR.dat$PCHGCA1N[i] <- 0 | |
i <- i+1 | |
} | |
} | |
#time to set up model and then evaluate | |
impMAR.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMAR.dat) | |
impMAR.dat$phat.imp <- impMAR.mod$fitted.values | |
summary(impMAR.mod) | |
#check predictive power | |
roc(impMAR.dat$PCHGCA1N, fitted(impMAR.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE) | |
#pretty decent AUC value = C-statistic -- decent predictive power | |
#so with an AUC of .721, we have acceptable discrimination -- BETTER THAN COMPLETER | |
#let's look at deviance residuals | |
devres <- residuals(impMAR.mod,"deviance") | |
plot(impMAR.dat$phat.imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals") | |
abline(h=c(-2,2)) | |
#many repeat cases? so should look at another measure to evaluate model fit | |
#we have quite a few repeat cases as we only have a few levels so we can use the | |
#Deviance GoF test to check calibration | |
#check model fit and assess calibration | |
devdif.po <- impMAR.mod$null.deviance - impMAR.mod$deviance | |
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1) | |
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) | |
cbind(devdif.po,df.po,pval.po) | |
#we reject null and conclude that the model does NOT fit the data well | |
#logistic response function is NOT appropriate for this dataset | |
``` | |
\section{Graphs} | |
\section{Initial General Notes and Comparisons} | |
Across these three models (complete dataset completer 10% MCAR, imputed 10% MCAR), logistic regression does not seem to be the right choice of analysis as there are only categorical variables and few levels. Maybe a different analysis would be best. Want to look at the difference in response rates based on treatment across sex: ANOVA? but that can analogously explored using regression | |
```{r} | |
#to get a feel for how the predicted probabilities look based on the model | |
library(ggplot2) | |
ggplot() + | |
geom_freqpoly(data=ogADPAS.respdat, aes(x=phat.comp), color="chartreuse") + | |
geom_freqpoly(data=compMCAR10.dat, aes(x=phat.10cmpl), color="coral") + | |
geom_freqpoly(data=impMCAR10.dat, aes(x=phat.10imp), color="darkorchid") + | |
theme_classic() | |
ggplot() + | |
geom_point(data=ogADPAS.respdat, aes(x=phat.comp, y=TRTPN, shape=SEXN), color="chartreuse") + | |
geom_point(data=compMCAR10.dat, aes(x=phat.10cmpl, y=TRTPN, shape=SEXN), color="coral") + | |
geom_point(data=impMCAR10.dat, aes(x=phat.10imp, y=TRTPN, shape=SEXN), color="darkorchid") + | |
theme_classic() | |
ggplot() + | |
geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=SEXN, linetype=SEXN), color="chartreuse") + #whole data | |
geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=SEXN, linetype=SEXN), color="coral") + #completer data | |
geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=SEXN, linetype=SEXN), color="darkorchid") + #imputed as non-responder data | |
theme_classic() | |
``` | |
\subsection{Population Graphs} | |
```{r} | |
library(ggplot2) | |
ggplot()+ | |
scale_fill_brewer(palette="Dark2")+ | |
geom_bar(data=ogADPAS.respdat, aes(x=TRTPN, fill=SEXN), position="dodge", color="black") + | |
geom_bar(data=compMCAR10.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "limegreen") + | |
#geom_bar(data=impMCAR10.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "limegreen") + | |
geom_bar(data=compMCAR20.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "blue") + | |
#geom_bar(data=impMCAR20.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "blue") + | |
geom_bar(data=compMCAR30.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "mediumorchid") + | |
#geom_bar(data=impMCAR30.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "mediumorchid") + | |
geom_bar(data=compMAR.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "red") + | |
#geom_bar(data=impMAR.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "red") | |
theme_classic() | |
ggplot() + | |
geom_freqpoly(data=ogADPAS.respdat, aes(x=TRTPN, color = SEXN), stat="count") + | |
geom_freqpoly(data=compMCAR10.dat, aes(x=phat.10cmpl), color="coral") + | |
geom_freqpoly(data=impMCAR10.dat, aes(x=phat.10imp), color="darkorchid") + | |
theme_classic() | |
#FEMALE GRAPH | |
ggplot() + | |
scale_fill_brewer(palette="Set2")+ | |
geom_bar(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="black") + #whole data | |
geom_bar(data=subset(compMAR.dat, compMAR.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="red") + #completer data | |
geom_bar(data=subset(impMAR.dat, impMAR.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="red", linetype="dotted") + #imputed as non-responder data | |
geom_bar(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="limegreen") + #completer data | |
geom_bar(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="limegreen", linetype="dotted") + #imputed as non-responder data | |
geom_bar(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="blue") + #completer data | |
geom_bar(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="blue", linetype="dotted") + #imputed as non-responder data | |
geom_bar(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="mediumorchid") + #completer data | |
geom_bar(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge", color="mediumorchid", linetype="dotted") + #imputed as non-responder data | |
theme_classic() | |
tab1 <- table(ogADPAS.respdat$PCHGCA1N, ogADPAS.respdat$TRTPN, ogADPAS.respdat$SEXN) | |
#PCHGCA1N: 0, non-responder; 1, responder TRTPN: 1, placebo; 2 active control (ustek); | |
#3, 140mg test; 4, 210 test SEXN: 1, male; 2, female, 3, unknown (no 3s appear) | |
tab2 <- table(compMCAR10.dat$SEXN, compMCAR10.dat$TRTPN, compMCAR10.dat$PCHGCA1N) | |
tab3 <- table(impMCAR10.dat$SEXN, impMCAR10.dat$TRTPN, impMCAR10.dat$PCHGCA1N) | |
tab4 <- table(compMCAR20.dat$SEXN, compMCAR20.dat$TRTPN, compMCAR20.dat$PCHGCA1N) | |
tab5 <- table(impMCAR20.dat$SEXN, impMCAR20.dat$TRTPN, impMCAR20.dat$PCHGCA1N) | |
tab6 <- table(compMCAR30.dat$SEXN, compMCAR30.dat$TRTPN, compMCAR30.dat$PCHGCA1N) | |
tab7 <- table(impMCAR30.dat$SEXN, impMCAR30.dat$TRTPN, impMCAR30.dat$PCHGCA1N) | |
tab8 <- table(compMAR.dat$SEXN, compMAR.dat$TRTPN, compMAR.dat$PCHGCA1N) | |
tab9 <- table(impMAR.dat$SEXN, impMAR.dat$TRTPN, impMAR.dat$PCHGCA1N) | |
tab1 #og data | |
tab2 #completer MCAR10 | |
tab3 #inputed MCAR10 | |
tab4 #completer MCAR20 | |
tab5 #imputed MCAR20 | |
tab6 #completer MCAR30 | |
tab7 #imputed MCAR30 | |
tab8 #completer MAR | |
tab9 #imputed MAR | |
summary(ogADPAS.respdat$PCHGCA1N) | |
summary(compMCAR10.dat$PCHGCA1N) | |
summary(impMCAR10.dat$PCHGCA1N) | |
summary(compMCAR20.dat$PCHGCA1N) | |
summary(impMCAR20.dat$PCHGCA1N) | |
summary(compMCAR30.dat$PCHGCA1N) | |
summary(impMCAR30.dat$PCHGCA1N) | |
summary(compMAR.dat$PCHGCA1N) | |
summary(impMAR.dat$PCHGCA1N) | |
``` | |
\subsubsection{making tables} | |
\begin{center} | |
\begin{tabular}{lclcccc} | |
\multicolumn{7}{c}{\textbf{Table 1: Responders}} \\ | |
\hline | |
Model & Total & Sex & Placebo & Active & 140mg & 210mg \\ | |
& Responders (\%) & & & Control & Drug X & Drug X \\ | |
\hline | |
Full Data & 1169 & Male & 15 & 146 & 283 & 376 \\ | |
& (\textit{63.8}) & Female & 10 & 64 & 123 & 152 \\ | |
\hline | |
C. MCAR10 & 1058 & Male & 14 & 130 & 254 & 344 \\ | |
& (\textit{64.2}) & Female & 10 & 59 & 110 & 137 \\ | |
\hline | |
I. MCAR10 & 1058 & Male & 14 & 130 & 254 & 344 \\ | |
& (\textit{57.8}) & Female & 10 & 59 & 110 & 137 \\ | |
\hline | |
C. MCAR20 & 951 & Male & 11 & 118 & 233 & 307 \\ | |
& (\textit{64.9}) & Female & 9 & 48 & 100 & 125 \\ | |
\hline | |
I. MCAR20 & 951 & Male & 11 & 118 & 233 & 307 \\ | |
& (\textit{51.9}) & Female & 9 & 48 & 100 & 125 \\ | |
\hline | |
C. MCAR30 & 840 & Male & 11 & 99 & 201 & 278 \\ | |
& (\textit{65.5}) & Female & 7 & 42 & 87 & 115 \\ | |
\hline | |
I. MCAR30 & 840 & Male & 11 & 99 & 201 & 278 \\ | |
& (\textit{45.9}) & Female & 7 & 42 & 87 & 115 \\ | |
\hline | |
C. MAR & 990 & Male & 14 & 128 & 233 & 318 \\ | |
& (\textit{70.6}) & Female & 7 & 54 & 107 & 129 \\ | |
\hline | |
I. MAR & 990 & Male & 14 & 128 & 233 & 318 \\ | |
& (\textit{54.1}) & Female & 7 & 54 & 107 & 129 \\ | |
\hline | |
\end{tabular} | |
\end{center} | |
\begin{center} | |
\begin{tabular}{lclcccc} | |
\multicolumn{7}{c}{\textbf{Table 2: Non-Responders}} \\ | |
\hline | |
Model & Total Non- & Sex & Placebo & Active & 140mg & 210mg \\ | |
& Responders (\%) & & & Control & Drug X & Drug X \\ | |
\hline | |
Full Data & 662 & Male & 182 & 61 & 126 & 58 \\ | |
& (\textit{36.2}) & Female & 102 & 29 & 78 & 26 \\ | |
\hline | |
C. MCAR10 & 590 & Male & 162 & 55 & 113 & 53 \\ | |
& (\textit{35.8}) & Female & 91 & 26 & 66 & 24 \\ | |
\hline | |
I. MCAR10 & 773 & Male & 183 & 77 & 155 & 90 \\ | |
& (\textit{42.2}) & Female & 102 & 34 & 91 & 41 \\ | |
\hline | |
C. MCAR20 & 514 & Male & 140 & 46 & 98 & 47 \\ | |
& (\textit{35.1}) & Female & 82 & 23 & 59 & 19 \\ | |
\hline | |
I. MCAR20 & 880 & Male & 186 & 89 & 176 & 127 \\ | |
& (\textit{48.1}) & Female & 103 & 45 & 101 & 53 \\ | |
\hline | |
C. MCAR30 & 442 & Male & 125 & 43 & 79 & 38 \\ | |
& (\textit{34.5}) & Female & 70 & 20 & 49 & 18 \\ | |
\hline | |
I. MCAR30 & 991 & Male & 186 & 108 & 208 & 156 \\ | |
& (\textit{54.1}) & Female & 105 & 51 & 114 & 63 \\ | |
\hline | |
C. MAR & 412 & Male & 67 & 51 & 99 & 49 \\ | |
& (\textit{29.4}) & Female & 38 & 23 & 63 & 22 \\ | |
\hline | |
I. MAR & 841 & Male & 183 & 79 & 176 & 116 \\ | |
& (\textit{45.9}) & Female & 105 & 39 & 94 & 49 \\ | |
\hline | |
\end{tabular} | |
\end{center} | |
\subsection{Refined Graphs} | |
\subsubsection{One Graph per Sex} | |
each predicted probability is based on sex as well as treatment so there are TWO of predicted probabilities per treatment -- can't separate out sex until check if sex can be reliably removed from model | |
```{r} | |
#FEMALE GRAPH | |
ggplot() + | |
geom_line(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 2), aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data | |
geom_line(data=subset(compMAR.dat, compMAR.dat$SEXN == 2), aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data | |
geom_line(data=subset(impMAR.dat, impMAR.dat$SEXN == 2), aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 2), aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data | |
geom_line(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 2), aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 2), aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data | |
geom_line(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 2), aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 2), aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data | |
geom_line(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 2), aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data | |
labs(title = "Predicted Probability of Response for Females", x="Treatment Group", y = "Predicted Probability")+ | |
scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#MALE GRAPH | |
ggplot() + | |
geom_line(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 1), aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data | |
geom_line(data=subset(compMAR.dat, compMAR.dat$SEXN == 1), aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data | |
geom_line(data=subset(impMAR.dat, impMAR.dat$SEXN == 1), aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 1), aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data | |
geom_line(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 1), aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 1), aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data | |
geom_line(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 1), aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data | |
geom_line(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 1), aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data | |
geom_line(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 1), aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data | |
labs(title = "Predicted Probability of Response for Males", x="Treatment Group", y = "Predicted Probability")+ | |
scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\subsubsection{ALL Information Available} | |
```{r} | |
#ALL INFORMATION | |
ggplot() + | |
geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data | |
geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black", size=1) + | |
geom_line(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data | |
geom_point(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, shape=SEXN), color="red", size=1) + | |
geom_line(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, shape=SEXN), color="red", size=1) + | |
geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data | |
geom_point(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, shape=SEXN), color="limegreen", size=1) + | |
geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, shape=SEXN), color="limegreen", size=1) + | |
geom_line(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data | |
geom_point(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, shape=SEXN), color="blue", size=1) + | |
geom_line(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, shape=SEXN), color="blue", size=1) + | |
geom_line(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data | |
geom_point(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, shape=SEXN), color="mediumorchid", size=1) + | |
geom_line(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, shape=SEXN), color="mediumorchid", size=1) + | |
labs(title = "Predicted Probability of Response by Sex", x="Treatment Group", y = "Predicted Probability", shape = "Sex")+ | |
scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) + | |
scale_shape_discrete(labels=c("1"="Male", "2"="Female")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\subsubsection{Completers Graph} | |
```{r} | |
#COMPLETERS | |
ggplot() + | |
geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data | |
geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black") + | |
geom_line(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data | |
geom_point(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, shape=SEXN), color="red") + | |
geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data | |
geom_point(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, shape=SEXN), color="limegreen") + | |
geom_line(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data | |
geom_point(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, shape=SEXN), color="blue") + | |
geom_line(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data | |
geom_point(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, shape=SEXN), color="mediumorchid") + | |
labs(title = "Predicted Probability of Response for Completer Data by Sex", x="Treatment Group", y = "Predicted Probability", shape="Sex")+ | |
scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) + | |
scale_shape_discrete(labels=c("1"="Male", "2"="Female")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\subsubsection{Imputed Graph} | |
```{r} | |
#IMPUTED AS NON-RESP | |
ggplot() + | |
geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data | |
geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black") + | |
geom_line(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, shape=SEXN), color="red") + | |
geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, shape=SEXN), color="limegreen") + | |
geom_line(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, shape=SEXN), color="blue") + | |
geom_line(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data | |
geom_point(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, shape=SEXN), color="mediumorchid") + | |
labs(title = "Predicted Probability of Response for Imputed Data by Sex", x="Treatment Group", y = "Predicted Probability", shape="Sex")+ | |
scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) + | |
scale_shape_discrete(labels=c("1"="Male", "2"="Female")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\section{Odds Ratios} | |
```{r} | |
#ODDS RATIOS and confidence intervals | |
zscore <- qnorm(.98) | |
ogsum <- summary(OGdata.mod) | |
c10sum <- summary(compMCAR10.mod) | |
i10sum <- summary(impMCAR10.mod) | |
c20sum <- summary(compMCAR20.mod) | |
i20sum <- summary(impMCAR20.mod) | |
c30sum <- summary(compMCAR30.mod) | |
i30sum <- summary(impMCAR30.mod) | |
cMARsum <- summary(compMAR.mod) | |
iMARsum <- summary(impMAR.mod) | |
#create rows of data: estimate, std error | |
FullData <- c(ogsum$coefficients[,1], ogsum$coefficients[,2]) | |
c10MCAR <- c(c10sum$coefficients[,1], c10sum$coefficients[,2]) | |
i10MCAR <- c(i10sum$coefficients[,1], i10sum$coefficients[,2]) | |
c20MCAR <- c(c20sum$coefficients[,1], c20sum$coefficients[,2]) | |
i20MCAR <- c(i20sum$coefficients[,1], i20sum$coefficients[,2]) | |
c30MCAR <- c(c30sum$coefficients[,1], c30sum$coefficients[,2]) | |
i30MCAR <- c(i30sum$coefficients[,1], i30sum$coefficients[,2]) | |
cMAR <- c(cMARsum$coefficients[,1], cMARsum$coefficients[,2]) | |
iMAR <- c(iMARsum$coefficients[,1], iMARsum$coefficients[,2]) | |
datanamesN <- c(1,2,3,4,5,6,7,8,9) | |
table <- rbind(FullData, c10MCAR, i10MCAR, c20MCAR, i20MCAR, c30MCAR, i30MCAR, cMAR, iMAR) | |
table <- cbind(datanamesN, table) | |
table <- as.data.frame(table) | |
#where Lo=140mg, Hi=210mg | |
names(table) <- c("dataset", "intEst", "femEst", "acEst", "LoEst", "HiEst","intSE", "femSE", "acSE", "LoSE", "HiSE") | |
table$zscore <- rep(zscore,9) | |
#Female vs Male OR | |
ggplot(table, aes(x=exp(femEst), y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=exp(femEst + (zscore*femSE)), xmin = exp(femEst - (zscore*femSE)))) + | |
labs(title = "Female vs Male OR Estimates", x="OR", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#Active Control vs Placebo OR | |
ggplot(table, aes(x=exp(acEst), y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=exp(acEst + (zscore*acSE)), xmin = exp(acEst - (zscore*acSE))))+ | |
labs(title = "Active vs Placebo OR Estimates", x="OR", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#140mg vs Placebo OR | |
ggplot(table, aes(x=exp(LoEst), y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=exp(LoEst + (zscore*LoSE)), xmin = exp(LoEst - (zscore*LoSE))))+ | |
labs(title = "140mg vs Placebo OR Estimates", x="OR", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#210mg vs Placebo OR | |
ggplot(table, aes(x=exp(HiEst), y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=exp(HiEst + (zscore*HiSE)), xmin = exp(HiEst - (zscore*HiSE))))+ | |
labs(title = "210mg vs Placebo OR Estimates", x="OR", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\section{Coefficients Error Bars} | |
we have the following model | |
$$ | |
E[Y_i]= logit(p_i)= \beta_0 + \beta_1X_{1_i} + \beta_2X_{2_i} + \beta_3X_{3_i} + \beta_4X_{4_i} | |
$$ | |
```{r} | |
library(ggplot2) | |
#first: intercept, b0 | |
ggplot(table, aes(x=intEst, y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=intEst + (zscore*intSE), xmin = intEst - (zscore*intSE))) + | |
labs(title = "Intercept Estimates", x="Estimate Value", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#Female b2 | |
ggplot(table, aes(x=femEst, y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=femEst + (zscore*femSE), xmin = femEst - (zscore*femSE))) + | |
labs(title = "Female Coefficient Estimates", x="Estimate Value", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#Active Control b3 | |
ggplot(table, aes(x=acEst, y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=acEst + (zscore*acSE), xmin = acEst - (zscore*acSE)))+ | |
labs(title = "Active Control Coefficient Estimates", x="Estimate Value", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#140mg b4 | |
ggplot(table, aes(x=LoEst, y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=LoEst + (zscore*LoSE), xmin = LoEst - (zscore*LoSE)))+ | |
labs(title = "140mg Drug X Coefficient Estimates", x="Estimate Value", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
#210mg b5 | |
ggplot(table, aes(x=HiEst, y=as.factor(dataset))) + | |
geom_point() + | |
geom_errorbarh(aes(xmax=HiEst + (zscore*HiSE), xmin = HiEst - (zscore*HiSE)))+ | |
labs(title = "210mg Drug X Coefficient Estimates", x="Estimate Value", y = "Model")+ | |
scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) + | |
theme_classic() + | |
theme(plot.title=element_text(hjust=0.5)) | |
``` | |
\section{Convert Data to Freq Table for SAS} | |
```{r} | |
table(ogADPAS.respdat$PCHGCA1N, ogADPAS.respdat$TRTPN, ogADPAS.respdat$SEXN) | |
#PCHGCA1N: 0, non-responder; 1, responder TRTPN: 1, placebo; 2 active control (ustek); | |
#3, 140mg test; 4, 210 test SEXN: 1, male; 2, female, 3, unknown (no 3s appear) | |
table(compMCAR10.dat$PCHGCA1N, compMCAR10.dat$TRTPN, compMCAR10.dat$SEXN) | |
table(impMCAR10.dat$PCHGCA1N, impMCAR10.dat$TRTPN, impMCAR10.dat$SEXN) | |
``` | |