BIST5092_LogRegCode.Rmd

---
title: "BIST 5092 Outliers"
author: "Christine Guerette"
date: "8/5/2022"
output: 
  pdf_document: default
  word_document: default
  number_sections: true
---

\section{Outliers}
First, we are only interested in three variables, all categorical. So, technically no outliers in this set of data. The one thing we need to worry about is calibration and discrimination of the model. I.e., we need to make sure that the observed and predicted probabilities based on the model are reasonably close and that the distribution of the predicted probabilities for success (being a responder) and failure (being a non-responder) are easily distinguishable from each other. So for testing calibration we can use a couple of tests (Pearson chi-square GoF test, Hosmer-Lemeshow GoF test, and deviance GoF test) and we can assess the discrimination of the model using the c-stat (concordance statistic). However, let's check our data to see if there are any noteworthy discrepancies in some continuous variables.

```{r}
getwd()
setwd("C:/Users/chrgu/OneDrive/Desktop/BIST 5092/Background materials and data for the course project/Background materials and data for the course project/Data/Phase3-data/data_phase 3")
#load in data
ogADPA.dat <- read.csv(file="ADPA.csv", sep = ",")
names(ogADPA.dat)
#does not look like there any variables like age, gender, or weight in OG ADPA 
#dataset so let's look at ADSL
ogADSL.dat <- read.csv(file="ADSL.csv", sep = ",")
summary(ogADSL.dat)
hist(ogADSL.dat$WEIGHT) #in kg
box.wt <- boxplot(ogADSL.dat$WEIGHT) 
box.wt$out #some of these weights are super weird and very low, i.e., this person 
#would be super underweight or overweight; overweight in my opinion does not seem 
#as intense as how underweight some of these participants appear to be. However, 
#this is simulated data so how far can we really go with that kind of speculation?
wt.out <- subset(ogADSL.dat, ogADSL.dat$AGE <= 17)
summary(ogADSL.dat$AGE)
hist(ogADSL.dat$AGE)
box.age <- boxplot(ogADSL.dat$AGE) 
box.age$out #children should most likely not be participating: 9, 8, 11, 5, 10, 3 are all noted 
#again, we have simulated data so is that not the nature of simulation? 

#how should we approach this data?
#Checking how many participants are listed as younger than 18
young.17 <- subset(ogADSL.dat, ogADSL.dat$AGE <= 17)
dim(young.17) #37 participants are younger than 18 so by Inclusion/Exclusion criteria, 
#these participants would not be included in the study (pg 27 of protocol)
summary(young.17$AGE)
#there is no inclusion/exclusion criteria regarding weight so how do we approach those outliers?
old.75 <- subset(ogADSL.dat, ogADSL.dat$AGE > 75)
dim(old.75) #37 participants are younger than 18 so by Inclusion/Exclusion criteria, 
#these participants would not be included in the study (pg 27 of protocol)
summary(old.75$AGE)
length(old.75$AGE)
#there is no inclusion/exclusion criteria regarding weight so how do we approach those outliers?
```
\section{Checking Assumptions}
Since there are no real outliers we can check for in PASI75 response, sex, and treatment, let's take a look at the assumptions of the logistic regression model and see if they are verified. We need to check independence and check if the log odds of the binary of PASI75 response is linear and double check if there are other assumptions

\section{Original Dataset: Part C for All}
```{r}
# we're using the ITT principle so make sure to use TRT01P for treatment as planned
#also note that the binary PASI75 response is the PCHGCA1N variable
#we need to pare down the datasets such that we have one row of data per person 
#which will include but is not limited to sex, treatment arm assigned, and if they are recorded as a PASI75 responder

#note that we can assume independence now that our data is pared down.
#However, have repeated measures with all visits and thus dependent data

library(car)
library(pROC)
#let's look at whole dataset (PART C for assignment)
setwd("C:/Users/chrgu/OneDrive/Documents/GitHub/Biostats-Practicum")
ogADPAS.dat <- read.csv(file="ADPA + SEX.csv", sep = ",") 
ogADPAS.v6dat <- subset(ogADPAS.dat, ogADPAS.dat$AVISITN==6)
ogADPAS.respdat <- subset(ogADPAS.v6dat, ogADPAS.v6dat$PARAMN==10)
ogADPAS.v2dat <- subset(ogADPAS.dat, ogADPAS.dat$AVISITN==2)
summary(ogADPAS.v2dat)
#let's set up our model
ogADPAS.respdat$PCHGCA1N <- as.factor(ogADPAS.respdat$PCHGCA1N)
ogADPAS.respdat$SEXN <- as.factor(ogADPAS.respdat$SEXN)
ogADPAS.respdat$TRTPN <- as.factor(ogADPAS.respdat$TRTPN)

OGdata.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=ogADPAS.respdat)
#placebo and male are the reference groups
summary(OGdata.mod)

car::vif(OGdata.mod) #doesn't make too much sense to look at VIF values as they
#are based on correlation and our vars of interest are categorical

#check predictive power
roc(ogADPAS.respdat$PCHGCA1N, fitted(OGdata.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .78, we have acceptable discrimination

#let's look at deviance residuals
devres <- residuals(OGdata.mod,"deviance")
ogADPAS.respdat$phat.comp  <- OGdata.mod$fitted.values

plot(ogADPAS.respdat$phat.comp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- OGdata.mod$null.deviance - OGdata.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset


#LET"S CHECK if we can remove sex from the model and see if that helps with the goodness of fit at all
red.mod <- glm(PCHGCA1N ~ TRTPN, family=binomial, data=ogADPAS.respdat)
summary(red.mod)
G2 <- red.mod$deviance - OGdata.mod$deviance
dfs <- red.mod$df.residual - OGdata.mod$df.residual
qchisq(.95, dfs)
G2 #G2 = 2.436 < 3.841 so fail to reject H_0: coeff of SEXN2 = 0
#if we use alpha = .04
qchisq(.96, dfs)
G2 #same conclusion as it is more conservative that alpha=.05
```

\subsection{Identifying potential influential cases}
prediction: placebo will be very influential based on how few were responders

```{r}
#Get hat diagonals
hii <- influence(OGdata.mod)$hat
#calculate 2p/n, p=2+1=3, n=1831
lowbnd <- (2*3)/1831
x.outl <- subset(hii, hii >= lowbnd)
names(x.outl)
length(x.outl)

out.CooksD <-  cooks.distance(OGdata.mod)
out.dfits  <- dffits(OGdata.mod)
out.dbetas <- dfbetas(OGdata.mod)

#Rule of thumb for DFFITs and identifying outliers
bndfits <- (2*sqrt(3/1831))
x.outl.fits <- subset(out.dfits, abs(out.dfits) >= bndfits)
names(x.outl.fits)
length(x.outl.fits)
#Rule of thumb for DFBETAs and identifying outliers
bndbet <- 2/sqrt(1831)
#for intercept
int.dfb <- out.dbetas[,1]
x.outl.intdb <- subset(int.dfb, abs(int.dfb) >= bndbet)
names(x.outl.intdb)
length(x.outl.intdb)
#for SEXN=2 (female)
x1.dfb <- out.dbetas[,2]
x1.outl <- subset(x1.dfb, abs(x1.dfb) >= bndbet)
names(x1.outl)
length(x1.outl)
#for TRT=2 (active)
x2.dfb <- out.dbetas[,3]
x2.outl <- subset(x2.dfb, abs(x2.dfb) >= bndbet)
names(x2.outl)
length(x2.outl)
#for TRT=3 (test 140mg)
x3.dfb <- out.dbetas[,4]
x3.outl <- subset(x3.dfb, abs(x3.dfb) >= bndbet)
names(x3.outl)
length(x3.outl)
#for TRT=4 (test 210)
x4.dfb <- out.dbetas[,5]
x4.outl <- subset(x4.dfb, abs(x4.dfb) >= bndbet)
names(x4.outl)
length(x4.outl)

#Rule of Thumb for Cook's Distance
cdbnd <- .8
cd.outl <- subset(out.CooksD, out.CooksD >= cdbnd)
names(cd.outl)
```

\section{10\% MCAR 2A, 2B, 2C}

```{r}
#read in data and pare down to visit 6
setwd("C:/Users/chrgu/OneDrive/Documents/GitHub/Biostats-Practicum")
MCAR10.dat <- read.csv(file="MCAR10%.csv", sep = ",", na.strings=c('',NA)) 
MCAR10.v6dat <- subset(MCAR10.dat, MCAR10.dat$AVISITN==6)
MCAR10.respdat <- subset(MCAR10.v6dat, MCAR10.v6dat$PARAMN==10)

#let's clean the data
MCAR10.respdat$PCHGCA1N <- as.factor(MCAR10.respdat$PCHGCA1N)
MCAR10.respdat$SEXN <- as.factor(MCAR10.respdat$SEXN)
MCAR10.respdat$TRTPN <- as.factor(MCAR10.respdat$TRTPN)
```

\subsection{Part A}
```{r}
#using AVAL as an indicator of missingness, create a completer dataset
compMCAR10.dat <- subset(MCAR10.respdat, !is.na(MCAR10.respdat$AVAL))

#time to set up model and then evaluate
compMCAR10.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR10.dat)
#placebo and male are the reference groups
summary(compMCAR10.mod)

#car::vif(compMCAR10.mod) #doesn't make too much sense to look at VIF values as they
#are based on correlation and our vars of interest are categorical

#check predictive power
roc(compMCAR10.dat$PCHGCA1N, fitted(compMCAR10.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .777, we have acceptable discrimination

#let's look at deviance residuals
devres <- residuals(compMCAR10.mod,"deviance")
compMCAR10.dat$phat.10cmpl  <- compMCAR10.mod$fitted.values

plot(compMCAR10.dat$phat.10cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- compMCAR10.mod$null.deviance - compMCAR10.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset
```

\subsection{Part B}
```{r}
#create imputed dataset
impMCAR10.dat <- MCAR10.respdat
i <- 1
for(i in 1:length(impMCAR10.dat$USUBJID)){
  if(is.na(MCAR10.respdat$AVAL[i])==TRUE){
    impMCAR10.dat$PCHGCA1N[i] <- 0
    i <- i+1
  }
}

#time to set up model and then evaluate
impMCAR10.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR10.dat)
#placebo and male are the reference groups
summary(impMCAR10.mod)

#car::vif(impMCAR10.mod) #doesn't make too much sense to look at VIF values as they
#are based on correlation and our vars of interest are categorical

#check predictive power
roc(impMCAR10.dat$PCHGCA1N, fitted(impMCAR10.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .741, we have acceptable discrimination

#let's look at deviance residuals
devres <- residuals(impMCAR10.mod,"deviance")
impMCAR10.dat$phat.10imp  <- impMCAR10.mod$fitted.values

plot(impMCAR10.dat$phat.10imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- impMCAR10.mod$null.deviance - impMCAR10.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset
```

\section{20\% MCAR 2C}

refer to first section about original dataset
\subsection{Logistic Regression Model}

```{r}
#MCAR 20: read in data and clean
MCAR20data <- read.csv(file="MCAR20%.csv", sep = ",", na.strings=c('',NA))
MCAR20data$PCHGCA1N <- as.factor(MCAR20data$PCHGCA1N)
MCAR20data$SEXN <- as.factor(MCAR20data$SEXN)
MCAR20data$TRTPN <- as.factor(MCAR20data$TRTPN)
MCAR20data.v6dat <- subset(MCAR20data, MCAR20data$AVISITN==6)
MCAR20data.respdat <- subset(MCAR20data.v6dat, MCAR20data.v6dat$PARAMN==10)

#COMPLETERS
#using AVAL as an indicator of missingness, create a completer dataset
compMCAR20.dat <- subset(MCAR20data.respdat, !is.na(MCAR20data.respdat$AVAL))
#time to set up model and then evaluate
compMCAR20.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR20.dat)
compMCAR20.dat$phat.20cmpl  <- compMCAR20.mod$fitted.values

summary(compMCAR20.mod)

#check predictive power
roc(compMCAR20.dat$PCHGCA1N, fitted(compMCAR20.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .78, we have acceptable discrimination

#let's look at deviance residuals
devres <- residuals(compMCAR20.mod,"deviance")

plot(compMCAR20.dat$phat.20cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- compMCAR20.mod$null.deviance - compMCAR20.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset

#IMPUTED AS NON-RESPONDER
#create imputed dataset
impMCAR20.dat <- MCAR20data.respdat
i <- 1
for(i in 1:length(impMCAR20.dat$USUBJID)){
  if(is.na(MCAR20data.respdat$AVAL[i])==TRUE){
    impMCAR20.dat$PCHGCA1N[i] <- 0
    i <- i+1
  }
}

#time to set up model and then evaluate
impMCAR20.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR20.dat)
impMCAR20.dat$phat.20imp  <- impMCAR20.mod$fitted.values

summary(impMCAR20.mod)

#check predictive power
roc(impMCAR20.dat$PCHGCA1N, fitted(impMCAR20.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .713, we have somewhat acceptable discrimination

#let's look at deviance residuals
devres <- residuals(impMCAR20.mod,"deviance")

plot(impMCAR20.dat$phat.20imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- impMCAR20.mod$null.deviance - impMCAR20.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset
```


\section{30\% MCAR 2C}

refer to first section about original dataset
\subsection{Logisitic Regression Model}
```{r}
#MCAR 30: read in data and clean
MCAR30data <- read.csv(file="MCAR30%.csv", sep = ",", na.strings=c('',NA))
MCAR30data$PCHGCA1N <- as.factor(MCAR30data$PCHGCA1N)
MCAR30data$SEXN <- as.factor(MCAR30data$SEXN)
MCAR30data$TRTPN <- as.factor(MCAR30data$TRTPN)
MCAR30data.v6dat <- subset(MCAR30data, MCAR30data$AVISITN==6)
MCAR30data.respdat <- subset(MCAR30data.v6dat, MCAR30data.v6dat$PARAMN==10)

#COMPLETERS
#using AVAL as an indicator of missingness, create a completer dataset
compMCAR30.dat <- subset(MCAR30data.respdat, !is.na(MCAR30data.respdat$AVAL))
#time to set up model and then evaluate
compMCAR30.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMCAR30.dat)
compMCAR30.dat$phat.30cmpl  <- compMCAR30.mod$fitted.values

summary(compMCAR30.mod)

#check predictive power
roc(compMCAR30.dat$PCHGCA1N, fitted(compMCAR30.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .787, we have acceptable discrimination

#let's look at deviance residuals
devres <- residuals(compMCAR30.mod,"deviance")

plot(compMCAR30.dat$phat.30cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- compMCAR30.mod$null.deviance - compMCAR30.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset

#IMPUTED AS NON-RESPONDER
#create imputed dataset
impMCAR30.dat <- MCAR30data.respdat
i <- 1
for(i in 1:length(impMCAR30.dat$USUBJID)){
  if(is.na(MCAR30data.respdat$AVAL[i])==TRUE){
    impMCAR30.dat$PCHGCA1N[i] <- 0
    i <- i+1
  }
}

#time to set up model and then evaluate
impMCAR30.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMCAR30.dat)
impMCAR30.dat$phat.30imp  <- impMCAR30.mod$fitted.values

summary(impMCAR30.mod)

#check predictive power
roc(impMCAR30.dat$PCHGCA1N, fitted(impMCAR30.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .699, we have borderline acceptable discrimination

#let's look at deviance residuals
devres <- residuals(impMCAR30.mod,"deviance")

plot(impMCAR30.dat$phat.30imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- impMCAR30.mod$null.deviance - impMCAR30.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset
```


\section{MAR 2C}
refer to first section about original dataset

```{r}
MARdata <- read.csv(file="MAR_Data_V1.csv", sep = ",", na.strings=c('',NA))
MARdata$PCHGCA1N <- as.factor(MARdata$PCHGCA1N)
MARdata$SEXN <- as.factor(MARdata$SEXN)
MARdata$TRTPN <- as.factor(MARdata$TRTPN)
MARdata.v6dat <- subset(MARdata, MARdata$AVISITN==6)
MARdata.respdat <- subset(MARdata.v6dat, MARdata.v6dat$PARAMN==10)

#COMPLETERS
#using AVAL as an indicator of missingness, create a completer dataset
compMAR.dat <- subset(MARdata.respdat, !is.na(MARdata.respdat$AVAL))
#time to set up model and then evaluate
compMAR.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=compMAR.dat)
compMAR.dat$phat.cmpl  <- compMAR.mod$fitted.values

summary(compMAR.mod)

#check predictive power
roc(compMAR.dat$PCHGCA1N, fitted(compMAR.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .712, we have somewhat acceptable discrimination

#let's look at deviance residuals
devres <- residuals(compMAR.mod,"deviance")

plot(compMAR.dat$phat.cmpl, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- compMAR.mod$null.deviance - compMAR.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset

#IMPUTED AS NON-RESPONDER
#create imputed dataset
impMAR.dat <- MARdata.respdat
i <- 1
for(i in 1:length(impMAR.dat$USUBJID)){
  if(is.na(MARdata.respdat$AVAL[i])==TRUE){
    impMAR.dat$PCHGCA1N[i] <- 0
    i <- i+1
  }
}

#time to set up model and then evaluate
impMAR.mod <- glm(PCHGCA1N ~ SEXN + TRTPN, family=binomial, data=impMAR.dat)
impMAR.dat$phat.imp  <- impMAR.mod$fitted.values

summary(impMAR.mod)

#check predictive power
roc(impMAR.dat$PCHGCA1N, fitted(impMAR.mod), plot=TRUE, legacy.axes=TRUE, print.auc=TRUE)
#pretty decent AUC value = C-statistic -- decent predictive power
#so with an AUC of .721, we have acceptable discrimination -- BETTER THAN COMPLETER

#let's look at deviance residuals
devres <- residuals(impMAR.mod,"deviance")

plot(impMAR.dat$phat.imp, devres, main = "Deviance Residual Plot", xlab= "Estimated Probability", ylab="Deviance Residuals")
abline(h=c(-2,2))
#many repeat cases? so should look at another measure to evaluate model fit

#we have quite a few repeat cases as we only have a few levels so we can use the 
#Deviance GoF test to check calibration
#check model fit and assess calibration
devdif.po <- impMAR.mod$null.deviance - impMAR.mod$deviance
df.po <- 3 - 1 #p1-p0 = (2+1)-(0+1)
pval.po <- pchisq(devdif.po, df.po, ncp=0, lower.tail = FALSE, log.p = FALSE) 
cbind(devdif.po,df.po,pval.po)
#we reject null and conclude that the model does NOT fit the data well
#logistic response function is NOT appropriate for this dataset

```

\section{Graphs}

\section{Initial General Notes and Comparisons}
Across these three models (complete dataset completer 10% MCAR, imputed 10% MCAR), logistic regression does not seem to be the right choice of analysis as there are only categorical variables and few levels. Maybe a different analysis would be best. Want to look at the difference in response rates based on treatment across sex: ANOVA? but that can analogously explored using regression

```{r}
#to get a feel for how the predicted probabilities look based on the model
library(ggplot2)

ggplot() +
  geom_freqpoly(data=ogADPAS.respdat, aes(x=phat.comp), color="chartreuse") +
  geom_freqpoly(data=compMCAR10.dat, aes(x=phat.10cmpl), color="coral") +
  geom_freqpoly(data=impMCAR10.dat, aes(x=phat.10imp), color="darkorchid") +
  theme_classic()

ggplot() +
  geom_point(data=ogADPAS.respdat, aes(x=phat.comp, y=TRTPN, shape=SEXN), color="chartreuse") +
  geom_point(data=compMCAR10.dat, aes(x=phat.10cmpl, y=TRTPN, shape=SEXN), color="coral") +
  geom_point(data=impMCAR10.dat, aes(x=phat.10imp, y=TRTPN, shape=SEXN), color="darkorchid") +
  theme_classic()

ggplot() +
  geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=SEXN, linetype=SEXN), color="chartreuse") + #whole data
  geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=SEXN, linetype=SEXN), color="coral") + #completer data
  geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=SEXN, linetype=SEXN), color="darkorchid") + #imputed as non-responder data
  theme_classic()
```

\subsection{Population Graphs}
```{r}
library(ggplot2)
ggplot()+
  scale_fill_brewer(palette="Dark2")+
  geom_bar(data=ogADPAS.respdat, aes(x=TRTPN, fill=SEXN), position="dodge", color="black") +
  geom_bar(data=compMCAR10.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "limegreen") +
  #geom_bar(data=impMCAR10.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "limegreen") +
  geom_bar(data=compMCAR20.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "blue") +
  #geom_bar(data=impMCAR20.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "blue") +
  geom_bar(data=compMCAR30.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "mediumorchid") +
  #geom_bar(data=impMCAR30.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "mediumorchid") +
  geom_bar(data=compMAR.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "red") +
  #geom_bar(data=impMAR.dat, aes(x=TRTPN, fill=SEXN), position = "dodge", color = "red") 
  theme_classic()

ggplot() +
  geom_freqpoly(data=ogADPAS.respdat, aes(x=TRTPN, color = SEXN), stat="count") +
  geom_freqpoly(data=compMCAR10.dat, aes(x=phat.10cmpl), color="coral") +
  geom_freqpoly(data=impMCAR10.dat, aes(x=phat.10imp), color="darkorchid") +
  theme_classic()

#FEMALE GRAPH
ggplot() +
  scale_fill_brewer(palette="Set2")+
  geom_bar(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge",  color="black") + #whole data
  geom_bar(data=subset(compMAR.dat, compMAR.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge",  color="red") + #completer data
  geom_bar(data=subset(impMAR.dat, impMAR.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N),  position="dodge", color="red", linetype="dotted") + #imputed as non-responder data
  geom_bar(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N),  position="dodge", color="limegreen") + #completer data
  geom_bar(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N),  position="dodge", color="limegreen", linetype="dotted") + #imputed as non-responder data
  geom_bar(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge",  color="blue") + #completer data
  geom_bar(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N),  position="dodge", color="blue", linetype="dotted") + #imputed as non-responder data
  geom_bar(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge",  color="mediumorchid") + #completer data
  geom_bar(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 2), aes(x=TRTPN, fill=PCHGCA1N), position="dodge",  color="mediumorchid", linetype="dotted") + #imputed as non-responder data
  theme_classic() 

tab1 <- table(ogADPAS.respdat$PCHGCA1N, ogADPAS.respdat$TRTPN, ogADPAS.respdat$SEXN)
#PCHGCA1N: 0, non-responder; 1, responder TRTPN: 1, placebo; 2 active control (ustek); 
#3, 140mg test; 4, 210 test SEXN: 1, male; 2, female, 3, unknown (no 3s appear)
tab2 <- table(compMCAR10.dat$SEXN, compMCAR10.dat$TRTPN, compMCAR10.dat$PCHGCA1N)
tab3 <- table(impMCAR10.dat$SEXN, impMCAR10.dat$TRTPN, impMCAR10.dat$PCHGCA1N)
tab4 <- table(compMCAR20.dat$SEXN, compMCAR20.dat$TRTPN, compMCAR20.dat$PCHGCA1N)
tab5 <- table(impMCAR20.dat$SEXN, impMCAR20.dat$TRTPN, impMCAR20.dat$PCHGCA1N)
tab6 <- table(compMCAR30.dat$SEXN, compMCAR30.dat$TRTPN, compMCAR30.dat$PCHGCA1N)
tab7 <- table(impMCAR30.dat$SEXN, impMCAR30.dat$TRTPN, impMCAR30.dat$PCHGCA1N)
tab8 <- table(compMAR.dat$SEXN, compMAR.dat$TRTPN, compMAR.dat$PCHGCA1N)
tab9 <- table(impMAR.dat$SEXN, impMAR.dat$TRTPN, impMAR.dat$PCHGCA1N)

tab1 #og data
tab2 #completer MCAR10
tab3 #inputed MCAR10
tab4 #completer MCAR20
tab5 #imputed MCAR20
tab6 #completer MCAR30
tab7 #imputed MCAR30
tab8 #completer MAR
tab9 #imputed MAR
summary(ogADPAS.respdat$PCHGCA1N)
summary(compMCAR10.dat$PCHGCA1N)
summary(impMCAR10.dat$PCHGCA1N)
summary(compMCAR20.dat$PCHGCA1N)
summary(impMCAR20.dat$PCHGCA1N)
summary(compMCAR30.dat$PCHGCA1N)
summary(impMCAR30.dat$PCHGCA1N)
summary(compMAR.dat$PCHGCA1N)
summary(impMAR.dat$PCHGCA1N)
```
\subsubsection{making tables}

\begin{center}
 \begin{tabular}{lclcccc}
 \multicolumn{7}{c}{\textbf{Table 1: Responders}}  \\
\hline
 Model         & Total           & Sex  & Placebo & Active  & 140mg  & 210mg    \\
               & Responders (\%) &      &         & Control & Drug X & Drug X \\
\hline
 Full Data     & 1169  & Male   & 15   & 146  & 283  & 376   \\
               & (\textit{63.8})     & Female & 10   & 64   & 123  & 152   \\
               \hline
 C. MCAR10   & 1058  & Male   & 14   & 130  & 254  & 344  \\
               & (\textit{64.2})    & Female & 10   & 59   & 110  & 137     \\
               \hline
 I. MCAR10    & 1058  & Male   & 14   & 130  & 254  & 344   \\
               & (\textit{57.8})    & Female & 10   & 59   & 110  & 137  \\
               \hline
 C. MCAR20   & 951  & Male   & 11   & 118  & 233  & 307  \\
               & (\textit{64.9})   & Female & 9    & 48   & 100  & 125     \\
               \hline
 I. MCAR20    & 951  & Male   & 11   & 118  & 233  & 307  \\
               & (\textit{51.9})   & Female & 9    & 48   & 100  & 125   \\
               \hline
 C. MCAR30   & 840  & Male   & 11   & 99   & 201  & 278  \\
               & (\textit{65.5})   & Female & 7    & 42   & 87   & 115  \\
               \hline
 I. MCAR30    & 840  & Male   & 11   & 99   & 201  & 278  \\
               & (\textit{45.9})   & Female & 7    & 42   & 87   & 115   \\
               \hline
 C. MAR      & 990   & Male   & 14   & 128  & 233  & 318  \\
               & (\textit{70.6})    & Female & 7    & 54   & 107  & 129    \\
               \hline
 I. MAR       & 990   & Male   & 14   & 128  & 233  & 318   \\
               & (\textit{54.1})    & Female & 7    & 54   & 107  & 129   \\
\hline
\end{tabular}
\end{center}

\begin{center}
\begin{tabular}{lclcccc}
\multicolumn{7}{c}{\textbf{Table 2: Non-Responders}}  \\
\hline
 Model         & Total Non-    & Sex   & Placebo & Active  & 140mg  & 210mg     \\
               & Responders (\%) &      &        & Control & Drug X & Drug X  \\
\hline
 Full Data     & 662                   & Male   & 182   & 61   & 126  & 58   \\
               &   (\textit{36.2})     & Female & 102   & 29   & 78   & 26   \\
               \hline
 C. MCAR10   & 590                  & Male   & 162   & 55   & 113  & 53  \\
               &  (\textit{35.8})     & Female & 91    & 26   & 66   & 24    \\
               \hline
 I. MCAR10    & 773                  & Male   & 183   & 77   & 155  & 90   \\
               &  (\textit{42.2})     & Female & 102   & 34   & 91   & 41   \\
               \hline
 C. MCAR20   & 514                  & Male   & 140   & 46   & 98   & 47  \\
               &  (\textit{35.1})     & Female & 82    & 23   & 59   & 19    \\
               \hline
 I. MCAR20    & 880                  & Male   & 186   & 89   & 176  & 127  \\
               & (\textit{48.1})       & Female & 103   & 45   & 101  & 53   \\
               \hline
 C. MCAR30   & 442                 & Male   & 125   & 43   & 79   & 38  \\
               &  (\textit{34.5})     & Female & 70    & 20   & 49   & 18    \\
               \hline
 I. MCAR30    & 991                  & Male   & 186   & 108  & 208  & 156  \\
               &   (\textit{54.1})    & Female & 105   & 51   & 114  & 63   \\
               \hline
 C. MAR      & 412                  & Male   & 67    & 51   & 99   & 49  \\
               &   (\textit{29.4})    & Female & 38    & 23   & 63   & 22    \\
               \hline
 I. MAR       & 841                  & Male   & 183   & 79   & 176  & 116 \\
               &  (\textit{45.9})     & Female & 105   & 39   & 94   & 49     \\
\hline
\end{tabular}
\end{center}

\subsection{Refined Graphs}

\subsubsection{One Graph per Sex}
each predicted probability is based on sex as well as treatment so there are TWO  of predicted probabilities per treatment -- can't separate out sex until check if sex can be reliably removed from model
```{r}
#FEMALE GRAPH
ggplot() +
  geom_line(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 2), aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data
  geom_line(data=subset(compMAR.dat, compMAR.dat$SEXN == 2), aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data
  geom_line(data=subset(impMAR.dat, impMAR.dat$SEXN == 2), aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 2), aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data
  geom_line(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 2), aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 2), aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data
  geom_line(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 2), aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 2), aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data
  geom_line(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 2), aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data
  labs(title = "Predicted Probability of Response for Females", x="Treatment Group", y = "Predicted Probability")+
  scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

#MALE GRAPH
ggplot() +
  geom_line(data=subset(ogADPAS.respdat, ogADPAS.respdat$SEXN == 1), aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data
  geom_line(data=subset(compMAR.dat, compMAR.dat$SEXN == 1), aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data
  geom_line(data=subset(impMAR.dat, impMAR.dat$SEXN == 1), aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR10.dat, compMCAR10.dat$SEXN == 1), aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data
  geom_line(data=subset(impMCAR10.dat, impMCAR10.dat$SEXN == 1), aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR20.dat, compMCAR20.dat$SEXN == 1), aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data
  geom_line(data=subset(impMCAR20.dat, impMCAR20.dat$SEXN == 1), aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data
  geom_line(data=subset(compMCAR30.dat, compMCAR30.dat$SEXN == 1), aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data
  geom_line(data=subset(impMCAR30.dat, impMCAR30.dat$SEXN == 1), aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data
  labs(title = "Predicted Probability of Response for Males", x="Treatment Group", y = "Predicted Probability")+
  scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

```

\subsubsection{ALL Information Available}

```{r}
#ALL INFORMATION
ggplot() +
  geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data
  geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black", size=1) +
  geom_line(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data
  geom_point(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, shape=SEXN), color="red", size=1) +
  geom_line(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, shape=SEXN), color="red", size=1) +
  geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data
  geom_point(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, shape=SEXN), color="limegreen", size=1) +
  geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, shape=SEXN), color="limegreen", size=1) +
  geom_line(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data
  geom_point(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, shape=SEXN), color="blue", size=1) +
  geom_line(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, shape=SEXN), color="blue", size=1) +
  geom_line(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data
  geom_point(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, shape=SEXN), color="mediumorchid", size=1) +
  geom_line(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, shape=SEXN), color="mediumorchid", size=1) +
  labs(title = "Predicted Probability of Response by Sex", x="Treatment Group", y = "Predicted Probability", shape = "Sex")+
  scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) +
  scale_shape_discrete(labels=c("1"="Male", "2"="Female")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))
```

\subsubsection{Completers Graph}

```{r}
#COMPLETERS
ggplot() +
  geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data
  geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black") +
  geom_line(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, group=1), color="red") + #completer data
  geom_point(data=compMAR.dat, aes(x=TRTPN, y=phat.cmpl, shape=SEXN), color="red") +
  
  geom_line(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, group=1), color="limegreen") + #completer data
  geom_point(data=compMCAR10.dat, aes(x=TRTPN, y=phat.10cmpl, shape=SEXN), color="limegreen") +
  
  geom_line(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, group=1), color="blue") + #completer data
  geom_point(data=compMCAR20.dat, aes(x=TRTPN, y=phat.20cmpl, shape=SEXN), color="blue") +
  
  geom_line(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, group=1), color="mediumorchid") + #completer data
  geom_point(data=compMCAR30.dat, aes(x=TRTPN, y=phat.30cmpl, shape=SEXN), color="mediumorchid") +
 
  labs(title = "Predicted Probability of Response for Completer Data by Sex", x="Treatment Group", y = "Predicted Probability", shape="Sex")+
  scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) +
  scale_shape_discrete(labels=c("1"="Male", "2"="Female")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))
```

\subsubsection{Imputed Graph}

```{r}
#IMPUTED AS NON-RESP
ggplot() +
  geom_line(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, group=1), color="black") + #whole data
  geom_point(data=ogADPAS.respdat, aes(x=TRTPN, y=phat.comp, shape=SEXN), color="black") +
  
  geom_line(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, group=1), color="red", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMAR.dat, aes(x=TRTPN, y=phat.imp, shape=SEXN), color="red") +
 
  geom_line(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, group=1), color="limegreen", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR10.dat, aes(x=TRTPN, y=phat.10imp, shape=SEXN), color="limegreen") +
  
  geom_line(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, group=1), color="blue", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR20.dat, aes(x=TRTPN, y=phat.20imp, shape=SEXN), color="blue") +
  
  geom_line(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, group=1), color="mediumorchid", linetype="dotted") + #imputed as non-responder data
  geom_point(data=impMCAR30.dat, aes(x=TRTPN, y=phat.30imp, shape=SEXN), color="mediumorchid") +
  labs(title = "Predicted Probability of Response for Imputed Data by Sex", x="Treatment Group", y = "Predicted Probability", shape="Sex")+
  scale_x_discrete(labels=c("1"="Placebo", "2"="Active Control", "3"= "140mg Drug X", "4"= "210mg Drug X")) +
  scale_shape_discrete(labels=c("1"="Male", "2"="Female")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))
```
\section{Odds Ratios}

```{r}
#ODDS RATIOS and confidence intervals
zscore <- qnorm(.98)
ogsum <- summary(OGdata.mod)
c10sum <- summary(compMCAR10.mod)
i10sum <- summary(impMCAR10.mod)
c20sum <- summary(compMCAR20.mod)
i20sum <- summary(impMCAR20.mod)
c30sum <- summary(compMCAR30.mod)
i30sum <- summary(impMCAR30.mod)
cMARsum <- summary(compMAR.mod)
iMARsum <- summary(impMAR.mod)

#create rows of data: estimate, std error
FullData <- c(ogsum$coefficients[,1], ogsum$coefficients[,2])
c10MCAR <- c(c10sum$coefficients[,1], c10sum$coefficients[,2])
i10MCAR <- c(i10sum$coefficients[,1], i10sum$coefficients[,2])
c20MCAR <- c(c20sum$coefficients[,1], c20sum$coefficients[,2])
i20MCAR <- c(i20sum$coefficients[,1], i20sum$coefficients[,2])
c30MCAR <- c(c30sum$coefficients[,1], c30sum$coefficients[,2])
i30MCAR <- c(i30sum$coefficients[,1], i30sum$coefficients[,2])
cMAR <- c(cMARsum$coefficients[,1], cMARsum$coefficients[,2])
iMAR <- c(iMARsum$coefficients[,1], iMARsum$coefficients[,2])

datanamesN <- c(1,2,3,4,5,6,7,8,9)
table <- rbind(FullData, c10MCAR, i10MCAR, c20MCAR, i20MCAR, c30MCAR, i30MCAR, cMAR, iMAR)
table <- cbind(datanamesN, table)
table <- as.data.frame(table)
#where Lo=140mg, Hi=210mg
names(table) <- c("dataset", "intEst", "femEst", "acEst", "LoEst", "HiEst","intSE", "femSE", "acSE", "LoSE", "HiSE")
table$zscore <- rep(zscore,9)

#Female vs Male OR
ggplot(table, aes(x=exp(femEst), y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=exp(femEst + (zscore*femSE)), xmin = exp(femEst - (zscore*femSE)))) +
  labs(title = "Female vs Male OR Estimates", x="OR", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))
  

#Active Control vs Placebo OR
ggplot(table, aes(x=exp(acEst), y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=exp(acEst + (zscore*acSE)), xmin = exp(acEst - (zscore*acSE))))+
  labs(title = "Active vs Placebo OR Estimates", x="OR", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

#140mg vs Placebo OR
ggplot(table, aes(x=exp(LoEst), y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=exp(LoEst + (zscore*LoSE)), xmin = exp(LoEst - (zscore*LoSE))))+
  labs(title = "140mg vs Placebo OR Estimates", x="OR", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))


#210mg vs Placebo OR
ggplot(table, aes(x=exp(HiEst), y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=exp(HiEst + (zscore*HiSE)), xmin = exp(HiEst - (zscore*HiSE))))+
  labs(title = "210mg vs Placebo OR Estimates", x="OR", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

```
\section{Coefficients Error Bars}
we have the following model
$$
E[Y_i]= logit(p_i)= \beta_0 + \beta_1X_{1_i} + \beta_2X_{2_i} + \beta_3X_{3_i} + \beta_4X_{4_i}
$$

```{r}
library(ggplot2)
#first: intercept, b0
ggplot(table, aes(x=intEst, y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=intEst + (zscore*intSE), xmin = intEst - (zscore*intSE))) +
  labs(title = "Intercept Estimates", x="Estimate Value", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

#Female b2
ggplot(table, aes(x=femEst, y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=femEst + (zscore*femSE), xmin = femEst - (zscore*femSE))) +
  labs(title = "Female Coefficient Estimates", x="Estimate Value", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))
  

#Active Control b3
ggplot(table, aes(x=acEst, y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=acEst + (zscore*acSE), xmin = acEst - (zscore*acSE)))+
  labs(title = "Active Control Coefficient Estimates", x="Estimate Value", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

#140mg b4
ggplot(table, aes(x=LoEst, y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=LoEst + (zscore*LoSE), xmin = LoEst - (zscore*LoSE)))+
  labs(title = "140mg Drug X Coefficient Estimates", x="Estimate Value", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))


#210mg b5
ggplot(table, aes(x=HiEst, y=as.factor(dataset))) +
  geom_point() +
  geom_errorbarh(aes(xmax=HiEst + (zscore*HiSE), xmin = HiEst - (zscore*HiSE)))+
  labs(title = "210mg Drug X Coefficient Estimates", x="Estimate Value", y = "Model")+
  scale_y_discrete(labels=c("Full Data", "C MCAR10", "I MCAR10", "C MCAR20", "I MCAR20", "C MCAR30", "I MCAR30", "C MAR", "I MAR")) +
  theme_classic() +
  theme(plot.title=element_text(hjust=0.5))

```


\section{Convert Data to Freq Table for SAS}
```{r}
table(ogADPAS.respdat$PCHGCA1N, ogADPAS.respdat$TRTPN, ogADPAS.respdat$SEXN)
#PCHGCA1N: 0, non-responder; 1, responder TRTPN: 1, placebo; 2 active control (ustek); 
#3, 140mg test; 4, 210 test SEXN: 1, male; 2, female, 3, unknown (no 3s appear)
table(compMCAR10.dat$PCHGCA1N, compMCAR10.dat$TRTPN, compMCAR10.dat$SEXN)
table(impMCAR10.dat$PCHGCA1N, impMCAR10.dat$TRTPN, impMCAR10.dat$SEXN)
```