Homework 2

STAT/BIST 5225

Due Date: February 11, 2024 (by 23:59, on your own GitHub repo)

  1. Which of the following are valid variable names in SAS? Which are valid in R? Explain (briefly).
A.	Weight
B.	WeightKg
C.	Weight_Kg
D.	Speed-MPH
E.	 x321z333
F.	76ers
G.	_var_

A, B, and C would be valid in R. In R, a variable name must start with a letter (or a period not followed by a digit) and can contain letters, numbers, periods, and underscores. D, E, F, and G are invalid:

  • D has a hyphen
  • E starts with a space (without the space, x321z333 would be valid)
  • F starts with a number
  • G starts with an underscore

A, B, C, and G would be valid in SAS. In SAS, a variable name must begin with a letter or an underscore, can contain only letters, numbers, and underscores (no spaces or special characters), and can be at most 32 characters long. D, E, and F are invalid:

  • D has a hyphen
  • E starts with a space (without the space, x321z333 would be valid)
  • F starts with a number
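The naming rules above can be checked mechanically. Below is a minimal R sketch. For R it uses the common idiom `make.names(x) == x`, since base R's `make.names()` returns a name unchanged only when it is already syntactically valid. For SAS it uses a regular expression that is my own approximation of the default (VALIDVARNAME=V7) rule. It is not a SAS function. Name E keeps its leading space, as written in the question.

```r
# Candidate names from question 1 (E keeps its leading space, as written)
nms <- c("Weight", "WeightKg", "Weight_Kg", "Speed-MPH",
         " x321z333", "76ers", "_var_")

# R: make.names() only changes names that are not syntactically valid
valid_r <- make.names(nms) == nms

# SAS (assumed V7 rule): a letter or underscore first, then letters,
# digits, or underscores, 32 characters at most
valid_sas <- grepl("^[A-Za-z_][A-Za-z0-9_]{0,31}$", nms)

data.frame(name = nms, valid_r, valid_sas)
```

The two logical columns reproduce the answers above: A, B, and C pass in both languages, and G passes only under the SAS rule.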
  2. In SAS, which of the following are valid data set names?
A.	Hospital
B.	hospital
C.	data-set
D.	100questions
E.	Demographics_2016

A, B, and E. C contains a hyphen, and D starts with a number.
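SAS data set names follow the same basic rule as variable names. Here is a short R sketch that applies the same assumed regular expression to the five candidates. It also shows why A and B, although both valid, would name the same data set: SAS ignores case in names.

```r
# Data set names from question 2
ds <- c("Hospital", "hospital", "data-set", "100questions", "Demographics_2016")

# Assumed SAS rule: a letter or underscore first, then letters, digits,
# or underscores, 32 characters at most
valid_ds <- grepl("^[A-Za-z_][A-Za-z0-9_]{0,31}$", ds)
ds[valid_ds]

# SAS names are case-insensitive, so A and B refer to the same data set
toupper(ds[1]) == toupper(ds[2])
```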

  3. Can a program in SAS have two different variables, one called Score and the other one called score? How about an R program?

In SAS: No. SAS variable names are not case sensitive, so Score and score refer to the same variable. In R: Yes, because R is case sensitive.
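A quick R illustration (the values are arbitrary): the two assignments below create two separate objects rather than overwriting one.

```r
# R is case sensitive: Score and score are different objects
Score <- 90
score <- 75
c(Score = Score, score = score)
```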

  4. You have a data set consisting of Student ID, English, Math, History, and Science test scores for 25 students. What is the number of variables? What is the number of observations?

There are 5 variables (Student ID plus the four test scores) and 25 observations (one per student).
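In R, variables are the columns of a data frame and observations are its rows. The sketch below builds a hypothetical stand-in with randomly generated scores. These are not real data. Its shape is all that matters here.

```r
# Hypothetical data: 25 students, an ID column plus four test scores
set.seed(1)
scores <- data.frame(
  StudentID = 1:25,
  English   = sample(60:100, 25, replace = TRUE),
  Math      = sample(60:100, 25, replace = TRUE),
  History   = sample(60:100, 25, replace = TRUE),
  Science   = sample(60:100, 25, replace = TRUE)
)
ncol(scores)  # number of variables (columns): 5
nrow(scores)  # number of observations (rows): 25
```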

  5. TRUE or FALSE?
A.	You can put more than one SAS statement on a single line.
B.	You can put more than one R statement on a single line.
C.	You can use several lines for a single SAS statement.
D.	You can use several lines for a single R statement.
E.	SAS has three data types: character, numeric, and integer. If you think it’s FALSE, state 2 other data types.
F.	R has three data types: character, numeric, and integer. If you think it’s FALSE, state 2 other data types.
G.	In SAS, OPTIONS and TITLE statements are considered global statements.

A. TRUE
B. TRUE, R statements on one line can be separated with semicolons.
C. TRUE, as long as individual words (keywords, names, values) are not split across lines.
D. TRUE
E. FALSE, SAS has only two data types: numeric and character.
F. FALSE, logical and complex are two other data types.
G. TRUE
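The R answers (B, D, and F) can be checked directly. The sketch below puts two statements on one line and spreads one statement over two lines. It then uses base R's `typeof()` to show storage types beyond character, numeric, and integer.

```r
# B. Two R statements on one line, separated by a semicolon
a <- 1; b <- 2

# D. One statement over two lines: R keeps reading because the
# first line ends with an operator, so the expression is incomplete
total <- a +
  b

# F. Basic types beyond character, numeric, and integer
typeof("text")      # "character"
typeof(2.5)         # "double" (how numeric values are usually stored)
typeof(2L)          # "integer"
typeof(TRUE)        # "logical"
typeof(1 + 2i)      # "complex"
typeof(as.raw(1))   # "raw"
```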

  6. What is the default storage length for SAS numeric variables (in bytes)? How about in R?

In SAS: 8 bytes. In R: numeric (double) values also take 8 bytes each.
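One way to confirm the R figure is base R's `writeBin()`. When its connection argument is an empty `raw()` vector, it returns the bytes it would write for a value, using that value's natural size. For comparison, integers take 4 bytes.

```r
# Bytes used to store one value of each type
length(writeBin(3.14, raw()))  # double (numeric): 8 bytes
length(writeBin(3L, raw()))    # integer: 4 bytes
```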

  7. The file stocks.txt contains three columns: stock symbol, stock price, and the number of shares.

A.	In SAS, create a temporary SAS data set called portfolio with three variables called: ticker, price, shares.
B.	Create a new variable called value, and set it to be the stock price times the number of shares.
C.	Write the appropriate statement to compute the average price and average number of shares of the stocks in your portfolio.
D.	Include proper documentation in your program. This must also include your name, and the date the program was created.
E.	Repeat A-D in R.

SAS Code:

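/* Shea van den Broek, created 2/11/24 */
/* Homework 2, question 7: stock portfolio */
data portfolio; /* temporary data set (stored in WORK) */
	infile 'hsv23001\Desktop\stocks.txt' dsd dlm = ' ';
	input ticker $ price shares; /* ticker is character; price and shares are numeric */
	value = price*shares; /* B. value = stock price times number of shares */
run;

proc print data = portfolio; /* list the data set */
run;

proc means data = portfolio mean; /* C. average price and average number of shares */
	var price shares;
run;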

R Code:

# Shea van den Broek, created 2/11/24
# Homework 2, question 7: stock portfolio
# read in the data (space-separated, no header row)
portfolio <- read.table("~/STAT5225/Data/stocks.txt", quote="\"", comment.char="")
colnames(portfolio) <- c("ticker", "price", "shares") # assign column names
portfolio$value <- portfolio$price * portfolio$shares # B. value = price times number of shares
mean(portfolio$price) # C. average price
mean(portfolio$shares) # C. average number of shares
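To try the same steps without the data file, here is a sketch on a small made-up stand-in for stocks.txt. The tickers and numbers are hypothetical, and the real file is assumed to have the same three space-separated columns. This variant names the columns inside `read.table()` via `col.names` and computes both averages with `colMeans()`.

```r
# Hypothetical stand-in for stocks.txt (made-up tickers and numbers)
stocks_txt <- "AAA 10.50 100
BBB 25.00 40
CCC 4.25 200"

# col.names names the columns while reading
portfolio <- read.table(text = stocks_txt,
                        col.names = c("ticker", "price", "shares"))
portfolio$value <- portfolio$price * portfolio$shares

# Average price and average number of shares in one call
colMeans(portfolio[, c("price", "shares")])
```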
  8. In the SAS program below, add the necessary statements to compute five new variables:

A. Weight in kilograms (1kg = 2.2 pounds). Name this variable WtKg.

B. Height in centimeters (1 inch = 2.54 cm). Name this variable HtCm.

C. Average blood pressure = diastolic BP plus one third of the difference between systolic BP and diastolic BP. Call it AveBP.

D. A variable equal to 2 times the height squared, plus 1.5 times the height cubed. Call it HtPolynomial.

E. BMI = Weight in Kilograms / ( Height in Meters x Height in Meters )

Attaching my SAS code below:

data prob1_8;
   input ID $
         Height /* in inches */
         Weight /* in pounds */
         SBP    /* systolic BP  */
         DBP    /* diastolic BP */;
   WtKg = Weight/2.2; /* A. Weight in KG */
   HtCm = Height * 2.54; /* B. Height in CM */
   AveBP = DBP + (1/3)*abs(SBP - DBP); /* C. Avg Blood Pressure */
   HtPolynomial = 2*Height**2 + 1.5*Height**3; /* D. Polynomial */
   BMI = WtKg/((HtCm/100)**2); /* E. BMI */
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
;
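As a hand check of the formulas (my own worked arithmetic, rounded, not output copied from SAS), subject 001 (68 in, 150 lb, SBP 110, DBP 70) should give:

```latex
\begin{aligned}
\text{WtKg} &= 150 / 2.2 \approx 68.18 \\
\text{HtCm} &= 68 \times 2.54 = 172.72 \\
\text{AveBP} &= 70 + \tfrac{1}{3}(110 - 70) \approx 83.33 \\
\text{HtPolynomial} &= 2(68)^2 + 1.5(68)^3 = 9248 + 471648 = 480896 \\
\text{BMI} &= \frac{68.18}{(172.72/100)^2} = \frac{68.18}{1.7272^2} \approx 22.86
\end{aligned}
```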
  9. Repeat question 8 in R. Use the following code:
ID <- c(001, 002, 003)
Height <- c(68, 73, 62)
Weight <- c(150, 240, 101)
SBP <- c(110, 150, 120)
DBP <- c(70, 90, 80)
### insert your code here
df_1_8 <- data.frame(ID, Height, Weight, SBP, DBP
                     # insert the new variables here
                     )
cat("Content of the df_1_8 data frame:\n")
df_1_8

Attaching my R code below:

ID <- c(001, 002, 003)
Height <- c(68, 73, 62)
Weight <- c(150, 240, 101)
SBP <- c(110, 150, 120)
DBP <- c(70, 90, 80)
### my code

# A. Weight in kilograms (1kg = 2.2 pounds)
WtKg <- Weight/2.2

# B. Height in centimeters (1 inch = 2.54 cm)
HtCm <- Height*2.54

# C. Average blood pressure = diastolic BP plus one third of the difference
# between systolic BP and diastolic BP
AveBP <- DBP + (1/3)*abs(SBP-DBP)

# D. A variable equal to 2 times the height squared, plus 1.5 times the height cubed
HtPolynomial <- 2*Height^2 + 1.5*Height^3

# E. BMI = Weight in Kilograms / ( Height in Meters x Height in Meters )
BMI <- WtKg/((HtCm/100)*(HtCm/100))

df_1_8 <- data.frame(ID, Height, Weight, SBP, DBP, WtKg, HtCm, 
                AveBP, HtPolynomial, BMI)
cat("Content of the df_1_8 data frame:\n")
df_1_8
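One difference from the SAS version: the template's `ID <- c(001, 002, 003)` creates numbers, so the leading zeros are lost (001 becomes 1). In the SAS program, `ID $` reads ID as character. If matching IDs are wanted, a sketch with base R's `sprintf()` can rebuild zero-padded character IDs. The name `ID_chr` is my own.

```r
# Numeric IDs from the template lose their leading zeros
ID <- c(001, 002, 003)

# Rebuild three-digit, zero-padded character IDs
ID_chr <- sprintf("%03d", ID)
ID_chr
```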
  10. What is wrong with the following SAS program?
data new-data;
infile prob10data.txt
input x1 x2
y1 = 3(x1) + 2(x2);
y2 = x1 / x2;
new_var = x1 + x2 - 37;
run;
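  • new-data is not a valid data set name (it contains a hyphen)
  • the infile and input statements (second and third lines) are missing their semicolons
  • the file name on the infile statement needs quotes ('prob10data.txt'); unquoted, SAS reads it as a fileref, and prob10data.txt is not a valid fileref (filerefs are at most 8 characters and cannot contain a period)
  • the fourth line needs explicit multiplication operators (y1 = 3*x1 + 2*x2;), because parentheses alone do not multiply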
  11. What is wrong with the following R program?
d <- 3
y <- D+3
z <- 4d
dat <- read.csv(my file.csv, sep=”\t”)
n <- c(1, 2, 3, 4)
mean(n)
  • 2nd line: y is assigned D + 3, but there is no object D (only d). Because R is case sensitive, this gives an "object 'D' not found" error
  • 3rd line: 4d is not valid R; multiplication needs an explicit operator (4 * d)
  • 4th line: the file name my file.csv must be in quotes ("my file.csv"), and the curly quotes around \t must be straight quotes ("\t"). Either problem stops the line from running
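A corrected version of the program is sketched below. "my file.csv" is not available here, so the sketch first writes a small hypothetical tab-separated stand-in to a temporary file. In real use, `my_file` would simply be `"my file.csv"`.

```r
d <- 3
y <- d + 3   # use the object that exists: lower-case d
z <- 4 * d   # explicit * for multiplication

# Hypothetical stand-in for "my file.csv" (tab-separated, with a header)
my_file <- tempfile(fileext = ".csv")
writeLines(c("a\tb", "1\t2", "3\t4"), my_file)

# File name passed as a quoted string; straight quotes around "\t"
dat <- read.csv(my_file, sep = "\t")

n <- c(1, 2, 3, 4)
mean(n)
```

For tab-separated files, base R's `read.delim()` is also an option, since its default separator is already a tab.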