Biostatistics for Clinical and Public Health Research
MPH5041, 2013
Test 2, Due April 14, 2013 by 11:59pm
Modules Covered: This test covers Modules 1-3. All questions must be answered using
statistical knowledge from Modules 1-3 only. Question 1 requires data analysis (SPSS or any
other statistical package can be used).
Assignment General Instructions:
This test is worth 5% of total marks for this subject. It will be marked out of 50
marks.
Please explain the results in plain/simple English (where applicable).
Word Limit: Maximum 1200 Words.
Please download “Test 2”, answer the questions within the WORD LIMIT
stated above, and upload the completed test as a PDF file by April 14,
11:59pm.
You must include a COVER PAGE with your NAME PRINTED on it; no signature is required.
See the section “Assignment and Test” on Moodle for the cover page.
Instructions for Submission:
Please don’t add the data description or original tables/graphs given in the Questions to
your “SOLUTION”.
Upload the completed TEST 2 onto Moodle under the “Test 2 Drop Box” section.
Extension: Students seeking an extension should contact Baki Billah before the due date. Please
note that an extension will not be granted without valid grounds (e.g., illness with medical
certificate).
Important Note:
If you submit more than one file as an attachment, only the first file will be assessed and
the rest of the files will be ignored, sorry!
Please use the solution for Test 1 as a guideline for Question 1.
PLEASE DON’T discuss any of these questions on the “Assignment help – Discussion Forum”.
Data Description: Consider a study that was conducted to investigate the risk factors for
mortality in cardiac surgery patients. Data were collected on cardiac surgery patients in public
hospitals in St Martine Island from 2005 to 2011. The risk factors for 30-day mortality (patients
who died within 30 days of cardiac surgery), along with their short names and numerical codes
(where necessary), are shown in Table 1.
Table 1: Long and short name of selected variables for cardiac surgery patients

Risk Factor (Variable)              Short Name  Code
Patient ID                          ID          N/A
Age                                 AGE         N/A
Current smoking status              SMOKE       0 for No; 1 for Yes
Gender                              SEX         1 for Male; 0 for Female
Urgency of procedure (surgery)      STAT        0 for elective
                                                1 for urgent
                                                2 for emergency
                                                3 for salvage
Diabetic status                     DB          0 for No; 1 for Yes
Ejection fraction estimate (the %   EFE         0 for normal (>60%)
of blood emptied from the left                  1 for mild (46-60%)
ventricle at the end of the                     2 for moderate (30-45%)
contraction)                                    3 for severe (<30%)
Procedure (surgery) type            TP          0 for CABG only
                                                1 for Valve only
                                                2 for CABG + Valve
                                                3 for others
Length of hospital stay             LOS         N/A
Body mass index                     BMI         N/A
Mortality status                    MORT        0 for No; 1 for Yes
Preoperative dialysis               DIAL        0 for No; 1 for Yes
Any blank/dot “cell” in the data file indicates a missing value.

Note:
o 30-day mortality (MORT): patients who die within 30 days of cardiac surgery.
o Ejection fraction: the percentage of blood that the left ventricle pumps out with each heartbeat.
Question 1 [25 marks]: Summarize the descriptive statistics for each of the variables listed in
Table 1 by mortality status and discuss the results. Note: Please do not copy and paste the SPSS
output; you must summarize all the results in a single table created in WORD (Hint: present the list of
variables in Column 1 and the summary statistics in subsequent columns).
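Since any statistical package may be used for Question 1, the grouping step can be sketched in plain Python. This is only a minimal illustration, assuming the data have already been read into a list of dictionaries keyed by the short names in Table 1; the six records below are invented placeholders and are not the study data, and the helper names `summarize_numeric` and `summarize_categorical` are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical records, invented purely to show the mechanics; NOT the study data.
records = [
    {"AGE": 71, "SEX": 1, "MORT": 0},
    {"AGE": 64, "SEX": 0, "MORT": 0},
    {"AGE": 58, "SEX": 1, "MORT": 0},
    {"AGE": 80, "SEX": 1, "MORT": 1},
    {"AGE": 76, "SEX": 0, "MORT": 1},
    {"AGE": None, "SEX": 0, "MORT": 1},  # None stands for a blank/dot cell
]

def summarize_numeric(rows, var, group_var="MORT"):
    """Return {group: (n, mean, sd)} for a numeric variable, skipping missing values."""
    groups = {}
    for row in rows:
        if row[var] is not None and row[group_var] is not None:
            groups.setdefault(row[group_var], []).append(row[var])
    return {
        g: (len(v), mean(v), stdev(v) if len(v) > 1 else float("nan"))
        for g, v in groups.items()
    }

def summarize_categorical(rows, var, group_var="MORT"):
    """Return {group: {level: (count, percent)}} for a coded categorical variable."""
    counts = {}
    for row in rows:
        if row[var] is not None and row[group_var] is not None:
            counts.setdefault(row[group_var], Counter())[row[var]] += 1
    return {
        g: {lvl: (n, 100 * n / sum(c.values())) for lvl, n in c.items()}
        for g, c in counts.items()
    }

age_by_mort = summarize_numeric(records, "AGE")
sex_by_mort = summarize_categorical(records, "SEX")
print(age_by_mort)
print(sex_by_mort)
```

Numeric variables (AGE, LOS, BMI) would go through the first helper and coded variables (SMOKE, SEX, STAT, DB, EFE, TP, DIAL) through the second. For a skewed variable, `statistics.median` may be a more suitable summary than the mean.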
Question 2 [2 marks]: Let us consider that a variable Y is heavily right skewed in the population. If you
draw a large sample from this population what should be the shape of this variable (Y) in the sample?
Justify your answer.
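One way to explore Question 2 empirically is a small simulation. The sketch below assumes an exponential distribution as a stand-in for the heavily right-skewed variable Y (the question does not name a distribution) and uses a hypothetical helper, `sample_skewness`, to describe the shape of a large sample.

```python
import random
from statistics import mean, pstdev

def sample_skewness(values):
    """Moment-based skewness: average cubed z-score (positive means a longer right tail)."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

random.seed(2013)  # fixed seed so the run is repeatable

# Assumed stand-in for Y: an exponential distribution, which is heavily right skewed.
large_sample = [random.expovariate(1.0) for _ in range(5000)]
skew = sample_skewness(large_sample)
print(f"skewness of a sample of {len(large_sample)}: {skew:.2f}")
```

The simulation describes the shape of the observations in one sample, which is a different quantity from the shape of the sampling distribution of the sample mean.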
Question 3 [5 marks]: Consider that the weight of tumors of bladder cancer patients in the population follows
a normal distribution with a mean of 50g and a standard deviation of 5g.
a) [2 marks] If a bladder cancer patient is selected randomly, what is the probability that the tumor weighs
less than 45g?
b) [3 marks] If 4 of these patients are selected at random, calculate the probability that the average
weight of the 4 tumors (assume each patient has only one tumor) will be greater than 55g.
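The two parts of Question 3 differ only in which distribution is consulted: part a) uses the distribution of a single observation, while part b) uses the sampling distribution of the mean, whose standard deviation is sigma divided by the square root of n. A minimal sketch using Python's `statistics.NormalDist` follows; the function names are hypothetical and the code is an illustration rather than a model answer.

```python
from math import sqrt
from statistics import NormalDist

def prob_below(x, mu, sigma):
    """P(X < x) for a single observation X ~ Normal(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(x)

def prob_mean_above(x, mu, sigma, n):
    """P(sample mean > x) for n independent draws from Normal(mu, sigma)."""
    standard_error = sigma / sqrt(n)  # SD of the sample mean
    return 1 - NormalDist(mu, standard_error).cdf(x)

# Values taken from Question 3: mean 50 g, SD 5 g.
p_single = prob_below(45, mu=50, sigma=5)
p_mean = prob_mean_above(55, mu=50, sigma=5, n=4)
print(f"P(X < 45) = {p_single:.4f}")
print(f"P(mean of 4 > 55) = {p_mean:.4f}")
```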
Question 4 [3 marks]: Juan makes a measurement in a chemistry laboratory and records the result in his
lab report. The standard deviation of the students’ lab measurement is 10 milligrams. Juan repeats the
measurement 4 times and records the mean of his 4 measurements.
a) [1 mark] What is the standard deviation of Juan’s mean result?
b) [2 marks] How many times must Juan repeat the measurement to reduce the standard deviation of
the sample mean to 2 milligrams?
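Question 4 rests on the relationship SD(mean) = sigma / sqrt(n), which can be rearranged to n = (sigma / target)^2. The sketch below encodes both directions, assuming independent measurements with a common standard deviation; the function names are hypothetical.

```python
from math import ceil, sqrt

def sd_of_mean(sigma, n):
    """Standard deviation of the mean of n independent measurements."""
    return sigma / sqrt(n)

def repeats_needed(sigma, target_sd):
    """Smallest n with sigma / sqrt(n) <= target_sd, i.e. n >= (sigma / target_sd)^2."""
    return ceil((sigma / target_sd) ** 2)

sigma = 10  # milligrams, from Question 4
print(sd_of_mean(sigma, 4))
print(repeats_needed(sigma, 2))
```

Rounding up with `ceil` matters when the ratio is not a whole number, because a fractional number of measurements is not possible.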
Question 5 [5 marks]: Suppose that the blood cholesterol levels of all men aged 20 to 30
follow a symmetric, bell-shaped distribution with mean 186 mg/dl and an unknown standard deviation.
a) [4 marks] Choose a simple random sample of 100 men from this population. The sample
standard deviation is 41 mg/dl.
1) What is the probability that the sample mean takes a value between 183 and 189
mg/dl?
2) What is the probability that the sample mean takes a value less than 191 mg/dl?
b) [1 mark] Choose a simple random sample of 1000 men from this population. Now what
is the probability that the sample mean falls within ±3 mg/dl of the population mean?
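The effect of sample size in Question 5 can be sketched by computing the probability that the sample mean lands inside an interval around the population mean. The code below makes two assumptions that should be stated in any write-up: the sample standard deviation of 41 mg/dl is used as a plug-in for the unknown population value, and the same value is reused for n = 1000 because the question gives no other. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(low, high, mu, sd, n):
    """P(low < sample mean < high), treating sd as the population SD (an assumption)."""
    sampling_dist = NormalDist(mu, sd / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

mu, s = 186, 41  # mg/dl, from Question 5; s is the sample SD used as a plug-in
for n in (100, 1000):
    p = prob_mean_between(mu - 3, mu + 3, mu, s, n)
    print(f"n = {n}: P(within 3 mg/dl of the mean) = {p:.4f}")
```

A one-sided probability, as in part a) 2), needs only a single `cdf` call on the same sampling distribution.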
Question 6 [5 marks]: The age group to which Anne belongs has a mean height of 1.6 metres and
a standard deviation of 0.1 metres. The age group to which Devi belongs has a mean height of 1.2 metres
and a standard deviation of 0.08 metres. Anne is 1.7 metres tall. Devi is 1.36 metres tall. Who is
the taller for their age?
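Comparing heights across age groups calls for a common scale, and the standardized score z = (value - mean) / SD is the usual choice. A minimal sketch, with a hypothetical helper name:

```python
def z_score(value, group_mean, group_sd):
    """Number of standard deviations a value lies above (+) or below (-) its group mean."""
    return (value - group_mean) / group_sd

# Heights in metres, from Question 6.
anne_z = z_score(1.70, group_mean=1.6, group_sd=0.1)
devi_z = z_score(1.36, group_mean=1.2, group_sd=0.08)
print(f"Anne: z = {anne_z:.2f}, Devi: z = {devi_z:.2f}")
```

The person with the larger z-score is the taller relative to their own age group, regardless of who is taller in absolute terms.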
Question 7 [5 marks]: Find the true answer(s) for the following questions (selection of any false
answer(s) for a question will result in zero marks):
(A) It is necessary to estimate the mean blood sugar level by drawing a sample from a large
population of diabetic patients. The accuracy of the estimate will depend on:
a. The mean sugar level in the population;
b. The population size;
c. The sample size;
d. The way the sample is selected;
e. The variance of sugar level in the population.
(B) The prevalence of a condition in a population is 0.1. If the prevalence is estimated
repeatedly from samples of size 10, these estimates will form a distribution which:
a. Is a sampling distribution;
b. Is approximately normal;
c. Has mean 0.1;
d. Has variance 0.001;
e. None of the above is true.
(C) If the size of a random sample is increased, we would expect:
a. The mean to decrease;
b. The standard error of the mean to decrease;
c. The standard deviation to decrease;
d. The sample variance to increase;
e. The mean to increase.
(D) The standard error of the mean of a sample:
a. Measures the variability of the observations;
b. Is the accuracy with which each observation is measured;
c. Is a measure of how far the sample mean is likely to be from the population mean;
d. Is a measure of how far the sample observations are likely to be from the population mean;
e. Is less than the estimated standard deviation of the population.
(E) Diastolic blood pressure has a distribution which is slightly skewed to the right. If the mean
and standard deviation were calculated for the diastolic blood pressures of a random
sample of men:
a. There would be fewer observations below the mean than above it;
b. The standard deviation would be approximately equal to the mean;
c. The majority of the observations would be more than one standard deviation from
the mean;
d. The standard deviation would estimate the accuracy of blood pressure
measurement;
e. About 95% of observations would be expected to be within two standard
deviations of the mean.
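The setting in part (B) of Question 7 can be examined by simulation. The sketch below assumes independent sampling with a fixed probability p for each individual (a binomial model, for which the variance of the estimated proportion is p(1 - p)/n); the function name is hypothetical and the number of repeats is an arbitrary choice.

```python
import random
from statistics import mean, pvariance

random.seed(7)  # fixed seed for a repeatable run

def simulate_prevalence_estimates(p, n, repeats):
    """Estimate prevalence from `repeats` independent samples of size n."""
    return [sum(random.random() < p for _ in range(n)) / n for _ in range(repeats)]

estimates = simulate_prevalence_estimates(p=0.1, n=10, repeats=20000)
print(f"mean of estimates     = {mean(estimates):.4f}")
print(f"variance of estimates = {pvariance(estimates):.4f}")
print(f"theoretical p(1-p)/n  = {0.1 * 0.9 / 10:.4f}")
```

Tabulating or plotting `estimates` also shows how closely the distribution resembles a normal curve at this sample size.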

