Summary of Analytical Medical Statistics
In the name of God, the Most Gracious, the Most Merciful
• Variable: A characteristic that takes different values in different persons, places, etc. It may be measurable, e.g. systolic blood pressure or the weight of children in a school, or non-measurable, e.g. sex (M, F).
• Variables may be further sub-classified according to their scale of measurement.
• 1. Qualitative: A categorical, non-measurable variable, e.g. race, sex, blood group, place of birth, education, hair color, eye color, etc.
• Note: A binary variable has only 2 possible values, e.g. sex, or a test result (+ve or -ve).
• 2. Quantitative: A numerical variable. It may be:
Discrete: taking a limited number of possible values, e.g. no. of previous pregnancies, age in completed years, no. of bacteria, parity, no. of patients, etc.
Continuous: taking any value in an infinitely divisible range of values, e.g. birth weight (1500 – 4500 gm), exact age, height, weight, uric acid level, blood pressure, etc.
Mother's length of stay after delivery (days) | Singleton or multiple birth | Birth weight (gm) | Sex of baby | No. of previous pregnancies | Mother's socio-economic class | Mother's age (years) | Consultant
2 | S | 3460 | F | 1 | I | 31 | Mr Brown
3 | S | 3740 | M | 1 | III | 25 | Miss White
4 | S | 2790 | F | 0 | III | 24 | Miss White
2 | S | 3340 | F | 1 | I | 30 | Mr Green
1 | S | 3920 | M | 3 | V | 28 | Mr Black
3 | S | 3250 | F | 0 | I | 24 | Miss White
2 | S | 2875 | F | 1 | V | 26 | Mr Brown
4 | S | 2975 | M | 0 | V | 23 | Mr Black
2 | S | 3100 | F | 4 | IV | 26 | Mr Green
1 | S | 2910 | M | 1 | I | 33 | Mr Green
2 | S | 3455 | F | 1 | V | 25 | Miss White
4 | S | 3795 | M | 0 | V | 20 | Mr Brown
2 | S | 4070 | F | 2 | I | 20 | Mr Brown
10 | M | 2580 | M | 1 | V | 32 | Mr Black
10 | M | 2655 | M | 1 | V | 32 | Mr Black
4 | S | 2510 | F | 0 | V | 24 | Miss White
Table 1.1: Consecutive series of births at a district general hospital
• Univariate, bivariate, and multivariate data
• One way of looking at the data in Table 1.1 is to consider the variables one at a time.
• For example, we might start by concentrating on the birth weights and examine their distribution for all of the babies. This analysis would be classed as univariate.
• Similarly, the subset of data concerning the sex of the babies is univariate.
• We might also study the relation between the sex and birth weight of the babies, for example whether the boys tended to be heavier than the girls. To answer this question we need data in the form of two variables. These are bivariate data, comprising two linked pieces of information (the sex and the birth weight) for each baby.
• More complex analyses might look at the interrelation of three, four, or even more variables using multivariate data. Table 1.1 is an example of a multivariate data set.
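The difference between a univariate and a bivariate view can be sketched in a few lines of Python. The values below are transcribed from the birth weight and sex columns of Table 1.1; the variable names are illustrative only.

```python
from statistics import mean

# (birth weight in gm, sex of baby) for the 16 births in Table 1.1
births = [
    (3460, "F"), (3740, "M"), (2790, "F"), (3340, "F"),
    (3920, "M"), (3250, "F"), (2875, "F"), (2975, "M"),
    (3100, "F"), (2910, "M"), (3455, "F"), (3795, "M"),
    (4070, "F"), (2580, "M"), (2655, "M"), (2510, "F"),
]

# Univariate: one variable at a time (birth weight only).
weights = [w for w, _ in births]
overall_mean = mean(weights)

# Bivariate: two linked variables (birth weight within each sex).
by_sex = {"F": [], "M": []}
for w, sex in births:
    by_sex[sex].append(w)
mean_by_sex = {sex: mean(ws) for sex, ws in by_sex.items()}

print(f"All babies: n = {len(weights)}, mean = {overall_mean:.1f} gm")
for sex, ws in by_sex.items():
    print(f"Sex {sex}: n = {len(ws)}, mean = {mean_by_sex[sex]:.1f} gm")
```

Comparing the two group means is only a description; whether the difference is larger than chance would allow is a question for the tests that follow.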
Statistical Analysis for Univariate Variables
Z-test for one mean
Prerequisite:
- Population: mean μ, variance σ²
- Sample: n, mean x̄
Example 1: The mean age of a random sample of 10 individuals is 27 years. The sample was drawn from a population with a mean of 30 years and a variance of 20. Can we conclude that the sample mean is really different from 30 years?
Answer: x̄ = 27, n = 10, μ = 30, σ² = 20
1- H0: μ = 30; H1: μ ≠ 30
2- Level of significance: α = 0.05 (two-sided)
3- Test: Z-test for one mean
4- Calculated Z: Z = (x̄ − μ) / (σ / √n) = (27 − 30) / √(20 / 10) = −2.12
5- |Z cal.| (2.12) > Z tab. (α/2) (1.96) → reject H0 and accept H1,
i.e. there is a real difference between the sample mean and the population mean.
Example 2: The mean level of prothrombin in the general population is known to be 20.0 mg/dl of plasma, with a standard deviation of 3.5 mg/dl. A sample of 49 patients with vitamin K deficiency has a mean prothrombin level of 18.5 mg/dl.
- Test whether the mean prothrombin level of patients with vitamin K deficiency differs from that of the general population.
Answer: x̄ = 18.50 mg/dl, n = 49, μ = 20.0 mg/dl, σ = 3.5 mg/dl
Test: Z-test for one mean
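Examples 1 and 2 can be checked with a short Python sketch. It assumes the usual one-sample statistic Z = (x̄ − μ) / (σ / √n) and a two-sided test at α = 0.05 (table value 1.96); the function name `z_one_mean` is illustrative, not part of any library.

```python
from math import sqrt
from statistics import NormalDist

Z_CRITICAL = 1.96  # two-sided table value for alpha = 0.05


def z_one_mean(sample_mean, pop_mean, pop_sd, n):
    """Return (Z, two-sided p-value) for a one-sample Z-test with known sigma."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Example 1: the variance is 20, so sigma = sqrt(20)
z1, p1 = z_one_mean(27, 30, sqrt(20), 10)
# Example 2: sigma is given directly as 3.5 mg/dl
z2, p2 = z_one_mean(18.5, 20.0, 3.5, 49)

for label, z, p in [("Example 1", z1, p1), ("Example 2", z2, p2)]:
    decision = "reject H0" if abs(z) > Z_CRITICAL else "do not reject H0"
    print(f"{label}: Z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Under these assumptions Example 1 gives |Z| = 2.12 and Example 2 gives |Z| = 3.00, so H0 is rejected in both cases.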
23 September, 13
Z-Test Concerning One-Proportion
Prerequisite:
- Population: proportion P
- Sample: n, proportion p
Example 3: The case fatality rate (CFR) of tetanus in adults is around 40% with traditional treatment. In a sample of 100 tetanus patients treated with a new method, the CFR was 32%. Does the new treatment differ from the traditional treatment?
Solution: Sample: n = 100, p = 0.32, q = 0.68
Population: P0 = 0.40, Q0 = 0.60
Test: Z-test for one proportion
Example 4: In a survey, 300 adults were interviewed and 123 said they had a yearly medical checkup. The true proportion of adults having a yearly medical checkup is known to be 35%. Does the sample proportion differ from it?
Solution: Sample: n = 300, p = 123 / 300 = 0.41, q = 0.59
Population: P0 = 0.35, Q0 = 0.65
Test: Z-test for one proportion
Example 5: Suppose that the smoking rate among men in Iraq is 25% and we want to study the smoking rate among men in Mosul city. Of 100 males sampled, 15 were found to be smokers. Does the proportion of smokers in Mosul differ from that in Iraq?
Solution: p = 15 / 100 = 0.15, q = 0.85, n = 100, P0 = 0.25, Q0 = 0.75
Test: Z-test for one proportion
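Examples 3 to 5 all use the same calculation, sketched below. The formula on the slides did not survive, so the sketch assumes the standard statistic Z = (p − P0) / √(P0 × Q0 / n) and a two-sided test at α = 0.05; the function name is illustrative.

```python
from math import sqrt

Z_CRITICAL = 1.96  # two-sided table value for alpha = 0.05


def z_one_proportion(p, p0, n):
    """Z statistic for a sample proportion p against a population proportion p0."""
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


# (sample proportion p, population proportion P0, sample size n)
examples = {
    "Example 3": (0.32, 0.40, 100),
    "Example 4": (123 / 300, 0.35, 300),
    "Example 5": (15 / 100, 0.25, 100),
}
results = {name: z_one_proportion(*args) for name, args in examples.items()}

for name, z in results.items():
    decision = "reject H0" if abs(z) > Z_CRITICAL else "do not reject H0"
    print(f"{name}: Z = {z:.2f} -> {decision}")
```

With these inputs |Z| is about 1.63, 2.18 and 2.31, so under the stated assumptions H0 is retained in Example 3 and rejected in Examples 4 and 5.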
Z-Test Concerning Two-Proportions
Used for the difference between two proportions.
Prerequisite:
- Sample 1 : n1, p1
- Sample 2 : n2, p2
Example 6: A study was conducted to see whether an important public health intervention would significantly reduce the smoking rate among men. Of n1 = 100 males sampled in 1965 at the time of the release of the Surgeon General’s report on the health consequences of smoking, 51 were found to be smokers. In 1980 a second random sample of n2 = 100 males, similarly gathered, indicated that 43 were smokers.
Solution: p1 = 51 / 100 = 0.51, q1 = 0.49, n1 = 100
p2 = 0.43, q2 = 0.57, n2 = 100
Test: Z-test for two proportions
Example 7: Two drugs were compared in an experimental study. Drug A was given to 50 patients and drug B to another 50 patients. At the end of the study, 6 patients in the first group and 8 patients in the second group showed no improvement on these medications. Is there a significant difference between the two drugs?
Solution: p1 = 44 / 50 = 0.88, q1 = 0.12, n1 = 50 (proportion improved)
p2 = 42 / 50 = 0.84, q2 = 0.16, n2 = 50
Test: Z-test for two proportions
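A minimal sketch for Examples 6 and 7 follows. The slides list q1 and q2 separately, so the sketch assumes the unpooled standard error √(p1q1/n1 + p2q2/n2); this is an assumption, since the original formula is not shown, and a pooled standard error would give nearly the same values here. The comparison uses the two-sided table value 1.96.

```python
from math import sqrt

Z_CRITICAL = 1.96  # two-sided table value for alpha = 0.05


def z_two_proportions(p1, n1, p2, n2):
    """Z statistic for p1 - p2 using the unpooled standard error (assumed form)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


z6 = z_two_proportions(51 / 100, 100, 43 / 100, 100)  # Example 6: smokers 1965 vs 1980
z7 = z_two_proportions(44 / 50, 50, 42 / 50, 50)      # Example 7: improved on drug A vs B

for label, z in [("Example 6", z6), ("Example 7", z7)]:
    decision = "reject H0" if abs(z) > Z_CRITICAL else "do not reject H0"
    print(f"{label}: Z = {z:.2f} -> {decision}")
```

Both statistics (about 1.14 and 0.58) fall below 1.96, so under these assumptions neither difference is significant.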
t-Test
• Conditions for performing the t-test:
- Measured (quantitative) data.
- Population variance (σ²) is unknown.
- n small or large.
t-Test Concerning One-Mean
Prerequisite:
- Population: mean μ
- Sample: n, mean x̄, SD
Example 8: A random sample of 11 subjects is selected from a population. The values are the clotting times (minutes) of plasma:
[ 7.9, 10.9, 11.3, 11.9, 15.0, 12.7, 12.3, 8.6, 9.4, 11.3, 11.5 ]
The true mean clotting time of plasma in the population is known to be 10 minutes. Can we claim that the sample mean is similar to that of the population?
Solution: Population: μ = 10 minutes
Sample: n = 11, x̄ = 122.8 / 11 = 11.16 minutes
S² = 3.95, SD = 1.99 minutes
Test: t-test for one mean
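The summary figures in Example 8 can be reproduced, and the test completed, with the sketch below. It assumes the usual statistic t = (x̄ − μ) / (SD / √n) with n − 1 degrees of freedom and a two-sided test at α = 0.05; the critical value 2.228 is the standard table value for 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

clotting_times = [7.9, 10.9, 11.3, 11.9, 15.0, 12.7, 12.3, 8.6, 9.4, 11.3, 11.5]
mu = 10.0           # population mean clotting time (minutes)
T_CRITICAL = 2.228  # table value, two-sided alpha = 0.05, df = 10

n = len(clotting_times)
x_bar = mean(clotting_times)
sd = stdev(clotting_times)  # sample SD (divides by n - 1)
df = n - 1
t = (x_bar - mu) / (sd / sqrt(n))

decision = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
print(f"mean = {x_bar:.2f}, SD = {sd:.2f}, t = {t:.2f}, df = {df} -> {decision}")
```

Here t is about 1.94, below 2.228, so under these assumptions the sample mean is not significantly different from 10 minutes.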
t-test for two means
A- Independent two samples (unpaired t-test)
B- Dependent two samples (paired t-test)
A- Independent two samples (unpaired t-test)
Prerequisite:
Sample 1: quantitative data: n1, mean1, SD1, e.g. serum uric acid titer in 125 adult males
Sample 2: quantitative data: n2, mean2, SD2, e.g. serum uric acid titer in 110 adult females
Example 9: Test whether there is a difference in age at onset of symptoms of lung cancer between males and females. To this end, the age (years) at onset of symptoms in a sample of male lung cancer patients and in an independent sample of female lung cancer patients is presented as follows:
♀: 58, 52, 50, 49, 56, 52, 54, 48, 41, 37, 67, 70
♂: 26, 41, 57, 66, 36, 55, 41, 61, 53, 50, 52, 37, 50
Answer: Unpaired t-test of two means
Example 10: The extent to which an infant’s health is affected by parental smoking is an important public health concern. The following data are the urinary concentrations of cotinine (a metabolite of nicotine); measurements were taken both from a sample of infants who had been exposed to household smoke and from a sample of unexposed infants.
Unexposed (n1 = 7): 8, 11, 12, 14, 20, 43, 111
Exposed (n2 = 8): 35, 56, 83, 92, 128, 150, 176, 208
Answer: Unpaired t-test of two means
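Examples 9 and 10 can be worked through with the sketch below. It assumes the classical unpaired t-test with a pooled variance (equal variances in the two populations) and n1 + n2 − 2 degrees of freedom; the slides do not show which form they use. The critical values are standard two-sided table values at α = 0.05.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided table values of t at alpha = 0.05 for the degrees of freedom needed
T_CRITICAL = {23: 2.069, 13: 2.160}


def unpaired_t(sample1, sample2):
    """Return (t, df) for a pooled-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


# Example 9: age (years) at onset of lung cancer symptoms
females = [58, 52, 50, 49, 56, 52, 54, 48, 41, 37, 67, 70]
males = [26, 41, 57, 66, 36, 55, 41, 61, 53, 50, 52, 37, 50]
t9, df9 = unpaired_t(females, males)

# Example 10: urinary cotinine in unexposed and exposed infants
unexposed = [8, 11, 12, 14, 20, 43, 111]
exposed = [35, 56, 83, 92, 128, 150, 176, 208]
t10, df10 = unpaired_t(unexposed, exposed)

for label, t, df in [("Example 9", t9, df9), ("Example 10", t10, df10)]:
    decision = "reject H0" if abs(t) > T_CRITICAL[df] else "do not reject H0"
    print(f"{label}: t = {t:.2f}, df = {df} -> {decision}")
```

Under the equal-variance assumption, Example 9 gives t of about 1.14 (not significant) and Example 10 gives t of about −3.23 (significant). The variances in Example 10 differ considerably, so a test that does not assume equal variances may be worth considering.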
B- Dependent two samples (paired t-test):
The individual members of one sample are paired with particular members of the other sample.
These pairings consist of:
• - The same individual on different occasions (two observations on the same subject, such as blood pressure before and after treatment).
• - Or ESR measurement by two methods in the same group of subjects.
Prerequisite:
Two samples related to each other, e.g. serum uric acid titer in 80 adult males before treatment (n1, mean1) and after treatment (n1, mean2).
Example 11: The systolic blood pressures of n = 12 women between the ages of 20 and 35 were measured before and after administration of a newly developed oral contraceptive. Data are shown in the following table (the difference is after − before).
Answer: Paired t-test of two means
Example 12: The serum albumin levels (g/dl) of six randomly chosen patients with dengue hemorrhagic fever were estimated before and after symptomatic treatment. Test whether the treatment alters the albumin level.
Patient's No. | Before treatment | After treatment
1 | 4.8 | 5.2
2 | 4.1 | 4.9
3 | 5.3 | 5.2
4 | 3.9 | 4.8
5 | 4.5 | 4.6
6 | 3.8 | 4.4
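Example 12 is a paired design, so the test works on the within-patient differences. The sketch below assumes the usual paired statistic t = d̄ / (SD_d / √n) with n − 1 degrees of freedom and a two-sided test at α = 0.05 (table value 2.571 for 5 degrees of freedom); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Serum albumin (g/dl) for the six patients of Example 12
before = [4.8, 4.1, 5.3, 3.9, 4.5, 3.8]
after = [5.2, 4.9, 5.2, 4.8, 4.6, 4.4]
T_CRITICAL = 2.571  # table value, two-sided alpha = 0.05, df = 5

# Difference for each patient (after - before)
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = mean(diffs)
sd_d = stdev(diffs)  # sample SD of the differences
df = n - 1
t = d_bar / (sd_d / sqrt(n))

decision = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
print(f"mean difference = {d_bar:.2f}, t = {t:.2f}, df = {df} -> {decision}")
```

The mean difference is 0.45 g/dl and t is about 2.80, which exceeds 2.571, so under these assumptions the treatment does alter the albumin level.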
test for more than two means
ANOVA test (F – test)
“Analysis of Variance”
Prerequisite:
Independent samples (> 2 samples).
Sample 1: quantitative data: n1, mean1, SD1, e.g. serum uric acid titer in 125 healthy adult males
Sample 2: quantitative data: n2, mean2, SD2, e.g. serum uric acid titer in 100 healthy elderly males
Sample 3: quantitative data: n3, mean3, SD3, e.g. serum uric acid titer in 120 males with CRF
Sample 4: quantitative data: n4, mean4, SD4 …..
Example 13: Serum cholesterol level in 4 groups of patients. Compare between them.
Using the t-test for two means would require repeating it 6 times (once for each pair of groups), so it is better to apply the F-test.
Group 1: n = 20, mean = 200 mg/dl
Group 2: n = 30, mean = 210 mg/dl
Group 3: n = 26, mean = 190 mg/dl
Group 4: n = 20, mean = 240 mg/dl
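The count of six pairwise comparisons, and the part of the F ratio that can be computed from the figures given, can be sketched as follows. Example 13 reports only n and the mean for each group, so the within-groups mean square (which needs the group SDs) and therefore F itself cannot be computed here; the sketch stops at the between-groups mean square.

```python
from itertools import combinations

# Group number -> (n, mean serum cholesterol in mg/dl), from Example 13
groups = {1: (20, 200), 2: (30, 210), 3: (26, 190), 4: (20, 240)}

# Every pair of groups would need its own two-sample t-test.
pairs = list(combinations(groups, 2))
print(f"{len(pairs)} pairwise t-tests: {pairs}")

# Between-groups part of the F ratio, computable from n and mean alone.
total_n = sum(n for n, _ in groups.values())
grand_mean = sum(n * m for n, m in groups.values()) / total_n
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in groups.values())
ms_between = ss_between / (len(groups) - 1)
print(f"grand mean = {grand_mean:.2f} mg/dl, between-groups mean square = {ms_between:.1f}")
```

F would then be the between-groups mean square divided by the within-groups mean square, with 3 and 92 degrees of freedom for these group sizes.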
Statistical Analysis for Bivariate Variables
Data in the form of contingency tables: apply the Chi-squared test (χ²)
Chi-squared Test (χ²)
When there are 2 qualitative variables, the data are arranged in a contingency table. The categories of one variable define the rows (R), and the categories of the other variable define the columns (C).
Applied for 2 × 2, 2 × 3, 2 × 4, 3 × 3, 3 × 4 tables, etc.
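The slides do not show the formula itself. For reference, the standard Pearson form for an R × C table is given below, where O is the observed and E the expected frequency in each cell; the symbols O and E are notation introduced here, not taken from the slides.

```latex
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
df = (R - 1)(C - 1)
```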
Example 14: Test whether the age of car drivers affects the number of accidents.
Age of the drivers (columns) by no. of accidents (rows):
No. of accidents | 20 - 30 | 31 - 40 | 41 - 50 | 51 - 60 | Total
0 | 34 | 14 | 16 | 36 | 100
1 | 82 | 20 | 34 | 64 | 200
2 | 84 | 16 | 50 | 50 | 200
Total | 200 | 50 | 100 | 150 | 500
Answer: 3 × 4 table, apply the Chi-squared test (χ²)
Example 15: Test whether the vaccine was effective or whether the difference could have arisen by chance.
Influenza | Vaccine | Placebo | Total
Yes | 20 | 80 | 100
No | 220 | 140 | 360
Total | 240 | 220 | 460
Expected number of influenza cases in the vaccine group under H0: E = (100 × 240) / 460 = 52.2
Answer: 2 × 2 table, apply the Chi-squared test (χ²)
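Both contingency-table examples can be computed with one small function. The sketch below assumes the Pearson chi-squared statistic without continuity correction and compares it with the standard table values at α = 0.05; the function name `chi_squared` is illustrative. The statistic does not depend on how the columns of Example 14 are labelled.

```python
# Table values of chi-squared at alpha = 0.05 for the degrees of freedom needed
CHI2_CRITICAL = {1: 3.841, 6: 12.592}


def chi_squared(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Example 15: rows = influenza yes / no, columns = vaccine / placebo
chi2_15, df_15, exp_15 = chi_squared([[20, 80], [220, 140]])

# Example 14: rows = 0, 1, 2 accidents, columns = the four age groups
chi2_14, df_14, _ = chi_squared([[34, 14, 16, 36], [82, 20, 34, 64], [84, 16, 50, 50]])

print(f"Example 15: expected vaccine/influenza cell = {exp_15[0][0]:.1f}")
for label, chi2, df in [("Example 14", chi2_14, df_14), ("Example 15", chi2_15, df_15)]:
    decision = "reject H0" if chi2 > CHI2_CRITICAL[df] else "do not reject H0"
    print(f"{label}: chi2 = {chi2:.2f}, df = {df} -> {decision}")
```

Under these assumptions Example 15 gives χ² of about 53.0 with 1 degree of freedom (a highly significant difference), while Example 14 gives about 10.9 with 6 degrees of freedom, below 12.592.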
Two quantitative variables: apply simple linear correlation (r)
Correlation is applied to study the relationship between two variables.
Correlation is used to establish and quantify the strength and direction of the relationship between two variables.
The relation between two continuous variables can be displayed graphically as a scatter plot. One variable is represented on the horizontal axis and the other on the vertical axis. Each pair of measurements is then represented as a dot. [ −1 ≤ r ≤ 1 ]
Example 16: Test whether there is an association between the body weight and plasma volume of 8 healthy men and draw the scatter diagram.
Subject | Weight (Kg) | Plasma volume (L)
1 | 58.0 | 2.75
2 | 70.0 | 2.86
3 | 74.0 | 3.37
4 | 63.5 | 2.76
5 | 62.0 | 2.62
6 | 70.5 | 3.49
7 | 71.0 | 3.09
8 | 66.0 | 3.12
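The correlation coefficient for Example 16 can be computed directly from its definition, as sketched below. The significance test shown, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, is the standard test for a correlation coefficient and is an assumption here, since the slides do not state which test they use. The critical value 2.447 is the two-sided table value for 6 degrees of freedom at α = 0.05.

```python
from math import sqrt

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # Kg
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.09, 3.12]  # L
T_CRITICAL = 2.447  # table value, two-sided alpha = 0.05, df = 6


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


n = len(weight)
r = pearson_r(weight, plasma)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

decision = "reject H0 (r differs from 0)" if abs(t) > T_CRITICAL else "do not reject H0"
print(f"r = {r:.2f}, t = {t:.2f}, df = {n - 2} -> {decision}")
```

With the values in the table, r is about 0.77, a fairly strong positive correlation, and t is about 2.96, which exceeds 2.447.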
A coefficient of −1 means a perfect fit to a straight line sloping down from left to right (inverse relation).
Birth weight and weight gain during infancy: r = −1
A coefficient of +1 means a perfect fit to a straight line rising from left to right.
Body temperature and heart rate: r = +1
Here the points show an imperfect fit to a straight line rising from left to right. The correlation coefficient is +0.96.
Imperfect positive correlation: 0 < r < 1
A rather poorer fit to a straight line than in the previous plot, sloping down from left to right. The coefficient is −0.87.
Imperfect negative correlation: −1 < r < 0
When the points on the scatter plot show a very poor fit to a straight line, the correlation coefficient is close to zero,
e.g. cholesterol level and Hb level.
No linear correlation: r ≈ 0
A non-linear relation can have a correlation coefficient close to zero even if it is very exact.
No linear correlation: r ≈ 0
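The last point can be illustrated with a small sketch. The data are hypothetical and chosen only for illustration: y is determined exactly by x through y = x², yet the linear correlation coefficient is zero because the relation is symmetric about x = 0.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: an exact but non-linear (quadratic) relation
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A scatter plot should therefore always be inspected before r is interpreted, since r measures only the linear part of a relation.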