
Biostatistics

Lab 2
Presentation of Data
Part I

Descriptive statistics

Descriptive statistics are mathematical formulas and functions that help us describe, summarize, and communicate the main characteristics of large amounts of data.

Inferential statistics

Inferential statistics are mathematical formulas and functions that help us make estimates and draw inferences about the characteristics of whole populations from data collected on samples.

GROUPED DATA

To group a set of observations, we select a set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals, and no observation is left out.
Each such interval is called a CLASS INTERVAL.


NUMBER OF CLASS INTERVALS
The number of class intervals should be:
Not too few, because important information would be lost, and
Not too many, because the needed summarization would be lost.

When a prior classification exists for that particular kind of observation (e.g., annual tabulations), we can follow it; when there is no such classification, we can follow
Sturges' Rule.



Sturges' Rule:
$$k = 1 + 3.322 \log_{10} n$$

k = number of class intervals

n = number of observations in the set

The result should not be regarded as final; it is rounded to a whole number and may be modified.
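A minimal Python sketch of Sturges' rule, assuming "log" is the base-10 logarithm (the constant 3.322 is approximately 1/log₁₀2). The helper names `sturges_k` and `suggested_k` are hypothetical, and rounding half-up to the nearest whole number is only one reasonable convention, since the rule's result is a starting point.

```python
import math

def sturges_k(n: int) -> float:
    """Sturges' rule: k = 1 + 3.322 * log10(n), n = number of observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 + 3.322 * math.log10(n)

def suggested_k(n: int) -> int:
    """Round k half-up to a whole number of intervals (an assumed convention)."""
    return math.floor(sturges_k(n) + 0.5)

for n in (10, 19, 57):
    print(n, round(sturges_k(n), 2), suggested_k(n))
```

For 10 observations the rule suggests about 4 intervals, yet the glucose tables in these notes use 6, which is the kind of modification the rule allows.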

WIDTH OF CLASS INTERVAL
The width of the class intervals should be the same, if possible.

$$W = \frac{R}{K}$$

W = width of the class interval
R = range (largest value - smallest value)
K = number of class intervals
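A small Python sketch of the width formula. The data are hypothetical and for illustration only; the optional rounding up with `math.ceil` is an assumed convenience (not stated in these notes) so that K intervals of width W cover the whole range.

```python
import math

def class_width(values, k):
    """W = R / K, with R = largest value - smallest value and K intervals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = max(values) - min(values)
    return r / k

# Hypothetical observations, for illustration only
data = [20, 35, 41, 52, 66, 80]   # R = 80 - 20 = 60

print(class_width(data, 6))       # 10.0
w = class_width(data, 7)
print(round(w, 3), math.ceil(w))  # 8.571 9
```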


FREQUENCY DISTRIBUTION
It gives the number of observations falling into each class interval.

| Fasting blood glucose level | Frequency |
|---|---|
| < 60 | 1 |
| 60-62 | 1 |
| 63-65 | 5 |
| 66-68 | 1 |
| 69-71 | 1 |
| 72+ | 1 |
| Total | 10 |
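A short Python sketch of tallying observations into the class intervals of this table. The values in `readings` are hypothetical, chosen only so that they reproduce the counts above (the raw data are not given). Assigning each value to the class with the largest lower limit not exceeding it is an assumed way of meeting the "one, and only one, interval" requirement; it also places non-integer values such as 62.5 into 60-62.

```python
from bisect import bisect_right
from collections import Counter

# Lower class limits for the glucose table; "< 60" has no lower limit.
LOWER_LIMITS = [float("-inf"), 60, 63, 66, 69, 72]
LABELS = ["< 60", "60-62", "63-65", "66-68", "69-71", "72+"]

def frequency_distribution(values):
    """Count how many values fall into each class interval."""
    # bisect_right - 1 gives the index of the last lower limit <= x
    counts = Counter(bisect_right(LOWER_LIMITS, x) - 1 for x in values)
    return [(label, counts.get(i, 0)) for i, label in enumerate(LABELS)]

# Hypothetical readings, for illustration only
readings = [58, 61, 63, 64, 64, 65, 65, 67, 70, 75]
for label, f in frequency_distribution(readings):
    print(f"{label:>6}  {f}")
```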

RELATIVE FREQUENCY DISTRIBUTION

It gives the proportion of observations in a particular class interval relative to the total number of observations in the set, usually expressed as a percentage.

| Fasting blood glucose level | Frequency | Relative frequency (%) |
|---|---|---|
| < 60 | 1 | 10 |
| 60-62 | 1 | 10 |
| 63-65 | 5 | 50 |
| 66-68 | 1 | 10 |
| 69-71 | 1 | 10 |
| 72+ | 1 | 10 |
| Total | 10 | 100 |
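A minimal Python sketch computing the relative-frequency column from the frequencies in this table; the function name `relative_frequencies` is hypothetical.

```python
def relative_frequencies(freqs):
    """Relative frequency (%) = class frequency * 100 / total number of observations."""
    n = sum(freqs.values())
    return {label: f * 100 / n for label, f in freqs.items()}

# Frequencies from the fasting blood glucose table
freqs = {"< 60": 1, "60-62": 1, "63-65": 5, "66-68": 1, "69-71": 1, "72+": 1}

rf = relative_frequencies(freqs)
for label, pct in rf.items():
    print(f"{label:>6}  {pct:.0f}%")
print(f" Total  {sum(rf.values()):.0f}%")
```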


CUMULATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the frequency of each class interval to the cumulative frequency of the class interval above it, starting from the second class interval onward (the cumulative frequency of the first interval is simply its own frequency).

| Fasting blood glucose level | Frequency | Cumulative frequency |
|---|---|---|
| < 60 | 1 | 1 |
| 60-62 | 1 | 2 |
| 63-65 | 5 | 7 |
| 66-68 | 1 | 8 |
| 69-71 | 1 | 9 |
| 72+ | 1 | 10 |
| Total | 10 | |
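The running total described above is a cumulative sum, which Python's standard `itertools.accumulate` computes directly. A minimal sketch using the glucose frequencies:

```python
from itertools import accumulate

labels = ["< 60", "60-62", "63-65", "66-68", "69-71", "72+"]
freqs = [1, 1, 5, 1, 1, 1]

# Each cumulative frequency = this interval's frequency + the cumulative
# frequency of the interval above it (the first is just its own frequency).
cum_freqs = list(accumulate(freqs))

for label, f, cf in zip(labels, freqs, cum_freqs):
    print(f"{label:>6}  {f}  {cf:>2}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.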


CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the relative frequency of each class interval to the cumulative relative frequency of the class interval above it, also starting from the second class interval onward.

| Fasting blood glucose level | Frequency (f) | Cumulative frequency | Relative frequency (%) | Cumulative relative frequency (%) |
|---|---|---|---|---|
| < 60 | 1 | 1 | 10 | 10 |
| 60-62 | 1 | 2 | 10 | 20 |
| 63-65 | 5 | 7 | 50 | 70 |
| 66-68 | 1 | 8 | 10 | 80 |
| 69-71 | 1 | 9 | 10 | 90 |
| 72+ | 1 | 10 | 10 | 100 |
| Total | 10 | | 100 | |
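A minimal Python sketch that rebuilds all four columns of this table from the class frequencies. It accumulates the relative frequencies as described above and cross-checks the result against cumulative frequency × 100 / n, which gives the same column.

```python
from itertools import accumulate

labels = ["< 60", "60-62", "63-65", "66-68", "69-71", "72+"]
freqs = [1, 1, 5, 1, 1, 1]
n = sum(freqs)

rel_freqs = [f * 100 / n for f in freqs]      # relative frequency, %
cum_rel_freqs = list(accumulate(rel_freqs))   # running total of relative frequencies

# Cross-check: cumulative frequency * 100 / n gives the same column
cum_freqs = list(accumulate(freqs))
assert all(abs(c - cf * 100 / n) < 1e-9 for c, cf in zip(cum_rel_freqs, cum_freqs))

for row in zip(labels, freqs, cum_freqs, rel_freqs, cum_rel_freqs):
    print("{:>6}  {}  {:>2}  {:>4.0f}  {:>4.0f}".format(*row))
```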


Central Tendency
Suppose that we have a large group of numbers. Typically these numbers will be dependent variable measurements (e.g., reaction time, blood pressure, hours spent in exercise per day) on the individuals who participate in our research.
Central tendency refers to our intuition that there is a center around which all these scores vary.


Measures of Central Tendency

We might have ten, a hundred, a thousand, or ten thousand numbers. For large data sets, all those numbers are quite a jumble of information to process. We would like to find a single number that typifies all of them and indicates where their center is.
The three main measures of central tendency are:
The mean,
The median, and
The mode

Measures of Central Tendency; The Mean:

It is the arithmetic average of the data: the sum of all values in a set of observations divided by the number of these observations.

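It is calculated with these equations:

Mean of a population:
$$\mu = \frac{\sum X}{N}$$

Mean of a sample:
$$\bar{X} = \frac{\sum X}{n}$$

where N is the number of observations in the population and n is the number of observations in the sample.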
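A minimal Python sketch of the mean formula. The same computation gives μ (over all N population values) or X̄ (over n sample values); only the data differ. The function name `mean` is a hypothetical helper, and the data are the incubation periods from the hepatitis A example in these notes.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)

# Incubation periods (days) from the hepatitis A example
incubation_days = [29, 31, 24, 29, 30, 25]
print(mean(incubation_days))   # 28.0
```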



Measures of Central Tendency; The Weighted Mean:

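The individual values in the set are weighted by their respective frequencies.

$$\bar{X}_w = \frac{\sum (n \cdot X)}{N}$$

where n is the frequency (weight) of each value X and N = Σn is the total of the frequencies.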
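A minimal Python sketch of the weighted mean, with n as each value's weight and N as the sum of the weights. The example values and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(n * X) / N, where n is each value's weight and N = sum(n)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)   # N
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(n * x for x, n in zip(values, weights)) / total_weight

# Hypothetical example: value 2 occurs 5 times, 3 occurs 3 times, 4 occurs twice
print(weighted_mean([2, 3, 4], [5, 3, 2]))   # 27 / 10 = 2.7
```

When all weights are equal, the weighted mean reduces to the ordinary mean.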

Measures of Central Tendency; The Median (50th percentile)

After arranging the data in an ordered array (from smallest to largest):
The median of a data set is the value that lies exactly in the middle.
The position of the median depends on the number of observations, n:
For an odd number of observations: position (n + 1)/2
For an even number of observations: positions n/2 and (n/2) + 1
(the median is the sum of the values at these two positions divided by 2)
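A minimal Python sketch of the positional rule above. The positions are 1-based as in the notes, so the code subtracts 1 to index a Python list; the function name `median` is a hypothetical helper.

```python
def median(values):
    """Median by position: (n + 1)/2 if n is odd; mean of n/2 and n/2 + 1 if even."""
    ordered = sorted(values)   # the ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # position (n + 1)/2, as a 0-based index
    lower = ordered[n // 2 - 1]            # position n/2
    upper = ordered[n // 2]                # position n/2 + 1
    return (lower + upper) / 2

print(median([24, 25, 29, 29, 30, 31]))   # even n = 6: (29 + 29) / 2 = 29.0
print(median([5, 1, 3]))                  # odd n = 3: position 2 -> 3
```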

Measures of Central Tendency; The Mode:

• It is the value that occurs most frequently.
• A data distribution with one mode is called unimodal.
• If all values are different, there is no mode.
• Sometimes there is more than one mode.
• A distribution with two modes is called bimodal; one with more than two modes is called multimodal.
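A minimal Python sketch following the conventions in this list: it returns every most-frequent value, and an empty list when no value repeats. The function name `modes` is a hypothetical helper.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; [] if all values are different."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([29, 31, 24, 29, 30, 25]))   # [29]    -> unimodal
print(modes([1, 1, 2, 2, 3]))            # [1, 2]  -> bimodal
print(modes([4, 5, 6]))                  # []      -> no mode
```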


Examples
Although the mean is often an excellent summary measure of a set of data, it works best when the data are approximately normally distributed, because the mean is quite sensitive to extreme values that skew a distribution.
Example
In an outbreak of hepatitis A, 6 persons became ill with clinical symptoms. The incubation periods for the affected persons (xi) were 29, 31, 24, 29, 30, and 25 days.
$$\bar{X} = \frac{\sum X}{n} = \frac{168}{6} = 28 \text{ days}$$

If the largest of the six incubation periods were 131 instead of 31, the mean would change from 28.0 days to:
$$\bar{X} = \frac{24 + 25 + 29 + 29 + 30 + 131}{6} = \frac{268}{6} \approx 44.7 \text{ days}$$


What about the median and the mode?
Finding the median
• Arrange the data in order: 24, 25, 29, 29, 30, 31
• Find the position of the median: n = 6 is even, so the positions are n/2 = 3 and (n/2) + 1 = 4 (the 3rd and 4th observations).
• The median is the average of these two values: (29 + 29)/2 = 29 days.
Finding the mode
The most frequent observation:
Mode = 29 days


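If the largest of the six incubation periods were 131 instead of 31, what would happen to the median and the mode?
The median and the mode would remain the same (both 29 days): the two middle values of the ordered data do not change, and 29 is still the most frequent value.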
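The example above can be reproduced with Python's standard `statistics` module (`fmean` and `multimode` need Python 3.8 or later). The helper `summarize` is hypothetical. Note that `multimode` returns every value when all values are different, which differs from the "no mode" convention used in these notes; here the data do repeat, so the two agree.

```python
from statistics import fmean, median, multimode

def summarize(values):
    """Return (mean, median, list of modes) for a set of observations."""
    return fmean(values), median(values), multimode(values)

original = [29, 31, 24, 29, 30, 25]
with_outlier = [29, 131, 24, 29, 30, 25]   # 31 replaced by 131

for name, data in (("original", original), ("with 131", with_outlier)):
    m, md, mo = summarize(data)
    print(f"{name:>9}: mean = {m:.1f}, median = {md}, mode = {mo}")
```

Only the mean moves (28.0 to 44.7), which shows numerically why the median and mode are preferred for skewed data.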

Working Groups

Exercises

Group (1) - Exercise 1

Listed below are data on parity collected from 19 women who participated in a study on reproductive health.
Organize these data into a frequency distribution.
0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2

Group (1) - Exercise 2

Consider the data taken from a study that examines the response to ozone and sulfur dioxide among adolescents suffering from asthma. The following are the measurements of forced expiratory volume (liters) for 10 subjects:
3.5, 2.6, 2.8, 4.0, 2.3, 2.7, 3.0, 4.0, 2.9, 3.0
Calculate the measures of central tendency.

Group (2) - Exercise 1

Sixteen primary school children were examined to count the number of decayed teeth in each. The numbers of decayed teeth were as follows:
3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1
Construct a frequency table for those data.


Group (2) - Exercise 2
For the following haemoglobin values (g/dL), find the mean, mode, and median.
12, 14, 16, 15, 8, 10, 10, 13, 11, 14, 15, 10, 10, 17, 14

Group (3) - Exercise 1

The weights (in g) of malignant tumors removed from the abdomens of 57 subjects are:
68, 63, 42, 27, 30, 36, 28, 32, 79, 27, 22, 23, 24, 25, 44, 65, 43, 25, 74, 51, 36, 42, 28, 31, 28, 25, 45, 12, 57, 51, 12, 32, 49, 38, 42, 27, 31, 50, 38, 21, 16, 24, 69, 47, 23, 22, 43, 27, 49, 28, 23, 19, 46, 30, 43, 49, 12.
Construct a frequency table.

Group (3) - Exercise 2

A sample of 15 patients visiting a health center traveled the following distances (in miles). Calculate the measures of central tendency:
5, 9, 11, 3, 12, 13, 12, 6, 13, 7, 3, 15, 12, 15, 5

The following table shows the frequency, cumulative frequency, relative frequency (R.f), and cumulative relative frequency of subjects by age.

| Class interval (age in years) | Frequency (f) | Cumulative frequency | Relative frequency (R.f, %) | Cumulative relative frequency (%) |
|---|---|---|---|---|
| 30-39 | 11 | 11 | 5.8 | 5.8 |
| 40-49 | 46 | 57 | 24.3 | 30.1 |
| 50-59 | 70 | 127 | 37.1 | 67.2 |
| 60-69 | 45 | 172 | 23.8 | 91 |
| 70-79 | 16 | 188 | 8.5 | 99.5 |
| 80-89 | 1 | 189 | 0.5 | 100 |
| Total | 189 | | 100 | |


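$$\text{R.f} = \frac{f}{n} \times 100$$
where f is the frequency of the class interval and n is the total number of observations.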
Group (4) - Exercise 1

From the frequency table above, complete the table, then answer the following questions:

1- The number of subjects aged less than 50 years?
2- The number of subjects aged 40-69 years?
3- The relative frequency of subjects aged 70-79 years?
4- The relative frequency of subjects aged more than 69 years?
5- The percentage of subjects aged 40-49 years?
6- The percentage of subjects aged less than 60 years?
7- The range (R)?
8- The number of intervals (K)?
9- The width of the interval (W)?

Group (4) - Exercise 2

Arterial blood gas analysis performed on a sample of 15 physically active adult males yielded the following resting PaO2 values:
75, 80, 80, 74, 84, 78, 89, 72, 83, 76, 75, 87, 78, 79, 88
Calculate the measures of central tendency.


Group (5) - Exercise 1
The mean ages (in months) of preschool children in five villages are presented in the table below. Calculate the weighted mean age of the preschool children in these villages.

| Village | No. of children | Mean age (months) |
|---|---|---|
| 1 | 44 | 58 |
| 2 | 78 | 45 |
| 3 | 48 | 62 |
| 4 | 45 | 60 |
| 5 | 47 | 59 |


Group (5) - Exercise 2
State clearly the main criteria of a class interval.



Lecture uploaded by: Abdalmalik Abdullateef







