Statistical Description (1)
Xiaojin Yu

Introductory biostatistics: http://www.hstathome.com/tjziyuan/Introductory%20Biostatistics%20Le%20C.T.%20%20(Wiley,%202003)(T)(551s).pdf
Introductory biostatistics for the health science: http://faculty.ksu.edu.sa/hisham/Documents/eBooks/Introductory_Biostatistics_for_the_Health.pdf

Review
- What is medical statistics about?
- Key terms in statistics

1.2 Key words in statistics
- Population (individual) and sample
- Variation and random variable
- Random variable and data
- Statistic and parameter
- Sampling error
- Probability

Framework of statistical analysis
- Population: individuals, variation; the parameter is unknown.
- Sample: representative, subject to sampling error; the statistic is known.
- Random sampling leads from the population to the sample.
- Statistical inference, based on probability, leads from the sample back to the population.
- Statistical description summarizes the sample.

Statistical Description: Contents
- For quantitative (numerical) data: frequency distribution, measures of central tendency, measures of dispersion
- For qualitative (categorical) data

Raw Data (quantitative)
Example: 120 values of height (cm) for 12-year-old boys in 1997:
142.3 156.6 142.7 145.7 138.2 141.6 142.5 130.5 134.5 148.8
134.4 148.8 137.9 151.3 140.8 149.8 145.2 141.8 146.8 135.1
150.3 133.1 142.7 143.9 151.1 144.0 145.4 146.2 143.3 156.3
141.9 140.7 141.2 141.5 148.8 140.1 150.6 139.5 146.4 143.8
143.5 139.2 144.7 139.3 141.9 147.8 140.5 138.9 134.7 147.3
138.1 140.2 137.4 145.1 145.8 147.9 150.8 144.5 137.1 147.1
142.9 134.9 143.6 142.3 125.9 132.7 152.9 147.9 141.8 141.4
140.9 141.4 160.9 154.2 137.9 139.9 149.7 147.5 136.9 148.1
134.7 138.5 138.9 137.7 138.5 139.6 143.5 142.9 129.4 142.5
141.2 148.9 154.0 147.7 152.3 146.6 132.1 145.9 146.7 144.0
135.5 144.4 143.4 137.4 143.6 150.0 143.3 146.5 149.0 142.1
140.2 145.4 142.4 148.9 146.7 139.2 139.6 142.4 138.7 139.9

Data Summary for Continuous Variable Data
- Numerical methods: description of central tendency, description of dispersion
- Tabular and graphical methods

Tabular & Graphical Methods
- Frequency table
- Histogram

Frequency Table

Solution to Example
1. Number of intervals: k = 10
2. Calculate the width: R = Xmax - Xmin = 160.9 - 125.9 = 35, so w = R/k = 35/10 = 3.5
3. Form the intervals
4. Count the frequencies
A recommended step is to present the proportion or relative frequency.

Class intervals

Tally and Counting

Final Frequency Table
A recommended step is to present the proportion or relative frequency.

Basic Steps to Form a Frequency Table
Step 1: determine the number of intervals (5 to 15).
Step 2: calculate the width of the intervals.
Step 3: form the intervals, each covering a certain range of values.
Step 4: count the number of observations within each interval.
The final table consists of the intervals and the frequencies.

Figure 2.1 Distribution of heights of 120 boys from China, 1997 (vertical axis: frequency)

Present data graphically
- Presents data visually and intuitively
- Easy to read and understand
- Self-explanatory: stands alone from the text
Statistical tables and graphs are intended to communicate information, so they should be easy to read and understand.
The shape of the distribution is a characteristic of the variable.

Application
The shape of a distribution can lead to a research question; the features of concern are whether the distribution is unimodal and whether it is symmetric.

Shape of frequency distribution
- Unimodal / bimodal
- Symmetric / skew

Unimodal / bimodal
- Corresponds to a homogeneous / heterogeneous population.
- Indicates whether the definition of the population, or the classification, is appropriate.

Symmetry & Skewness
Symmetric means the distribution has the same shape on both sides of the peak location.
Skewness means the lack of symmetry in a probability distribution. (The Cambridge Dictionary of Statistics in the Medical Sciences.)
An asymmetric distribution is called skew. (Armitage: Statistical Methods in Medical Research.)

Figure 2.2 Symmetric and asymmetric distributions (panels: positive skewness, negative skewness)

Positive & Negative Skewness
A distribution is said to have positive skewness when it has a long thin tail to the right, and negative skewness when it has a long thin tail to the left.
A distribution whose upper tail is longer than its lower tail is called positively skew.

Fig. The distribution of QOL (quality of life) scores of 892 senior citizens (horizontal axis: QOL, 0 to 100; vertical axis: frequency, 0 to 400)

Fig. The distribution of ages at death of males in 1990-1992 (horizontal axis: age at death in years, 0 to 85; vertical axis: frequency, 0 to 2500)

Numerical methods
- Central tendency: arithmetic mean, median, geometric mean
- Dispersion: range, interquartile range, standard deviation, variance, coefficient of variation

Mean
- Concept and notation
- Calculation
- Application

Concept of Mean
- Arithmetic mean, or simply the mean
- The population mean is denoted by $\mu$.
- The sample mean is denoted by $\bar{x}$ ("x-bar").

Calculation of Mean
Given a data set of size n, $\{x_1, x_2, \ldots, x_n\}$, the mean is computed by summing all the x's and dividing the sum by n. Symbolically,
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$

Grouped Data
The mean can be approximated using the formula
$\bar{x} \approx \frac{\sum f m}{\sum f}$
where f denotes the frequency, m the interval midpoint, and the summation is across the intervals.

Midpoint
The midpoint of an interval is obtained by averaging the interval's lower true boundary and upper true boundary. It is computed in this way for the first interval, the second interval, and so on.

Example 1

Average: Limitation in describing data
It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable ON AVERAGE!

Geometric Mean: notation
The geometric mean of a set of n numbers is defined as the nth root of their product. It is denoted by G or GM.

Geometric Mean: calculation
Following the definition, the expression is
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$
For example, the G for 2, 4, 8 (n = 3) is
$G = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$

Geometric mean: equivalently, on the log scale,
$\log G = \frac{1}{n} \sum \log x_i$

Geometric Mean: calculation
Example1_geo: a data set consisting of survival times to relapse, in weeks, of 21 acute leukemia patients who received some drug:
1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23 (n = 21)
The mean is 8.67 weeks.

Ex. Serum hemagglutination inhibition (HI) antibody dilution from 107 subjects after measles vaccination

Application
- Positively skewed data, if a log transformation creates a symmetric, unimodal distribution
- Geometric series
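The definitions of the arithmetic, grouped-data, and geometric means can be sketched in Python. This is a minimal illustration rather than anything from the slides: the function names `arithmetic_mean`, `grouped_mean`, and `geometric_mean` are hypothetical, and the geometric mean is computed on the log scale (an assumed but equivalent route to the nth root of the product). The survival times are the 21 values quoted above.

```python
import math

def arithmetic_mean(xs):
    """Sum of the observations divided by their number n."""
    return sum(xs) / len(xs)

def grouped_mean(midpoints, freqs):
    """Approximate mean from a frequency table: sum(f * m) / sum(f)."""
    return sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

def geometric_mean(xs):
    """nth root of the product, computed via logs to avoid overflow.

    Every observation must be strictly positive.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Survival times (weeks) of the 21 leukemia patients quoted in the text.
survival = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
            11, 11, 12, 12, 15, 17, 22, 23]

print(round(arithmetic_mean(survival), 2))    # 8.67, as stated in the text
print(round(geometric_mean([2, 4, 8]), 6))    # 4.0
print(geometric_mean(survival) < arithmetic_mean(survival))  # True: GM never exceeds AM
```

For a frequency table, `grouped_mean` takes the interval midpoints and their frequencies in the same order; with made-up midpoints 10, 20, 30 and frequencies 1, 2, 1 it returns 20.0.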
Median
- Concept of median
- Calculation
- Application: disadvantages & advantages

Concept of Median
If the data are arranged in increasing or decreasing order, the median is the middle value, which divides the set into two equal halves.
M denotes the sample median.

Calculation: how do we get it?
a. When n is odd. Example 1: n = 11, M = 56.
b. When n is even. Example 2: n = 12, M = (56 + 58)/2 = 57.

Application: Advantage
It is robust to extreme values.
Mean = 58.42 versus mean = 149.9; median = 57.

Application: when is it used?
Fig. A skew distribution.

Data described by Median
- Skew data
- Normal distribution data
- Ordinal data!!

For a normal distribution
Figure 3. The average height of basketball players.

Disadvantage of median
- The precise magnitude of most of the observations is not taken into account.
- If two groups of observations are pooled, the median of the combined group cannot be expressed in terms of the medians of the two component groups.

Summary: Choosing the most appropriate measure
- Symmetric, unimodal: mean
- If a log transformation creates a symmetric, unimodal distribution: geometric mean
- Distribution-free, uncertain data: median
- Outliers or skewed data: median
- Ordinal data: median

Measure of Dispersion
- range
- interquartile range
- variance & standard deviation
- coefficient of variation

Percentile (quantile)
The percentile $P_X$ divides the ordered data so that X% of the observations lie below it and (100 - X)% lie above it.
Quartiles:
- Lower (first) quartile: 25%, $Q_L = P_{25}$
- Second quartile: the median
- Upper (third) quartile: 75%, $Q_U = P_{75}$

Range & Inter-quartile Range
$R = x_{max} - x_{min}$
$Q_U - Q_L = P_{75} - P_{25}$
The range and the inter-quartile range are simple and easy to explain. However, there are a few difficulties with the use of the range.
1. The value of the range is determined by only two of the original observations.
2. The interpretation of the range depends on the number of observations in a complicated way, which is an undesirable feature.

Variance $s^2$
An alternative approach is to make use of the deviations from the mean, $x - \bar{x}$; the greater the variation in the data set, the larger the magnitude of these deviations will tend to be.
From these deviations, the variance $s^2$ is computed by squaring each deviation, adding the squares, and dividing their sum by one less than n:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
n - 1: degrees of freedom, df

Variance
A population variance is denoted by $\sigma^2$.
A sample variance is denoted by $s^2$.

The following should be noted
It would be no use to take the mean of the deviations, because $\sum (x - \bar{x}) = 0$.
Taking the mean of the absolute values of the deviations, for example, is a possibility. However, this measure has the drawback of being difficult to handle mathematically.

Standard deviation, SD
The variance $s^2$ has units that are the square of the original units. For example, if x is a time in seconds, the variance is measured in seconds squared (sec^2). So it is convenient to have a measure of variation expressed in the same units as the original data, and this can be done by taking the square root of the variance. This quantity is the standard deviation:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Formula for Calculation
In general, the calculation using the mean is likely to cause some trouble. If the mean is not a round number, say the mean is 10/3, it will need to be rounded off, and errors arise in the subtraction of this figure from each x. This difficulty can be overcome by using the following shortcut formula for the variance or SD.
Solution to calculation of s:
$s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}, \quad s = \sqrt{s^2}$
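To check that a shortcut formula agrees with the definition, the two calculations can be compared directly. The sketch below is a minimal Python illustration, assuming the usual shortcut form s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1); the function names and the three-value data set (chosen so that the mean is 10/3, as in the remark above) are illustrative assumptions, not taken from the slides.

```python
import math
import statistics

def variance_definition(xs):
    """Sample variance from deviations: sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Sample variance without subtracting the mean from each x:
    (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def standard_deviation(xs):
    """Square root of the sample variance, in the original units."""
    return math.sqrt(variance_shortcut(xs))

data = [2, 3, 5]  # hypothetical values; the mean is 10/3, not a round number

print(variance_definition(data))
print(variance_shortcut(data))
print(statistics.variance(data))  # standard-library check, also divides by n - 1
print(standard_deviation(data))
```

The shortcut was designed for hand calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so a library routine such as `statistics.variance` is usually the safer choice in practice.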
Example:
Group    Range  Variance  SD    Mean
Group A  8      10.0      3.16  30
Group B  12     22.5      4.74  30
Group C  8      8.5       2.92  30

Coefficient of Variation
The coefficient of variation, CV, is the standard deviation expressed relative to the mean, $CV = \frac{s}{\bar{x}} \times 100\%$, and requires a nonzero mean.
It is used to compare dispersion between different distributions:
- for variables with different scales or units;
- for variables with very different means.

Example: Comparing the Dispersion of Two Variables
Variable  Mean         SD
Height    166.06 (cm)  4.95 (cm)
Weight    53.72 (kg)   4.96 (kg)
Here CV = 4.95/166.06, about 2.98%, for height and CV = 4.96/53.72, about 9.23%, for weight, so weight is relatively more variable even though the two SDs are nearly equal.

What do the variance and SD tell us?
- A large variance (or SD) means: more variable, wider range, lower degree of representativeness of the mean.
- A small variance (or SD) means: less variable, narrower range, higher degree of representativeness of the mean.

Which measure should be used?
- SD, variance: for unimodal, symmetric distributions.
- CV: for different units; for very different means.
- Range: for any distribution; wasteful of information.
- Interquartile range: for any distribution; robust; wasteful of information.
The subjects should be homogeneous!

Summary of Average and Dispersion
- Mean ± SD (min, max)
- Median ± interquartile range (min, max)
Use both an average and a measure of dispersion.

Summary
- Each variable has its own distribution.
- Describe it using graphs and using statistics:
  average: mean, G, M
  dispersion: SD, variance, Q, CV, R
- Choose the appropriate measure.
- Use an average together with a measure of dispersion.

Data Summarization
- Tabular and graphical methods: frequency table, histogram
- Numerical methods, using statistics:
  measures of location: arithmetic mean, median, geometric mean
  measures of dispersion: range, inter-quartile range (IQR), standard deviation
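The summary above can be sketched end to end in Python. This is an illustration under stated assumptions, not the lecture's own procedure: it uses only the first ten height values from the raw data, the names `frequency_table` and `summarize` are hypothetical, k = 5 intervals is an arbitrary choice for such a small sample, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from a textbook percentile rule.

```python
import statistics

def frequency_table(xs, k):
    """Split [min, max] into k equal-width intervals and count observations.

    Returns a list of (lower, upper, count). Assumes the values are not all
    equal; the maximum is counted in the last interval.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k                      # w = R / k
    counts = [0] * k
    for x in xs:
        i = min(int((x - lo) / width), k - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

def summarize(xs):
    """Measures of location and dispersion named in the summary."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # quartile cut points
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)                  # divides by n - 1
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "sd": sd,
        "cv_percent": 100 * sd / mean,         # requires a nonzero mean
    }

# First ten heights (cm) from the raw data; a small subset for illustration.
heights = [142.3, 156.6, 142.7, 145.7, 138.2, 141.6, 142.5, 130.5, 134.5, 148.8]

for lower, upper, count in frequency_table(heights, k=5):
    print(f"{lower:6.2f} - {upper:6.2f}: {count}")

for name, value in summarize(heights).items():
    print(f"{name}: {value:.2f}")
```

Reporting the dictionary from `summarize` alongside the table follows the advice to give an average together with a measure of dispersion; which pair to report (mean with SD, or median with interquartile range) still depends on the shape seen in the frequency table.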