模式識(shí)別與人工智能之十_第1頁
已閱讀1頁,還剩64頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

1、Pattern Recognition & artificial IntelligenceLecture 10: 聚類算法(六),1,EXPECTATION MAXIMIZATION (EM) Jensen’s inequality Single Gaussian Model (SGM) Maximum Likelihood (ML) estimation EM estimatio

2、n Gaussian Mixture Model (GMM) Difference between K-means and EM Applications,Model-based clustering (1),2,3,Jensen’s inequality,Mathematical Foundation (1),Convex Functions,Definition: Let f be a real val

3、ued function defined on an interval I =[a, b], f is said to be convex on I, if,4,Jensen’s inequality,Mathematical Foundation (1),Convex Functions,Theorem: If f(x) is twice differentiable on [a, b] and f’’(x)≥0 on [a, b]

4、, then f(x) is convex on [a, b].,f’(x) increases gradually, which means f’’(x) ≥0,5,Jensen’s inequality,Mathematical Foundation (1),Concave Functions,Definition: Let f be a real valued function defined on an interval I

5、=[a, b], f is said to be concave on I, if,6,Jensen’s inequality,Mathematical Foundation (1),Concave Functions,6,Theorem: If f(x) is twice differentiable on [a, b] and f’’(x)<0 on [a, b], then f(x) is concave on [a, b]

6、.,f’(x) increases gradually, which means f’’(x)≤0,7,Jensen’s inequality,Mathematical Foundation (2),Expectation of a function,7,Theorem: If X is a random variable, and Y=g(X), then:,Where:,is the probability density of X

7、,discrete,Continuous,8,Jensen’s inequality,8,,Jensen’s inequality: For convex functionFor concave function,Generalized convex function,,Generalized concave function,,Equality holds if and only if

8、 or f is linear.,9,Single Gaussian Model (SGM),Sampling,10,Single Gaussian Model (SGM),Sampling,,Given x, it is a function of ? and ?2,We want to maximize it.(LIKELIHOOD FUNCTION),Independent,Likelihood Func

9、tion,Parameter space Θ X1 ,…, Xn : i.i.d. observations θ: an unknown parameter Θ: The set of all possible values of θ Joint p.d.f. or p.m.f. of X1 ,…, Xn,Maximum Likelihood Estimation,11,Likelihood Function

10、 of θ For observed x1 ,…, xn :,,The joint p.d.f. or p.m.f. is a function of χ1,…,χn for given θ .The likelihood function is a function of θ for given χ1,…,χn .,Likelihood Function,Maximum Likelihood Estimation,12

11、,Calculation of MLE,Maximum Likelihood Estimation: Need to find which maximizes the likelihood function .Simple Example:Two independent Bernoulli trials with suc

12、cess probability θ. θ is known : 1/4 or 1/3 (Θ).The probabilities of observing x = 0, 1, 2 successes can be calculated. Let’s look at the following table.,Maximum Likelihood Estimation,13,Calculation of MLE,Maximum Li

13、kelihood Estimation,Probability of Observing x Successes The MLE is chosen to maximize for observed χ. χ=0: χ =1 or 2:,14,Calculation of MLE,Maximum Likelihood Est

14、imation,Log-likelihood function: Setting the derivative of L(θ) equal to zero and solving for θ.Note: The likelihood function must be differentiable and then this method can be used.,15,Example : Nor

15、mal Distribution,Suppose x1,…,xn is a random sample from a normal distribution with p.d.f.: a vector parameter θ: Likelihood Function:,Maximum Likelihood Estimation,16,Maximum Likelihood Estimation,17,Maximizethis

16、 instead,By setting,and,Example : Normal Distribution,Maximum Likelihood Estimation,18,Example : Normal Distribution,19,IF THERE IS MISSING DATA, HOW TO ESTIMATE THE MODEL PARAMETERS?IF IT IS NOT SGM BUT THE GAUSSIAN MI

17、XTURE MODEL, HOW TO ESTIMATE THE PARAMETERS FOR EACH MODEL?,Problems,What is EM?,In statistics, an expectation–maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MA

18、P) estimates of parameters in statistical models, where the model depends on unobserved latent variables Motivation: to deal with Incomplete dataData with latent variable (hidden). For example, clustering the people i

19、n our lab, a latent variable exists: if it is the height, then Gaussian distribution; if it is gender, then Bernoulli distribution,20,EM estimation,Basic setting in EM,21,EM estimation,X is a set of data points: observed

20、 dataΘ is a parameter vector.EM is a method to find θML whereCalculating P(X | θ) directly is hard.Calculating P(X,Y|θ) is much simpler, where Y is “hidden” data (or “missing” data).,Recall: maximum likelihood,22,

21、EM estimation,,Latent variables or missing data,23,EM estimation,,Incomplete Data,Complete Data,Missing data,Complete Data Likelihood,EM estimation,,Complete Data Likelihood,EM estimation,,Log likelihood,In many cases, w

22、e cannot find the solution directly. An alternative is to find a sequence:,s.t.,Complete Data Likelihood,EM estimation,Define: : the probability of y : function of y, namely g(y),0,Ln(x) is str

23、ictly concave on (0, ),,Complete Data Likelihood,EM estimation,We need to choose the maximum lower bound,How to choose Q(y)?,If is given, Q(y) and P(xi, y) will determine the value of l(𝛩),Θ,Step1: find Q(

24、y) (Expectation)Step2: according to Q(y), maximize l(𝛩) by adjusting 𝛩 (Maximization),EM estimation,Expectation Step: how to choose Q(y),According to the Jensen inequality, when g(y) is a constant, the l

25、ower bound is maximum.,We know that 𝑄(𝑦 is the probability of y, so,It is easily obtained:,EM estimation,Expectation Step: how to choose Q(y),EM estimation,Maximization Step: maximize l(𝛩) by ad

26、justing 𝛩,EM algorithm is an iterative procedure for maximizing . Assume that after the Iteration the current estimation for is given by . Since the objective is to maximize , we wish t

27、o compute an update estimate such that: Equivalently, we want to maximize the difference,EM estimation,How to guarantee the likelihood function L(θ) is increased at each step?,Denote the hidden random vector by ,

28、 the total probability is,EM estimation,How to guarantee the likelihood function L(θ) is increased at each step?,EM estimation,How to guarantee the likelihood function L(θ) is increased at each step?,EM est

29、imation,Our objective is to choose a value of so that is maximized.,Gaussian Mixture Model,35,Single Gaussian,Solution: MLE by maximizing,,Gaussian Mixture Model,36,Multiple Gaussians,Which component do

30、es each point belong to?Solution: ???,EM for GMM,37,The aim is to estimate the unknown parameters representing between the Gaussians and the means and covariances of each:,let x=(x1,x2,…,xn) be a sample of n independent

31、 observations from a mixture of multiple normal distributions of dimension d, and let z=(z1,z2,…,zn) be the latent variables that determine the component from which the observation originates.,where,with,EM for GMM,38,Th

32、e incomplete-data likelihood function is:,The complete-data likelihood function is:,where𝕀 is an indicator function and f is the probability density function of a multivariate normal.,EM for GMM,39,This may be re

33、written in log likelihood function form:,E-step:,,,EM for GMM,40,M-step:,,,Using MLE to find solution of,,,EM for GMM,41,M-step:,,,Eg:,SMM:,GMM:,EM for GMM,42,M-step:,,,Eg:,SMM,GMM,EM for GMM,43,M-step:,,,Eg:,,S.t.,,Diff

34、erence between K-means and EM,44,K-means,Classifier(K-Means),Classification Resultsx1?C(x1)x2?C(x2)…xi?C(xi)…,Cluster Parametersm1 for C1m2 for C2…mk for Ck,,Difference between K-means and EM,45,K-means,x1={r1,

35、 g1, b1}x2={r2, g2, b2}…xi={ri, gi, bi}…,Classification Results (1)C(x1), C(x2), …, C(xi),Initial Guess of Cluster Parametersm1 , m2 , …, mk,Input (Known),Output (Unknown),Cluster Parameters(1)m1 , m2 , …, mk,Cla

36、ssification Results (2)C(x1), C(x2), …, C(xi),Cluster Parameters(2)m1 , m2 , …, mk,Classification Results (ic)C(x1), C(x2), …, C(xi),Cluster Parameters(ic)m1 , m2 , …, mk,,,???,???,???,???,Difference between K-means

37、and EM,46,K-means,Boot Step:Initialize K clusters: C1, …, CKEach Cluster is represented by its mean mjIteration Step:Estimate the cluster of each dataRe-estimate the cluster parameters,Difference between K-means a

38、nd EM,47,Kmeans ->EM,Boot Step:Initialize K clusters: C1, …, CKIteration Step:Estimate the cluster of each dataRe-estimate the cluster parameters,(?j, ?j) and P(Cj) for each cluster j.,For each cluster j,Expect

39、ation,Maximization,,,Difference between K-means and EM,48,Classifier(EM),Classification Resultsp(C1|x1)p(Cj|x2)…p(Cj|xi)…,Cluster Parameters(?1,?1),p(C1) for C1(?2,?2),p(C2) for C2…(?k,?k),p(Ck) for Ck,,EM,Dif

40、ference between K-means and EM,49,EM,x1={r1, g1, b1}x2={r2, g2, b2}…xi={ri, gi, bi}…,Cluster Parameters(?1,?1), p(C1) for C1(?2,?2), p(C2) for C2…(?k,?k), p(Ck) for Ck,Input (Known),Output (Unknown),Classificati

41、on Resultsp(C1|x1)p(Cj|x2)…p(Cj|xi)…,,Difference between K-means and EM,50,EM(E-step),x1={r1, g1, b1}x2={r2, g2, b2}…xi={ri, gi, bi}…,Cluster Parameters(?1,?1), p(C1) for C1(?2,?2), p(C2) for C2…(?k,?k), p(

42、Ck) for Ck,Input (Known),Output,Classification Resultsp(C1|x1)p(Cj|x2)…p(Cj|xi)…,Estimation,+,,Difference between K-means and EM,51,EM (M-step),x1={r1, g1, b1}x2={r2, g2, b2}…xi={ri, gi, bi}…,Cluster Parameters

43、(?1,?1), p(C1) for C1(?2,?2), p(C2) for C2…(?k,?k), p(Ck) for Ck,Input (Known),Input (Estimation),Output,+,,Classification Resultsp(C1|x1)p(Cj|x2)…p(Cj|xi)…,Difference between K-means and EM,52,EM,Boot Step:Ini

44、tialize K clusters: C1, …, CKIteration Step:Expectation StepMaximization Step,(?j, ?j) and P(Cj) for each cluster j.,53,Case Study (1),We have a data set with 1000 2-D data points (three clusters, k=3), how can we

45、 cluster it?,54,Case Study (2),Step 1: Initialize the partitionRandomly select 3 points as initial class centers;Partition the data with the 3 mean pointsStep 2: E stepStep 3: EM algorithmdo M StepE Stepuntil con

46、vergence,55,Step 1: Initialize the partition,56,57,58,Image Segmentation using EM,Step 1: Feature ExtractionStep 2: Image Segmentation using EM,Symbols,The feature vector for pixel i is called xi.There are going to be

47、K segments; K is given.The j-th segment has a Gaussian distribution with parameters ?j=(?j,?j).?j's are the weights (which sum to 1) of Gaussians. ? is the collection of parameters:? =(?1, …, ?k, ?1, …, ?k),Init

48、ialization,Each of the K Gaussians will have parameters ?j=(?j,?j), where?j is the mean of the j-th Gaussian.?j is the covariance matrix of the j-th Gaussian.The covariance matrices are initialed to be the identity ma

49、trix.The means can be initialized by finding the average feature vectors in each of K windows in the image; this is data-driven initialization.,E-Step,M-Step,Sample Results,KeyPoints:,Maximum likelihood estimationGMME

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 眾賞文庫僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

評(píng)論

0/150

提交評(píng)論