Foreign Literature Translation: Introduction to Data Mining Technology

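<p><b>Graduation Project (Thesis)</b></p>
<p><b>Foreign Literature Translation</b></p>
<p><b>Boya College</b></p>
<p><b>Chinese Translation</b></p>
<p><b>Introduction to Data Mining Technology</b></p>
<p>Abstract: Microsoft® SQL Server™ 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios (targeted mailing, forecasting, market basket, and sequence clustering) to demonstrate how to use the mining model algorithms, the mining model viewers, and the data mining tools.</p>
<p><b>Introduction</b></p>
<p>The data mining tutorial walks you through the process of creating data mining models in Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build comprehensive solutions for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in more detail later in the tutorial.</p>
<p>The most visible parts of SQL Server 2005 are the workspaces used to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project while disconnected from the server. When the project is ready, you can deploy it to the server. You can also work directly against the server. The main function of SQL Server Management Studio is to manage the server. Each environment is described in more detail later. For more information about choosing between the two environments, see "Choosing Between SQL Server Management Studio and Business Intelligence Development Studio" in SQL Server Books Online.</p>
<p>All of the data mining tools reside in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.</p>
<p>After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.</p>
<p>Your project will often contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool, the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best one.</p>
<p>To create predictions, you use the Data Mining Extensions (DMX) language. DMX extends SQL syntax with commands to create, modify, and predict against mining models. For more information about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor includes a tool called Prediction Query Builder, which lets you build DMX queries in a graphical interface. You can also view the automatically generated DMX statements in the tool.</p>
<p>Just as important as knowing the tools described above is understanding how the mining models themselves are built. The key to creating a mining model is the data mining algorithm, which finds patterns in the data you give it and translates them into a usable mining model.</p>
<p>Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data used to build the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, with tools for cleaning, validating, and preparing data. For more information about DTS, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.</p>
<p>Adventure Works Database</p>
<p>The AdventureWorksDW database is based on a fictional bicycle manufacturing company named Adventure Works Cycles (AW for short). AW produces and distributes metal and composite bicycles to commercial markets in North America, Europe, and Asia. Its base of operations is in Bothell, Washington, with 500 employees, and several regional sales teams are located throughout its market base.</p>
<p>AW sells its products wholesale to specialty shops and to individuals over the Internet. The data mining exercises in this tutorial use the Internet sales data.</p>
<p>For more information about the AW database, see "Sample Databases and Business Scenarios" in SQL Server Books Online.</p>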
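<p><b>Database Details</b></p>
<p>The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:</p>
<p><b>North America (83%)</b></p>
<p><b>Europe (12%)</b></p>
<p><b>Australia (7%)</b></p>
<p>The database contains data for three fiscal years: 2002, 2003, and 2004. Products in the database are broken down by subcategory, model, and product.</p>
<p><b>Business Intelligence Development Studio</b></p>
<p>Business Intelligence Development Studio is a set of tools for creating business intelligence projects. Because Business Intelligence Development Studio was created as an IDE in which you can build a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until you deploy the project.</p>
<p>An Analysis Services (SSAS) database integrates several technologies and serves as the foundation for data mining models and OLAP. From Business Intelligence Development Studio, you can create and edit an SSAS project and deploy it to one or more SSAS servers. If you are working on an SSAS project, you can also use Business Intelligence Development Studio to connect directly to the database, so that your changes take effect in the database immediately.</p>
<p>SQL Server Management Studio</p>
<p>SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you work in a connected environment, where actions are propagated to the server as soon as you save your work.</p>
<p>After the data has been cleaned and prepared for data mining, most of the tasks associated with creating a data mining solution are performed in Business Intelligence Development Studio. Using its tools, you develop and test the data mining solution, using an iterative process to determine which models work best for a given situation. When the developer is satisfied with the solution, it is deployed to an Analysis Services server.</p>
<p>From this point, the focus shifts from development to maintenance and use in SQL Server Management Studio. Using SQL Server Management Studio, you can administer your database and perform some of the same functions as in Business Intelligence Development Studio, such as viewing mining models and creating predictions from them.</p>
<p><b>Data Transformation Services</b></p>
<p>Data Transformation Services (DTS) comprises the Extract, Transform, and Load (ETL) tools in SQL Server 2005. These tools can be used to perform some of the most important tasks in data mining: cleaning and preparing the data for model creation. In data mining, you typically perform repetitive data transformations to clean the data before using it to train a mining model. Using the tasks and transformations in DTS, you can combine data preparation and model creation into a single DTS package.</p>
<p>DTS also provides DTS Designer to help you easily build and run packages containing all of the tasks and transformations. Using DTS Designer, you can deploy the packages to a server and run them on a regular schedule. This is useful if, for example, you collect data weekly and want to perform the same cleaning transformations automatically each time.</p>
<p>You can work with a Data Transformation project and an Analysis Services project together as part of a business intelligence solution by adding each project to a solution in Business Intelligence Development Studio.</p>
<p><b>Mining Model Algorithms</b></p>
<p>Data mining algorithms are the foundation from which mining models are created. The variety of algorithms in SQL Server 2005 lets you perform many types of analysis. For more information about the algorithms and how to adjust them with parameters, see "Data Mining Algorithms" in SQL Server Books Online.</p>
<p><b>Decision Trees</b></p>
<p>The Decision Trees algorithm supports both classification and regression and works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.</p>
<p>In building a model, the algorithm examines how each input attribute in the dataset affects the outcome of the predicted attribute, and then uses the input attributes with the strongest relationships to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predicted attribute as compared to the input attributes. If an input attribute is seen to cause the predicted attribute to favor one state over another, a new node is added to the model. The model continues to grow until none of the remaining attributes creates a split that provides an improved prediction over the existing node. The model seeks a combination of attributes and their states that creates a disproportionate distribution of states in the predicted attribute, which allows you to predict the outcome of the predicted attribute.</p>
<p>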
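The split selection described above can be illustrated with a small sketch. It is a minimal example, assuming that the "strongest relationship" between an input attribute and the predicted attribute is measured by information gain (entropy reduction); the scoring that the Microsoft Decision Trees algorithm actually uses may differ. The function names and the toy cases with their column names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction in `target` after splitting `rows` on `attribute`."""
    before = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def best_split(rows, inputs, target):
    """Return the input attribute with the highest information gain."""
    return max(inputs, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy cases; the column names are illustrative only.
cases = [
    {"commute": "short", "region": "Europe", "buyer": "yes"},
    {"commute": "short", "region": "Pacific", "buyer": "yes"},
    {"commute": "long", "region": "Europe", "buyer": "no"},
    {"commute": "long", "region": "Pacific", "buyer": "no"},
]
print(best_split(cases, ["commute", "region"], "buyer"))
```

In this toy data `commute` separates buyers from non-buyers completely, so it would be chosen for the first split, while `region` carries no information about the outcome.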

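<b>Clustering</b></p>
<p>The Clustering algorithm uses iterative techniques to group records from a dataset into clusters that share similar characteristics. Using these clusters, you can explore the data and learn more about the relationships that exist, which may not be easy to derive logically through casual observation. You can also create predictions from the clustering model that the algorithm builds. For example, consider a group of people who live in the same neighborhood, drive the same kind of car, eat the same kind of food, and buy a similar version of a product. This is a cluster of data. Another cluster may include people who go to the same restaurants, have similar salaries, and vacation twice a year outside the region. Observing how these clusters are distributed gives you a better understanding of how the outcome of a predictable attribute is affected.</p>
<p><b>Naïve Bayes</b></p>
<p>The Naïve Bayes algorithm quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of an input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The probabilities used to generate the model are calculated and stored during the processing of the cube. The algorithm supports only discrete or discretized attributes, and it treats all input attributes as independent. The Naïve Bayes algorithm produces a simple mining model that can be considered a starting point in the data mining process. Because most of the calculations used in creating the model are generated during cube processing, results are returned quickly. This makes the model a good option for exploring the data and discovering how the various input attributes are distributed across the different states of the predicted attribute.</p>
<p><b>Time Series</b></p>
<p>The Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources. For example, you can use the Microsoft Time Series algorithm to predict sales and profits based on the historical data in a cube. Using the algorithm, you can choose one or more variables to predict, but they must be continuous. You can have only one case series for each model. The case series identifies the position in a series, such as the date when looking at sales over several months or years.</p>
<p>A case may contain a set of variables (for example, sales at different stores). The Microsoft Time Series algorithm can use cross-variable correlations in its predictions. For example, prior sales at one store may be useful in predicting current sales at another store.</p>
<p><b>Neural Network</b></p>
<p>In Microsoft SQL Server 2005 Analysis Services, the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases, iteratively comparing the predicted classification of the cases with their known actual classification. The errors from the initial classification in the first iteration over the entire set of cases are fed back into the network and used to modify the network's performance for the next iteration, and so on. You can later use these probabilities to predict an outcome for the predicted attribute, based on the input attributes. One of the primary differences between this algorithm and the Microsoft Decision Trees algorithm, however, is that its learning process optimizes network parameters toward minimizing the error, while the Microsoft Decision Trees algorithm splits rules in order to maximize information gain. The algorithm supports the prediction of both discrete and continuous attributes.</p>
<p><b>Linear Regression</b></p>
<p>The Linear Regression algorithm is a particular configuration of the Decision Trees algorithm, obtained by disabling splits (the whole regression formula is built in a single root node). The algorithm supports the prediction of continuous attributes.</p>
<p><b>Logistic Regression</b></p>
<p>The Logistic Regression algorithm is a particular configuration of the Neural Network algorithm, obtained by eliminating the hidden layer. The algorithm supports the prediction of both discrete and continuous attributes.</p>
<p><b>English Original</b></p>
<p>Introduction to Data Mining</p>
<p>Abstract: Microsoft® SQL Server™ 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios, targeted mailing, forecasting, market basket, and sequence clustering, to demonstrate how to use the mining model algorithms, mining model viewers, and data mining tools that are included in this release of SQL Server.</p>
<p>Introduction</p>
<p>The data mining tutorial is designed to walk you through the process of creating data mining models in Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.</p>
<p>The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project disconnected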
from the server. When the project is ready, you can deploy it to the server. You can a</p>
<p>All of the data mining tools exist in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.</p>
<p>After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.</p>
<p>Often your project will contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool called the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best model.</p>
<p>To create predictions, you will use the Data Mining Extensions (DMX) language. DMX extends SQL, containing commands to create, modify, and predict against mining models. For more information about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor contains a tool called Prediction Query Builder, which allows you to build queries using a graphical interface. You can also view the DMX code that is g</p>
<p>Just as important as the tools that you use to work with and create data mining models are the mechanics by which they are created. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data that you pass it, and it translates them into a mining model; it is the engine behind the process.</p>
<p>Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data to be used to create the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, which contains tools that you can use to clean, validate, and prepare your data. For more information on using DTS in conjunction with a data mining solution, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.</p>
<p>In order to demonstrate the SQL Server data mining features, this tutorial uses a new sample database called AdventureWorksDW. The database is included with SQL Server 2005, and it supports OLAP and data mining functionality. To make the sample database available, you need to select it at installation time in the “Advanced” dialog for component selection.</p>
<p>Adventure Works</p>
<p>AdventureWorksDW is based on a fictional bicycle manufacturing company named Adventure Works Cycles. Adventure Works produces and distributes metal and composite bicycles to North American, European, and Asian commercial markets. The base of operations is located in Bothell, Washington, with 500 employees, and several regional sales teams are located throughout the company's market base.</p>
<p>Adventure Works sells products wholesale to specialty shops and to individuals through the Internet. For the data mining exercises, you will work with the AdventureWorksDW Internet sales tables, which contain realistic patterns that work well for data mining exercises.</p>
<p>For more information on Adventure Works Cycles, see "Sample Databases and Business Scenarios" in SQL Server Books Online.</p>
<p>Database Details</p>
<p>The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:</p>
<p>North America (83%)</p>
<p>Europe (12%)</p>
<p>Australia (7%)</p>
<p>The database contains data for three fiscal years: 2002, 2003, and 2004.
</p>
<p>The products in the database are broken down by subcategory, model, and product.</p>
<p>Business Intelligence Development Studio</p>
<p>Business Intelligence Development Studio is a set of tools designed for creating business intelligence projects. Because Business Intelligence Development Studio was created as an IDE in which you can create a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until after you deploy the project.</p>
<p>Working in an IDE is beneficial for the following reasons:</p>
<p>The Analysis Services project is the entry point for a business intelligence solution. An Analysis Services project encapsulates mining models and OLAP cubes, along with supplemental objects that make up the Analysis Services database. From Business Intelligence Development Studio, you can create and edit Analysis Services objects within a project and deploy the project to the appropriate Analysis Services server or servers.</p>
<p>If you are working with an existing Analysis Services project, you can also use Business Intelligence Development Studio to work connected to the server. In this way, changes are reflected directly on the server without having to deploy the solution.</p>
<p>SQL Server Management Studio</p>
<p>SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you are working in a connected environment where actions are propagated to the server as soon as you save your work.</p>
<p>After the data has been cleaned and prepared for data mining, most of the tasks associated with creating a data mining solution are performed within Business Intelligence Development Studio. Using the Business Intelligence Development Studio tools, you develop and test the data mining solution, using an iterative process to determine which models work best for a given situation. When the developer is satisfied with the solution, it is deployed to an Analysis Services server. From this point, the</p>
<p>Data Transformation Services</p>
<p>Data Transformation Services (DTS) comprises the Extract, Transform, and Load (ETL) tools in SQL Server 2005. These tools can be used to perform some of the most important tasks in data mining: cleaning and preparing the data for model creation. In data mining, you typically perform repetitive data transformations to clean the data before using the data to train a mining model. Using the tasks and transformations in DTS, you can combine data preparation and model creation into a single DTS packa</p>
<p>DTS also provides DTS Designer to help you easily build and run packages containing all of the tasks and transformations. Using DTS Designer, you can deploy the packages to a server and run them on a regularly scheduled basis. This is useful if, for example, you collect data weekly and want to perform the same cleaning transformations each time in an automated fashion.</p>
<p>You can work with a Data Transformation project and an Analysis Services project together as part of a business intelligence solution, by adding each project to a solution in Business
Intelligence Development Studio.</p>
<p>Mining Model Algorithms</p>
<p>Data mining algorithms are the foundation from which mining models are created. The variety of algorithms included in SQL Server 2005 allows you to perform many types of analysis. For more specific information about the algorithms and how they can be adjusted using parameters, see "Data Mining Algorithms" in SQL Server Books Online.</p>
<p>Microsoft Decision Trees</p>
<p>The Microsoft Decision Trees algorithm supports both classification and regression and it works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.</p>
<p>In building a model, the algorithm examines how each input attribute in the dataset affects the result of the predicted attribute, and then it uses the input attributes with the strongest relationship to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predi</p>
<p>Microsoft Clustering</p>
<p>The Microsoft Clustering algorithm uses iterative techniques to group records from a dataset into clusters containing similar characteristics. Using these clusters, you can explore the data, learning more about the relationships that exist, which may not be easy to derive logically through casual observation. Additionally, you can create predictions from the clustering model created by the algorithm. For example, consider a group of people who live in the same neighborhood, drive the same kind o</p>
<p>Microsoft Naïve Bayes</p>
<p>The Microsoft Naïve Bayes algorithm quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The probabilities used to generate the model are calculated and stored during the processing of the cube. The algorithm supports only discrete or disc</p>
<p>Microsoft Time Series</p>
<p>The Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources. For example, you can use the Microsoft Time Series algorithm to predict sales and profits based on the historical data in a cube.</p>
<p>Using the algorithm, you can choose one or more variables to predict, but they must be continuous. You can have only one case series for each model. The case series identifies the location in a series, such as the date when looking at sales over a length of several months or years. A case may contain a set of variables (for example, sales at different stores). The Microsoft Time Series algorithm can use cross-variable correlations in its predictions. For example, prior sales at one store may be </p>
<p>Microsoft Neural Network</p>
<p>In Microsoft SQL Server 2005 Analysis Services, the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases, iteratively comparing the predicted classificati</p>
<p>Microsoft Linear Regression</p>
<p>The Microsoft Linear Regression algorithm is a particular configuration of the Microsoft Decision Trees algorithm, obtained by disabling splits (the whole regression formula is built in a single root node). The algorithm supports the prediction of continuous attributes.</p>
<p>Microsoft Logistic Regression</p>
<p>The Microsoft Logistic Regression algorithm is a particular configuration of the Microsoft Neural Network algorithm, obtained by eliminating the hidden</p>
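As an illustration of logistic regression viewed as a neural network with no hidden layer, the following sketch trains a single sigmoid output unit that is connected directly to the inputs, adjusting its weights to reduce the prediction error on each case. It is a generic textbook formulation, not the Microsoft implementation; the function names, learning rate, epoch count, and toy data are all assumptions made for the example.

```python
from math import exp


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict(weights, bias, x):
    """Probability of the positive state for one case `x`."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(samples, labels, epochs=1000, rate=0.5):
    """Fit one sigmoid output unit (no hidden layer) by gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Error fed back to the unit: predicted probability minus label.
            error = predict(weights, bias, x) - y
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias


# Hypothetical data: a single input; the label is 1 for the larger values.
samples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic(samples, labels)
print(predict(weights, bias, [0.0]), predict(weights, bias, [3.0]))
```

Adding one or more hidden layers between the inputs and the output unit would turn the same structure into a multilayer perceptron of the kind the Neural Network section describes.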
