Graduation Project Foreign Literature Translation: SQL 2005

Graduation Project (Thesis)

Foreign Literature Translation

English Original

Introduction to Data Mining

Abstract: Microsoft® SQL Server™ 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios, targeted mailing, forecasting, market basket, and sequence clustering, to demonstrate how to use the mining model algorithms, mining model viewers, and data mining tools that are included in this release of SQL Server.

Introduction

The data mining tutorial is designed to walk you through the process of creating data mining models in Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.

The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project disconnected from the server. When the project is ready, you can deploy it to the server. You can a [...]

All of the data mining tools exist in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.

After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.

Often your project will contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool called the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best model.

To create predictions, you will use the Data Mining Extensions (DMX) language. DMX extends SQL, containing commands to create, modify, and predict against mining models. For more information about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor contains a tool called Prediction Query Builder, which allows you to build queries using a graphical interface. You can also view the DMX code that is g [...]

Just as important as the tools that you use to work with and create data mining models are the mechanics by which they are created. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data that you pass it, and it translates them into a mining model; it is the engine behind the process.
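
The model comparison step mentioned in the introduction (deciding which of several mining models is the most accurate before using one for predictions) can be illustrated outside SQL Server. The sketch below is not the Mining Accuracy Chart and uses no Analysis Services API; it is a plain Python illustration in which the model names, the toy cases, and the simple accuracy measure are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, float]
Model = Callable[[Case], str]


def accuracy(model: Model, cases: List[Tuple[Case, str]]) -> float:
    """Fraction of held-out cases whose known outcome the model predicts."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for inputs, actual in cases if model(inputs) == actual)
    return hits / len(cases)


def best_model(models: Dict[str, Model],
               cases: List[Tuple[Case, str]]) -> Tuple[str, float]:
    """Score every candidate model and return the most accurate one."""
    scores = {name: accuracy(m, cases) for name, m in models.items()}
    name = max(scores, key=lambda n: scores[n])
    return name, scores[name]


# Hypothetical held-out cases: input attributes plus the known outcome.
test_cases = [
    ({"age": 30, "commute_km": 2}, "buyer"),
    ({"age": 62, "commute_km": 15}, "non-buyer"),
    ({"age": 41, "commute_km": 3}, "buyer"),
    ({"age": 25, "commute_km": 20}, "non-buyer"),
]

# Two hypothetical candidate models, each a simple rule.
candidates: Dict[str, Model] = {
    "by_age": lambda c: "buyer" if c["age"] < 45 else "non-buyer",
    "by_commute": lambda c: "buyer" if c["commute_km"] < 5 else "non-buyer",
}

winner, score = best_model(candidates, test_cases)
print(winner, score)
```

The important design point is that the models are scored on cases whose outcome is already known and that were not used to build them; otherwise the comparison would favor whichever model memorized its training data best.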

Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data to be used to create the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, which contains tools that you can use to clean, validate, and prepare your data. For more information on using DTS in conjunction with a data mining solution, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.

In order to demonstrate the SQL Server data mining features, this tutorial uses a new sample database called AdventureWorksDW. The database is included with SQL Server 2005, and it supports OLAP and data mining functionality. To make the sample database available, you need to select it at installation time in the "Advanced" dialog for component selection.

Adventure Works

AdventureWorksDW is based on a fictional bicycle manufacturing company named Adventure Works Cycles. Adventure Works produces and distributes metal and composite bicycles to North American, European, and Asian commercial markets. The base of operations is located in Bothell, Washington, with 500 employees, and several regional sales teams are located throughout the company's market base.

Adventure Works sells products wholesale to specialty shops and to individuals through the Internet. For the data mining exercises, you will work with the AdventureWorksDW Internet sales tables, which contain realistic patterns that work well for data mining exercises.

For more information on Adventure Works Cycles, see "Sample Databases and Business Scenarios" in SQL Server Books Online.

Database Details

The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:

North America (83%)

Europe (12%)

Australia (7%)

The database contains data for three fiscal years: 2002, 2003, and 2004.

The products in the database are broken down by subcategory, model, and product.

Business Intelligence Development Studio

Business Intelligence Development Studio is a set of tools designed for creating business intelligence projects. Because Business Intelligence Development Studio was created as an IDE in which you can create a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until after you deploy the project.

Working in an IDE is beneficial for the following reasons:

The Analysis Services project is the entry point for a business intelligence solution. An Analysis Services project encapsulates mining models and OLAP cubes, along with supplemental objects that make up the Analysis Services database. From Business Intelligence Development Studio, you can create and edit Analysis Services objects within a project and deploy the project to the appropriate Analysis Services server or servers.

If you are working with an existing Analysis Services project, you can also use Business Intelligence Development Studio to work connected to the server. In this way, changes are reflected directly on the server without having to deploy the solution.

SQL Server Management Studio

SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you are working in a connected environment where actions are propagated to the server as soon as you save your work.

After the data has been cleaned and prepared for data mining, most of the tasks associated with creating a data mining solution are performed within Business Intelligence Development Studio. Using the Business Intelligence Development Studio tools, you develop and test the data mining solution, using an iterative process to determine which models work best for a given situation. When the developer is satisfied with the solution, it is deployed to an Analysis Services server. From this point, the [...]

Data Transformation Services

Data Transformation Services (DTS) comprises the Extract, Transform, and Load (ETL) tools in SQL Server 2005. These tools can be used to perform some of the most important tasks in data mining: cleaning and preparing the data for model creation. In data mining, you typically perform repetitive data transformations to clean the data before using the data to train a mining model. Using the tasks and transformations in DTS, you can combine data preparation and model creation into a single DTS packa [...]

DTS also provides DTS Designer to help you easily build and run packages containing all of the tasks and transformations. Using DTS Designer, you can deploy the packages to a server and run them on a regularly scheduled basis. This is useful if, for example, you collect data weekly and want to perform the same cleaning transformations each time in an automated fashion.

You can work with a Data Transformation project and an Analysis Services project together as part of a business intelligence solution by adding each project to a solution in Business Intelligence Development Studio.

Mining Model Algorithms

Data mining algorithms are the foundation from which mining models are created. The variety of algorithms included in SQL Server 2005 allows you to perform many types of analysis. For more specific information about the algorithms and how they can be adjusted using parameters, see "Data Mining Algorithms" in SQL Server Books Online.

Microsoft Decision Trees

The Microsoft Decision Trees algorithm supports both classification and regression, and it works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.

In building a model, the algorithm examines how each input attribute in the dataset affects the result of the predicted attribute, and then it uses the input attributes with the strongest relationship to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predi [...]

Microsoft Clustering

The Microsoft Clustering algorithm uses iterative techniques to group records from a dataset into clusters containing similar characteristics. Using these clusters, you can explore the data, learning more about the relationships that exist, which may not be easy to derive logically through casual observation. Additionally, you can create predictions from the clustering model created by the algorithm. For example, consider a group of people who live in the same neighborhood, drive the same kind o [...]

Microsoft Naïve Bayes

The Microsoft Naïve Bayes algorithm quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The probabilities used to generate the model are calculated and stored during the processing of the cube. The algorithm supports only discrete or disc [...]
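
The probability calculation described for Naïve Bayes can be sketched in a few lines of Python. This is a minimal illustration of the general naive Bayes technique on discrete attributes, not Microsoft's implementation; the attribute names, the toy cases, and the add-one smoothing are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Case = Dict[str, str]


def train(cases: List[Tuple[Case, str]]):
    """Count target states and input values per (state, attribute) pair."""
    target_counts: Counter = Counter()
    value_counts = defaultdict(Counter)  # (state, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values seen
    for inputs, state in cases:
        target_counts[state] += 1
        for attr, value in inputs.items():
            value_counts[(state, attr)][value] += 1
            values_seen[attr].add(value)
    return target_counts, value_counts, values_seen


def predict(model, inputs: Case) -> Dict[str, float]:
    """Return P(state | inputs) for every state of the predictable attribute."""
    target_counts, value_counts, values_seen = model
    total = sum(target_counts.values())
    scores = {}
    for state, n in target_counts.items():
        p = n / total  # prior probability of this state
        for attr, value in inputs.items():
            k = len(values_seen[attr])
            # P(value | state), with add-one smoothing (an assumption here)
            p *= (value_counts[(state, attr)][value] + 1) / (n + k)
        scores[state] = p
    norm = sum(scores.values())
    return {state: p / norm for state, p in scores.items()}


# Hypothetical training cases: discrete inputs and a known outcome.
training = [
    ({"commute": "short", "owns_car": "no"}, "buyer"),
    ({"commute": "short", "owns_car": "yes"}, "buyer"),
    ({"commute": "long", "owns_car": "yes"}, "non-buyer"),
    ({"commute": "long", "owns_car": "yes"}, "non-buyer"),
]

model = train(training)
probs = predict(model, {"commute": "short", "owns_car": "no"})
print(probs)
```

Multiplying the per-attribute probabilities is only valid if the input attributes are treated as independent given the outcome, which is the "naïve" assumption the technique is named for.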

Microsoft Time Series

The Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources. For example, you can use the Microsoft Time Series algorithm to predict sales and profits based on the historical data in a cube.

Using the algorithm, you can choose one or more variables to predict, but they must be continuous. You can have only one case series for each model. The case series identifies the location in a series, such as the date when looking at sales over a length of several months or years. A case may contain a set of variables (for example, sales at different stores). The Microsoft Time Series algorithm can use cross-variable correlations in its predictions. For example, prior sales at one store may be [...]

Microsoft Neural Network

In Microsoft SQL Server 2005 Analysis Services, the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases, iteratively comparing the predicted classificati [...]

Microsoft Linear Regression

The Microsoft Linear Regression algorithm is a particular configuration of the Microsoft Decision Trees algorithm, obtained by disabling splits (the whole regression formula is built in a single root node). The algorithm supports the prediction of continuous attributes.

Microsoft Logistic Regression

The Microsoft Logistic Regression algorithm is a particular configuration of the Microsoft Neural Network algorithm, obtained by eliminating the hidden layer. The algorithm supports the prediction of both discrete and continuous attributes.

Chinese Translation (3,795 characters)

Introduction to Data Mining Technology

Abstract: Microsoft® SQL Server™ 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios (targeted mailing, forecasting, market basket, and sequence clustering) to demonstrate how to use the mining model algorithms, mining model viewers, and data mining tools.

Introduction

The data mining tutorial is designed to walk you through the process of creating data mining models in Microsoft SQL Server 2005.

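The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.

The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project while disconnected from the server. When the project is ready, you can deploy it to the server. You can also work directly against the server. The main function of SQL Server Management Studio is to manage the server. Each environment is described in more detail later. For more information about choosing between the two environments, see "Choosing Between SQL Server Management Studio and Business Intelligence Development Studio" in SQL Server Books Online.

All of the data mining tools exist in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.

After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.

Your project will often contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool, the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best model.

To create predictions, you will use the DMX language. DMX extends traditional SQL syntax and contains commands to create, modify, and predict against mining models. For details about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor contains a tool called Prediction Query Builder, which lets you build DMX queries in a graphical interface. You can also view the automatically generated DMX statements in the tool.

Beyond the tools introduced above, it is equally important to understand how data mining models themselves are built. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data you give it and converts them into a usable mining model.

Some of the most important steps in building a data mining solution are consolidating and preparing the data used to create the models. SQL Server 2005 includes a DTS working environment and DTS tools for cleaning, validating, and preparing data. For more information about DTS, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.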

Adventure Works Database

The AdventureWorksDW database is based on a fictional bicycle manufacturing company named Adventure Works Cycles (AW for short). AW produces and sells metal and composite bicycles to commercial markets in North America, Europe, and Asia. Its main operations are based in Bothell, Washington, with 500 employees, and regional sales teams are located throughout its markets.

AW sells its products wholesale and at retail through the Internet. The data mining examples in this tutorial use this Internet sales data.

For more information about the AW database, see "Sample Databases and Business Scenarios" in SQL Server Books Online.

Database Details

The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:

North America (83%)

Europe (12%)

Australia (7%)

The database contains data for three fiscal years: 2002, 2003, and 2004. The products in the database are categorized by subcategory, model, and product.

Business Intelligence Development Studio

Business Intelligence Development Studio is a set of tools for creating business intelligence projects. Because Business Intelligence Development Studio was created as an IDE in which you can build a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until you deploy the project.

An Analysis Services (SSAS) database integrates several technologies and serves as the foundation for mining models, OLAP cubes, and related objects. You can use Business Intelligence Development Studio to create and modify an SSAS project and deploy it to one or more SSAS servers. If you are working on an existing SSAS project, you can also use Business Intelligence Development Studio to connect directly to the database, so that your changes take effect on it immediately.

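SQL Server Management Studio

SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you work in a connected environment, where actions are propagated to the server as soon as you save your work.

After the data has been cleaned and prepared for data mining, most of the tasks associated with creating a data mining solution are performed in Business Intelligence Development Studio. Using Business Intelligence Development Studio, you develop and test the data mining solution through an iterative process that determines which models work best for a given situation. Once the developer is satisfied with the solution, it can be deployed to an Analysis Services server.

From this point, the focus shifts from development to maintenance and use in SQL Server Management Studio. In SQL Server Management Studio, you can administer your database and perform some of the same functions as in Business Intelligence Development Studio, such as viewing mining models and creating predictions from them.

Data Transformation Services

In SQL Server 2005, Data Transformation Services (DTS) comprises the Extract, Transform, and Load (ETL) tools. These tools can be used to perform some of the most important tasks in data mining: cleaning and preparing the data for model creation. In data mining, you typically perform repetitive data transformations to clean the data and then use the data to train a mining model. Using the tasks and transformations in DTS, you can combine data preparation and model creation into a single DTS package.

DTS also provides DTS Designer to help you easily build and run packages containing all of the tasks and transformations. Using DTS Designer, you can deploy packages to a server and run them on a regular schedule. This is useful if, for example, you collect data weekly and want to perform the same cleaning transformations automatically each time.

You can work with a Data Transformation project and an Analysis Services project together as part of a business intelligence solution by adding each project to a solution in Business Intelligence Development Studio.

Mining Model Algorithms

Data mining algorithms are the foundation on which mining models are created. The variety of algorithms in SQL Server 2005 lets you perform many types of analysis. For more information about the algorithms and how to adjust their parameters, see "Data Mining Algorithms" in SQL Server Books Online.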

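Decision Trees

The Decision Trees algorithm supports both classification and regression, and it works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.

In building a model, the algorithm examines how each input attribute in the dataset affects the result of the predicted attribute, and then uses the input attributes with the strongest relationship to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predicted attribute as compared with the input attributes. If an input attribute is seen to cause the predicted attribute to favor one state over another, a new node is added to the model. The model continues to grow until none of the remaining attributes creates a split that provides a better prediction than the existing node. The model seeks a combination of attributes and states that produces a disproportionate distribution of states in the predicted attribute, which is what allows you to predict its outcome.

Clustering

The Clustering algorithm uses iterative techniques to group records from a dataset into clusters containing similar characteristics. Using these clusters, you can explore the data and learn more about the relationships that exist, which may not be easy to derive through casual observation. You can also create predictions from the clustering model created by the algorithm. For example, consider a group of people who live in the same neighborhood, drive the same kind of car, eat the same kind of food, and buy a similar version of a product. This is one cluster of data. Another cluster might include people who go to the same restaurants, have similar salaries, and vacation outside the region twice a year. Observing how these clusters are distributed gives a better understanding of how the outcome of a predicted attribute is affected.

Naïve Bayes

The Naïve Bayes algorithm quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The probabilities used to generate the model are calculated and stored during the processing of the cube. The algorithm supports only discrete or discretized attributes, and it considers all input attributes to be independent. The Naïve Bayes algorithm produces a simple mining model that can be considered a starting point in the data mining process. Because most of the calculations used in creating the model are generated during cube processing, results are returned quickly. This makes the model a good choice for exploring the data and for discovering how the various input attributes are distributed across the states of the predicted attribute.

Time Series

The Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources. For example, you can use the Microsoft Time Series algorithm to predict sales and profits based on the historical data in a cube. Using the algorithm, you can choose one or more variables to predict, but they must be continuous. You can have only one case series for each model. The case series identifies the location in a series, such as the date when looking at sales over a length of several months or years.

A case may contain a set of variables (for example, sales at different stores). The Microsoft Time Series algorithm can use cross-variable correlations in its predictions. For example, prior sales at one store may be useful in predicting current sales at another store.

Neural Network

In Microsoft SQL Server 2005 Analysis Services, the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases, iteratively comparing the predicted classification of the cases with their known actual classification. The errors from the initial classification in the first iteration over the entire set of cases are fed back into the network and used to modify the network's performance for the next iteration, and so on. You can later use these probabilities to predict an outcome of the predicted attribute, based on the input attributes. One of the primary differences between this algorithm and the Microsoft Decision Trees algorithm, however, is that its learning process optimizes the network parameters toward minimizing the error, while the Microsoft Decision Trees algorithm splits rules in order to maximize information gain. The algorithm supports the prediction of both discrete and continuous attributes.

Linear Regression

The Linear Regression algorithm is a particular configuration of the Decision Trees algorithm, obtained by disabling splits (the whole regression formula is built in a single root node). The algorithm supports the prediction of continuous attributes.

Logistic Regression

The Logistic Regression algorithm is a [...]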
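
The description of Logistic Regression as a Neural Network with the hidden layer eliminated can be made concrete: with no hidden layer, the inputs feed directly into a single sigmoid output unit. The Python sketch below illustrates that general technique only and is not Microsoft's implementation; the learning rate, the number of epochs, and the one-dimensional toy data are assumptions chosen for the example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic activation of the single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: List[float], bias: float, x: List[float]) -> float:
    """Inputs connect straight to the output: no hidden layer in between."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows: List[Tuple[List[float], int]],
                   learning_rate: float = 0.1,
                   epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in rows:
            # Error signal fed back to adjust the weights for the next pass.
            error = predict_proba(weights, bias, x) - y
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: one continuous input and a two-state outcome.
rows = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
weights, bias = train_logistic(rows)
print(predict_proba(weights, bias, [0.0]), predict_proba(weights, bias, [4.0]))
```

Adding a hidden layer between the inputs and the output unit would turn the same training loop into the multilayer perceptron described under Neural Network.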
