Length: 4,300 Chinese characters; 2,400 English words (13,000 English characters)

Source: Ishibashi K, Iwasaki T, Otomasa S, et al. Model selection for financial statement analysis: Variable selection with data mining technique [J]. Procedia Computer Science, 2016, 96(C):1681-1690.

English text:

Model selection for financial statement analysis: Variable selection with data mining technique

Ken Ishibashi, Takuya Iwasaki, Shota Otomasa and Katsutoshi Yada

Abstract

The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In accounting, variable selection for constructing models that predict a firm's earnings from financial statement data has been addressed from the perspective of corporate valuation theory and related frameworks, but it has not been sufficiently verified with data mining techniques. This paper attempts to verify the applicability of variable selection to the construction of an earnings prediction model by using recent data mining techniques. The analysis results suggest that a method that considers the interaction among variables and the redundancy of the model could be effective for financial statement data.

Keywords: Financial statement analysis; earnings prediction model; model selection; variable selection; data mining

1. Introduction

Recent advances in information and communication technology are dramatically improving computational speeds. Under these circumstances, researchers have undertaken studies focused on big data accumulated in various areas. Data mining techniques play an important role in data-driven analysis and modeling. Various data mining methods have been developed to date, and software such as SPSS and Weka has made them easy to use. However, in these applications, we generally need to select a method appropriate to the data.

The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In the area of accounting, Ou and Penman (1989) [1] addressed the construction of an earnings prediction model focused on financial statement data. They constructed a model that predicts the probability of a firm's earnings increase in the subsequent fiscal year by using stepwise logistic regression analysis. By introducing variable selection, their prediction model used interactions among variables that have not been proved theoretically. That is, it is possible that they constructed an earnings prediction model using unusual information that other people do not have.

The result of Ou and Penman (1989) [1] has various problems related to the practical use of their method.

In that research [1], they did not state why they applied logistic regression analysis to the model construction. Furthermore, follow-up studies [2, 3] pointed out various problems through additional verification of the model of Ou and Penman (1989) [1]. For example, Holthausen and Larcker (1992) [2] applied the strategy of Ou and Penman (1989) [1] to another fiscal period, but could not obtain anomalies of the probability of

[...]

Relief is an instance-based attribute ranking scheme proposed by Kira and Rendell (1992) [6], and later improved by Kononenko (1994) [10]. This method is applied to estimate a variable's importance for classification.

In classifying a given class, Relief decides a variable's importance by focusing on instances located around the border of the class. From these instances, two are selected as the near-miss and the near-hit. The near-miss is the instance that is closest to a randomly selected sample but does not belong to the same class. The near-hit, on the other hand, is the closest instance that belongs to the same class. In Relief, the importance of a variable is decided by how effectively it distinguishes the near-miss. Existing research [5] showed that this method had large tolerance to noise but low redundancy.

In the application of Relief to variable selection, the variables to adopt are generally decided by setting a threshold on their estimated ranks. In this study, the importance of variables is decided by 10-fold cross-validation, and variables for which the "Merit" criterion for the classification is greater than 0 are adopted.

2.3. Correlation-based feature selection

CFS is a method that evaluates subsets of variables, not individual variables [7]. This method searches for subsets containing variables that are highly correlated with the class and have low inter-correlation with each other. CFS tends to be computationally cheap and to choose small variable subsets, but it is difficult to find solutions if there are strong variable interactions [5].

In this study, we use a greedy algorithm to search for the subset with the best CFS evaluation.

2.4. Consistency-based subset evaluation

CNS evaluates subsets of variables by using class consistency [8]. This method searches for combinations of variables that divide the data into subsets containing a strong single-class majority. Thus, this search tends to be biased in favor of small variable subsets with high class consistency. Compared with CFS, CNS is useful if there are strong variable interactions, but the size of the subset tends to be large [5].

In this study, CNS searches for subsets by using a greedy algorithm, as in CFS.

2.5. C4.5 decision tree learner

C4.5 is a learning algorithm that constructs a decision tree by selecting variables that maximize the mutual information for classification [9]. This method can avoid overfitting the data through a function called "branch pruning", which removes branches that have little mutual information or classify few instances. In variable selection, the variables contained in the decision tree are adopted as a subset of variables.

In this study, a decision tree is constructed by using all training data for modeling, and then branches that classify fewer than 50 instances are removed by pruning. In this way, we obtain a subset with a size equivalent to that of the CFS subsets.

2.6. Stepwise method

In existing research, Ou and Penman (1989) [1] constructed an earnings prediction model by using stepwise logistic regression. The stepwise method is a conventional method that sequentially chooses variables to improve an evaluation criterion. In this method, the process of variable selection is very clear. However, because the
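
To make the Relief ranking described earlier more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-class problem, variables already scaled to [0, 1], Manhattan distance, and an absolute-difference weight update; the function name `relief_weights`, its parameters, and the toy data are hypothetical. The final step mimics adopting variables whose importance is greater than 0, without the 10-fold cross-validation used in the study.

```python
import random


def relief_weights(X, y, n_samples=50, seed=0):
    """Estimate one importance weight per variable with a basic two-class Relief.

    Assumes each class has at least two instances and values lie in [0, 1].
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance between two instances (an assumption of this sketch).
        return sum(abs(u - v) for u, v in zip(a, b))

    for _ in range(n_samples):
        i = rng.randrange(len(X))  # randomly selected sample
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        near_hit = min(hits, key=lambda j: distance(X[i], X[j]))
        near_miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(n_features):
            # Reward variables that separate the near-miss,
            # penalize variables that separate the near-hit.
            diff_miss = abs(X[i][f] - X[near_miss][f])
            diff_hit = abs(X[i][f] - X[near_hit][f])
            weights[f] += (diff_miss - diff_hit) / n_samples
    return weights


# Toy data: variable 0 separates the classes, variable 1 is unrelated to them.
X = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.5],
     [0.8, 0.0], [0.9, 1.0], [1.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y, n_samples=20, seed=0)
# Adopt only variables whose estimated importance is greater than 0.
selected = [f for f, w in enumerate(weights) if w > 0]
print(weights, selected)
```

On this toy data the informative variable receives a positive weight and the unrelated one a negative weight, so only the first variable is adopted. A production implementation would more likely rely on an existing tool such as Weka, which the paper mentions.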
