Foreign-Language Translation: Predicting Customer Churn in the Telecommunications Industry –– An Application of Survival Analysis Modeling Using SAS


Chinese text: 3,770 characters

Title: Predicting Customer Churn in the Telecommunications Industry –– An Application of Survival Analysis Modeling Using SAS

Original text:

ABSTRACT

Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn or how long they will stay. The goal of this study is to apply survival analysis techniques to predict customer churn using data from a telecommunications company. This study will help telecommunications companies understand customer churn risk and customer churn hazard in a [...]

INTRODUCTION

In the telecommunications industry, customers are able to choose among multiple service providers and actively exercise their right to switch from one service provider to another. In this fiercely competitive market, customers demand tailored products and better services at lower prices, while service providers constantly focus on acquisitions as their business goals. Given the fact that the telecommunications industry experiences an average annual churn rate of 30-35 percent and it costs 5-10 [...]

In order to help telecommunications companies manage churn reduction, we need to predict not only which customers are at high risk of churn but also how soon these high-risk customers will churn. Telecommunications companies can then optimize their marketing intervention resources to prevent as many customers as possible from churning. In other words, if telecommunications companies know which customers are at high risk of churn and when they will churn, they [...]

Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. These methods can hardly predict when customers will churn or how long they will stay. Survival analysis, however, was designed from the very beginning to handle survival data, and is therefore an efficient and powerful tool for predicting customer churn.

OBJECTIVES

The objectives of this study are twofold. The first objective is to estimate the customer survival function and the customer hazard function to gain knowledge of customer churn over the time of customer tenure. The second objective is to demonstrate how survival analysis techniques are used to identify the customers who are at high risk of churn and when they will churn.

DEFINITIONS AND EXCLUSIONS

This section clarifies some of the important concepts and exclusions used in this study.

Churn – In the telecommunications industry, the broad definition of churn is the cancellation of a customer's telecommunications service. This includes both service-provider-initiated churn and customer-initiated churn. An example of service-provider-initiated churn is a customer's account being closed because of payment default. Customer-initiated churn is more complicated, and the reasons behind it vary. In this study, only customer-initiated churn is considered, and it is defined by a series [...]

High-Value Customers – Only customers who have received at least three monthly bills are considered in the study. High-value customers are those with monthly average revenue of $X or more for the last three months. If a customer's first invoice covers less than 30 days of service, the customer's monthly revenue is prorated to a full month's revenue.
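The high-value rule above can be illustrated with a small Python sketch. The paper does not give the proration formula or the value of $X, so the linear scaling by `full_month_days / days_covered`, the `threshold` parameter, and both function names are assumptions made only for illustration.

```python
def prorate_to_full_month(amount, days_covered, full_month_days=30):
    """Scale a short invoice up to a full month (linear scaling is an assumption)."""
    if days_covered <= 0:
        raise ValueError("days_covered must be positive")
    if days_covered >= full_month_days:
        return amount
    return amount * full_month_days / days_covered


def is_high_value(invoices, threshold):
    """Apply the high-value rule to one customer.

    invoices  -- chronological list of (amount, days_covered), one per monthly bill
    threshold -- stands in for the undisclosed $X in the paper
    """
    if len(invoices) < 3:
        # Customers with fewer than three monthly bills are not considered.
        return False
    amounts = [amount for amount, _ in invoices]
    # Only the first invoice is prorated, as in the definition above.
    first_amount, first_days = invoices[0]
    amounts[0] = prorate_to_full_month(first_amount, first_days)
    # Average revenue over the last three monthly bills.
    return sum(amounts[-3:]) / 3 >= threshold
```

For example, a first invoice of 20.0 covering 15 days is treated as 40.0 for a full month before the three-month average is compared with the threshold.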

Granularity – This study examines customer churn at the account level.

Exclusions – This study does not distinguish international customers from domestic customers. However, it would be desirable to investigate international customer churn separately from domestic customer churn in the future. This study also excludes employee accounts, since churn for employee accounts is neither a problem nor of interest to the company.

SURVIVAL ANALYSIS AND CUSTOMER CHURN

Survival analysis is a class of statistical methods for studying the occurrence and timing of events. From the beginning, survival analysis was designed for longitudinal data on the occurrence of events. Keeping track of customer churn is a good example of survival data. Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates.

Generally, the survival function and the hazard function are used to describe the status of customer survival during the tenure of observation. The survival function gives the probability of surviving beyond a certain time point t. The hazard function, by contrast, describes the risk of the event (in this case, customer churn) in an interval after time t, conditional on the customer having survived to time t. The hazard function is therefore more intuitive to use in survival analysis because it attempts to [...]

For survival analysis, the best observation plan is prospective. We begin observing a set of customers at some well-defined point in time (called the origin of time) and then follow them for some substantial period of time, recording the times at which customer churns occur. It is not necessary that every customer experience churn: customers who have yet to experience churn are called censored cases, while customers who have already churned are called observed cases.
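For reference, the survival and hazard functions described in this section are usually written as follows. The notation is the standard textbook one and is not taken from the paper: T (the random time to churn) and f(t) (its density) are symbols introduced here as assumptions.

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
     = -\frac{d}{dt} \log S(t)
```

A censored customer contributes only the information that T exceeds the last observed time, which is why conventional regression on the churn date alone cannot use such cases directly.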

Typically, not only do we pr[...]

SAS/STAT has two procedures for survival analysis: PROC LIFEREG and PROC PHREG. The LIFEREG procedure fits parametric regression models to censored survival data using maximum likelihood estimation. The PHREG procedure performs semi-parametric regression analysis using partial likelihood estimation. PROC PHREG has gained popularity over PROC LIFEREG in the last decade since it handles time dependent [...]. However, if the shapes of the survival distribution and hazard function are known, PROC LIFEREG pro[...]

SAMPLING STRATEGY

On August 16, 2000, a sample of 41,374 active high-value customers was randomly selected from the entire customer base of a telecommunications company. All these customers were followed for the next 15 months. Therefore, August 16, 2000 is the origin of time and November 15, 2001 is the observation termination time. During this 15-month observation period, the timing of customer churn was recorded. For each customer in the sample, a variable DUR is used to indicate the time that customer churn [...]

DATA SOURCES

There are four major data sources for this study: block-level marketing and financial information, customer-level demographic data provided through a third-party vendor, customer internal data, and customer contact records. A brief description of some of the data sources follows.

Demographic Data – Demographic data is from a third-party vendor. In this study, the following are examples of customer-level demographic information:

- Primary household member's age
- Gender and marital status
- Number of adults
- Primary household member's occupation
- Household estimated income and wealth ranking
- Number of children and children's age
- Number of vehicles and vehicle value
- Credit card
- Frequent traveler
- Responder to mail orders
- Dwelling and length of residence

Customer Internal Data – Customer internal data is from the company's data warehouse. It consists of two parts. The first part is customer information such as market channel, plan type, bill agency, customer segmentation code, ownership of the company's other products, dispute, late fee charge, discount, promotion/save promotion, additional lines, toll-free services, rewards redemption, billing dispute, and so on. The second part of the customer internal data is the customer's telecommunications usage [...]

- Weekly average call counts
- Percentage change of minutes
- Share of domestic/international revenue

Customer Contact Records – The company's Customer Information System (CIS) stores detailed records of customer contacts. These mainly include customer calls to service centers and the company's mail contacts to customers. The customer contact records are then classified into customer contact categories, among them customer general inquiry, customer requests to change service, customer inquiry about cancellation, and so on.

MODELING PROCESS

The modeling process includes the following four major steps.

Exploratory Data Analysis (EDA) – Exploratory data analysis was conducted to prepare the data for the survival analysis. A univariate frequency analysis was used to pinpoint value distributions, missing values, and outliers.
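A univariate frequency analysis of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the paper; the function name `frequency_table` and the choice of `None` and the empty string as missing-value markers are assumptions.

```python
from collections import Counter


def frequency_table(values, missing=(None, "")):
    """Return (rows, n_missing) for one variable.

    rows is a list of (value, count, share_of_all_observations),
    most frequent value first; n_missing counts the missing markers.
    """
    values = list(values)
    counts = Counter(v for v in values if v not in missing)
    n_missing = sum(1 for v in values if v in missing)
    total = len(values)
    rows = [(value, count, count / total) for value, count in counts.most_common()]
    return rows, n_missing


# Example: a plan-type variable with two missing entries.
rows, n_missing = frequency_table(["A", "B", "A", None, "", "A"])
```

Scanning the table shows rare categories and the share of missing values at a glance, which is the information the later imputation and filtering steps rely on.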

Variable transformation was applied to some numerical variables to reduce skewness, because transformations help improve the fit of a model to the data. Outliers were filtered out to exclude observations with extreme values that should not be included in the data mining analysis. Filtering extreme values from the training data tends to produce better models because the parameter estimates are more stable. Variables with missing value[...]

For interval variables, replacement values were calculated based on random percentiles of the variable's distribution, i.e., values were assigned based on the probability distribution of the nonmissing observations. Missing values for class variables were replaced with the most frequent value (the mode).

Variable Reduction – Starting with 212 variables in the original data set, an initial univariate analysis of all categorical variables crossed with customer churn status (STATUS) was carried out using PROC FREQ to determine the statistically significant categorical variables to be included in the next modeling step. All the categorical variables whose chi-square or t statistic had a p-value of 0.05 or less were kept. This step reduced the number of variables to 115 (&VARLIST1), including all the n[...]

The next step is to use PROC PHREG to further reduce the number of variables. A stepwise selection method was used to create a final model with statistically significant effects of 29 explanatory variables on customer churn over time.

/* STATUS = 0 marks censored cases; OUTEST= saves the parameter estimates */
PROC PHREG DATA = SASOUT2.ALL2 OUTEST = SASOUT2.BETA;
    MODEL DUR*STATUS(0) = &VARLIST1
        / SELECTION = STEPWISE
          SLENTRY = 0.0025 SLSTAY = 0.0025 DETAILS;
RUN;

Model Estimation – With only 29 explanatory variables, the final data set has a reasonable number of variables for survival analysis. Before applying survival analysis procedures to the final data set, the customer survival function and hazard function were estimated using the following code. The purpose of estimating the customer survival function and customer hazard function is to gain knowledge of customer churn hazard characteristics. From the shape of the hazard function, customer churn in this [...]

/* Life-table estimates in one-unit intervals; plots of survival (S) and hazard (H) */
PROC LIFETEST DATA = SASOUT2.ALL3 OUTSURV = SASOUT2.OUTSURV
    METHOD = LIFE PLOT = (S, H) WIDTH = 1
    GRAPHICS;
    TIME DUR*STATUS(0);
RUN;

The final step is to estimate customer churn. PROC LIFEREG was used to calculate customer survival probability. At this step the final data set was divided 50/50 into two data sets: a model data set and a validation data set. The model data set is used to fit the model, and the validation data set is used to score the survival probability for each customer. A variable USE distinguishes the model data set (USE = 0) from the validation data set (USE = 1). In the validation data set, set [...]

Source: Jun Xiang Lu, Ph.D. Predicting Customer Churn in the Telecommunications Industry –– An Application of Survival Analysis Modeling Using SAS. SAS User Group International (SUGI27) Online Proceedings, 2002, Paper No. 114-27.

Translation: Predicting Customer Churn in the Telecommunications Industry –– An Application of Survival Analysis Modeling Using SAS

Jun Xiang Lu, Ph.D.
Sprint Communications Company
Overland Park, Kansas

ABSTRACT

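Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn or how long they will stay. The goal of this study is to apply survival analysis techniques to predict customer churn using data from a telecommunications company. This study will help telecommunications companies understand customer churn risk and customer churn hazard over time by predicting which customers will churn and when. The findings of this study will help telecommunications companies optimize their customer retention and/or treatment resources in their efforts to reduce churn.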

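INTRODUCTION

In the telecommunications industry, customers are able to choose among multiple service providers and actively exercise their right to switch from one service provider to another. In this fiercely competitive market, customers demand tailored products and better services at lower prices, while service providers constantly focus on acquisitions as their business goals. The telecommunications industry experiences an average churn rate of 30-35 percent, and acquiring a new customer costs 5-10 times as much as retaining an existing one. For many incumbent operators, the main business headache is retaining highly profitable customers. Many telecommunications companies deploy retention strategies across their programs and processes to keep customers longer by offering tailored products and services. With retention strategies in place, many companies have started to include churn reduction as one of their business goals.

In order to help telecommunications companies manage churn reduction, we need to predict not only which customers are at high risk of churn but also when these high-risk customers will churn. Telecommunications companies can then optimize their marketing resources to prevent as many customers as possible from churning. In other words, if telecommunications companies know which customers are at high risk of churn and when they will churn, they can design timely and effective communication programs for those customers.

Conventional statistical methods (e.g., logistic regression and decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn or how long they will stay. Survival analysis was designed from the beginning to handle survival data, and is therefore an efficient and powerful tool for predicting customer churn.

OBJECTIVES

This study has two objectives. The first is to estimate the customer survival function and the customer hazard function to gain knowledge of customer churn over the time of customer tenure. The second is to demonstrate how survival analysis techniques are used to identify the customers who are at high risk of churn and when they will churn.

DEFINITIONS AND EXCLUSIONS

This section clarifies some of the important concepts and exclusions used in this study.

Churn – In the telecommunications industry, the broad definition of churn is the cancellation of a customer's telecommunications service. This includes both service-provider-initiated churn and customer-initiated churn. An example of service-provider-initiated churn is a customer's account being closed because of payment default. Customer-initiated churn is more complicated, and the reasons behind it vary. In this study, only customer-initiated churn is considered, and it is defined by a series of cancellation reason codes. Examples of reason codes are: unacceptable call quality, a competitor's more favorable pricing plan, misinformation given by sales, customer expectations not met, billing problems, moving, changes in business, and so on.

High-Value Customers – Only customers who have received at least three monthly bills are considered in the study. High-value customers are those with monthly average revenue of $X or more for the last three months. If a customer's first invoice covers less than 30 days of service, the customer's monthly revenue is prorated to a full month's revenue.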

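Granularity – This study examines customer churn at the account level.

Exclusions – This study does not distinguish international customers from domestic customers, although it would be worthwhile to investigate international customer churn separately from domestic customer churn. In addition, this study does not include employee accounts, since churn for employee accounts is neither a problem nor of interest to the company.

SURVIVAL ANALYSIS AND CUSTOMER CHURN

Survival analysis is a class of statistical methods for studying the occurrence and timing of events. From the beginning, survival analysis was designed for longitudinal data on the occurrence of events. Keeping track of customer churn is a good example of survival data. Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates.

Generally, the survival function and the hazard function are used to describe the status of customer survival during the tenure of observation. The survival function gives the probability of surviving beyond a certain time t, while the hazard function describes the risk of the event (in this case, customer churn) in an interval after time t, conditional on the customer having survived to time t. The hazard function is therefore more intuitive to use in survival analysis because it attempts to quantify the risk that customer churn will occur at time t, given that the customer has survived to time t.

For survival analysis, the best observation plan is prospective. We begin observing a set of customers at some well-defined point in time (called the origin of time) and then follow them for some substantial period of time, recording the times at which customer churns occur. It is not necessary that every customer experience churn: customers who have not yet experienced churn are called censored cases, while customers who have already churned are called observed cases. Typically, not only do we predict the timing of customer churn, we also need to analyze how time-dependent covariates (e.g., customer calls to service centers, customers changing plan types, customers changing billing options, etc.) affect the occurrence and timing of customer churn.

SAS/STAT has two procedures for survival analysis: PROC LIFEREG and PROC PHREG. The LIFEREG procedure fits parametric regression models to censored survival data using maximum likelihood estimation. The PHREG procedure performs semi-parametric regression analysis using partial likelihood estimation. PROC PHREG has gained popularity over PROC LIFEREG in the last decade because it handles time-dependent covariates. However, if the shapes of the survival distribution and hazard function are known, PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG does.

SAMPLING STRATEGY

On August 16, 2000, a sample of 41,374 active high-value customers was randomly selected from the entire customer base of a telecommunications company. All these customers were followed for the next 15 months. August 16, 2000 is therefore the origin of time and November 15, 2001 is the observation termination time. During this 15-month observation period, the timing of customer churn was recorded. For each customer in the sample, a variable DUR is used to indicate the time at which customer churn occurred or, for censored cases, the last time at which the customer was observed, both measured from the origin of time (August 16, 2000). A second variable, STATUS, is used to distinguish censored cases from observed cases. It is common to set STATUS = 1 for observed cases and STATUS = 0 for censored cases. In this study, the survival data are singly right censored, so all censored cases have a DUR value of 15 (months).

DATA SOURCES

There are four major data sources for this study: block-level marketing and financial information, customer-level demographic data provided through a third-party vendor, customer internal data, and customer contact records. A brief description of some of the data sources follows.

Demographic Data – Demographic data is from a third-party vendor. In this study, the following are examples of customer-level demographic information:

- Primary household member's age
- Gender and marital status
- Number of adults
- Primary household member's occupation
- Household estimated income and wealth ranking
- Number of children and children's age
- Number of vehicles and vehicle value
- Credit card
- Frequent traveler
- Responder to mail orders
- Dwelling and length of residence

Customer Internal Data – Customer internal data is from the company's data warehouse. It consists of two parts. The first part is customer information such as market channel, plan type, bill agency, customer segmentation code, ownership of the company's other products, dispute, late fee charge, discount, promotion/save promotion, additional lines, toll-free services, rewards redemption, billing dispute, and so on. The second part of the customer internal data is the customer's telecommunications usage data. Examples of customer usage variables are:

- Weekly average call counts
- Percentage change of minutes
- Share of domestic/international revenue

Customer Contact Records – The company's Customer Information System (CIS) stores detailed records of customer contacts. These mainly include customer calls to service centers and the company's mail contacts to customers. The customer contact records are then classified into customer contact categories, among them customer general inquiry, customer requests to change service, customer inquiry about cancellation, and so on.

MODELING PROCESS

The modeling process includes the following four major steps.

Exploratory Data Analysis (EDA) – Exploratory data analysis was conducted to prepare the data for the survival analysis. A univariate frequency analysis was used to pinpoint value distributions, missing values, and outliers.

Variable transformation was applied to some numerical variables to reduce skewness, because transformations help improve the fit of a model to the data. Outliers were filtered out to exclude observations with extreme values that should not be included in the data mining analysis. Filtering extreme values from the training data tends to produce better models because the parameter estimates are more stable. Variables with missing values were not a big issue, except for the demographic variables. Demographic variables with more than 20 percent missing values were eliminated. For observations with missing values, one choice is to use incomplete observations, but this may lead to ignoring useful information from the variables that have nonmissing values. It may also bias the sample, since observations that have missing values may have other things in common as well. Therefore, in this study, missing values were replaced using appropriate methods.

For interval variables, replacement values were calculated based on random percentiles of the variable's distribution, i.e., values were assigned based on the probability distribution of the nonmissing observations. Missing values for class variables were replaced with the most frequent value (the mode).

Variable Reduction – Starting with 212 variables in the original data set, an initial univariate analysis of all categorical variables crossed with customer churn status (STATUS) was carried out using PROC FREQ to determine the statistically significant categorical variables to be included in the next modeling step. All the categorical variables whose chi-square or t statistic had a p-value of 0.05 or less were kept. This step reduced the number of variables to 115 (&VARLIST1), including all the numerical variables and the categorical variables kept from this step.

The next step is to use PROC PHREG to further reduce the number of variables. A stepwise selection method was used to create a final model with statistically significant effects of 29 explanatory variables on customer churn over time.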
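As background for the PROC PHREG step, the semi-parametric model it fits is the Cox proportional hazards model, estimated by maximizing a partial likelihood. The formulas are the standard textbook ones (written without tied event times) and are not quoted from the paper. The symbols are introduced here as assumptions: h_0(t) is an unspecified baseline hazard, x_i is the covariate vector of customer i, t_i corresponds to DUR, delta_i to STATUS, and R(t_i) is the set of customers still at risk at time t_i.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right), \qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
           \frac{\exp\!\left(\beta^{\top} x_i\right)}
                {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}
```

Because h_0(t) cancels out of each ratio, the coefficients can be estimated without assuming a shape for the hazard, which is what makes the procedure convenient for screening many candidate variables.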

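/* STATUS = 0 marks censored cases; OUTEST= saves the parameter estimates */
PROC PHREG DATA = SASOUT2.ALL2 OUTEST = SASOUT2.BETA;
    MODEL DUR*STATUS(0) = &VARLIST1
        / SELECTION = STEPWISE
          SLENTRY = 0.0025 SLSTAY = 0.0025 DETAILS;
RUN;

Model Estimation – With only 29 explanatory variables, the final data set has a reasonable number of variables for survival analysis. Before applying survival analysis procedures to the final data set, the customer survival function and hazard function were estimated using the following code. The purpose of estimating the customer survival function and customer hazard function is to gain knowledge of customer churn hazard characteristics. From the shape of the hazard function, customer churn in this study shows a hazard function typical of a log-normal model. As mentioned earlier, since the shapes of the survival distribution and hazard function are known, PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG does.

/* Life-table estimates in one-unit intervals; plots of survival (S) and hazard (H) */
PROC LIFETEST DATA = SASOUT2.ALL3 OUTSURV = SASOUT2.OUTSURV
    METHOD = LIFE PLOT = (S, H) WIDTH = 1
    GRAPHICS;
    TIME DUR*STATUS(0);
RUN;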
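To make the life-table (actuarial) estimate concrete, here is a minimal Python sketch of the usual textbook calculation on (DUR, STATUS) pairs. It is not the SAS implementation and will not reproduce every column of the PROC LIFETEST output. The function name `life_table`, the dictionary keys, and the toy data are assumptions; censored cases are counted as at risk for half of the interval in which they are censored.

```python
def life_table(records, width=1.0):
    """Actuarial life table from (dur, status) pairs.

    status 1 = churn observed, 0 = censored; dur is assumed to be >= 0.
    Returns one dict per interval [start, start + width).
    """
    records = list(records)
    if not records:
        return []
    n_intervals = int(max(dur for dur, _ in records) // width) + 1
    events = [0] * n_intervals
    censored = [0] * n_intervals
    for dur, status in records:
        i = int(dur // width)
        if status == 1:
            events[i] += 1
        else:
            censored[i] += 1

    rows = []
    at_risk = len(records)
    survival = 1.0  # estimated probability of surviving to the interval start
    for i in range(n_intervals):
        # Censored cases count as exposed for half of the interval.
        effective = at_risk - censored[i] / 2.0
        if effective <= 0:
            break
        q = events[i] / effective  # conditional probability of churn
        # Hazard at the interval midpoint.
        hazard = events[i] / (width * (effective - events[i] / 2.0))
        rows.append({
            "start": i * width,
            "at_risk": at_risk,
            "events": events[i],
            "censored": censored[i],
            "survival_at_start": survival,
            "hazard": hazard,
        })
        survival *= 1.0 - q
        at_risk -= events[i] + censored[i]
    return rows


# Toy data: two churners and two customers censored at time 2.
table = life_table([(0.5, 1), (1.5, 1), (2.0, 0), (2.0, 0)])
```

Plotting `hazard` against `start` gives the kind of curve from which the shape of the hazard function can be judged.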

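The final step is to estimate customer churn. PROC LIFEREG was used to calculate customer survival probability. At this step the final data set was divided 50/50 into two data sets: a model data set and a validation data set. The model data set is used to fit the model, and the validation data set is used to score the survival probability for each customer. A variable USE distinguishes the model data set (USE = 0) from the validation data set (USE = 1). In the validation data set, both DUR and STATUS are set to missing so that the validation data set is not used in model estimation.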
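The split described in this step can be sketched in Python as follows. The record layout (one dictionary per customer with DUR and STATUS keys), the function name `split_model_validation`, and the fixed random seed are assumptions for illustration; the paper does not say how customers were assigned to the two halves.

```python
import random


def split_model_validation(customers, seed=0):
    """Randomly split customers 50/50 and flag each row with USE.

    USE = 0: model data set, DUR and STATUS kept for fitting.
    USE = 1: validation data set, DUR and STATUS blanked out so the row
             cannot contribute to estimation but can still be scored.
    """
    rng = random.Random(seed)
    shuffled = [dict(row) for row in customers]  # copy; inputs are not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for i, row in enumerate(shuffled):
        if i < half:
            row["USE"] = 0
        else:
            row["USE"] = 1
            row["DUR"] = None
            row["STATUS"] = None
    return shuffled


# Toy data: ten customers, every other one censored at 15 months.
customers = [{"ID": i, "DUR": 15 if i % 2 else 6, "STATUS": 0 if i % 2 else 1}
             for i in range(10)]
flagged = split_model_validation(customers)
```

In practice the true DUR and STATUS of the validation customers would be kept in a separate copy so that the scored survival probabilities can later be compared with what actually happened.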

