版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
1、Evolutionary Scheduling of Parallel Tasks Graphs onto Homogeneous ClustersSascha Hunold LIG Laboratory Grenoble, Francesascha.hunold@imag.frJoachim LeppingRobotics Research InstituteTU Dortmund University, Germanyjoachim
2、.lepping@udo.eduAbstract—Parallel task graphs (PTGs) arise when parallelprograms are combined to larger applications, e.g., scientific workflows. Scheduling these PTGs onto clusters is a challenging problem due to the ad
3、ditional degree of parallelism stemming from moldable tasks. Most algorithms are based on the assump- tion that the execution time of a parallel task is monotonically decreasing as the number of processors increases. But
4、 this assumption does not hold in practice since parallel programs often perform better if the number of processors is a multiple of internally used block sizes. In this article, we introduce the Evolutionary Moldable Ta
5、sk Scheduling (EMTS) algorithm for scheduling static PTGs onto homogeneous clusters. We apply an evolutionary approach to determine the processor allocation of each task. The evolutionary strategy ensures that EMTS can b
6、e used with any underlying model for predicting the execution time of moldable tasks. With the purpose of finding solutions quickly, EMTS considers results of other heuristics (e.g., HCPA, MCPA) as starting solutions. Th
7、e experimental results show that EMTS significantly reduces the makespan of PTGs compared to other heuristics for both non-monotonically and monotonically decreasing models.Keywords-task scheduling; parallel tasks; task
8、graphs; evo-lutionary algorithm; clusterI. INTRODUCTIONScientific workflows are an important type of paralleltask graphs and are often processed on computational grids. Many scientific workflows contain only a few parall
9、el tasks. Yet, as Cirne et al. stated, almost 98% of parallel jobs submitted to computational clusters are moldable [1]. The number of processors of a moldable task is determined before its execution and stays unchanged
10、during execu- tion [2]. If these parallel tasks are combined, a parallel task graph (PTG) arises. Several algorithms can be expressed as PTGs such as Strassen’s matrix multiplication or the Fast Fourier Transformation (F
11、FT) [3]. The nodes of a PTG denote the computations and the edges denote data or control dependencies. Executing a PTG leads to a mixed-parallel schedule as nodes are implemented in a data-parallel way and independent ta
12、sks can be executed concurrently.Let us examine the execution time of the parallel matrixmultiplication routine PDGEMM from ScaLAPACK [4] for two matrix sizes, which is shown in Figure 1. As we can see, the execution tim
13、e is not monotonically decreasing, but most scheduling algorithms assume a monotonically decreasing model of the execution time for computing the schedule.Hence, applying a non-monotonically decreasing model can lead to
14、inefficient decisions of these algorithms.For that reason, we focus on the question whether anevolutionary algorithm (EA) for scheduling PTGs can cope with irregularities in the execution time model for parallel tasks. W
15、e introduce the algorithm EMTS, which can be used with an arbitrary execution time model. We show that EMTS, when compared to other heuristics, produces better schedules with respect to the makespan objective. This holds
16、 for non-monotonically and monotonically decreasing execution time models.The paper is organized as follows. Section II details theproblem and discusses related approaches. In Section III the evolutionary method EMTS is
17、presented. The methodology and the setup of the experiments is described in Section IV. The experimental results are discussed in Section V and Section VI draws conclusions.II. PROBLEM DESCRIPTION AND RELATED WORKA. Appl
18、ication and Platform ModelA PTG (mixed-parallel application) can be representedby a directed acyclic graph (DAG) G = (V, E), where V = {vi | i = 1, . . . , V } is a set of nodes that represent the tasks and E = {ei,j | (
19、i, j) ∈ {1, . . . , V } × {1, . . . , V }} is a set of edges representing task interdependencies. A PTG can be executed on P identical processors interconnected by a network, so that each pair of processors can comm
20、unicate. It is assumed that the parallel tasks are moldable, i.e., exe- cutable by an arbitrary number of processors (1 ≤ p ≤ P).0.02 0.05 0.10 0.20number of processorstime [s]10242 8 16 320.15 0.20 0.25number of process
21、ors204816 24 32Figure 1. PDGEMM timings on the Cray XT4 of LBNL.2011 IEEE International Conference on Cluster Computing978-0-7695-4516-5/11 $26.00 © 2011 IEEE DOI 10.1109/CLUSTER.2011.45 344problem of being trapped
22、in local optima, but which cannot be guaranteed. The main advantages of evolutionary search strategies are: (1) They can cope with large problem in- stances; (2) They can be applied to an unknown search space without hav
23、ing a precise mathematical model of its structure; (3) They are capable of optimizing a solution starting anywhere in search space. The downside is that evolutionary approaches tend to converge slowly to the optimum, and
24、 one has usually no measure of how close the current result is to the optimal solution.While being aware of these properties, we want to designan evolutionary algorithm for this scheduling problem that provides a good tr
25、ade-off between the time used to compute the solution and the resulting makespan. Since we can usually trade time for solution quality, we focus on a given time constraint.III. ALGORITHM EMTSThe algorithm presented in th
26、is section is based on the ap-plication and platform models (PTG), which were introduced in Section II-A. In the context of this article, a homogeneous cluster comprises the same type of computational nodes, i.e., the sa
27、me processor and the same amount of memory. In addition, communication costs between tasks are not considered. If communication or data redistributions are necessary, they need to be included in the execution time model
28、of the parallel tasks.A. Designing the AlgorithmThe Evolutionary Moldable Task Scheduling (EMTS)utilizes a two-step approach for solving the scheduling problem. In the first step, the allocations of each task are compute
29、d while in the second step, the allocations are mapped onto the processors of the system.Mapping function: As the execution time model ofparallel tasks has no influence on the mapping step, we start with defining the map
30、ping function of EMTS. Since the mapping function also evaluates the fitness of all individuals, it should be as fast as possible and produce short schedules. Previous work showed that a list scheduling approach leads to
31、 efficient schedules [9]. In the list scheduling algorithm used by EMTS, the ready nodes are sorted by decreasing bottom level and each ready node v is mapped to the first processor set that contains s(v) available proce
32、ssors1. The fitness level of a set of allocations is defined as the resulting makespan of the PTG. A smaller makespan corresponds to a better fitness of an EA’s individual.Allocation function: The EA modifies the process
33、orallocations of tasks and its goal is to find the right allocation for each task so that the resulting fitness is optimized. The set of allocations is encoded as individual I. For a task1s(v) is the allocation size of n
34、ode v, and the bottom level bl(v) is thelength of the longest path from a node v to the sink of the PTG including its own execution time.Figure 2. Encoding of individuals: the allocation s(vi) of node vi isstored at posi
35、tion i.vi of PTG Gj the individual Ij(i) holds the number of processors allocated to vi at position i. Thus, Ij(i) = s(vi) for 1 ≤ i ≤ V , where s(vi) denotes the current allocation of task vi. Figure 2 illustrates this
36、encoding of the allocations of a PTG. The depicted PTG contains five nodes and each node possesses a processor allocation, e.g., three processors are allocated to node 1. An individual is the collection of all allocation
37、s of a PTG. So, the number of processors of node 1 is stored at the first position of the individual.The EA modifies the individuals randomly to optimizethe schedule. Thus, the objective function can be de- fined as foll
38、ows: For a given PTG G and fitness func- tion F : S → R, the EA tries to find a set of allocations S = {s(v0), . . . , s(vV )} that minimizes F. If we consider PTGs of about 100 tasks and a platform with hundreds of proc
39、essors, the search space to find the best solution S?is enormous. Nonetheless, EMTS may find an S which is closer to S? than the solutions produced by other heuristics. This leads to two main questions: (1) Where should
40、the EA start searching? Or to put it in another way, how should the original individuals be initialized? (2) How to produce new individuals from existing ones?B. Retrieving Starting SolutionsTo obtain starting solutions,
41、 EMTS makes use of resultsproduced by other heuristics. In the present work, we execute the allocation functions of MCPA [11] and HCPA [10] and encode their results as individuals in the initial population. These allocat
42、ion functions were chosen because they pro- duce sufficiently good solutions in a short amount of time.Additionally, we designed another heuristic to create astarting individual: First, the bottom level of each task is c
43、omputed assuming that each task is allocated to one processor. Having the bottom level, EMTS determines the critical path of the PTG. The idea is to assign all processors of the system to nodes on the critical path. Sinc
44、e several nodes on a precedence level have similar or equal criticality (bottom level), we consider several almost critical paths to make this heuristic effective. Hence, we separate the nodes by precedence level (depth
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 眾賞文庫僅提供信息存儲(chǔ)空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- [雙語翻譯]---外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(節(jié)選)
- [雙語翻譯]--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(節(jié)選)
- 2011年--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(英文).PDF
- 2011年--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(英文).pdf
- 2011年--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度
- 2011年---外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(節(jié)選)
- 2011年--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(節(jié)選).docx
- 2011年--外文翻譯---同類集群上并行任務(wù)圖的進(jìn)化調(diào)度(節(jié)選).DOCX
- [雙語翻譯]鄉(xiāng)鎮(zhèn)選舉外文翻譯(英文)
- [雙語翻譯]外文翻譯--品牌資產(chǎn)的價(jià)值(英文)
- [雙語翻譯]高層建筑外文翻譯(英文)
- 外文翻譯--gpu集群的混合并行編程
- [雙語翻譯]鄉(xiāng)鎮(zhèn)選舉外文翻譯(英文).PDF
- [雙語翻譯]海運(yùn)外文翻譯--安全海運(yùn)的表現(xiàn)(英文)
- [雙語翻譯]外文翻譯-醫(yī)學(xué)圖像水印基準(zhǔn)(英文)
- [雙語翻譯]高層建筑外文翻譯(英文).PDF
- [雙語翻譯]外文翻譯--關(guān)于設(shè)施布置問題的調(diào)查(英文)
- [雙語翻譯]外文翻譯--影響公司盈利能力的因素(英文)
- [雙語翻譯]--外文翻譯---dsp上的通用實(shí)時(shí)操作系統(tǒng)的設(shè)計(jì)和實(shí)現(xiàn)(英文)
- [雙語翻譯]--橋梁外文翻譯-預(yù)應(yīng)力鋼梁的性能(英文)
評論
0/150
提交評論