版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、<p><b> 畢業(yè)設(shè)計(jì)(論文)</b></p><p><b> 外文資料翻譯 </b></p><p> 學(xué)生姓名: </p><p> 學(xué) 號(hào): </p>&l
2、t;p> 所在學(xué)院: </p><p> 專 業(yè): </p><p> 指導(dǎo)教師: </p><p> 2013年 4 月 18 日</p><p>
3、; Hybrid Evolutionary Algorithm based solution for Register Allocation for Embedded Systems</p><p> Anjali Mahajan</p><p> G H Raisoni College of Engineering, Nagpur, India</p><p&g
4、t; Email : armahajan@rediffmail.com</p><p><b> M S Ali</b></p><p> Prof. Ram Meghe Institute of Technology and Research,</p><p> Badnera, Amravati, India</p>
5、<p> Email : softalis@hotmail.com</p><p> Abstract— Embedded systems have an ever-increasing need for optimizing compilers to produce high quality codes with a limited general purpose register set. Ei
6、ther memory or registers are used to store the results of computation of a program. As compared to memory, accessing a register is much faster, but they are scarce resources and have to be utilized very efficiently. The
7、optimization goal is to hold as many live variables as possible in registers in order to avoid expensive memory accesses. </p><p> I. INTRODUCTION</p><p> Register allocation is one of the mos
8、t important optimizations a compiler performs and is becoming increasingly important as the gap between processor speed and memory access time widens. Its goal is to find a way to map the temporary variables used in a pr
9、ogram into physical memory locations (either main memory or machine registers). </p><p> Accessing a register is much faster than accessing memory; therefore one tries to use registers as much as possible.
10、Of course, this is not always possible, thus some variables must be transferred (spilled) to and from memory. This has a cost, the cost of load and store operations, which should be avoided as much as possible. Typically
11、, this degrades runtime performance and increases power consumption. Therefore, an efficient mapping should generally minimize the register requirements and the nu</p><p> This approach attempts to combine
12、the flexibility of general-purpose programmable processors with the performance achieved by domain-specific architectureoptimizations. Compilers for embedded processors must cope with these architectural optimizations a
13、nd be able to exploit them.</p><p> This paper presents a heuristic algorithm for graph coloring register allocation problem for embedded systems based on a new crossover operator and a new local search fun
14、ction. </p><p> Graph coloring abstracts the problem of assigning registers to live ranges in a program into the problem of assigning colors to nodes in an interference graph. The register allocator attempt
15、s to “color” the graph with a finite number of machine registers, with one constraint that any two nodes connected by an interference edge must be colored with different registers.</p><p> To model register
16、 allocation as a graph coloring problem, the compiler first constructs an interference graph G. The nodes in G correspond to live ranges, and the edges represent interferences. Thus, there is an edge in G from node i to
17、node j if live range of i interferes with live range of j, that is, if they are simultaneously live at some point and cannot occupy the same register.</p><p> To find an allocation from G, the compiler look
18、s for a k-coloring of G, that is, an assignment of k colors to the nodes of G such that adjacent nodes always have distinct colors. If we choose k to match the number of machine registers, then we can map a k-coloring o
19、f G into a feasible register assignment for the underlying code. Because graph coloring is NP-complete, the compiler uses a heuristic method to search for a coloring; it is not guaranteed to find a k-coloring for all k-c
20、olorable grap</p><p> Spilling one or more live ranges creates a new and different interference graph. The compiler proceeds by iteratively spilling some live ranges and attempting to color the resulting ne
21、w graph. In the context of embedded processors, the most important restriction of the graph coloring approach is that it is based on the assumption of a homogenous register set. The different phases of register alloca
22、tion are differentlyimpacted By embedded processor irregularities.</p><p> II. RELATED WORK</p><p> Register allocation has been widely addressed in literature, and many approaches have been p
23、roposed. Most of them are related to Chaitin’s [6] approach, but very few address problems raised by embedded processor’s architecture like support for irregular registers via register classes and register set concatenat
24、ion. Although [13] proposed an approach using integer-programming supporting irregular register sets, this approach requires modeling of register constraints by inequations, making it diff</p><p> Embedded
25、processors are often characterized by a small number of registers. Algorithms and heuristics have been adapted in our infrastructure to limit spill when few registers are available. Ref. [17] presents a generalization of
26、 the degree < K test , called the <p, q> test, to handle irregular register sets and register classes. VPO [13] is a portable optimizer for DSPs, based on the Zephyr retargetable compiler infrastructure. VPO add
27、ress the problem of heterogeneous register classes by comput</p><p> Some researchers attempt to use non-graph-coloring methods. Koes[15] proposes a progressive register allocator which uses a multi-commodi
28、ty networkflow mode tp represent the intricacies of irregular architectures. Scholz[18] formulates register allocation as a partitioned boolean quadratic optimization problem that allows generic modeling of processors pe
29、culiarities. Hirnschrott [11] used Partitioned Boolean Quadratic Programming for register allocation shows better results in spill costs and co</p><p> There are many local search algorithms for graph color
30、ing, such as simulated annealing [7], tabu search[10],etc Galinier and Hao[8] introduced a algorithm based on hybrid evolutionary algorithm. An important feature of this algorithm is a specialized crossover but the mutat
31、ion operator of the GA is replaced by an LS operator.</p><p> III. A HYBRID EVOLUTIONARY ALGORITHM</p><p> Evolutionary algorithms involve natural evolution and genetics. The genetic algorithm
32、 is a classical method in this category. It has been applied to various optimization problems. There are several other methods like genetic programming, ant colony optimization, etc. The simple evolutionary algorithms ar
33、e not generally efficient for complex combinatorial problems. The performance of the evolutionary algorithms is improved with the addition of problem specific knowledge. Specialized operators are</p><p> An
34、 approach for combinatorial optimization is to embed local search into the framework of population based evolutionary algorithm, leading to hybrid evolutionary algorithm. HEA is based on two elements: an efficient local
35、search (LS) operator and a highly specialized crossover operator. The basic idea consists in using the crossover operator to create new and potentially interesting configurations which are then improved by the LS operato
36、r.</p><p> We present a hybrid evolutionary algorithm for graph coloring based on a new crossover operator for register allocation problem and a new local search function. This section presents our algorith
37、m- Hybrid evolutionary Algorithm for Graph coloring Register Allocation with proposed operator called crossover by conflict-free sets(CCS).</p><p> A. The Algorithm</p><p> The HEA consists of
38、 a genetic component and a local search (LS). The genetic component initializes and evolves a population of solutions. The general algorithm is as given below:</p><p> Input : Interference graph , IG = ( V,
39、E ) ; number of</p><p> registers , k</p><p> Output : best configuration</p><p><b> begin</b></p><p> P = generate_population(|P|)</p><p>
40、<b> iter = 0</b></p><p> while ( iter < MaxIter or popu-diversity>0 ) do</p><p> (p1, p2) = select_parents(P)</p><p> p = crossover (p1, p2)</p><p&g
41、t; p = local_search( p, L )</p><p> P = update_population(P,p)</p><p> iter =iter + 1</p><p><b> endwhile</b></p><p><b> end</b></p>
42、<p> The algorithm first builds an initial population of configurations (generate_population) and then performs a series of cycles called generation. At each generation, two configurations p1 and p2 are chosen in
43、the population (select_parents). A crossover is then used to produce an offspring p from p1 and p2 (crossover). The LS operator is applied to improve p for a fixed number L of iterations (local_search). Finally, the impr
44、oved configuration p is inserted in the population by replacing another </p><p> Population diversity is calculated as the average distance between all configurations in the population. For two configuratio
45、ns p1 and p2, the distance between p1 and p2 is the minimum number of elementary transformations necessary to transform p1 into p2.</p><p> In our approach, we consider the partition method for string repre
46、sentation [8]. Each solution Pi partitions the variables into register classes, Pi = {R1 , R2 , …. Rk} where each class Ri includes the live ranges of variables that are mapped to the registers ri and k is the total numb
47、er of registers. Given two parents p1= { R11 , R12 , …. R1k} and p2= { R21 , R22 , …. R2k}, the partial configuration will be a set { R1 , R2 , …. Rk} of disjoint sets of nodes where each subset Ri is included in a</p
48、><p> B. Initial population generation</p><p> Generally the initial population is generated using the DSatur algorithm [5], which is a graph coloring heuristic. It considers the saturation degre
49、e of each node, which can be defined as the number of different colors to which the node is adjacent to. In the register allocation problem, both the spill costs and the degree of the nodes in a given interference graph
50、are considered. We adopt the new metric called the spill degree proposed by [17] that can be used for ordering the nodes. The spill </p><p> SDegree1(i) = SCost(i) x Degree(i)</p><p> SDegree2
51、(i) = SCost(i) x Degree2(i)</p><p> SDegree3(i) = SCost(i) (1)</p><p> In these expressions, SCost(i) is the spill cost and Degree(i) is the number of edges incident to node
52、i. In order to generate the initial population, the spill degrees are set using a combination of the three equations given in (1). The nodes are sorted in decreasing order of spill degrees. At each iteration, an unsigned
53、 node with the maximum spill degree is mapped to the register class with the lowest possible index value, where the selected node should be conflict free with the other nodes in</p><p> C. The crossover ope
54、rator</p><p> The crossover used here is the new proposed operator Crossover by Conflict-free Sets (CCS). Given two parent configurations p1= { R11 , R12 , …. R1k} and p2= { R21 , R22 , …. R2k}, chosen rand
55、omly by the select_parents operator from the population, the algorithm crossover (p1, p2) builds an offspring p= {R1 , R2 , …. Rk} as follows</p><p> Input: configurations p1= { R11 , R12 , …. R1k}</p>
56、;<p> and p2= { R21 , R22 , …. R2k}</p><p> Output: configuration p= { R1 , R2 , …. Rk}</p><p><b> begin</b></p><p> // consider p1</p><p> if c
57、onflict(p1)>0 then</p><p> call function conflict_free(p1)</p><p> // consider p2</p><p> if conflict(p2)>0 then</p><p> call function conflict_free(p2)</p
58、><p> for i (1< i < k ) do</p><p> CfR1n= max q?p1 Cf_Individual(q)</p><p> // CfR1n is the partition with maximum number of</p><p> //conflict-free variables fr
59、om p1</p><p> CfR2n= max q?p2 Cf_Individual(q)</p><p> // CfR2n is the partition with maximum number of</p><p> // conflict-free variables from p2</p><p> choose j
60、such that | CfR1n∩?CfR2j| is maximum</p><p> Ri = |CfR1n ∪?CfR2j|</p><p> SpillQuality(Ri)=Spillcost(Ri)*Reg_class_Conflict(Ri)</p><p> remove the nodes of Ri from p1 and p2</
61、p><p><b> endfor</b></p><p> assign randomly the nodes of R- (Rl U…U Rk)</p><p><b> end</b></p><p> function conflict_free(p)</p><p&
62、gt;<b> begin</b></p><p> Cf_Individual=?</p><p> for i ( 1< i < k) do</p><p> if Reg_class_Conflict(i) >0 then</p><p> // Reg_class_Conflict(i
63、) is the total number of conflicts</p><p> // in ith register class</p><p><b> begin</b></p><p> initialize t to 0</p><p> while (t < (Reg_class_Conf
64、lict(i)/2)) do</p><p> select a variable m from Ri where</p><p> Conflict(m)=maxc?Ri{Conflict(c)} or</p><p> spillcost(m)=minc?Ri{spillcost(c)}</p><p> remove varia
65、ble m from Ri</p><p> t=t+Conflict(m)</p><p><b> endwhile</b></p><p> CfR(i)= Ri</p><p><b> endif</b></p><p> Cf_Individual= C
66、f_Individual∪ CfR(i)</p><p><b> endfor</b></p><p><b> end</b></p><p> He algorithm builds step by step the k classes Rl , R2 … Rk of the offspring. If any
67、 register set of the parent has conflict, the algorithm first determines the conflict free sets of all the register classes of each parent. The function to find the conflict free set is based on the heuristic from [20] f
68、or determining the conflict-free set with the maximum number of nodes of each register class. In that, initially the conflict-free set includes all variables in Rl. The total number of conflicts o</p><p> A
69、t the end of k steps, some nodes may remain unassigned. These are then assigned to a class which has minimum conflict factor.</p><p> D. The Local Search LS operator</p><p> After a solution i
70、s generated using the crossover operator, the local search phase improves it before inserting it into the population for a maximum of L iterations. We have used a Local search operator LS. It uses a 1-exchange neighborho
71、od. Formally, given a partition { R1 , R2 , …. Rk}, a neighboring partition is obtained by assigning a single variable to other register set, i.e., a variable vi is removed from the original register class Ri to which it
72、 belongs to and where vi is already in con</p><p> A variable vi which has maximum conflict factor is selected. Assume that vi is already assigned to class k. If the fitness value of the register class k wi
73、th vi is more than the fitness of register class j without considering vi, then vi is moved from register class k to register class j. This process is repeated for other register classes also. The register class which ha
74、s minimum value is the final register class for variable vi.</p><p> The same process is repeated with a new variable according to the decreasing order of conflict factor. In this way the quality of generat
75、ed offspring is improved. It is then inserted in the population by replacing the worst of the two parents.</p><p> E. Fitness function calculation</p><p> To solve a register allocation proble
76、m, we consider a set of k - register classes. In the interference graph IG = (V, E), all variables (i.e., nodes) vi are assigned to k - register classes. The nodes i and j in an interference graph are said to be conflict
77、-free, when there is no edge connecting them. Conflict(m) denotes the conflict of a variable m in register class Ri. Spillcost is the spill cost of variable m. The fitness of a register class is given as</p><p
78、> fitness(Ri)=∑??∈??Conflict(m)*Spillcost(m)/degree(m) </p><p><b> (3)</b></p><p> The total sum of fitness values of all register classes gives the fitness value of the given
79、individual </p><p><b> k</b></p><p> Fitness = ∑ fitness(i) (4)</p><p><b> i=1</b></p><p> The goal of the optimization process is t
80、o minimize fitness until zero.</p><p> IV. EXPERIMENTAL EVALUATION</p><p> The experiments are based on MachineSUIF [19] compiler research framework. The register allocator of MachineSUIF impl
81、ements George-Appel’s ‘Iterated Register Coalescing’ algorithm [9]. Our experimental analysis includes three algorithms We compare our algorithm with these algorithms. The first is the MachineSUIF’s default algorithm[9]
82、with extenstions by Smith-Ramsey and Holloway [19] for handling register aliasing. We denote it by IRC. The second is based on GPX operator[8]. The third is optima</p><p> We applied the algorithms to 6 emb
83、edded and real time applications. For each application, the population size is set to 20 and LS iterations are all set from 500 to 2000. For each application, we run the allocator five times and average the results.</
84、p><p> Performance evaluation was done with respect to the following parameters : number of memory accesses required , spill loads, spill costs, the compile time needed by the allocator, the execution time of
85、the generated code, including the number of load/store generated , size of allocator itself.</p><p> Figure 1. Number of memory accesses</p><p><b> TABLE I </b></p><p>
86、; SPILL COST OF INSTRUCTION</p><p><b> TABLE II</b></p><p> RATIO OF THE SPILL LOADS PRODUCED</p><p> Comparison of number of memory accesses is the comparison of t
87、otal static number of load and store instructions inserted by each register allocator. Figure 1. compares the number of load/store instructions in the assembly code. The HEA inserts fewer memory- access instruction than
88、ORA, 1.6 % fewer memory- access instruction than IRC and 8.9 % fewer memory- access instruction than GPX.</p><p> Table I give the spill costs of all the algorithms. For lowest population size the spill cos
89、t is less. For each benchmark given, the spill cost of each variable is set to the number of occurrences of the variables. The spill costs of the variables are the average values. The IRC algorithm gives the highest tota
90、l spill cost. The HEA algorithm produce less spill cost than the maximum in four tests and it outperforms the IRC and GPX algorithms in all tests. Genetic operator systematically eliminate</p><p> Spill loa
91、ds refers to additional number of loads incurred by the allocation algorithm. Spill loads give an indication of how well the allocator is able to perform the task. The number of spill loads is highly correlated with appl
92、ication running time.</p><p> We calculate the dynamic number of spill loads added to each module of the program by multiplying the number of spill loads added to each block by the number of times that bloc
93、k is executed. Then we sum the number of dynamic spill loads added to each block. We obtain the dynamic number of spill loads for the entire program by summing the number of dynamic spill loads added to each module.</
94、p><p> Table II shows the spill loads for each allocator as a ratio to spill loads generated by IRC allocator (considered as base allocator for comparison). The numbers are given as geometric mean. We see impr
95、ovements of HEA over other allocator. Table III gives results in terms of compile time and run time. We observe that in most of the cases, the performance of different allocators were almost similar in terms of compile t
96、imes; however, the execution time of the code generated by our method is less </p><p> Code quality is measured in terms of number of loads/stores generated for of the program and the number of static instr
97、uctions generated. Table IV give the total number of loads/stores generated for 8 registers. Our algorithm is superior to all other algorithms.</p><p> V. CONCLUSION</p><p> Register allocatio
98、n for embedded systems is complex having to deal with irregular architectural characteristics and to meet strict requirements.</p><p> In this paper, we propose a novel crossover operator for graph coloring
99、 register allocation problem for embedded systems, based on hybrid evolutionary algorithm. Compared to classical graph coloring register allocation algorithms, this technique can deal with irregular architectural charact
100、eristics of embedded processors more uniformly. Our experimental results indicate that compared to other algorithms, HEA is one of the most efficient algorithms with highly specialized, domain specific crossov</p>
101、<p> Population size of the problem should be set properly. Compared to the size of population, the length of iterations L of LS is more critical on the performance of the algorithm.</p><p> In the f
102、uture, we intend to test our HEA heuristic on graphs using coalescing and on chordal interference graphs generated from a large set of applications. HEA can also be applied to other compiler optimization problems such as
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 眾賞文庫僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 嵌入式系統(tǒng)寄存器分配-啟發(fā)式與進(jìn)化算法.pdf
- 適應(yīng)嵌入式系統(tǒng)編譯器的寄存器分配技術(shù)研究與實(shí)踐.pdf
- 基于龍芯Ⅰ的全局寄存器分配研究.pdf
- 16位嵌入式芯片C編譯器中寄存器分配與優(yōu)化技術(shù)研究.pdf
- 相連多寄存器組體系結(jié)構(gòu)上的寄存器分配技術(shù).pdf
- 基于嵌入式技術(shù)的傳感器信號(hào)無線傳輸解決方案的研究.pdf
- 外文翻譯---基于linux的嵌入式采集系統(tǒng)
- 基于fpga的嵌入式系統(tǒng)設(shè)計(jì)外文翻譯
- 基于fpga的嵌入式系統(tǒng)設(shè)計(jì)外文翻譯
- 手持嵌入式終端的圖像應(yīng)用解決方案.pdf
- 面向嵌入式CPU的高密度寄存器堆設(shè)計(jì)技術(shù)研究.pdf
- 32位高性能嵌入式CPU中寄存器堆模塊的設(shè)計(jì)和實(shí)現(xiàn).pdf
- 嵌入式存儲(chǔ)器的修復(fù)在ATE上的測(cè)試解決方案.pdf
- 主流嵌入式linux下gui解決方案
- 基于串聯(lián)寄存器和純輪換寄存器構(gòu)造de Bruijn序列的研究.pdf
- 基于fpga的嵌入式系統(tǒng)設(shè)計(jì)外文翻譯(中文)
- 基于ZigBee的嵌入式無線網(wǎng)關(guān)解決方案研究.pdf
- word圖片嵌入式后無法顯示完整的解決方案
- 基于fpga的嵌入式系統(tǒng)設(shè)計(jì)外文翻譯(英文)
- 基于虛擬寄存器的中間語言
評(píng)論
0/150
提交評(píng)論