<p>Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning</p>
<p>Steven L. Waslander, Gabriel M. Hoffmann</p>
<p>Ph.D. Candidate, Aeronautics and Astronautics, Stanford University</p>
<p>{stevenw, gabeh}@stanford.edu</p>
<p>Jung Soon Jang, Research Associate, Aeronautics and Astronautics, Stanford University, jsjang@stanford.edu</p>
<p>Claire J. Tomlin, Associate Professor, Aeronautics and Astronautics, Stanford University, tomlin@stanford.edu</p>
<p>Abstract: The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently composed of two quadrotors, also called X4-flyers, with capacity for eight. This paper presents a comparison of control design techniques, specifically for outdoor altitude control, in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airflow induced by the four interacting rotors, classical linear techniques failed to prov</p>
<p>I. INTRODUCTION</p>
<p>As first introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and present real-world problems for further investigation. The base vehicle for STARMAC is a four-rotor aircraft with fixed-pitch blades, referred to as a quadrotor, or an X4-flyer. The vehicles are capable of 15 minute outdoor flights in a 100 m square area [1].</p>
<p>Fig. 1. One of the STARMAC quadrotors in action.</p>
<p>There have been numerous projects involving quadrotors to date, with the first known hover occurring in October 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote control versions, such as the DraganFlyer IV [3]. Many groups [4]–[7] have seen significant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi-vehicle quadrotor platform capable of autonomous outdoor flight, without tethers or motion guides.</p>
<p>The first major milestone for STARMAC was autonomous hover control, with closed loop control of attitude, altitude and position. Using inertial sensing, the attitude of the aircraft is simple to control, by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, with careful design in order to ensure spectral sep</p>
<p>Unfortunately, altitude control proves less straightforward. There are many factors that affect the altitude loop specifically that do not lend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of four rotor downwashes interacting. In our experience, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual flight revealed a noticeable loss in thrust upon descent through the highly </p>
<p>In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] takes the approach that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model based on recorded inputs and responses, without any knowledge of the underlying dynamics, and </p>
<p>II. SYSTEM DESCRIPTION</p>
<p>STARMAC consists of a fleet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. At the core of the aircraft are microcontroller circuit boards designed and assembled at Stanford for this project. The microcontrollers run real-time control code, interface with sensors and the ground station, and supervise the system.</p>
<p>The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band, providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low cost, light weight IMU that delivers 76 Hz attitude, attitude rate, and acceleration readings. The distance from the ground is found using ultrasonic ranging at 12 Hz.</p>
<p>The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual flight, when desired.</p>
<p>III. QUADROTOR DYNAMICS</p>
<p>The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body-fixed coordinates. Let {eN, eE, eD} denote the inertial axes, and {xB, yB, zB} denote the body axes, as defined in Figure 2. Euler angles of the body axes are {φ, θ, ψ} with respect to the eN, eE and eD axes, respectively, and are referred to as roll, pitch and yaw. Let r be defined as the position vector from the inertial origin to the vehicle center of gravity (CG), and let ωB be defined as the a</p>
<p>Fig. 2. Free body diagram of a quadrotor aircraft.</p>
<p>The rotors, numbered 1−4, are mounted outboard on the xB, yB, −xB and −yB axes, respectively, with position vectors ri with respect to the CG. Each rotor produces an aerodynamic torque, Qi, and thrust, Ti, both parallel to the rotor's axis of rotation, and both used for vehicle control. Here, Ti is approximated as a function of ui, the voltage applied to the motors, as determined from a load cell test. In flight, Ti can vary greatly from this approximation. The torques, Qi, are proportional to the rotor thrust, and are giv</p>
<p>The body drag force is defined as DB, vehicle mass is m, acceleration due to gravity is g, and the inertia matrix is I ∈ R^(3×3). A free body diagram is depicted in Figure 2. The total force, F, and moment, M, can be summed as,</p>
<p><b>(1)</b></p>
<p><b>(2)</b></p>
<p>The full nonlinear dynamics can be described as,</p>
<p><b>(3)</b></p>
<p>where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions by rolling moment and drag can be neglected in Equations (1) and (2). Define the total thrust as the sum of the four rotor thrusts, Ti. The translational motion is defined by,</p>
<p><b>(4)</b></p>
<p>where Rφ, Rθ, and Rψ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small angle approximation to the rotation matrices,</p>
<p><b>(5)</b></p>
<p>Finally, assuming total thrust approximately counteracts gravity, except in the eD axis,</p>
<p><b>(6)</b></p>
<p>For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second order term, ω×Iω, and expanding the thrust into its four constituents. The angular equations become,</p>
<p><b>(7)</b></p>
<p>where the moment arm length l = ||ri × zB|| is identical for all rotors due to symmetry. The resulting linear models can now be used for control design.</p>
<p>IV. ESTIMATION AND CONTROL DESIGN</p>
<p>Applying the concept of spectral separation, inner loop control of attitude and altitude is performed by commanding motor voltages, and outer loop position control is performed by commanding attitude requests for the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller design to account for thrust biases.</p>
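<p>To illustrate why an integral state helps with thrust biases, the following minimal sketch simulates one attitude axis as a normalized double integrator with a constant bias torque. It is not the authors' controller: the gains are hand-picked to place the closed-loop poles at −1, −2 and −3 rather than obtained by solving an LQR problem, and the plant, the bias value and all names (<code>simulate_attitude</code>, <code>k_int</code>, and so on) are assumptions made for illustration.</p>

```python
def simulate_attitude(k_angle=11.0, k_rate=6.0, k_int=6.0,
                      bias=0.5, ref=0.1, dt=0.001, t_end=20.0):
    """Simulate angle'' = u + bias (normalized units) and return the final angle.

    Assumed single-axis model; gains are hand-placed, not LQR-derived.
    With k_int > 0 the closed-loop polynomial is s^3 + 6 s^2 + 11 s + 6.
    """
    angle, rate, integ = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        err = angle - ref
        # State feedback on the error, its rate, and the integral of the error.
        u = -k_angle * err - k_rate * rate - k_int * integ
        integ += err * dt
        rate += (u + bias) * dt
        angle += rate * dt
    return angle


# With the integral state the bias is absorbed and the angle reaches ref.
final_with_integral = simulate_attitude()
# Without it, a steady offset of bias / k_angle remains.
final_without_integral = simulate_attitude(k_int=0.0)
```

<p>In this sketch, removing the integral gain leaves a steady-state offset of bias/k_angle, which is the kind of error the integral term is meant to remove.</p>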
<p>Position estimation is performed using a navigation filter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman filter that includes bias estimates. Integral LQR techniques are applied to the horizontal components of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6.</p>
<p>As described above, altitude control suffers exceedingly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which has frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, rejection of infeasible measurements is used to remove much of the non-Gaussian noise component. This is followed by altitude and altitude rat</p>
<p>Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying spikes, false echoes and dropouts. Powered flight commences at 185 seconds.</p>
<p>Integral Sliding Mode Control</p>
<p>A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by,</p>
<p><b>(8)</b></p>
<p>where {x1, x2} = {(rz,des − rz), (ṙz,des − ṙz)} are the altitude error states, u is the control input, and ξ(·) is a bounded model of disturbances and dynamic uncertainty. It is assumed that ξ(·) satisfies ||ξ|| ≤ γ, where γ is the upper bound on the norm of ξ(·).</p>
<p>In early attempts to stabilize this system, it was observed that LQR control was not able to address the instability and performance degradation due to ξ(g, x). Sliding Mode Control (SMC) was adapted to provide a systematic approach to the problem of maintaining stability and consistent performance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding manifold, such nice properties of SMC are not assured. In order to provide robust control th</p>
<p><b>(9)</b></p>
<p>where Kp and Kd are proportional and derivative loop gains that stabilize the linear dynamics without disturbances. For disturbance rejection, a sliding surface, s, is designed,</p>
<p><b>(10)</b></p>
<p>such that state trajectories are forced towards the manifold s = 0. Here, s0 is a conventional sliding mode design, z is an additional term that enables integral control to be included, and α, k ∈ R are positive constants. Based on a Lyapunov function candidate, V, the control component, ud, can be determined such that V̇ < 0, guaranteeing convergence to the sliding manifold.</p>
<p><b>(11)</b></p>
<p>The above condition holds if ż = −α(up + k x2) and ud can be guaranteed to satisfy,</p>
<p><b>(12)</b></p>
<p>Since the disturbances, ξ(g, x), are bounded by γ, define ud to be ud = −λs with λ ∈ R. Equation (11) becomes,</p>
<p><b>(13)</b></p>
<p>and it can be seen that λ|s| − γ > 0. As a result, for up and ud as above, the sliding mode condition holds when,</p>
<p><b>(14)</b></p>
<p>With the input derived above, the dynamics are guaranteed to evolve such that s decays to within the boundary layer of the sliding manifold. Additionally, the system does not suffer from input chatter as conventional sliding mode controllers do, as the control law does not include a switching function along the sliding mode.</p>
<p>V. REINFORCEMENT LEARNING CONTROL</p>
<p>An alternate approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is first constructed using flight data, approximating the system as a stochastic Markov process [15], [16]. Then a model-based reinforcement learning algorithm uses the model in policy-iteration to search for an optimal control policy that can be implemented on the em</p>
<p>In order to model the aircraft dynamics as a stochastic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, S(t) ∈ R^(ns), and input, u(t) ∈ R^(nu), onto the subsequent state estimate, S(t + 1).</p>
<p>In this application, the state S includes the battery level, V. In the altitude loop, the input, u ∈ R, is the total motor power. The subsequent state mapping is the summation of the traditional LWLR estimate, using the current state and input, with the random vector, v ∈ R^(ns), representing unmodeled noise. The value for v is drawn from the distribution of output error as determined by using a maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfect</p>
<p>The LWLR method [17] is well suited to this problem, as it fits a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning weights to each training data point according to its proximity to the input value, for which the output is to be computed. The technique requires a sizable set of training data in order to reflect the full dynamics of the system, which is captured from flights flown under both automatic and manually controlled thrust, with the attitud</p>
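<p>The weighting idea described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the authors' implementation: it assumes a Gaussian kernel exp(−d²/(2τ²)) for the weights, predicts a single scalar output, and keeps the weighted normal equations solvable with a small ridge term in place of any more careful numerical treatment. The names <code>lwlr_predict</code>, <code>solve_linear</code>, <code>tau</code> and <code>ridge</code> are hypothetical.</p>

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def lwlr_predict(query, inputs, outputs, tau=0.5, ridge=1e-9):
    """Locally weighted linear regression estimate of a scalar at `query`.

    inputs: list of feature lists, outputs: list of floats.
    Assumed kernel: w = exp(-d^2 / (2 tau^2)); ridge is an added safeguard.
    """
    xq = [1.0] + list(query)                  # leading 1 gives a constant offset
    rows = [[1.0] + list(x) for x in inputs]
    n = len(xq)
    xtwx = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    xtwy = [0.0] * n
    for row, y in zip(rows, outputs):
        d2 = sum((a - b) ** 2 for a, b in zip(row, xq))
        w = math.exp(-d2 / (2.0 * tau ** 2))  # nearby samples weigh more
        for i in range(n):
            xtwy[i] += w * row[i] * y
            for j in range(n):
                xtwx[i][j] += w * row[i] * row[j]
    theta = solve_linear(xtwx, xtwy)          # weighted least squares fit
    return sum(t * v for t, v in zip(theta, xq))


# Usage: samples of y = x^2 on [-2, 2]; the local fit follows the curve near x = 1.
train_x = [[-2.0 + 0.1 * i] for i in range(41)]
train_y = [x[0] ** 2 for x in train_x]
estimate = lwlr_predict([1.0], train_x, train_y, tau=0.3)
```

<p>A smaller <code>tau</code> narrows the neighborhood that influences each prediction, which trades smoothing bias for sensitivity to noise.</p>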
<p>For m training data points, the input training samples are stored in X ∈ R^(m×(ns+nu+1)), and the outputs corresponding to those inputs are stored in Y ∈ R^(m×ns). These matrices are defined as</p>
<p><b>(15)</b></p>
<p>The column of ones in X enables the inclusion of a constant offset in the solution, as used in linear regression. The diagonal weighting matrix W ∈ R^(m×m), which acts on X, has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the S(t) and u(t) for which Ŝ(t + 1) is to be computed.</p>
<p>The distance measure used in this work is</p>
<p><b>(16)</b></p>
<p>where x(i) is the ith row of X, x is the query vector formed from S(t) and u(t), and the fit parameter τ is used to adjust the range of influence of training points. The value for τ can be tuned by cross validation to prevent over- or under-fitting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm to prevent undue influence of one state on the W matrix.</p>
<p>The subsequent state estimate is computed by summing the LWLR estimate with v,</p>
<p><b>(17)</b></p>
<p>Because W is a continuous function of x and X, as x is varied, the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal matrix W; as each Wi,i is computed, it is multiplied by row x(i), and stored in WX.</p>
<p>The matrix being inverted is poorly conditioned, because weakly related data points have little influence, so their contribution cannot be accurately numerically inverted. To more accurately compute the numerical inversion, one can perform a singular value decomposition,</p>
<p>(XᵀWX) = UΣVᵀ. Then, numerical error during inversion can be avoided by using only the n singular values σi that satisfy a bound set by Cmax, where the value of Cmax is chosen by cross validation. In this work, Cmax ≈ 10 was found to minimize numerical error, and was typically satisfied by n = 1. The inverse can be directly computed using the n upper singular values in the diagonal matrix Σn ∈ R^(n×n), and the corresponding singular vectors, in Un ∈ R^(m×n) and Vn ∈ R^(m×n). Thus, the stochastic Markov model becomes</p>
<p><b>(18)</b></p>
<p>Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used,</p>
<p><b>(19)</b></p>
<p>where R: R^(2ns) → R, C1 > 0 and C2 > 0 are constants giving reward for accurate tracking and good damping respectively, and R takes as arguments the state and the reference state desired for the system.</p>
<p>The control policy maps the observed state S onto the input command u. In this work, the state space has the constraint of rz ≥ 0, and the input command has the constraint of 0 ≤ u ≤ umax. The control policy is chosen to be</p>
<p><b>(20)</b></p>
<p>where w ∈ R^(nc) is the vector of policy coefficients w1, . . . , wnc. Linear functions were sufficient to achieve good stability and performance. Additional terms, such as battery level and integral of altitude error, could be included to make the policy more resilient to differing flight conditions. Policy iteration is performed as explained in Algorithm 1. The algorithm aims to find the value of w that yields the greatest total reward Rtotal, as determined by simulating the system over a finite hori</p>
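<p>The search over policy coefficients can be sketched as a random-perturbation hill climb on a simulated rollout reward. Everything specific in this sketch is an assumption made for illustration: <code>toy_model</code> is a hypothetical stand-in for the learned stochastic model, the quadratic <code>reward</code> and the saturated linear <code>policy</code> are plausible forms rather than the paper's exact expressions, reference trajectories are reduced to constant setpoints, and a fixed iteration count replaces the convergence test.</p>

```python
import random


def toy_model(state, u, rng, dt=0.1, noise=0.01):
    """Hypothetical stochastic altitude model (not the learned LWLR model)."""
    z, vz = state
    acc = u - 1.0                          # thrust minus normalized gravity
    vz = vz + acc * dt + rng.gauss(0.0, noise)
    z = max(0.0, z + vz * dt)              # altitude constraint r_z >= 0
    return (z, vz)


def policy(state, ref, w, u_max=2.0):
    """Assumed linear policy, saturated to 0 <= u <= u_max."""
    z, vz = state
    u = w[0] + w[1] * (ref - z) + w[2] * vz
    return min(max(u, 0.0), u_max)


def reward(state, ref, c1=1.0, c2=0.1):
    """Assumed quadratic reward: tracking error plus a damping term."""
    z, vz = state
    return -c1 * (z - ref) ** 2 - c2 * vz ** 2


def total_reward(w, starts, refs, seed, t_max=100):
    rng = random.Random(seed)              # identical noise at every iteration
    total = 0.0
    for s0 in starts:
        for ref in refs:
            s = s0
            for _ in range(t_max):
                s = toy_model(s, policy(s, ref, w), rng)
                total += reward(s, ref)
    return total


def policy_search(iterations=200, step=0.2, seed=0):
    rng = random.Random(seed)
    # Fixed sets of initial states and setpoints, reused at every iteration.
    starts = [(rng.uniform(0.0, 2.0), 0.0) for _ in range(3)]
    refs = [rng.uniform(0.5, 1.5) for _ in range(3)]
    w_best = [1.0, 0.0, 0.0]               # open-loop hover as a starting guess
    r_best = total_reward(w_best, starts, refs, seed)
    r_initial = r_best
    for _ in range(iterations):
        # Gaussian perturbation of the best coefficients found so far.
        w = [wi + rng.gauss(0.0, step) for wi in w_best]
        r = total_reward(w, starts, refs, seed)
        if r > r_best:
            r_best, w_best = r, w
    return w_best, r_best, r_initial


w_best, r_best, r_initial = policy_search()
```

<p>Reusing the same seed inside <code>total_reward</code> keeps the comparison between candidate policies fair, since every candidate is scored against the same initial states, setpoints and noise sequence.</p>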
<p>Algorithm 1 Model-Based Reinforcement Learning</p>
<p>1: Generate set S0 of random initial states</p>
<p>2: Generate set T of random reference trajectories</p>
<p>3: Initialize w to reasonable values</p>
<p>4: Rbest ← −∞, wbest ← w</p>
<p>5: repeat</p>
<p>6: Rtotal ← 0</p>
<p>7: for s0 ∈ S0, t ∈ T do</p>
<p>8: S(0) ← s0</p>
<p>9: for t = 0 to tmax − 1 do</p>
<p>10: u(t) ← π(S(t), w)</p>
<p>11: S(t + 1) ← LWLR(S(t), u(t)) + v</p>
<p>12: Rtotal ← Rtotal + R(S(t + 1))</p>
<p>13: end for</p>
<p>14: end for</p>
<p>15: if Rtotal > Rbest then</p>
<p>16: Rbest ← Rtotal, wbest ← w</p>
<p>17: end if</p>
<p>18: Add Gaussian random vector to wbest, store as w</p>
<p>19: until wbest converges</p>
<p>In policy iteration, a fixed set of random initial conditions and reference trajectories is used to simulate flights at each iteration, with a given policy parameterized by w. It is necessary to use the same random set at each iteration in order for convergence to be possible [15]. After each iteration, the new value of w is stored as wbest if it outperforms the previous best policy, as determined by comparing Rtotal to Rbest, the previous best reward encountered. Then, a Gaussian random vector i</p>
<p>By using a Gaussian update rule for the policy weights, w, it is possible to escape local maxima of Rtotal. The highest probability steps are small, and result in refinement of a solution near a local maximum of Rtotal. However, if the algorithm is not at the global maximum, and is allowed to continue, there exists a finite probability that a sufficiently large Gaussian step will be performed such that the algorithm can keep ascending.</p>
<p>VI. FLIGHT TEST RESULTS</p>
<p>A. Integral Sliding Mode</p>
<p>The results of an outdoor flight test with ISM control can be seen in Figure 4. The response time is on the order of 1-2 seconds, with 5 seconds settling time, and little to no steady state offset. Also, an oscillatory character can be seen in the response, which is most likely being triggered by the nonlinear aerodynamic effects and sensor data spikes described earlier.</p>
<p>Fig. 4. Integral sliding mode step response in outdoor flight test.</p>
<p>Compared to linear control design techniques implemented on the aircraft, the ISM control proves a significant enhancement. By explicitly incorporating bounds on the unknown disturbance forces in the derivation of the control law, it is possible to maintain stable altitude on a system that has evaded standard approaches.</p>
<p>B. Reinforcement Learning Control</p>
<p>One of the most exciting aspects of RL control design is its ease of implementation. The policy iteration algorithm arrived at the implemented control law after only 3 hours on a Pentium IV computer. Figure 5 presents flight test results for the controller. The high fidelity model of the system, used for RL control design, provides a useful tool for comparison of the RL control law with other controllers. In fact, in simulation with linear controllers that proved unstable on the quadrotor, flight p</p>
<p>The locally weighted linear regression model showed many relations that were not reflected in the linear model, but that reflect the physics of the system well. For instance, with all other states held fixed, an upward velocity results in more acceleration at the subsequent time step for a given throttle level, and a downward velocity yields the opposite effect. This is essentially negative damping. The model also shows a strong ground effect. That is, with all other states held fixed, the closer the vehi</p>
<p>Fig. 5. Reinforcement learning controller response to manually applied step input, in outdoor flight test. Spikes in state estimates are from sensor noise passing through the Kalman filter.</p>
<p>The reinforcement learning control law is susceptible to system disturbances for which it is not trained. In particular, varying battery levels and blade degradation may cause a reduction in stability or steady state offset. Addition of an integral error term to the control policy may prove an effective means of mitigating steady state di</p>
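<p>For comparison with the disturbance sensitivity noted here, the integral sliding mode law of Section IV can be exercised in a small simulation. The sketch below is an interpretation, not the authors' code: it assumes error dynamics x1' = x2, x2' = u + ξ with a constant disturbance ξ, a PD nominal law up = −Kp x1 − Kd x2, a sliding variable s = α(x2 + k x1) + z with ż = −α(up + k x2), and ud = −λs. The form of s0, the gain values and the function name <code>simulate</code> are assumptions.</p>

```python
def simulate(use_ism, kp=4.0, kd=4.0, alpha=1.0, k=1.0, lam=20.0,
             disturbance=1.0, dt=0.001, t_end=10.0):
    """Return the final altitude error x1 under a constant disturbance.

    Assumed error dynamics: x1' = x2, x2' = u + disturbance.
    """
    x1, x2 = 1.0, 0.0
    z = -alpha * (x2 + k * x1)             # start on the manifold, s(0) = 0
    for _ in range(int(t_end / dt)):
        u_p = -kp * x1 - kd * x2           # nominal PD law
        s = alpha * (x2 + k * x1) + z      # sliding variable with integral term
        u_d = -lam * s if use_ism else 0.0 # continuous law, no switching term
        z += -alpha * (u_p + k * x2) * dt
        x1, x2 = x1 + x2 * dt, x2 + (u_p + u_d + disturbance) * dt
    return x1


final_error_ism = simulate(use_ism=True)   # disturbance is cancelled by u_d
final_error_pd = simulate(use_ism=False)   # offset of disturbance / kp remains
```

<p>Under these assumptions s obeys ṡ = α(−λs + ξ), so ud settles near −ξ and the error follows the nominal PD dynamics, whereas the PD law alone is left with an offset of ξ/Kp.</p>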