
STRUCTURED REINFORCEMENT LEARNING FOR TIME-OPTIMAL QUADROTOR FLIGHT

Abstract

Synthesizing reactive, time-optimal control for quadrotors is complicated by their complex, underactuated dynamics and by the difficulty of solving boundary-value problems in real time. This work addresses these challenges with a reinforcement learning framework that learns optimal waypoint-reaching policies for autonomous navigation in collision-free environments. Our contributions are twofold. First, a cascaded actor architecture, inspired by the position-velocity separation used in classical control, improves flight stability and yields smooth actions. Second, a composite reward function with radial velocity and radial acceleration components promotes maximal progress toward targets and steers the agent toward bang-bang-like maneuvers. Quantitative comparisons show that our agent produces smooth control actions and optimal trajectories that track the desired path with minimal deviation.
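The abstract does not state the reward formula, so the following is only a minimal sketch of what "radial velocity and acceleration components" could look like. It assumes the radial component is the projection of a vector onto the unit direction from the vehicle to the waypoint, and that the two terms are combined as a weighted sum. The function names and the weights `w_vel` and `w_acc` are hypothetical and are not taken from the paper.

```python
import numpy as np


def radial_components(pos, vel, acc, target, eps=1e-9):
    """Project velocity and acceleration onto the unit vector toward the target.

    Positive values mean motion (or acceleration) toward the waypoint.
    """
    pos, vel, acc, target = (np.asarray(x, dtype=float) for x in (pos, vel, acc, target))
    to_target = target - pos
    dist = np.linalg.norm(to_target)
    direction = to_target / max(dist, eps)  # guard against division by zero at the waypoint
    v_r = float(np.dot(vel, direction))
    a_r = float(np.dot(acc, direction))
    return v_r, a_r


def composite_reward(pos, vel, acc, target, w_vel=1.0, w_acc=0.1):
    """Assumed form: weighted sum of radial velocity and radial acceleration.

    w_vel and w_acc are illustrative placeholders, not values from the paper.
    """
    v_r, a_r = radial_components(pos, vel, acc, target)
    return w_vel * v_r + w_acc * a_r
```

Under these assumptions, a vehicle flying toward the waypoint earns a positive reward, and one flying away earns a negative reward. Rewarding radial acceleration as well as radial velocity is one plausible way to encourage the full-thrust, bang-bang-like behavior the abstract describes.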

Keywords
