STRUCTURED REINFORCEMENT LEARNING FOR TIME-OPTIMAL QUADROTOR FLIGHT
Abstract
The problem of synthesizing reactive, time-optimal control for quadrotors is complicated by their multifaceted, underactuated dynamics and by the difficulty of solving boundary-value problems in real time. This work addresses these challenges with a reinforcement learning framework that learns waypoint-reaching policies for autonomous navigation in collision-free environments. Our contributions include a cascaded actor architecture, inspired by the position-velocity separation of classical control, that improves flight stability and smooths control actions, and a composite reward function with radial velocity and acceleration components that promotes maximal progress toward the target and steers the agent toward bang-bang-like maneuvers. Quantitative comparisons show that our agent produces smooth control actions and near-optimal trajectories that adhere tightly to the desired path with minimal deviation.
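The composite reward described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the weighting coefficients, and the exact form of the radial terms are assumptions; only the idea of rewarding the velocity and acceleration components projected onto the direction to the target comes from the abstract.

```python
import numpy as np

def composite_reward(pos, vel, acc, target, w_vel=1.0, w_acc=0.1):
    """Hypothetical composite reward for waypoint reaching.

    Rewards the radial (toward-target) component of velocity, i.e.
    progress toward the waypoint, plus a smaller radial-acceleration
    term intended to encourage aggressive, bang-bang-like maneuvers.
    The weights w_vel and w_acc are illustrative placeholders.
    """
    to_target = target - pos
    dist = np.linalg.norm(to_target)
    if dist < 1e-9:  # already at the waypoint; no radial direction defined
        return 0.0
    radial_dir = to_target / dist
    r_vel = float(np.dot(vel, radial_dir))  # closing speed toward waypoint
    r_acc = float(np.dot(acc, radial_dir))  # acceleration toward waypoint
    return w_vel * r_vel + w_acc * r_acc
```

Flying straight at the target at unit speed yields a positive reward, while moving away yields the negative of it, so the agent is pushed toward maximal radial progress at every step.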