Drifting

Introduction

Preface: My (Tyler's) Background and Motivation

Drifting has been a difficult problem in both optimal control and reinforcement learning. While there are many reasons why, one of the main issues is this:

💡

The drifting maneuver is entirely opposite to standard turning: the car rotates through the corner with its rear tires sliding rather than gripping, while the steering points away from the turn (countersteering) to keep it from spinning.

In addition, the maneuver is unstable. In a physical system, we call a state “unstable” if the world naturally wants to move away from it: think of standing on one leg or balancing a pencil on its tip. Without precise control throughout a drift, the car will easily spin out or come to a stop.
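To make “unstable” concrete, here's a quick standalone sketch (not from the WheeledLab codebase, and the numbers are illustrative): treat the balanced pencil as an inverted pendulum, linearize it about upright, and watch a microscopic tilt blow up.

```python
# A minimal sketch: the "pencil on its tip" is an inverted pendulum.
# Linearized about upright, theta_ddot ~= (g / L) * theta, so any tiny
# tilt grows exponentially until the pencil falls over.
g, L = 9.81, 0.15             # gravity (m/s^2), pencil length (m) -- illustrative values
theta, theta_dot = 1e-4, 0.0  # start 0.0001 rad away from perfectly upright
dt = 0.01                     # Euler integration step (s)

for step in range(101):
    if step % 20 == 0:
        print(f"t = {step * dt:.2f} s   tilt = {theta:.5f} rad")
    theta_dot += (g / L) * theta * dt  # linearized inverted-pendulum dynamics
    theta += theta_dot * dt

# The tilt roughly doubles every ~0.09 s. That exponential divergence is
# what "unstable" means: without active control, the state runs away on its own.
```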

However, instability is extremely natural for animals. In fact, we often use it to our advantage (like running and jumping). For a long time, though, robots were built to be stable at all times. That’s why they used to look like this:

Looks like it just pooped itself

Controlled instability is currently one of the many things that separates animals from robots on physical tasks. As it turns out, drifting is a perfect task for getting hands-on experience with it without breaking the bank: to execute a drift, the robot must first destabilize itself, then quickly regain control coming out of the turn. One wrong move, and it will easily spin out.

So, how do we teach the robot to drift?

Environment

This section assumes you’ve read about configclass and our config structure here: Installation, Setup & Codebase.

We’ll be referencing the MushrDriftRLEnvCfg here: WheeledLab/source/wheeledlab_tasks/wheeledlab_tasks/drifting/mushr_drift_env_cfg.py at main · UWRobotLearning/WheeledLab. You can take a quick scan yourself first, or just follow along here.

💡

Before you read about the configs below: as a challenge, think about what they might contain and how you might implement them yourself. You will likely have to change the settings to fit your pipeline.
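For orientation, here's a hedged skeleton of how a task config like this is typically put together with configclass. The class and field names below are illustrative placeholders rather than the actual WheeledLab definitions, and the import path assumes a recent Isaac Lab; the linked file has the real thing.

```python
from isaaclab.utils import configclass  # import path assumes a recent Isaac Lab

@configclass
class DriftObservationsCfg:
    """Illustrative placeholder: what the policy observes each step."""
    ...

@configclass
class DriftRewardsCfg:
    """Illustrative placeholder: weighted reward terms shaping the drift."""
    ...

@configclass
class DriftEnvCfg:
    """Top-level env config: one nested group per MDP component."""
    observations: DriftObservationsCfg = DriftObservationsCfg()
    rewards: DriftRewardsCfg = DriftRewardsCfg()
    # ...plus actions, events, terminations, and curriculum groups,
    # mirroring the sections below.
```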

Observations (Policy Input)

Actions (Policy Output)

Rewards

Events

Terminations

Curriculum

Training

References