Machine Learning Unicycling

Teaching a computer to ride a simulated unicycle


This project is a combination of two of my passions: programming and unicycling. I used the Keras reinforcement learning library to train a machine learning model to ride a unicycle. I published the code in this GitHub repo.

Overall Structure

This project uses the Gym library. This library is designed to provide a common interface between different machine learning models (the controller), and physical models of the thing being controlled (the plant).

The controller is a Keras-rl model, which uses a TensorFlow backend.

Each frame, Gym tells my Keras model the current state of the unicycle. The Keras model makes a decision (push on the left pedal or the right). That action is fed into the physics model, which updates the state for the next frame. The new state is then fed back into Keras, along with a reward telling Keras how well it's doing.
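The per-frame loop above can be sketched as follows. This is a minimal sketch only: the environment is a stub rather than the real unicycle physics, the agent picks random actions instead of the Keras-rl model, and all names are illustrative, not taken from the repo.

```python
import random

class StubUnicycleEnv:
    """Stand-in for the custom Gym environment (same reset/step interface)."""

    def reset(self):
        self.frames = 0
        return [0.0, 0.0, 0.0, 0.0]       # initial state vector

    def step(self, action):
        self.frames += 1
        state = [0.0, 0.0, 0.0, 0.0]      # physics update would go here
        reward = 1.0                      # reward for surviving this frame
        done = self.frames >= 10          # episode ends when the unicycle falls
        return state, reward, done, {}

env = StubUnicycleEnv()
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.randrange(7)          # the Keras model would choose here
    state, reward, done, info = env.step(action)
    total_reward += reward
```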


Unicycles are a type of inverted pendulum. These problems are well suited to machine learning. (Yes, this can be solved with traditional PID controllers and fuzzy logic. I chose machine learning as something fun and educational.)

The physics engine used is a simple custom script which evaluates the inverted pendulum problem. It is a modified version of the CartPole-v0 environment from the Gym library.

Unicycling is a lot like balancing a pencil on your finger. If the pencil starts to fall forwards, you accelerate your finger forwards. However, unlike pencil balancing, this project is only in 1 dimension.

There are 4 states:

There is 1 input. The options are:

  1. push on the left pedal - hard
  2. push on the left pedal - medium
  3. push on the left pedal - soft
  4. don't push on either pedal
  5. push on the right pedal - soft
  6. push on the right pedal - medium
  7. push on the right pedal - hard
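A discrete action space like this is typically mapped to a signed force on the wheel. The sketch below shows one way to do that; the magnitudes and the function name are hypothetical, not taken from the repo.

```python
# Hypothetical mapping from the 7 discrete actions (0-6) to a signed pedal
# force: negative = left pedal, positive = right pedal, zero = don't push.
PEDAL_FORCES = [-10.0, -5.0, -1.0, 0.0, 1.0, 5.0, 10.0]

def action_to_force(action: int) -> float:
    """Translate the model's chosen action index into a pedal force."""
    return PEDAL_FORCES[action]
```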

Machine Learning

The layers of the neural network are published here.

The environment code for the model normalises all the states so that they range from -1 to 1. This is because neural networks perform best when the inputs are the same order of magnitude.
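The normalisation described above is a simple linear rescale. A minimal sketch, assuming each state variable has known bounds (the helper name and bounds are illustrative):

```python
def normalise(value: float, low: float, high: float) -> float:
    """Linearly rescale value from the range [low, high] to [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0
```

For example, a wheel velocity bounded at ±5 rad/s maps 0 to 0, 5 to 1, and -5 to -1.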

The horizontal position of the unicycle is fed in as 3 separate variables. There is the total rotation of the wheel from the centre position. Additionally there is the sine and cosine of the wheel rotation. This is because the physical model is inherently trigonometric. Pushing down on horizontal pedals results in more torque than when the pedals are diagonal or vertical. Rather than force the neural network to figure out this relationship, I guided it by calculating the sine and cosine for it.

The reward function contains several components. The Keras-rl CartPole-v0 example simply returns 1 for each frame the pole hasn't fallen over. Because the unicycle model is more complicated, I found this wasn't sufficient. So I added more components to the reward function to help guide it towards success:
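A shaped reward of this kind might look like the sketch below. The specific components here (an uprightness bonus, a small velocity penalty) are hypothetical illustrations of reward shaping, not the actual terms used in the repo.

```python
import math

def shaped_reward(lean_angle: float, wheel_velocity: float, fallen: bool) -> float:
    """Hypothetical shaped reward: survival bonus plus extra guidance terms."""
    if fallen:
        return 0.0
    r = 1.0                              # base reward for surviving the frame
    r += math.cos(lean_angle)            # bonus for staying upright
    r -= 0.01 * abs(wheel_velocity)      # small penalty for frantic pedalling
    return r
```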


There is a lot of unsubstantiated hype about what machine learning can do, including the misnomer of "artificial intelligence". As technologists it's our job to ensure normal people don't get carried away with the hype.

One reason machine learning is well suited to unicycling is that the problem has a small action space, is completely deterministic, and is easy to assess. Anyone can tell you whether a unicyclist fell over or not. As a counter-example, using machine learning to flag terrorist propaganda or copyright infringement on the web is never going to work well, because humans can't agree on what counts and what doesn't. (example, example, example) How could we possibly train a computer to classify things based on categories we can't agree on? Some problems simply can't be solved by machine learning. Trying to throw machine learning at everything results in war criminals escaping punishment, and criminals being imprisoned for longer just because they're from a poor postcode. Another example of a problem which is foolishly being 'solved' by machine learning is adding people to the no-fly list based on postcode.

Machine learning is useful for some problems. Not all. Throwing CPUs and data at a problem isn't a guarantee for success.