Research Focus:
- Investigate the neural network complexity required for optimal Flappy Bird performance.
- Define "perfect performance" as surpassing a handcrafted evaluation agent scoring 900/1000 over 10 runs.
Contributions:
- Development of a high-performing handcrafted agent.
- Analysis of network complexity versus performance.
Methodology Rationale:
- Opt for non-pixel-based learning to explore diverse model types beyond CNNs.
There are lots of interesting ways to extend this project, but I have to start focusing on communicating my findings instead of pursuing more discovery.
- Learns to play Flappy Bird from pixels.
- Uses the DQN from Mnih et al.
- An implementation of that approach, inspired by the work of Chen.
- Implements the Dueling DQN approach on the simplified observation space.
- Implements the PPO approach on the simplified observation space.
- Implements the Dueling DQN approach on two state representations: (1) the simplified observation space and (2) LIDAR measurements of the environment.
- Uses the simplified observation space and a Q-table to implement a perfect agent.
- Further simplifies the simplified observation space and uses a Q-table to implement a perfect agent. Inspired by the work of SarvagaVaish.
- Inspired by yenchenlin. Provides an even more mature implementation.
A winning outcome for this project is a perfect-scoring bot that uses significantly less memory than a Q-table with an entry for every state (see the back-of-envelope comparison below).
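To make that memory comparison concrete, here is a rough sketch. The bin count is an arbitrary assumption on my part; the Q-table prior work avoids this blow-up precisely by simplifying the state far below 12 features.

```python
# Back-of-envelope memory comparison. The 10 bins per feature is an arbitrary
# assumption; the Q-table prior work simplifies the state much further than
# the full 12-feature observation.

def q_table_entries(num_features=12, bins_per_feature=10, num_actions=2):
    """Entries in a dense Q-table if every feature is bucketed into bins."""
    return (bins_per_feature ** num_features) * num_actions

def mlp_parameters(layers=(12, 64, 64, 2)):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(i * o + o for i, o in zip(layers, layers[1:]))

print(f"{q_table_entries():,} Q-table entries")    # 2,000,000,000,000
print(f"{mlp_parameters():,} network parameters")  # 5,122
# At 4 bytes per value: roughly 8 TB for the table versus ~20 KB for the network.
```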
- Mobile game from 2013 that went viral
- Game mechanics: tap to flap upward, gravity pulls the bird down, and the goal is to pass through gaps between pipes without colliding.
- Simple game mechanics make it an ideal candidate for reinforcement learning tasks
- An RL environment exists: flappy-bird-gymnasium.
- Simplified observation space: 12 features, each normalized to [0, 1] (see the sketch below).
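A minimal sketch of loading the environment. I am assuming the "FlappyBird-v0" environment id and the `use_lidar` flag exposed by recent versions of flappy-bird-gymnasium; check your installed version if either differs.

```python
import flappy_bird_gymnasium  # noqa: F401 -- importing registers the env with Gymnasium
import gymnasium

# Assumption: the package registers "FlappyBird-v0" and accepts use_lidar.
env = gymnasium.make("FlappyBird-v0", use_lidar=False)
obs, info = env.reset(seed=0)
print(env.observation_space)  # expected: a 12-dimensional Box in [0, 1]
print(env.action_space)       # expected: Discrete(2) -- 0 = idle, 1 = flap

obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```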
Reinforcement learning techniques are roughly divided into two categories (see the sketch below):
- Value-based methods
  - Implicitly assume the optimal action is not a mixed strategy.
  - More sample efficient, because the value function is computed per action.
- Policy-based methods
  - Assume that there is a mixed strategy for each state that is optimal.
  - More stable, because they naturally explore more in each state instead of relying on hard-coded exploration strategies.
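A toy sketch of how the two families pick actions; this is illustrative only and not taken from any of the implementations above.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_based_action(q_values: np.ndarray, epsilon: float = 0.05) -> int:
    """Value-based control: act greedily on Q(s, a), with hard-coded
    epsilon-greedy exploration bolted on top."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def policy_based_action(action_probs: np.ndarray) -> int:
    """Policy-based control: the policy itself is a distribution over actions
    (a mixed strategy), so exploration comes from sampling it."""
    return int(rng.choice(len(action_probs), p=action_probs))

# Flappy Bird has two actions: 0 = idle, 1 = flap.
print(value_based_action(np.array([0.2, 0.7])))   # almost always 1
print(policy_based_action(np.array([0.6, 0.4])))  # 0 about 60% of the time
```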
- Best achievable average score with a score limit of 1000.
- Sample inefficiency does not count against the agent.
- How close does it get to the handcrafted agent?
- I made a handcrafted evaluation function.
- It averages approximately 900 over 10 evaluation runs with a score limit of 1000.
- I can measure the success of the reinforcement learning agents by how close they come to this handcrafted baseline (see the evaluation sketch after this list).
- How well do the other benchmarks perform?
- How well does my agent perform?
- Parameters that impacted the performance of my agent
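A sketch of an evaluation loop matching the protocol above (N runs, score capped at 1000). The `reward >= 1` check for counting a passed pipe is an assumption about the environment's reward scheme, not something verified here.

```python
import numpy as np

def evaluate(agent_act, env, num_runs=10, score_limit=1000, seed=0):
    """Run num_runs episodes, stop an episode once its score hits score_limit,
    and report the mean and standard deviation of the scores.
    agent_act(obs) -> action can be the handcrafted agent or a learned one."""
    scores = []
    for run in range(num_runs):
        obs, info = env.reset(seed=seed + run)
        score, done = 0, False
        while not done and score < score_limit:
            obs, reward, terminated, truncated, info = env.step(agent_act(obs))
            done = terminated or truncated
            if reward >= 1:  # assumption: the env rewards roughly +1 per pipe passed
                score += 1
        scores.append(score)
    return float(np.mean(scores)), float(np.std(scores))

# Example: a do-nothing baseline (always idle) scores near zero.
# mean, std = evaluate(lambda obs: 0, env)
```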
- Rough idea from Mnih et al. (2013), but incorporates improvements:
- Uses a network architecture of [12, 64, 64, 2] (see the sketch after this list).
- Double DQN
- Dueling DQN
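A sketch of what a [12, 64, 64, 2] Q-network could look like in PyTorch, with an optional dueling head. This is illustrative and not necessarily the exact architecture or code used in training.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """[12, 64, 64, 2]: 12 input features, two hidden layers of 64 units,
    and one Q-value per action (idle, flap)."""
    def __init__(self, obs_dim=12, hidden=64, num_actions=2, dueling=False):
        super().__init__()
        self.dueling = dueling
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        if dueling:
            # Dueling DQN: separate value and advantage streams recombined into Q.
            self.value = nn.Linear(hidden, 1)
            self.advantage = nn.Linear(hidden, num_actions)
        else:
            self.q_head = nn.Linear(hidden, num_actions)

    def forward(self, obs):
        h = self.trunk(obs)
        if self.dueling:
            v, a = self.value(h), self.advantage(h)
            return v + a - a.mean(dim=-1, keepdim=True)
        return self.q_head(h)

# q = QNetwork(dueling=True)
# print(q(torch.rand(1, 12)))  # Q-values for [idle, flap]
```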
Name | Mean Score (1000 runs) | Std Score (1000 runs) |
---|---|---|
Handcrafted Agent | <mean> | <std> |
dqn_flappybird_v1_1300000_steps | 20.82 | 15.83 |
Note: Change the v1 to something more descriptive; right now the name only makes sense to me.
- Original training run was 30M learning steps.
- Catastrophic forgetting occurred around 1.2M learning steps, with erratic jumps in score after that.
- Tweaked parameters, partly by random chance, to reach the 900 average score.
<Include tensorboard chart here>
The Cartesian product of the following options (enumerated in the sketch after this list):
- Double DQN
- Dueling DQN
- {Prioritized Experience Replay, Hindsight Experience Replay}
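A sketch of enumerating that Cartesian product; the config keys are placeholders, not the training script's actual arguments.

```python
from itertools import product

# Placeholder config keys -- not the training script's actual arguments.
double_dqn_options = [False, True]
dueling_options = [False, True]
replay_options = ["prioritized", "hindsight"]

configs = [
    {"double_dqn": dd, "dueling": du, "replay": rp}
    for dd, du, rp in product(double_dqn_options, dueling_options, replay_options)
]
print(len(configs))  # 2 * 2 * 2 = 8 training runs
```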
- Learn directly from pixels
- Experiment with transformer models
- Experiment with partially observable environments
- Try Hindsight Experience Replay
- Try policy-based methods
- Summary of findings
- Comparative analysis
- Future work
- Research implications
One of my biggest mistakes was starting by trying to reimplement my own Q-table approach. This wasted a lot of time and effort on problems that prior work had already solved. I thought that looking at their code would be "cheating," and I did not see that I was not adding anything new to the field by doing this. It would have been better to review all prior work and try to replicate it before building my own Q-table approach.