How to train AI Game Bots with screen pixels as input

Overview of the paper “CURL: Contrastive Unsupervised Representations for Reinforcement Learning” by Laskin et al.

A big step toward building human-like game-playing AI is removing the unfair advantage that game AIs get from access to internal game state, which human players never see directly on the screen. We can do this by forcing the AI bot to use only the image shown on the screen and to make its decisions from that information alone.

In my own past experiments building a football AI bot, I have seen promising results with this approach, but I have always run into one big problem. Training RL agents on raw pixels requires a lot of compute power and thousands of training samples, because the agent has to learn not only the game itself but also how to analyze these images and extract the meaningful representations needed for good decision-making.

Here’s the core idea of the CURL framework. Consider an RL training process on images of FIFA. Normally, we feed the pixels of the game image directly to the RL process and use a CNN to process this information. Until now, it hasn’t been clear how best to train this CNN so that it makes the reinforcement learning process easier.

In this paper, the authors present a framework that adds a contrastive learning module to train this CNN encoder. It adds a separate contrastive loss to optimize the encoder’s output, which can be used either before or alongside the RL training process. This takes the image-understanding workload off the RL learning process, so fewer training samples are needed and sample efficiency improves.

Let’s try to understand, at a high level, what contrastive learning does. Consider this screenshot of FIFA as the base, or anchor, in our unsupervised learning process.

Now we present multiple options to our image encoder, one of which is derived from our anchor image. We apply data augmentation techniques such as cropping the image, flipping it horizontally or vertically, or even changing the color temperature. The encoder then has to work out which of these options is most similar to the anchor.
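To make this setup concrete, here is a minimal sketch of how an anchor and its candidate options could be built using random cropping alone. The function names (`random_crop`, `make_candidates`), the tiny 8x8 frames, and the crop size are all assumptions made for illustration, not code from the paper.

```python
import random

def random_crop(frame, size, rng):
    """Return a size x size window cut from a random position in frame.

    frame is a 2D list of pixel values (rows of equal length).
    """
    height, width = len(frame), len(frame[0])
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in frame[top:top + size]]

def make_candidates(frames, anchor_index, size, rng):
    """Build one anchor crop plus one candidate crop per frame.

    The candidate at position anchor_index is cut from the same frame as
    the anchor (the matching option); every other candidate comes from a
    different frame and acts as a distractor.
    """
    anchor = random_crop(frames[anchor_index], size, rng)
    candidates = [random_crop(frame, size, rng) for frame in frames]
    return anchor, candidates, anchor_index

rng = random.Random(0)
# Three fake 8x8 grayscale "screenshots"; frame i is filled with the value i.
frames = [[[i] * 8 for _ in range(8)] for i in range(3)]
anchor, candidates, positive = make_candidates(frames, 1, 6, rng)
```

With real screenshots, the anchor and its matching candidate would be two different crops of the same frame, so the encoder cannot solve the task by comparing pixels one by one.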

To do so correctly, our encoder learns to identify the important features in the image that help it pick the right option. With this training process, we obtain an encoder that converts our image into representations well suited for feeding into our RL agent. This is a powerful learning technique for pixel-based RL training of game agents.
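One common way to score this "pick the matching option" task is a cross-entropy loss over similarity scores, often called an InfoNCE-style loss. The sketch below assumes the encoder has already turned the anchor and each option into small vectors, and it uses a plain dot product as the similarity. Both are simplifications for illustration, and the names `dot` and `contrastive_loss` are hypothetical; the paper’s exact similarity measure and encoder setup may differ.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(query, keys, positive_index):
    """Cross-entropy over similarity scores (InfoNCE-style).

    query: encoded anchor; keys: encoded options;
    positive_index: position of the option derived from the anchor.
    """
    logits = [dot(query, key) for key in keys]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[positive_index]

# Toy 2D encodings: the first key points the same way as the query.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_when_match_is_similar = contrastive_loss(query, keys, 0)
loss_when_match_is_dissimilar = contrastive_loss(query, keys, 2)
```

The loss is lower when the matching option’s encoding is the one most similar to the anchor’s, which is the signal that pushes the encoder toward features that survive augmentation.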
