From gymECS to openAI gym and vice versa
(Note: updates since the previous post)
Since the previous post, we have updated the examples, providing a maze-with-goal game (gymecs_examples/maze/maze_with_goal.py) as well as a multi-agent version of it (gymecs_examples/maze/multiagent_maze_with_goal.py). Take a look at them to better understand how to implement more complex systems with ECS.
The Multi-agent Maze with Goal game: multiple (red) players have to reach the (green) goal before the other players.
Game-centric versus Agent-centric
One main difference between gymECS and openAI gym is that, while gym is agent-centric and gives access to the state of the world from the agent's point of view, gymECS is system-centric/game-centric and gives access to any information that describes the state of the world. In that sense, gymECS is much more general than gym, since it exposes all the data describing the complete state of the system at each timestep. One nice consequence is that, with gymECS, it is possible to put neural networks anywhere in the dynamical system, to replace players, physics, etc., while gym is limited to the agent. We will demonstrate the interest of the system-centric approach in the From gymECS to openAI gym section. But let us first show how a gym environment can be transformed into an ECS game.
From gym to gymECS
Since gymECS is more general than gym, building a bridge from gym to gymECS is straightforward: any gym environment can easily be converted to a gymECS game.
For that, we just need to implement components and entities describing the information computed by the gym environment.
Note: as a first step, we updated the Component and Entity classes to incorporate _structureclone and _deepclone methods to facilitate the cloning of data (see core.py).
ECS Components and Entities
To describe the state of a gym environment, we use two entities: Agent, which contains the observation, reward and action of the agent (the agent-centric information), and Game, which contains the state of the game, including whether the game is finished or not.
The agent information is defined as follows:
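Since the original snippet is not reproduced here, the sketch below illustrates the idea with plain dataclasses standing in for the Component and Entity classes of core.py; all names and fields are assumptions, not the actual gymecs API.

```python
from dataclasses import dataclass
from typing import Any

# Stand-in for the gymecs Agent entity; in gymecs these fields would be
# Component subclasses from core.py. Names are illustrative assumptions.
@dataclass
class Agent:
    observation: Any = None   # last observation returned by the environment
    reward: float = 0.0       # last reward received
    action: Any = None        # action chosen by the player system
```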
The components and entity to describe an agent in a gym environment
For the game state, we include the done information, but also the timestep of the game. Since the ECS exposes all the data, the gym.Env itself is also contained in the World, as a GameEnv component, which makes it usable by any system.
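As a hedged sketch in the same stand-in style (the actual components may differ), the game entity groups a GameState component with a GameEnv component holding the gym.Env itself:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class GameState:              # global game information
    done: bool = False        # whether the episode is finished
    timestep: int = 0         # current timestep of the game

@dataclass
class GameEnv:                # exposes the wrapped gym.Env to the systems
    env: Any = None

@dataclass
class Game:                   # stand-in for the gymecs Game entity
    state: GameState = field(default_factory=GameState)
    game_env: GameEnv = field(default_factory=GameEnv)
```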
The components and entity to describe a game state
ECS System
To update the state of our game, we need to define a Step system. This system reads the action information and executes it in the environment to update the world.
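A minimal sketch of such a Step system, reusing the Agent and Game stand-ins above and the classic gym step API (observation, reward, done, info):

```python
class Step:
    """Stand-in Step system: reads the agent's action, executes it in the
    wrapped gym.Env, and writes the results back into the entities."""
    def __call__(self, agent: Agent, game: Game) -> None:
        obs, reward, done, _ = game.game_env.env.step(agent.action)
        agent.observation = obs
        agent.reward = reward
        game.state.done = done
        game.state.timestep += 1
```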
The Step system to execute one step of the environment
ECS Game
The resulting game can then be defined as follows.
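A sketch with assumed names (the real gymecs.Game API may differ): the game creates the env, holds the two entities, and runs the Step system.

```python
import gym

class GymGame:
    """Stand-in for a gymecs.Game wrapping an arbitrary gym.Env."""
    def __init__(self, env_name: str = "CartPole-v1"):
        self.game = Game(game_env=GameEnv(env=gym.make(env_name)))
        self.agent = Agent()
        self._step = Step()

    def reset(self) -> None:
        # classic gym API: reset() returns the first observation
        self.agent = Agent(observation=self.game.game_env.env.reset())
        self.game.state = GameState()

    def step(self) -> None:
        self._step(self.agent, self.game)
```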
The Game capturing a gym.Env
Playing with the game
To test the game, we need to define a system modeling the player. In our case, it is a simple random player with 2 actions.
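A sketch of such a random player system, assuming two discrete actions (e.g. CartPole):

```python
import random

class RandomPlayer:
    """Player system: writes a random action into the Agent entity."""
    def __call__(self, agent: Agent) -> None:
        agent.action = random.randint(0, 1)   # 2 discrete actions assumed
```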
A simple random player system
The final loop to test our game is the following:
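Under the same assumptions as the sketches above, it could look like this:

```python
game = GymGame("CartPole-v1")
player = RandomPlayer()
game.reset()
while not game.game.state.done:
    player(game.agent)   # the player system chooses an action
    game.step()          # the Step system advances the gym.Env
    print(game.game.state.timestep, game.agent.reward)
```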
The main loop to test the game
A few words
Casting any gym environment to a gymecs.Game is very simple, and we provide a generic wrapper. A similar wrapper can easily be made for other interfaces, like DeepMind Lab for instance. gymecs thus provides a unified API for dynamical systems, making my life much easier! But gymECS can also represent dynamical systems much more complex than what agent-centric frameworks allow.
From gymECS to openAI gym
The reverse path, from the ECS to gym, is the most interesting one. Indeed, as stated before, gymECS is system-centric while many RL frameworks, including openAI gym, are agent-centric. But in a game, we may want to control different things, not only a single agent: a bot, a part of the game logic, multiple bots at once, etc. gymECS allows that, but openAI gym does not.
To move from the system-centric to the agent-centric point of view (from gymECS to gym), we need to specify what the agent is in the game, what its observations and reward are, etc. It means that one gymECS game can be transformed into multiple gym environments, depending on what we decide the agent to be.
The Maze Game
To convert our simple maze game to a gym environment, we first define the following class and abstract methods:
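The sketch below only illustrates the shape of such a wrapper; all method names are assumptions rather than the actual togym.py code.

```python
import gym

class GameAsGymEnv(gym.Env):
    """Sketch of a gym.Env over a gymecs game. The abstract methods define
    what the 'agent' is: its observation, its reward, how the episode ends,
    and how its action is written into the world."""
    def __init__(self, game):
        self._game = game

    def _observation(self):         # build the agent's observation
        raise NotImplementedError

    def _reward(self):              # compute the agent's reward
        raise NotImplementedError

    def _done(self):                # decide whether the episode is over
        raise NotImplementedError

    def _set_action(self, action):  # write the gym action into the world
        raise NotImplementedError

    def reset(self):
        self._game.reset()          # assumed game API
        return self._observation()

    def step(self, action):
        self._set_action(action)
        self._game.step()           # assumed game API
        return self._observation(), self._reward(), self._done(), {}
```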
The gym.Env class to capture a game as a gym environment (see complete code in togym.py)
Then, the maze game can be matched to a gym environment by subclassing this class and implementing its abstract methods:
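A sketch of the subclass; the attribute paths into the maze game are assumptions:

```python
import numpy as np

class MazeEnv(GameAsGymEnv):
    """The observation is the (x, y) position of the player."""
    def _observation(self):
        player = self._game.world.player     # assumed accessor
        return np.array([player.x, player.y], dtype=np.float32)

    def _reward(self):
        return 1.0 if self._game.world.player.at_goal else 0.0  # assumed flag

    def _done(self):
        return self._game.world.player.at_goal

    def _set_action(self, action):
        self._game.world.player.action = action
```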
Casting our Maze as a gym environment. The observation will be the X,Y position of the agent.
To execute this example: python gymecs/gymecs/togym.py
The MultiAgent Maze Game - Single Agent point of view
Let us now take the multi-agent maze game as an example. In this game (gymecs_examples/maze/multiagent_maze_with_goal.py), there are multiple agents trying to reach the goal. So there are multiple ways to convert this game to a gym environment: maybe we want to focus on one of the agents, maybe we want to learn to control all the agents at once, etc.
First case: we focus on a single agent. In that case, the implementation is as follows (see gymecs_examples/maze/gym_multiagemt_maze_with_goal_singleagent_pointofview.py):
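A sketch, assuming the world exposes a list of agents; only agent agent_index is driven from the outside:

```python
import numpy as np

class SingleAgentMazeEnv(GameAsGymEnv):
    """Expose a single agent of the multi-agent maze through gym."""
    def __init__(self, game, agent_index: int = 0):
        super().__init__(game)
        self._k = agent_index

    def _observation(self):
        a = self._game.world.agents[self._k]   # assumed accessor
        return np.array([a.x, a.y], dtype=np.float32)

    def _reward(self):
        return 1.0 if self._game.world.agents[self._k].at_goal else 0.0

    def _done(self):
        return any(a.at_goal for a in self._game.world.agents)

    def _set_action(self, action):
        self._game.world.agents[self._k].action = action
```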
Casting our Multiagent Maze as a gym environment focusing on a single agent
In addition, we have to take care of who is in charge of controlling the other agents. To do that, we can put the dynamics of the other agents directly in the game:
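One way to sketch this is to wrap the game so that the non-controlled agents are driven by internal player systems at each step (all names are assumptions):

```python
class GameWithBots:
    """Wrap the multi-agent game: the other agents act through internal
    bot systems before each game step."""
    def __init__(self, game, bot_systems):
        self._game = game
        self._bots = bot_systems   # one player system per bot agent

    @property
    def world(self):
        return self._game.world    # assumed game attribute

    def reset(self):
        self._game.reset()

    def step(self):
        for bot in self._bots:     # the other agents act inside the game
            bot(self._game.world)
        self._game.step()
```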
Putting the other players' dynamics in the game
The main function is thus:
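A hedged sketch of that main function; make_multiagent_maze and make_bot_systems are hypothetical helpers standing in for the game construction done in the example file:

```python
import random

def main():
    game = GameWithBots(make_multiagent_maze(), make_bot_systems())  # hypothetical helpers
    env = SingleAgentMazeEnv(game, agent_index=0)
    env.reset()
    done = False
    while not done:
        _, reward, done, _ = env.step(random.randint(0, 3))  # 4 maze moves assumed
```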
The main function for the single-agent point of view
The MultiAgent Maze Game - All Agents point of view
But maybe we want to learn to control all the agents simultaneously, in a synchronous way. In that case, we can also adapt the gymECS game to take control of all the agents, providing a vector of actions to the resulting gym environment (see gymecs_examples/maze/gym_multiagemt_maze_with_goal_allagents_pointofview.py):
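A sketch of the all-agents version: the gym action becomes a vector of per-agent actions, and the observation stacks all agent positions (this layout is an assumption):

```python
import numpy as np

class AllAgentsMazeEnv(GameAsGymEnv):
    """Control every agent at once through a single gym interface."""
    def _observation(self):
        return np.array([[a.x, a.y] for a in self._game.world.agents],
                        dtype=np.float32)

    def _reward(self):
        # assumed: reward 1 as soon as any agent reaches the goal
        return 1.0 if any(a.at_goal for a in self._game.world.agents) else 0.0

    def _done(self):
        return any(a.at_goal for a in self._game.world.agents)

    def _set_action(self, actions):
        for agent, action in zip(self._game.world.agents, actions):
            agent.action = action
```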
Casting our Multiagent Maze as a gym environment controlling all the agents
The main function is simply:
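And a sketch of it under the same assumptions (make_multiagent_maze is again a hypothetical helper):

```python
import random

def main():
    env = AllAgentsMazeEnv(make_multiagent_maze())   # hypothetical helper
    env.reset()
    done = False
    while not done:
        actions = [random.randint(0, 3)              # 4 maze moves assumed
                   for _ in env._game.world.agents]
        _, reward, done, _ = env.step(actions)
```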
The main function for the all-agents point of view
Conclusion
Building bridges between existing frameworks and gymECS is easy. But one interesting property is that gymECS is game-centric and thus much less restricted than openAI gym, letting anyone define which aspect of the game he/she wants to work on. As argued previously with the SALINA library, reinforcement learning has provided the environment/agent formalism, which is in fact very restricted.
I advocate considering that a dynamical system (or an ECS) is a combination of many dynamics functions, and that reinforcement learning is one potential set of algorithms to learn one or more of these functions (the ones usually called the agent in RL) while keeping the other functions fixed (the environment). This can be seen as a pointless semantic debate but, actually, I think that considering the objects we manipulate as dynamical systems/ECS instead of agent+environment opens many interesting directions. In that view, learning the environment dynamics, the physics, the bots, the agent, the rendering, etc. is all the same thing, and it makes everything much simpler.
The next post will be about 3D rendering.