Learning social laws in a multi-agent environment


Roy Ganz
Yoni Keselbrener


Reinforcement Learning project for the CRML lab at the Technion - Israel Institute of Technology


Project goal

We use Reinforcement Learning to learn social laws in a multi-agent environment. These laws enable autonomous cars to cross intersections without accidents.


Unity and ml-agents

We used Unity to model our environment and to run simulations. In addition, we used the ml-agents toolkit, which trains intelligent agents with Reinforcement Learning through a simple API. ml-agents connects the simulation environment to the learning algorithm.


Reinforcement Learning-based algorithm

Reinforcement Learning learns a policy that maps the agent's state space to its action space, namely, which action the agent should take in each possible state in order to maximize its reward.

State Space - Describes the agent's current state. In our project, the state contains the agent's observations (using Unity RayPerception) and the agent's velocity, rotation and direction to the target. The state vector size is 44.

Action Space - Contains all the possible actions that the agent can take. In our project, the agent can change its velocity and rotation. The action space vector size is 5 (Add velocity, Subtract Velocity, Turn left, Turn right and Don't change anything).
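The five discrete actions listed above can be sketched as a small lookup, shown here in Python for illustration only. The index order, the step sizes `VELOCITY_STEP` and `ROTATION_STEP`, and the `apply_action` helper are hypothetical and are not taken from the project's code.

```python
from enum import IntEnum


class Action(IntEnum):
    # Index order is an assumption; the project only states that there are 5 actions.
    ADD_VELOCITY = 0
    SUBTRACT_VELOCITY = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3
    NO_CHANGE = 4


# Hypothetical step sizes, chosen only to make the example concrete.
VELOCITY_STEP = 0.5   # velocity change per action
ROTATION_STEP = 5.0   # degrees turned per action


def apply_action(velocity, rotation, action):
    """Return the (velocity, rotation) pair after applying one discrete action."""
    action = Action(action)
    if action is Action.ADD_VELOCITY:
        velocity += VELOCITY_STEP
    elif action is Action.SUBTRACT_VELOCITY:
        velocity -= VELOCITY_STEP
    elif action is Action.TURN_LEFT:
        rotation = (rotation - ROTATION_STEP) % 360.0
    elif action is Action.TURN_RIGHT:
        rotation = (rotation + ROTATION_STEP) % 360.0
    # Action.NO_CHANGE leaves both values untouched.
    return velocity, rotation
```

For example, `apply_action(1.0, 0.0, Action.ADD_VELOCITY)` raises the velocity by one step and leaves the rotation unchanged.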

Reward Function - The learned policy tries to maximize the cumulative reward over an episode. We defined a large positive reward for reaching the target, a large negative reward for a collision and a small negative reward for each step (to encourage reaching the target quickly).
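A minimal Python sketch of a reward with this shape follows. The numeric values, the function names, and the choice to end the episode on a collision or on reaching the target are all assumptions made for illustration; the project does not state them.

```python
# Hypothetical reward magnitudes; only their signs and relative sizes follow the text.
REACH_TARGET_REWARD = 1.0    # large positive reward
COLLISION_PENALTY = -1.0     # large negative reward
STEP_PENALTY = -0.001        # small negative reward per step


def step_reward(reached_target, collided):
    """Return (reward, episode_done) for a single simulation step."""
    if collided:
        return COLLISION_PENALTY, True    # assumed: a collision ends the episode
    if reached_target:
        return REACH_TARGET_REWARD, True  # assumed: reaching the target ends it too
    return STEP_PENALTY, False


def episode_return(events):
    """Sum the rewards over a list of (reached_target, collided) step events."""
    total = 0.0
    for reached_target, collided in events:
        reward, done = step_reward(reached_target, collided)
        total += reward
        if done:
            break
    return total
```

Under these assumed values, an agent that reaches the target sooner accumulates fewer step penalties and therefore a higher return, which is the incentive the per-step penalty is meant to create.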

NN architecture

In this project we used a fully connected Neural Network architecture. The input dimension is 44, matching the state space size, and the output dimension is 5, matching the action space size. In between, we added 2 hidden layers with 256 neurons each.
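To make the layer sizes concrete, here is a dependency-free Python sketch of a 44-256-256-5 fully connected forward pass. It is not the network that ml-agents builds internally, which may include additional components. The ReLU activation, the random initialization, and the helper names `init_params`, `forward` and `count_params` are assumptions for illustration.

```python
import math
import random

# Layer sizes from the text: 44 inputs, two hidden layers of 256, 5 outputs.
LAYER_SIZES = [44, 256, 256, 5]


def init_params(sizes, seed=0):
    """Create random weights and zero biases for each fully connected layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # simple uniform init (assumed)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, state):
    """Map a 44-dimensional state to 5 action scores."""
    x = list(state)
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumed)
    return x


def count_params(sizes):
    """Count weights plus biases across all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


params = init_params(LAYER_SIZES)
state = [random.Random(1).uniform(-1.0, 1.0) for _ in range(44)]
scores = forward(params, state)
# Greedy choice: index of the highest-scoring action.
action = max(range(len(scores)), key=scores.__getitem__)
```

Assuming one bias per neuron and no extra heads, these layer sizes give 44·256 + 256 + 256·256 + 256 + 256·5 + 5 = 78,597 trainable parameters, which is what `count_params(LAYER_SIZES)` computes.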


Visual results

Social law #1: slow down and yield

Social law #2: stick to the left lane


Learning results
