Sunday, April 16, 2017

Landing on The Moon using AI Bot

Recently I was playing around with OpenAI Gym and the Keras Reinforcement Learning library (keras-rl). I was able to train an AI agent for the task of landing on the Moon.

OpenAI Gym provides all sorts of different environments to explore using AI bots. One of them is the lunar lander, which I'll focus on here.
Keras, on the other hand, is a high-level library built on top of TensorFlow (or Theano). It provides mechanisms for constructing deep learning models easily, and keras-rl builds reinforcement learning agents on top of it.

Creating a new environment using OpenAI Gym is as easy as this:

import gym

env = gym.make('LunarLander-v2')
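
The environment exposes its observation and action spaces, which is where the network's input shape and the number of actions used later come from. A quick check (just a sketch) looks like this:

print(env.observation_space.shape)  # (8,) - position, velocity, angle and leg-contact flags
print(env.action_space.n)           # 4 - no-op, left engine, main engine, right engine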

Here's how you can add a video recorder to capture progress during training (record_video_every, a value of your choice, is the number of episodes between recordings):

record_video_every = 50  # e.g. record every 50th episode

env = gym.wrappers.Monitor(env,
                           'recording',
                           resume=True,
                           video_callable=lambda count: count % record_video_every == 0
                           )

The model itself is a fairly simple DQN agent with a LinearAnnealedPolicy. The most important layer is the Dense 512-neuron hidden layer, which is responsible for understanding the current situation during landing. The small dense layer on top of it is responsible for the final decisions about the lunar lander's actions (steering the engines).

Here's how it can be instantiated:

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory

WINDOW_LENGTH = 1                # a single observation per state
nb_actions = env.action_space.n  # 4 discrete actions

model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())

# replay memory and epsilon-greedy exploration annealed from 1.0 down to 0.1
memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.05, nb_steps=1000000)

dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy, memory=memory, nb_steps_warmup=50000, gamma=.99, target_model_update=10000, train_interval=4, delta_clip=1.)
dqn.compile(Adam(lr=.00025), metrics=['mae'])


Here's a summary of the model:

Layer (type)                     Output Shape          Param #     Connected to
================================================================================
flatten_1 (Flatten)              (None, 8)             0           flatten_input_1[0][0]
dense_1 (Dense)                  (None, 512)           4608        flatten_1[0][0]
activation_1 (Activation)        (None, 512)           0           dense_1[0][0]
dense_2 (Dense)                  (None, 4)             2052        activation_1[0][0]
activation_2 (Activation)        (None, 4)              0          dense_2[0][0]
================================================================================
Total params: 6,660
Trainable params: 6,660
Non-trainable params: 0
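
The parameter counts follow directly from the layer sizes: the 8-dimensional observation feeds the 512-neuron layer (8 × 512 weights + 512 biases = 4,608), which in turn feeds the 4 outputs (512 × 4 + 4 = 2,052), giving 6,660 parameters in total.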


Training can take a lot of time. In this example I used 3.5 million steps.
The outcome is quite reasonable landing behaviour for the lunar lander.
Note that, depending on the environment, learning can take less time (if the environment is not very tricky). In this case I found the difficulty lies in the sensitivity of the last phase of the landing (the touch-down). It took some time for the model to figure this part out.
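
With keras-rl the training loop itself is a single call. A minimal sketch (the weights file name and log_interval are my own choices, not taken from the original run):

# train for 3.5 million steps, then keep the learned weights
dqn.fit(env, nb_steps=3500000, log_interval=10000)
dqn.save_weights('dqn_lunar_lander_weights.h5f', overwrite=True)

# watch the trained agent land for a few episodes
dqn.test(env, nb_episodes=10, visualize=True)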

As a side note, OpenAI also provides a lunar lander agent that follows an optimal trajectory. In the example here, the agent instead polls the environment and picks the actions that give the highest output of the Q function.
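
At test time, with exploration essentially switched off, that boils down to a greedy arg-max over the network's Q-values. Roughly, and assuming the model trained above, one episode looks like this:

import numpy as np

observation = env.reset()
done = False
while not done:
    # the network expects a batch of shape (1, window_length, 8)
    q_values = model.predict(observation.reshape(1, 1, -1))
    action = np.argmax(q_values[0])
    observation, reward, done, info = env.step(action)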


