#gym - Janak-Lal

July 12, 2022

Double Deep Q-Network

In double DQNs, we use a separate network, rather than the prediction network, to estimate the target. The separate network has the same structure as the prediction network. Its weights are held fixed for T episodes at a time (T is a tunable hyperparameter), which means they are updated only after every T episodes. The update is simply done by […]
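The "update only after every T episodes" schedule can be sketched roughly as follows. The excerpt is cut off before it says how the update is performed, so copying the prediction network's weights into the target network is an assumption here (it is a common choice). `TinyNet`, `sync_target`, and the fake training step are hypothetical stand-ins, not the post's code.

```python
import copy


class TinyNet:
    """Hypothetical stand-in for a Q-network: just a dict of weights."""

    def __init__(self, weights):
        self.weights = dict(weights)


def sync_target(online, target):
    # Assumption: the target network is refreshed by copying the weights.
    target.weights = copy.deepcopy(online.weights)


T = 3  # sync period in episodes (the hyperparameter T from the text)
online = TinyNet({"w": 0.0})
target = TinyNet({"w": 0.0})
sync_episodes = []

for episode in range(1, 10):
    # Stand-in for the gradient updates that happen during an episode.
    online.weights["w"] += 1.0
    # The target network stays frozen except on every T-th episode.
    if episode % T == 0:
        sync_target(online, target)
        sync_episodes.append(episode)
```

Between syncs the target network keeps its old weights, so the targets it produces do not shift while the prediction network is being trained.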

PyTorch 0 4 min read

July 10, 2022

Climbing the Mountain with Neural Network

Function Approximation: For problems with a very large number of states, it is not feasible for the agent to use a table to record the value of every action in each state and derive its policy from that table. In function approximation, the agent learns a function that approximately gives the best action for a particular state. In this example we will use […]
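To make the idea concrete, here is a minimal sketch that replaces the table with a function of the state. It uses a linear function per action purely for illustration; the post's title suggests a neural network, and the excerpt is truncated before naming the approximator, so the linear form, the weight values, and the two-feature state (position, velocity) are all assumptions.

```python
def q_values(weights, state):
    """Approximate Q(s, a) for every action as a dot product w_a . s."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, state)) for w in weights]


def greedy_action(weights, state):
    """Pick the action whose approximate value is largest."""
    qs = q_values(weights, state)
    return max(range(len(qs)), key=qs.__getitem__)


# Hypothetical weights: one row per action, one column per state feature.
weights = [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
state = (-0.5, 0.02)  # assumed (position, velocity) pair
action = greedy_action(weights, state)
```

No per-state entry is stored anywhere: the same small set of weights produces a value estimate for any state the agent encounters.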

PyTorch 0 5 min read

July 8, 2022

SARSA in the Wind

We will use the SARSA algorithm to find the optimal policy so that our agent can navigate the windy world. SARSA: State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. SARSA focuses on state-action values. It updates the Q-function based on the following equation: Q(s,a) = Q(s,a) + α (r + γ Q(s’,a’) – Q(s,a)) Here s’ […]
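The update equation above can be sketched as a single function over a tabular Q. The dictionary-based table, the default values of `alpha` and `gamma`, the state and action names, and the `done` flag (which skips bootstrapping from a terminal state) are assumptions added for illustration.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, done=False):
    """Apply one SARSA update in place and return the new Q(s, a)."""
    # Assumption: a terminal next state contributes no future value.
    target = r if done else r + gamma * Q[(s_next, a_next)]
    # Q(s,a) = Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)  # unseen state-action pairs start at 0.0
Q[("s1", "right")] = 2.0
new_value = sarsa_update(Q, "s0", "right", r=-1.0, s_next="s1", a_next="right")
```

Note that `a_next` is the action the agent actually takes in `s_next`, which is what makes SARSA an on-policy method.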

PyTorch 0 4 min read

July 7, 2022

Balancing pole with Policy Gradient

The policy gradient algorithm trains an agent by taking small steps and, at the end of an episode, updating the weights based on the rewards associated with those steps. The technique of having the agent run through an entire episode and then updating the policy based on the rewards obtained is called Monte Carlo policy gradient. The action is selected […]
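One piece of the end-of-episode update can be sketched on its own: turning the rewards collected during an episode into a return for each step. Discounting with `gamma` is an assumption here (the excerpt does not mention it), and `discounted_returns` is a hypothetical helper name.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


episode_rewards = [1.0, 1.0, 1.0]  # e.g. one reward per step the pole stays up
returns = discounted_returns(episode_rewards, gamma=0.5)
```

In a Monte Carlo policy gradient, each step's return is typically used to weight the log-probability of the action taken at that step, which is why the whole episode must finish before the weights are updated.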

PyTorch 0 3 min read

July 4, 2022

Q-Taxi

Introduction: There are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the episode starts, the taxi starts off at a random square and the passenger is at a random location. The taxi drives to the passenger’s location, picks up the passenger, drives to the passenger’s destination (another one of the four specified locations), […]

Reinforcement Learning 0 6 min read
