#Reinforcement Learning

September 2, 2025

Teaching Robots to Slide Objects: TQC Reinforcement Learning on FetchSlideDense-v4

In this tutorial, you will learn how to train a Truncated Quantile Critics (TQC) agent on the FetchSlideDense-v4 environment using Stable-Baselines3 and Gymnasium Robotics.

Python 0 12 min read

August 19, 2025

Mastering Autonomous Parking with SAC and HER: A Deep Reinforcement Learning Guide

Reinforcement Learning (RL) is revolutionizing autonomous driving, and one of the key challenges is autonomous parking—a complex task requiring precise control and decision-making. In this blog, we’ll explore how Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER) can train an RL agent to master parking in the highway-env environment.

Python 0 7 min read

August 14, 2025

Solving the MuJoCo HumanoidStandup Task with PPO and Stable-Baselines3

We learn to solve the MuJoCo HumanoidStandup problem using PPO with Stable-Baselines3.

Python 0 9 min read

September 27, 2024

Solving Bipedal Walker Hardcore Challenge with Soft Actor-Critic Algorithm

We will learn to solve the Bipedal Walker Hardcore challenge with the Soft Actor-Critic algorithm.

Python 0 17 min read

September 18, 2024

Bipedal Walker with PPO: A Step-by-Step Guide to Solving the RL Challenge

In this tutorial we will learn how to master a Bipedal Walker with PPO (Proximal Policy Optimization).

Python 0 12 min read

September 8, 2024

Master Snake Game AI with PPO: Step-by-Step Guide (Part II)

In this second part, we will learn about the major components of PPO for the AI agent.

Python 0 11 min read

September 7, 2024

Master Snake Game AI with PPO: Step-by-Step Guide (Part I)

This is the first part of a two-part tutorial. Here we learn to build the Snake game. In part two, we will learn to build a PPO agent to play it.

PyTorch 0 6 min read

July 16, 2024

PPO Implementation in PyTorch

In this blog post, we will explore the Proximal Policy Optimization (PPO) algorithm. We’ll compare it to other deep reinforcement learning algorithms like Double Deep Q-learning and TRPO. Additionally, we’ll learn how to implement PPO using PyTorch.

Python 0 16 min read

July 1, 2024

Prioritized Experience Replay Using PyTorch

An introduction to Prioritized Experience Replay and its implementation in PyTorch.

Python 0 11 min read

June 26, 2024

Policy Gradient Implementation Using PyTorch

This is an implementation of the Policy Gradient algorithm using PyTorch.

Python 0 8 min read

June 20, 2024

Gaussian Double Deep Q Learning

An implementation of a Gaussian Double Deep Q-Network with PyTorch.

Python 0 7 min read

June 19, 2024

MoG-DQN Implementation In PyTorch To Solve Lunar Lander

This is an implementation of MoG-DQN using PyTorch.

Python 0 10 min read

June 13, 2024

Understanding Implicit Quantile Networks in Reinforcement Learning

IQN is a state-of-the-art RL algorithm that focuses on predicting the full distribution of returns rather than just the mean. This approach provides a more comprehensive understanding of the value of actions, allowing for better decision-making in uncertain environments.

Python 0 8 min read

June 9, 2024

Double DQN For Lunar Lander With PyTorch

In this blog post, we will implement Double DQN using PyTorch to solve the Lunar Lander environment from OpenAI Gym.

Python 0 9 min read

August 24, 2022

Acrobot-ing with Actor Critic

Solving the Acrobot problem with the Actor-Critic algorithm.

PyTorch 0 5 min read

July 12, 2022

Double Deep Q-Network

In double DQNs, we use a separate network, rather than the prediction network, to estimate the target. The separate network has the same structure as the prediction network, and its weights are held fixed for T episodes at a time (T is a hyperparameter we can tune), which means they are only updated after every T episodes. The update is simply done by […]

PyTorch 0 4 min read
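The periodic update described in the Double Deep Q-Network excerpt above can be sketched in a few lines. The excerpt is truncated before it says how the update is done, so the plain weight copy used here is an assumption (it is a common choice), and `TargetSync`, `sync_every`, and `end_episode` are hypothetical names chosen for illustration. Weights are represented as a plain dictionary rather than a PyTorch module to keep the sketch dependency-free.

```python
import copy


class TargetSync:
    """Keeps a frozen copy of the prediction weights, refreshed every T episodes.

    Assumption: the refresh is a straight copy of the prediction weights.
    """

    def __init__(self, online_weights, sync_every):
        self.online = online_weights                  # weights being trained
        self.target = copy.deepcopy(online_weights)   # frozen copy used for targets
        self.sync_every = sync_every                  # the hyperparameter T

    def end_episode(self, episode):
        """Call once per finished episode; returns True if the target was refreshed."""
        if episode % self.sync_every == 0:
            self.target = copy.deepcopy(self.online)
            return True
        return False


sync = TargetSync({"w": [1.0]}, sync_every=3)
sync.online["w"][0] = 5.0          # training changes the prediction weights
refreshed_early = sync.end_episode(1)   # target stays frozen
refreshed_at_t = sync.end_episode(3)    # target is refreshed at episode T
```

Between refreshes the target weights do not follow the prediction weights, which is what keeps the learning target stable for T episodes.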

July 10, 2022

Climbing the Mountain with Neural Network

Function Approximation: For problems with a very large number of states, it is not feasible for our agent to use a table to record the value of every action in each state and derive its policy from that table. With function approximation, the agent learns a function that approximately gives the best action for a particular state. In this example we will use […]

PyTorch 0 5 min read

July 8, 2022

SARSA in the Wind

We will use the SARSA algorithm to find the optimal policy so that our agent can navigate in a windy world. SARSA: State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. SARSA focuses on state-action values. It updates the Q-function based on the following equation: Q(s,a) ← Q(s,a) + α (r + γ Q(s’,a’) – Q(s,a)) Here s’ […]

PyTorch 0 4 min read
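The SARSA update quoted in the excerpt above maps directly onto a tabular implementation. This is a minimal sketch, not the post's own code: the function name `sarsa_update`, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """Apply one SARSA step to the table q in place and return the new Q(s, a).

    Implements Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)).
    """
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Q-table with a default value of 0.0 for unseen state-action pairs.
q = defaultdict(float)
q[("s1", "right")] = 2.0

# One transition: in s0 take "right", receive -1, land in s1, and pick "right" again.
new_value = sarsa_update(q, "s0", "right", r=-1.0, s_next="s1", a_next="right")
```

Because the target uses the action `a_next` that the agent actually selects in the next state, the update is on-policy.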

July 7, 2022

Balancing pole with Policy Gradient

The policy gradient algorithm trains an agent by taking small steps and updating the weights based on the rewards associated with those steps at the end of an episode. The technique of having the agent run through an entire episode and then updating the policy based on the rewards obtained is called Monte Carlo policy gradient. The action is selected […]

PyTorch 0 3 min read
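In the Monte Carlo setting described in the policy gradient excerpt above, the rewards of a finished episode are usually turned into per-step returns before the policy is updated. The sketch below shows only that step. The use of a discount factor `gamma` and the function name `discounted_returns` are assumptions, since the excerpt does not say how rewards are aggregated.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Rewards collected over one complete three-step episode.
episode_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

Each return would then weight the log-probability of the action taken at that step when the policy weights are updated at the end of the episode.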

July 4, 2022

Q-Taxi

Introduction: There are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the episode starts, the taxi starts off at a random square and the passenger is at a random location. The taxi drives to the passenger’s location, picks up the passenger, drives to the passenger’s destination (another one of the four specified locations), […]

Reinforcement Learning 0 6 min read