Example usage of the ToTorchWrapper
is demonstrated below.
import gym
import torch
# ToTorchWrapper is assumed to already be imported from this library.

env = gym.make("CartPole-v1")
env = ToTorchWrapper(env)
obs = env.reset()
print("initial obs:", obs)
action = env.action_space.sample()
# The action must be converted to a PyTorch Tensor because ToTorchWrapper expects actions as Tensors.
# Normally you would not need to do this yourself: a PyTorch NN actor outputs a Tensor by default.
action = torch.as_tensor(action, dtype=torch.float32)
stepped = env.step(action)
print("stepped once:", stepped)
print("\nEntering interaction loop! \n")
# interaction loop
obs = env.reset()
ret = 0
for i in range(100):
    action = torch.as_tensor(env.action_space.sample(), dtype=torch.float32)
    state, reward, done, _ = env.step(action)
    ret += reward
    if done:
        print(f"Random policy got {ret} reward!")
        obs = env.reset()
        ret = 0
        if i < 99:
            print("Starting new episode.")
    if i == 99:
        print(f"\nInteraction loop ended! Got reward {ret} before episode was cut off.")
        break
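To make the conversion behavior concrete, here is a minimal sketch of what a wrapper in the spirit of the ToTorchWrapper might do internally. The class name ToTorchSketch and every implementation detail below are illustrative assumptions, not the library's actual code.

import gym
import numpy as np
import torch


class ToTorchSketch(gym.Wrapper):
    """Illustrative only: return observations/rewards as Tensors, accept Tensor actions."""

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        return torch.as_tensor(np.asarray(obs), dtype=torch.float32)

    def step(self, action):
        # Convert a Tensor action back to something the underlying env understands.
        if isinstance(action, torch.Tensor):
            action = action.detach().cpu().numpy()
        if isinstance(self.action_space, gym.spaces.Discrete):
            action = int(action)  # Discrete envs like CartPole expect an integer action
        obs, reward, done, info = self.env.step(action)
        return (
            torch.as_tensor(np.asarray(obs), dtype=torch.float32),
            torch.as_tensor(reward, dtype=torch.float32),
            done,
            info,
        )

Whatever the real implementation looks like, this is the pattern that lets the examples above pass Tensors into env.step and get Tensors back.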
Note: the StateNormalizeWrapper still needs testing. At present, use the ToTorchWrapper if you need something guaranteed to work.
Here is a demonstration of using the StateNormalizeWrapper.
env = gym.make("CartPole-v1")
env = StateNormalizeWrapper(env)
obs = env.reset()
print("initial obs:", obs)
# The StateNormalizeWrapper works with plain NumPy/Python types, so there is no need to convert the action to a PyTorch Tensor.
action = env.action_space.sample()
stepped = env.step(action)
print("stepped once:", stepped)
print("\nEntering interaction loop! \n")
# interaction loop
obs = env.reset()
ret = 0
for i in range(100):
    action = env.action_space.sample()
    state, reward, done, _ = env.step(action)
    ret += reward
    if done:
        print(f"Random policy got {ret} reward!")
        obs = env.reset()
        ret = 0
        if i < 99:
            print("Starting new episode.")
    if i == 99:
        print(f"\nInteraction loop ended! Got reward {ret} before episode was cut off.")
        break
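For intuition, observation normalization is usually done by tracking a running mean and variance of the observations and returning the standardized value. The sketch below shows one way such a wrapper could work; the class name StateNormalizeSketch and the Welford-style update rule are assumptions for illustration, not the library's documented behavior.

import gym
import numpy as np


class StateNormalizeSketch(gym.Wrapper):
    """Illustrative only: standardize observations with running mean/variance."""

    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        self.eps = eps
        self.count = 0
        self.mean = np.zeros(env.observation_space.shape, dtype=np.float64)
        self.m2 = np.zeros(env.observation_space.shape, dtype=np.float64)

    def _normalize(self, obs):
        # Welford's online update of mean/variance, then standardize.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)
        var = self.m2 / max(self.count - 1, 1)
        return (obs - self.mean) / np.sqrt(var + self.eps)

    def reset(self, **kwargs):
        return self._normalize(np.asarray(self.env.reset(**kwargs), dtype=np.float64))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._normalize(np.asarray(obs, dtype=np.float64)), reward, done, info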
Note: the RewardScalerWrapper still needs testing. At present, use the ToTorchWrapper if you need something guaranteed to work.
Below is an example usage of the RewardScalerWrapper.
env = gym.make("CartPole-v1")
env = RewardScalerWrapper(env)
obs = env.reset()
print("initial obs:", obs)
action = env.action_space.sample()
stepped = env.step(action)
print("stepped once:", stepped)
print("\nEntering interaction loop! \n")
# interaction loop
obs = env.reset()
ret = 0
for i in range(100):
    action = env.action_space.sample()
    state, reward, done, _ = env.step(action)
    ret += reward
    if done:
        print(f"Random policy got {ret} reward!")
        obs = env.reset()
        ret = 0
        if i < 99:
            print("Starting new episode.")
    if i == 99:
        print(f"\nInteraction loop ended! Got reward {ret} before episode was cut off.")
        break
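A common way to scale rewards is to divide each reward by a running estimate of the standard deviation of the discounted return, so reward magnitudes stay in a stable range. The sketch below illustrates that idea; the class name RewardScalerSketch, the discount factor, and the update rule are assumptions, not necessarily what RewardScalerWrapper does.

import gym
import numpy as np


class RewardScalerSketch(gym.Wrapper):
    """Illustrative only: scale rewards by a running std of the discounted return."""

    def __init__(self, env, gamma=0.99, eps=1e-8):
        super().__init__(env)
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0   # running discounted return
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Track the discounted return and its running variance.
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        std = np.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 1.0
        scaled = reward / (std + self.eps)
        if done:
            self.ret = 0.0
        return obs, scaled, done, info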
Combining Wrappers
All of these wrappers can be composed! Just be sure to apply the ToTorchWrapper last, because the other wrappers expect NumPy arrays as input, while the ToTorchWrapper converts outputs to PyTorch Tensors. Below is an example.
env = gym.make("CartPole-v1")
env = StateNormalizeWrapper(env)
print(f"After wrapping with StateNormalizeWrapper, output is still a NumPy array: {env.reset()}")
env = RewardScalerWrapper(env)
print(f"After wrapping with RewardScalerWrapper, output is still a NumPy array: {env.reset()}")
env = ToTorchWrapper(env)
print(f"But after wrapping with ToTorchWrapper, output is now a PyTorch Tensor: {env.reset()}")
Note: the BestPracticesWrapper still needs testing. At present, use the ToTorchWrapper if you need something guaranteed to work.
Below is a usage example of the BestPracticesWrapper. It is used in the same way as the ToTorchWrapper.
env = gym.make("CartPole-v1")
env = BestPracticesWrapper(env)
obs = env.reset()
print("initial obs:", obs)
action = torch.as_tensor(env.action_space.sample(), dtype=torch.float32)
stepped = env.step(action)
print("stepped once:", stepped)
print("\nEntering interaction loop! \n")
# interaction loop
obs = env.reset()
ret = 0
for i in range(100):
    action = torch.as_tensor(env.action_space.sample(), dtype=torch.float32)
    state, reward, done, _ = env.step(action)
    ret += reward
    if done:
        print(f"Random policy got {ret} reward!")
        obs = env.reset()
        ret = 0
        if i < 99:
            print("Starting new episode.")
    if i == 99:
        print(f"\nInteraction loop ended! Got reward {ret} before episode was cut off.")
        break
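Because the BestPracticesWrapper presents the same Tensor-in, Tensor-out interface as the ToTorchWrapper, one reasonable guess (an assumption, not confirmed here) is that it bundles the wrappers shown earlier. A manual composition with the same interface would look like this:

# Hypothetical manual equivalent; BestPracticesWrapper's actual contents may differ.
env = gym.make("CartPole-v1")
env = StateNormalizeWrapper(env)   # normalize observations (NumPy in/out)
env = RewardScalerWrapper(env)     # scale rewards (NumPy in/out)
env = ToTorchWrapper(env)          # convert to/from torch Tensors; applied last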