I encountered an error while using the MaskablePPO action-masking algorithm from SB3-Contrib:
File ~\anaconda3\lib\site-packages\sb3_contrib\common\maskable\distributions.py:231, in MaskableMultiCategoricalDistribution.apply_masking(self, masks)
    228 masks = th.as_tensor(masks)
    230 # Restructure shape to align with logits
--> 231 masks = masks.view(-1, sum(self.action_dims))
    233 # Then split columnwise for each discrete action
    234 split_masks = th.split(masks, tuple(self.action_dims), dim=1)
RuntimeError: shape '[-1, 1600]' is invalid for input of size 800
I am running a learning program whose action space is a MultiBinary space with 800 binary (0 or 1) selections.
The action space is defined as follows:
self.action_space = spaces.MultiBinary(800)
Within the custom environment class, I created an "action_mask" method that returns a list of 800 boolean values.
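A minimal sketch of the shape arithmetic, as far as I can tell from the traceback: `sum(self.action_dims)` is 1600 for 800 binary actions, which suggests the mask needs two entries per bit (one for choosing 0, one for choosing 1) rather than one. The helper `expand_binary_mask` below is hypothetical and not part of SB3-Contrib. It assumes that each of my 800 booleans means "setting this bit to 1 is allowed", that choosing 0 is always allowed, and that the entries are ordered as [allow 0, allow 1] per bit, which I inferred from the columnwise split in the traceback.

```python
import numpy as np

N_BITS = 800  # size of the MultiBinary action space in the question


def expand_binary_mask(allow_one):
    """Turn one boolean per bit into two booleans per bit: [allow 0, allow 1]."""
    allow_one = np.asarray(allow_one, dtype=bool)
    # Assumption: choosing 0 is always legal for every bit.
    allow_zero = np.ones_like(allow_one)
    # Interleave so that entries 2*i and 2*i + 1 both belong to bit i.
    return np.stack([allow_zero, allow_one], axis=1).reshape(-1)


# Example: only the first 10 bits may be set to 1.
per_bit = np.zeros(N_BITS, dtype=bool)
per_bit[:10] = True

flat = expand_binary_mask(per_bit)
print(flat.shape)                          # (1600,)
print(flat.reshape(-1, 2 * N_BITS).shape)  # (1, 1600)
```

With a mask of this length, the reshape to `[-1, 1600]` that fails in the traceback would at least have a compatible input size; whether the semantics match what the library expects is something I have not confirmed.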
When I follow the documentation and start training the model, the error above appears. The training script is:
import os
import time

from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.utils import get_action_masks

from Equities_RL_Env import Equities_RL_Env

models_dir = "models/V1 31-Jul"
logdir = f"logs/{time.strftime('%d %b %Y %H-%M', time.localtime())}"

# Create the output directories if they do not exist yet
os.makedirs(models_dir, exist_ok=True)
os.makedirs(logdir, exist_ok=True)

# Normalize_frame, historical_frame and pf are not shown here
env = Equities_RL_Env(Normalize_frame(historical_frame), pf)
env.reset()

model = MaskablePPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)

TIMESTEPS = 1000
iters = 0
while iters <= 1000000:
    iters += 1
    # Continue training without resetting the timestep counter, then checkpoint
    model.learn(total_timesteps=TIMESTEPS, reset_num_timesteps=False, tb_log_name="PPO")
    model.save(f"{models_dir}/{TIMESTEPS * iters}")
Is there a way to define the expected mask shape (1600 rather than 800) within the custom environment?