mtenv.envs.hipbmdp.wrappers package¶
Submodules¶
mtenv.envs.hipbmdp.wrappers.dmc_wrapper module¶
mtenv.envs.hipbmdp.wrappers.framestack module¶
Wrapper to stack observations for single task environments.
class mtenv.envs.hipbmdp.wrappers.framestack.FrameStack(env: gym.core.Env, k: int)[source]¶
Bases: gym.core.Wrapper
Wrapper to stack observations for single task environments.
- Parameters
env (gym.core.Env) – Single task environment.
k (int) – Number of frames to stack.
reset() → numpy.ndarray[source]¶
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
step(action: Union[str, int, float, numpy.ndarray]) → Tuple[numpy.ndarray, float, bool, Dict[str, Any]][source]¶
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after the previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
Tuple[observation, reward, done, info]
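The stacking mechanism behind this wrapper can be sketched without gym: keep a deque of the last k observations and concatenate them along the first (channel) axis on every reset() and step(). This is an illustrative sketch of the technique; the class name FrameStackSketch and its methods are hypothetical stand-ins, not the library's implementation.

```python
import collections

import numpy as np


class FrameStackSketch:
    """Minimal frame-stacking logic: keeps the last k observations
    and returns them concatenated along the channel (first) axis."""

    def __init__(self, k: int):
        self.k = k
        self._frames = collections.deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # On reset, fill the buffer with k copies of the first frame
        # so the stacked observation always has the same shape.
        for _ in range(self.k):
            self._frames.append(obs)
        return self._get_obs()

    def step(self, obs: np.ndarray) -> np.ndarray:
        # Each step pushes the newest frame; the deque drops the oldest.
        self._frames.append(obs)
        return self._get_obs()

    def _get_obs(self) -> np.ndarray:
        return np.concatenate(list(self._frames), axis=0)


# Stacking k=4 frames of shape (3, 2, 2) yields shape (12, 2, 2).
stack = FrameStackSketch(k=4)
first = stack.reset(np.zeros((3, 2, 2)))
nxt = stack.step(np.ones((3, 2, 2)))
```

Padding the buffer with copies of the initial frame on reset is a common design choice: it keeps the stacked observation shape constant from the very first timestep, which downstream networks typically require.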
mtenv.envs.hipbmdp.wrappers.sticky_observation module¶
Wrapper to enable sticky observations for single task environments.
class mtenv.envs.hipbmdp.wrappers.sticky_observation.StickyObservation(env: gym.core.Env, sticky_probability: float, last_k: int)[source]¶
Bases: gym.core.Wrapper
Env wrapper that returns a previous observation with probability p and the current observation with probability 1-p. The last_k most recent observations are stored.
- Parameters
env (gym.Env) – Single task environment.
sticky_probability (float) – Probability p for returning a previous observation.
last_k (int) – Number of previous observations to store.
- Raises
ValueError – If sticky_probability is not in the range [0, 1].
reset()[source]¶
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
step(action)[source]¶
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after the previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
Tuple[observation, reward, done, info]
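The sticky-observation sampling described above can be sketched in isolation: store the last_k observations, and on each step emit a previously stored observation with probability p, otherwise the current one. The class StickyObservationSketch, its observe method, and the seeding scheme are hypothetical names for illustration, not the library's code; only the p / last_k semantics and the ValueError check mirror the documented behavior.

```python
import collections
import random


class StickyObservationSketch:
    """Minimal sticky-observation logic: with probability p, return one
    of the stored past observations instead of the current one."""

    def __init__(self, sticky_probability: float, last_k: int, seed: int = 0):
        if not 0.0 <= sticky_probability <= 1.0:
            raise ValueError("sticky_probability must be in the range [0, 1]")
        self.p = sticky_probability
        self._obs = collections.deque(maxlen=last_k)
        self._rng = random.Random(seed)

    def observe(self, current_obs):
        # Store the current observation, then decide which one to emit.
        self._obs.append(current_obs)
        if self._rng.random() < self.p and len(self._obs) > 1:
            # Pick a past observation (everything except the newest entry).
            return self._rng.choice(list(self._obs)[:-1])
        return current_obs
```

With p=0 this always returns the current observation; with p=1 it always returns a past one once any has been stored, so the first observation of an episode is passed through unchanged either way.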