mtenv.envs.control package

Submodules

mtenv.envs.control.acrobot module

class mtenv.envs.control.acrobot.Acrobot[source]

Bases: mtenv.envs.control.acrobot.MTAcrobot

The original acrobot environment in the MTEnv fashion

Main class for multitask RL Environments.

This abstract class extends the OpenAI Gym environment and adds support for returning task-specific information from the environment. The observation returned by the single-task environment is encoded as env_obs (environment observation), while the task-specific observation is encoded as task_obs (task observation). The observation returned by mtenv is a dictionary of env_obs and task_obs. Since this class extends OpenAI Gym, the mtenv API looks similar to the gym API.

import mtenv
env = mtenv.make('xxx')
env.reset()

Any multitask RL environment class should extend/implement this class.

Parameters
  • action_space (Space) –

  • env_observation_space (Space) –

  • task_observation_space (Space) –
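
Building on the description above, a minimal usage sketch (assuming Acrobot can be constructed without arguments, which this page does not state explicitly):

from mtenv.envs.control.acrobot import Acrobot

env = Acrobot()
env.seed(1)       # environment RNG
env.seed_task(1)  # task RNG
env.set_task_state(env.sample_task_state())  # select a task before resetting (ordering assumed)
obs = env.reset()
# obs is a dictionary with the two keys described above
env_obs = obs["env_obs"]    # the usual single-task observation
task_obs = obs["task_obs"]  # the task descriptor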

sample_task_state()[source]

Sample a task_state.

task_state contains all the information that the environment needs to switch to any other task.

Subclasses extending this class should ensure that the task seed is set (by calling seed_task(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_task_seed_is_set().

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType
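
As a sketch of the seeding contract stated above (the seed value is illustrative):

env.seed_task(42)                     # must precede sampling
task_state = env.sample_task_state()  # reproducible given the seed
env.set_task_state(task_state)        # switch to the sampled task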

class mtenv.envs.control.acrobot.MTAcrobot[source]

Bases: mtenv.core.MTEnv

An acrobot environment with varying characteristics. The task descriptor is composed of values between -1 and +1 that are mapped to the acrobot's physical characteristics by the self._mu_to_vars function.

Main class for multitask RL Environments.

This abstract class extends the OpenAI Gym environment and adds support for returning task-specific information from the environment. The observation returned by the single-task environment is encoded as env_obs (environment observation), while the task-specific observation is encoded as task_obs (task observation). The observation returned by mtenv is a dictionary of env_obs and task_obs. Since this class extends OpenAI Gym, the mtenv API looks similar to the gym API.

import mtenv
env = mtenv.make('xxx')
env.reset()

Any multitask RL environment class should extend/implement this class.

Parameters
  • action_space (Space) –

  • env_observation_space (Space) –

  • task_observation_space (Space) –

MAX_VEL_1 = 15.707963267948966
MAX_VEL_2 = 34.55751918948772
action_arrow = None
actions_num = 3
book_or_nips = 'book'

Use the dynamics equations from the NIPS paper or from the book.

domain_fig = None
dt = 0.2
get_task_obs()[source]

Get the current value of task observation.

The environment returns the task observation every time we call step or reset. This function is useful when the user wants to access the task observation without acting in (or resetting) the environment.

Returns

The current value of the task observation.

Return type

TaskObsType
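
For instance, to read the task descriptor without stepping or resetting (a sketch, reusing the env from the examples above):

task_obs = env.get_task_obs()  # no step() or reset() required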

get_task_state()[source]

Return all the information needed to execute the current task again.

This function is useful when we want to set the environment to a previous task.

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType
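
A sketch of the save-and-restore pattern this enables:

saved = env.get_task_state()                 # snapshot the current task
env.set_task_state(env.sample_task_state())  # move to a different task
env.set_task_state(saved)                    # later, return to the saved task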

metadata = {'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 15}
reset()[source]

Reset the environment to some initial state and return the observation in the new state.

Subclasses extending this class should ensure that the environment seed is set (by calling seed(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_env_seed_is_set().

Returns

For more information on the multitask observation returned by the environment, refer to MultiTask Observation.

Return type

ObsType

sample_task_state()[source]

Sample a task_state.

task_state contains all the information that the environment needs to switch to any other task.

Subclasses extending this class should ensure that the task seed is set (by calling seed_task(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_task_seed_is_set().

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType

seed(env_seed)[source]

Set the seed for the environment’s random number generator.

Invoke seed_task to set the seed for the task’s random number generator.

Parameters

env_seed (Optional[int], optional) – Defaults to None.

Returns

The list of seeds used in the environment’s random number generator. The first value in the list is the seed to pass to this method for reproducibility.

Return type

List[int]
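
A sketch of using the returned seed list to reproduce the environment’s randomness (assuming MTAcrobot is constructible without arguments):

env = MTAcrobot()
seeds = env.seed(7)   # seeds[0] is the value to reuse
env2 = MTAcrobot()
env2.seed(seeds[0])   # env2 now draws the same random numbers as env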

seed_task(task_seed)[source]

Set the seed for the task’s random number generator.

Invoke seed to set the seed for the environment’s random number generator.

Parameters

task_seed (Optional[int], optional) – Defaults to None.

Returns

The list of seeds used in the task’s random number generator. The first value in the list is the seed to pass to this method for reproducibility.

Return type

List[int]

set_task_state(task_state)[source]

Reset the environment to a particular task.

task_state contains all the information that the environment needs to switch to any other task.

Parameters

task_state (TaskStateType) – For more information on task_state, refer to Task State.

step(a)[source]

Execute the action in the environment.

Parameters

a (ActionType) –

Returns

Tuple of multitask observation, reward, done, and info. For more information on the multitask observation returned by the environment, refer to MultiTask Observation.

Return type

StepReturnType
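
Putting the pieces together, a rollout sketch using the tuple described above (assumes the action space follows the standard gym API):

env.seed(0)
env.seed_task(0)
env.set_task_state(env.sample_task_state())
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # obs is again a dict with env_obs and task_obs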

torque_noise_max = 0.0
mtenv.envs.control.acrobot.bound(x, m, M=None)[source]
Parameters
  • x – scalar

  • m – scalar lower bound, or a length-2 vector (lower, upper)

  • M – scalar upper bound (ignored when m is a vector)

Either pass m and M as scalars, so bound(x, m, M) returns x clipped so that m <= x <= M, or pass m as a length-2 vector, so bound(x, m) returns x clipped so that m[0] <= x <= m[1] (M is ignored in this case).
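
For example (illustrative values; bound clips rather than wraps):

bound(5.0, 0.0, 1.0)    # -> 1.0, scalar form
bound(5.0, (0.0, 1.0))  # -> 1.0, length-2 vector form; M is ignored
bound(0.5, 0.0, 1.0)    # -> 0.5, already in range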

mtenv.envs.control.acrobot.rk4(derivs, y0, t, *args, **kwargs)[source]

Integrate a 1D or ND system of ODEs using 4th-order Runge-Kutta. This is a toy implementation which may be useful if you find yourself stranded on a system without scipy. Otherwise use scipy.integrate.

Parameters
  • derivs – returns the derivative of the system and has the signature dy = derivs(yi, ti)

  • y0 – initial state vector

  • t – sample times

  • args – additional arguments passed to the derivative function

  • kwargs – additional keyword arguments passed to the derivative function

Example 1:

## 2D system
def derivs6(x, t):
    d1 = x[0] + 2 * x[1]
    d2 = -3 * x[0] + 4 * x[1]
    return (d1, d2)

dt = 0.0005
t = arange(0.0, 2.0, dt)
y0 = (1, 2)
yout = rk4(derivs6, y0, t)

Example 2:

## 1D system
alpha = 2
def derivs(x, t):
    return -alpha * x + exp(-t)

y0 = 1
yout = rk4(derivs, y0, t)

If you have access to scipy, you should probably be using the scipy.integrate tools rather than this function.

mtenv.envs.control.acrobot.wrap(x, m, M)[source]
Parameters
  • x – a scalar

  • m – minimum possible value in range

  • M – maximum possible value in range

Wraps x so that m <= x <= M; but unlike bound(), which truncates, wrap() wraps x around the coordinate system defined by m, M.

For example, with m = -180 and M = 180 (degrees), x = 360 -> returns 0.
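
Continuing the degrees example (illustrative values):

wrap(360, -180, 180)   # -> 0, wrapped around the range
wrap(190, -180, 180)   # -> -170
bound(190, -180, 180)  # -> 180, truncated instead of wrapped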

mtenv.envs.control.cartpole module

class mtenv.envs.control.cartpole.CartPole[source]

Bases: mtenv.envs.control.cartpole.MTCartPole

The original cartpole environment in the MTEnv fashion

Main class for multitask RL Environments.

This abstract class extends the OpenAI Gym environment and adds support for returning task-specific information from the environment. The observation returned by the single-task environment is encoded as env_obs (environment observation), while the task-specific observation is encoded as task_obs (task observation). The observation returned by mtenv is a dictionary of env_obs and task_obs. Since this class extends OpenAI Gym, the mtenv API looks similar to the gym API.

import mtenv
env = mtenv.make('xxx')
env.reset()

Any multitask RL environment class should extend/implement this class.

Parameters
  • action_space (Space) –

  • env_observation_space (Space) –

  • task_observation_space (Space) –

sample_task_state()[source]

Sample a task_state.

task_state contains all the information that the environment needs to switch to any other task.

Subclasses extending this class should ensure that the task seed is set (by calling seed_task(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_task_seed_is_set().

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType

class mtenv.envs.control.cartpole.MTCartPole[source]

Bases: mtenv.core.MTEnv

A cartpole environment with varying physical values (see the self._mu_to_vars function)

Main class for multitask RL Environments.

This abstract class extends the OpenAI Gym environment and adds support for returning task-specific information from the environment. The observation returned by the single-task environment is encoded as env_obs (environment observation), while the task-specific observation is encoded as task_obs (task observation). The observation returned by mtenv is a dictionary of env_obs and task_obs. Since this class extends OpenAI Gym, the mtenv API looks similar to the gym API.

import mtenv
env = mtenv.make('xxx')
env.reset()

Any multitask RL environment class should extend/implement this class.

Parameters
  • action_space (Space) –

  • env_observation_space (Space) –

  • task_observation_space (Space) –

get_task_obs()[source]

Get the current value of task observation.

The environment returns the task observation every time we call step or reset. This function is useful when the user wants to access the task observation without acting in (or resetting) the environment.

Returns

The current value of the task observation.

Return type

TaskObsType

get_task_state()[source]

Return all the information needed to execute the current task again.

This function is useful when we want to set the environment to a previous task.

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType

metadata = {'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 50}
reset(**args)[source]

Reset the environment to some initial state and return the observation in the new state.

Subclasses extending this class should ensure that the environment seed is set (by calling seed(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_env_seed_is_set().

Returns

For more information on the multitask observation returned by the environment, refer to MultiTask Observation.

Return type

ObsType

sample_task_state()[source]

Sample a task_state.

task_state contains all the information that the environment needs to switch to any other task.

Subclasses extending this class should ensure that the task seed is set (by calling seed_task(int)) before invoking this method, for reproducibility. This can be done by invoking self.assert_task_seed_is_set().

Returns

For more information on task_state, refer to Task State.

Return type

TaskStateType
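
A sketch of sampling several distinct cartpole tasks, each mapping to different physical values via self._mu_to_vars (assuming MTCartPole is constructible without arguments):

from mtenv.envs.control.cartpole import MTCartPole

env = MTCartPole()
env.seed_task(3)
tasks = [env.sample_task_state() for _ in range(5)]  # five task descriptors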

seed(env_seed)[source]

Set the seed for the environment’s random number generator.

Invoke seed_task to set the seed for the task’s random number generator.

Parameters

env_seed (Optional[int], optional) – Defaults to None.

Returns

The list of seeds used in the environment’s random number generator. The first value in the list is the seed to pass to this method for reproducibility.

Return type

List[int]

seed_task(task_seed)[source]

Set the seed for the task’s random number generator.

Invoke seed to set the seed for the environment’s random number generator.

Parameters

task_seed (Optional[int], optional) – Defaults to None.

Returns

The list of seeds used in the task’s random number generator. The first value in the list is the seed to pass to this method for reproducibility.

Return type

List[int]

set_task_state(task_state)[source]

Reset the environment to a particular task.

task_state contains all the information that the environment needs to switch to any other task.

Parameters

task_state (TaskStateType) – For more information on task_state, refer to Task State.

step(action)[source]

Execute the action in the environment.

Parameters

action (ActionType) –

Returns

Tuple of multitask observation, reward, done, and info. For more information on the multitask observation returned by the environment, refer to MultiTask Observation.

Return type

StepReturnType
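
Finally, a sketch of a simple multi-task loop over MTCartPole that resamples a task each episode (hypothetical seeds and episode count):

env = MTCartPole()
env.seed(0)
env.seed_task(0)
for episode in range(3):
    env.set_task_state(env.sample_task_state())  # new physics each episode
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())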

mtenv.envs.control.setup module

Module contents