
Reinforcement Learning experiment

The Reinforcement Learning experiment is a special type of experiment, specifically tailored for integrating RL-ready AnyLogic models with platforms that specialize in training AI brains.

Currently, the RL experiment allows for exporting AnyLogic models to Microsoft Project Bonsai. More options will be added in the future.

Please note that, as of now, the RL experiment is not designed for running ML-driven experiments directly within your AnyLogic installation. It is a tool for preparing RL-ready AnyLogic models for export from AnyLogic, so that they can then be imported into the appropriate training platforms.

To learn more about how to integrate AnyLogic into the AI agent training process, visit the Artificial Intelligence page on the AnyLogic website.

Demo model: Activity Based Cost Analysis (RL)

Model prerequisites

To prepare a model for reinforcement learning, that is, make it a valid foundation for an RL experiment, you need to make sure it meets certain requirements:

  • The model has an adjustable configuration that defines the initial state of each simulation run
  • The model is able to broadcast its current state to the AI agent
  • The model can implement the action the AI agent decides to perform

Some platforms (for example, Microsoft Project Bonsai) assume that the model’s logic contains certain points in time, upon reaching which the AI agent has to decide which action should be performed. If you decide to use such a platform for your RL process, keep in mind that the model should contain these decision points. Each decision point must be associated with an event that triggers the AI agent to take action on the model. Examples of events that can be treated as decision points are:

  • Events that occur at specific time intervals (for example, every 2 days of model time)
  • Events that correspond to certain triggers occurring during the model run (for example, statechart transitions, condition-triggered events, or actions executed by process flow blocks)

These events do not trigger the AI agent’s actions directly. For the AI agent to act on them, you have to create a decision point that is executed as a result of the event’s occurrence. See Creating a decision point below.
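For illustration, here is a hedged sketch of how such a decision point might be wired to a condition-triggered event. The trigger condition and the queue element are hypothetical, and the takeAction() call itself is explained in Creating a decision point below.

// Hypothetical Event placed inside an agent:
//   Trigger type: Condition
//   Condition:    queue.size() > 10   // hypothetical model element
//   Action (executed when the condition becomes true):
ExperimentReinforcementLearning.takeAction(this);  // ask the AI agent for the next action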

Understanding the logic behind the RL experiment

To configure your RL experiment properly, you need to understand the 3 primary concepts behind the implementation of the RL experiment in AnyLogic. All of them describe numeric values of some kind that are used in the AI agent’s training procedure.

  • Observations are key values passed to the AI agent for analysis during training: static values used by the model, the results of some calculations (raw or transformed model outputs), and so on.


  • Actions are values that the AI agent determines at each step and then assigns to the model’s variables (or passes to functions that apply the action) before the simulation run (an episode in terms of Project Bonsai) proceeds to the next step.
  • Configuration is a set of values that define the initial state of the model before the simulation run starts. The corresponding code is run during the model setup and initializes each simulation run of the RL training. At that moment, the top-level agent of the model has already been created, but the model has not started yet.

In the internal structure of the RL experiment in AnyLogic, all these values are represented as Java classes.
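For instance, the observation data can be thought of along these lines (a purely illustrative sketch; the actual generated classes and field names depend on the data fields you declare in the experiment and may differ):

// Illustrative only: a class holding hypothetical observation data fields
public class Observation {
    public double stockLevel;      // a single numeric observation
    public double[] recentDemand;  // an array-valued observation
}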

The RL experiment allows you to set up and manipulate these values within the model before starting the actual process of RL training in Project Bonsai.

To create an RL experiment

  1. In the Projects view, right-click (Mac OS: Ctrl + click) the model item and choose New > Experiment from the context menu.
  2. The New Experiment dialog box opens up.
    Select the Reinforcement Learning option in the Experiment Type list.
  3. Specify the experiment name in the Name edit box.
  4. Choose the top-level agent of the experiment from the Top-level agent drop-down list.
  5. If you want to apply model time settings from another experiment, leave the Copy model time settings from checkbox selected and choose the experiment in the drop-down list to the right.
  6. When you are done, click Finish.

The resulting experiment will appear in the Projects view. Click it to access its properties.

To export the RL-ready model and experiment

To export the model and experiment, do any of the following:

  • Select an item of the model in the Projects view and choose File > Export > Reinforcement learning from the main menu.
  • Right-click the model in the Projects view and choose Export > Reinforcement learning from the context menu.
    If the menu item appears inactive, make sure the model contains the Reinforcement Learning experiment.

The Export model wizard opens up. Select the platform you want to use for reinforcement learning in the RL platform edit box, and use the wizard to configure the necessary settings.
For more information on each specific platform, see below.

Exporting to Microsoft Bonsai

  1. Configure your RL experiment using the options available in the Observation, Action, and Configuration sections.
  2. Click Export to Microsoft Bonsai in the topmost section of the experiment’s properties.
  3. In the resulting dialog, specify the path where you want to save the ZIP file containing the RL experiment in the Destination ZIP file edit box.
    Alternatively, click Browse..., navigate to the desired folder, specify the name of the resulting ZIP file in the File name edit box, and click Save.


  4. Click Next.
  5. On the next page of the wizard, click the Bonsai platform link to open the Microsoft Project Bonsai website in the default browser of your system and follow the instructions provided there.
  6. When a ZIP file of your model is requested, click the Locate ZIP file link in the wizard to navigate to the file’s location on your computer.
  7. Click Finish to close the wizard.
  8. Proceed with the RL training on the Project Bonsai platform.

Properties

General

Name — The name of the experiment.
Since AnyLogic generates a Java class for each experiment, please follow Java naming guidelines and start the experiment’s name with an uppercase letter.

Ignore — If selected, the experiment is excluded from the model.

Export to Microsoft Bonsai — Click this link to start preparing the model for export to Microsoft Project Bonsai.
To learn more, see above.

Top-level agent — In this drop-down list, choose the top-level agent type for the experiment. The agent of this type will play the role of the root of the hierarchical tree of agents in your model.

Observation

Data fields passed to the Learning Platform on each step — Declares the variables that define the observation space.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Fill 'observation' data fields from 'root' using code edit box below.

Fill 'observation' data fields from 'root' using code — Specifies the code that associates numeric values from the model with the data fields specified above. Allows for retrieving values from the top-level agent of your model (root). You can either point directly to quantitative values in the model or point to functions that return values.
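Below is a minimal sketch of what this code might look like. It assumes two hypothetical data fields, stockLevel (double) and recentDemand (double[]), that the data fields are accessible here through the observation variable, and that the top-level agent exposes the element and function used:

// Copy values from the model (root) into the observation data fields
observation.stockLevel = root.inventory.getStockLevel();  // hypothetical embedded object and getter
observation.recentDemand = root.getRecentDemand();        // hypothetical function returning double[]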

Simulation run stop condition — Specifies the condition that stops the simulation run when it evaluates to true. This can be used, for example, to handle situations when the model starts in an undesired state, or to terminate a run when continuing the simulation would not add any value to the learning process. Upon triggering the stop condition, the simulation ends.
Allows for addressing the top-level agent’s contents by using root.
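For example, a hypothetical stop condition (assuming root contains a queue element and a totalCost variable, both of which are assumptions) could look like this:

root.queue.size() > 1000 || root.totalCost > 100000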

Action

Action data returned from Learning Platform on each step — Specifies the variables that will be passed to the model after the AI agent performs some action.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Apply 'action' data fields to the 'root' using code edit box below.

Apply 'action' data fields to the 'root' using code — Specifies the code that assigns the values calculated by the AI agent to the associated model elements. Able to interact with the top-level agent of your model (root). You can either assign the values directly to model elements or pass them to functions that apply them.
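A minimal sketch of such code, assuming a hypothetical action data field named resourceCapacity that is accessible here through the action variable, and a hypothetical setter function on the top-level agent:

// Apply the value chosen by the AI agent to the model
root.setResourceCapacity((int) action.resourceCapacity);  // hypothetical function on root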

Configuration

Configuration data returned from Learning Platform for new simulation run — Declares the variables that are passed to the model before the simulation starts.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Apply 'configuration' data fields to the 'root' using code edit box below.

Apply 'configuration' data fields to the 'root' using code — Specifies the code that assigns the values received from the learning platform to the model elements. Able to interact with the top-level agent of your model (root). You can either assign the values directly to model elements or pass them to functions that apply them.
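A minimal sketch, assuming a hypothetical configuration data field named arrivalRate that is accessible here through the configuration variable, and a parameter with the same name on the top-level agent:

// Initialize the model before the simulation run starts
root.arrivalRate = configuration.arrivalRate;  // hypothetical parameter on root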

Local training

This section of properties contains the download links that point to the software necessary for local RL training of AI agents.

Model time

Stop — Defines whether the model will Stop at specified time, Stop at specified date, or Never stop. In the first two cases, the stop time is specified using the Stop time / Stop date controls.

Start time — The initial time for the simulation time horizon.

Start date — The initial calendar date for the simulation time horizon.

Stop time — The final time for the simulation time horizon (the number of model time units for the model to run before it will be stopped).

Stop date — The final calendar date for the simulation time horizon.

Randomness

Random number generator — Here you specify whether you want to initialize the random number generator for this model randomly or with a fixed seed. This matters for stochastic models, which rely on a pseudorandom number generator. With a random seed, the generator is initialized with a different value for each model run, so the runs cannot be reproduced. With a fixed seed, the generator is initialized with the same value for each run, so the runs are reproducible. Moreover, here you can substitute the default AnyLogic RNG with your own RNG.
In most RL training scenarios, it makes more sense to use the Random seed, so that the AI agent can gain experience from an environment that exhibits its random nature. The Fixed seed might be useful for testing purposes in a simplified scenario, but it may lack the variability needed to reflect real-world conditions.

Description
Use the edit box in this section to specify an arbitrary description of the experiment.

Creating a decision point

To declare that a certain event triggers a decision point for the AI agent (that is, that the experiment step should be performed), call the ExperimentReinforcementLearning.takeAction(agent) static function, passing any model agent as the agent argument. The function uses this argument to access the top-level (Main) agent, so all RL-related data processing (for example, retrieving the observation data) is done in the context of that agent.

For example, when the event is located within a certain agent, the following code, specified as the Action of this event, would pass this:

ExperimentReinforcementLearning.takeAction(this)
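Similarly, if a process flow block should serve as the decision point, the same call can be placed into one of its action fields. A hedged example for the On exit action of a hypothetical Delay block, passing the agent that is currently leaving the block (the takeAction() function accepts any model agent):

// Hypothetical On exit action of a Delay block:
ExperimentReinforcementLearning.takeAction(agent);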
