The Reinforcement Learning experiment is a special type of experiment tailored for integrating RL-ready AnyLogic models with platforms that specialize in training AI brains.
Currently, the RL experiment allows for exporting AnyLogic models to Microsoft Project Bonsai. More options will be added in the future.
Please note that as of now, the RL experiment is not designed for running ML-driven experiments directly within your AnyLogic installation. It is simply a tool for preparing RL-ready AnyLogic models for export, so that they can then be imported into the appropriate platforms.
To learn more about how to integrate AnyLogic into the AI agent training process, visit the Artificial Intelligence page on the AnyLogic website.
Demo model: Activity Based Cost Analysis (RL)
To prepare a model for reinforcement learning, that is, make it a valid foundation for an RL experiment, you need to make sure it meets certain requirements:
- The model has an adjustable configuration as the initial state of each simulation run
- The model is able to broadcast its current state to the AI agent
- The model can implement the action the AI agent decides to perform
Some platforms — for example, Microsoft Project Bonsai — assume that the model’s logic contains certain points in time at which the AI agent has to make a decision and take some action. If you decide to use such a platform for your RL process, keep in mind that the model should contain such decision points. These decision points are associated with events that trigger the AI agent to take action on the model. Examples of events that can be treated as decision points include:
- Events that occur at specific time intervals (for example, every 2 days of model time)
- Events that correspond to certain triggers occurring during the model run (for example, statechart transitions, condition-based events, or actions executed by process flow blocks)
These events do not trigger the AI agent actions directly. For the AI agent to act on them, you have to create a decision point that gets executed as the result of the event’s occurrence. See Creating a decision point below.
To configure your RL experiment properly, you need to understand three primary concepts behind the implementation of the RL experiment in AnyLogic. All of them describe numeric variables of some kind that are used in the AI agent’s training procedure.
- Observations are key values passed to the AI agent for analysis during training: static values used by the model, the results of some calculations (raw or transformed model outputs), and so on.
- Actions are values that the AI agent determines at each step and then assigns to the model’s variables (or passes to functions that take the action) before the simulation (an episode, in Project Bonsai terms) proceeds with the next step.
- Configuration is a set of values that define the initial state of the model before the simulation run starts. The corresponding code is run on the model’s setup and initializes each simulation run of the RL training. At this moment, the top-level agent of the model has already been created, but the model has not started yet.
In the internal structure of the RL experiment in AnyLogic, all these values are represented as Java classes.
The RL experiment allows you to set up and manipulate these values within the model before starting the actual process of RL training in Project Bonsai.
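For example, in a hypothetical inventory model (all names below are purely illustrative and not taken from any demo model), the three groups of data fields might be declared as follows:
- Observation: double stockLevel, double totalCost, double[] recentDemand (the values the AI agent sees on each step)
- Action: double orderRate (the value the AI agent sets on each step)
- Configuration: double initialStock, int demandIntensity (the values applied before each run starts)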
To create an RL experiment
- In the Projects view, right-click (Mac OS: Ctrl + click) the model item and choose New > Experiment from the context menu.
- The New Experiment dialog box opens up. Select the Reinforcement Learning option in the Experiment Type list.
- Specify the experiment name in the Name edit box.
- Choose the top-level agent of the experiment from the Top-level agent drop-down list.
- If you want to apply model time settings from another experiment, leave the Copy model time settings from checkbox selected and choose the experiment in the drop-down list to the right.
- When done, click Finish.
The resulting experiment will appear in the Projects view. Click it to access its properties.
To export the RL-ready model and experiment
To export the model and experiment, do any of the following:
- Select an item of the model in the Projects view and choose File > Export > Reinforcement learning from the main menu.
- Right-click the model in the Projects view and choose Export > Reinforcement learning from the context menu.
If the menu item appears inactive, make sure the model contains the Reinforcement Learning experiment.
- The Export model wizard opens up. Select the platform you want to use for reinforcement learning in the RL platform edit box, and use the wizard to configure the necessary settings.
For more information on each specific platform, see below.
Exporting to Microsoft Bonsai
- Configure your RL experiment using the options available in the Observation, Action, and Configuration sections.
- Click Export to Microsoft Bonsai in the topmost section of the experiment’s properties.
- In the resulting dialog, specify the path where you want to save the ZIP file containing the RL experiment in the Destination ZIP file edit box. Alternatively, click Browse..., go to the desired folder, specify the name of the resulting ZIP file in the File name edit box, and click Save.
- Click Next.
- On the next page of the wizard, click the Bonsai platform link to open the Microsoft Project Bonsai website in the default browser of your system and follow the instructions provided there.
- When a ZIP file of your model is requested, click the Locate ZIP file link in the wizard to navigate to the file’s location on your computer.
- Click Finish to close the wizard.
- Proceed with the RL training on the Project Bonsai platform.
- General
-
Name — The name of the experiment.
Since AnyLogic generates a Java class for each experiment, please follow Java naming guidelines and start the experiment’s name with an uppercase letter.
Ignore — If selected, the experiment is excluded from the model.
Export to Microsoft Bonsai — Click this link to start preparing the model for export to Microsoft Project Bonsai.
To learn more, see above.
Top-level agent — In this drop-down list, choose the top-level agent type for the experiment. The agent of this type will act as the root of the hierarchical tree of agents in your model.
- Observation
-
Data fields passed to the Learning Platform on each step — Declares the variables that define the observation space.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Fill 'observation' data fields from 'root' using code edit box below.
Fill 'observation' data fields from 'root' using code — Specifies the code that associates numeric values from the model with the data fields specified above (see the example below). Allows for retrieving values from the top-level agent of your model (root). You can either point directly to quantitative values in the model or point to functions that return values.
Simulation run stop condition — Specifies the condition that stops the simulation’s execution when it evaluates to true. This can be used, for example, to handle situations when the model starts in an undesired state, or to terminate a run when continuing the simulation will not add any value to the learning process. Upon triggering the stop condition, the simulation ends.
Allows for addressing the top-level agent’s contents by using root.
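For example, a minimal sketch of the Fill 'observation' data fields from 'root' using code field, assuming the hypothetical data fields named above and a top-level agent that exposes a stockLevel variable, a totalCost variable, and a getRecentDemand() function (none of these belong to a specific demo model):
observation.stockLevel = root.stockLevel;
observation.totalCost = root.totalCost;
observation.recentDemand = root.getRecentDemand();
For the same hypothetical model, a stop condition could be an expression such as root.totalCost > 100000, ending runs that no longer add value to the training.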
- Action
-
Action data returned from Learning Platform on each step — Specifies the variables that will be passed to the model after the AI agent performs some action.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Apply 'action' data fields to the 'root' using code edit box below.
Apply 'action' data fields to the 'root' using code — Specifies the code that assigns the values calculated by the AI agent to the associated model elements (see the example below). Allows for interacting with the top-level agent of your model (root). You can either point directly to quantitative values in the model or point to functions that accept these values.
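Continuing the hypothetical inventory example, a minimal sketch of the Apply 'action' data fields to the 'root' using code field, assuming the top-level agent provides an applyOrderRate() function:
root.applyOrderRate(action.orderRate);
Assigning the value directly to a variable of the top-level agent, for example root.orderRate = action.orderRate;, works as well.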
- Configuration
-
Configuration data returned from Learning Platform for new simulation run — Declares the variables that are passed to the model before the simulation starts.
The following types of variables are supported: int, double, double[]. To assign values to these variables, use the Apply 'configuration' data fields to the 'root' using code edit box below.
Apply 'configuration' data fields to the 'root' using code — Specifies the code that assigns the values received from the learning platform to the model elements (see the example below). Allows for interacting with the top-level agent of your model (root). You can either point directly to quantitative values in the model or point to functions that accept these values.
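For the same hypothetical model, a minimal sketch of the Apply 'configuration' data fields to the 'root' using code field, assuming initialStock and demandIntensity parameters in the top-level agent:
root.initialStock = configuration.initialStock;
root.demandIntensity = configuration.demandIntensity;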
- Local training
-
This section of properties contains the download links that point to the software necessary for local RL training of AI agents.
- Model time
-
Stop — Defines whether the model will Stop at specified time, Stop at specified date, or Never stop. In the first two cases, the stop time is specified using the Stop time / Stop date controls.
Start time — The initial time for the simulation time horizon.
Start date — The initial calendar date for the simulation time horizon.
Stop time — The final time for the simulation time horizon (the number of model time units for the model to run before it will be stopped).
Stop date — The final calendar date for the simulation time horizon.
- Randomness
-
Random number generator — Here you specify whether the random number generator for this model is initialized randomly or with some fixed seed. This matters for stochastic models: with a random seed, the model runs cannot be reproduced, since the random number generator is initialized with a different value for each run; with a fixed seed, the generator is initialized with the same value for each run, so the model runs are reproducible. Moreover, here you can substitute the default AnyLogic RNG with your own RNG.
In most RL training scenarios, it makes more sense to use the Random seed, so that the AI agent can gain experience from an environment that exhibits its random nature. A fixed seed might be useful for testing purposes in a simplified scenario, but may lack the variability needed to reflect real-world conditions.
- Random seed (unique simulation runs) — If selected, the seed value of the random number generator is random. In this case, the random number generator is initialized with a different value for each model run, and the model runs are unique (non-reproducible).
- Fixed seed (reproducible simulation runs) — If selected, the seed value of the random number generator is fixed (specify it in the Seed value field). In this case, the random number generator is initialized with the same value for each model run, and the model runs are reproducible.
-
Custom generator (subclass of Random) — If for any reason you are not satisfied with the quality of the default random number generator Random, you can substitute it with your own. Just prepare your custom RNG (it should be a subclass of the Java class Random, e.g. MyRandom), choose this option, and type an expression returning an instance of your RNG in the field on the right, for example: new MyRandom() or new MyRandom(1234).
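A minimal sketch of such a custom generator, using the illustrative MyRandom name from the example above:
import java.util.Random;

public class MyRandom extends Random {
    public MyRandom() { super(); }
    public MyRandom(long seed) { super(seed); }
    // Override next(int bits), or individual methods such as nextDouble(),
    // to plug in your own generation algorithm instead of the default one.
}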
- Description
- Use the edit box in this section to specify an arbitrary description of the experiment.
To declare that a certain event triggers a decision point for the AI agent (that is, that an experiment step should be performed), call the ExperimentReinforcementLearning.takeAction(agent) static function, passing any model agent as the agent argument. The function uses this argument to access the top-level (Main) agent, so all RL-related data processing (for example, retrieving the observation data) is done in the context of that agent.
For example, when the event is located within a certain agent, the following code specified as the Action of this event passes a reference to that agent:
ExperimentReinforcementLearning.takeAction(this)
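Similarly, if a decision point is tied to a process flow block rather than an event (for example, the On exit action of a hypothetical Service block), you can pass the agent currently passing through the block:
ExperimentReinforcementLearning.takeAction(agent);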