
Private Cloud: Troubleshooting

This section covers various issues you may encounter when administrating Private Cloud and describes how to handle them.

Issues can take place:

  • During or after Private Cloud installation
  • Upon restart of service components

To identify the cause of an issue, consult the Docker logs, the logs of individual components, and the installer logs.
To obtain the list of running Docker containers, execute the following command:

sudo docker ps
To collect Docker logs from an individual container running a specific Private Cloud component, you can execute the following command:
docker logs %component name%
If the command output is too long, limit it to the last 100 lines:
docker logs %component name% --tail 100
The installation script logs are located in /tmp/alc_installer.log by default.
If a problem occurs with the Private Cloud instance and none of the options described below helps, please contact our support team at support@anylogic.com and provide them with the list of running containers and their logs to find a solution.
If a problem occurs with the installation script, provide the list of running containers, and the logs of the controller component and installation script.
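
The collection steps above can be sketched as a small helper that gathers everything into one directory before you contact support. This is an illustration only: the function name collect_support_bundle and the output directory are hypothetical, and the sketch assumes the current user is allowed to run docker.

```shell
# Hypothetical helper: saves the container list and the last N log lines
# of every running container into one directory.
collect_support_bundle() {
    out_dir=${1:-./alc-support}   # assumed output location
    tail_lines=${2:-100}
    mkdir -p "$out_dir"
    # List of running containers
    docker ps > "$out_dir/containers.txt"
    for name in $(docker ps --format '{{.Names}}'); do
        # docker logs replays the container's stdout and stderr, so capture both
        docker logs --tail "$tail_lines" "$name" > "$out_dir/$name.log" 2>&1
    done
    # Include the installer log if it exists at its default location
    if [ -f /tmp/alc_installer.log ]; then
        cp /tmp/alc_installer.log "$out_dir/"
    fi
}
```

For example, collect_support_bundle ./alc-support 200 stores the last 200 lines of each container log in ./alc-support.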

Timeout: Socket is not established

The log of the controller service component may report this issue for two primary reasons:

  • One of the ports required to run Private Cloud is closed
  • iptables chains contain a DROP policy for SSH or Docker connections

To identify the problem:

  1. Open your Linux terminal.
  2. Check if all the required ports are open and available:
    22, 80, 5000, 5432, 5672, 9000, 9042, 9050, 9080, 9101, 9102, 9103, 9200, 9201, 9202
    To do that, run any of the following commands:
    sudo lsof -i -P -n | grep LISTEN
    sudo netstat -tulpn | grep LISTEN
    sudo nmap -sTU -O %Private Cloud IP address%
    For example, the lsof command returns a table consisting of lines that look something like this:
    sshd 9406 root 4u IPv6 10473818 0t0 TCP *:22 (LISTEN)
    In this line, sshd is the application name, 22 is the port, and 9406 is the process ID (PID). LISTEN means the port is open and accepts new connections.
  3. Check the iptables configuration by running the following command:
    iptables -L -v -n
    If the rule that contains the Private Cloud IP address has DROP as its target, SSH connections are dropped and the controller cannot deploy Private Cloud properly.
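
Comparing the listener output against the port list by eye is error-prone, so the check in step 2 can be sketched as a filter. The function name missing_ports is hypothetical. It reads lsof or netstat output on standard input and prints every required port that has no LISTEN line.

```shell
# Ports from the list above
REQUIRED_PORTS="22 80 5000 5432 5672 9000 9042 9050 9080 9101 9102 9103 9200 9201 9202"

# Hypothetical helper: prints each required port that is not in LISTEN state.
missing_ports() {
    listeners=$(grep LISTEN || true)
    for port in $REQUIRED_PORTS; do
        # ":<port>" must be followed by whitespace, so 80 does not match 8080
        if ! printf '%s\n' "$listeners" | grep -Eq ":${port}[[:space:]]"; then
            printf '%s\n' "$port"
        fi
    done
}

# Usage:
#   sudo lsof -i -P -n | missing_ports
#   sudo netstat -tulpn | missing_ports
```

An empty result means every required port has a listener. A port in the output is not necessarily blocked: the component that owns it may simply not be running yet.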

Runtime error: Out of memory

Possible reasons for this error are an insufficient number of CPU cores on the host machine and a lack of available memory.

To avoid the issue, consider increasing the number of CPU cores and the amount of memory in line with the Private Cloud system requirements.

Multiple service components (balancer, executor, executor-multirun) constantly restart

To identify the reason for this issue, collect logs from the Docker container running the executor service component by executing the following command:

docker logs executor

If an expired license is the cause, the command output contains the following message: evaluation period has expired.

Consider requesting and re-entering the license key via Team License Server, or download a new evaluation build and install it from scratch.
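
To check all restarting components at once, the log search can be sketched as a loop. The function name expired_components is hypothetical, and the sketch assumes the containers are named after the components listed in the heading (balancer, executor, executor-multirun).

```shell
# Hypothetical helper: prints the name of every component whose recent
# log output mentions the expired evaluation period.
expired_components() {
    for name in balancer executor executor-multirun; do
        # 2>&1 is needed because the message may be on the container's stderr
        if docker logs --tail 200 "$name" 2>&1 |
            grep -qi 'evaluation period has expired'; then
            printf '%s\n' "$name"
        fi
    done
}
```

If the function prints nothing, the restarts probably have a different cause, and the full logs are worth reading.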

The "Evaluation period has expired" message

Upon entering your Private Cloud instance via a web browser, you can see the Evaluation period has expired message:

Private Cloud: Evaluation period has expired

Provided you are sure you have an appropriate license, this may be a signal of an issue.

  1. Check the connection with your Team License Server. If Team License Server is unable to start after the host machine restart, and the message address is already in use appears in logs, make sure the server is running under a user who’s not root.
  2. Locate the alc/controller/teamlicense-server.conf file in the directory where your Private Cloud is installed (/home/alcadm by default).
    Within the file, make sure the address of Team License Server is present in the following format: %the IP address of the server%:8443.
    If you modified the contents of the file, execute the following command: docker restart controller.
  3. Upon completing the checks described above, execute docker restart rest in the terminal on the host machine. Wait for a couple of minutes.

If these methods don’t help, make sure Team License Server is reachable from the Cloud container. To do that, use the netcat (nc) utility or telnet. For example, execute the following commands:

docker exec -it controller /bin/sh
nc %the IP address of the server% 8443
%message%

If the command returns P, this means Team License Server is available.

If none of this helps, retrieve the log of the controller component by executing the following command on the machine that serves it:

docker logs controller --tail 100

After that, contact our support team at support@anylogic.com and attach the log to your message.

Some components are not starting after the rest component restart

This issue may occur on RedHat machines running Private Cloud. It leads to the following behavior:

  • Any attempt to run a model leads to a 503 error, visible in the web browser console
  • Upon execution of the docker ps command, an incomplete list of service components appears
  • Some services cannot be started due to a required port being occupied
  • Ports cannot be released even after a problematic service is stopped

Additional errors can be found in Docker logs, available by executing the following command:

journalctl -u docker

To solve this issue:

  1. Disable IPv6:
    sysctl -w net.ipv6.conf.all.disable_ipv6=1
    sysctl -w net.ipv6.conf.default.disable_ipv6=1
  2. Stop all Docker containers:
    docker stop $(docker ps -q)
    Be aware that this will affect all Docker containers, including those used for purposes other than Private Cloud. Use with caution.
  3. Stop the Docker service:
    systemctl stop docker
  4. Start Docker again:
    systemctl start docker
  5. Start the Docker container that runs the controller service component:
    docker start controller
    Other Private Cloud containers will start automatically.

The frontend component does not start due to the authentication failure

This issue can be identified in the controller container logs, which you can access by executing the following command: docker logs controller. If the SSH authentication failure is the reason for the issue, you may start the frontend component manually:

docker start frontend

Another possible reason for this issue is a restriction that forbids executing the scp command on the machine that serves Private Cloud. This can also be identified in the controller container logs.

If this is the case, manually copy the needed files to the cache folder:

sudo cp -r /home/alcadm/alc/controller/preload/frontend /home/alcadm/alc/cache
/home/alcadm is the default location of the Private Cloud files on the machine. If you are using a non-default location, modify the command to reflect this.

If even after that frontend does not start, do the following:

  1. Find the command used to start the frontend component in the controller component logs by executing the following:
    docker logs controller | grep "docker run " | grep frontend
  2. After locating the command, execute it. It should look similar to the following:
    docker run -d --name frontend --restart unless-stopped -v /apps/anylogic/alc/cache/frontend:/etc/nginx/conf.d -p 80:80/tcp -e ALC_CONTROLLER_HOST=10.0.103.111 -e ALC_CONTROLLER_PORT=9000 -e ALC_NODE_ID=WaowUc4p6c-5WFZAo2tZxyFGsht12EM4w2D_QTxiP-o local.cloud.registry:5000/com.anylogic.cloud/frontend:2.3.0

After you execute the command, the controller component will start the frontend component automatically.

The controller component cannot start other services

Sometimes the controller component is unable to start other service components after a reboot and returns errors like Access Denied or Authentication Required, even though starting the same services manually with the docker start command works.

The reason for this is most likely a problem with the SSH connection and authentication. To avoid this, make sure that:

  • The SSH password of the Private Cloud administrator user (alcadm by default) has not expired. If it has, update the configuration.
  • SSH connections are permitted for the administrator user. SSH access can be revoked after multiple unsuccessful connection attempts.
  • The hosts.allow and hosts.deny configuration files are configured properly in /etc.

In addition, if controller is unable to start only some of the service components, make sure the ports for the problematic services are open and available. Refer to the list of the service components and ports associated with them, in the architecture description.

If the ports do not appear to be the problem, try restarting the Docker daemon by executing the following command:

systemctl restart docker

Problem with the Docker image pulling

When a host machine connects to the web via a proxy server, it may incorrectly handle the connections to the Docker registry and Private Cloud local registry.

To avoid this, configure Docker to use an HTTP or HTTPS proxy when connecting to the registry-1.docker.io URL and to connect to the Private Cloud registry without a proxy. For that purpose, use the NO_PROXY environment variable:

[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"
Environment="HTTPS_PROXY=http://proxy.example.com:80/"
Environment="NO_PROXY=localhost,127.0.0.1,local.cloud.registry"

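The proxy settings above belong in a systemd drop-in file for the Docker service. The following sketch shows one way to create it, assuming the drop-in directory /etc/systemd/system/docker.service.d and the file name http-proxy.conf, as used in the Docker documentation. The function name write_proxy_dropin is hypothetical, and proxy.example.com is the placeholder from the example above.

```shell
# Hypothetical helper: writes the proxy drop-in into the given directory
# (default: the systemd drop-in directory of the Docker service).
write_proxy_dropin() {
    dir=${1:-/etc/systemd/system/docker.service.d}
    mkdir -p "$dir"
    cat > "$dir/http-proxy.conf" <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"
Environment="HTTPS_PROXY=http://proxy.example.com:80/"
Environment="NO_PROXY=localhost,127.0.0.1,local.cloud.registry"
EOF
}

# Usage (as root):
#   write_proxy_dropin
#   systemctl daemon-reload
#   systemctl restart docker
#   systemctl show --property=Environment docker   # verify the values
```

Docker only picks up the new environment after the daemon-reload and restart steps.
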
Activation is lost upon modification of the Private Cloud host name

To make sure the issue was not caused by an incorrect configuration of addresses, check the /alc/controller/conf/public.json and /opt/anylogic-team-license-server/conf/serv.properties files in the Private Cloud installation directory, as well as the information on the Team License Server web interface. All of these should use the identical address of the Private Cloud instance.

To properly re-configure the addresses and restart the services:

  1. Stop Team License Server by executing the following command (not under root):
    sudo service anylogic-tls stop
  2. Modify /alc/controller/conf/public.json in the Private Cloud directory by specifying the proper name of your Private Cloud host.
  3. Modify /opt/anylogic-team-license-server/conf/serv.properties in the Private Cloud directory by specifying the proper name of your Private Cloud host.
  4. Save all changes.
  5. Start Team License Server by executing the following command (not under root):
    sudo service anylogic-tls start
  6. Execute the following commands in sequence:
    docker stop controller
    docker stop rest
    docker start controller

The controller component log reports the exception after an update

This issue occurs after updates are performed on an existing Private Cloud instance. A migration issue in the controller component can force the rest component to restart very frequently. To identify it, check the log of the container that runs controller with the following command:

docker logs controller

Its output should report something like this:

2021-04-16 13:11:56:453 ERROR CONTROLLER - 2021-04-16T13:11:56.453 - migration REST: com.anylogic.cloud.migration.MigrationException: liquibase.exception.LockException: Could not acquire change log lock. Currently locked by 9ac02ab910d0 (172.17.0.8) since 4/8/21 11:02 PM

Additionally, the rest container log (run docker logs rest to open it) will report another error:

Error starting ApplicationContext. To display the auto-configuration report re-run your application with 'debug' enabled.
[main] ERROR org.springframework.boot.SpringApplication - Application startup failed

The most likely reason for this issue is a critical malfunction of some kind (for example, a power outage) that occurred during the execution of the update script.

To clean up affected files and solve the issue, do the following:

  1. In the container running postgres, clear databasechangeloglock:
    docker exec -ti -u postgres postgres psql anylogic_cloud -c 'truncate databasechangeloglock;'
  2. Restart controller:
    docker restart controller
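
Before truncating the table, it can be worth confirming that the lock is stale, that is, that it is held by a container that no longer exists. The sketch below extracts the lock holder from the controller log. The function name lock_holder is hypothetical, and the databasechangeloglock columns mentioned in the comment are those of the standard Liquibase lock table.

```shell
# Hypothetical helper: prints who holds the Liquibase change log lock,
# based on controller log text read from standard input.
lock_holder() {
    sed -n 's/.*Currently locked by //p'
}

# Usage:
#   docker logs controller 2>&1 | lock_holder | tail -n 1
#
# Optionally inspect the lock row itself (columns: id, locked,
# lockgranted, lockedby) before clearing it:
#   docker exec -ti -u postgres postgres psql anylogic_cloud -c 'select * from databasechangeloglock;'
```

If the reported container ID does not appear in the docker ps output and the lock date predates the failed update, the lock is most likely left over and safe to clear as described above.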

Enabling the Elastic Load Balancer HTTPS proxy support in the Cloud instance

An Elastic Load Balancer is an automatic distributor of incoming traffic supplied by Amazon for their EC2 instances and containers. Among other things, it also has built-in HTTPS proxy support. When being used, this proxy may cause connection issues due to conflicts with the HTTPS implementation in Private Cloud.

Should you decide to use the Load Balancer for your Private Cloud instance, you can avoid these issues by doing the following:

  1. Start the normal installation routine for Private Cloud.
  2. When asked whether you want to enable the HTTPS support, type n (as in, disable it).
  3. Complete the installation routine and start Private Cloud.
  4. Configure your Load Balancer as you see fit using the instructions provided by the Amazon documentation.
  5. After this, manually enable the HTTPS support in the appropriate configuration file, public.json, by replacing the protocol in the value of the gatewayHost JSON field:
    {
      "gatewayHost" : "https://10.0.0.1:8080"
    }
  6. Restart Private Cloud:
    sudo docker restart controller rest

After this procedure, your Private Cloud instance should handle the traffic coming from the Elastic Load Balancer normally.

To update Private Cloud that uses the Elastic Load Balancer, run the update script and specify that your instance does not use HTTPS:
sudo ./install.sh update --use_https n
After that, repeat steps 5-6 from the instruction above.
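
Because step 5 has to be repeated after every update, the protocol replacement can be sketched as a small function. The function name enable_https_gateway is hypothetical, and the sketch assumes the gatewayHost value starts with http:// as in the example above. Back up public.json before running anything like this.

```shell
# Hypothetical helper: switches the gatewayHost protocol from http to https
# in the given public.json file. Running it twice is harmless.
enable_https_gateway() {
    file=$1
    sed 's|\("gatewayHost"[[:space:]]*:[[:space:]]*"\)http://|\1https://|' \
        "$file" > "$file.tmp" || return 1
    # Overwrite in place so that the file keeps its owner and permissions
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Usage (path shown relative to the Private Cloud installation directory):
#   enable_https_gateway alc/controller/conf/public.json
#   sudo docker restart controller rest
```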

The "Failed to find free node" error

This issue may occur when you start the model’s animation (by clicking the Play button on the model screen or experiment toolbar).

The error message looks as follows:

AnyLogic Cloud: The free computational node error

The same issue may arise upon starting the model via the Cloud UI without animation.

The reason for this is one of the following:

  • The model tries to utilize more RAM than the machine that serves Private Cloud has
  • There are no available CPU cores on the machine that serves Private Cloud

To identify the issue, go to the Running Tasks section of the administrator panel. If all executor components there are busy, there are no free CPU cores available, and you need to wait until the execution of some other model is complete.

In case there are free executor components, this means you need to modify the amount of RAM required to run the model:

  1. Go to the experiment dashboard.
  2. Click the settings icon in the left sidebar next to the needed experiment.
  3. In the Experiment settings section of Inputs, locate Maximum memory, MB.
    This option defines the maximum size of the Java heap allocated for the model and the built-in DB.
  4. Click the visibility_off icon to make this option visible on the experiment dashboard.
  5. Click the save icon on the left sidebar.
  6. Now, on the experiment dashboard, modify how much memory is allocated.
    The amount of allocated memory should be no more than half of all RAM available on the machine that serves Private Cloud minus RAM allocated for models being run currently.
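
The rule in step 6 can be read in more than one way. The sketch below assumes it means half of the total RAM minus the memory already allocated to running models. The function name max_model_memory_mb is hypothetical, and all values are in megabytes.

```shell
# Hypothetical helper: upper bound for "Maximum memory, MB", assuming the
# rule is (total RAM / 2) - (RAM allocated to models that are running now).
max_model_memory_mb() {
    total_mb=$1
    running_mb=$2
    limit=$(( total_mb / 2 - running_mb ))
    if [ "$limit" -lt 0 ]; then
        limit=0
    fi
    printf '%s\n' "$limit"
}

# Usage: total RAM of the host can be read with
#   free -m | awk '/^Mem:/ {print $2}'
```

For example, on a host with 32768 MB of RAM where running models already hold 4096 MB, max_model_memory_mb 32768 4096 gives 12288.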

Controller cannot start other components

This issue may be caused by the problems with the RSA-SHA1 keys.

To identify the issue:

  1. Run the following command:
    sudo systemctl status ssh
  2. If the issue is present, the following error message will appear: key type ssh-rsa not in PubkeyAcceptedAlgorithms [preauth].
  3. To fix the problem, review the current contents of sshd_config:
    cat /etc/ssh/sshd_config
  4. Open the file in a text editor with root privileges and add (or modify) the following lines:
    PubkeyAuthentication yes
    PubkeyAcceptedKeyTypes=+ssh-rsa
  5. Save sshd_config.
  6. Restart SSH:
    sudo systemctl restart ssh
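
The status output shows only the most recent log lines, so the error may have scrolled out of view. A minimal sketch for searching the full service log follows. The function name ssh_rsa_rejected is hypothetical, and the unit name ssh is taken from the commands above (on some distributions the unit is called sshd).

```shell
# Hypothetical helper: succeeds if the sshd log text on standard input
# shows that ssh-rsa keys are being rejected.
ssh_rsa_rejected() {
    grep -q 'key type ssh-rsa not in PubkeyAcceptedAlgorithms'
}

# Usage:
#   sudo journalctl -u ssh --no-pager | ssh_rsa_rejected && echo "ssh-rsa keys are rejected"
#
# After editing sshd_config and before restarting SSH, validate the file:
#   sudo sshd -t    # prints nothing if the configuration is valid
```

Validating the configuration first reduces the risk of locking yourself out of the machine with a broken sshd_config.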

An update of the SSL certificate is required

To replace the outdated SSL certificate used by Private Cloud for HTTPS connections:

  1. Go to the home directory of Private Cloud. Its default location is as follows:
    /home/alcadm
  2. When there, go to the following directory:
    alc/controller/preload/frontend
  3. Remove the alc.key and alc.cert files present there.
  4. Put your new key file and certificate file (if applicable) in the directory. Rename them to alc.key and alc.cert respectively.
  5. Restart the frontend component:
    • Go to the Services section of the administrator panel, select frontend there, then click Restart service, or
    • Execute the following command in your Linux terminal:
      sudo docker stop frontend

After some time, the frontend component will restart, and your Private Cloud will switch to the new HTTPS certificate.
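
A mismatched key and certificate will prevent the frontend from serving HTTPS, so it may be worth checking the new pair before step 3. The sketch below compares the public key embedded in the certificate with the one derived from the private key. It assumes openssl is installed and the key is not passphrase-protected. The function name cert_matches_key is hypothetical.

```shell
# Hypothetical helper: succeeds if the certificate ($1) and the private
# key ($2) belong together, that is, they share the same public key.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}

# Usage:
#   cert_matches_key alc.cert alc.key && echo "key and certificate match"
#
# To see when the certificate expires:
#   openssl x509 -in alc.cert -noout -enddate
```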
