Performance issues with Posit Workbench Local Launcher and Open Source RStudio Server
Introduction
Users of Posit Workbench with the Local Launcher and users of Open Source RStudio Server may at some point observe performance-related issues:
slow execution, up to the server grinding to a perceived halt
out-of-memory (OOM) events (in some cases leading to a crash of the whole server)
The main causes of this behavior are:
the use of multithreaded execution (BLAS libraries, general multithreading, parallelisation)
limited awareness of how much memory a given piece of R code will use, and the absence of efficient programming strategies.
While multithreading is a good way to improve performance for many computations, excessive multithreading on a multi-user system can in some cases have the opposite effect.
This document provides a solution to these problems. Introducing cgroups reduces general resource contention for both memory and CPUs. Lastly, and rather importantly, limiting the number of OpenBLAS threads to a tolerable value via OMP_NUM_THREADS is another important building block towards a more stable and resilient setup of Workbench and Open Source RStudio Server.
Posit is working on addressing some of these issues in one of the next Posit Workbench releases.
Cgroups
Control groups (cgroups) are an efficient Linux mechanism for limiting the resource usage (e.g. CPU and memory) of processes on a server.
Posit Workbench currently uses User & group profiles to limit resource usage. This works quite well for many resource types but has significant disadvantages when it comes to memory and CPU limits. The memory limit applies to virtual memory, not resident memory. This is a problem for applications like quarto or R packages such as chromote that launch processes which reserve a large amount of virtual memory but actually use far less resident memory. Simply speaking, only resident memory (the RAM actually in use) is the relevant metric.
One of the reasons why Posit Workbench limits virtual memory (instead of resident memory) is the chosen implementation: Posit Workbench uses limits.conf (the main configuration file of the pam_limits module). Unfortunately, it is precisely the resident memory limit (the rss item) that, while it can be defined, is silently ignored by any recent version of Linux.
When it comes to CPU usage, Posit Workbench currently only supports setting nice levels (which affect process priority) and CPU affinity (restricting certain processes to certain fixed cores).
This document provides simple instructions on how to implement cgroups for memory and CPU:
memory limits defined below apply to resident memory and hence remove the virtual memory limitations discussed above.
CPU limits cap the CPU time that users can consume, so that they only get access to a certain share of the available CPU power (more information below).
Setup
Main configuration files
/etc/cgconfig.conf defines the actual cgroups.
/etc/cgrules.conf attaches the cgroups defined in cgconfig.conf to users and groups.
cgconfig.conf
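The example below defines a cgroup named posit. Any process attached to this cgroup can use a maximum of 10 GiB of RAM (10*1024^3 = 10737418240 bytes). Note that this is a single cgroup: its limits apply to the combined usage of all processes attached to it, so all users mapped to it share the 10 GiB (and the CPU limit below). If every user should get their own limits, libcgroup supports per-user template groups (the %u placeholder); see the cgconfig.conf(5) and cgrules.conf(5) man pages.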
CPU limit
All processes in the cgroup together get at most 200000 microseconds of CPU time per 100000 microseconds period (the period is set by cpu.cfs_period_us, whose default is 100000). This is equivalent to the effective use of 2 CPUs. Be aware that the processes can still be scattered across multiple CPUs; together, however, they cannot consume more than the equivalent of 2 CPUs. Say we are on a 16-CPU server: a user can still run a multithreaded binary with 16 threads, and the scheduler may run them on all 16 CPUs at once. Given the limit, however, the 16 threads together only get 2 CPUs' worth of time, i.e. on average 1/8 (12.5 %) of each CPU. This ensures that enough capacity remains available for other users.
# /etc/cgconfig.conf
group posit {
    cpu {
        cpu.cfs_period_us = 100000;
        cpu.cfs_quota_us = 200000;
    }
    memory {
        memory.limit_in_bytes = 10737418240;
    }
}
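The values in the configuration above follow from two small calculations: the CPU quota is the desired number of CPUs times the scheduling period, and the memory limit is the number of GiB times 1024^3. The following is a minimal sketch of a helper that prints such a group stanza. The function name make_cgroup_group is hypothetical (it is not part of libcgroup), and the sketch assumes the default CFS period of 100000 microseconds and whole numbers of GiB and CPUs.

```shell
#!/bin/sh
# Hypothetical helper (not part of libcgroup): print a cgconfig.conf group
# stanza for a memory limit in whole GiB and a CPU limit in whole CPUs.
make_cgroup_group() {
    name=$1
    mem_gib=$2
    cpus=$3
    period_us=100000                              # assumed default CFS period (100 ms)
    quota_us=$((cpus * period_us))                # 2 CPUs -> 200000 us per period
    mem_bytes=$((mem_gib * 1024 * 1024 * 1024))   # 10 GiB -> 10737418240 bytes
    cat <<EOF
group $name {
    cpu {
        cpu.cfs_period_us = $period_us;
        cpu.cfs_quota_us = $quota_us;
    }
    memory {
        memory.limit_in_bytes = $mem_bytes;
    }
}
EOF
}

make_cgroup_group posit 10 2
```

Running make_cgroup_group posit 10 2 prints a stanza with the same limits as the configuration above, with the period written out explicitly; the output can be reviewed and then copied into /etc/cgconfig.conf.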
cgrules.conf
This file attaches the previously defined cgroups to Unix users and groups.
To attach the above cgroup to a group named posit_users, we would create:
# /etc/cgrules.conf
@posit_users cpu,memory posit
The file has three columns:
Who is affected by the cgroup. This can be every process (use “*”) or specific users and groups (groups are prefixed with “@” to distinguish them from users, as in the Workbench user and group profiles).
Which controllers (resource types) are confined, cpu and memory in our case.
The name of the cgroup as defined in cgconfig.conf.
In the example above, we attach the posit cgroup to the Unix group posit_users for the cpu and memory controllers.
systemd setup
Ubuntu
We need two services, cgconfig.service and cgred.service. On Ubuntu, these systemd service units have to be created manually:
# create systemctl service files
cat << EOF > /usr/lib/systemd/system/cgconfig.service
[Unit]
Description=cgconfig parser
After=network.target
[Service]
User=root
Group=root
ExecStart=/usr/sbin/cgconfigparser -l /etc/cgconfig.conf
Type=oneshot
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
cat << EOF > /usr/lib/systemd/system/cgred.service
[Unit]
Description=cgrules generator
After=network.target cgconfig.service
[Service]
User=root
Group=root
Type=forking
EnvironmentFile=-/etc/cgred.conf
ExecStart=/usr/sbin/cgrulesengd
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Make systemd pick up the new unit files
systemctl daemon-reload
Ubuntu and RHEL
On both Ubuntu and RHEL, enable and start the services:
systemctl enable cgconfig.service
systemctl enable cgred.service
systemctl start cgconfig.service
systemctl start cgred.service
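Once both services are running, cgrulesengd should move the processes of users in posit_users into the posit cgroup. One way to check this is to inspect /proc/<pid>/cgroup, whose lines have the form hierarchy-id:controller-list:path. The sketch below assumes the system uses cgroup v1 (the hierarchy that cpu.cfs_quota_us and memory.limit_in_bytes belong to); the helper name in_cgroup and the user name alice are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if, according to a cgroup file in cgroup v1
# format (normally /proc/<pid>/cgroup, lines "hierarchy-id:controllers:path"),
# the given controller is attached to the cgroup /<group>.
in_cgroup() {
    awk -F: -v c="$2" -v g="/$3" '
        {
            n = split($2, ctrls, ",")    # e.g. "cpu,cpuacct" -> cpu and cpuacct
            for (i = 1; i <= n; i++)
                if (ctrls[i] == c && $3 == g)
                    found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage on the server (user name alice is hypothetical):
#   for pid in $(pgrep -u alice rsession); do
#       in_cgroup "/proc/$pid/cgroup" memory posit && echo "$pid: memory ok"
#       in_cgroup "/proc/$pid/cgroup" cpu posit && echo "$pid: cpu ok"
#   done
```

Matching on the controller list rather than a fixed hierarchy ID keeps the check independent of how the kernel numbers the hierarchies, and handles combined mounts such as cpu,cpuacct.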
OS setup
Ubuntu Focal/Jammy
Install required packages
apt update && apt install cgroup-tools cgroup-lite libcgroup1 cgroupfs-mount
RedHat Enterprise Linux 7/8
Install required packages
yum install libcgroup-tools
Multithreaded OpenBLAS
Posit offers an opinionated version of R for users to download and install on their servers. This by default comes with multithreaded OpenBLAS libraries.
By default, OpenBLAS spawns as many threads as the server has CPU cores. Depending on the R code being run, these threads are more or less heavily utilized and can put strain on the system. Especially on multi-user Workbench systems with the Local Launcher or on Open Source RStudio Server, this can lead to slow performance due to CPU oversubscription and also to OOM (out-of-memory) events.
A solution to this problem is the OMP_NUM_THREADS environment variable. It should be set to the number of available cores divided by the expected average number of concurrent sessions, rounded to a whole number. On a 72-core system with on average 40 concurrent sessions, you would set OMP_NUM_THREADS=2 (72/40 = 1.8, rounded to 2).
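The rule above can be sketched as a small shell function. The name suggest_omp_threads is hypothetical, and the minimum of 1 is an added assumption so that servers with more sessions than cores still get a valid value.

```shell
#!/bin/sh
# Hypothetical helper: suggest a value for OMP_NUM_THREADS as
# cores / expected concurrent sessions, rounded to a whole number (minimum 1).
suggest_omp_threads() {
    cores=$1
    sessions=$2
    # Integer rounding: add half the divisor before the (truncating) division.
    n=$(( (cores + sessions / 2) / sessions ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example from the text: 72 cores, on average 40 concurrent sessions
suggest_omp_threads 72 40   # prints 2

# On the server itself, the core count could be taken from nproc, e.g.
#   suggest_omp_threads "$(nproc --all)" 40
```

Note that nproc counts logical CPUs, which on systems with hyperthreading is typically twice the number of physical cores; which of the two to use as "cores" is a judgment call for the administrator.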
OMP_NUM_THREADS can be set in:
a script in /etc/profile.d
/etc/rstudio/launcher-env
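For the /etc/profile.d option, a minimal script could look like the sketch below. The file name is hypothetical, and the value 2 is taken from the 72-core example above. On Ubuntu and RHEL, /etc/profile sources the *.sh files in /etc/profile.d for login shells, so the file name should end in .sh.

```shell
# /etc/profile.d/omp_num_threads.sh (hypothetical file name)
# Cap the number of OpenMP/OpenBLAS threads for sessions started via a login
# shell; 2 is the value from the 72-core / 40-session example.
export OMP_NUM_THREADS=2
```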
Setting OMP_NUM_THREADS in Renviron.site has no effect: R is already running at that stage and can no longer influence the BLAS/LAPACK libraries (see the remarks below for a workaround).
Overrides
While the above will significantly improve the situation on your Workbench or Open Source RStudio Server, some users may find this setting too limiting. They can set their own thread count via the RhpcBLASctl package, provided they are told to use this override sparingly and preferably during times of low utilisation (e.g. nights or weekends for long-running jobs).
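library(RhpcBLASctl)
# Set the number of BLAS (OpenBLAS) threads for the current R session
blas_set_num_threads(4)
# Set the number of OpenMP threads (relevant for OpenMP builds of OpenBLAS)
omp_set_num_threads(4)
This sets, for example, the number of OpenBLAS threads to 4 for the current session.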
Setting OMP_NUM_THREADS in a running R session (e.g. via Sys.setenv()) has no effect: the variable is only read when the BLAS library is loaded at session start. RhpcBLASctl instead directly calls the thread-control functions of the already loaded shared libraries.
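To see whether a thread limit is in effect, one can count the threads of running rsession processes. The sketch below reads the Threads: field of /proc/<pid>/status (Linux only); the helper name thread_count and the user name alice are hypothetical. The count covers all threads of the process, not only BLAS worker threads, and depending on the build OpenBLAS may create its workers when the library is loaded or on first use, so the number is best compared before and after changing OMP_NUM_THREADS or using RhpcBLASctl rather than read as an exact BLAS thread count.

```shell
#!/bin/sh
# Hypothetical helper: print the number of threads of a process, read from
# the "Threads:" line of /proc/<pid>/status (Linux only).
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Usage on the server (user name alice is hypothetical):
#   for pid in $(pgrep -u alice rsession); do
#       echo "$pid: $(thread_count "$pid") threads"
#   done
```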