Performance issues with Posit Workbench Local Launcher and Open Source RStudio Server

Author

Michael Mayer

Introduction

Users of Posit Workbench with Local Launcher and users of Open Source RStudio Server will at some point observe performance-related issues:

  • slowness of execution grinding the server to a perceived halt

  • out-of-memory (OOM) events (in some cases leading to a crash of the whole server)

The main causes of this behavior are

  • the use of multithreaded execution (BLAS libraries, general multithreading, parallelisation)

  • limited awareness of how much memory a given piece of R code will consume, and the absence of efficient programming strategies.

While multithreading is a nice way to improve performance for many computations, excessive use of multithreading on a multi-user system can have the opposite effect: once the combined thread count of all sessions oversubscribes the available cpus, every session slows down.

This document provides a solution to these problems. The introduction of cgroups reduces general resource contention for both memory and cpu. In addition, limiting OpenBLAS threads to a tolerable value via OMP_NUM_THREADS is an important building block towards a more stable and resilient setup of Workbench and Open Source RStudio Server.

Note

There are efforts under way at Posit to address some of these issues in one of the next Posit Workbench releases.

Cgroups

Cgroups are an efficient way of limiting the resource usage (e.g. cpu and memory) of processes on a server.

Posit Workbench at the moment uses User & group profiles to limit resource usage. This works quite well for many resource types but has significant disadvantages when it comes to memory and cpu limits. Memory is limited by virtual memory, not resident memory. This is a problem for applications like quarto or R packages such as chromote that launch processes which reserve a fair amount of virtual memory but do not use the same amount of resident memory. Simply speaking, only resident memory is a relevant metric.

One of the reasons why Posit Workbench limits virtual memory (instead of resident memory) is the chosen implementation. Posit Workbench uses limits.conf (the main configuration file of the pam_limits module). Unfortunately it is precisely the resident memory limit that - while it can be defined - is silently ignored in any recent version of Linux.

When it comes to cpu usage, Posit Workbench at the moment only supports setting nice levels (that will affect process priority) and cpu affinity (routing certain processes to certain fixed cores only).

This document provides some simple instructions on how to implement cgroups for memory and cpu:

  • the memory limits defined below apply to resident memory only and hence remove the limitations with virtual memory discussed above.

  • cpu limits will enable a fair share policy where users will only get access to a certain percentage of the available cpu power (more information below).

Setup

Main configuration files

/etc/cgconfig.conf defines the actual cgroup. /etc/cgrules.conf attaches the cgroups defined in cgconfig.conf to users and groups.

cgconfig.conf

The example below defines a cgroup named posit. Any user or group attached to this cgroup will be able to use a maximum of 10 GB of RAM (10 GB = 10*1024^3 bytes = 10737418240 bytes).

CPU limit

The processes in this cgroup get a combined maximum of 200000 microseconds of cpu time during each 100000 microsecond interval. This is equivalent to an effective use of 2 cpus. It is important to be aware that the processes launched by a user can still be scattered across multiple cpus - the entirety of all processes will however not be able to consume more than effectively 2 cpus. Let’s say we are on a 16 cpu server - a user will still be able to run a multithreaded binary with 16 threads, with each thread running on 1 of the 16 cpus. Given the limit imposed, each thread will however only be able to use 2/16 = 1/8, i.e. 12.5 %, of its cpu. This ensures that enough capacity remains available to other users as well.

# /etc/cgconfig.conf

group posit {
    cpu {
        cpu.cfs_period_us = 100000;
        cpu.cfs_quota_us = 200000;
    }
    memory {
        memory.limit_in_bytes = 10737418240;
    }
}
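
The numbers used in this configuration can be reproduced with a few lines of shell arithmetic. The following is only a sketch: the variable names are made up for illustration, and the 16 cpu server is the hypothetical machine from the text above, so adjust the values to your own limits.

```shell
#!/usr/bin/env bash
# Illustrative values only (variable names are hypothetical).
mem_gb=10          # desired memory limit in GB
quota_us=200000    # cpu.cfs_quota_us
period_us=100000   # length of the scheduling period in microseconds
cpus=16            # cpus on the example server

# 10 GB expressed in bytes for memory.limit_in_bytes
mem_bytes=$(( mem_gb * 1024 ** 3 ))

# quota / period = number of cpus the cgroup can effectively use
effective_cpus=$(( quota_us / period_us ))

# Share of a single cpu per thread if the work is spread over all cpus,
# computed in tenths of a percent to stay within integer arithmetic.
share_tenths=$(( 1000 * quota_us / (period_us * cpus) ))
share_pct="$(( share_tenths / 10 )).$(( share_tenths % 10 ))"

echo "memory.limit_in_bytes = ${mem_bytes}"
echo "effective cpus        = ${effective_cpus}"
echo "share of each cpu     = ${share_pct} %"
```

With the values above this yields 10737418240 bytes, 2 effective cpus and a share of 12.5 % per cpu.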

cgrules.conf

This config will attach the previously defined cgroups to Unix users and groups.

If we wanted to attach the above cgroup to a group named posit_users we would create:

# /etc/cgrules.conf

@posit_users      cpu,memory  posit

The config file has three columns:

  1. Who is affected by the cgroup. This can be every process (use “*”) or specific users and groups (groups are prefixed with “@” to distinguish them from users, as in the Workbench user and group profiles).

  2. Which resource controllers are confined (cpu and memory in our case).

  3. The name of the cgroup as defined in cgconfig.conf

In our example above we attach the posit cgroup to the Unix group posit_users for the resources cpu and memory.

Systemctl setup

Ubuntu

We need two services, cgconfig.service and cgred.service.

On Ubuntu, we need to create the systemd unit files manually:

# Create the systemd unit files

cat << EOF > /usr/lib/systemd/system/cgconfig.service
[Unit]
Description=cgconfig parser
After=network.target
[Service]
User=root
Group=root
ExecStart=/usr/sbin/cgconfigparser -l /etc/cgconfig.conf
Type=oneshot

[Install]
WantedBy=multi-user.target
EOF

cat << EOF > /usr/lib/systemd/system/cgred.service
[Unit]
Description=cgrules generator
After=network.target cgconfig.service

[Service]
User=root
Group=root
Type=forking
EnvironmentFile=-/etc/cgred.conf
ExecStart=/usr/sbin/cgrulesengd
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd so that it picks up the new unit files
systemctl daemon-reload

Ubuntu and RHEL

On both Ubuntu and RHEL OS we need to enable and start the services


systemctl enable cgconfig.service
systemctl enable cgred.service

systemctl start cgconfig.service
systemctl start cgred.service

OS setup

Ubuntu Focal/Jammy

Install required packages

apt update && apt install cgroup-tools cgroup-lite libcgroup1 cgroupfs-mount

Red Hat Enterprise Linux 7/8

Install required packages

yum install libcgroup-tools 

Multithreaded OpenBLAS

Posit offers an opinionated version of R for users to download and install on their servers. This by default comes with multithreaded OpenBLAS libraries.

By default, OpenBLAS spawns as many threads as the server has physical cores available. Depending on the type of R code run, those threads are more or less heavily utilized and can put strain on the system. Especially on multi-user Workbench systems with Local Launcher or on Open Source RStudio Server systems this can become a problem, leading to slow performance due to cpu oversubscription and also to OOM (out of memory) events.

A solution to this problem is the OMP_NUM_THREADS environment variable. Set it to the number of available cores divided by the expected average number of concurrent sessions, rounded to the nearest whole number. On a 72 core system with on average 40 concurrent sessions, 72/40 = 1.8, so you would set OMP_NUM_THREADS=2.
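
The rounding rule can be sketched as a small shell function. The function name omp_threads and the lower bound of 1 thread are assumptions made for this illustration and are not part of any Posit tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cores / sessions, rounded to the nearest whole
# number, and never less than 1 thread.
omp_threads() {
    local cores=$1 sessions=$2
    # Adding half the divisor before the integer division rounds to nearest.
    local threads=$(( (cores + sessions / 2) / sessions ))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Example from the text: 72 cores, 40 concurrent sessions on average
omp_threads 72 40
```

For the 72 core example this prints 2. The floor of 1 covers servers where the number of sessions exceeds the number of cores.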

OMP_NUM_THREADS can be set in

  • A script in /etc/profile.d

  • /etc/rstudio/launcher-env
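
For the first option, a minimal sketch of such a script could look as follows. The file name omp_num_threads.sh is hypothetical, and the value 2 is simply the result for the 72 core example above.

```shell
# /etc/profile.d/omp_num_threads.sh (hypothetical file name)
# Limit the number of OpenBLAS/OpenMP threads per session.
# The value 2 is an example; derive it from your own core and session counts.
export OMP_NUM_THREADS=2
```

Scripts in /etc/profile.d are sourced by login shells, so the variable is only visible to sessions that are started through one.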

Note

Setting OMP_NUM_THREADS in Renviron.site has no effect. R is already running at that stage and the variable can no longer influence the BLAS/LAPACK libraries (see remarks below for a workaround).

Overrides

While the above will significantly improve the situation on your Workbench or Open Source RStudio Server, some users may find this setting too limiting. Such users can set their own thread count via the RhpcBLASctl package, provided they are told to use the override sparingly and preferably during times of low utilisation (e.g. at night or during the weekend for long-running jobs).

library(RhpcBLASctl)
omp_set_num_threads(4)

will for example set the number of threads for OpenBLAS to 4.

Note

Setting OMP_NUM_THREADS in a running R session does not have any effect, because the variable is only considered when an R session starts. RhpcBLASctl instead directly calls functions in the already loaded shared library.