Installing R packages on Linux servers requires patience and sometimes persistence. All packages are downloaded as source code, then compiled, linked, and tested by the installer, which can take considerable time and produces a lot of diagnostic output. This article describes the 'native' package installation method for R. Other package management tools, e.g. conda, may also be used, and R can be bundled with sets of packages in containers.
Versions of R
The default version of R on RHEL8 systems is 4.4.1 (/usr/bin/R). This is up to date, but it is built slightly differently from the versions we make available as modules (below). The package libraries at /optnfs/el8/Rlibs/ were all built using the R modules. You can check which version you are currently set up to use with the command: R --version
Versions differing only at the third level are considered compatible for the purposes of installed packages. For example, if we upgrade 4.4.0 to 4.4.2, it will use the same packages, but 4.4 and 4.3 have distinct sets of packages, and if you switch between them you will need to reinstall the packages you use.
Additional versions of R on the Dartmouth Research Computing systems are installed using the modules system. Run module avail R to see what is available, and module load R/<<version>> (for example, module load R/4.4.0) to use one of those versions.
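The compatibility rule above can be sketched in the shell. This is only an illustration, not part of R or the modules system: the function names r_series and same_library are hypothetical, and the sketch assumes version strings of the form major.minor.patch.

```shell
# r_series: print the major.minor series of an R version string.
# Hypothetical helper; assumes input of the form "4.4.2".
r_series() {
    local version="$1"
    # Remove the last dot-separated component (the third level).
    printf '%s\n' "${version%.*}"
}

# same_library: succeed if two R versions differ only at the third
# level, i.e. they can share one set of installed packages.
same_library() {
    [ "$(r_series "$1")" = "$(r_series "$2")" ]
}

same_library 4.4.0 4.4.2 && echo "4.4.0 and 4.4.2 share packages"
same_library 4.4.0 4.3.3 || echo "4.4.0 and 4.3.3 need separate libraries"
```

The series printed by r_series is also the final component of the library directory names used in this article, such as .../Rlibs/4.4.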
The Library Search Path
An R library is a directory containing one or more packages, each of which is a subdirectory immediately inside the library.
R locates packages by searching an ordered set of library directories. The core R installation contains only a very small set of packages, which are always in the default search path. R also checks for a personal R library in your home directory and includes it if it exists. The location is ~/R/<<architecture>>/<<R-version>>, e.g. ~/R/x86_64-pc-linux-gnu-library/4.2 (where "~" represents your home directory, "x86_64-pc-linux-gnu-library" is an architecture label for 64-bit Intel, and "4.2" is the R version).
The .libPaths() function is used to view or modify the set of directories that will be searched, in order, for packages. You can add any library directory that you have read access to, so this is how a shared R library can be set up for a lab, or for a specific R application. The library search path can also be initialized using environment variables (R_LIBS) or a .Rprofile file. Be careful not to mix packages built under different versions of R: the results will be very confusing.
For example, the default Red Hat R RPM installs to /usr/lib64/R/library, looks for a personal library in ~/R/x86_64-redhat-linux-gnu-library/4.4, and also creates an empty placeholder directory at /usr/share/R/library (which we don't generally use). This is how you could add a directory in DartFS to the front of the existing list of directories, using the R/4.4.0 module version:
$ module load R/4.4.0
$ R
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Copyright (C) 2024 The R Foundation for Statistical Computing
(startup messages omitted)
> .libPaths()
[1] "<<home-directory>>/R/x86_64-pc-linux-gnu-library/4.4"
[2] "/dartfs-hpc/admin/opt/el8/R/4.4.0/lib/R/library"
> .libPaths(c("/dartfs-hpc/admin/opt/el8/Rlibs/4.4", .libPaths()))
> .libPaths()
[1] "/dartfs-hpc/admin/opt/el8/Rlibs/4.4"
[2] "<<home-directory>>/R/x86_64-pc-linux-gnu-library/4.4"
[3] "/dartfs-hpc/admin/opt/el8/R/4.4.0/lib/R/library"
Alternatively, setting R_LIBS in the shell before starting R will have the same effect:
$ export R_LIBS=/optnfs/el8/Rlibs/4.4
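Note that a plain export replaces any value R_LIBS already had. A minimal sketch of prepending instead is shown here; the function name prepend_r_libs is hypothetical, and the sketch relies on R_LIBS being a colon-separated list of directories on Linux.

```shell
# prepend_r_libs: put a library directory at the front of R_LIBS,
# keeping any directories that are already listed there.
# Hypothetical helper, for illustration only.
prepend_r_libs() {
    local dir="$1"
    if [ -z "${R_LIBS:-}" ]; then
        # Nothing set yet: the new directory becomes the whole list.
        R_LIBS="$dir"
    else
        case ":$R_LIBS:" in
            *":$dir:"*) ;;                  # already listed, leave unchanged
            *) R_LIBS="$dir:$R_LIBS" ;;     # otherwise add it at the front
        esac
    fi
    export R_LIBS
}

prepend_r_libs /optnfs/el8/Rlibs/4.4
echo "R_LIBS=$R_LIBS"
```

After starting R, .libPaths() should list the directories from R_LIBS ahead of your personal library.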
Research Computing R package libraries
The libraries below /optnfs/el8/Rlibs contain packages built under RHEL8 Linux for different versions of R, and are available for general use. They are not searched by default; to use them, add them to the search path using the method above. The libraries contain all needed prerequisite packages.
Compilation options
R packages may be coded in various languages, but C++ is most common. We occasionally see C and Fortran also. When a package is installed, R will use the C++ compiler it finds in your $PATH. Since packages may be written to a newer C++ standard than the default on RHEL, it is sometimes necessary to arrange for a newer compiler suite to be used. On the Dartmouth servers, this is done by activating one of the gcc-toolset packages using scl. On RHEL8, however, this is not necessary.
For example, to use the gcc-toolset-13 compilers:
$ scl enable gcc-toolset-13 bash
This is needed only for installing, not for subsequent use.
Special options passed to the compilers during the build are stored in the file ~/.R/Makevars. Very occasionally it may be necessary to modify those options. A typical Makevars file contains the following, which tells the build process how to compile code labelled as, for example, the C++17 standard.
CXX11=g++
CXX11FLAGS=-O2 -march=native -mtune=native -fPIC -std=c++11
CXX11STD=-std=c++11
CXX14=g++
CXX14FLAGS=-O3 -march=native -mtune=native -fPIC -std=c++14
CXX17=g++
CXX17STD=-std=c++17
CXX17FLAGS=-O3 -mtune=native -fPIC -std=c++17
If you use conda, you should also deactivate any conda environments, including the miniconda base environment, before installing. If you do not, R may try to use shared libraries from conda, which will not work. Once a package is installed into your R library, you can use conda again without affecting the R package.
$ conda deactivate
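To confirm that no conda environment is still active before installing, a check along these lines may help. It is a sketch that assumes conda exports the CONDA_PREFIX and CONDA_DEFAULT_ENV variables while an environment is active; the function name conda_env_active is hypothetical.

```shell
# conda_env_active: succeed if a conda environment (including base)
# appears to be active in this shell.
# Assumption: conda exports CONDA_PREFIX while an environment is active.
conda_env_active() {
    [ -n "${CONDA_PREFIX:-}" ]
}

if conda_env_active; then
    echo "conda environment '${CONDA_DEFAULT_ENV:-unknown}' is active; run 'conda deactivate' first" >&2
else
    echo "no conda environment active"
fi
```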
install.packages() and CRAN
The default source for R packages is CRAN, the Comprehensive R Archive Network, a network of mirrored servers around the world with freely available code and documentation.
The R function to download, compile, test and install a new package from CRAN into a library is install.packages(). It has many optional features, but most often you will only need to specify the package name(s), where to get them from (the repository) and perhaps where to install them. The default destination is the first writable directory in the library path (see above). If you do not have a personal R library yet, install.packages() will ask whether it should create one.
A package specifies which other packages it depends on. install.packages() builds a list of all the missing dependencies, then downloads and installs them in the appropriate order. A complex package may have many dependencies, resulting in a long download and build cycle. install.packages() may also complain that some pre-existing packages are out of date, and offer to upgrade them.
The prompting only happens if you are running R interactively. You can install packages with a single shell command, using Rscript, but in that case there is no prompting, so some updates may be skipped, or the install may fail because it cannot create a personal library. Another very useful option to speed things up is Ncpus=N, where N is the number of CPUs that you want the installation process to use.
Example:
$ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
(rest of startup messages omitted)
> install.packages("ggraph", repos="https://cloud.r-project.org", Ncpus=4)
Installing package into '/dartfs-hpc/rc/home/a/d31314a/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/ggraph_2.0.5.tar.gz'
Content type 'application/x-gzip' length 3217051 bytes (3.1 MB)
==================================================
downloaded 3.1 MB
* installing *source* package 'ggraph' ...
** package 'ggraph' successfully unpacked and MD5 sums checked
** using staged installation
** libs
(compilation and installation messages omitted)
> quit()
or as a single command:
Rscript -e 'install.packages("ggraph", repos="https://cloud.r-project.org", Ncpus=4)'
(similar output omitted)
Example of installing into an explicitly named library. The directory must exist and be writable. Prerequisite packages found elsewhere in the library path will not be reinstalled, so when setting up a shared lab library, beware of prerequisites that are satisfied only by your personal library: they will not be copied into the shared library, and will be missing for other users of it.
Rscript -e 'install.packages("ggraph", repos="https://cloud.r-project.org", lib="/dartfs-hpc/rc/lab/X/XXX/shared/R/4.4" )'
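Because the target directory must already exist and be writable, it can be worth checking it before starting a long build. The following is a minimal sketch; the function name check_r_library is hypothetical, and the demonstration uses a throwaway directory rather than a real lab path.

```shell
# check_r_library: succeed only if the target library directory exists
# and is writable by the current user; otherwise say what is wrong.
# Hypothetical helper, for illustration only.
check_r_library() {
    local lib="$1"
    if [ ! -d "$lib" ]; then
        echo "library directory does not exist: $lib" >&2
        return 1
    fi
    if [ ! -w "$lib" ]; then
        echo "library directory is not writable: $lib" >&2
        return 1
    fi
}

# Demonstration with a temporary directory; substitute your own library path.
demo_lib="$(mktemp -d)"
check_r_library "$demo_lib" && echo "ready to install into $demo_lib"
check_r_library "$demo_lib/missing" || echo "refusing to install"
```

If the check succeeds, pass the same path as the lib= argument of install.packages().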
Checking what packages are installed
Use the installed.packages() function to list all packages that are available in each library listed in your .libPaths(). The output includes the version, which library each package came from, dependencies, and licensing information.
Use the sessionInfo() function to show which packages have actually been loaded in the current R session, with their versions, plus other information such as the R version and platform.
update.packages()
Package libraries are maintained with the update.packages() function. The arguments you will normally supply are the path to the library you wish to update and a repository, as for install.packages(). All packages in the library are compared to the latest versions in the repository and updated as needed. You must have write access to the library directory, and no R sessions should be using it at the time. For a library in shared NFS space, it isn't clear whether R can tell when others are using it.
e.g.
> update.packages("/dartfs-hpc/rc/lab/R/RCStaff/shared/R/4.0", ask=FALSE, repos="https://cloud.r-project.org")
The parameter ask=FALSE tells R that you want to update all eligible packages in the named library. Without it, R will prompt you for every package.
To update just a single package, reinstall it using install.packages(), which installs the latest version available from the repository.
Bioconductor and other repositories
Another major repository is Bioconductor, aimed at bioinformaticians and data scientists. Packages may also be downloaded from other collections, or from individual git repositories (e.g. hosted on GitHub). If the installation instructions for a package tell you to install from Bioconductor, you must first install the BiocManager package from CRAN, and then use the custom installer it provides to install packages from Bioconductor.
e.g., to install "rtracklayer" from Bioconductor:
install.packages("BiocManager", repos="https://cloud.r-project.org")
BiocManager::install("rtracklayer")
Other code sources use the devtools package from CRAN (which has many prerequisites of its own). It includes an installer for GitHub repositories.
e.g. to install the package "leidenbase" from the GitHub repository cole-trapnell-lab/leidenbase:
install.packages("devtools", repos="https://cloud.r-project.org")
devtools::install_github('cole-trapnell-lab/leidenbase')
Special Considerations
Always read the installation instructions written by the developer of the package you are trying to install. Sometimes there are known dependency errors, requiring you to manually install some packages before you attempt to install the primary target package. System libraries (normally installed as Red Hat RPMs or equivalent) cannot be installed by R, so an install may fail because of a missing library or header file, but the error message should give a hint about what needs to be installed. If this is something supported by the Linux distribution, we may be able to add it.
There may also be a requirement for non-standard system libraries. If the special software has been installed and made available as a module, it must be loaded before you try to build an R package which depends on it. For example, some R packages need the GDAL software, which can be made available by running: module load gdal
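A quick way to confirm that such software is visible before starting a build is to look for one of its programs on $PATH. The sketch below uses a hypothetical helper named require_tool and checks for gdal-config, the configuration script that ships with GDAL; that the gdal module places it on $PATH is an assumption.

```shell
# require_tool: succeed if the named program is on $PATH, otherwise
# print a hint. Hypothetical helper, for illustration only.
require_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    echo "$tool not found; load the module that provides it first" >&2
    return 1
}

# gdal-config ships with GDAL; this assumes the gdal module adds it to $PATH.
if require_tool gdal-config; then
    echo "GDAL version: $(gdal-config --version)"
fi
```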