Background

  • Research Software Engineer at University of Warwick
  • Sustainability and EDI in the R Project (with Heather Turner)


Fostering a larger, more diverse community of contributors to base R

This talk

What I’ll do

  • Encourage you to contribute to base R
  • Encourage you to learn C
  • Show you some C code in base R
  • Show you how to contribute to base R
  • Outline the work of the RCWG

What I won’t do

  • Assume you know any C
  • Try to teach you any C

The R Dev Container

Live demo - wish me luck!

Create codespace

Can take a few minutes to create codespace. Here’s one I setup earlier.

Build R from source

https://contributor.r-project.org/r-dev-env/tutorials/building_r/

  1. svn checkout (usually https://svn.r-project.org/R/trunk) (~1 min)
svn checkout https://svn.r-project.org/R/tags/R-4-4-3 $TOP_SRCDIR
  1. Download recommended packages (<1 min)
$TOP_SRCDIR/tools/rsync-recommended
  1. Change to build directory (<1 min)
mkdir -p $BUILDDIR
cd $BUILDDIR

Build R from source (continued)

  1. Configure the build (~1 min)
$TOP_SRCDIR/configure --with-valgrind-instrumentation=1
  1. Build R (5-10 mins)
make
  1. Check R (~3 mins)
make check
  1. Make R terminals use the built R (<1 min)
which_r

What is C and how does it relate to R?

  • C is a low-level, high-performance, compiled programming language
  • It provides fine-grained control over memory and hardware
  • Much of base R is written in C
  • R provides interfaces to compiled code
  • R has a C API to deal with R objects in C

Why C?

As R users/developers

  • Write efficient, portable code
    • for efficiency, almost certainly want C++ with Rcpp
    • for portability, C
  • Encounter C code when debugging

As R contributors

  • Find root cause of bug
  • Propose a patch to the C code to fix a bug or add a new feature

#66CCBB

#6CB

The original idea

The existing code

/* #RRGGBB[AA] String to Internal Color Code */
static rcolor rgb2col(const char *rgb)
{
    unsigned int r = 0, g = 0, b = 0, a = 0; /* -Wall */
    if(rgb[0] != '#')
          error(_("invalid RGB specification"));
    switch (strlen(rgb)) {
    case 9:
          a = 16 * hexdigit(rgb[7]) + hexdigit(rgb[8]);
    case 7:
          r = 16 * hexdigit(rgb[1]) + hexdigit(rgb[2]);
          g = 16 * hexdigit(rgb[3]) + hexdigit(rgb[4]);
          b = 16 * hexdigit(rgb[5]) + hexdigit(rgb[6]);
          break;
    default:
          error(_("invalid RGB specification"));
    }

    if(strlen(rgb) == 7) 
        return R_RGB(r, g, b);
    else
        return R_RGBA(r, g, b, a);
}

The fix: part 1

switch (strlen(rgb)) {
case 9:
    a = 16 * hexdigit(rgb[7]) + hexdigit(rgb[8]);
case 7:
    r = 16 * hexdigit(rgb[1]) + hexdigit(rgb[2]);
    g = 16 * hexdigit(rgb[3]) + hexdigit(rgb[4]);
    b = 16 * hexdigit(rgb[5]) + hexdigit(rgb[6]);
    break;
case 5: 
    // Equivalent to 16 * hexdigit(rgb[4]) + hexdigit(rgb[4]);
    a = (16 + 1) * hexdigit(rgb[4]);
case 4:
    r = (16 + 1) * hexdigit(rgb[1]);
    g = (16 + 1) * hexdigit(rgb[2]);
    b = (16 + 1) * hexdigit(rgb[3]);
    break;
default:
    error(_("invalid RGB specification"));
}

The fix: part 2

From

if(strlen(rgb) == 7) 
    return R_RGB(r, g, b);
else
    return R_RGBA(r, g, b, a);

to

switch(strlen(rgb)) {
case 7: 
case 4:
    return R_RGB(r, g, b);
default:
    return R_RGBA(r, g, b, a);
}

Check progress of dev container!

Hopefully ready to run make check

Irregularities in stem() display: stem width

https://bugs.r-project.org/show_bug.cgi?id=8934 - FIXED in R 4.5.0

Showing what happened <= R 4.4.3

a <- c(8.48, 9.58, 9.96)
stem(a)
  The decimal point is at the |

  8 | 
  8 | 5
  9 | 
  9 | 6
  10 | 0

Irregularities in stem() display: n = 1

stem(2)
stem(c(2, 2))
  The decimal point is at the |

  2 | 00

Check the code

stem
function (x, scale = 1, width = 80, atom = 1e-08) 
{
    if (!is.numeric(x)) 
        stop("'x' must be numeric")
    x <- x[is.finite(x)]
    n <- as.integer(length(x))
    if (is.na(n)) 
        stop("invalid length(x)")
    if (n == 0) 
        stop("no finite and non-missing values")
    if (scale <= 0) 
        stop("'scale' must be positive")
    .Call(C_StemLeaf, as.double(x), scale, width, atom)
    invisible(NULL)
}

There’s C!

C_StemLeaf
Error: object 'C_StemLeaf' not found

Where’s C?

github.com/r-devel/r-svn

Tip

Can also search in the R Dev Container codespace, after svn checkout

C_StemLeaf()

R’s C API

SEXP C_StemLeaf(SEXP x, SEXP scale, SEXP swidth, SEXP atom)
{
    if (TYPEOF(x) != REALSXP || TYPEOF(scale) != REALSXP) 
        error("invalid input");
#ifdef LONG_VECTOR_SUPPORT
    if (IS_LONG_VEC(x))
        error(_("long vector '%s' is not supported"), "x");
#endif
    int width = asInteger(swidth), n = LENGTH(x);
    if (n == NA_INTEGER) error(_("invalid '%s' argument"), "x");
    if (width == NA_INTEGER) error(_("invalid '%s' argument"), "width");
    double sc = asReal(scale), sa = asReal(atom);
    if (!R_FINITE(sc)) error(_("invalid '%s' argument"), "scale");
    if (!R_FINITE(sa)) error(_("invalid '%s' argument"), "atom");
    stem_leaf(REAL(x), n, sc, width, sa);
    return R_NilValue;
}

stem_leaf()

static Rboolean
stem_leaf(double *x, int n, double scale, int width, double atom)
{
    // <initialise variables>

    R_rsort(x,n);

    if (n <= 1) return FALSE;

    //<more code here>
    
    /* Find the print width of the stem. */

    lo = floor(x[0]*c/mu)*mu;
    hi = floor(x[n-1]*c/mu)*mu;
    ldigits = (lo < 0) ? (int) floor(log10(-(double)lo)) + 1 : 0;
    hdigits = (hi > 0) ? (int) floor(log10((double)hi)): 0;
    ndigits = (ldigits < hdigits) ? hdigits : ldigits;

    /* Starting cell */

    if(lo < 0 && floor(x[0]*c) == lo) lo = lo - mu;
    hi = lo + mu;
    if(floor(x[0]*c+0.5) > hi) {
          lo = hi;
          hi = lo + mu;
    }
  // <more code here>
}

Live demo: Fixing n = 1 case

  • Delete lines 60-61 in stem.c
  • Run make to rebuild R (only re-builds what’s changed)
  • (Should then run make check but won’t for time’s sake)
  • Re-attach R
  • See that our change worked

Live demo: Create a patch

https://contributor.r-project.org/r-dev-env/tutorials/patch_update/

  1. Create patch file
cd $TOP_SRCDIR
svn diff > $PATCHDIR/8934-n=1.diff
ls $PATCHDIR

Live demo: Download patch

Bugzilla

The other part of the stem() fix

Evaluated under R 4.5.0

a <- c(8.48, 9.58, 9.96)
stem(a)

  The decimal point is at the |

   8 | 
   8 | 5
   9 | 
   9 | 6
  10 | 0

The fix in this commit or on bugzilla.

Learning more

C Study Group

https://contributor.r-project.org/events/c-study-group-2025/

  • May run again January–June 2026, details TBC

  • Monthly meetings, weekly suggestions

  • Work through sessions 1-5 of Harvard’s CS50 course

    cs50.harvard.edu/x

  • R’s C API

  • Run by R Contribution Working Group (RCWG)

Resources: R’s C API

RCWG

https://hturner.github.io/useR2024/

RCWG

  • Created in July 2020
  • ~25 people: R Core, R Foundation, Forwards, R Consortium Diversity & Inclusion Group, R-Ladies, Python Core + wider R community.

How to foster a larger, more diverse community of contributors to base R?

  • Supportive infrastructure
  • Opportunities to learn
  • Opportunities to contribute

Supportive infrastructure

Improved communications

https://contributor.r-project.org

Weblate interface

https://translate.rx.studio

Screenshot from Weblate

R Dev Container

https://github.com/r-devel/r-dev-env/

Screenshot of R Dev Container running in GitHub Codespaces.

Opportunities to learn

R Development Guide

https://contributor.r-project.org/rdevguide/

Tutorials

Run at useR! 2021 and 2024

Opportunities to contribute

R Contribution Office Hours

https://meetup.com/r-contributors

  • Monthly, 2nd Thursday (next July 10th, 2025)
  • Two time-zones (9am UTC, 12:30pm EST)
  • Informal opportunity to work on any aspect of contributing to R
  • Has led to patches and bug fixes

R Dev Days

Upcoming:

contributor.r-project.org

meetup.com/r-contributors

hachyderm.io/@R_Contributors

Thank you! Questions?

ella.m.kaye@warwick.ac.uk

ellakaye.github.io/c-for-r-users-2025