C for R users

ellakaye.github.io/c-for-r-users

Ella Kaye
useR! 2024, Salzburg

Background

  • Research Software Engineer at University of Warwick
  • Sustainability and EDI in the R Project (with Heather Turner)


Fostering a larger, more diverse community of contributors to base R

This talk

What I’ll do

  • Encourage you to learn C
  • Show you some C code in base R
  • Encourage you to contribute to base R

What I won’t do

  • Assume you know any C
  • Try to teach you any C

What is C and how does it relate to R?

  • C is a low-level, high-performance, compiled programming language
  • It provides fine-grained control over memory and hardware
  • Much of base R is written in C
  • R provides interfaces to compiled code
  • R has a C API to deal with R objects in C

Why C?

As R users/developers

  • Write efficient, portable code
    • for efficiency, almost certainly want C++ with Rcpp
    • for portability, C
  • Encounter C code when debugging

As R contributors

  • Find root cause of bug
  • Propose a patch to the C code to fix a bug

Irregularity in stem() display

https://bugs.r-project.org/show_bug.cgi?id=8934

a <- c(8.48, 9.58, 9.96)
stem(a)

  The decimal point is at the |

  8 | 
  8 | 5
  9 | 
  9 | 6
  10 | 0
stem(2)
stem(c(2, 2))

  The decimal point is at the |

  2 | 00

Check the code

stem
function (x, scale = 1, width = 80, atom = 1e-08) 
{
    if (!is.numeric(x)) 
        stop("'x' must be numeric")
    x <- x[is.finite(x)]
    n <- as.integer(length(x))
    if (is.na(n)) 
        stop("invalid length(x)")
    if (n == 0) 
        stop("no finite and non-missing values")
    if (scale <= 0) 
        stop("'scale' must be positive")
    .Call(C_StemLeaf, as.double(x), scale, width, atom)
    invisible(NULL)
}

There’s C!

C_StemLeaf
Error in eval(expr, envir, enclos): object 'C_StemLeaf' not found

Where’s C?

github.com/r-devel/r-svn

C_StemLeaf()

R’s C API

SEXP C_StemLeaf(SEXP x, SEXP scale, SEXP swidth, SEXP atom)
{
    if (TYPEOF(x) != REALSXP || TYPEOF(scale) != REALSXP) 
        error("invalid input");
#ifdef LONG_VECTOR_SUPPORT
    if (IS_LONG_VEC(x))
        error(_("long vector '%s' is not supported"), "x");
#endif
    int width = asInteger(swidth), n = LENGTH(x);
    if (n == NA_INTEGER) error(_("invalid '%s' argument"), "x");
    if (width == NA_INTEGER) error(_("invalid '%s' argument"), "width");
    double sc = asReal(scale), sa = asReal(atom);
    if (!R_FINITE(sc)) error(_("invalid '%s' argument"), "scale");
    if (!R_FINITE(sa)) error(_("invalid '%s' argument"), "atom");
    stem_leaf(REAL(x), n, sc, width, sa);
    return R_NilValue;
}

stem_leaf()

static Rboolean
stem_leaf(double *x, int n, double scale, int width, double atom)
{
    // <initialise variables>

    R_rsort(x,n);

    if (n <= 1) return FALSE;

    //<more code here>
    
    /* Find the print width of the stem. */

    lo = floor(x[0]*c/mu)*mu;
    hi = floor(x[n-1]*c/mu)*mu;
    ldigits = (lo < 0) ? (int) floor(log10(-(double)lo)) + 1 : 0;
    hdigits = (hi > 0) ? (int) floor(log10((double)hi)): 0;
    ndigits = (ldigits < hdigits) ? hdigits : ldigits;

    /* Starting cell */

    if(lo < 0 && floor(x[0]*c) == lo) lo = lo - mu;
    hi = lo + mu;
    if(floor(x[0]*c+0.5) > hi) {
          lo = hi;
          hi = lo + mu;
    }
  // <more code here>
}

Interfaces / pryr

We’ve seen .Call(). In base R, there’s also .Internal() and .Primitive()

e.g. the source code for tabulate includes:

.Internal(tabulate(bin, nbins))

We can find the underlying code on GitHub with

pryr::show_c_source(.Internal(tabulate(bin, nbins)))

We can also use pryr to find the SEXTYPE:

pryr::sexp_type(c(8.48, 9.58, 9.96))
[1] "REALSXP"

#66CCBB

#6CB

The original idea

The existing code

/* #RRGGBB[AA] String to Internal Color Code */
static rcolor rgb2col(const char *rgb)
{
    unsigned int r = 0, g = 0, b = 0, a = 0; /* -Wall */
    if(rgb[0] != '#')
          error(_("invalid RGB specification"));
    switch (strlen(rgb)) {
    case 9:
          a = 16 * hexdigit(rgb[7]) + hexdigit(rgb[8]);
    case 7:
          r = 16 * hexdigit(rgb[1]) + hexdigit(rgb[2]);
          g = 16 * hexdigit(rgb[3]) + hexdigit(rgb[4]);
          b = 16 * hexdigit(rgb[5]) + hexdigit(rgb[6]);
          break;
    default:
          error(_("invalid RGB specification"));
    }

    if(strlen(rgb) == 7) 
        return R_RGB(r, g, b);
    else
        return R_RGBA(r, g, b, a);
}

The fix: part 1

switch (strlen(rgb)) {
case 9:
    a = 16 * hexdigit(rgb[7]) + hexdigit(rgb[8]);
case 7:
    r = 16 * hexdigit(rgb[1]) + hexdigit(rgb[2]);
    g = 16 * hexdigit(rgb[3]) + hexdigit(rgb[4]);
    b = 16 * hexdigit(rgb[5]) + hexdigit(rgb[6]);
    break;
case 5: 
  // Equivalent to 16 * hexdigit(rgb[4]) + hexdigit(rgb[4]);
  a = (16 + 1) * hexdigit(rgb[4]);
case 4:
  r = (16 + 1) * hexdigit(rgb[1]);
  g = (16 + 1) * hexdigit(rgb[2]);
  b = (16 + 1) * hexdigit(rgb[3]);
  break;
default:
    error(_("invalid RGB specification"));
}

The fix: part 2

From

if(strlen(rgb) == 7) 
    return R_RGB(r, g, b);
else
    return R_RGBA(r, g, b, a);

to

switch(strlen(rgb)) {
case 7: 
case 4:
    return R_RGB(r, g, b);
default:
    return R_RGBA(r, g, b, a);
}

Learning more

C study group

https://contributor.r-project.org/events/c-study-group-2024/

  • Will run again January–June 2025, details TBC

  • Monthly meetings, weekly suggestions

  • Work through sessions 1-5 of Harvard’s CS50 course

    cs50.harvard.edu/x

  • R’s C API

  • Run by R Contribution Working Group (RCWG)

RCWG

Fosters a larger, more diverse community of contributors to base R.

Resources: R’s C API

Thank you! Questions?

ella.m.kaye@warwick.ac.uk

ellakaye.github.io/c-for-r-users