Automatic Differentiation in R

Automatic Differentiation (AD) specifically reverse-mode AD takes a function, defined by a computer program and computes the gradient exactly (up to machine precision) and at the same time complexity as the original program.

Here I will show how one can use the Reverse-Mode Automatic Differentiation Library provided by Stan Math Library in R.

Setting Up

The hardest part is making sure that dependencies are resolved during compilation. The relevant headers are in the Rcpp, RcppEigen and StanHeaders packages, so first these packages must be installed:


The headers must be included during compilation, to this end we set PKG_CXXFLAGS system variable.

Sys.setenv("PKG_CXXFLAGS"='-I"YOUR_LIB_PATH/Rcpp/include" -I"YOUR_LIB_PATH/RcppEigen/include" -I"YOUR_LIB_PATH/StanHeaders/include"')

Where YOUR_LIB_PATH should be replaced with the directory where R packages are installed on your system, which can be found by callind .libPaths(). Note that if you are developing a package it is enough to add the following line to the DESCRIPTION file:

LinkingTo: Rcpp (>= 0.12.7), RcppEigen, StanHeaders (>= 2.12.0)

Reverse-Mode AD

Now we are ready to see AD in action. The following function computes the log density function of a normal distrubted variable , and its derivative with respect to and .

#include <Rcpp.h>
#include <stan/math.hpp>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector normLogLikAD(double y_, double mu_, double sigma_) {
  NumericVector ret(4);
  stan::math::var y = y_;
  stan::math::var mu = mu_;
  stan::math::var sigma = sigma_;
  stan::math::var lp = 0;
  lp += -0.5* ((y-mu)/sigma) * ((y-mu)/sigma) - log(sigma) - 0.5 * log(2*PI);
  // Compute Function Value
  ret[0] = lp.val();
  // Compute Gradient
  ret[1] = y.adj();
  ret[2] = mu.adj();
  ret[3] = sigma.adj();
  // Memory is allocated on a global stack
  return ret;

The function is now callable from R.

## [1] -1.737086  0.250000 -0.250000 -0.375000

Compare this with the gradient found using numerical differentiation:

grad(func = function(par){dnorm(par[1],par[2],par[3], log=T)}, c(2,3,2))
## [1]  0.250 -0.250 -0.375

The C++ code is mostly straightforward: Variables are declared as stan::math::var all other variables are treated as constants. To compute the actual value of a variable we call the member-function val(), to compute the gradient of a variable y with respect to some variables x1,x2 we call y’s member function grad(), and the partial derivative can be obtained by calling x1’s member function adj(). Finally it should be noted that stan::math allocates memory for variables on a global stack, that means memory is not automatically freed up when the variables goes out of scope. The function recover_memory() clears this stack and ChainableStack::memalloc_.free_all() frees the memory associated with it, both should be called after finish computation.


Stan Math Library

blog comments powered by Disqus