
Automatic Differentiation (AD), specifically reverse-mode AD, takes a function defined by a computer program and computes its gradient exactly (up to machine precision), with the same time complexity as the original program. Here I will show how to use the reverse-mode automatic differentiation provided by the Stan Math Library from within R.
The hardest part is making sure that dependencies are resolved during compilation.
The relevant headers are in the Rcpp, RcppEigen and StanHeaders packages, so first these packages must be installed:
install.packages(c("Rcpp","RcppEigen","StanHeaders"))The headers must be included during compilation, to this end we set PKG_CXXFLAGS system variable.
Sys.setenv("PKG_CXXFLAGS"='-I"YOUR_LIB_PATH/Rcpp/include" -I"YOUR_LIB_PATH/RcppEigen/include" -I"YOUR_LIB_PATH/StanHeaders/include"')Where YOUR_LIB_PATH should be replaced with the directory where R packages are installed on your system, which can be found by calling .libPaths().
Note that if you are developing a package it is enough to add the following line to the DESCRIPTION file:
LinkingTo: Rcpp (>= 0.12.7), RcppEigen, StanHeaders (>= 2.12.0)
Now we are ready to see AD in action. The following function computes the log density of a normally distributed variable y with mean mu and standard deviation sigma, together with its derivatives with respect to y, mu, and sigma.
#include <Rcpp.h>
#include <stan/math.hpp>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector normLogLikAD(double y_, double mu_, double sigma_) {
  NumericVector ret(4);
  
  stan::math::var y = y_;
  stan::math::var mu = mu_;
  stan::math::var sigma = sigma_;
  stan::math::var lp = 0;
  
  lp += -0.5* ((y-mu)/sigma) * ((y-mu)/sigma) - log(sigma) - 0.5 * log(2*PI);
  
  // Compute Function Value
  ret[0] = lp.val();
  
  // Compute Gradient
  lp.grad();
  ret[1] = y.adj();
  ret[2] = mu.adj();
  ret[3] = sigma.adj();
  
  // Memory is allocated on a global stack
  stan::math::recover_memory();
  stan::math::ChainableStack::memalloc_.free_all();
  
  return ret;
}
The function is now callable from R.
normLogLikAD(2,3,2)
## [1] -1.737086  0.250000 -0.250000 -0.375000
Compare this with the gradient found using numerical differentiation:
library(numDeriv)
grad(func = function(par){dnorm(par[1],par[2],par[3], log=T)}, c(2,3,2))
## [1]  0.250 -0.250 -0.375
The C++ code is mostly straightforward: the variables we want to differentiate are declared as stan::math::var; all other variables are treated as constants.
To compute the actual value of a variable we call its member function val();
to compute the gradient of a variable y with respect to some variables x1, x2 we call y's member function grad();
and the partial derivative of y with respect to x1 can then be obtained by calling x1's member function adj().
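As a minimal standalone sketch of this interface (the function f and the input values here are made up for illustration, and are separate from the Rcpp example above), consider differentiating f(x1, x2) = x1 * x2 + sin(x1):
#include <stan/math.hpp>
#include <iostream>

int main() {
  // Independent variables are declared as AD types
  stan::math::var x1 = 1.5;
  stan::math::var x2 = 0.5;

  // Every operation on var objects is recorded on the global AD stack
  stan::math::var f = x1 * x2 + sin(x1);

  std::cout << "f      = " << f.val() << std::endl;   // function value

  // Reverse sweep starting from f
  f.grad();
  std::cout << "df/dx1 = " << x1.adj() << std::endl;  // equals x2 + cos(x1)
  std::cout << "df/dx2 = " << x2.adj() << std::endl;  // equals x1

  // Clean up the global AD stack
  stan::math::recover_memory();
  return 0;
}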
Finally, it should be noted that stan::math allocates memory for variables on a global stack,
which means that memory is not automatically freed when the variables go out of scope.
The function recover_memory() clears this stack, and ChainableStack::memalloc_.free_all() frees the memory associated with it;
both should be called after the computation is finished.
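To see why this matters, here is a schematic sketch (with made-up data and parameter values) of evaluating such a gradient repeatedly, for example inside an optimisation loop: the stack is cleared after every reverse sweep, and the underlying memory is released once at the end.
#include <stan/math.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  std::vector<double> data = {1.2, -0.7, 0.3};   // made-up observations
  const double log_two_pi = std::log(2.0 * 3.14159265358979323846);

  for (std::size_t i = 0; i < data.size(); ++i) {
    stan::math::var mu = 0.0;
    stan::math::var sigma = 1.0;

    stan::math::var lp = -0.5 * ((data[i] - mu) / sigma) * ((data[i] - mu) / sigma)
                         - log(sigma) - 0.5 * log_two_pi;

    lp.grad();
    std::cout << "observation " << i
              << ": d lp / d mu = " << mu.adj()
              << ", d lp / d sigma = " << sigma.adj() << std::endl;

    // Clear the global AD stack after each sweep so it does not keep growing
    stan::math::recover_memory();
  }

  // Release the memory that backed the AD stack once we are done
  stan::math::ChainableStack::memalloc_.free_all();
  return 0;
}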