r - How to elegantly + robustly cache external script in knitr rmd document?


Say I have an external R script, external.r:

df.rand <- data.frame(x = rnorm(n = 100), y = rnorm(n = 100))

Then there's main.rmd:

\documentclass{article}
\begin{document}

<<setup, include = FALSE>>=
library(knitr)
library(ggplot2)
# global chunk options
opts_chunk$set(cache = TRUE, autodep = TRUE, concordance = TRUE, progress = TRUE,
               cache.extra = tools::md5sum("external.r"))
@

<<source, include = FALSE>>=
source("external.r")
@

<<plot>>=
ggplot(data = df.rand, mapping = aes(x = x, y = y)) + geom_point()
@

\end{document}

It's helpful to have this in an external script because, in reality, it's a bunch of import, data cleaning, and simulation tasks that would pollute main.rmd.

Any chunk in main.rmd depends on changes in the external script. To account for this dependency, I added cache.extra = tools::md5sum("external.r") above.
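To make the mechanism concrete, here is a minimal sketch, run outside knitr, of why a file checksum can serve as a cache key: tools::md5sum() returns a different string as soon as the file's contents change, and a chunk option whose value changes should no longer match the stored cache. The temporary file and the variable names are illustrative stand-ins, not part of the setup above.

```r
# A temporary file stands in for external.r (illustrative only)
script <- tempfile(fileext = ".r")

writeLines("df.rand <- data.frame(x = rnorm(n = 100), y = rnorm(n = 100))", script)
hash.before <- unname(tools::md5sum(script))

# Any edit to the file, however small, changes the checksum ...
writeLines("df.rand <- data.frame(x = rnorm(n = 200), y = rnorm(n = 200))", script)
hash.after <- unname(tools::md5sum(script))

# ... so a chunk option that embeds the checksum takes a new value
hash.changed <- !identical(hash.before, hash.after)
print(hash.changed)
```

Note that the checksum covers the whole file, so even an edit to a comment or to whitespace gives a new value.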

That seems to work OK.

I'm looking for best practices.

  • Is this robust (enough)?
  • Is there a more elegant way to do this? (For example, it's unfortunate that any change in external.r triggers a complete cache invalidation, rather than invalidating only the objects that actually change.)

There are no side effects (except for library() calls, but I can move them to main.rmd).

I'm worried that I'm somehow doing this wrong.

There should be better approaches than the do-it-yourself caching you use. To start with, split external.r into chunks:

# ---- createrandomdfs ----
df.rand1 <- data.frame(rnorm(n = 100), rnorm(n = 100))
df.rand2 <- data.frame(rnorm(n = 100), rnorm(n = 100))

# ---- createotherobjects ----
# stuff

In main.rmd, add (in an uncached chunk!) read_chunk(path = 'external.r'). Then execute the chunks:

<<createrandomdfs>>=
@

<<createotherobjects>>=
@

If autodep doesn't work, add dependson to the chunks. A chunk that uses df.rand1 or df.rand2 gets dependson = "createrandomdfs"; when the other objects are used as well, set dependson = c("createrandomdfs", "createotherobjects").
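Put together, the relevant part of main.rmd might look like the sketch below. The chunk labels read-external and plot-rand1 are hypothetical names chosen for illustration, and the base-graphics plot() call is only a placeholder for whatever the dependent chunk really does.

```rnw
<<read-external, cache = FALSE, include = FALSE>>=
# read_chunk() only registers the labelled code from external.r;
# it has to run on every knit, so this chunk is explicitly not cached
knitr::read_chunk(path = "external.r")
@

<<createrandomdfs>>=
@

<<plot-rand1, dependson = "createrandomdfs">>=
# re-evaluated whenever the cache of createrandomdfs is invalidated
plot(df.rand1)
@
```

With this layout, editing only the createotherobjects section of external.r should leave the cached results of createrandomdfs and plot-rand1 untouched, assuming knitr keys each chunk's cache on that chunk's own code and options.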

You may also invalidate a chunk's cache when a particular object changes: cache.whatever = quote(df.rand1).

This way, you avoid invalidating the whole cache on any change in external.r. How you split the code in that file into chunks is crucial: if you use too many chunks, you have to list many dependencies; if you use too few, the cache gets invalidated more often than necessary.

