Non-standard evaluation

﹏ヽ暗。殇╰゛Y, 2022-07-24 07:16

dplyr uses non-standard evaluation (NSE) in all of the most important single-table verbs: filter(), mutate(), summarise(), arrange(), select() and group_by(). NSE matters for two reasons. It saves you typing, and for database backends it is what makes it possible to translate your R code to SQL. However, while NSE is great for interactive use, it is hard to program with. This vignette describes how you can opt out of NSE in dplyr and rely only on standard evaluation (SE), along with a little quoting.
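
The following sketch makes the NSE/SE distinction concrete. It uses base R rather than dplyr or lazyeval, because dplyr has since deprecated the underscore verbs in favour of tidy evaluation. `show_expr()` and `se_eval()` are hypothetical helpers, not part of any package. The NSE function captures the code its caller typed. The SE function expects an already-quoted expression and evaluates it against a data frame.

```r
# Hypothetical NSE-style helper: substitute() returns the unevaluated
# argument expression, so `mpg` never has to exist as a variable.
show_expr <- function(x) {
  expr <- substitute(x)
  paste(deparse(expr), collapse = "")
}

# Hypothetical SE-style helper: the caller passes a quoted expression,
# which is evaluated with the data frame's columns in scope.
se_eval <- function(df, expr) {
  eval(expr, df)
}

nse_text <- show_expr(mean(mpg))                 # the code, as text
se_value <- se_eval(mtcars, quote(mean(mpg)))    # the computed mean
```

The NSE helper is convenient to type but awkward to call from other functions. The SE helper accepts anything you can build programmatically.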

Behind the scenes, NSE is powered by the lazyeval package. The goal is to provide an approach to NSE that you can learn once and then apply in many places (dplyr is the first of my packages to use this approach, but over time I will adopt it everywhere). You may want to read the lazyeval vignettes if you'd like to learn more about the underlying details, or if you'd like to use this approach in your own packages.

Standard evaluation basics

Every function in dplyr that uses NSE also has a version that uses SE. The naming scheme is consistent: the SE version's name is the NSE name with _ appended. For example, the SE version of summarise() is summarise_(), and the SE version of arrange() is arrange_(). These functions work very similarly to their NSE cousins, but their inputs must be "quoted":

    # NSE version:
    summarise(mtcars, mean(mpg))
    #> mean(mpg)
    #> 1 20.09062
    # SE versions:
    summarise_(mtcars, ~mean(mpg))
    #> mean(mpg)
    #> 1 20.09062
    summarise_(mtcars, quote(mean(mpg)))
    #> mean(mpg)
    #> 1 20.09062
    summarise_(mtcars, "mean(mpg)")
    #> mean(mpg)
    #> 1 20.09062

There are three ways to quote inputs that dplyr understands:

  • With a formula, ~ mean(mpg).
  • With quote(), quote(mean(mpg)).
  • As a string: "mean(mpg)".
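
The next sketch shows roughly what an SE verb has to do with each of these three input styles. It is written in base R, and `as_quoted()` and `eval_quoted()` are hypothetical helpers, not dplyr internals. It assumes one-sided formulas, whose expression sits at position 2. Only the formula carries its own environment; the call and the string fall back to the caller's environment.

```r
# Hypothetical helper: normalise a formula, quoted call, or string
# into an expression plus the environment to evaluate it in.
as_quoted <- function(x, env) {
  if (inherits(x, "formula")) {
    # One-sided formula: expression is x[[2]]; the formula also
    # remembers the environment it was created in.
    list(expr = x[[2]], env = environment(x))
  } else if (is.character(x)) {
    # A string must be parsed and has no environment of its own.
    list(expr = parse(text = x)[[1]], env = env)
  } else {
    # A call or name from quote(): also no environment attached.
    list(expr = x, env = env)
  }
}

# Evaluate with data-frame columns first, then the chosen environment.
eval_quoted <- function(df, x) {
  q <- as_quoted(x, parent.frame())
  eval(q$expr, df, q$env)
}

vals <- c(
  formula = eval_quoted(mtcars, ~mean(mpg)),
  call    = eval_quoted(mtcars, quote(mean(mpg))),
  string  = eval_quoted(mtcars, "mean(mpg)")
)
```

All three produce the same value here because `mean(mpg)` only refers to a column. The distinction matters once an expression refers to local objects.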

It's best to use a formula, because a formula captures both the expression to evaluate and the environment in which it should be evaluated. This is important if the expression mixes variables in the data frame with objects in the local environment:

    constant1 <- function(n) ~n
    summarise_(mtcars, constant1(4))
    #> n
    #> 1 4
    # Using anything other than a formula will fail because it doesn't
    # know which environment to look in
    constant2 <- function(n) quote(n)
    summarise_(mtcars, constant2(4))
    #> Error in eval(expr, envir, enclos): binding not found: 'n'
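
Why does the formula succeed? The base-R sketch below shows the mechanism without dplyr. The formula returned by `constant1()` keeps a reference to the function's execution environment, where `n` is 4. The bare name from `constant2()` has no environment attached. To make the failure deterministic, the sketch evaluates the bare name against `mtcars` with an empty fallback environment. That is a choice made for illustration and is not what dplyr itself does.

```r
constant1 <- function(n) ~n          # formula: remembers where n lives
constant2 <- function(n) quote(n)    # bare symbol: no environment

f <- constant1(4)
# The formula's environment is constant1()'s frame, which holds n = 4.
captured_n <- get("n", envir = environment(f))

# Columns of mtcars first, then the formula's own environment.
from_formula <- eval(f[[2]], mtcars, environment(f))

# With only a symbol, nothing tells eval() where `n` is defined;
# an empty fallback environment makes the lookup fail.
quote_result <- tryCatch(
  eval(constant2(4), mtcars, emptyenv()),
  error = function(e) e
)
quote_failed <- inherits(quote_result, "error")
```
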

Setting variable names

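If you also want the output variables to vary, pass a list of quoted objects to the .dots argument. Unnamed elements are labelled with their expressions. Naming the list (for example with setNames()) sets the output column names: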

    n <- 10
    dots <- list(~mean(mpg), ~n)
    summarise_(mtcars, .dots = dots)
    #> mean(mpg) n
    #> 1 20.09062 10
    summarise_(mtcars, .dots = setNames(dots, c("mean", "count")))
    #> mean count
    #> 1 20.09062 10
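
The .dots behaviour can be mimicked in a few lines of base R. `summarise_dots()` below is a hypothetical illustration, not dplyr's implementation. It assumes every element is a one-sided formula. Each formula is evaluated against the data frame, with its own environment as the fallback. Unnamed elements take their deparsed expression as the column name.

```r
# Hypothetical helper mimicking summarise_(df, .dots = dots).
summarise_dots <- function(df, dots) {
  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))
  # Default names: the deparsed right-hand side of each formula.
  auto <- vapply(dots, function(f) paste(deparse(f[[2]]), collapse = ""),
                 character(1))
  nms[nms == ""] <- auto[nms == ""]
  # Columns first, then each formula's own environment.
  vals <- lapply(dots, function(f) eval(f[[2]], df, environment(f)))
  # check.names = FALSE keeps names such as "mean(mpg)" intact.
  do.call(data.frame, c(setNames(vals, nms), list(check.names = FALSE)))
}

n <- 10
dots <- list(~mean(mpg), ~n)
default_names <- summarise_dots(mtcars, dots)
custom_names  <- summarise_dots(mtcars, setNames(dots, c("mean", "count")))
```
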

Mixing constants and variables

What if you need to mingle constants and variables? Use the handy lazyeval::interp():

    library(lazyeval)
    # interp() works with formulas, quoted calls and strings (but formulas are best)
    interp(~ x + y, x = 10)
    #> ~10 + y
    interp(quote(x + y), x = 10)
    #> 10 + y
    interp("x + y", x = 10)
    #> [1] "10 + y"
    # Use as.name if you have a character string that gives a variable name
    interp(~ mean(var), var = as.name("mpg"))
    #> ~mean(mpg)
    # or supply the quoted name directly
    interp(~ mean(var), var = quote(mpg))
    #> ~mean(mpg)
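
For comparison, base R offers similar tools. The sketch below is an analogue of interp(), not lazyeval itself. substitute() replaces symbols from a list, and bquote() splices in values marked with .(). Evaluating a bquote()d formula call yields a real formula whose environment is the one it was evaluated in, which fits the advice that formulas are best.

```r
# substitute(): replace symbols in an unevaluated expression.
e1 <- substitute(x + y, list(x = 10))

# bquote(): splice a value (here a symbol built from a string) via .().
var <- as.name("mpg")
e2 <- bquote(mean(.(var)))
e2_value <- eval(e2, mtcars)          # evaluate against the data

# Building a formula: evaluating the `~` call creates a formula object
# that remembers the environment it was created in.
f2 <- eval(bquote(~ mean(.(var))))
```
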

Because every action in R is a function call, you can use the same idea to change which function is called:

    interp(~ f(a, b), f = quote(mean))
    #> ~mean(a, b)
    interp(~ f(a, b), f = as.name("+"))
    #> ~a + b
    interp(~ f(a, b), f = quote(`if`))
    #> ~if (a) b
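
This base-R sketch demonstrates the same idea with substitute() rather than interp(). Swapping the symbol in the function position of a call changes what gets called. The rewritten calls are then evaluated with concrete values for `a` and `b`, chosen only for illustration.

```r
g1 <- substitute(f(a, b), list(f = quote(mean)))
g2 <- substitute(f(a, b), list(f = as.name("+")))
g3 <- substitute(f(a, b), list(f = as.name("if")))

# Operators and `if` are ordinary calls, so the rewritten calls run.
g2_value <- eval(g2, list(a = 1, b = 2))
g3_value <- eval(g3, list(a = TRUE, b = "yes"))
```
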

If you already have a list of values, use .values:

    interp(~ x + y, .values = list(x = 10))
    #> ~10 + y
    # You can also interpolate variables defined in the current
    # environment, but this is a little risky because it's easy
    # for this to change without you realising
    y <- 10
    interp(~ x + y, .values = environment())
    #> ~x + 10
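
In base R, substitute() plays a similar role: it accepts either a list or an environment of values. One caveat is that substitute() leaves symbols alone when given the global environment. This sketch therefore uses a dedicated environment, `vals_env`, which is an illustrative name. The last step shows the risk described in the comment above: whatever `y` holds at the time of the call is what gets spliced in.

```r
from_list <- substitute(x + y, list(x = 10))

vals_env <- new.env()
assign("y", 10, envir = vals_env)
from_env <- substitute(x + y, vals_env)

# If y changes, later interpolations silently pick up the new value.
assign("y", 99, envir = vals_env)
later <- substitute(x + y, vals_env)
```
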
