R’s Scoping


By Breckbaldwin

[*Update, 10 September 2010: I didn’t study Radford Neal’s example closely enough before making an even bigger mess of things. I’d like to blame it on HTML formatting, which garbled Radford’s formatting and destroyed everyone else’s examples, but I was actually just really confused about what was going on in R. So I’m scratching most of the blog entry and my comments, and replacing them with Radford’s example and a pointer to the manual.*]

A Better Mousetrap

There’s been an ongoing discussion among computational statisticians about writing something better than R, in terms of both speed and comprehensibility:

  • Andrew Gelman: The Future of R
  • Julien Cornebise (via Christian Robert): On R Shortcomings
  • Radford Neal: Two Surprising Things about R (following up his earlier series, Design flaws in R)

Radford Neal’s Example

Radford’s example had us define two functions,

  > f = function () {
  +   g = function () a+b
  +   a = 10
  +   g()
  + }
  > h = function () {
  +   a = 100
  +   b = 200
  +   f()
  + }
  > b = 3
  > h()
  [1] 13

This illustrates what’s going on, assuming you can parse R. I see it, I believe it. The thing to figure out is why a=10 was picked up in the call to g() in f, but b=200 was not picked up in the call to f() in h. Instead, the global assignment b=3 was picked up.
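One way to check where g finds each variable is to ask R directly. The sketch below reruns Radford's example with a few probes added; the names here, res, g_in_f, a_in_f and b_in_f are mine for illustration and are not part of his example.

```r
# Radford's example, instrumented to show where g() finds a and b.
f <- function() {
  g <- function() a + b
  a <- 10
  here <- environment()  # the environment created by this call to f
  list(
    value  = g(),
    g_in_f = identical(environment(g), here),              # g's enclosing environment is f's
    a_in_f = exists("a", envir = here, inherits = FALSE),  # is a local to f?
    b_in_f = exists("b", envir = here, inherits = FALSE)   # is b local to f?
  )
}

h <- function() {
  a <- 100  # local to h, so invisible to f and g
  b <- 200
  f()
}

b <- 3
res <- h()
str(res)
```

Here g_in_f and a_in_f come back TRUE and b_in_f comes back FALSE, so the lookup of b falls through f's environment to the environment where f was defined, where b is 3. The locals of h are never consulted.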

RTFM

Even after I RTFM-ed, I was still confused.

  • Venables, W. N., D. M. Smith and the R Core Development Team. 2010. Introduction to R 2.11.1.

It has a section 10.7 titled “Scope”, but I found their example

  cube <- function(n) {
    sq <- function() n*n
    n*sq()
  }

and the following explanation confusing,

The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it. Under static scope (S-Plus) the value is that associated with a global variable named n. Under lexical scope (R) it is the parameter to the function cube since that is the active binding for the variable n at the time the function sq was defined. The difference between evaluation in R and evaluation in S-Plus is that S-Plus looks for a global variable called n while R first looks for a variable called n in the environment created when cube was invoked.
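The manual's point is easier to see with a global n that disagrees with the argument. This is a minimal sketch; the global value 100 and the variable result are my additions, not the manual's.

```r
n <- 100  # a global n, added here only to make the two lookups distinguishable

cube <- function(n) {
  sq <- function() n * n  # n is free in sq
  n * sq()
}

# In R, sq finds the n of the enclosing call to cube, so this is 2 * (2 * 2).
result <- cube(2)
print(result)
```

In R this prints 8. If sq instead used the global n, as the manual describes for S-Plus, the same call would compute 2 * (100 * 100).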

I was particularly confused by the “environment created when cube was invoked” part, because I couldn’t reconcile it with Radford’s example.

Let’s consider a slightly simpler example without nested function calls.

  > j = 10
  > f = function(x) j*x
  > f(3)
  [1] 30
  > j = 12
  > f(3)
  [1] 36

This shows it can’t be the value of j at the time f is defined, because the result changes when I change j later. I think what’s fixed when f is defined is where it’s going to look for j, not what it will find there. When f is called, R looks for j in the environment where f was defined, then in that environment’s enclosing environments, ending at the global environment and the search path. It never looks in the caller’s environment; in this example the caller and the defining environment are both the global environment, which is what makes the two easy to confuse. Variables local to other functions on the call stack, like b=200 in Radford’s example, don’t count.
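A quick way to test the caller-environment idea is to call f from inside another function that has its own j. The function name caller and the value 1000 are assumptions for illustration.

```r
j <- 10
f <- function(x) j * x

caller <- function() {
  j <- 1000  # local to caller; f was not defined here
  f(3)
}

first <- caller()   # uses the j visible where f was defined
j <- 12
second <- caller()  # sees the updated j, still not caller's own j
print(c(first, second))
```

Both calls ignore the caller's j = 1000. The lookup happens each time f runs, but it always starts from the environment where f was defined, which is why reassigning j there changes the result.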

Am I the only one who finds this confusing? At least with all your help, I think I finally understand what R’s doing.


This entry was posted on September 9, 2010 at 1:12 pm and is filed under Carp’s Blog, Java, Statistics.

13 Responses to “R’s Scoping”

  1. Andrew Gelman Says:
    September 9, 2010 at 1:21 pm

    Hey, Bob–you should be posting this stuff on our main blog now!

    • lingpipe Says:
      September 10, 2010 at 12:27 pm

      HQ’s still working out brand management issues. I think a post like this one would’ve made sense on your blog. I’ll start posting there soon.

      I’m both excited and intimidated by the size of your audience.

      Luckily, I don’t mind being wrong in public (once per topic). Especially when I can get tutelage from the likes of Radford Neal!

  2. Ken Williams Says:
    September 9, 2010 at 2:11 pm

    I believe you’re incorrect about scoping in R, as the following example shows:

    f <- function(x) { y g f(4)
    Error in g(x) : object ‘y’ not found

    As in most languages, it’s possible to create global variables in R, which is what your example shows. However, functions effectively use lexical scope, if you define that as ‘called functions won’t accidentally see my variables’.
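Ken's snippet above lost text to HTML escaping. The sketch below is a guess at its shape, assuming f assigned a local y and then called a separately defined g that uses y. The value 10 and the body x + y are assumptions; only the call f(4) and the error message survive in the comment.

```r
f <- function(x) {
  y <- 10  # assumed: some local assignment to y inside f
  g(x)
}

g <- function(x) x + y  # assumed body: y is free in g, and g is defined outside f

# Capture the error message rather than stopping the script.
msg <- tryCatch(f(4), error = function(e) conditionMessage(e))
print(msg)
```

Under these assumptions the call fails because g looks for y where g was defined, not in the frame of f that called it.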

    Personally I *love* the R language. I know there’s a lot of talk about redesigning it or replacing it somehow, but I’m skeptical that it’s a good idea.

    • lingpipe Says:
      September 10, 2010 at 11:56 am

      Thanks. I updated the body of the blog post to point to the comments.

      I think the function definition got garbled somehow (or maybe it’s just an unfamiliar R syntax convention).

  3. Radford Neal Says:
    September 9, 2010 at 3:32 pm

    You’re wrong about R’s scoping rules. It uses lexical scoping.

    Here’s an example demonstrating this:

    > f = function ()
    + { g = function () a+b
    +   a = 10
    +   g()
    + }

    > h = function ()
    + { a = 100
    +   b = 200
    +   f()
    + }

    > b = 3
    > print(h())
    [1] 13

    The expression a+b is evaluated with b from the global environment, and a from the lexically enclosing environment of g. The b inside h is not seen even though with dynamic scoping it would take precedence over the global b.

  4. Rob V. Says:
    September 10, 2010 at 1:25 am

    Looks like you’ve tripped over lambda calculus and closures, things that are extremely common in many languages (particularly functional languages) but NOT in the world of Java and C derivatives. This is one of the best features of JavaScript, in my opinion far more useful than the prototyping that gets more attention. And one of the most obvious shortcomings in Java (although generics were a nice alternative that reduced the need for closures in some cases). Even Java’s granddaddy, Smalltalk, has these features. Perhaps the confusion (between your interpretation of the problem and Radford’s) stems from something akin to JavaScript’s slightly flawed implementation of closures, whereby variables in the topmost scope are actually global but all other variables are properly scoped.

    • lingpipe Says:
      September 10, 2010 at 11:54 am

      Ironic, given that I used to teach programming language theory and write about denotational semantics! And I got my feet wet in professional programming by integrating the C implementation of Javascript (ECMAScript, technically) into SpeechWorks’s semantic interpreter!!!

      As you say, there’s really nothing like a closure in C or Java. About as close as I get is writing search algorithms with a continuation-passing style.

  5. Ken Williams Says:
    September 10, 2010 at 10:04 am

    Here’s an even simpler example:

    f <- function(x) { y g f(4)
    Error in g(x) : object ‘y’ not found

  6. Nick Says:
    September 10, 2010 at 4:06 pm

    Super-simple example of lexical scoping in R:

    > x <- "A"
    > g <- function() x
    > f <- function() { x <- "B"; g() }
    > f()
    [1] "A"

    If R were dynamically scoped, the ‘x’ in g() would take its value from the calling environment, where it is ‘B’. However, because R is lexically scoped, it comes from the environment where g() is defined, where it is ‘A’.

  7. Nick Says:
    September 10, 2010 at 6:43 pm

    This is also why I’m still unclear about Radford’s example, because the a=10
    was part of the environment when g() was called in h, but b=200 was not part
    of the environment when f() was called in h.

    The difference is that a=10 is part of the environment where g() was DEFINED in f. But the b=200 is not part of the environment where f() is DEFINED. That unbound variables take their values from the defining, rather than calling, environment is what makes R (and most other languages) lexically scoped.

  8. Nick Says:
    September 10, 2010 at 6:53 pm

    This shows it can’t be the value of j at the time f is defined, because
    it changes when I change j later. I think it’s actually determining how
    it’s going to find j when it’s defined.

    Right. This example is no more mysterious than referencing an instance variable in Java. If the variable’s value is changed, then subsequent references will see this change. In your example, f() and j are defined in the same environment. This is where the free variable j in f() is bound. When you change j’s value in that environment, f() picks it up.

    • lingpipe Says:
      September 12, 2010 at 6:45 pm

      Thanks for the explanation in the previous comment.

      Java’s a bit more restrictive. For instance, you can’t copy the R style and write:

      interface Foo { public int foo(); }
      public static void main(String[] args) {
          Foo f = new Foo() {
              public int foo() {
                  return a;  // does not compile: a is not declared at this point
              }
          };
          int a = 10;
          System.out.println(f.foo());
      }

      You have to declare the variable a to be a static class variable, or you have to define a local variable before the anonymous inner class and declare it final.

      And there’s no way to do the equivalent of R’s attaching a list, which promotes the elements of a data structure to variables. Turns out that doesn’t quite work the way I was thinking it did in R, either. For instance,

      > f = function() { a }
      > f()
      Error in f() : object 'a' not found
      > a = 12
      > f()
      [1] 12
      > b = list(a = 5)
      > attach(b)
      The following object(s) are masked _by_ .GlobalEnv :
          a
      > f()
      [1] 12

      but it works if there’s not already a value.

      > k = function() { m }
      > k()
      Error in k() : object 'm' not found
      > j = list(m = 5)
      > attach(j)
      > k()
      [1] 5
      > m = 10
      > k()
      [1] 10
      > attach(j)
      The following object(s) are masked _by_ .GlobalEnv :
          m
      The following object(s) are masked from j ( position 3 ) :
          m
      > k()
      [1] 10
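The behavior in these two transcripts follows from where attach() puts the list: by default at position 2 of the search path, behind the global environment. The sketch below shows this under stated assumptions; the label "j_demo" and the variables pos, first and second are mine, and warn.conflicts = FALSE only suppresses the masking messages.

```r
k <- function() m
j <- list(m = 5)

# Attach the list under an explicit label so it is easy to find and detach.
attach(j, name = "j_demo", warn.conflicts = FALSE)
pos <- match("j_demo", search())  # position of the attached list on the search path

first <- k()   # no m defined yet, so the lookup reaches the attached list
m <- 10
second <- k()  # the newly defined m is found before the attached copy

# Clean up so the session is left as it was found.
detach("j_demo", character.only = TRUE)
rm(m)
print(c(pos, first, second))
```

Because the attached list sits behind the global environment, any variable of the same name defined there masks it, which matches the "masked _by_ .GlobalEnv" messages above.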
  9. lingpipe Says:
    September 13, 2010 at 12:09 pm

    From Christian Robert’s latest blog post on R, Simply Start Over and Build Something Better, I found this amazing snippet:

    One of the worst problems is scoping. Consider the following little gem.

    f = function() {
      if (runif(1) > .5)
        x = 10
      x
    }

    The x being returned by this function is randomly local or global.
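A deterministic variant makes the two outcomes easier to see. This is a sketch, not the snippet from Christian Robert's post: the flag use_local stands in for the coin flip, and the string values are assumptions chosen to label which x was found.

```r
x <- "global"

f <- function(use_local) {
  if (use_local)
    x <- "local"  # creates a local x only on this branch
  x               # local x if it exists, otherwise the x where f was defined
}

with_local <- f(TRUE)
without_local <- f(FALSE)
print(c(with_local, without_local))
```

The same name x refers to a different variable depending on which branch ran, because assignment inside a function creates the local variable only when the assignment is actually executed.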

    Cool!
