R’s Scoping

By Breckbaldwin

[*Update, 10 September 2010: I didn’t study Radford Neal’s example closely enough before making an even bigger mess of things. I’d like to blame it on HTML formatting, which garbled Radford’s formatting and destroyed everyone else’s examples, but I was actually just really confused about what was going on in R. So I’m scratching most of the blog entry and my comments, and replacing them with Radford’s example and a pointer to the manual.*]

A Better Mousetrap

There’s been an ongoing discussion among computational statisticians about writing something better than R, in terms of both speed and comprehensibility:

Andrew Gelman: The Future of R
Julien Cornebise (via Christian Robert): On R Shortcomings
Radford Neal: Two Surprising Things about R (following up his earlier series, Design flaws in R)

Radford Neal’s Example

Radford’s example had us define two functions,

> f = function () { 
+     g = function () a+b
+     a = 10
+     g()
+ }
> h = function () { 
+     a = 100
+     b = 200
+     f()
+ }
> b=3
> h()
[1] 13

This illustrates what’s going on, assuming you can parse R. I see it, I believe it. The thing to figure out is why a=10 was picked up in the call to g() in f, but b=200 was not picked up in the call to f() in h. Instead, the global assignment b=3 was picked up.
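One way to check this is to ask R which environment each function was created in. `environment(fn)` returns a closure's enclosing environment, and `environment()` called inside a function returns the frame of the current call. Below is a minimal sketch of Radford's example with those checks added. The `stopifnot()` call and the switch to `<-` are my additions; `=` behaves the same way here.

```r
f <- function() {
  g <- function() a + b   # a and b are free variables in g
  a <- 10
  # g was created during this call to f, so its enclosure is this call's frame
  stopifnot(identical(environment(g), environment()))
  g()
}
h <- function() {
  a <- 100
  b <- 200                # lives in h's frame, which is not on g's lookup path
  f()
}
b <- 3

# f was defined at top level, so after f's frame, g's lookup continues in the global env
identical(environment(f), globalenv())  # TRUE when run at top level
h()                                     # 13
```

So g finds a = 10 in f's frame. It misses b there and moves on to f's enclosure, the global environment, where b is 3. h's frame is never consulted.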

RTFM

Even after I RTFM-ed, I was still confused.

Venables, W. N., D. M. Smith and the R Development Core Team. 2010. An Introduction to R. Version 2.11.1.

It has a section 10.7 titled “Scope”, but I found their example

cube <- function(n) {
    sq <- function() n*n
    n*sq()
}

and the following explanation confusing,

The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it. Under static scope (S-Plus) the value is that associated with a global variable named n. Under lexical scope (R) it is the parameter to the function cube since that is the active binding for the variable n at the time the function sq was defined. The difference between evaluation in R and evaluation in S-Plus is that S-Plus looks for a global variable called n while R first looks for a variable called n in the environment created when cube was invoked.

I was particularly confused by the “environment created when cube was invoked” part, because I couldn’t reconcile it with Radford’s example.
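Here is a small sketch of the manual's point. I've added a global `n` so that the two rules give different answers. The hypothetical `make_sq` shows that the environment created when a function is invoked can outlive the call.

```r
n <- 1                        # a global n, only there to tell the two rules apart
cube <- function(n) {
  sq <- function() n * n      # n is free in sq
  n * sq()
}
cube(2)  # 8: sq finds n = 2 in the environment created by this call to cube
# Under the S-Plus rule the manual describes, sq() would use the global n = 1,
# so cube(2) would be 2 * 1 = 2.

# Hypothetical helper: return the inner function instead of calling it.
make_sq <- function(n) function() n * n
sq3 <- make_sq(3)
sq3()                # 9: the call's environment, with n = 3, is kept alive by sq3
environment(sq3)$n   # 3
```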

Let’s consider a slightly simpler example without nested function calls.

> j =10
> f = function(x) j*x
> f(3)
[1] 30
> j =12
> f(3)
[1] 36

This shows it can’t be the value of j at the time f is defined, because it changes when I change j later. I think it’s actually determining how it’s going to find j when it’s defined. Since j is neither an argument nor a local variable of f, R looks for it in the environment where f was defined (here the global environment). It then looks in that environment’s enclosure, and so on out along the search path, using whatever value j has at the moment f is called. The caller’s environment never enters into it. That is why bindings made inside other function calls don’t count, as Radford’s example illustrates.
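Here is the same example as a script. I've added a hypothetical `caller` function to show that a local j in the calling function makes no difference.

```r
j <- 10
f <- function(x) j * x   # j is free in f; f's enclosure is the global environment
f(3)                     # 30
j <- 12                  # rebinding j in the global environment...
f(3)                     # 36: ...is seen, because j is looked up at call time

caller <- function() {
  j <- 1000              # local to caller, which is not f's enclosure
  f(3)
}
caller()                 # still 36
```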

Am I the only one who finds this confusing? At least with all your help, I think I finally understand what R’s doing.

This entry was posted on September 9, 2010 at 1:12 pm and is filed under Carp’s Blog, Java, Statistics.

13 Responses to “R’s Scoping”

Andrew Gelman Says:
September 9, 2010 at 1:21 pm

Hey, Bob–you should be posting this stuff on our main blog now!
- lingpipe Says:
  September 10, 2010 at 12:27 pm
  
  HQ’s still working out brand management issues. I think a post like this one would’ve made sense on your blog. I’ll start posting there soon.
  
  I’m both excited and intimidated by the size of your audience.
  
  Luckily, I don’t mind being wrong in public (once per topic). Especially when I can get tutelage from the likes of Radford Neal!
Ken Williams Says:
September 9, 2010 at 2:11 pm

I believe you’re incorrect about scoping in R, as the following example shows:

f <- function(x) { y g f(4)
Error in g(x) : object ‘y’ not found

As in most languages, it’s possible to create global variables in R, which is what your example shows. However, functions effectively use lexical scope, if you define that as ‘called functions won’t accidentally see my variables’.
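Ken’s code was mangled by the blog’s HTML formatting, so the exact example is lost. The sketch below has the same shape. It assumes f creates a local y and then calls a top-level g that uses y as a free variable; the value of y and the body of g are made up.

```r
# Hypothetical reconstruction; only the names f, g, y and the error come from the comment.
g <- function(x) x + y   # y is free in g, and g is defined at top level
f <- function(x) {
  y <- 1                 # local to this call of f
  g(x)                   # g's lookup goes to the global env, not f's frame
}
msg <- tryCatch(f(4), error = function(e) conditionMessage(e))
msg                      # "object 'y' not found"
```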

Personally I *love* the R language. I know there’s a lot of talk about redesigning it or replacing it somehow, but I’m skeptical that it’s a good idea.
- lingpipe Says:
  September 10, 2010 at 11:56 am
  
  Thanks. I updated the body of the blog post to point to the comments.
  
  I think the function definition got garbled somehow (or maybe it’s just an unfamiliar R syntax convention).
Radford Neal Says:
September 9, 2010 at 3:32 pm

You’re wrong about R’s scoping rules. It uses lexical scoping.

Here’s an example demonstrating this:

> f = function ()
+ { g = function () a+b
+ a = 10
+ g()
+ }

> h = function ()
+ { a = 100
+ b = 200
+ f()
+ }

> b = 3
> print(h())
[1] 13

The expression a+b is evaluated with b from the global environment, and a from the lexically enclosing environment of g. The b inside h is not seen even though with dynamic scoping it would take precedence over the global b.
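Base R's `dynGet()` (available since R 3.2.0) shows what dynamic scoping would have produced. It looks a name up in the frames on the call stack instead of in the defining environment. Below is a sketch using hypothetical `_dyn` copies of Radford's functions.

```r
b <- 3
f_dyn <- function() {
  # dynGet() searches the call stack, starting with g_dyn's own frame
  # and moving outward through its callers
  g_dyn <- function() dynGet("a") + dynGet("b")
  a <- 10
  g_dyn()
}
h_dyn <- function() {
  a <- 100
  b <- 200
  f_dyn()
}
h_dyn()  # 210: a = 10 from f_dyn's frame, b = 200 from h_dyn's frame
```

Compare Radford's lexically scoped version, which returns 13. As I understand it, `dynGet()` by default searches only the function frames on the stack, so it never reaches the global b.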
Rob V. Says:
September 10, 2010 at 1:25 am

Looks like you’ve tripped over lambda calculus and closures, things that are extremely common in many languages (particularly functional languages) but NOT in the world of Java and C derivatives. This is one of the best features of Javascript, in my opinion far more useful than the prototyping that gets more attention. And their absence is one of the most obvious shortcomings in Java (although generics was a nice alternative that reduced the need for closures in some cases). Even Java’s granddaddy, Smalltalk, has these features. Perhaps the confusion (between your interpretation of the problem and Radford’s) stems from something akin to Javascript’s slightly flawed implementation of closures, whereby variables in the topmost scope are actually global but all other variables are properly scoped.
- lingpipe Says:
  September 10, 2010 at 11:54 am
  
  Ironic, given that I used to teach programming language theory and write about denotational semantics! And I got my feet wet in professional programming by integrating the C implementation of Javascript (ECMAScript, technically) into SpeechWorks’s semantic interpreter!!!
  
  As you say, there’s really nothing like a closure in C or Java. About as close as I get is writing search algorithms with a continuation-passing style.
Ken Williams Says:
September 10, 2010 at 10:04 am

Here’s an even simpler example:

f <- function(x) { y g f(4)
Error in g(x) : object ‘y’ not found
Nick Says:
September 10, 2010 at 4:06 pm

Super-simple example of lexical scoping in R:

> x <- "A"
> g <- function() x
> f <- function() { x <- "B"; g() }
> f()
[1] "A"

If R were dynamically scoped, the ‘x’ in g() would take its value from the calling environment, where it is ‘B’. However, because R is lexically scoped, it comes from the environment where g() is defined, where it is ‘A’.
Nick Says:
September 10, 2010 at 6:43 pm

This is also why I’m still unclear about Radford’s example, because the a=10 was part of the environment when g() was called in h, but b=200 was not part of the environment when f() was called in h.

The difference is that a=10 is part of the environment where g() was DEFINED in f. But the b=200 is not part of the environment where f() is DEFINED. That unbound variables take their values from the defining, rather than calling, environment is what makes R (and most other languages) lexically scoped.
Nick Says:
September 10, 2010 at 6:53 pm

This shows it can’t be the value of j at the time f is defined, because it changes when I change j later. I think it’s actually determining how it’s going to find j when it’s defined.

Right. This example is no more mysterious than referencing an instance variable in Java. If the variable’s value is changed, then subsequent references will see this change. In your example, f() and j are defined in the same environment. This is where the free variable j in f() is bound. When you change j’s value in that environment, f() picks it up.
- lingpipe Says:
  September 12, 2010 at 6:45 pm
  
  Thanks for the explanation in the previous comment.
  
  Java’s a bit more restrictive. For instance, you can’t copy the R style and write:
```
interface Foo { public int foo(); }

public class Scope {
    public static void main(String[] args) {
        Foo f = new Foo() {
                public int foo() {
                    return a;  // compile error: cannot find symbol a (declared only later)
                }
            };
        int a = 10;
        System.out.println(f.foo());
    }
}
```
  You have to declare the variable a to be a static class variable, or you have to define a local variable before the anonymous inner class and declare it final.
  
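  And there’s no way to do the equivalent of R’s attach() on a list, which makes the list’s elements available as if they were variables. Turns out that doesn’t quite work the way I was thinking it did in R, either. attach() puts the list on the search path after the global environment, so a global variable with the same name masks it. For instance,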
```
> f = function() { a }
> f()
Error in f() : object 'a' not found
> a = 12
> f()
[1] 12
> b = list(a = 5)
> attach(b)
        The following object(s) are masked _by_ .GlobalEnv :
         a 
> f()
[1] 12
```
  but it works if there isn’t already a global variable with that name.
```
> k = function() { m }
> k()
Error in k() : object 'm' not found
> j = list(m = 5)
> attach(j)
> k()
[1] 5
> m = 10
> k()
[1] 10
> attach(j)
        The following object(s) are masked _by_ .GlobalEnv :
         m

        The following object(s) are masked from j ( position 3 ) :
         m
> k()
[1] 10
```
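Both transcripts fit one rule. k’s enclosing environment is the global environment, and R moves on to the attached entries listed by `search()` only when m isn’t found there. Below is a sketch of the second session as a script. The variable names `before`, `masked`, and `after` are my own, and the `rm()` at the end shows the attached copy becoming visible again.

```r
k <- function() m        # m is free in k; k's enclosure is the global environment
j <- list(m = 5)
attach(j)                # by default, j's elements go on the search path just after .GlobalEnv
before <- k()            # 5: no global m, so the attached copy is found
m <- 10                  # a global m now masks the attached one
masked <- k()            # 10
rm(m)                    # remove the global m...
after <- k()             # 5: ...and the attached copy is visible again
detach("j")
c(before, masked, after)
```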

lingpipe Says:
September 13, 2010 at 12:09 pm

From Christian Robert’s latest blog post on R, Simply Start Over and Build Something Better, I found this amazing snippet:
One of the worst problems is scoping. Consider the following little gem.
```
f = function() {
  if (runif(1) > .5)
    x = 10
  x
}
```
The x being returned by this function is randomly local or global.
Cool!
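The randomness only decides whether the assignment runs. R binds the local x when the assignment executes, not when the function is defined, so if the branch is skipped, the lookup falls through to the global environment. Below is a deterministic sketch in which the coin flip is replaced by a hypothetical `make_local` argument.

```r
x <- 0                     # global x
f <- function(make_local) {
  if (make_local)
    x <- 10                # binds x in this call's frame, but only if this line runs
  x                        # local x if it was bound above, otherwise the global x
}
f(TRUE)   # 10
f(FALSE)  # 0
```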