

Analysis and Caching of Dependencies

Martin Abadi                        Butler Lampson                        Jean-Jacques Lévy

Digital Systems Research Center          Microsoft          INRIA Rocquencourt

ma@pa.dec.com          blampson@microsoft.com          Jean-Jacques.Levy@inria.fr


Abstract

We address the problem of dependency analysis and caching in the context of the λ-calculus. The dependencies of a λ-term are (roughly) the parts of the λ-term that contribute to the result of evaluating it. We introduce a mechanism for keeping track of dependencies, and discuss how to use these dependencies in caching.

1 Introduction

Suppose that we have evaluated the function application f(1, 2), and that its result is 7. If we cache the equality f(1, 2) = 7, we may save ourselves the work of evaluating f(1, 2) in the future. Suppose further that, in the course of evaluating f(1, 2), we noticed that the first argument of f was not accessed at all. Then we can make a more general cache entry: f(n, 2) = 7 for all n. In call-by-name evaluation, we may not even care about whether n is defined or not. Later, if asked about the result of f(2, 2), for example, we may match f(2, 2) against our cache entry, and deduce that f(2, 2) = 7 without having to compute f.

There are three parts in this caching scheme: (i) the dependency analysis (in this case, noticing that f did not use its first argument in the course of the computation); (ii) writing down dependency information, in some way, and caching it; (iii) the cache lookup. Each of the parts can be complex. However, the caching scheme is worthwhile if the computation of f is expensive and if we expect to encounter several similar inputs (e.g., f(1, 2), f(2, 2), . . . ).

We address the problem of dependency analysis and caching in the context of the λ-calculus. We introduce a mechanism for keeping track of dependencies, and show how to use these dependencies in caching. (However, we stop short of considering issues of cache organization, replacement policy, etc.) Our techniques apply to programs with higher-order functions, and not just to trivial first-order examples like f(1, 2). The presence of higher-order functions creates the need for sophisticated dependency propagation.

As an example, consider the higher-order function:

f ≝ λx.λy.fst(x(fst(y))(snd(y)))


where pairs are encoded as usual:

⟨a, b⟩ ≝ λx.x(a)(b)        fst ≝ λp.p(λn.λz.n)        snd ≝ λp.p(λn.λz.z)
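As a quick check on the encoding, here is how fst extracts the first component of a pair; every step below is a β reduction:

fst⟨a, b⟩ = (λp.p(λn.λz.n))(λx.x(a)(b)) → (λx.x(a)(b))(λn.λz.n) → (λn.λz.n)(a)(b) → (λz.a)(b) → a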

The function f takes two arguments x and y; presumably x is a function and y is a pair. The function f applies x to the first and second components of y, and then extracts the first component of the result. A priori, it may seem that f depends on x and on all of y. Consider now the following arguments for f:

g ≝ λn.λz.⟨z, n⟩                r ≝ ⟨1, 2⟩

g' ≝ λn.λz.⟨z, ⟨n, z⟩⟩          r' ≝ ⟨2, 2⟩

Both functions g and g' seem to depend on their respective arguments. However, all these a priori expectations are too coarse. After evaluating f(g)(r) to 2, we can deduce that f(g')(r') also yields 2. For this we need to express that f accesses only part of the pair that g produces, that g accesses only part of the pair that f feeds it, and that g and g' look sufficiently similar. We develop a simple way of capturing and of exploiting these fairly elaborate dependencies.

Our approach is based on a labelled λ-calculus [Lév78]. Roughly, our labelled λ-calculus is like a λ-calculus with names for subexpressions. In the course of computation, the names propagate, and some of them end up in the result. If a reduces to v, then v will contain the names of the subexpressions of a that contribute to producing v. Then, if we are given a' that coincides with a on those subexpressions, we may deduce that a' reduces to v.

In our example, we would proceed as follows. First, when given the expression f(g)(r), we would label some of its subexpressions. The more labels we use, the more information we obtain. In this example, which is still relatively simple, we label only components of g and r:

gℓ ≝ λn.λz.⟨e0:z, e1:n⟩        rℓ ≝ ⟨e2:1, e3:2⟩

where e0, e1, e2, and e3 are distinct labels. We extend the reduction rules of the λ-calculus to handle labels; in this case, f(gℓ)(rℓ) reduces to e0:e3:2. Stripping off all the labels, we can deduce that f(g)(r) reduces to 2. Studying the labels, we may notice that e1 and e2 do not appear in the result. As we will prove, this means that f(g*)(r*) reduces to 2 for any expressions g* and r* of the forms:

g* ≝ λn.λz.⟨z, □⟩        r* ≝ ⟨□, 2⟩


Obviously, g' and r' match this pattern, and hence f(g')(r') reduces to 2. As this small example suggests, our techniques for dependency analysis are effective, reasonably efficient, and hence potentially practical.

In the next section we review the background for our work and some related work. In section 3, we study dependency analysis and caching in the pure λ-calculus. In section 4, we extend our techniques to a more realistic language; this language includes records and has a weak operational semantics based on explicit substitutions [ACCL91, Fie90].

2 Motivation and Related Work

The motivation for this work arose in the context of a system-modelling system called Vesta [LM93, HL93], roughly a replacement for tools like make and rcs. In Vesta, the analogue of a makefile is a program written in a specialized, untyped, higher-order, functional, lazy, interpreted language. The functional character of the language guarantees that the results of system building are predictable and reproducible.

In Vesta, the basic computation steps are expensive calls to functions like compile and link; hence it is important to avoid unnecessary recomputations. The programs can be reasonably large; it is therefore desirable to notice cache hits for large subexpressions rather than for individual calls to primitives (e.g., individual compilations). Furthermore, irrelevant changes in parameters are expected to be frequent; when there are such changes, a simple memoisation [Mic68, Hug85] does not suffice for avoiding recomputations, and a more savvy caching strategy is necessary.

This paper, however, is not about Vesta. There has been some research on caching in Vesta [HL93], and more is currently in progress. Here we discuss techniques for the λ-calculus; these are somewhat simpler, easier to explain, and perhaps of more general interest.

In the λ-calculus, the work that seems most closely related to ours is that of Field and Teitelbaum [FT90]. They have investigated the problem of reductions of similar expressions (which may not even yield the same result). Their approach is based on a λ-calculus with a new "fork" primitive rather than on a labelled λ-calculus. For example, they can represent the two similar expressions b(a) and b'(a) as the single expression fork(b, b')(a), with a rule for duplication, namely fork(b, b')(a) = fork(b(a), b'(a)). Their algorithm seems particularly appropriate for dealing with pairs of expressions that differ at only one or a few predictable subexpressions.

There has been much other work on incremental computation, and some of it is related to ours. In particular, Pugh's dissertation concerns incremental evaluation of functional programs; it raises the issue of caching for functions that do not depend on all of their arguments [Pug88, pp. 70-71].

Dependency analysis is also similar to traditional analyses such as strictness analysis (e.g., [BHA86]). There is even a recent version of strictness analysis that relies on a labelled λ-calculus [GVS95]. Strictness analysis is concerned with what parts of a program must be evaluated; in contrast, for doing cache lookups, we need to know what parts of a program may affect the result. Furthermore, we do not use approximate abstract interpretations, but rather rely on previous, actual executions of programs similar to the one being analyzed.

3 Dependencies in the Pure λ-calculus

In this section we consider incremental computation in the context of the pure λ-calculus. This is a minimal system, but it enables us to illustrate our ideas. First we review some classical results that suggest an approach to dependency analysis; then we describe a labelled calculus, a basic scheme for caching, some examples, and a more sophisticated scheme for caching; finally, we consider call-by-value evaluation.

3.1 The λ-calculus

The standard λ-calculus has the following grammar for expressions:

a, b, c ::=                              terms
   |  x                                  variable (x ∈ V)
   |  λx.a                               abstraction (x ∈ V)
   |  b(a)                               application

where V is a set of variables.
The β rule is, as usual:

(λx.b)(a) → b{a/x}

where b{a/x} is the result of replacing x with a in b. When C is a context (a term with a hole), we write C{a} for the result of filling C's hole with a (possibly with variable captures). We adopt the following congruence rule:

a → b
─────────────
C{a} → C{b}

The reduction relation →* is the reflexive, transitive closure of →. A computation stops when it reaches a normal form.
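To fix intuitions, here is a minimal sketch of this calculus as a Haskell interpreter. The names (Term, subst, step) are ours, for illustration only, and substitution is deliberately naive: we assume bound variables have been renamed apart, so no capture handling is needed.

data Term
  = Var String
  | Lam String Term
  | App Term Term
  deriving (Eq, Show)

-- b{a/x}: replace the free occurrences of x in b with a.
subst :: String -> Term -> Term -> Term
subst x a (Var y)   = if x == y then a else Var y
subst x a (Lam y b) = if x == y then Lam y b else Lam y (subst x a b)
subst x a (App b c) = App (subst x a b) (subst x a c)

-- One leftmost outermost reduction step, if a redex exists.
step :: Term -> Maybe Term
step (App (Lam x b) a) = Just (subst x a b)        -- the β rule
step (App b a)         = case step b of
                           Just b' -> Just (App b' a)
                           Nothing -> App b <$> step a
step (Lam x b)         = Lam x <$> step b
step (Var _)           = Nothing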

We can now reformulate the problem posed in the introduction. Suppose that a is a term and a →* v. When can we say that b →* v simply by comparing a and b? In order to address this question, we recall a few known theorems.

Theorem 1 (Church-Rosser) The calculus is confluent.

Theorem 2 (Normalization) If a term can be reduced to a normal form, then its leftmost outermost reduction (which reduces the leftmost outermost redex at each step) reaches this normal form.

Clearly the leftmost outermost reduction reduces only subexpressions necessary to get to the normal form.

A prefix is an expression possibly with several missing subexpressions:

a, b, c ::=                              prefixes
   |  □                                  hole
   |  x                                  variable (x ∈ V)
   |  λx.a                               abstraction (x ∈ V)
   |  b(a)                               application

A prefix a is a prefix of another prefix (or expression) b if a matches b except in some holes; we write a ≼ b. For instance, we have that □(x)(□)(λy.□(y)) ≼ y(x)(x)(λy.□(y)). For the purposes of reduction, we treat □ like a free variable; for example, (λx.x(□))(a) → a(□).
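The prefix ordering is straightforward to implement. The following Haskell sketch (constructor names ours) tests a ≼ b for prefixes represented as terms with holes:

data Prefix
  = Hole                 -- □, a missing subexpression
  | PVar String
  | PLam String Prefix
  | PApp Prefix Prefix
  deriving (Eq, Show)

-- matches a b tests a ≼ b: a must agree with b except at holes of a.
matches :: Prefix -> Prefix -> Bool
matches Hole _                    = True
matches (PVar x)     (PVar y)     = x == y
matches (PLam x a)   (PLam y b)   = x == y && matches a b
matches (PApp a1 a2) (PApp b1 b2) = matches a1 b1 && matches a2 b2
matches _ _                       = False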

The following three results concern the prefix ordering and reduction:


Proposition 1 (Maximality of terms) If b ≼ d and b is a term, then b = d.

Proposition 2 (Monotonicity) If a, b, and c are prefixes, a →* b, and a ≼ c, then there exists a prefix d such that c →* d and b ≼ d.

Theorem 3 (Stability) If a is a term, v is a term in normal form, and a →* v, then there is a minimum prefix a0 ≼ a such that a0 →* v.

Proof The stability theorem follows from the stability of Böhm trees [Ber78]. Here we sketch a simple, alternative proof.

First we show that if a and b are compatible (have a common upper bound) in the prefix ordering, and a and b reduce to a term v in normal form, then the greatest lower bound of a and b (written a ∧ b) also reduces to v. The proof is by induction on the lengths of the leftmost outermost reductions to v, and secondarily on the sizes of a and b. We proceed by cases on the form of a.

•  If a = x, then v = x and b = x, so (a ∧ b) = v.

•  If a = λx.a1, then b is of the form λx.b1, with a1 and b1 compatible. The result follows by induction hypothesis.

•  If a = x(a1)...(an), then b is of the form x(b1)...(bn) with ai and bi compatible for each i ∈ 1..n. The result follows by induction hypothesis.

•  The cases where a = □ or a = □(a1)...(an) are impossible, since a reduces to a term in normal form.

•  Finally, if a = (λx.a1)(a2)...(an), then b is of the form (λx.b1)(b2)...(bn). Let a' = a1{a2/x}(a3)...(an) and b' = b1{b2/x}(b3)...(bn); a' and b' are compatible, and they reduce to v with shorter leftmost outermost reductions than a and b. By induction hypothesis, a' ∧ b' reduces to v. Since a ∧ b reduces to a' ∧ b', we obtain that a ∧ b reduces to v by transitivity.

Now suppose that a and v are as indicated in the statement of the theorem. The prefixes of a that reduce to v are compatible, since they have a as upper bound; their greatest lower bound is the prefix a0 described in the statement. □
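The greatest lower bound used in this proof is easy to compute. As a sketch, reusing the Prefix type above: wherever two compatible prefixes disagree, at least one side must be a hole, and the bound has a hole there.

-- a ∧ b for compatible prefixes a and b.
meet :: Prefix -> Prefix -> Prefix
meet Hole _ = Hole
meet _ Hole = Hole
meet (PVar x)     (PVar y)     | x == y = PVar x
meet (PLam x a)   (PLam y b)   | x == y = PLam x (meet a b)
meet (PApp a1 a2) (PApp b1 b2)          = PApp (meet a1 b1) (meet a2 b2)
meet _ _ = error "meet: incompatible prefixes"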

We can give a first solution to our problem, as follows. Suppose that a →* v and v is a term in normal form. Let a0 be the minimum prefix of a such that a0 →* v, given by Theorem 3. By Proposition 2, if a0 is a prefix of b then b →* v' for some v' such that v is a prefix of v'; by Proposition 1, v' is v. Therefore, if a0 is a prefix of b then we can reuse the computation a →* v and conclude that b →* v.

It remains for us to compute a0. As we will show, this computation can be performed at the same time as we evaluate a, and does not require much additional work. Intuitively, we will mark every subexpression of a necessary to compute v along the leftmost outermost reduction.

3.2 A labelled λ-calculus

In order to compute minimum prefixes as discussed above, we follow the underlining method of Barendregt [Bar84], generalized by use of labels as in the work of Field, Lévy, or Maranget [Fie90, Lév78, Mar91]. Our application of this method gives rise to a new labelled calculus, which we define next.

We consider a λ-calculus with the following extended grammar for expressions:

a, b, c ::=                              terms
   |  . . .                              as in section 3.1
   |  e:a                                labelled term (e ∈ E)

where E is a set of labels.

There is one new one-step reduction rule:

(e:b)(a) → e:(b(a))

The essential purpose of this rule is to move labels outwards as little as possible in order to permit β reduction. For example, (e0:(λx.x(x)))(e1:y) reduces to e0:((λx.x(x))(e1:y)) via the new rule, and then yields e0:((e1:y)(e1:y)) by the β rule.
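Continuing the Haskell sketch (names again ours), labelled terms add one constructor, and the new rule is one extra case in the stepper, tried before β so that a label in function position is lifted just enough to expose the redex.

data LTerm
  = LVar String
  | LLam String LTerm
  | LApp LTerm LTerm
  | Label String LTerm             -- e:a, with e drawn from the label set E

-- Substitution passes through labels unchanged.
lsubst :: String -> LTerm -> LTerm -> LTerm
lsubst x a (LVar y)    = if x == y then a else LVar y
lsubst x a (LLam y b)  = if x == y then LLam y b else LLam y (lsubst x a b)
lsubst x a (LApp b c)  = LApp (lsubst x a b) (lsubst x a c)
lsubst x a (Label e b) = Label e (lsubst x a b)

-- One leftmost outermost labelled step.
lstep :: LTerm -> Maybe LTerm
lstep (LApp (Label e b) a) = Just (Label e (LApp b a))   -- (e:b)(a) → e:(b(a))
lstep (LApp (LLam x b) a)  = Just (lsubst x a b)         -- β, as before
lstep (LApp b a)           = case lstep b of
                               Just b' -> Just (LApp b' a)
                               Nothing -> LApp b <$> lstep a
lstep (Label e b)          = Label e <$> lstep b
lstep (LLam x b)           = LLam x <$> lstep b
lstep (LVar _)             = Nothing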

There are clear correspondences between the unlabelled calculus and the labelled calculus. When a' is a labelled term, let strip(a') be the unlabelled term obtained by removing every label in a'. We have:

Proposition 3 (Simulation) Let a, b be terms, and let a', b' be labelled terms.

•  If a' → b', then strip(a') →* strip(b').

•  If a = strip(a') and a → b, then a' →* b' for some b' such that b = strip(b').
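In the sketch, strip is the obvious fold back into unlabelled terms:

strip :: LTerm -> Term
strip (Label _ b) = strip b
strip (LVar x)    = Var x
strip (LLam x b)  = Lam x (strip b)
strip (LApp b a)  = App (strip b) (strip a)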

The labelled calculus enjoys the same fundamental theorems as the unlabelled calculus: confluence, normalization, and stability. The confluence theorem follows from Klop's dissertation work, because the labelled calculus is a regular combinatory reduction system [Klo80]; the labelled calculus is left-linear and without critical pairs. The normalization theorem can also be derived from Klop's work; alternatively it can be obtained from results about abstract reduction systems [GLM92], via O'Donnell's notion of left systems [O'D77]. The proof of the stability theorem is similar to the one in [HL91].

3.3 Basic caching

Suppose that a →* v, where a is a term and v is its normal form. Put a different label on every subexpression of a, obtaining a labelled term a'. By Proposition 3, a' →* v' for some v' such that v = strip(v'). Consider all the labels in v'; to each of these labels corresponds a subterm of a' and thus of a. Let G(a) be a prefix obtained from a by replacing with □ each subterm whose label does not appear in v'. We can prove that G(a) is well-defined. In particular, the value of G(a) does not depend on the choice of a' or v'; and if the label for a subterm of a appears in v' then so do the labels for all subterms that contain it.
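In the running Haskell sketch, G(a) can be computed from the labelled normal form v' in two passes: collect the surviving labels, then rebuild a with holes at the subterms whose labels did not survive (helper names ours).

import qualified Data.Set as Set

-- The set of labels occurring in a labelled term.
labelsOf :: LTerm -> Set.Set String
labelsOf (Label e b) = Set.insert e (labelsOf b)
labelsOf (LVar _)    = Set.empty
labelsOf (LLam _ b)  = labelsOf b
labelsOf (LApp b a)  = labelsOf b `Set.union` labelsOf a

-- G(a): keep a labelled subterm iff its label survives in v'; otherwise
-- replace the whole subterm with a hole.
gPrefix :: Set.Set String -> LTerm -> Prefix
gPrefix live (Label e b)
  | e `Set.member` live = gPrefix live b
  | otherwise           = Hole
gPrefix _    (LVar x)   = PVar x
gPrefix live (LLam x b) = PLam x (gPrefix live b)
gPrefix live (LApp b a) = PApp (gPrefix live b) (gPrefix live a)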

When a →* v, we may cache the pair (G(a), v). When we consider a new term b, it is sufficient to check that G(a) ≼ b in order to produce v as the result of b. As G(a) is the part of a sufficient to get v (what we called a0 in section 3.1), we obtain the following theorem:

Theorem 4 If a is a term, v is a term in normal form, a →* v, and G(a) ≼ b, then b →* v.

Theorem 4 supports a simple caching strategy. In this strategy, we maintain a cache with the following invariants:


•  the cache is a set of pairs (a0, v), each consisting of an unlabelled prefix a0 and an unlabelled term v in normal form;

•  if (a0, v) is in the cache and a0 ≼ b then b →* v.

Therefore, whenever we know that v is the normal form of a, we may add to the cache the pair (G(a), v). Theorem 4 implies that this preserves the cache invariants.

Suppose that a is a term without labels. In order to evaluate a, we do:

•  if there is a cache entry (a0, v) such that a0 ≼ a, then return v;

•  otherwise:

   -  let a' be the result of adding distinct labels to a, at every subexpression;

   -  suppose that, by reduction, we find that a' →* v' for v' in normal form;

   -  let v = strip(v') and a0 = G(a);

   -  optionally, add the entry (a0, v) to the cache;

   -  return v.

Both cases preserve the cache invariants. In both, the v returned is such that a →* v.
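The whole procedure fits in a few lines of the Haskell sketch; labelAll, normalize, and embed are our helpers, and the cache is a plain association list (a real implementation would index it more cleverly).

type Cache = [(Prefix, Term)]

-- Attach a distinct label (here, a number) to every subexpression.
labelAll :: Term -> LTerm
labelAll t = fst (go 0 t) where
  go n (Var x)   = (Label (show n) (LVar x), n + 1)
  go n (Lam x b) = let (b', n') = go (n + 1) b in (Label (show n) (LLam x b'), n')
  go n (App b a) = let (b', n1) = go (n + 1) b
                       (a', n2) = go n1 a
                   in (Label (show n) (LApp b' a'), n2)

-- Iterate lstep to a labelled normal form (diverges iff the term does).
normalize :: LTerm -> LTerm
normalize t = maybe t normalize (lstep t)

-- View a term as a hole-free prefix, so that matches can compare them.
embed :: Term -> Prefix
embed (Var x)   = PVar x
embed (Lam x b) = PLam x (embed b)
embed (App b a) = PApp (embed b) (embed a)

evalCached :: Cache -> Term -> (Term, Cache)
evalCached cache a =
  case [v | (a0, v) <- cache, a0 `matches` embed a] of
    v : _ -> (v, cache)                              -- cache hit
    []    -> let a' = labelAll a
                 v' = normalize a'
                 v  = strip v'
                 a0 = gPrefix (labelsOf v') a'       -- G(a)
             in (v, (a0, v) : cache)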

In a refinement of this scheme, we may put labels at only some subexpressions of a. In this case, we replace with □ a subexpression of a only if this subexpression was labelled before reduction. The more labels we use, the more general the prefix obtained; this results in better cache entries, at a moderate cost. However, in examples, we prefer to use few labels in order to enhance readability.

Another refinement of the scheme consists in caching pairs of labelled prefixes and results. The advantage of not stripping the labels is that the cache records the precise dependencies of results on prefixes. We return to this subject in section 3.5.

3.4 Examples

The machinery that we have developed so far handles the example of the introduction (the term f(g)(r)). We leave the step-by-step calculation for that example as an exercise to the reader. As that example illustrates, pairing behaves nicely, in the sense that fst⟨a, b⟩ depends only on a, as one would expect.

As a second example, we show that the Church booleans behave nicely too. The encoding of booleans is as usual:

true ≝ λx.λy.x        false ≝ λx.λy.y        if a then b else c ≝ a(b)(c)

In the setting of the labelled λ-calculus, we obtain as a derived rule that:

if (e:a) then b else c →* e:(if a then b else c)

It follows from this rule that, for example,

(λx. if e0:x then e1:y else e2:z)(e3:true) →* e0:e3:e1:y

We obtain the unlabelled prefix:

(λx. if x then y else □)(true)

and we can deduce that any expression that matches this prefix reduces to y.

Similar examples arise in the context of Vesta (see section 2 and [HL93]). A simple one is the term:

(if isC(file) then Ccompile else M3compile)(file)

where isC(f) returns true whenever f is a C source file, and file is either a C source file or an M3 source file. If isC(file) returns true, then that term yields Ccompile(file). Using labels, we can easily discover that this result does not depend on the value of M3compile, and hence that it need not be recomputed when that value changes. In fact, even isC(file) and the conditional need not be reevaluated.

In a higher-order variant of this example, the conditional compilation function is passed as an argument:

(λx. x(file))
   (λy. if isC(y) then Ccompile(y) else M3compile(y))

Our analysis is not disturbed by the higher-order abstraction, and yields the same information.

3.5 Limitations of the basic caching scheme

The basic caching scheme of section 3.3 has some limitations, illustrated by the following two concrete examples. Suppose that we have the cache entry:

((λx.⟨snd(x), fst(x)⟩)(⟨true, false⟩), ⟨false, true⟩)

Suppose further that we wish to evaluate the term:

fst((λx.⟨snd(x), fst(x)⟩)(⟨true, false⟩))

Immediately the cache entry enables us to reduce this term to fst(⟨false, true⟩), and eventually we obtain false. However, in the course of this computation, we have not learned how the result depends on the input. We are unable to make an interesting cache entry for the term we have evaluated. Given the new, similar term

fst((λx.⟨snd(x), fst(x)⟩)(⟨false, false⟩))

we cannot immediately tell that it yields the same result. As a second example, suppose that we have the cache entry:

(if true then true else □, true)

and that we wish to evaluate the term:

not(if true then true else true)

In our basic caching scheme, we would initially label this term, for example as:

not(if true then e0:true else e1:true)

Then we would have to reduce this term, and as part of that task we would have to reduce the subterm

(if true then e0:true else e1:true)

At this point our cache entry would tell us that the subterm yields true, modulo some labels. We can complete the reduction, obtaining false, and we can make a trivial cache entry:

(not(if true then true else true), false)

However, we have lost track of which prefix of the input determines the result, and we cannot make the better cache entry:

(not(if true then true else □), false)

The moral from these examples is that cache entries should contain dependency information that indicates how each part of the result depends on each part of the input. One obvious possibility is not to strip the labels of prefixes and results before making cache entries; after all, these labels encode the desired dependency information. We have developed a refinement of the basic caching scheme that does precisely that, but we omit its detailed description in this paper. Next we give another solution to the limitations of the basic caching scheme.

3.6 A more sophisticated caching scheme

In this section we describe another caching scheme. This scheme does not rely directly on the labelled λ-calculus, but it can be understood or even implemented in terms of that calculus.

With each reduction a →* v of a term a, we associate a function d from prefixes of v to prefixes of a. Very roughly, if v0 is a prefix of v then d(v0) is the prefix of a that yields v0 in the reduction a →* v. We write a →*_d v to indicate the function d. This annotated reduction relation is defined by the following rules.

•  Reflexivity:

      a →*_id a

   where id is the identity function on prefixes.

•  Transitivity:

      a →*_d b        b →*_d' c
      ──────────────────────────
            a →*_(d'; d) c

   where d'; d is the function composition of d' and d.

•  Congruence: Given a function d from prefixes of b to prefixes of a, we define a function C{d} from prefixes of C{b} to prefixes of C{a}. If c0 ≼ C then C{d}(c0) = c0; otherwise, there exists a unique b0 ≼ b such that c0 = C{b0}, and we let C{d}(c0) = C{d(b0)}. We obtain the rule:

      a →*_d b
      ──────────────────────
      C{a} →*_C{d} C{b}

•  β:

      (λx.b)(a) →*_dβ b{a/x}

   where dβ(□) = □ and, for c0 ≠ □ and c0 ≼ b{a/x}, dβ(c0) = (λx.b0)(a0) where a0 and b0 are the least prefixes such that a0 ≼ a, b0 ≼ b, and c0 ≼ b0{a0/x}.

These rules are an augmentation of the reduction rules of section 3.1, in the following sense:

Proposition 4

•  If a →*_d b then a →* b.

•  If a →* b then a →*_d b for some d.

The rules may seem a little mysterious, but they can be understood in terms of labels. Imagine that every subexpression of a is labelled (with an invisible label), that a →*_d v, and that v0 ≼ v; then d(v0) is the least prefix of a that contains all of the labels that end up in v0.

As an example, consider the term (λx.x(x))(a), where a is arbitrary. By β, we have

(λx.x(x))(a) →*_dβ a(a)

where dβ is such that, for instance, dβ(□) is □, dβ(a(a)) is the entire (λx.x(x))(a), and dβ(a(□)) is (λx.x(□))(a). If we had labelled the initial term (λx.x(x))(a) before reduction, then the labels that would decorate the result prefix a(□) would be all those of the initial term except for the label of the argument occurrence of x; this justifies that dβ(a(□)) be (λx.x(□))(a).

We obtain:

Theorem 5 If a is a term, a →*_d v, and d(v) ≼ b, then b →*_d v.

This theorem gives rise to a new caching scheme. The cache entries in this scheme consist of judgements a →*_d v, where a and v are terms and d is a function from prefixes of v to prefixes of a. The representation of d can be its graph (i.e., a set of pairs of prefixes) or a formal expression (written in terms of id, dβ, etc.); it can even be the pair of a labelling of a and a corresponding labelling of v. According to the theorem, whenever we encounter a term b such that d(v) ≼ b, we may deduce that b →*_d v.

This caching scheme does not suffer from the limitations of the basic one. In particular, each cache entry contains dependency information for every part of the result, rather than for the whole result. Moreover, the rules of inference provide a way of combining dependency information for subcomputations; therefore, we can make an interesting cache entry whenever we do an evaluation, even if we used the cache in the course of the evaluation.
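As a sketch of how such entries might be represented in the Haskell setting above (Dep, Entry, compose, and lookupDep are our names, not from the paper): a dependency function is just a function on prefixes, transitivity is ordinary function composition, and lookup applies d to the whole result to recover the minimum input prefix of Theorem 5.

type Dep = Prefix -> Prefix

data Entry = Entry { input :: Term, result :: Term, dep :: Dep }

-- Transitivity: for a →*_d b and b →*_d' c, the entry for the composed
-- reduction carries d'; d, i.e. d . d' (map a prefix of c through d',
-- then through d).
compose :: Dep -> Dep -> Dep
compose d d' = d . d'

-- A new term b can reuse an entry whenever the minimum input prefix,
-- dep applied to the whole result, is a prefix of b.
lookupDep :: [Entry] -> Term -> Maybe Term
lookupDep entries b =
  case [result e | e <- entries, dep e (embed (result e)) `matches` embed b] of
    v : _ -> Just v
    []    -> Nothing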

3.7 Call-by-value

So far, we have considered only call-by-name evaluation. Here we define a call-by-value version of the labelled λ-calculus, showing that we can adapt our approach to call-by-value evaluation. The move from a call-by-name to a call-by-value labelled λ-calculus does not affect the basic caching scheme of section 3.3, which remains sound.

The syntax of the call-by-value labelled λ-calculus is that given in section 3.2. The β rule is restricted to:

(λx.b)(v) → b{v/x}

where v ranges over terms of the form x, x(a1)...(an), or λx.a; such terms are called values. As in section 3.2, we have a rule for moving labels outwards from the left-hand side of applications:

(e:b)(a) → e:(b(a))

We have an additional rule for the right-hand side of applications:

(λx.b)(e:a) → e:((λx.b)(a))

One might be tempted to adopt a stronger rule, namely b(e:a) → e:(b(a)), but this rule creates critical pairs.


With call-by-name evaluation, the term (λx.b)(a) does not depend on a if x does not occur free in b. In particular, any term that has (λx.b)(□) as prefix yields the same result. With call-by-value evaluation, on the other hand, the behavior of (λx.b)(a) depends on a: it depends on whether a can be reduced to a value or not. Therefore, a term may match the prefix (λx.b)(□) but not yield the same result as (λx.b)(a).

The treatment of labels in our rules for call-by-value takes into account that (λx.b)(a) depends on a even if x does not occur free in b. Suppose that a can be reduced to a value v. To see how (λx.b)(a) depends on a, we add a label in front of a, obtaining (λx.b)(e:a). Labelled reduction yields e:((λx.b)(a)), then e:((λx.b)(v)), and finally e:b. The label e does not disappear, as it would in call-by-name evaluation.

For instance, let us consider the case where a is the value y(z). From a labelled reduction we obtain the prefix (λx.b)(□(□)). Any term that matches this prefix reduces to the same result as (λx.b)(y(z)). In fact, the evaluation of (λx.b)(y(z)) reveals that this term depends on the "valueness" of y(z), but does not depend on y(z) in any other way. The prefix (λx.b)(□(□)) does not express this information with full accuracy, but approximates it.

4 Dependencies in a Weak λ-calculus with Records

The techniques developed in the previous section are not limited to the pure λ-calculus. In this section, we demonstrate their applicability to a more realistic language, with primitive booleans, primitive records, and explicit substitutions. The operational semantics of this language is weak (so function and record closures are not reduced).

4.1 A weak calculus with records

We consider an extended λ-calculus with the grammar for terms and for substitutions given in Figure 1. In the grammar, L is a set of names (field names for records) and else is a keyword used for "else" clauses in records and in substitutions. As we show below, these "else" clauses are useful in dependency analysis. We typically think of the "else" clauses as corresponding to run-time errors (missing fields, unbound variables). The term an+1 in an "else" clause can be arbitrary; a term that represents a run-time error will do.

We use the following notation for extending substitutions. Let s be x1 = a1, ..., xn = an, else = an+1; then (x = a)·s is x = a, x1 = a1, ..., xn = an, else = an+1 if x is not among the variables x1, ..., xn, and it is x = a, x1 = a1, ..., xi-1 = ai-1, xi+1 = ai+1, ..., xn = an, else = an+1 if x is xi.

The one-step reduction rules now use explicit substitutions. They are given in Figure 2. In particular, the analogue of the β rule is ((λx.b)[s])(a) → b[(x = a)·s], which extends an explicit substitution; the replacement of a for x happens gradually, through other rules which push the substitution inwards in b.

An active context is a context generated by the grammar of Figure 3. We adopt the following congruence rule: for any active context C,

a → b
─────────────
C{a} → C{b}

Notice that this rule allows us to compute inside substitutions, but not under λ, inside records, or in the term part of closures (since λx.C, ⟨..., l = C, ...⟩, ⟨..., else = C⟩, and C[s] are not listed as active contexts). The relation →* is the reflexive, transitive closure of →.

The prefix ordering for this language is interesting. Let s be the substitution x1 = a1, ..., xn = an, else = an+1, and let r be the record ⟨l1 = a1, ..., ln = an, else = an+1⟩. We associate with s and r the following functions from variables or field names to prefixes:

[s](x) = ai      if x = xi for some i
         an+1    otherwise

[r](l) = ai      if l = li for some i
         an+1    otherwise

Intuitively, [s](x) is the image of x through s, and [r](l) is the image of l through r. The prefix ordering is as before except for substitutions and records, where s ≼ s' if [s](x) ≼ [s'](x) for all x ∈ V, and r ≼ r' if [r](l) ≼ [r'](l) for all l ∈ L.

According to this definition, the order of the components of substitutions and records does not matter. In addition, we obtain that, if the "else" clause has a hole, then any other holes can be collapsed into it; for example, the prefixes ⟨l1 = a, l2 = □, else = □⟩, ⟨l3 = □, l1 = a, else = □⟩, and ⟨l1 = a, else = □⟩ are all equivalent.
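The [s] and [r] views also suggest a direct representation. In the Haskell sketch (names ours), a substitution or record is a finite map plus an "else" prefix, and the ordering is checked pointwise on the finitely many mapped names and on the "else" prefixes, so component order never enters.

import qualified Data.Map as Map

data Subst = Subst (Map.Map String Prefix) Prefix    -- mapped names, else

-- [s](x): the image of x through s.
lookupS :: Subst -> String -> Prefix
lookupS (Subst m elsePfx) x = Map.findWithDefault elsePfx x m

-- s ≼ s' iff [s](x) ≼ [s'](x) for all x; outside the two finite domains
-- both sides return their "else" prefixes, so those checks suffice.
leqS :: Subst -> Subst -> Bool
leqS s@(Subst m e) s'@(Subst m' e') =
  matches e e' &&
  all (\x -> matches (lookupS s x) (lookupS s' x))
      (Map.keys m ++ Map.keys m')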

This λ-calculus enjoys the same theorems as the pure λ-calculus of section 3.1 (modulo that now ≼ is actually a pre-order, not an order). These theorems should not be taken for granted, however. Their proofs are less easy, but they can be done by using results on abstract reduction systems [GLM92]. The stability theorem ensures that there is a minimum prefix for obtaining any result; moreover, the maximality and monotonicity propositions are the basis for a caching mechanism.

Finally, we should note that, in this calculus, closures may contain irrelevant bindings. For example, consider the function closure (λy.y)[x = z, else = w], where z is a variable and w is an arbitrary normal form. This closure reduces only to itself; the irrelevant substitution does not disappear. In this case, we will consider that the result depends on the substitution. We could add rules for minimizing substitutions but, for the sake of simplicity, we do not.

4.2 A weak labelled calculus with records

Following the same approach as in section 3, we define a labelled calculus:

a, b, c ::=                              terms
   |  . . .                              as in section 4.1
   |  e:a                                labelled term (e ∈ E)

There are new one-step reduction rules in addition to those of section 4.1:

(e:b)(a) → e:(b(a))
(e:b)[s] → e:(b[s])
(e:b).l → e:(b.l)
if (e:a) then b else c → e:(if a then b else c)

The grammar for active contexts is extended with one clause:

C ::=                                    active contexts
   |  . . .                              as in section 4.1
   |  e:C                                labelled context (e ∈ E)


a, b, c ::=                                             terms
   |  x                                                 variable (x ∈ V)
   |  λx.a                                              abstraction (x ∈ V)
   |  b(a)                                              application
   |  a[s]                                              closure
   |  ⟨l1 = a1, ..., ln = an, else = an+1⟩              record (li ∈ L, distinct)
   |  a.l                                               selection (l ∈ L)
   |  true                                              true
   |  false                                             false
   |  if a then b else c                                conditional

s ::=  x1 = a1, ..., xn = an, else = an+1               substitutions (xi ∈ V, distinct)

Figure 1. Grammar for the weak λ-calculus.

x[x1 = a1, ..., xn = an, else = an+1] → ai              (x = xi)
x[x1 = a1, ..., xn = an, else = an+1] → an+1            (x ≠ all xi)
(b(a))[s] → b[s](a[s])
((λx.b)[s])(a) → b[(x = a)·s]
(b.l)[s] → (b[s]).l
(⟨l1 = a1, ..., ln = an, else = an+1⟩[s]).l → ai[s]     (l = li)
(⟨l1 = a1, ..., ln = an, else = an+1⟩[s]).l → an+1[s]   (l ≠ all li)
true[s] → true
false[s] → false
(if a then b else c)[s] → if a[s] then b[s] else c[s]
if true then b else c → b
if false then b else c → c

Figure 2. One-step reduction rules for the weak λ-calculus.

C ::=                                                   active contexts
   |  □                                                 hole
   |  C(a)                                              application (left)
   |  b(C)                                              application (right)
   |  a[S]                                              closure
   |  C.l                                               selection (l ∈ L)
   |  if C then b else c                                conditional (guard)
   |  if a then C else c                                conditional (then)
   |  if a then b else C                                conditional (else)

S ::=  x1 = a1, ..., xi = C, ..., xn = an, else = an+1  substitutions

Figure 3. Grammar for active contexts.


As usual, the congruence rule permits reduction in any active context:

a → b
─────────────
C{a} → C{b}

for any active context C.

4.3 Dependency analysis and caching (by example)

The labelled calculus provides a basis for dependency analysis and caching. The sequence of definitions and results would be much as in section 3. We do not go through it, but rather give one instructive example.

We consider the term:

((λx.x.l1)(⟨l1 = y1, l2 = y2, else = w⟩))[y1 = z1, y2 = z2, else = w]

This term yields z1. We label the term, obtaining:

((λx.x.l1)(⟨l1 = (e1:y1), l2 = (e2:y2), else = (e3:w)⟩))[y1 = (e4:z1), y2 = (e5:z2), else = (e6:w)]

This labelled term yields e1:e4:z1, so we immediately conclude that the following prefix also yields z1:

((λx.x.l1)(⟨l1 = y1, l2 = □, else = □⟩))[y1 = z1, y2 = □, else = □]

Thanks to our definition of the prefix ordering, this prefix is equivalent to:

((λx.x.l1)(⟨l1 = y1, else = □⟩))[y1 = z1, else = □]

Suppose that, in our cache, we record this prefix with the associated result z1; and suppose that later we are given the term:

((λx.x.l1)(⟨l1 = y1, l3 = y1'(y1'), else = w'⟩))[y1' = z1, y1 = z1, else = w']

This term matches the prefix in the cache entry, so we immediately deduce that it reduces to z1.

As this example illustrates, the labelled reductions help us identify irrelevant components of both substitutions and records. The prefix ordering and the use of else then allow us to delete those irrelevant components and to add new irrelevant components.

In some applications, irrelevant components may be common. For example, in the context of Vesta, a large record may bundle compiler switches, environment variables, etc.; for many computations, most of these components are irrelevant. In such situations, the ability to detect and to ignore irrelevant components is quite useful: it means more cache hits.

5 Conclusions

We have developed techniques for caching in higher-order functional languages. Our approach relies on using dependency information from previous executions in addition to the outputs of those executions. This dependency information is readily available and easy to exploit (once the proper tools are in place); it yields results that could be difficult to obtain completely statically. The techniques are based on a labelled λ-calculus and, despite their pragmatic simplicity, benefit from a substantial body of theory.

Acknowledgements

We thank Yanhong Liu and Tim Teitelbaum for discussions about related work; Allan Heydon, Jim Horning, Roy Levin, and Yuan Yu for discussions about Vesta; Alain Deutsch and Simon Peyton Jones for comments on strictness analysis; and Georges Gonthier for assorted remarks on a draft of this paper.

References

[ACCL91] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375-416, 1991.

[Bar84] Henk P. Barendregt. The Lambda Calculus. North Holland, revised edition, 1984.

[Ber78] G. Berry. Stable models of typed lambda-calculi. In Proc. 5th Coll. on Automata, Languages and Programming, Lecture Notes in Computer Science, pages 72-89. Springer-Verlag, 1978.

[BHA86] G. L. Burn, C. Hankin, and S. Abramsky. Strictness analysis for higher-order functions. Science of Computer Programming, 7:249-278, 1986.

[Fie90] John Field. On laziness and optimality in lambda interpreters: Tools for specification and analysis. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Programming Languages, pages 1-15, 1990.

[FT90] John Field and Tim Teitelbaum. Incremental reduction in the lambda calculus. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, pages 307-322. ACM, 1990.

[GLM92] Georges Gonthier, Jean-Jacques Lévy, and Paul-André Melliès. An abstract standardisation theorem. In Seventh Annual IEEE Symposium on Logic in Computer Science, 1992.

[GVS95] Milind Gandhe, G. Venkatesh, and Amitabha Sanyal. Labeled λ-calculus and a generalised notion of strictness. In Asian Computing Science Conference, Lecture Notes in Computer Science. Springer-Verlag, December 1995.

[HL91] Gérard Huet and Jean-Jacques Lévy. Computations in Orthogonal Term Rewriting Systems. MIT Press, 1991.

[HL93] Chris Hanna and Roy Levin. The Vesta language for configuration management. Research Report 107, Digital Equipment Corporation, Systems Research Center, June 1993. Available from http://www.research.digital.com/SRC.

[Hug85] John Hughes. Lazy memo-functions. In Jean-Pierre Jouannaud, editor, Functional Programming Languages and Computer Architecture, pages 129-146, September 1985.

[Klo80] Jan Willem Klop. Combinatory Reduction Systems. PhD thesis, CWI, 1980.


[Lév78] Jean-Jacques Lévy. Réductions Correctes et Optimales dans le Lambda Calcul. PhD thesis, University of Paris 7, 1978.

[LM93] Roy Levin and Paul R. McJones. The Vesta approach to precise configuration of large software systems. Research Report 105, Digital Equipment Corporation, Systems Research Center, June 1993. Available from http://www.research.digital.com/SRC.

[Mar91] Luc Maranget. Optimal derivations in weak lambda-calculi and in orthogonal term rewriting systems. In Proceedings of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, 1991.

[Mic68] D. Michie. 'Memo' functions and machine learning. Nature, 218:19-22, 1968.

[O'D77] Michael O'Donnell. Computing in Systems Described by Equations. PhD thesis, Cornell University, 1977.

[Pug88] William Pugh. Incremental Computation and the Incremental Evaluation of Functional Programs. PhD thesis, Cornell University, 1988.