Analysis and Caching of Dependencies
Martin Abadi                Butler Lampson          Jean-Jacques Lévy
Digital Systems Research Center    Microsoft        INRIA Rocquencourt
ma@pa.dec.com       blampson@microsoft.com      Jean-Jacques.Levy@inria.fr
Abstract
We address the problem of dependency analysis and caching in the context of
the λ-calculus. The dependencies of a λ-term are (roughly) the parts of the
λ-term that contribute to the result of evaluating it. We introduce a
mechanism for keeping track of dependencies, and discuss how to use these
dependencies in caching.
1 Introduction
Suppose that we have evaluated the function application f(1, 2), and that its
result is 7. If we cache the equality f(1, 2) = 7, we may save ourselves the
work of evaluating f(1, 2) in the future. Suppose further that, in the course
of evaluating f(1, 2), we noticed that the first argument of f was not
accessed at all. Then we can make a more general cache entry: f(n, 2) = 7 for
all n. In call-by-name evaluation, we may not even care about whether n is
defined or not. Later, if asked about the result of f(2, 2), for example, we
may match f(2, 2) against our cache entry, and deduce that f(2, 2) = 7
without having to compute f.
There are three parts in this caching scheme: (i) the dependency analysis (in
this case, noticing that f did not use its first argument in the course of
the computation); (ii) writing down the dependency information, in some way,
and caching it; (iii) the cache lookup. Each of the parts can be complex.
However, the caching scheme is worthwhile if the computation of f is
expensive and if we expect to encounter several similar inputs (e.g.,
f(1, 2), f(2, 2), ...).
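The three-part scheme can be illustrated with a small sketch (in Python, with names of our own choosing, not from the paper): a cache entry stores an argument pattern in which unused positions are generalized to a wildcard, and lookup matches new arguments against the pattern.

```python
# A minimal sketch of the cache-entry idea: after evaluating f(1, 2) and
# observing that the first argument was never accessed, we store the
# generalized entry (ANY, 2) -> 7. ANY is our own wildcard marker.
ANY = object()

def matches(pattern, args):
    """A pattern matches concrete arguments if it agrees wherever it is not ANY."""
    return len(pattern) == len(args) and all(
        p is ANY or p == a for p, a in zip(pattern, args))

def lookup(cache, args):
    """Return a cached result whose pattern matches args, or None on a miss."""
    for pattern, result in cache:
        if matches(pattern, args):
            return result
    return None

cache = [((ANY, 2), 7)]            # generalized entry: f(n, 2) = 7 for all n
assert lookup(cache, (2, 2)) == 7  # hit, without recomputing f
assert lookup(cache, (2, 3)) is None
```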
We address the problem of dependency analysis and caching in the context of
the λ-calculus. We introduce a mechanism for keeping track of dependencies,
and show how to use these dependencies in caching. (However, we stop short of
considering issues of cache organization, replacement policy, etc.) Our
techniques apply to programs with higher-order functions, and not just to
trivial first-order examples like f(1, 2). The presence of higher-order
functions creates the need for sophisticated dependency propagation. As an
example, consider the higher-order function:

    f ≜ λx.λy.fst(x(fst(y))(snd(y)))
where pairs are encoded as usual:

    ⟨a, b⟩ ≜ λx.x(a)(b)    fst ≜ λp.p(λn.λz.n)    snd ≜ λp.p(λn.λz.z)
The function f takes two arguments x and y; presumably x is a function and y
is a pair. The function f applies x to the first and second components of y,
and then extracts the first component of the result. A priori, it may seem
that f depends on x and on all of y. Consider now the following arguments
for f:

    g ≜ λn.λz.⟨z, n⟩          r ≜ ⟨1, 2⟩
    g′ ≜ λn.λz.⟨z, ⟨n, z⟩⟩    r′ ≜ ⟨2, 2⟩
Both functions g and g′ seem to depend on their respective arguments.
However, all these a priori expectations are too coarse. After evaluating
f(g)(r) to 2, we can deduce that f(g′)(r′) also yields 2. For this we need to
express that f accesses only part of the pair that g produces, that g
accesses only part of the pair that f feeds it, and that g and g′ look
sufficiently similar. We develop a simple way of capturing and of exploiting
these fairly elaborate dependencies.
Our approach is based on a labelled λ-calculus [Lév78]. Roughly, our labelled
λ-calculus is like a λ-calculus with names for subexpressions. In the course
of computation, the names propagate, and some of them end up in the result.
If a reduces to v, then v will contain the names of the subexpressions of a
that contribute to producing v. Then, if we are given a′ that coincides with
a on those subexpressions, we may deduce that a′ reduces to v.
In our example, we would proceed as follows. First, when given the expression
f(g)(r), we would label some of its subexpressions. The more labels we use,
the more information we obtain. In this example, which is still relatively
simple, we label only components of g and r:

    gℓ ≜ λn.λz.⟨e0:z, e1:n⟩    rℓ ≜ ⟨e2:1, e3:2⟩

where e0, e1, e2, and e3 are distinct labels. We extend the reduction rules
of the λ-calculus to handle labels; in this case, f(gℓ)(rℓ) reduces to
e0:e3:2. Stripping off all the labels, we can deduce that f(g)(r) reduces
to 2. Studying the labels, we may notice that e1 and e2 do not appear in the
result. As we will prove, this means that f(g*)(r*) reduces to 2 for any
expressions g* and r* of the forms:

    g* ≜ λn.λz.⟨z, □⟩    r* ≜ ⟨□, 2⟩

where □ marks an arbitrary (missing) subexpression.
Obviously, g′ and r′ match this pattern, and hence f(g′)(r′) reduces to 2. As
this small example suggests, our techniques for dependency analysis are
effective, reasonably efficient, and hence potentially practical.
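For readers who want to check the arithmetic of this example, the pair encoding and the functions f, g, g′ transcribe directly into Python closures. This is only a sanity check of the example, not part of the formal development:

```python
# Church-style pairs and the introduction's example, written as Python closures.
pair = lambda a, b: lambda x: x(a)(b)        # ⟨a, b⟩ ≜ λx.x(a)(b)
fst = lambda p: p(lambda n: lambda z: n)     # first projection
snd = lambda p: p(lambda n: lambda z: z)     # second projection

f = lambda x: lambda y: fst(x(fst(y))(snd(y)))
g = lambda n: lambda z: pair(z, n)
r = pair(1, 2)
assert f(g)(r) == 2                          # as claimed in the text

gp = lambda n: lambda z: pair(z, pair(n, z))  # g′
rp = pair(2, 2)                               # r′
assert f(gp)(rp) == 2                         # the similar term gives 2 as well
```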
In the next section we review the background for our work and some related
work. In section 3, we study dependency analysis and caching in the pure
λ-calculus. In section 4, we extend our techniques to a more realistic
language; this language includes records and has a weak operational semantics
based on explicit substitutions [ACCL91, Fie90].
2 Motivation and Related Work
The motivation for this work arose in the context of a system-modelling
system called Vesta [LM93, HL93], roughly a replacement for tools like make
and rcs. In Vesta, the analogue of a makefile is a program written in a
specialized, untyped, higher-order, functional, lazy, interpreted language.
The functional character of the language guarantees that the results of
system building are predictable and reproducible.
In Vesta, the basic computation steps are expensive calls to functions like
compile and link; hence it is important to avoid unnecessary recomputations.
The programs can be reasonably large; it is therefore desirable to notice
cache hits for large subexpressions rather than for individual calls to
primitives (e.g., individual compilations). Furthermore, irrelevant changes
in parameters are expected to be frequent; when there are such changes,
simple memoisation [Mic68, Hug85] does not suffice to avoid recomputations,
and a more savvy caching strategy is necessary.
This paper, however, is not about Vesta. There has been some research on
caching in Vesta [HL93], and more is currently in progress. Here we discuss
techniques for the λ-calculus; these are somewhat simpler, easier to explain,
and perhaps of more general interest.
In the λ-calculus, the work that seems most closely related to ours is that
of Field and Teitelbaum [FT90]. They have investigated the problem of
reductions of similar expressions (which may not even yield the same result).
Their approach is based on a λ-calculus with a new "fork" primitive rather
than on a labelled λ-calculus. For example, they can represent the two
similar expressions b(a) and b′(a) as the single expression fork(b, b′)(a),
with a rule for duplication, namely fork(b, b′)(a) = fork(b(a), b′(a)). Their
algorithm seems particularly appropriate for dealing with pairs of
expressions that differ at only one or a few predictable subexpressions.
There has been much other work on incremental computation, and some of it is
related to ours. In particular, Pugh's dissertation concerns incremental
evaluation of functional programs; it raises the issue of caching for
functions that do not depend on all of their arguments [Pug88, pp. 70-71].
Dependency analysis is also similar to traditional analyses such as
strictness analysis (e.g., [BHA86]). There is even a recent version of
strictness analysis that relies on a labelled λ-calculus [GVS95]. Strictness
analysis is concerned with what parts of a program must be evaluated; in
contrast, for doing cache lookups, we need to know what parts of a program
may affect the result. Furthermore, we do not use approximate abstract
interpretations, but rather rely on previous, actual executions of programs
similar to the one being analyzed.

3 Dependencies in the Pure λ-calculus
In this section we consider incremental computation in the context of the
pure λ-calculus. This is a minimal system, but it enables us to illustrate
our ideas. First we review some classical results that suggest an approach to
dependency analysis; then we describe a labelled calculus, a basic scheme for
caching, some examples, and a more sophisticated scheme for caching; finally,
we consider call-by-value evaluation.
3.1 The λ-calculus

The standard λ-calculus has the following grammar for expressions:

    a, b, c ::=           terms
        | x               variable (x ∈ V)
        | λx.a            abstraction (x ∈ V)
        | b(a)            application

where V is a set of variables.
The β rule is, as usual:

    (λx.b)(a) → b{a/x}

where b{a/x} is the result of replacing x with a in b. When C is a context (a
term with a hole), we write C{a} for the result of filling C's hole with a
(possibly with variable captures). We adopt the following congruence rule:

       a → b
    -------------
    C{a} → C{b}

The reduction relation →* is the reflexive, transitive closure of →. A
computation stops when it reaches a normal form.
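As a reference point for the rest of the section, here is a minimal sketch of this calculus in Python. The representation and the capture-naive substitution are our own simplifications; a real implementation would rename bound variables to avoid capture.

```python
# A sketch of the untyped λ-calculus of this section: terms are variables,
# abstractions, and applications, and evaluation repeatedly contracts the
# leftmost outermost β-redex.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    x: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def subst(b, x, a):
    """Capture-naive substitution b{a/x}; fine when bound names are distinct."""
    if isinstance(b, Var):
        return a if b.name == x else b
    if isinstance(b, Lam):
        return b if b.x == x else Lam(b.x, subst(b.body, x, a))
    return App(subst(b.fun, x, a), subst(b.arg, x, a))

def step(t):
    """One leftmost outermost β-step, or None if t is in normal form."""
    if isinstance(t, App):
        if isinstance(t.fun, Lam):            # β-redex at the root
            return subst(t.fun.body, t.fun.x, t.arg)
        s = step(t.fun)
        if s is not None:
            return App(s, t.arg)
        s = step(t.arg)
        return None if s is None else App(t.fun, s)
    if isinstance(t, Lam):
        s = step(t.body)
        return None if s is None else Lam(t.x, s)
    return None

def normalize(t):
    """Reduce t to normal form (may diverge on terms without one)."""
    while (s := step(t)) is not None:
        t = s
    return t

# (λx.x)(y) →* y
assert normalize(App(Lam("x", Var("x")), Var("y"))) == Var("y")
```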
We can now reformulate the problem posed in the introduction. Suppose that a
is a term and a →* v. When can we say that b →* v simply by comparing a and
b? In order to address this question, we recall a few known theorems.
Theorem 1 (Church-Rosser) The calculus is confluent.

Theorem 2 (Normalization) If a term can be reduced to a normal form, then its
leftmost outermost reduction (which reduces the leftmost outermost redex at
each step) reaches this normal form.

Clearly the leftmost outermost reduction reduces only subexpressions
necessary to get to the normal form.
A prefix is an expression possibly with several missing subexpressions:

    a, b, c ::=           prefixes
        | □               hole
        | x               variable (x ∈ V)
        | λx.a            abstraction (x ∈ V)
        | b(a)            application

A prefix a is a prefix of another prefix (or expression) b if a matches b
except in some holes; we write a ⊑ b. For instance, we have that
□(x)(□)(λy.□(y)) ⊑ y(x)(x)(λy.x(y)). For the purposes of reduction, we treat
□ like a free variable; for example, (λx.x(□))(a) → a(□).
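The prefix relation ⊑ is straightforward to implement. A sketch, using a tuple encoding of prefixes of our own devising, with ("hole",) standing for □:

```python
# A sketch of the prefix ordering ⊑: a prefix matches a term (or another
# prefix) if it agrees everywhere except at its own holes.
HOLE = ("hole",)

def var(x): return ("var", x)
def lam(x, b): return ("lam", x, b)
def app(f, a): return ("app", f, a)

def is_prefix(p, t):
    """p ⊑ t: p agrees with t except where p has a hole."""
    if p == HOLE:
        return True               # a hole matches anything
    if p[0] != t[0]:
        return False
    if p[0] == "var":
        return p[1] == t[1]
    if p[0] == "lam":
        return p[1] == t[1] and is_prefix(p[2], t[2])
    return is_prefix(p[1], t[1]) and is_prefix(p[2], t[2])

# □(x)(□) ⊑ y(x)(z)
assert is_prefix(app(app(HOLE, var("x")), HOLE),
                 app(app(var("y"), var("x")), var("z")))
assert not is_prefix(var("x"), var("y"))
```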
The following three results concern the prefix ordering and reduction:

Proposition 1 (Maximality of terms) If b ⊑ d and b is a term, then b = d.

Proposition 2 (Monotonicity) If a, b, and c are prefixes, a →* b, and a ⊑ c,
then there exists a prefix d such that c →* d and b ⊑ d.

Theorem 3 (Stability) If a is a term, v is a term in normal form, and
a →* v, then there is a minimum prefix a0 ⊑ a such that a0 →* v.
Proof. The stability theorem follows from the stability of Böhm trees
[Ber78]. Here we sketch a simple, alternative proof.

First we show that if a and b are compatible (have a common upper bound) in
the prefix ordering, and a and b reduce to a term v in normal form, then the
greatest lower bound of a and b (written a ∧ b) also reduces to v. The proof
is by induction on the lengths of the leftmost outermost reductions to v, and
secondarily on the sizes of a and b. We proceed by cases on the form of a.

· If a = x, then v = x and b = x, so (a ∧ b) = v.

· If a = λx.a1, then b is of the form λx.b1, with a1 and b1 compatible. The
result follows by induction hypothesis.

· If a = x(a1)...(an), then b is of the form x(b1)...(bn), with ai and bi
compatible for each i ∈ 1..n. The result follows by induction hypothesis.

· The cases where a = □ or a = □(a1)...(an) are impossible, since a reduces
to a term in normal form.

· Finally, if a = (λx.a1)(a2)...(an), then b is of the form
(λx.b1)(b2)...(bn). Let a′ = a1{a2/x}(a3)...(an) and
b′ = b1{b2/x}(b3)...(bn); a′ and b′ are compatible, and they reduce to v with
shorter leftmost outermost reductions than a and b. By induction hypothesis,
a′ ∧ b′ reduces to v. Since a ∧ b reduces to a′ ∧ b′, we obtain that a ∧ b
reduces to v by transitivity.

Now suppose that a and v are as indicated in the statement of the theorem.
The prefixes of a that reduce to v are compatible, since they have a as upper
bound; their greatest lower bound is the prefix a0 described in the
statement. □
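The greatest lower bound a ∧ b used in this proof can be computed structurally: keep a constructor where both prefixes agree, put a hole wherever either side has one, and fail on incompatible constructors. A sketch, using a tuple encoding of prefixes (our own notation, not the paper's):

```python
# Greatest lower bound of two prefixes in the prefix ordering; returns None
# when the prefixes are incompatible (no common lower bound refines both).
HOLE = ("hole",)

def meet(a, b):
    """a ∧ b for compatible prefixes, None if the prefixes are incompatible."""
    if a == HOLE or b == HOLE:
        return HOLE                       # a hole is below everything
    if a[0] != b[0]:
        return None                       # different constructors: incompatible
    if a[0] == "var":
        return a if a == b else None
    if a[0] == "lam":
        if a[1] != b[1]:
            return None
        body = meet(a[2], b[2])
        return None if body is None else ("lam", a[1], body)
    f, x = meet(a[1], b[1]), meet(a[2], b[2])
    return None if f is None or x is None else ("app", f, x)

# (f(□)) ∧ (□(y)) = □(□)
a = ("app", ("var", "f"), HOLE)
b = ("app", HOLE, ("var", "y"))
assert meet(a, b) == ("app", HOLE, HOLE)
```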
We can give a first solution to our problem, as follows. Suppose that a →* v
and v is a term in normal form. Let a0 be the minimum prefix of a such that
a0 →* v, given by Theorem 3. By Proposition 2, if a0 is a prefix of b then
b →* v′ for some v′ such that v is a prefix of v′; by Proposition 1, v′ is v.
Therefore, if a0 is a prefix of b then we can reuse the computation a →* v
and conclude that b →* v.

It remains for us to compute a0. As we will show, this computation can be
performed at the same time as we evaluate a, and does not require much
additional work. Intuitively, we will mark every subexpression of a necessary
to compute v along the leftmost outermost reduction.
3.2 A labelled A-calculus
In order to
compute minimum prefixes as discussed above, we follow the underlined method of
Barendregt [Bar84], generalized by use of labels as in the work of Field,
Levy, or Maranget [Fie90, Lév78, Mar91]. Our application of this
method
gives rise to a new labelled calculus, which we define next.
We
consider a A-calculus with the following extended grammar for expressions:
a,b,c ::== terms
M . . . as
in section 3.1
M e:a labelled term (e E E)
where E is a set of
labels.
There is one new one-step reduction rule:
(e:b)(a)
—* e:(b(a))
The essential purpose of this rule is to move labels
outwards as little as possible in order to permit /3 reduction. For example, (eo:(Ax.x(x)))(el :y) reduces to eo:((Ax.x(x))(el :y)) via the new rule, and then yields eo:((el:y)(el:y)) by the /3 rule.
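The label-floating rule is easy to state operationally. A small sketch (tuple encoding of our own) that applies (e:b)(a) → e:(b(a)) at the root of a term:

```python
# Applying a labelled function: the label floats out of the function position,
# exposing the application underneath so that β reduction can fire.
def app(f, a): return ("app", f, a)
def lab(e, t): return ("lab", e, t)

def float_label(t):
    """Apply (e:b)(a) → e:(b(a)) at the root, if the rule matches; else t."""
    if t[0] == "app" and t[1][0] == "lab":
        e, b = t[1][1], t[1][2]
        return lab(e, app(b, t[2]))
    return t

t = app(lab("e0", ("var", "f")), ("var", "y"))   # (e0:f)(y)
assert float_label(t) == lab("e0", app(("var", "f"), ("var", "y")))  # e0:(f(y))
```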
There are clear correspondences between the unlabelled calculus and the
labelled calculus. When a′ is a labelled term, let strip(a′) be the
unlabelled term obtained by removing every label in a′. We have:

Proposition 3 (Simulation) Let a, b be terms, and let a′, b′ be labelled
terms.

· If a′ → b′, then strip(a′) →* strip(b′).

· If a = strip(a′) and a → b, then a′ →* b′ for some b′ such that
b = strip(b′).

The labelled calculus enjoys the same fundamental theorems as the unlabelled
calculus: confluence, normalization, and stability. The confluence theorem
follows from Klop's dissertation work, because the labelled calculus is a
regular combinatory reduction system [Klo80]; the labelled calculus is
left-linear and without critical pairs. The normalization theorem can also be
derived from Klop's work; alternatively it can be obtained from results about
abstract reduction systems [GLM92], via O'Donnell's notion of left systems
[O'D77]. The proof of the stability theorem is similar to the one in [HL91].
3.3 Basic caching

Suppose that a →* v, where a is a term and v is its normal form. Put a
different label on every subexpression of a, obtaining a labelled term a′. By
Proposition 3, a′ →* v′ for some v′ such that v = strip(v′). Consider all the
labels in v′; to each of these labels corresponds a subterm of a′ and thus of
a. Let G(a) be a prefix obtained from a by replacing with □ each subterm
whose label does not appear in v′. We can prove that G(a) is well-defined. In
particular, the value of G(a) does not depend on the choice of a′ or v′; and
if the label for a subterm of a appears in v′ then so do the labels for all
subterms that contain it.
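Computing G(a) is then a single traversal: given the set of labels that survive in v′, erase every labelled subterm whose label is absent. A sketch, with a tuple encoding of labelled terms of our own devising:

```python
# Constructing the prefix G(a): replace each subterm whose label does not
# occur in the labelled result by a hole; keep everything else.
HOLE = ("hole",)

def generalize(labelled, live):
    """labelled: a term whose subterms may carry ('lab', e, t) wrappers.
       live: the set of labels occurring in the labelled normal form v'."""
    tag = labelled[0]
    if tag == "lab":
        e, t = labelled[1], labelled[2]
        return HOLE if e not in live else generalize(t, live)
    if tag == "var":
        return labelled
    if tag == "lam":
        return ("lam", labelled[1], generalize(labelled[2], live))
    return ("app", generalize(labelled[1], live), generalize(labelled[2], live))

# The labelled pair body from the introduction, pair(e0:z)(e1:n), with the
# live label set {e0, e3} observed in the result: e1 is dead and erased.
body = ("app", ("app", ("var", "pair"), ("lab", "e0", ("var", "z"))),
        ("lab", "e1", ("var", "n")))
assert generalize(body, {"e0", "e3"}) == \
    ("app", ("app", ("var", "pair"), ("var", "z")), HOLE)
```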
When a →* v, we may cache the pair (G(a), v). When we consider a new term b,
it is sufficient to check that G(a) ⊑ b in order to produce v as the result
of b. As G(a) is the part of a sufficient to get v (what we called a0 in
section 3.1), we obtain the following theorem:

Theorem 4 If a is a term, v is a term in normal form, a →* v, and G(a) ⊑ b,
then b →* v.
Theorem 4 supports a simple caching strategy. In this strategy, we maintain a
cache with the following invariants:

· the cache is a set of pairs (a0, v), each consisting of an unlabelled
prefix a0 and an unlabelled term v in normal form;

· if (a0, v) is in the cache and a0 ⊑ b then b →* v.

Therefore, whenever we know that v is the normal form of a, we may add to the
cache the pair (G(a), v). Theorem 4 implies that this preserves the cache
invariants.
Suppose that a is a term without labels. In order to evaluate a, we do:

· if there is a cache entry (a0, v) such that a0 ⊑ a, then return v;

· otherwise:

  - let a′ be the result of adding distinct labels to a, at every
    subexpression;

  - suppose that, by reduction, we find that a′ →* v′ for v′ in normal form;

  - let v = strip(v′) and a0 = G(a);

  - optionally, add the entry (a0, v) to the cache;

  - return v.

Both cases preserve the cache invariants. In both, the v returned is such
that a →* v.
In a refinement of this scheme, we may put labels at only some subexpressions
of a. In this case, we replace with □ a subexpression of a only if this
subexpression was labelled before reduction. The more labels we use, the more
general the prefix obtained; this results in better cache entries, at a
moderate cost. However, in examples, we prefer to use few labels in order to
enhance readability.

Another refinement of the scheme consists in caching pairs of labelled
prefixes and results. The advantage of not stripping the labels is that the
cache records the precise dependencies of results on prefixes. We return to
this subject in section 3.5.
3.4 Examples

The machinery that we have developed so far handles the example of the
introduction (the term f(g)(r)). We leave the step-by-step calculation for
that example as an exercise to the reader. As that example illustrates,
pairing behaves nicely, in the sense that fst⟨a, b⟩ depends only on a, as one
would expect.

As a second example, we show that the Church booleans behave nicely too. The
encoding of booleans is as usual:

    true ≜ λx.λy.x    false ≜ λx.λy.y    if a then b else c ≜ a(b)(c)

In the setting of the labelled λ-calculus, we obtain as a derived rule that:

    if (e:a) then b else c →* e:(if a then b else c)

It follows from this rule that, for example,

    (λx. if e0:x then e1:y else e2:z)(e3:true) →* e0:e3:e1:y

We obtain the unlabelled prefix:

    (λx. if x then y else □)(true)

and we can deduce that any expression that matches this prefix reduces to y.
Similar examples arise in the context of Vesta (see section 2 and [HL93]). A
simple one is the term:

    (if isC(file) then Ccompile else M3compile)(file)

where isC(f) returns true whenever f is a C source file, and file is either a
C source file or an M3 source file. If isC(file) returns true, then that term
yields Ccompile(file). Using labels, we can easily discover that this result
does not depend on the value of M3compile, and hence that it need not be
recomputed when that value changes. In fact, even isC(file) and the
conditional need not be reevaluated.
In a higher-order variant of this example, the conditional compilation
function is passed as an argument:

    (λx. x(file))(λy. if isC(y) then Ccompile(y) else M3compile(y))

Our analysis is not disturbed by the higher-order abstraction, and yields the
same information.
3.5 Limitations of the basic caching scheme

The basic caching scheme of section 3.3 has some limitations, illustrated by
the following two concrete examples. Suppose that we have the cache entry:

    ((λx.⟨snd(x), fst(x)⟩)(⟨true, false⟩), ⟨false, true⟩)

Suppose further that we wish to evaluate the term:

    fst((λx.⟨snd(x), fst(x)⟩)(⟨true, false⟩))

Immediately the cache entry enables us to reduce this term to
fst(⟨false, true⟩), and eventually we obtain false. However, in the course of
this computation, we have not learned how the result depends on the input. We
are unable to make an interesting cache entry for the term we have evaluated.
Given the new, similar term

    fst((λx.⟨snd(x), fst(x)⟩)(⟨false, false⟩))

we cannot immediately tell that it yields the same result.
As a second example, suppose that we have the cache entry:

    (if true then true else □, true)

and that we wish to evaluate the term:

    not(if true then true else true)

In our basic caching scheme, we would initially label this term, for example
as:

    not(if true then e0:true else e1:true)

Then we would have to reduce this term, and as part of that task we would
have to reduce the subterm

    (if true then e0:true else e1:true)

At this point our cache entry would tell us that the subterm yields true,
modulo some labels. We can complete the reduction, obtaining false, and we
can make a trivial cache entry:

    (not(if true then true else true), false)

However, we have lost track of which prefix of the input determines the
result, and we cannot make the better cache entry:

    (not(if true then true else □), false)
The moral from these examples is that cache entries should contain dependency
information that indicates how each part of the result depends on each part
of the input. One obvious possibility is not to strip the labels of prefixes
and results before making cache entries; after all, these labels encode the
desired dependency information. We have developed a refinement of the basic
caching scheme that does precisely that, but we omit its detailed description
in this paper. Next we give another solution to the limitations of the basic
caching scheme.
3.6 A more sophisticated caching scheme

In this section we describe another caching scheme. This scheme does not rely
directly on the labelled λ-calculus, but it can be understood or even
implemented in terms of that calculus.

With each reduction a →* v of a term a, we associate a function d from
prefixes of v to prefixes of a. Very roughly, if v0 is a prefix of v then
d(v0) is the prefix of a that yields v0 in the reduction a →* v. We write
a →*_d v to indicate the function d. This annotated reduction relation is
defined by the following rules.
· Reflexivity:

    a →*_id a

  where id is the identity function on prefixes.

· Transitivity:

    a →*_d b    b →*_d′ c
    ----------------------
        a →*_{d′;d} c

  where d′;d is the composition of d′ and d (first apply d′, then d).

· Congruence: Given a function d from prefixes of b to prefixes of a, we
  define a function C{d} from prefixes of C{b} to prefixes of C{a}. If
  c0 ⊑ C then C{d}(c0) = c0; otherwise, there exists a unique b0 ⊑ b such
  that c0 = C{b0}, and we let C{d}(c0) = C{d(b0)}. We obtain the rule:

        a →*_d b
    --------------------
    C{a} →*_{C{d}} C{b}

· β:

    (λx.b)(a) →_{dβ} b{a/x}

  where dβ(□) = □ and, for c0 ≠ □ and c0 ⊑ b{a/x},
  dβ(c0) = (λx.b0)(a0), where a0 and b0 are the least prefixes such that
  a0 ⊑ a, b0 ⊑ b, and c0 ⊑ b0{a0/x}.
These rules are an augmentation of the reduction rules of section 3.1, in the
following sense:

Proposition 4
· If a →*_d b then a →* b.
· If a →* b then a →*_d b for some d.

The rules may seem a little mysterious, but they can be understood in terms
of labels. Imagine that every subexpression of a is labelled (with an
invisible label), that a →*_d v, and that v0 ⊑ v; then d(v0) is the least
prefix of a that contains all of the labels that end up in v0.
As an example, consider the term (λx.x(x))(a), where a is arbitrary. By β, we
have

    (λx.x(x))(a) →*_{dβ} a(a)

where dβ is such that, for instance, dβ(□) is □, dβ(a(a)) is the entire
(λx.x(x))(a), and dβ(a(□)) is (λx.x(□))(a). If we had labelled the initial
term (λx.x(x))(a) before reduction, then the labels that would decorate the
result prefix a(□) would be all those of the initial term except for the
label of the argument occurrence of x; this justifies that dβ(a(□)) be
(λx.x(□))(a).
Theorem
5 If a is a term, a —*
" G v, and d(v) -< b, then
b —* " G v.
This theorem gives rise to a new caching
scheme. The cache entries in this scheme consist of judgements a —* " G v, where a and v are terms and d is a
function from prefixes of v to prefixes of a. The representation of d can be its
graph
(i.e., a set of pairs of prefixes) or a formal expression (written in terms
of id, d3, etc.); it can
even be the pair of a labelling of a and a corresponding labelling of v.
According to the theorem, whenever we encounter a term b such that d(v) -< b, we may deduce that b —* " G v.
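A cache entry in this scheme might be represented as the result v together with the graph of d. A small sketch (representation choices and names are our own) of the lookup justified by Theorem 5, applied at the whole-result prefix v:

```python
# A cache entry is (v, d_graph) where d_graph is the graph of the dependency
# function d, a finite set of (result-prefix, input-prefix) pairs. Lookup
# applies Theorem 5 at the whole result: if d(v) ⊑ b, then b reduces to v.
def cache_lookup(entry, b, is_prefix):
    """Return v if the entry applies to the term b, else None."""
    v, d_graph = entry
    d_of_v = dict(d_graph).get(v)     # d(v): the minimal input prefix for v
    if d_of_v is not None and is_prefix(d_of_v, b):
        return v
    return None

HOLE = ("hole",)
def is_prefix(p, t):                  # flat check, for brevity of the sketch
    return p == HOLE or p == t

# An entry whose result does not depend on the input at all: d(v) = □.
entry = (("var", "v"), {(("var", "v"), HOLE)})
assert cache_lookup(entry, ("var", "anything"), is_prefix) == ("var", "v")
```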
This caching scheme does not suffer from the limitations of the basic one. In
particular, each cache entry contains dependency information for every part
of the result, rather than for the whole result. Moreover, the rules of
inference provide a way of combining dependency information for
sub-computations; therefore, we can make an interesting cache entry whenever
we do an evaluation, even if we used the cache in the course of the
evaluation.
3.7 Call-by-value

So far, we have considered only call-by-name evaluation. Here we define a
call-by-value version of the labelled λ-calculus, showing that we can adapt
our approach to call-by-value evaluation. The move from a call-by-name to a
call-by-value labelled λ-calculus does not affect the basic caching scheme of
section 3.3, which remains sound.

The syntax of the call-by-value labelled λ-calculus is that given in section
3.2. The β rule is restricted to:

    (λx.b)(v) → b{v/x}

where v ranges over terms of the form x, x(a1)...(an), or λx.a; such terms
are called values. As in section 3.2, we have a rule for moving labels
outwards from the left-hand side of applications:

    (e:b)(a) → e:(b(a))

We have an additional rule for the right-hand side of applications:

    (λx.b)(e:a) → e:((λx.b)(a))

One might be tempted to adopt a stronger rule, namely b(e:a) → e:(b(a)), but
this rule creates critical pairs.
With call-by-name evaluation, the term (λx.b)(a) does not depend on a if x
does not occur free in b. In particular, any term that has (λx.b)(□) as
prefix yields the same result. With call-by-value evaluation, on the other
hand, the behavior of (λx.b)(a) depends on a: it depends on whether a can be
reduced to a value or not. Therefore, a term may match the prefix (λx.b)(□)
but not yield the same result as (λx.b)(a).

The treatment of labels in our rules for call-by-value takes into account
that (λx.b)(a) depends on a even if x does not occur free in b. Suppose that
a can be reduced to a value v. To see how (λx.b)(a) depends on a, we add a
label in front of a, obtaining (λx.b)(e:a). Labelled reduction yields
e:((λx.b)(a)), then e:((λx.b)(v)), and finally e:b. The label e does not
disappear, as it would in call-by-name evaluation.

For instance, let us consider the case where a is the value y(z). From a
labelled reduction we obtain the prefix (λx.b)(□(□)). Any term that matches
this prefix reduces to the same result as (λx.b)(y(z)). In fact, the
evaluation of (λx.b)(y(z)) reveals that this term depends on the "valueness"
of y(z), but does not depend on y(z) in any other way. The prefix
(λx.b)(□(□)) does not express this information with full accuracy, but
approximates it.
4 Dependencies in a Weak λ-calculus with Records

The techniques developed in the previous section are not limited to the pure
λ-calculus. In this section, we demonstrate their applicability to a more
realistic language, with primitive booleans, primitive records, and explicit
substitutions. The operational semantics of this language is weak (so
function and record closures are not reduced).
4.1 A weak calculus with records

We consider an extended λ-calculus with the grammar for terms and for
substitutions given in Figure 1. In the grammar, L is a set of names (field
names for records) and else is a keyword used for "else" clauses in records
and in substitutions. As we show below, these "else" clauses are useful in
dependency analysis. We typically think of the "else" clauses as
corresponding to run-time errors (missing fields, unbound variables). The
term a_{n+1} in an "else" clause can be arbitrary; a term that represents a
run-time error will do.
We use the following notation for extending substitutions. Let s be
x1 = a1, ..., xn = an, else = a_{n+1}; then (x = a)·s is
x = a, x1 = a1, ..., xn = an, else = a_{n+1} if x is not among the variables
x1, ..., xn, and it is
x = a, x1 = a1, ..., x_{i-1} = a_{i-1}, x_{i+1} = a_{i+1}, ..., xn = an,
else = a_{n+1} if x is xi.
The one-step reduction rules now use explicit substitutions. They are given
in Figure 2. In particular, the analogue of the β rule is
((λx.b)[s])(a) → b[(x = a)·s], which extends an explicit substitution; the
replacement of a for x happens gradually, through other rules which push the
substitution inwards in b.
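The extension operation (x = a)·s and variable lookup with the "else" default are simple to realize. A sketch, representing a substitution as a dict of bindings plus a default term (an encoding of our own; the calculus itself keeps substitutions as syntax):

```python
# A substitution s is (bindings, default): a dict of variable bindings plus
# the "else" term. Extension overrides any existing binding for x; lookup
# falls back on the "else" term for unbound variables.
def extend(s, x, a):
    """(x = a)·s: bind x to a, replacing any previous binding for x."""
    bindings, default = s
    new = {x: a, **{y: b for y, b in bindings.items() if y != x}}
    return (new, default)

def lookup(s, x):
    """x[s]: the bound term if x is bound, the "else" term otherwise."""
    bindings, default = s
    return bindings.get(x, default)

s = ({"y": ("var", "z")}, ("var", "err"))
s2 = extend(s, "x", ("var", "w"))
assert lookup(s2, "x") == ("var", "w")
assert lookup(s2, "y") == ("var", "z")
assert lookup(s2, "q") == ("var", "err")   # unbound: the "else" clause
```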
An active context is a context generated by the grammar of Figure 3. We adopt
the following congruence rule: for any active context C,

       a → b
    -------------
    C{a} → C{b}

Notice that this rule allows us to compute inside substitutions, but not
under λ, inside records, or in the term part of closures (since λx.C,
⟨..., l = C, ...⟩, ⟨..., else = C⟩, and C[s] are not listed as active
contexts). The relation →* is the reflexive, transitive closure of →.
The prefix ordering for this language is interesting. Let s be the
substitution x1 = a1, ..., xn = an, else = a_{n+1}, and let r be the record
⟨l1 = a1, ..., ln = an, else = a_{n+1}⟩. We associate with s and r the
following functions from variables or field names to prefixes:

    [s](x) = ai         if x = xi for some i
             a_{n+1}    otherwise

    [r](l) = ai         if l = li for some i
             a_{n+1}    otherwise
Intuitively, [s](x) is the image of x through s, and [r](l) is the image of l
through r. The prefix ordering is as before except for substitutions and
records, where s ⊑ s′ if [s](x) ⊑ [s′](x) for all x ∈ V, and r ⊑ r′ if
[r](l) ⊑ [r′](l) for all l ∈ L.

According to this definition, the order of the components of substitutions
and records does not matter. In addition, we obtain that, if the "else"
clause has a hole, then any other holes can be collapsed into it; for
example, the prefixes ⟨l1 = a, l2 = □, else = □⟩, ⟨l3 = □, l1 = a, else = □⟩,
and ⟨l1 = a, else = □⟩ are all equivalent.
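Because the ordering is defined through [r](l), it can be checked pointwise over any finite set of labels that covers both records. A deliberately simplified sketch (flat component prefixes only; the tuple and dict encoding is our own):

```python
# Records as (fields-dict, else-part). The ordering compares images [r](l),
# so component order is irrelevant and holes collapse into an "else" hole.
HOLE = ("hole",)

def image(r, l):
    """[r](l): the field's prefix if present, the "else" prefix otherwise."""
    fields, default = r
    return fields.get(l, default)

def record_leq(r1, r2, labels):
    """r1 ⊑ r2, checked over a finite set of candidate labels plus "else"."""
    def leq(p, q):
        return p == HOLE or p == q        # flat prefixes only, for brevity
    return all(leq(image(r1, l), image(r2, l)) for l in labels) and \
        leq(r1[1], r2[1])

# The three prefixes listed in the text are equivalent under this ordering.
a = ("var", "a")
r1 = ({"l1": a, "l2": HOLE}, HOLE)        # ⟨l1 = a, l2 = □, else = □⟩
r2 = ({"l1": a}, HOLE)                    # ⟨l1 = a, else = □⟩
labels = {"l1", "l2", "l3"}
assert record_leq(r1, r2, labels) and record_leq(r2, r1, labels)
```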
This λ-calculus enjoys the same theorems as the pure λ-calculus of section
3.1 (modulo that now ⊑ is actually a pre-order, not an order). These theorems
should not be taken for granted, however. Their proofs are less easy, but
they can be done by using results on abstract reduction systems [GLM92]. The
stability theorem ensures that there is a minimum prefix for obtaining any
result; moreover, the maximality and monotonicity propositions are the basis
for a caching mechanism.
Finally, we should note that, in this calculus, closures may contain
irrelevant bindings. For example, consider the function closure
(λy.y)[x = z, else = w], where z is a variable and w is an arbitrary normal
form. This closure reduces only to itself; the irrelevant substitution does
not disappear. In this case, we will consider that the result depends on the
substitution. We could add rules for minimizing substitutions but, for the
sake of simplicity, we do not.
4.2 A weak labelled calculus with records

Following the same approach as in section 3, we define a labelled calculus:

    a, b, c ::=           terms
        | ...             as in section 4.1
        | e:a             labelled term (e ∈ E)

There are new one-step reduction rules in addition to those of section 4.1:

    (e:b)(a) → e:(b(a))
    (e:b)[s] → e:(b[s])
    (e:b).l → e:(b.l)
    if (e:a) then b else c → e:(if a then b else c)

The grammar for active contexts is extended with one clause:

    C ::=                 active contexts
        | ...             as in section 4.1
        | e:C             labelled context (e ∈ E)
    a, b, c ::=                                      terms
        | x                                          variable (x ∈ V)
        | λx.a                                       abstraction (x ∈ V)
        | b(a)                                       application
        | a[s]                                       closure
        | ⟨l1 = a1, ..., ln = an, else = a_{n+1}⟩    record (li ∈ L, distinct)
        | a.l                                        selection (l ∈ L)
        | true                                       true
        | false                                      false
        | if a then b else c                         conditional

    s ::= x1 = a1, ..., xn = an, else = a_{n+1}      substitutions (xi ∈ V, distinct)

Figure 1. Grammar for the weak λ-calculus.
    x[x1 = a1, ..., xn = an, else = a_{n+1}] → ai                  (x = xi)
    x[x1 = a1, ..., xn = an, else = a_{n+1}] → a_{n+1}             (x ≠ all xi)
    (b(a))[s] → b[s](a[s])
    ((λx.b)[s])(a) → b[(x = a)·s]
    (b.l)[s] → (b[s]).l
    (⟨l1 = a1, ..., ln = an, else = a_{n+1}⟩[s]).l → ai[s]         (l = li)
    (⟨l1 = a1, ..., ln = an, else = a_{n+1}⟩[s]).l → a_{n+1}[s]    (l ≠ all li)
    true[s] → true
    false[s] → false
    (if a then b else c)[s] → if a[s] then b[s] else c[s]
    if true then b else c → b
    if false then b else c → c

Figure 2. One-step reduction rules for the weak λ-calculus.
    C ::=                                            active contexts
        | { }                                        hole
        | C(a)                                       application (left)
        | b(C)                                       application (right)
        | a[S]                                       closure
        | C.l                                        selection (l ∈ L)
        | if C then b else c                         conditional (guard)
        | if a then C else c                         conditional (then)
        | if a then b else C                         conditional (else)

    S ::= x1 = a1, ..., xi = C, ..., xn = an, else = an+1    substitutions
As usual, the congruence rule permits reduction in any active context: if a → b, then C{a} → C{b}, for any active context C.
4.3 Dependency analysis and caching (by example)

The labelled calculus provides a basis for dependency analysis and caching. The sequence of definitions and results would be much as in section 3. We do not go through it, but rather give one instructive example.
We consider the term:

    ((λx. x.l1)(l1 = y1, l2 = y2, else = w))[y1 = z1, y2 = z2, else = w]

This term yields z1. We label the term, obtaining:

    ((λx. x.l1)(l1 = (e1:y1), l2 = (e2:y2), else = (e3:w)))
        [y1 = (e4:z1), y2 = (e5:z2), else = (e6:w)]
This labelled term yields e1:e4:z1, so we immediately conclude that the following prefix also yields z1:

    ((λx. x.l1)(l1 = y1, l2 = Ω, else = Ω))[y1 = z1, y2 = Ω, else = Ω]

Thanks to our definition of the prefix ordering, this prefix is equivalent to:

    ((λx. x.l1)(l1 = y1, else = Ω))[y1 = z1, else = Ω]
Suppose that, in our cache, we record this prefix with the associated result z1; and suppose that later we are given the term:

    ((λx. x.l1)(l1 = y1, l3 = y17(y17), else = w'))[y17 = z1, y1 = z1, else = w']

This term matches the prefix in the cache entry, so we immediately deduce that it reduces to z1.
As this example illustrates, the labelled reductions
help us
identify irrelevant components of both substitutions and records. The
prefix ordering and the use of else then allow us to delete those irrelevant
components and to add new irrelevant components.
In some applications, irrelevant
components may be common. For example, in the context of Vesta, a large record
may
bundle compiler switches, environment variables, etc.; for many
computations, most of these components are irrelevant. In such situations, the
ability to detect and to ignore irrelevant components is quite useful—it
means more cache hits.
5 Conclusions
We have developed
techniques for caching in higher-order functional languages. Our approach relies
on using dependency information from previous executions in addition to the outputs of
those executions. This dependency information is readily available and easy to
exploit (once the proper tools are in place); it yields results that could be
difficult to obtain completely statically. The techniques are based on a labelled
A-calculus and, despite their pragmatic simplicity, benefit from a substantial body of
theory.
Acknowledgements
We thank Yanhong Liu and Tim Teitelbaum for discussions
about related work; Allan Heydon, Jim Horning, Roy Levin, and Yuan Yu for
discussions about Vesta; Alain Deutsch and Simon Peyton Jones for comments on
strictness analysis; and Georges Gonthier for assorted remarks on a draft of this paper.
References

[ACCL91] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1(4):375-416, 1991.

[Bar84] Henk P. Barendregt. The Lambda Calculus. North Holland, revised edition, 1984.

[Ber78] G. Berry. Stable models of typed lambda-calculi. In Proc. 5th Coll. on Automata, Languages and Programming, Lecture Notes in Computer Science, pages 72-89. Springer-Verlag, 1978.

[BHA86] G. L. Burn, C. Hankin, and S. Abramsky. Strictness analysis for higher-order functions. Science of Computer Programming, 7:249-278, 1986.

[Fie90] John Field. On laziness and optimality in lambda interpreters: Tools for specification and analysis. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Programming Languages, pages 1-15, 1990.

[FT90] John Field and Tim Teitelbaum. Incremental reduction in the lambda calculus. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, pages 307-322. ACM, 1990.

[GLM92] Georges Gonthier, Jean-Jacques Lévy, and Paul-André Melliès. An abstract standardisation theorem. In Seventh Annual IEEE Symposium on Logic in Computer Science, 1992.

[GVS95] Milind Gandhe, G. Venkatesh, and Amitabha Sanyal. Labeled λ-calculus and a generalised notion of strictness. In Asian Computing Science Conference, Lecture Notes in Computer Science. Springer-Verlag, December 1995.

[HL91] Gérard Huet and Jean-Jacques Lévy. Computations in Orthogonal Term Rewriting Systems. MIT Press, 1991.

[HL93] Chris Hanna and Roy Levin. The Vesta language for configuration management. Research Report 107, Digital Equipment Corporation, Systems Research Center, June 1993. Available from http://www.research.digital.com/SRC.

[Hug85] John Hughes. Lazy memo-functions. In Jean-Pierre Jouannaud, editor, Functional Programming Languages and Computer Architecture, pages 129-146, September 1985.

[Klo80] Jan Willem Klop. Combinatory Reduction Systems. PhD thesis, CWI, 1980.

[Lév78] Jean-Jacques Lévy. Réductions correctes et optimales dans le lambda-calcul. PhD thesis, University of Paris 7, 1978.

[LM93] Roy Levin and Paul R. McJones. The Vesta approach to precise configuration of large software systems. Research Report 105, Digital Equipment Corporation, Systems Research Center, June 1993. Available from http://www.research.digital.com/SRC.

[Mar91] Luc Maranget. Optimal derivations in weak lambda-calculi and in orthogonal term rewriting systems. In Proceedings of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, 1991.

[Mic68] D. Michie. 'Memo' functions and machine learning. Nature, 218:19-22, 1968.

[O'D77] Michael O'Donnell. Computing in Systems Described by Equations. PhD thesis, Cornell University, 1977.

[Pug88] William Pugh. Incremental Computation and the Incremental Evaluation of Functional Programs. PhD thesis, Cornell University, 1988.