Declarative vs. Imperative

IMPERATIVE programming is programming where you "tell the computer what to do" using imperative statements.  You directly write an algorithm that the compiler or interpreter translates more or less straightforwardly to machine code.

All common general-purpose languages are imperative languages.  This includes Lisp and Scheme, and object-oriented languages such as Smalltalk, Eiffel and C++.

DECLARATIVE programming is programming by telling the computer WHAT, not HOW.  In a declarative programming language, you tell the computer facts, and ask it questions.  It figures out the answers to the questions, based on the facts you told it.

There are no purely declarative programming languages that are general-purpose (i.e., suitable for writing a large variety of typical application programs).  Pure declarative programming languages are a holy grail that is unlikely ever to be achieved, unless artificial intelligence results in computer programs that are smarter than computer programmers, and that's a long, long way off.  (At that point, instead of having a compiler, you'll just have a very smart AI program that you ask questions.)

Until then, sometimes you'll have to tell the computer HOW to solve a problem, as well as WHAT the problem is.  Or you'll at least have to give it hints.

Likewise, imperative languages aren't purely imperative, either.  They have little declarative aspects that simplify the programmer's task.  She can declare certain simple and tedious things, and rely on the compiler to figure out how to do them.

---
Arithmetic Expressions as Slightly-Declarative Programming

Even in imperative languages, we often throw in simple declarative features, as long as they don't cost much.

For example, when you write an arithmetic expression in languages like FORTRAN or Pascal (or, most of the time, C or C++), you do get to write something that looks like an arithmetic expression, i.e., you DECLARE that the value should be equal to the value of the expression.  It so happens that there's a mindless procedural rule for doing this kind of arithmetic, as we've seen---evaluate the subexpressions, and then use those results to compute the values of the expressions they occur in.

On the other hand, real compilers often do NOT do that---they take the arithmetic expression as a general statement of what is to be computed, and come up with an equivalent procedure that produces the same results, but which may be faster.  So if you write

   foo := (a * 2) + (b * 2);

the compiler may convert that to the equivalent

   foo := (a + b) * 2;

or whatever equivalent expression it thinks will be fastest.

---
Type Declarations as Slightly-Declarative Programming

Another aspect of normal programming languages that's declarative is the use of strong or static typing.  When we DECLARE (get it?) the type of a variable, we're making a declarative statement that the value of that variable will always be of that type.

Based on those declarations of the types of values at particular points in the code, the compiler uses a built-in TYPE-CHECKING ALGORITHM to INFER whether the types of variables and expressions are consistent.  If so, it draws the conclusion that the actual types of the variables will never violate their declarations at run time.

In essence, when you submit a program to the compiler, you're asking a question: "is it true that none of the variables in this program will ever take on the wrong type of value at run time?"  The type checking algorithm attempts to construct a simple "proof" that the answer is YES.  If it can't prove it, the compilation fails.
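To make that concrete, here's a minimal Python sketch of a type checker for arithmetic and boolean expressions.  The nested-tuple representation and all the names are made up for illustration; real compilers work on much richer program representations.

   # Expressions: ("lit", value), ("var", name), ("+", e1, e2), ("*", e1, e2),
   # or ("<", e1, e2).  'decls' maps variable names to declared types.

   def check(expr, decls):
       """Infer the type of expr, or raise TypeError if it's inconsistent."""
       tag = expr[0]
       if tag == "lit":
           return "bool" if isinstance(expr[1], bool) else "int"
       elif tag == "var":
           return decls[expr[1]]          # trust the declaration
       elif tag in ("+", "*"):
           left, right = check(expr[1], decls), check(expr[2], decls)
           if left == right == "int":
               return "int"
           raise TypeError("arithmetic on non-integers")
       elif tag == "<":
           if check(expr[1], decls) == check(expr[2], decls) == "int":
               return "bool"
           raise TypeError("comparison of non-integers")

   # (a * 2) + b checks out if a and b are declared as integers:
   print(check(("+", ("*", ("var", "a"), ("lit", 2)), ("var", "b")),
               {"a": "int", "b": "int"}))   # => int

If any rule fails, the "proof" fails and the checker rejects the program, just as described above.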
---
Class definitions as Slightly-Declarative Programming

Another aspect of normal programming that's declarative is the construction of classes.  When you define a class, the compiler must infer things from what you explicitly say.  It must infer a layout of the class, perhaps based on the inheritance relationships you've declared and the structure of the superclasses.  It must infer which methods are applicable to the class, based on the superclasses and the methods they inherit.  At each point where you send a message (or call a generic/virtual function) the language implementation must infer what the appropriate method is.  (Some of this inference happens at compile time, and some of it happens at run time.  The compiler computes a partial answer, and the runtime code infers the details.)

---
SERIOUSLY DECLARATIVE Programming

The declarative aspects mentioned above are so commonplace that we take them for granted, and we still call common languages imperative.  The programmer still has to say HOW to solve the real problems, even if the compiler can infer little details.

Declarative Math

One kind of serious, but simple declarative programming is a system that solves simple math problems.  For example, we might declare that one variable equals the sum of two others:

   A = B + C

Then we might declare the values of two of the variables, and ask the system (compiler or interpreter) to infer the value of the third:

   B = 2
   C = 15
   A = ?

The system will recognize that the value of A depends on B and C and compute the value of A:

   Answer: A = 17

We can use the expression A = B + C to infer other things than the value of A: if we know A and B, we can infer the value of C.

   A = B + C
   A = 2
   B = 3
   C = ?

   Answer: C = -1

The expression A = B + C declares a relationship between the values of A, B, and C, not a procedure for computing the value of any of them.  Based on that relationship, the system can figure out how to compute the value of any of the variables based on the values of the other two.

Equality is Not Assignment

Notice that in this framework, assignment statements don't make sense.  If you try to use an equality statement as an assignment statement, e.g., "X = X + 1", you're doing something that doesn't make logical sense.  For what value of X is X equal to X + 1?  None.  So it's not true that "X = X + 1".

If you tell a declarative system something that's not true, all bets are off---it may infer lots of false things from one false statement.  In many systems, a false statement may lead the system to infer weird and surprising (and false) statements about seemingly unrelated things.

Typically, in declarative programming, we're making statements about a static state of the world, and the system infers other statements about the SAME state of the world.  It does not transform the state of the world, it simply "elaborates", filling in missing detail that's implied by what was already said.

More Advanced Mathematics, and the Limits of Declarative Programming

There are advanced declarative programming systems that can solve much more advanced math problems than the simple example we've described here.  There are some that can solve large systems of equations, do certain kinds of calculus, etc.  These systems are limited, however---there is no system that can solve all math problems, because there is no algorithm to do that.  (It's not even possible, much less likely.)

Part of the art of designing declarative systems is to come up with a well-defined class of problems that a system will address, and ensure that it is capable of solving problems of that type (at least most of the time).  Another important part of designing such systems is to make it clear to the users what the system CANNOT do, and when it's up to the programmer to solve parts of the problem procedurally.
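Here's a minimal Python sketch of this kind of declarative math, for the single relationship A = B + C.  (This is a toy, of course; real systems handle whole systems of equations.)

   def solve_sum(a=None, b=None, c=None):
       """Given any two of A, B, C satisfying A = B + C, infer the third."""
       if a is None:
           return ("A", b + c)       # A = B + C
       elif b is None:
           return ("B", a - c)       # B = A - C
       elif c is None:
           return ("C", a - b)       # C = A - B

   print(solve_sum(b=2, c=15))       # => ('A', 17)
   print(solve_sum(a=2, b=3))        # => ('C', -1)

Notice that the "program" is just the relationship; which addition or subtraction actually gets executed depends on which question you ask.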
--------------------------------------------------------------------
Theorem Provers

The closest thing to pure declarative programming is AUTOMATIC THEOREM PROVING.  A theorem prover provides a language in which the user can make statements in a LOGIC LANGUAGE, and it has an INFERENCE ENGINE, which is an algorithm that infers things from what it is told.

A theorem prover stores representations of statements in a DATABASE.  (We don't mean a general-purpose database system here---we're using the general sense of "database" which just means a data structure that you use to store data, especially declarative data.)

The inference engine operates on the database, looking up statements, and using facts in combination to infer more statements.  These new statements may be added to the database, so that further statements can be inferred from them.  The items in the database are the ones that the system views as "TRUE".

---
Propositional Logic

A simple logic language for a simple theorem prover is PROPOSITIONAL LOGIC.  (This is a weak logic that you probably wouldn't use for complicated problems, but it's a special case of more interesting logic languages.)

In propositional logic, we have some kind of symbol that we use to represent facts.  If we implement a theorem prover in Scheme (or Lisp) these might be Scheme (or Lisp) symbols.  We just need some unique symbol to represent a particular statement.

Facts and Rules

In propositional logic, we will use individual symbols to represent simple facts.  So, for example, the statement that it's humid might be represented by the symbol HUMID, and the statement that it's raining might be represented by the symbol RAINING; we might represent the fact that the humidity is 100% by the symbol HUMID100.

We will also use RULES that say how one fact might be inferred from another fact or set of facts.  We can represent the fact that if it's raining, the humidity is 100% by a statement of IMPLICATION; we'll use a right-arrow symbol (->) to denote that one fact implies another:

   RAINING -> HUMID100

Given the rule RAINING -> HUMID100, and the fact RAINING, our theorem prover is allowed to infer that the humidity is 100%.  It can add the symbol HUMID100 to its database.

Similarly, we can represent the fact that if the humidity is 100%, it's definitely HUMID:

   HUMID100 -> HUMID

We can also represent NEGATION (the fact that something is NOT true) by having a logical operator NOT.  We can represent the fact that if the humidity is not 100%, it's not raining:

   (NOT HUMID100) -> (NOT RAINING)

Notice that we have to assign some interpretation to HUMID.  How high must the humidity be before we consider it humid?  We might decide that 50% humidity or higher is HUMID.  We don't necessarily have to tell the theorem prover that, though, as long as all of our rules are consistent with that.  (It's difficult to represent precise arithmetic with simple propositional logic.  For that, we'll want to use a more powerful logic.)

Implication is not causality

A -> B does NOT mean that A CAUSES B, or that B is true BECAUSE A is true.  It just means that "in all possible states of affairs, B is true if A is true---even if there's no direct or interesting connection or causal relationship between those facts".  (Of course, the rule itself could be wrong, but to do theorem proving we have to assume that our statements are right, or we're lost from the start.)

For example, if we know that Bill Clinton jogs only on Tuesdays, we might write

   BILL_IS_JOGGING -> TUESDAY

and that doesn't mean that it's Tuesday BECAUSE Bill is jogging.
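If we implement this in Python instead of Scheme, the database might just be a set of facts plus a list of rules.  (This representation is just one made-up choice among many possible ones.)

   # A fact is a string like "RAINING"; "NOT RAINING" is its negation.
   # A rule (premise -> conclusion) is a pair of such strings.
   facts = {"RAINING"}
   rules = [("RAINING", "HUMID100"),
            ("HUMID100", "HUMID")]

   # One inference step: a premise is known, so add its conclusion.
   for premise, conclusion in rules:
       if premise in facts:
           facts.add(conclusion)

   print(facts)   # => {'RAINING', 'HUMID100'}; HUMID needs another pass

A later section turns this single pass into a full forward-chaining loop.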
---
Exploiting Implications: Modus Ponens and Modus Tollens

The inference engine of our theorem prover will encode some very basic inference rules; these are different from the rules we give it to work with---they are so basic that they are encoded in the algorithm used by the inference engine itself, and they're what lets the inference engine do something with the rules we give it.

The basic inference rule that we used to infer things in the examples above is called Modus Ponens.  It says that if the thing to the left of the arrow is true, then the thing to the right of the arrow is true as well.

There's another basic inference rule for implications, called Modus Tollens, which allows us to infer that if the thing to the right of the arrow is false, then the thing to the left is false as well.  This is simple reasoning by contradiction: we know that if the statement to the left is true, then the statement to the right is true, but that can't be the case because we also know that the thing to the right is false.  Ergo, the thing to the left is false as well.

If we can count on our inference engine to use modus tollens, and we say that RAINING -> HUMID100, we don't have to also say that (NOT HUMID100) -> (NOT RAINING).  Modus tollens lets us infer both kinds of things from the first rule.

This illustrates an important principle---what we have to explicitly say depends on the inference strategy of our inference engine.  It also illustrates an important relationship between statements, called SUBSUMPTION.  From RAINING -> HUMID100, we can infer everything we can infer from (NOT HUMID100) -> (NOT RAINING), plus more things.  Thus if our inference strategy is powerful enough, we don't have to record the latter fact to be able to use it.  Modus Tollens will allow us to infer the same things from just the first statement.

---
A simple FORWARD-CHAINING inference strategy

A simple design for a theorem prover is to have a simple list of statements and an inference engine that just goes through this list and tries to match things up so that it can draw inferences.  Our inference engine can just go through the list and take each rule, and for each rule see if there's another statement that it can use to infer something new.

   FOR each statement
      IF the statement is a rule THEN
         FOR each statement
            IF the statement enables an inference via
               modus ponens or modus tollens THEN
               IF the inferred fact is new THEN
                  add the fact to the database

We thus go through the list finding things we can infer, and adding them to the database.  We can put this pair of nested loops inside another loop, and just do this as long as we keep inferring new things.

This is called FORWARD CHAINING, because we go forward from things we already know to infer new things.
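Here's a runnable Python sketch of such a forward chainer, using the same made-up representation as before, with "NOT X" strings for negations:

   def negate(stmt):
       return stmt[4:] if stmt.startswith("NOT ") else "NOT " + stmt

   def forward_chain(facts, rules):
       """Apply modus ponens and modus tollens until nothing new appears."""
       facts = set(facts)
       changed = True
       while changed:
           changed = False
           for premise, conclusion in rules:
               inferred = None
               if premise in facts:                  # modus ponens
                   inferred = conclusion
               elif negate(conclusion) in facts:     # modus tollens
                   inferred = negate(premise)
               if inferred is not None and inferred not in facts:
                   facts.add(inferred)
                   changed = True
       return facts

   rules = [("RAINING", "HUMID100"), ("HUMID100", "HUMID")]
   print(forward_chain({"RAINING"}, rules))
   # => {'RAINING', 'HUMID100', 'HUMID'}
   print(forward_chain({"NOT HUMID"}, rules))
   # => {'NOT HUMID', 'NOT HUMID100', 'NOT RAINING'}

The second query shows subsumption in action: the tollens direction was never written down as a separate rule, but the engine infers it anyway.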
This implementation is slow for two reasons.

One reason is that keeping simple lists means that we spend a lot of time going through the lists looking for matches.  The pair of nested loops that looks for matches gets very expensive (n squared) if there are a lot of statements in the database.  The performance of the algorithm can be improved dramatically if we use INDEXING to make searches faster.  For example, we might keep some hash tables that hold lists of statements categorized by what's on their left-hand sides.  Then we don't have to use a linear search to find matches.  (A sketch of this follows below.)

But that's not enough to make simple forward chaining practical.  A more basic reason that it's inefficient is that mindless forward chaining is likely to infer lots of things that don't have anything to do with solving a particular problem.  Undirected forward chaining will eventually infer all of the statements that can be inferred, but it may take a very, very long time before the fact you're interested in pops out.  If you want to get a useful answer in your lifetime, you'll usually have to use a more directed strategy.

Forward-chaining inference engines therefore usually use HEURISTICS to decide which of the many inferences are worth making.  The heuristics (rules of thumb) are supposed to help it order its "search" for answers, so that it works on the things that are most likely to be fruitful first.

---
COMPLETENESS and EFFICIENCY

This illustrates a basic tension in theorem proving.  Some inference strategies are COMPLETE, in that they are able to infer anything that can be inferred from the information they're given.  Unfortunately, COMPLETE strategies are INEFFICIENT, because they're likely to produce a lot of useless stuff before the answer you want pops out.

There are two basic strategies for coping with this:

   1. Use heuristics to guide the complete strategy, to increase the
      probability of getting the answer you want within the time
      you're willing to wait.

   2. Use an incomplete strategy, but ensure that the information you
      put in the database is encoded in such a way that all of the
      useful inferences will get made anyway.

Both of these strategies are fine arts, and underscore an important fact about theorem proving: just because something is logically correct doesn't mean it's useful.  Either way, you have to have a problem-solving strategy that tells you how to code the inference strategy you need, or how to encode the information operated on by a simpler strategy to get a similar effect.

There's also a third, common strategy used by many Prolog programmers:

   3. Encode the information in a way that is not logically correct,
      but be careful to ensure that the inferences that are drawn
      don't actually affect the correctness of the answer.

No matter what you do, you can't guarantee that what the inference system will do is both correct and efficient without knowing what the inference engine will do with the information you encode.  That's why "logic programming" IS programming---you're encoding a problem solution strategy one way or the other: by controlling the inference engine's algorithm, or by encoding the data with careful regard to how the inference engine will react.
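Here's the indexing idea as a small Python sketch (again with made-up names): instead of scanning every statement on every pass, we keep a hash table mapping each proposition to the rules that mention it on the left-hand side.

   from collections import defaultdict

   rules = [("RAINING", "HUMID100"),
            ("HUMID100", "HUMID"),
            ("TUESDAY", "NOT MONDAY")]

   # Index rules by their left-hand side.
   by_premise = defaultdict(list)
   for premise, conclusion in rules:
       by_premise[premise].append((premise, conclusion))

   # When a fact is added, only the rules it can possibly trigger are
   # examined: a hash lookup instead of a linear scan of the database.
   def rules_triggered_by(fact):
       return by_premise[fact]

   print(rules_triggered_by("RAINING"))   # => [('RAINING', 'HUMID100')]

(To exploit modus tollens the same way, you'd keep a second table indexed by right-hand sides.)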
---
BACKWARD CHAINING

An alternative to FORWARD CHAINING is BACKWARD CHAINING.  Rather than starting from what you know and trying to infer something useful, you start from something it would be useful to know and work backwards, trying to see if you can infer it from what you already know.

Here's an example.  Suppose we have the following statements in our database:

   RAINING -> HUMID100
   RAINING -> STREETS_ARE_WET
   HUMID
   TUESDAY -> (NOT MONDAY)
   RAINING
   THURSDAY

Now we ask our backward chaining inference engine if it's HUMID100.  The inference engine looks through the database to see if there's a rule it can use to infer that it's HUMID100.  It finds the rule RAINING -> HUMID100 and sees if it can find something that proves what's on the left-hand side, i.e., RAINING.  It finds the fact RAINING in the database, which satisfies the rule without any further rule applications.  It then draws the inference that HUMID100 is true, and says YES.

This backward search for supporting evidence can continue if the desired fact is not in the database, but a rule is present that can confirm what's on the left-hand side.  The inference engine may go through many chains of rules to find a sequence of rules that leads from known facts to the desired fact.

The advantage of backward chaining is that it's DIRECTED---it only searches through rules that are (at least superficially) relevant to answering the question you pose.  (Notice, however, that the search for a chain of rules may still be slow---if the database holds many rules about the statement in question, the inference engine may backtrack through a lot of seemingly relevant rules that turn out to be dead ends because they're not enabled by any facts, before getting to the right chain of rules that does lead back to known facts.)
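A backward chainer for this propositional setting is only a few lines of Python.  (A sketch: facts are represented as rules with no premise, and we ignore possible cycles among rules.)

   rules = [("RAINING", "HUMID100"),
            ("RAINING", "STREETS_ARE_WET"),
            (None, "HUMID"),                 # plain facts have no premise
            ("TUESDAY", "NOT MONDAY"),
            (None, "RAINING"),
            (None, "THURSDAY")]

   def prove(goal):
       """Search backward from the goal toward known facts."""
       for premise, conclusion in rules:
           if conclusion == goal:
               if premise is None or prove(premise):
                   return True      # a fact, or a provable premise
       return False                 # no rule chain leads to the goal

   print(prove("HUMID100"))         # => True  (via RAINING)
   print(prove("MONDAY"))           # => False (nothing concludes MONDAY)

The search only ever touches rules whose conclusions match the current goal or subgoal, which is exactly the "directedness" described above.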
---
RESOLUTION-based theorem proving

Recall that logical implication can be used to infer that the left-hand statement is false if the right-hand statement is false, as well as being used to infer that the right one is true if the left one is true.  Given this interpretation of A -> B (logical implication), it turns out that A -> B is exactly equivalent to B OR (NOT A).  Think about that for a while if it's not obvious.

This means that we can convert rules into logical disjunctions (OR statements).  We can use this as part of a strategy for converting every statement into an equivalent disjunction.  Suppose we have these two statements

   A -> B
   (NOT A) -> C

We can convert them to the form

   B OR (NOT A)
   C OR (NOT (NOT A))

The second one can be trivially simplified so that we have

   B OR (NOT A)
   C OR A

(This simplification will be performed automatically by a theorem prover.)

There is a basic inference rule that says that if you have two disjunctions, and one contains a statement and the other contains the negation of that statement, then you can infer a new disjunction that combines the remaining components of each disjunction into a single disjunction.  In this case, we can combine these two disjunctions to get

   B OR C

This is a very important technique called RESOLUTION.  It turns out that if we convert all of our implications into disjunctions, then this inference rule gives the effect of both modus tollens and modus ponens---anything that can be inferred using modus ponens and modus tollens on implications can also be inferred using resolution.

Forward chaining using only RESOLUTION is a complete inference strategy.  As long as we convert all of our statements to disjunctions (and we can), then anything we can infer using other sound inference techniques can also be inferred using resolution alone.

RESOLUTION-BASED theorem provers are based on this idea.  All of the statements in the database are represented as DISJUNCTIVE CLAUSES (OR statements) of propositions or the negations of propositions, and all the inference engine does is to combine disjunctions using resolution.

To make this work, we need to be able to convert all kinds of statements involving implications and AND into disjunctions.  There are some simple rules for this.  When we convert our initial database to a set of disjunctive clauses, it may be much harder for humans to understand than if we used implications.  We not only convert the implications into disjunctions, but we have to convert ANDs into ORs somehow as well.

Consider a clause like

   A OR (B AND C)

There's another conversion which lets us split this into two clauses.  Notice that if A OR (B AND C) is true, then if we know A is false, we know that B is true and we also know that C is true.  We can split this "and" up into two separate clauses which give the same effect:

   A OR B
   A OR C

(Since all clauses in the database are supposed to be true, there's implicitly an AND between them.)

By using these simplification rules, we can convert complex expressions in terms of AND and -> into disjunctions.  For example,

   D -> (A AND B AND C)

first gets turned into

   (NOT D) OR (A AND B AND C)

and then gets turned into

   (NOT D) OR A
   (NOT D) OR B
   (NOT D) OR C

This has the advantage of presenting a simple representation that a simple algorithm can work with, but it has the disadvantage that the resulting clauses are hard for humans to understand.  It's very difficult at times to figure out what's going on with a resolution theorem prover, and what the individual inference steps mean.

---

Like any other complete inference strategy, resolution can result in the generation of a lot of useless statements clogging up the database.  Resolution-based theorem provers typically have several control features to help the programmer tailor the inference engine's search strategy so that it's likely to work for a particular problem or kind of problem.  For example, the inference engine is likely to favor resolutions involving short clauses rather than long ones, because short disjunctions are more powerful statements than long ones.  (E.g., "it's raining or it's Tuesday" is a weaker statement than "it's raining".)

These heuristics are weak, however, in that they're superficial; they don't "know" anything about the meanings of the clauses being computed over.  There is no way to get around the fact that the programmer will often have to take advantage of deep knowledge about the domain and "tell the system what to do", one way or another.

One common way of doing this is for the programmer to propose lemmas, to guide the theorem prover's search.  Rather than expecting the theorem prover to work all the way from the facts to the conclusion of a large proof, the programmer may use the theorem prover as an automatic assistant; the programmer will design an overall proof, but rely on the theorem prover to automatically fill in many of the details.

Going even further in that direction, theorem provers are sometimes used as PROOF CHECKERS; a human may construct a complex proof, and the theorem prover is used to check to see that all of the steps are sound.
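Here's a minimal Python sketch of the resolution step itself, with clauses represented as sets of literals (a positive literal is a plain string; a negative one is prefixed with "NOT "):

   def negate(lit):
       return lit[4:] if lit.startswith("NOT ") else "NOT " + lit

   def resolve(clause1, clause2):
       """Return all clauses obtainable by resolving the two inputs."""
       results = []
       for lit in clause1:
           if negate(lit) in clause2:
               # Cancel lit against its negation; keep everything else.
               rest = (clause1 - {lit}) | (clause2 - {negate(lit)})
               results.append(frozenset(rest))
       return results

   # The example from the text: (B OR (NOT A)) and (C OR A) yield (B OR C).
   print(resolve(frozenset({"B", "NOT A"}), frozenset({"C", "A"})))
   # => [frozenset({'B', 'C'})]

An empty frozenset coming out of resolve() would be the "zero-length clause" that signals a contradiction, as described below.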
---

Resolution theorem provers are based on PROOF BY CONTRADICTION.  (We're talking about the high-level strategy here, not a single local inference step as in the case of modus tollens.)  Rather than trying to prove that something is true, you try to prove that its negation results in a contradiction.

This turns out to be a convenient way to detect that the system has done the necessary steps, because detecting a simple contradiction is easy---at some point, you end up with two clauses that cancel each other out completely (because one contains just the negations of the terms in the other), and yield a zero-length clause.  If this ever happens, you know that the negation of the thing you want to prove is inconsistent with the other things in the database, therefore it must be true.  For example, we might end up with a clause that contains only P, and another clause that contains only (NOT P).  At that point we know there's an inconsistency somewhere.

There's a catch here, of course---when we detect an inconsistency, it's not always obvious where the inconsistency came from.  Real theorem provers do some bookkeeping to keep track of which inference steps led to each clause in the database, so that they can show the whole chain of steps that led from the initial database state to a contradiction, and leave out all of the other steps, which ended up producing clauses that weren't involved in generating the contradiction.  There's still the problem that what the theorem prover did may be hard to understand---there may be a sequence of resolutions you can follow, but it's unclear what the clauses mean because they're just weird disjunctions.

----------------------------------------------------------------------
Prolog

Prolog is a "logic programming" language, which is based on a simple kind of theorem prover.  Prolog restricts the language that you write rules in to be implications whose right-hand sides contain preconditions ANDed together, e.g.,

   A <- B AND C AND D

tells Prolog that if B and C and D are all true, then A is true.  Actually, not only do we reverse the direction of the implication arrow, but we also write it differently, as :-, and we use a comma to mean AND, so we say

   A :- B, C, D

There are two reasons for this particular restricted format:

   1. It's pretty easy for humans to understand, and

   2. Rules like this can be converted to HORN clauses, which are a
      special case of disjunctive clauses, and can be operated on by a
      special restricted case of the resolution algorithm.

---
HORN CLAUSES and Prolog

There's a special kind of disjunctive clause called a HORN CLAUSE, which consists of any number of negated terms and exactly one non-negated term.  By convention, the non-negated term is written first, and is called the HEAD.

Consider the rule

   A <- B, C, D

We can convert any Prolog-style implication statement into a Horn clause automatically.  First we rewrite it as A OR (NOT (B AND C AND D)), then we must rewrite (NOT (B AND C AND D)) to get rid of the ANDs.  There's a simple rule for this (De Morgan's law): the negation of a conjunction is equivalent to a disjunction where the parts are negated.  In this case, if we know it's NOT true that B and C and D are all true, then that's the same thing as knowing that at least one of B, C, and D is false.  That is, (NOT B) OR (NOT C) OR (NOT D); we plug this back in to get

   A OR (NOT B) OR (NOT C) OR (NOT D)

Now it's in the format we want for Prolog's restricted version of resolution: a disjunction with exactly one non-negated term.
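This conversion is mechanical, as a small Python sketch shows.  (The (head, [preconditions]) rule format is made up, as are the "NOT " string literals, matching the earlier sketches.)

   def horn_clause(head, preconditions):
       """Convert head :- p1, ..., pn into a Horn clause (a set of literals)."""
       # A :- B, C, D  ==>  A OR (NOT B) OR (NOT C) OR (NOT D)
       return frozenset({head} | {"NOT " + p for p in preconditions})

   print(horn_clause("A", ["B", "C", "D"]))
   # => frozenset({'A', 'NOT B', 'NOT C', 'NOT D'})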
---
Prolog's Inference Strategy

While Prolog uses resolution, in a restricted way, you don't have to worry about that when you program in Prolog.  You can think of it as being Modus Ponens applied to the rules in the Prolog program.

Prolog uses a very simple backward-chaining search strategy, specifically a depth-first search for rules, which is ordered by the order you write the rules.

When you state a goal clause, Prolog attempts to use the first relevant rule it finds to prove that clause.  To do so, it tries to find proofs of the preconditions for those rules, in order.  To prove the precondition of a rule, it similarly tries to find an applicable rule, prove its preconditions, and so on.  This results in a depth-first search of all of the chains of rule applications that could lead to proving the clause in question.

At each attempt to satisfy a precondition of a rule, that term becomes a SUBGOAL, i.e., a goal to be satisfied as part of the overall goal of proving what you asked.  Whenever Prolog fails to prove a subgoal, it FAILS, and BACKS UP to find another way of proving the higher-level subgoal.  This depth-first pattern is therefore called BACKTRACKING.

---
Problems with Prolog

Prolog has three main problems, and two of them arise because of this simple depth-first backtracking strategy:

   1. Backward chaining may be inefficient, because it may backtrack
      through many subgoals and rule applications without being able
      to find a path backward to the facts.  There can be an
      exponential explosion due to the number of rules that are
      applicable at each step.

   2. Backtracking is not a complete proof strategy.  There are many
      simple inferences that this simple search will miss, and that
      means Prolog can fail to prove things that are logically implied
      by its rules and facts.

(The third problem, discussed below under NEGATION-AS-FAILURE, is that Prolog's strategy is also unsound.)

The first problem is dealt with by structuring the rules carefully, and using certain non-logical operations (in particular, the CUT, which can be embedded in a clause to tell Prolog not to backtrack through it).  This means that Prolog programmers must be very careful how they structure their rule sets.  They order their rules carefully to make sure that the right rules are used first, and so that CUTs can be used to prune the "search tree" of the normal backtracking.

The second problem is also dealt with by using non-logical operators.  Here's an example of something Prolog can't infer using its usual backtracking alone.  Suppose we have the following two rules:

   A -> B
   (NOT A) -> B

and we ask if B is true.  Logically, it's obvious that B is true, because if A is true, then B is true, AND if A is false, then B is true in that case, too.  There's no possible state of affairs where B is not true.

What Prolog will do is try to prove B using the first rule, and fail, because there's no rule that allows it to infer A.  Then it will try to use the second rule, which will fail because there's no rule that allows it to infer NOT A.  Then it will stop, having failed to prove B.

In contrast, this is a trivial one for normal resolution.  The equivalent disjunctive clauses are:

   B OR (NOT A)
   B OR (NOT (NOT A))

where the second simplifies, so that we have

   B OR (NOT A)
   B OR A

One resolution step will cancel the (NOT A) and the A out, and combine the remaining terms to get (B OR B), which simplifies immediately to B.

---

Because of Prolog's limitation, there is a facility for forward chaining, which programmers must use explicitly if they want forward chaining to occur.  This facility is called ASSERT, which is an operation that can be embedded in a rule to add something to the database, e.g.,

   foo :- bar, bleen, assert(quux)

If Prolog attempts to apply this rule while it's backward chaining, it will first try to prove bar, then bleen, and if it gets that far, it reaches the assert(quux).  At that point, quux is added to the database.
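Here's a Python sketch of this depth-first, rule-ordered backward chaining.  (Propositional only; real Prolog also has variables and unification.  The program representation, a list of (head, [preconditions]) clauses in source order, is made up for illustration.)

   program = [("foo",   ["bar", "bleen"]),
              ("bar",   []),                # a fact: no preconditions
              ("bleen", ["quux"]),          # this clause will fail...
              ("bleen", [])]                # ...and this one will succeed

   def solve(goal, depth=0):
       """Try each matching clause in order; recursively prove its body."""
       for head, body in program:
           if head == goal:
               print("  " * depth + "trying", head, ":-", body or "true")
               if all(solve(subgoal, depth + 1) for subgoal in body):
                   return True
               # this clause FAILED; back up and try the next one
       return False

   print(solve("foo"))   # => True (bleen fails via quux, then succeeds
                         #          via its second clause)

Running it prints the trace of subgoals, including the failed attempt on quux and the backtracking to the second bleen clause.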
Prolog programmers become very adept at using non-logical operations like this to get around the incompleteness of backward chaining.  They must understand both the data (rules) and the search algorithm, and ensure that they use assertions like this to force Prolog to draw inferences it otherwise would not draw.

This is rather tacky, because ideally there would be a SEPARATION OF CONCERNS in logic programming:

   * the database would hold declarative information only, and

   * the search strategy would be controlled separately.

That way, verifying that a logic program is correct could be separated into two steps:

   * ensuring that the declarative information is correct, i.e.,
     factual, and

   * ensuring that the proof strategy is efficient enough to make the
     system usable.

---
NEGATION-AS-FAILURE and THE CLOSED-WORLD ASSUMPTION

Not only is Prolog's inference strategy incomplete, it's also unsound.  There are times when Prolog will draw inferences that are not logically implied by the rules and facts in the database.

This occurs because Prolog assumes that if it can't prove something is true, then it must be false.  If you ask whether something is true, and Prolog can't prove it using backtracking (and whatever assertions you've put in), it will tell you it's false.  Prolog doesn't know how to say "I don't know".

This embodies the "closed-world" assumption: Prolog assumes that its database contains all of the information relevant to any question that you ask it.  The closed-world assumption makes sense in some domains.  If you can guarantee that all of the relevant information is in the database, it doesn't, in itself, cause a problem.  But that means that you can't count on Prolog to give you the right answer if the information in the database is incomplete, as is true in many real-world situations.

Actually, though, Prolog makes an even stronger assumption, because of its incomplete inference strategy.  Not only does it assume that all of the relevant information is available in the database, it assumes that it's all encoded in such a way that any question you ask it can be answered correctly using its incomplete search strategy.

---

You might think that the unsoundness of Prolog's inference strategy would make it unusable, but Prolog programmers become adept at organizing their rules in a way that works with Prolog's inference strategy.  They also become adept at writing procedural code within the framework of depth-first backtracking; this allows them to easily mix real logic programming with more conventional algorithmic programming.  Because of this, it is fairly easy to write many programs in Prolog and know that they will execute reasonably efficiently (within, say, an order of magnitude of the speed of a C program).  In contrast, purer theorem provers are notoriously difficult to make efficient.

Prolog is by far the most popular logic programming language, partly because of its pragmatic nature, allowing a mix of programming styles.

---
VARIABLES, UNIFICATION, DATA STRUCTURES

(See book chapter on Prolog.)
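Negation-as-failure is easy to see in a little Python sketch (a toy again; the program and its facts are invented):

   program = [("penguin", []),
              ("flies", ["bird", "NOT penguin"]),   # NOT = "not provable"
              ("bird", [])]

   def solve(goal):
       if goal.startswith("NOT "):
           return not solve(goal[4:])     # negation-as-failure
       return any(all(solve(s) for s in body)
                  for head, body in program if head == goal)

   print(solve("flies"))                  # => False: penguin is provable

   # Remove the penguin fact, and "flies" becomes True, not because we
   # proved the penguin isn't one, but because we CAN'T prove it is.
   program.pop(0)
   print(solve("flies"))                  # => True

If the database is missing relevant information (the world isn't really closed), that second answer may simply be wrong.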
---
TIME in LOGIC PROGRAMMING

Earlier we said that in logic programming, we generally don't CHANGE (mutate) the state of our representation, in the sense we often do in conventional programming languages.  Instead, as we infer new facts, we ADD those to our representation to flesh it out.  Usually, when solving a logic problem, there's no state involved.

Sometimes, though, we want to reason about time.  For example, a robot might have a theorem prover that proves that it's possible to accomplish a task, and the proof itself will contain a plan for accomplishing the task.

Usually, rather than relying on side effects (like Prolog's assert and retract), we represent states explicitly as sets of facts or data structures.  For example, we may represent time explicitly using numbers that indicate steps of a process.  The fact

   ON_CHAIR(Box,21)

might indicate that the Box is on the chair at time 21.

------------------------------------------------------------------------
CONSTRAINT PROGRAMMING

Like logic programming, constraint programming is a kind of declarative programming.  A constraint programming system allows you to partially specify the state of a program, and have the system automatically flesh out things that you didn't explicitly declare.

The first (recognizable) constraint programming system was SKETCHPAD, an interactive graphical program created by Ivan Sutherland.  SKETCHPAD let you graphically specify geometric shapes and constraints between them, and then interactively change things.  (For example, you might draw a triangle, and declare that two sides should always be of equal length.)  In response to a change (e.g., moving a point of a triangle), SKETCHPAD would update other aspects of the state of the system, in order to preserve the specified constraints.  (E.g., if you stretched one side of the triangle whose two sides must be the same length, SKETCHPAD would stretch the other one to match.)

The basic idea of a constraint programming system is that you declare some constraints and the system figures out how to keep them satisfied.  In most constraint programming systems, you can use side effects (i.e., change something), and the system must respond.  In effect, this poses a new problem to the system, which must solve it.  At each step, the system's task is to solve a problem: how to transform the current state into an acceptable new state that incorporates the changes made by the user.

If the user can specify arbitrary constraints, then there is no algorithm that can ensure that they are all satisfied.  It's even less likely that the system will automatically find a satisfactory arrangement within a reasonable period of time.  Real constraint programming systems (like logic programming systems) therefore trade off generality for efficiency, or require the programmer to carefully arrange the constraints so that the problems are tractable.

One common way of helping keep constraints tractable is to LIMIT THE DOMAIN over which you can specify constraints.  You might, for example, only be able to express constraints about 2-dimensional arrangements of geometric shapes using quadratic (and simpler) formulas.  (Another example would be constraints among voltages and resistances in resistive networks.)

Another way of making things tractable is by requiring that the user PRIORITIZE constraints, and be willing to accept violations of low-priority constraints.  This allows the system to satisfy the high-priority constraints first, and then see if it can satisfy the low-priority ones.  Without this kind of prioritization, the system might repeatedly attempt to satisfy a constraint, only to find that it causes a chain of changes which eventually violates another constraint.
A third way of making life easier for the constraint satisfier is to let the programmer supply PROCEDURES for satisfying constraints.  That is, besides specifying the constraint (what), the programmer may supply procedures for re-satisfying the constraint when things change.  This lets the programmer explicitly program strategies that the underlying constraint satisfaction engine is too dumb to find by itself.

General constraint-satisfaction programming languages let you combine these strategies.  In effect, they let you come up with your own constraint-satisfaction language for the application domain you care about, specifying your own constraints, priorities, and procedures for re-satisfying constraints.  Then you can use the result as a domain-specific constraint programming system.  The programmer who tailors the general-purpose system may be a different person than the "users" who use the resulting domain-specific system.

SPREADSHEETS

The most popular form of constraint satisfaction language is the SPREADSHEET.  A spreadsheet is a simple constraint language, presented to the user via a graphical interface.

The user sees a two-dimensional array of CELLS, organized into rows and columns.  The user can load data into cells, either by typing them directly in at the graphical user interface, or by loading them in from files.  The user can also specify constraints between the values in the cells, for example telling the system that the value in a given cell should always be the sum of the values in two other cells, or in a whole column of cells.

Spreadsheets are acceptably efficient because they limit the kinds of constraints you can specify (e.g., you typically can't have cyclic dependencies), and because you generally don't use a spreadsheet for operating on huge volumes of data.  (You might load tens or hundreds or even a few thousand data points into a spreadsheet, but probably not a million.)

Spreadsheets are limited, but within their limitations they can be extremely useful and easy to use.  They are often used for financial calculations and for exploratory data analysis.  It's easy to program a formula into a spreadsheet, and then fiddle with one of the inputs to the formula and watch the changes propagate to the outputs.  This kind of "what if" analysis makes it easier to see the impact of things like budget changes, or different ways of normalizing experimental data.

Spreadsheets are the most popular new form of programming language in the last couple of decades, precisely because they are NOT general-purpose programming languages.  Their limitations and their user interface make them easy for relatively naive users to grasp and start using, without having to "learn to program" in the usual sense.
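Here's a tiny Python sketch of the spreadsheet idea: cells whose formulas are automatically recomputed when an input changes.  (No GUI, no dependency analysis or cycle detection; all the names are invented.)

   cells = {"jan": 100, "feb": 120}           # input cells hold plain values
   formulas = {"total":   lambda c: c["jan"] + c["feb"],
               "average": lambda c: c["total"] / 2}

   def recalculate():
       # Formula cells are reevaluated in a fixed order here; a real
       # spreadsheet would order them by their dependencies.
       for name, formula in formulas.items():
           cells[name] = formula(cells)

   recalculate()
   print(cells["total"], cells["average"])    # => 220 110.0

   cells["jan"] = 200                          # "what if" January doubles?
   recalculate()
   print(cells["total"], cells["average"])    # => 320 160.0

The user supplies only the relationships between cells (the WHAT); deciding when and in what order to recompute them is the system's job (the HOW).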