# Automated Reasoning

*First published Wed Jul 18, 2001; substantive revision Wed Oct 5, 2005*

Reasoning is the ability to make inferences, and automated reasoning is concerned with the building of computing systems that automate this process. Although the overall goal is to mechanize different forms of reasoning, the term has largely been identified with valid deductive reasoning as practiced in mathematics and formal logic. In this respect, automated reasoning is akin to mechanical theorem proving. Building an automated reasoning program means providing an algorithmic description to a formal calculus so that it can be implemented on a computer to prove theorems of the calculus in an efficient manner. Important aspects of this exercise involve defining the class of problems the program will be required to solve, deciding what language will be used by the program to represent the information given to it as well as new information inferred by the program, specifying the mechanism that the program will use to conduct deductive inferences, and figuring out how to perform all these computations efficiently. While basic research work continues in order to provide the necessary theoretical framework, the field has reached a point where automated reasoning programs are being used by researchers to attack open questions in mathematics and logic, and to solve problems in engineering.

- 1. The Problem Domain
- 2. Language Representation
- 3. Deduction Calculi
- 4. Resolution
- 5. Sequent Deduction
- 6. Natural Deduction
- 7. The Matrix Connection Method
- 8. Term Rewriting
- 9. Mathematical Induction
- 10. Higher-Order Logic
- 11. Non-classical Logics
- 12. Logic Programming
- 13. SAT Solvers
- 14. Deductive Computer Algebra
- 15. Conclusion
- Bibliography
- Other Internet Resources
- Related Entries

## 1. The Problem Domain

A problem being presented to an automated reasoning program consists
of two main items, namely a statement expressing the particular
question being asked called the **problem's conclusion**,
and a collection of statements expressing all the relevant information
available to the program — the **problem's
assumptions**. Solving a problem means proving the conclusion
from the given assumptions by the systematic application of rules of
deduction embedded within the reasoning program. The problem solving
process ends when one such proof is found, when the program is able to
detect the non-existence of a proof, or when it simply runs out of
resources.

A first important consideration in the design of an automated
reasoning program is to delineate the class of problems that the
program will be required to solve — the **problem
domain**. The domain can be very large, as would be the case for
a general-purpose theorem prover for first-order logic, or be more
restricted in scope as in a special-purpose theorem prover for Tarski's
geometry, or the modal logic K. A typical approach in the design of an
automated reasoning program is to provide it first with sufficient
logical power (e.g., first-order logic) and then further demarcate its
scope to the particular domain of interest defined by a set of
**domain axioms**. To illustrate, EQP, a theorem-proving
program for equational logic, was used to solve an open question in
Robbins algebra (McCune 1997): *Are all Robbins algebras
Boolean?* For this, the program was provided with the axioms
defining a Robbins algebra:

(A1) x+y=y+x(commutativity) (A2) ( x+y) +z=x+ (y+z)(associativity) (A3) −(−( x+y) + −(x+ −y)) =x(Robbins equation)

The program was then used to show that a characterization of Boolean algebra that uses Huntington's equation,

−(−follows from the axioms. We should remark that this problem is non-trivial since deciding whether a finite set of equations provides a basis for Boolean algebra is undecidable, that is, it does not permit an algorithmic representation; also, the problem was attacked by Robbins, Huntington, Tarski and many of his students with no success. The key step was to establish that all Robbins algebras satisfyx+y) + −(−x+ −y) =x,

(∃since it was known that this formula is a sufficient condition for a Robbins algebra to be Boolean. When EQP was supplied with this piece of information, the program provided invaluable assistance by completing the proof automatically.x)(∃y)(x+y=x),

A special-purpose theorem prover does not draw its main benefit by restricting its attention to the domain axioms but from the fact that the domain may enjoy particular theorem-proving techniques which can be hardwired — coded — within the reasoning program itself and which may result in a more efficient logic implementation. Much of EQP's success at settling the Robbins question can be attributed to its built-in associative-commutative inference mechanisms.

## 2. Language Representation

A second important consideration in the building of an automated
reasoning program is to decide (1) how problems in its domain will be
presented to the reasoning program; (2) how they will actually be
represented internally within the program; and, (3) how the solutions
found — completed proofs — will be displayed back to the
user. There are several formalisms available for this, and the choice
is dependent on the problem domain and the underlying deduction
calculus used by the reasoning program. The most commonly used
formalisms include standard first-order logic, typed λ-calculus,
and clausal logic. We take up clausal logic here and assume that the
reader is familiar with the rudiments of first-order logic; for the
typed λ-calculus the reader may want to check Church 1940.
Clausal logic is a quantifier-free variation of first-order logic and
has been the most widely used notation within the automated reasoning
community. Some definitions are in order: A **term** is a
constant, a variable, or a function whose arguments are themselves
terms. For example, *a*, *x*, *f*(*x*), and
*h*(*c*,*f*(*z*),*y*) are all terms.
A **literal** is either an atomic formula, e.g.
*F*(*x*), or the negation of an atomic formula, e.g.
~*R*(*x*,*f*(*a*)). Two literals are
**complementary** if one is the negation of the other. A
**clause** is a (possibly empty) finite disjunction of
literals *l*_{1}
…
*l _{n}* where no literal
appears more than once in the clause (that is, clauses can be
alternatively treated as sets of literals).

**Ground**terms, ground literals, and ground clauses have no variables. The

**empty clause**, [ ], is the clause having no literals and, hence, is unsatisfiable — false under any interpretation. Some examples: ~

*R*(

*a*,

*b*), and

*F*(

*a*) ~

*R*(

*f*(

*x*),

*b*)

*F*(

*z*) are both examples of clauses but only the former is ground. The general idea is to be able to express a problem's formulation as a set of clauses or, equivalently, as a formula in

**conjunctive normal form (CNF)**, that is, as a conjunction of clauses.

For formulas already expressed in standard logic notation, there is
a systematic two-step procedure for transforming them into conjunctive
normal form. The first step consists in re-expressing a formula into a
semantically equivalent formula in **prenex normal form**,
(Θ*x*_{1})…(Θ*x _{n}*)α(

*x*

_{1},…,

*x*), consisting of a string of quantifiers (Θ

_{n}*x*

_{1})…(Θ

*x*) followed by a quantifier-free expression α(

_{n}*x*

_{1},…,

*x*) called the

_{n}**matrix**. The second step in the transformation first converts the matrix into conjunctive normal form by using well-known logical equivalences such as DeMorgan's laws, distribution, double-negation, and others; then, the quantifiers in front of the matrix, which is now in conjunctive normal form, are dropped according to certain rules. In the presence of existential quantifiers, this latter step does not always preserve equivalence and requires the introduction of

**Skolem functions**whose role is to "simulate" the behaviour of existentially quantified variables. For example, applying the skolemizing process to the formula

(∀x)(∃y)(∀z)(∃u)(∀v)[R(x,y,v) ~K(x,z,u,v)]

requires the introduction of a one-place and two-place Skolem
functions, *f* and *g* respectively, resulting in the
formula

(∀x)(∀z)(∀v) [R(x,f(x),v) ~K(x,z,g(x,z),v)]

The universal quantifiers can then be removed to obtain the final
clause, *R*(*x*,*f*(*x*),*v*)
~*K*(*x*,*z*,*g*(*x*,*z*),*
v*) in our example. The Skolemizing process may not preserve
equivalence but maintains satisfiability, which is enough for
clause-based automated reasoning.

Although clausal form provides a more uniform and economical notation — there are no quantifiers and all formulas are disjunctions — it has certain disadvantages. One drawback is the exponential increase in the size of the resulting formula when transformed from standard logic notation into clausal form. The increase in size is accompanied by an increase in cognitive complexity that makes it harder for humans to read proofs written with clauses. Another disadvantage is that the syntactic structure of a formula in standard logic notation can be used to guide the construction of a proof but this information is completely lost in the transformation into clausal form.

## 3. Deduction Calculi

A third important consideration in the building of an automated reasoning program is the selection of the actual deduction calculus that will be used by the program to perform its inferences. As indicated before, the choice is highly dependent on the nature of the problem domain and there is a fair range of options available: General-purpose theorem proving and problem solving (first-order logic, simple type theory), program verification (first-order logic), distributed and concurrent systems (modal and temporal logics), program specification (intuitionistic logic), hardware verification (higher-order logic), logic programming (Horn logic), and so on.

A deduction calculus consists of a set of logical axioms and a
collection of deduction rules for deriving new formulas from previously
derived formulas. Solving a problem in the program's problem domain
then really means establishing a particular formula α — the
problem's conclusion — from the extended set Γ consisting
of the logical axioms, the domain axioms, and the problem assumptions.
That is, the program needs to determine if
Γ α. How the program goes
about establishing this semantic fact depends, of course, on the
calculus it implements. Some programs may take a very
**direct** route and attempt to establish
that
Γ α by
actually constructing a step-by-step proof of α from Γ. If
successful, this shows of course that Γ derives — proves
— α, a fact we denote by writing
Γ α. Other reasoning programs
may instead opt for a more **indirect** approach and try
to establish that
Γ α by showing that Γ ∪ {~α}
is inconsistent which, in turn, is shown by deriving a contradiction,
⊥, from the set Γ ∪ {~α}. Automated systems that
implement the former approach include natural deduction systems; the
latter approach is used by systems based on resolution, sequent
deduction, and matrix connection methods.

Soundness and completeness are two (metatheoretical) properties of a
calculus that are particularly important for automated deduction.
**Soundness** states that the rules of the calculus are
truth-preserving. For a direct calculus this means that
if
Γ α
then
Γ α. For
indirect calculi, soundness means that
if
Γ∪{~α} ⊥ then
Γ α. **Completeness** in a direct
calculus states that if
Γ α then
Γ α. For indirect calculi, the completeness
property is expressed in terms of **refutations** since
one establishes that
Γ α by showing the existence of a proof, not
of α from Γ, but of ⊥ from Γ∪{~α}.
Thus, an indirect calculus is **refutation complete**
if
Γα
implies
Γ∪{~α} ⊥. Of the two properties, soundness is the
most desirable. An incomplete calculus indicates that there are
entailment relations that cannot be established within the calculus.
For an automated reasoning program this means, informally, that there
are true statements that the program cannot prove. Incompleteness may
be an unfortunate affair but lack of soundness is a truly problematic
situation since an unsound reasoning program would be able to generate
false conclusions from perfectly true information.

It is important to appreciate the difference between a logical calculus and its corresponding implementation in a reasoning program. The implementation of a calculus invariably involves making some modifications to the calculus and this results, strictly speaking, in a new calculus. The most important modification to the original calculus is the “mechanization” of its deduction rules, that is, the specification of the systematic way in which the rules are to be applied. In the process of doing so, one must exercise care to preserve the metatheoretical properties of the original calculus.

Two other metatheoretical properties of importance to automated
deduction are decidability and complexity. A calculus is
**decidable** if it admits an algorithmic representation,
that is, if there is an algorithm that, for any given Γ and
α, it can determine in a finite amount of time the answer,
“Yes” or “No”, to the question
“Does
Γ α?”
A calculus may be undecidable in which case one needs to determine
which decidable fragment to implement. The time-space
**complexity** of a calculus specifies how efficient its
algorithmic representation is. Automated reasoning is made the more
challenging because many calculi of interest are not decidable and have
poor complexity measures forcing researchers to seek tradeoffs between
deductive power versus algorithmic efficiency.

## 4. Resolution

Of the many calculi used in the implementation of reasoning
programs, the ones based on the **resolution** principle
have been the most popular. Resolution is modeled after the chain rule
(of which Modus Ponens is a special case) and essentially states that
from *p*
*q* and
~*q*
*r* one can infer
*p*
*r*. More formally,
let *C* − *l* denote the clause *C* with the
literal *l* removed. Assume that *C*_{1} and
*C*_{2} are ground clauses containing, respectively, a
positive literal *l*_{1} and a negative literal
~*l*_{2} such that *l*_{1} and
~*l*_{2} are complementary. Then, the rule of
**ground resolution** states that, as a result of
**resolving** *C*_{1} and
*C*_{2}, one can infer (*C*_{1} −
*l*_{1})
(*C*_{2} − ~*l*_{2}):

C_{1}C_{2}(ground resolution) ( C_{1}−l_{1}) (C_{2}− ~l_{2})

Herbrand's theorem (Herbrand 1930) assures us that the
non-satisfiability of *any* set of clauses, ground or not, can
be established by using ground resolution. This is a very significant
result for automated deduction since it tells us that if a set Γ
is not satisfied by any of the infinitely many interpretations, this
fact can be determined in *finitely* many steps. Unfortunately,
a direct implementation of ground resolution using Herbrand's theorem
requires the generation of a vast number of ground terms making this
approach hopelessly inefficient. This issue was effectively addressed
by generalizing the ground resolution rule to **binary
resolution** and by introducing the notion of unification
(Robinson 1965a). Unification allows resolution proofs to be
“lifted” and be conducted at a more general level; clauses
only need to be instantiated at the moment where they are to be
resolved. Moreover, the clauses resulting from the instantiation
process do not have to be ground instances and may still contain
variables. The introduction of binary resolution and unification is
considered one of the most important developments in the field of
automated reasoning.

### Unification

A**unifier**of two expressions — terms or clauses — is a substitution that when applied to the expressions makes them equal. For example, the substitution

{is a unifier forx←b,y←b,z←f(a,b)}

since when applied to both expressions makes them equal:R(x,f(a,y)) andR(b,z)

A

R(x,f(a,y)){x←b,y←b,z←f(a,b)}= R(b,f(a,b))= R(b,z){x←b,y←b,z←f(a,b)}

**most general unifier**(mgu) produces the most general instance shared by two unifiable expressions. In the previous example, the substitution {

*x*←

*b*,

*y*←

*b*,

*z*←

*f*(

*a*,

*b*)} is a unifier but not an mgu; however, {

*x*←

*b*,

*z*←

*f*(

*a*,

*y*)} is an mgu. Note that unification attempts to "match" two expressions and this fundamental process has become a central component of most automated deduction programs, resolution-based and otherwise.

**Theory-unification**is an extension of the unification mechanism that includes built-in inference capabilities. For example, the clauses

*R*(

*g*(

*a*,

*b*),

*x*) and

*R*(

*g*(

*b*,

*a*),

*d*) do not unify but they AC-unify, where AC-unification is unification with built-in associative and commutative rules such as

*g*(

*a*,

*b*) =

*g*(

*b*,

*a*). Shifting inference capabilities into the unification mechanism adds power but at a price: The existence of an mgu for two unifiable expressions may not be unique (there could actually be infinitely many), and the unification process becomes undecidable in general.

### Binary resolution

Let*C*

_{1}and

*C*

_{2}be two clauses containing, respectively, a positive literal

*l*

_{1}and a negative literal ~

*l*

_{2}such that

*l*

_{1}and

*l*

_{2}unify with mgu θ. Then,

by binary resolution; the clause (

C_{1}C_{2}(binary resolution) ( C_{1}θ −l_{1}θ) (C_{2}θ − ~l_{2}θ)

*C*

_{1}θ −

*l*

_{1}θ) (

*C*

_{2}θ − ~

*l*

_{2}θ) is called a

**binary resolvent**of

*C*

_{1}and

*C*

_{2}.

### Factoring

If two or more literals occurring in a clause*C*share an mgu θ then

*C*θ is a

**factor**of

*C*. For example, in

*R*(

*x*,

*a*) ~

*K*(

*f*(

*x*),

*b*)

*R*(

*c*,

*y*) the literals

*R*(

*x*,

*a*) and

*R*(

*c*,

*y*) unify with mgu {

*x*←

*c*,

*y*←

*a*} and, hence,

*R*(

*c*,

*a*) ~

*K*(

*f*(

*c*),

*b*) is a factor of the original clause.

### The Resolution Principle

Let*C*

_{1}and

*C*

_{2}be two clauses. Then, a

**resolvent**obtained by

**resolution**from

*C*

_{1}and

*C*

_{2}is defined as: (a) a binary resolvent of

*C*

_{1}and

*C*

_{2}; (b) a binary resolvent of

*C*

_{1}and a factor of

*C*

_{2}; (c) a binary resolvent of a factor of

*C*

_{1}and

*C*

_{2}; or, (d) a binary resolvent of a factor of

*C*

_{1}and a factor of

*C*

_{2}.

Resolution proofs, more precisely refutations, are constructed by
deriving the empty clause [ ] from Γ ∪ {~α} using
resolution; this will always be possible if Γ ∪ {~α} is
unsatisfiable since resolution is refutation complete (Robinson 1965a).
As an example of a resolution proof, we show that the set
{(∀*x*)(*P*(*x*)
*Q*(*x*)),
(∀*x*)(*P*(*x*) ⊃
*R*(*x*)),(∀*x*)(*Q*(*x*)
⊃ *R*(*x*))}, denoted by Γ, entails the formula
(∃*x*)*R*(*x*). The first step is to find
the clausal form of Γ ∪
{~(∃*x*)*R*(*x*)}; the resulting clause set,
denoted by *S*_{0}, is shown in steps 1 to 4 in the
refutation below. The refutation is constructed by using a
level-saturation method: Compute all the resolvents of the initial set,
*S*_{0}, add them to the set and repeat the process
until the empty clause is derived. (This produces the sequence of
increasingly larger sets: *S*_{0},
*S*_{1}, *S*_{2},…) The only
constraint that we impose is that we do not resolve the same two
clauses more than once.

S_{0}1 P(x)Q(x)Assumption 2 ~P(x)R(x)Assumption 3 ~Q(x)R(x)Assumption 4 ~ R(a)Negation of the conclusion S_{1}5 Q(x)R(x)Res 1 2 6 P(x)R(x)Res 1 3 7 ~ P(a)Res 2 4 8 ~ Q(a)Res 3 4 S_{2}9 Q(a)Res 1 7 10 P(a)Res 1 8 11 R(x)Res 2 6 12 R(x)Res 3 5 13 Q(a)Res 4 5 14 P(a)Res 4 6 15 R(a)Res 5 8 16 R(a)Res 6 7 S_{3}17 R(a)Res 2 10 18 R(a)Res 2 14 19 R(a)Res 3 9 20 R(a)Res 3 13 21 [ ] Res 4 11

Although the resolution proof is successful in deriving [ ], it
has some significant drawbacks. To start with, the refutation is too
long as it takes 21 steps to reach the contradiction, [ ]. This is
due to the naïve brute-force nature of the implementation. The
approach not only generates too many formulas but some are clearly
redundant. Note how *R*(*a*) is derived six times; also,
*R*(*x*) has more “information content” than
*R*(*a*) and one should keep the former and disregard the
latter. Resolution, like all other automated deduction methods, must be
supplemented by strategies aimed at improving the efficiency of the
deduction process. The above sample derivation has 21 steps but
research-type problems command derivations with thousands or hundreds
of thousands of steps.

### Resolution Strategies

The successful implementation of a deduction calculus in an automated reasoning program requires the integration of search strategies that reduce the search space by pruning unnecessary deduction paths. Some strategies remove redundant clauses or tautologies as soon as they appear in a derivation. Another strategy is to remove more specific clauses in the presence of more general ones by a process known as**subsumption**(Robinson 1965a). Unrestricted subsumption, however, does not preserve the refutation completeness of resolution and, hence, there is a need to restrict its applicability (Loveland 1978).

**Model elimination**(Loveland 1969) can discard a sentence by showing that it is false in some model of the axioms. The subject of model generation has recently received renewed attention as a complementary process to theorem proving. The method has been used successfully by automated reasoning programs to show the independence of axioms sets and to determine the existence of discrete mathematical structures meeting some given criteria.

Instead of removing redundant clauses, some strategies prevent the
generation of useless clauses in the first place. The
**set-of-support strategy** (Wos, Carson and Robinson
1965) is one of the most powerful strategies of this kind. A subset
*T* of the set *S*, where *S* is initially Γ
∪ {~α}, is called a **set of support** of
*S* iff *S* − *T* is satisfiable.
Set-of-support resolution dictates that the resolved clauses are not
both from *S* − *T*. The motivation behind
set-of-support is that since the set Γ is usually satisfiable it
might be wise not to resolve two clauses from Γ against each
other. **Hyperresolution** (Robinson 1965b) reduces the
number of intermediate resolvents by combining several resolution steps
into a single inference step.

Independently co-discovered, **linear resolution**
(Loveland 1970, Luckham 1970) always resolves a clause against the most
recently derived resolvent. This gives the deduction a simple
“linear” structure affording a straightforward
implementation; yet, linear resolution preserves refutation
completeness. Using linear resolution we can derive the empty clause in
the above example in only eight steps:

1 P(x)Q(x)Assumption 2 ~P(x)R(x)Assumption 3 ~Q(x)R(x)Assumption 4 ~ R(a)Negation of the conclusion 5 ~ P(a)Res 2 4 6 Q(a)Res 1 5 7 R(a)Res 3 6 8 [ ] Res 4 7

With the exception of unrestricted subsumption, all the strategies
mentioned so far preserve refutation completeness. Efficiency is an
important consideration in automated reasoning and one may sometimes be
willing to trade completeness for speed. **Unit
resolution** and **input resolution** are two such
refinements of linear resolution. In the former, one of the resolved
clauses is always a literal; in the latter, one of the resolved clauses
is always selected from the original set to be refuted. Albeit
efficient, neither strategy is complete. Ordering strategies impose
some form of partial ordering on the predicate symbols, terms,
literals, or clauses occurring in the deduction. **Ordered
resolution** treats clauses not as sets of literals but as
sequences — linear orders — of literals. Ordered resolution
is extremely efficient but, like unit and input resolution, is not
refutation complete. To end, it must be noted that some strategies
improve certain aspects of the deduction process at the expense of
others. For instance, a strategy may reduce the size of the proof
search space at the expense of increasing, say, the length of the
shortest refutations.

There are several automated reasoning programs that are based on resolution, or refinements of resolution. Otter is one of the most versatile among these programs and is being used in a growing number of applications (Wos, Overbeek, Lusk and Boyle 1984). Resolution also provides the underlying logico-computational mechanism for the popular logic programming language Prolog (Clocksin and Mellish 1981).

## 5. Sequent Deduction

Hilbert-style calculi (Hilbert and Ackermann 1928) have been
traditionally used to characterize logic systems. These calculi usually
consist of a few axiom schemata and a small number of rules that
typically include modus ponens and the rule of substitution. Although
they meet the required theoretical requisites (soundness, completeness,
etc.) the approach at proof construction in these calculi is difficult
and does not reflect standard practice. It was Gentzen's goal “to
set up a formalism that reflects as accurately as possible the actual
logical reasoning involved in mathematical proofs” (Gentzen
1935). To carry out this task, Gentzen analyzed the proof-construction
process and then devised two deduction calculi for classical logic: the
natural deduction calculus, **NK**, and the sequent
calculus, **LK**. (Gentzen actually designed NK first and
then introduced LK to pursue metatheoretical investigations). The
calculi met his goal to a large extent while at the same time managing
to secure soundness and completeness. Both calculi are characterized by
a relatively larger number of deduction rules and a simple axiom
schema. Of the two calculi, LK is the one that has been most widely
used in implementations of automated reasoning programs, and it is the
one that we will discuss first; NK will be discussed in the next
section.

Although the application of the LK rules affect logic formulas, the
rules are seen as manipulating not logic formulas themselves but
sequents. **Sequents** are expressions of the form Γ
→ Δ, where both Γ and Δ are (possibly empty)
sets of formulas. Γ is the sequent's **antecedent**
and Δ its **succedent**. Sequents can be interpreted
thus: Let
be an
interpretation. Then,

satisfies the sequent Γ → Δ (written as: Γ→Δ) iffIn other words,

either α (for some α ∈ Γ) or β (for some β ∈ Δ).

Γ → Δ iff (αIf Γ or Δ are empty then they are respectively valid or unsatisfiable. An_{1}& … & α_{n}) ⊃ (β_{1}… β_{n}), where α_{1}& … & α_{n}is the iterated conjunction of the formulas in Γ and β_{1}… β_{n}is the iterated disjunction of those in Δ.

**axiom of LK**is a sequent Γ → Δ where Γ ∩ Δ ≠ Ø. Thus, the requirement that the same formula occurs at each side of the → sign means that the axioms of LK are valid, for no interpretation can then make all the formulas in Γ true and, simultaneously, make all those in Δ false. LK has two rules per logical connective, plus one extra rule: the cut rule.

AxiomsCut Rule

Γ, α → Δ, αΓ → Δ, α α, λ→Σ

Γ, λ → Δ, ΣAntecedent Rules(Θ→)Succedent Rules(→Θ)&→ Γ, α, β → Δ

Γ, α & β → Δ→& Γ → Δ, α Γ → Δ, β

Γ → Δ, α & β→ Γ, α → Δ Γ, β → Δ

Γ, α β → Δ→ Γ → Δ, α, β

Γ → Δ, α β⊃→ Γ → Δ, α Γ, β → Δ

Γ, α ⊃ β → Δ→⊃ Γ, α → Δ, β

Γ → Δ, α ⊃ β⊃≡ Γ, α, β → Δ Γ → Δ, α, β

Γ, α ≡ β → Δ≡⊃ Γ, α → Δ, β Γ, β, → Δ, α

Γ → Δ, α ≡ β~→ Γ → Δ, α

Γ, ~α → Δ→~ Γ, α → Δ

Γ → Δ, ~α∃→ Γ, α( a/x) → Δ

Γ, (∃x)α(x) → Δ∃→ Γ → Δ, α( t/x), (∃x)α(x)

Γ → Δ, (∃x)α(x)∀→ Γ, α( t/x), (∀x)α(x) → Δ

Γ, (∀x)α(x) → Δ∀→ Γ → Δ, α( a/x)

Γ → Δ, (∀x)α(x)

The sequents above a rule's line are called the **rule's
premises** and the sequent below the line is the **rule's
conclusion**. The quantification rules ∃→ and
→∀ have an eigenvariable condition that restricts their
applicability, namely that *a* must not occur in Γ,
Δ or in the quantified sentence. The purpose of this restriction
is to ensure that the choice of parameter, *a*, used in the
substitution process is completely “arbitrary”.

Proofs in LK are represented as trees where each node in the tree is
labeled with a sequent, and where the original sequent sits at the root
of the tree. The children of a node are the premises of the rule being
applied at that node. The leaves of the tree are labeled with axioms.
Here is the LK-proof of (∃*x*)*R*(*x*) from
the set {(∀*x*)(*P*(*x*)
*Q*(*x*)),
(∀*x*)(*P*(*x*) ⊃
*R*(*x*)),(∀*x*)(*Q*(*x*)
⊃ *R*(*x*))}. In the tree below, Γ stands for
this set:

Γ, P(a) →
P(a),
R(a),(∃x)R(x) |
Γ, P(a),
R(a) → R(a),
(∃x)R(x) |
Γ, Q(a) →
Q(a), R(a),
(∃x)R(x) |
Γ, Q(a),
R(a) → R(a),
(∃x)R(x) |

Γ, P(a), P(a) ⊃
R(a) → R(a),
(∃x)R(x)
Γ, P(a) → R(a),
(∃x)R(x) |
Γ, Q(a), Q(a) ⊃
R(a) → R(a),
(∃x)R(x)
Γ, Q(a) → R(a),
(∃x)R(x) |
||

Γ, P(a)
Q(a) → R(a),
(∃x)R(x)
Γ, → R(a),
(∃x)R(x)
Γ → (∃ x)R(x) |

In our example, all the leaves in the proof tree are labeled with
axioms. This establishes the validity of Γ →
(∃*x*)*R*(*x*) and, hence, the fact
that
Γ (∃*x*)*R*(*x*). LK
takes an indirect approach at proving the conclusion and this is an
important difference between LK and NK. While NK constructs an actual
proof (of the conclusion from the given assumptions), LK instead
constructs a proof that proves the existence of a proof (of the
conclusion from the assumptions). For instance, to prove that α
is entailed by Γ, NK constructs a step-by-step proof of α
from Γ (assuming that one exists); in contrast, LK first
constructs the sequent Γ → α which then attempts to
prove valid by showing that it cannot be made false. This is done by
searching for a counterexample that makes (all the sentences in)
Γ true and makes α false: If the search fails then a
counterexample does not exist and the sequent is therefore valid. In
this respect, proof trees in LK are actually refutation proofs. Like
resolution, LK is refutation complete: If
Γ α
then the sequent Γ
→ α has a proof tree.

As it stands, LK is unsuitable for automated deduction and there are some obstacles that must be overcome before it can be efficiently implemented. The reason is, of course, that the statement of the completeness of LK only has to assert, for each entailment relation, the existence of a proof tree but a reasoning program has the more difficult task of actually having to construct one. Some of the main obstacles: First, LK does not specify the order in which the rules must be applied in the construction of a proof tree. Second, and as a particular case of the first problem, the premises in the rules ∀→ and →∃ rules inherit the quantificational formula to which the rule is applied, meaning that the rules can be applied repeatedly to the same formula sending the proof search into an endless loop. Third, LK does not indicate which formula must be selected next in the application of a rule. Fourth, the quantifier rules provide no indication as to what terms or free variables must be used in their deployment. Fifth, and as a particular case of the previous problem, the application of a quantifier rule can lead into an infinitely long tree branch because the proper term to be used in the instantiation never gets chosen. Fortunately, as we will hint at below each of these problems can be successfully addressed.

Axiom sequents in LK are valid, and the conclusion of a rule is
valid iff its premises are. This fact allows us to apply the LK rules
in either direction, forwards from axioms to conclusion, or backwards
from conclusion to axioms. Also, with the exception of the cut rule,
all the rules' premises are subformulas of their respective
conclusions. For the purposes of automated deduction this is a
significant fact and we would want to dispense with the cut rule;
fortunately, the cut-free version of LK preserves its refutation
completeness (Gentzen 1935). These results provide a strong case for
constructing proof trees in a backwards fashion; indeed, by working
this way a refutation in cut-free LK gets increasingly simpler as it
progresses since subformulas are simpler than their parent formulas.
Moreover, and as far as propositional rules go, the new subformulas
entered into the tree are completely dictated by the cut-free LK rules.
Furthermore, and assuming the proof tree can be brought to completion,
branches eventually end up with atoms and the presence of axioms can be
quickly determined. Another reason for working backwards is that the
truth-functional fragment of cut-free LK is **confluent**
in the sense that the order in which the non-quantifier rules are
applied is irrelevant: If there is a proof, regardless of what you do,
you will run into it! To bring the quantifier rules into the picture,
things can be arranged so that all rules have a fair chance of being
deployed: Apply, as far as possible, all the non-quantifier rules
before applying any of the quantifier rules. This takes care of the
first and second obstacles, and it is no too difficult to see how the
third one would now be handled. The fourth and fifth obstacles can be
addressed by requiring that the terms to be used in the substitutions
be suitably selected from the Herbrand universe (Herbrand 1930).

The use of sequent-type calculi in automated theorem proving was
initiated by efforts to mechanize mathematics (Wang 1960). At the time,
resolution captured most of the attention of the automated reasoning
community but during the 1970's some researchers started to further
investigate non-resolution methods (Bledsoe 1977), prompting a frutiful
and sustained effort to develop more human-oriented theorem proving
systems (Bledsoe 1975, Nevins 1974). Eventually, sequent-type deduction
gained momentum again, particularly in its re-incarnation as
**analytic tableaux** (Fitting 1990). The method of
deduction used in tableaux is essentially cut-free LK's with sets used
in lieu of sequents.

## 6. Natural Deduction

Although LK and NK are both commonly labeled as “natural deduction” systems, it is the latter which better deserves the title due to its more natural, human-like, approach to proof construction. The rules of NK are typically presented as acting on standard logic formulas in an implicitly understood context, but they are also commonly given in the literature as acting more explicitly on “judgements”, that is, expressions of the form Γ α where Γ is a set of formulas and α is a formula. This form is typically understood as making the metastatement that there is a proof of α from Γ (Kleene 1962). Following Gentzen 1935 and Prawitz 1965 here we take the former approach. The system NK has no logical axioms and provides two introduction-elimination rules for each logical connective:

Introduction Rules(ΘI)Elimination Rules(ΘE)&I α β

α & β&E α _{1}& α_{2}

α_{i}(fori= 1,2)I α _{i}(fori= 1,2)

α_{1}α_{2}E α β

[α — γ]

[β — γ]

γ⊃I [α — β]

α ⊃ β⊃E α α ⊃ β

β≡I [α — β]

[β — α]

α ≡ β≡E α _{i}(i= 0,1) α_{0}≡ α_{1}

α_{1-i}~I [α — ⊥]

~α~E [~α — ⊥]

α∃I α( t/x)

(∃x)α(x)∃E (∃ x)α(x)

[α(a/x) — β]

β∀I α( a/x)

(∀x)α(x)∀E (∀ x)α(x)

α(t/x)

A few remarks: First, the expression [α --- γ]
represents the fact that α is an auxiliary assumption in the
proof of γ that eventually gets discharged, i.e. discarded. For
example, ∃E tells us that if in the process of constructing a
proof one has already derived (∃*x*)α(*x*)
and also β with α(*a*/*x*) as an auxiliary
assumption then the inference to β is allowed. Second, the
eigenparameter, *a*, in ∃E and ∀I must be foreign
to the premises, undischarged — “active” --
assumptions, to the rule's conclusion and, in the case of ∃E, to
(∃*x*)α(*x*). Third, ⊥ is shorthand for
two contradictory formulas, β and ~β. Finally, NK is
complete: If
Γ α then there is a proof of α from
Γ using the rules of NK.

As in LK, proofs constructed in NK are represented as trees with the
proof's conclusion sitting at the root of the tree, and the problem's
assumptions sitting at the leaves. (Proofs are also typically given as
sequences of judgements,
Γ α, running from the top to the bottom of the
printed page.) Here is a natural deduction proof tree of
(∃*x*)*R*(*x*) from
(∀*x*)(*P*(*x*)
*Q*(*x*)),
(∀*x*)(*P*(*x*) ⊃
*R*(*x*)) and (∀*x*)(*Q*(*x*)
⊃ *R*(*x*)):

(∀ x)(P(x) ⊃R(x))

P(a) ⊃R(a)(∀ x)(Q(x) ⊃R(x))

Q(a) ⊃R(a)(∀ x)(P(x)Q(x))

P(a)Q(a)[ P(a) —R(a)]

R(a)[ Q(a) —R(a)]

R(a)

R(a)

(∃x)R(x)

As in LK, a forward-chaining strategy for proof construction is not
well focused. So, although proofs are *read* forwards, that is,
from leaves to root or, logically speaking, from assumptions to
conclusion, that is not the way in which they are typically
*constructed*. A backward-chaining strategy implemented by
applying the rules in reverse order is more effective. Many of the
obstacles that were discussed above in the implementation of sequent
deduction are applicable to natural deduction as well. These issues can
be handled in a similar way, but natural deduction introduces some
issues of its own. For example, as suggested by the ⊃-Introduction
rule, to prove a goal of the form α ⊃ β one could
attempt to prove β on the assumption that α. But note that
although the goal α ⊃ β does not match the conclusion of
any other introduction rule, it matches the conclusion of all
*elimination* rules and the reasoning program would need to
consider those routes too. Similarly to forward-chaining, here there is
the risk of setting goals that are irrelevant to the proof and that
could lead the program astray. To wit: What prevents a program from
entering the never-ending process of building, say, larger and larger
conjunctions? Or, what is there to prevent an uncontrolled chain of
backward applications of, say, ⊃-Elimination? Fortunately, NK
enjoys the subformula property in the sense that each formula entering
into a natural deduction proof can be restricted to being a subformula
of Γ ∪ Δ ∪ {α}, where Δ is the set of
auxiliary assumptions made by the ~-Elimination rule. By exploiting the
subformula property a natural deduction automated theorem prover can
drastically reduce its search space and bring the backward application
of the elimination rules under control (Portoraro 1998, Sieg and Byrnes
1996). Further gains can be realized if one is willing to restrict the
scope of NK's logic to its intuitionistic fragment where every proof
has a normal form in the sense that no formula is obtained by an
introduction rule and then is eliminated by an elimination rule
(Prawitz 1965).

Implementations of automated theorem proving systems using NK deduction have been motivated by the desire to have the program reason with precisely the same proof format and methods employed by the human user. This has been particularly true in the area of education where the student is engaged in the interactive construction of formal proofs in an NK-like calculus working under the guidance of a theorem prover ready to provide assistance when needed (Portoraro 1994, Suppes 1981). Other, research-oriented, theorem provers true to the spirit of NK exist (Pelletier 1998) but are rare.

## 7. The Matrix Connection Method

The name of the matrix connection method (Bibel 1981) is indicative
of the way it operates. The term “matrix” refers to the
form in which the set of logic formulas expressing the problem is
represented; the term “connection” refers to the way the
method operates on these formulas. To illustrate the method at work, we
will use an example from propositional logic and show that *R*
is entailed by *P*
*Q*,
*P* ⊃ *R* and *Q* ⊃ *R*. This is
done by establishing that the formula

(is unsatisfiable. To do this, we begin by transforming it into conjunctive normal form:PQ) & (P⊃R) & (Q⊃R) & ~R

(PQ) & (~PR) & (~QR) & ~R

This formula is then represented as a matrix, one conjunct per row and, within a row, one disjunct per column:

PQ~ PR~ QR~ R

The idea now is to explore all the possible vertical paths running
through this matrix. A **vertical path** is a set of
literals selected from each row in the matrix such that each literal
comes from a different row. The vertical paths:

Path 1P, ~P, ~Qand ~RPath 2P, ~P,Rand ~RPath 3P,R, ~Qand ~RPath 4P,R,Rand ~RPath 5Q, ~P, ~Qand ~RPath 6Q, ~P,Rand ~RPath 7Q,R, ~Qand ~RPath 8Q,R,Rand ~R

A path is **complementary** if it contains two literals
which are complementary. For example, Path 2 is complementary since it
has both *P* and ~*P* but so is Path 6 since it contains
both *R* and ~*R*. Note that as soon as a path includes
two complementary literals there is no point in pursuing the path since
it has itself become complementary. This typically allows for a large
reduction in the number of paths to be inspected. In any event, all the
paths in the above matrix are complementary and this fact establishes
the unsatisfiability of the original formula. This is the essence of
the matrix connection method. The method can be extended to predicate
logic but this demands additional logical apparatus: Skolemnization,
variable renaming, quantifier duplication, complementarity of paths via
unification, and simultaneous substitution across all matrix paths
(Bibel 1981, Andrews 1981). Variations of the method have been
implemented in reasoning programs in higher-order logic (Andrews 1981)
and non-classical logics (Wallen 1990).

## 8. Term Rewriting

Equality is an important logical relation whose behavior within
automated deduction deserves its own separate treatment.
**Equational logic** and, more generally, **term
rewriting** treat equality-like equations as **rewrite
rules**, also known as reduction or demodulation rules. An
equality statement like *f*(*a*)= *a* allows the
simplification of a term like
*g*(*c*,*f*(*a*)) to
*g*(*c*,*a*). However, the same equation also has
the potential to generate an unboundedly large term:
*g*(*c*,*f*(*a*)),
*g*(*c*,*f*(*f*(*a*))),
*g*(*c*,*f*(*f*(*f*(*a*)))),
… . What distinguishes term rewriting from equational logic is
that in term rewriting equations are used as unidirectional reduction
rules as opposed to equality which works in both directions. Rewrite
rules have the form *t*_{1} ⇒
*t*_{2} and the basic idea is to look for terms
*t* occurring in expressions *e* such that *t*
unifies with *t*_{1} with unifier θ so that the
occurrence *t*_{1}θ in *e*θ can be
replaced by *t*_{2}θ. For example, the rewrite
rule *x* + 0 ⇒ *x* allows the rewriting of
*succ*(*succ*(0) + 0) as
*succ*(*succ*(0)).

To illustrate the main ideas in term rewriting, let us explore an
example involving symbolic differentiation (the example and ensuing
discussion are adapted from Chapter 1 of Baader and Nipkow 1998). Let
*der* denote the derivative respect to *x*, let
*y* be a variable different from *x*, and let *u*
and *v* be variables ranging over expressions. We define the
rewrite system:

(R1)der(x) ⇒ 1

(R2)der(y) ⇒ 0

(R3)der(u+v) ⇒der(u) +der(v)

(R4)der(u×v) ⇒ (u×der(v)) + (der(u) ×v)

Again, the symbol ⇒ indicates that a term matching the
left-hand side of a rewrite rule should be replaced by the rule's
right-hand side. To see the differentiation system at work, let us
compute the derivative of *x* × *x* respect to
*x*, *der*(*x* × *x*):

der(x×x)⇒ ( x×der(x)) + (der(x) ×x)by R4 ⇒ ( x× 1) + (der(x) ×x)by R1 ⇒ ( x× 1) + (1 ×x)by R1

At this point, since none of the rules (R1)-(R4) applies, no further
reduction is possible and the rewriting process ends. The final
expression obtained is called a **normal form**, and its
existence motivates the following question: Is there an expression
whose reduction process will never terminate when applying the rules
(R1)-(R4)? Or, more generally: Under what conditions a set of rewrite
rules will always stop, for any given expression, at a normal form
after finitely many applications of the rules? This fundamental
question is called the **termination** problem of a
rewrite system, and we state without proof that the system (R1)-(R4)
meets the termination condition.

There is the possibility that when reducing an expression, the set
of rules of a rewrite system could be applied in more than one way.
This is actually the case in the system (R1)-(R4) where in the
reduction of *der*(*x* × *x*) we could have
applied R1 first to the second sub-expression in (*x* ×
*der*(*x*)) + (*der*(*x*) ×
*x*), as shown below:

der(x×x)⇒ ( x×der(x)) + (der(x) ×x)by R4 ⇒ ( x×der(x)) + (1 ×x)by R1 ⇒ ( x× 1) + (1 ×x)by R1

Following this alternative course of action, the reduction
terminates with the same normal form as in the previous case. This
fact, however, should not be taken for granted: A rewriting system is
said to be **(globally) confluent** if and only if
independently of the order in which its rules are applied every
expression always ends up being reduced to its one and only normal
form. It can be shown that (R1)-(R4) is confluent and, hence, we are
entitled to say: “Compute *the* derivative of an
expression” (as opposed to simply “*a*”
derivative). Adding more rules to a system in an effort to make it more
practical can have undesired consequences. For example, if we add the
rule

(R5)u+ 0 ⇒u

to (R1)-(R4) then we will be able to further reduce certain
expressions but at the price of losing confluency. The following
reductions show that *der*(*x* + 0) now has two normal
forms:

der(x+ 0)⇒ der(x) +der(0)by R3 ⇒ 1 + der(0)by R1

der(x+ 0)⇒ der(x)by R5 ⇒ 1 by R1

Adding the rule, (R6) *der*(0) ⇒ 0, would allow the
further reduction of 1 + *der*(0) to 1 + 0 and then, by R5, to
1. Although the presence of this new rule actually increases the number
of alternative paths — *der*(*x* + 0) can now be
reduced in four possible ways -- they all end up with the same normal
form, namely 1. This is no coincidence as it can be shown that (R6)
actually restores confluency. This motivates another fundamental
question: Under what conditions can a non-confluent system be made into
an equivalent confluent one? The Knuth-Bendix
**completion** algorithm (Knuth and Bendix 1970) gives a
partial answer to this question.

Term rewriting, like any other automated deduction method, needs strategies to direct its application. Rippling (Bundy, Stevens and Harmelen 1993, Basin and Walsh 1996) is a heuristic that has its origins in inductive theorem-proving that uses annotations to selectively restrict the rewriting process.

## 9. Mathematical Induction

Mathematical induction is a very important technique of theorem proving in mathematics and computer science. Problems stated in terms of objects or structures that involve recursive definitions or some form of repetition invariably require mathematical induction for their solving. In particular, reasoning about the correctness of computer systems requires induction and an automated reasoning program that effectively implements induction will have important applications.

To illustrate the need for mathematical induction, assume that a
property φ is true of the number zero and also that if true of a
number then is true of its successor. Then, with our deductive systems,
we can deduce that for any given number *n*, φ is true of
it, φ(*n*). But we cannot deduce that φ is true of all
numbers, (∀*x*)φ(*x*); this inference step
requires the rule of mathematical induction:

α(0) [α( n) — α(succ(n))](mathematical induction) (∀ x)α(x)

In other words, to prove that
(∀*x*)α(*x*) one proves that α(0) is
the case, and that α(*succ*(*n*)) follows from the
assumption that α(*n*). The implementation of induction in
a reasoning system presents very challenging search control problems.
The most important of these is the ability to determine the particular
way in which induction will be applied during the proof, that is,
finding the appropriate induction schema. Related issues include
selecting the proper variable of induction, and recognizing all the
possible cases for the base and the inductive steps.

Nqthm (Boyer and Moore 1979) has been one of the most successful implementations of automated inductive theorem proving. In the spirit of Gentzen, Boyer and Moore were interested in how people prove theorems by induction. Their theorem prover is written in the functional programming language Lisp which is also the language in which theorems are represented. For instance, to express the commutativity of addition the user would enter the Lisp expression (EQUAL (PLUS X Y) (PLUS Y X)). Everything defined in the system is a functional term, including its basic “predicates”: T, F, EQUAL X Y, IF X Y Z, AND, NOT, etc. The program operates largely as a black box, that is, the inner working details are hidden from the user; proofs are conducted by rewriting terms that posses recursive definitions, ultimately reducing the conclusion's statement to the T predicate. The Boyer-Moore theorem prover has been used to check the proofs of some quite deep theorems (Boyer, Kaufmann, and Moore 1995). Lemma caching, problem statement generalization, and proof planning are techniques particularly useful in inductive theorem proving (Bundy, Harmelen and Hesketh 1991).

## 10. Higher-Order Logic

Higher-order logic differs from first-order logic in that
quantification over functions and predicates is allowed. The statement
“*Any two people are related to each other in one way or
another*” can be legally expressed in higher-order logic as
(∀*x*)(∀*y*)(∃*R*)*R*(*
x*,*y*) but not in first-order logic. Higher-order logic is
inherently more expressive than first-order logic and is closer in
spirit to actual mathematical reasoning. For example, the notion of set
finiteness cannot be expressed as a first-order concept. Due to its
richer expressiveness, it should not come as a surprise that
implementing an automated theorem prover for higher-order logic is more
challenging than for first-order logic. This is largely due to the fact
that unification in higher-order logic is more complex than in the
first-order case: unifiable terms do not always posess a most general
unifier, and higher-order unification is itself undecidable. Finally,
given that higher-order logic is incomplete, there are always proofs
that will be entirely out of reach for any automated reasoning
program.

Methods used to automate first-order deduction can be adapted to
higher-order logic. TPS (Andrews *et al*. 1996) is a theorem
proving system for higher-order logic that uses Church's typed
λ-calculus as its logical representation language and is based
on a connection-type deduction mechanism that incorporates Huet's
unification algorithm (Huet 1975). As a sample of the capabilities of
TPS, the program has proved automatically that a subset of a finite
set is finite, the equivalence among several formulations of the Axiom
of Choice, and Cantor's Theorem that a set has more subsets than
members. The latter was proved by the program by asserting that there
is no onto function from individuals to sets of individuals, with the
proof proceeding by a diagonal argument. HOL (Gordon and Melham 1993)
is another higher-order proof development system primarily used as an
aid in the development of hardware and software safety-critical
systems. HOL is based on the LCF approach to interactive theorem
proving (Gordon, Milner and Wadsworth 1979), and it is built on the
strongly typed functional programming language ML. HOL, like TPS, can
operate in automatic and interactive mode. Availability of the latter
mode is welcomed since the most useful automated reasoning systems may
well be those which place an emphasis on interactive theorem proving
(Farmer, Guttman and Thayer 1993) and can be used as assistants
operating under human guidance. (Harrison 2001) discusses the
verification of floating-point algorithms and the non-trivial
mathematical properties that are proved by HOL under the guidance of
the user. Isabelle (Paulson 1994) is a generic, higher-order,
framework for rapid prototyping of deductive systems. Object logics
can be formulated within Isabelle's metalogic by using its many
syntactic and deductive tools. Isabelle also provides some ready-made
theorem proving environments, including Isabelle/HOL, Isabelle/ZF and
Isabelle/FOL, which can be used as starting points for applications
and further development by the user. Isabelle/ZF has been used to
prove equivalent formulations of the Axiom of Choice, formulations of
the Well-Ordering Principle, as well as the key result about cardinal
arithmetic that, for any infinite cardinal κ, κ
κ = κ (Paulson and Grabczewski
1996).

## 11. Non-classical Logics

Non-classical logics such as modal logics, intuitionsitic logic,
multi-valued logics, autoepistemic logics, non-monotonic reasoning,
commonsense and default reasoning, relevant logic, paraconsistent
logic, and so on, have been increasingly gaining the attention of the
automated reasoning community. One of the reasons has been the natural
desire to extend automated deduction techniques to new domains of
logic. Another reason has been the need to mechanize non-classical
logics as an attempt to provide a suitable foundation for artificial
intelligence. A third reason has been the desire to attack some
problems that are combinatorially too large to be handled by paper and
pencil. Indeed, some of the work in automated non-classical logic
provides a prime example of automated reasoning programs at work. To
illustrate, the Ackerman Constant Problem asks for the number of
non-equivalent formulas in the relevant logic R. There are actually
3,088 such formulas (Slaney 1984) and the number was found by
“sandwiching” it between a lower and an upper limit, a task
that involved constraining a vast universe of 20^{400}
20-element models in search of those models that rejected non-theorems
in R. It is safe to say that this result would have been impossible to
obtain without the assistance of an automated reasoning program.

There have been three basic approaches to automate the solving of problems in non-classical logic (McRobie 1991). One approach has been, of course, to try to mechanize the non-classical deductive calculi. Another has been to simply provide an equivalent formulation of the problem in first-order logic and let a classical theorem prover handle it. A third approach has been to formulate the semantics of the non-classical logic in a first-order framework where resolution or connection-matrix methods would apply.

**Modal logic**. Modal logics find extensive use in
computing science as logics of knowledge and belief, logics of
programs, and in the specification of distributed and concurrent
systems. Thus, a program that automates reasoning in a modal logic such
as K, K4, T, S4, or S5 would have important applications. With the
exception of S5, these logics share some of the important
metatheoretical results of classical logic, such as cut-elimination,
and hence cut-free (modal) sequent calculi can be provided for them,
along with techniques for their automation. Connection methods (Andrews
1981, Bibel 1981) have played an important role in helping to
understand the source of redundancies in the search space induced by
these modal sequent calculi and have provided a unifying framework not
only for modal logics but also for intuitionistic and classical logic
as well (Wallen 1990).

**Intuitionistic logic**. There are different ways in
which intuitionsitic logic can be automated. One is to directly
implement the intuitionistic versions of Gentzen's sequent and natural
deduction calculi, LJ and NJ respectively. This approach inherits the
stronger normalization results enjoyed by these calculi allowing for a
more compact mechanization than their classical counterparts. Another
approach at mechanizing intuitionistic logic is to exploit its semantic
similarities with the modal logic S4 and piggy back on an automated
implementation of S4. Automating intuitionistic logic has applications
in software development since writing a program that meets a
specification corresponds to the problem of proving the specification
within an intuitionistic logic (Martin-Löf 1982). A system that
automated the proof construction process would have important
applications in algorithm design but also in constructive mathematics.
Nuprl (Constable *et al*. 1986) is a computer system supporting a
particular mathematical theory, namely constructive type theory, and
whose aim is to provide assistance in the proof development process.
The focus is on logic-based tools to support programming and on
implementing formal computational mathematics. Over the years the scope
of the Nuprl project has expanded from “proofs-as-programs”
to “systems-as-theories”.

## 12. Logic Programming

Logic programming, particularly represented by the language
**Prolog** (Colmerauer *et al*. 1973), is probably
the most important and widespread application of automated theorem
proving. During the early 1970s, it was discovered that logic could
be used as a programming language (Kowalski 1974). What distinguishes
logic programming from other traditional forms of programming is that
logic programs, in order to solve a problem, do not explicitly state
*how* a specific computation is to be performed; instead, a
logic program states *what* the problem is and then delegates
the task of actually solving it to an underlying theorem prover. In
Prolog, the theorem prover is based on a refinement of resolution
known as SLD-resolution. SLD-resolution is a variation of linear input
resolution that incorporates a special rule for selecting the next
literal to be resolved upon; SLD-resolution also takes into
consideration the fact that, in the computer's memory, the literals in
a clause are actually ordered, that is, they form a sequence as
opposed to a set. A Prolog **program** consists of
clauses stating known facts and rules. For example, the following
clauses make some assertions about flight connections:

flight(toronto,london).

flight(london,rome).

flight(chicago,london).

flight(X,Y) :-flight(X,Z) ,flight(Z,Y).

The clause *flight*(*toronto*, *london*) is a
fact while *flight*(*X*,*Y*) :-
*flight*(*X*,*Z*) ,
*flight*(*Z*,*Y*) is a rule, written by convention
as a reversed conditional (the symbol “:-“ means
“if”; the comma means “and”; terms starting in
uppercase are variables). The former states that there is flight
connection between Toronto and London; the latter states that there is
a flight between cities *X* and *Y* if, for some city
*Z*, there is a flight between *X* and *Z* and one
between *Z* and *Y*. Clauses in Prolog programs are a
special type of Horn clauses having precisely one positive literal:
**Facts** are program clauses with no negative literals
while **rules** have at least one negative literal. (Note
that in standard clause notation the program rule in the previous
example would be written as *flight*(*X*,*Y*)
~*flight*(*X*,*Z*)
~*flight*(*Z*,*Y*).) The
specific form of the program rules is to effectively express statements
of the form: “I*f these conditions over here are jointly met
then this other fact will follow*”. Finally, a
**goal** is a Horn clause with no positive literals. The
idea is that, once a Prolog program Π has been written, we can then
try to determine if a new clause γ, the goal, is entailed by
Π, Π
γ; the Prolog
prover does this by attempting to derive a contradiction from Π
∪ {~γ}. We should remark that program facts and rules alone
cannot produce a contradiction; a goal must enter into the process.
Like input resolution, SLD-resolution is not refutation complete for
first-order logic but it is complete for the Horn logic of Prolog
programs. The fundamental theorem: If Π is a Prolog program and
γ is the goal clause then
Π γ iff Π ∪ {~γ}
[ ] by SLD-resolution (Lloyd
1984).

For instance, to find out if there is a flight from Toronto to Rome
one asks the Prolog prover to see if the clause
*flight*(*toronto*, *rome*) follows from the given
program. To do this, the prover adds ~*flight*(*toronto*,
*rome*) to the program clauses and attempts to derive the empty
clause, [ ], by SLD-resolution:

1 flight(toronto,london)Program clause 2 flight(london,rome)Program clause 3 flight(chicago,london)Program clause 4 flight(X,Y) ~flight(X,Z) ~flight(Z,Y)Program clause 5 ~ flight(toronto,rome)Negation of the conclusion 6 ~flight(toronto,Z) ~flight(Z,rome)Res 5 4 7 ~flight(london,rome)Res 6 1 8 [ ] Res 7 2

The conditional form of rules in Prolog programs adds to their
readability and also allows reasoning about the underlying refutations
in a more friendly way: To prove that there is a flight between Toronto
and Rome, *flight*(*toronto*,*rome*), unify this
clause with the consequent *flight*(*X*,*Y*) of
the fourth clause in the program which itself becomes provable if both
*flight*(*toronto*,*Z*) and
*flight*(*Z*,*rome*) can be proved. This can be
seen to be the case under the substitution {*Z* ←
*london*} since both
*flight*(*toronto*,*london*) and
*flight*(*london*,*rome*) are themselves provable.
Note that the theorem prover not only establishes that there is a
flight between Toronto and Rome but it can also come up with an actual
itinerary, Toronto-London-Rome, by extracting it from the unifications
used in the proof.

There are at least two broad problems that Prolog must address in
order to achieve the ideal of a logic programming language. Logic
programs consist of facts and rules describing what is true; anything
that is not provable from a program is deemed to be false. In regards
to our previous example, *flight*(*toronto*,
*boston*) is not true since this literal cannot be deduced from
the program. The identification of falsity with non-provability is
further exploited in most Prolog implementations by incorporating an
operator, **not**, that allows programmers to explicitly
express the negation of literals (or even subclauses) within a program.
By definition, not *l* succeeds if the literal *l* itself
fails to be deduced. This mechanism, known as
**negation-by-failure**, has been the target of criticism.
Negation-by-failure does not fully capture the standard notion of
negation and there are significant logical differences between the two.
Standard logic, including Horn logic, is monotonic which means that
enlarging an axiom set by adding new axioms simply enlarges the set of
theorems derivable from it; negation-by-failure, however, is
non-monotonic and the addition of new program clauses to an existing
Prolog program may cause some goals to cease from being theorems. A
second issue is the **control problem**. Currently,
programmers need to provide a fair amount of control information if a
program is to achieve acceptable levels of efficiency. For example, a
programmer must be careful with the order in which the clauses are
listed within a program, or how the literals are ordered within a
clause. Failure to do a proper job can result in an inefficient or,
worse, non-terminating program. Programmers must also embed hints
within the program clauses to prevent the prover from revisiting
certain paths in the search space (by using the **cut**
operator) or to prune them altogether (by using **fail**).
Last but not least, in order to improve their efficiency, many
implementations of Prolog do not implement unification fully and bypass
a time-consuming yet critical test — the so-called occurs-check
— responsible for checking the suitability of the unifiers being
computed. This results in an unsound calculus and may cause a goal to
be entailed by a Prolog program (from a computational point of view)
when in fact it should not (from a logical point of view).

There are variations of Prolog intended to extend its scope. By implementing a model elimination procedure, the Prolog Technology Theorem Prover (PPTP) (Stickel 1992) extends Prolog into full first-order logic. The implementation achieves both soundness and completeness. Moving beyond first-order logic, λProlog (Miller and Nadathur 1988) bases the language on higher-order constructive logic.

## 13. SAT Solvers

The problem of determining the satisfiability of logic formulas has
received much renewed attention lately by the automated reasoning
community due to its important applicability in industry. A
propositional formula is **satisfiable** if there is an
assignment of truth-values to its variables that makes the formula
true. For example, the assignment (*P* ← true, *Q*
← true, *R* ← false) does not make (*P*
*R*) & ~*Q* true but
(*P* ← true, *Q* ← false, *R* ←
false) does and, hence, the formula is satisfiable. Determining whether
a formula is satisfiable or not is called the Boolean Satisfiability
Problem -- SAT for short -- and for a formula with *n* variables
SAT can be settled thus: Inspect each of the 2^{n}
possible assignments to see if there is at least one assignment that
satisfies the formula, i.e. makes it true. This method is clearly
complete: If the original formula is satisfiable then we will
eventually find one such satisfying assignment; but if the formula is
contradictory (i.e. non-satisfiable), we will be able to determine this
too. Just as clearly, and particularly in this latter case, this search
takes an exponential amount of time, and the desire to conceive more
efficient algorithms is well justified particularly because many
computing problems of great practical importance such as
graph-theoretic problems, network design, storage and retrieval,
scheduling, program optimization, and many others (Garey and Johnson
1979) can be expressed as SAT instances, i.e. as the SAT question of
some propositional formula representing the problem. Given that SAT is
NP-complete (Cook 1971) it is very unlikely that a polynomial algorithm
exists for it; however, this does not preclude the existence of
sufficiently efficient algorithms for particular cases of SAT
problems.

The Davis-Putnam-Logemann-Loveland (DPLL) algorithm was one of the first SAT search algorithms (Davis and Putnam 1960; Davis, Logemman and Loveland 1962) and is still considered one of the best complete SAT solvers; many of the complete SAT procedures in existence today can be considered optimizations and generalizations of DPLL. In essence, DPLL search procedures proceed by considering ways in which assignments can be chosen to make the original formula true. For example, consider the formula in CNF

P& ~Q& (~PQR) & (P~S)

Since *P* is a conjunct, but also a unit clause, *P*
must be true if the entire formula is to be true. Moreover, the value
of ~*P* does not contribute to the truth of ~*P*
*Q*
*R* and *P*
~*S* is
true regardless of *S*. Thus, the whole formula reduces to

~Q& (QR)

Similarly, ~*Q* must be true and the formula further reduces
to

R

which forces *R* to be true. From this process we can recover
the assignment (*P* ← true, *Q* ← false,
*R* ← true, *S* ← false) proving that the
original formula is satisfiable. A formula may cause the algorithm to
branch; the search through a branch reaches a dead end the moment a
clause is deemed false -- a **conflicting clause** -- and
all variations of the assignment that has been partially constructed up
to this point can be discarded. To illustrate:

1 R& (PQ) & (~PQ) & (~P~Q)Given 2 ( PQ) & (~PQ) & (~P~Q)By letting R← true3 Q& ~QBy letting P← true4 ? Conflict: Qand ~Qcannot both be true5 ( PQ) & (~PQ) & (~P~Q)Backtrack to (2): R← true still holds6 ~ PBy letting Q← true7 true By letting ~ Pbe true, i.e.P← false

Hence, the formula is satisfiable by the existence of (*P*
← false, *Q* ← true, *R* ← true). DPLL
algorithms are made more efficient by strategies such as **term
indexing** (ordering of the formula variables in an advantageous
way), **chronological backtracking** (undoing work to a
previous branching point if the process leads to a conflicting clause),
and **conflict-driven learning** (determining the
information to keep and where to backtrack). The combination of these
strategies results in a large prune of the search space; for a more
extensive discussion the interested reader is directed to (Zhang and
Malik 2002).

A quick back-envelope calculation reveals the staggering computing
times of (algorithms for) SAT-type problems represented by formulas
with as little as, say, 60 variables. To wit: A problem represented as
a Boolean formula with 10 variables that affords a linear solution
taking one hundredth of a second to complete would take just four
hundredths and six hundredths of a second to complete if the formula
had instead 40 and 60 variables respectively. In dramatic contrast, if
the solution to the problem were exponential (say
2^{n}) then the times to complete the job for 10, 40
and 60 variables would be respectively one thousandth of a second, 13
days, and 365 centuries. It is a true testament to the ingenuity of the
automated reasoning community and the power of current SAT-based search
algorithms that real-world problems with thousands of variables can be
handled with reasonable efficency. (Küchlin and Sinz 2000)
dicusses a SAT application in the realm of industrial automotive
product data management where 18,000 (elementary) Boolean formulas and
17,000 variables are used to express constraints on orders placed by
customers. As another example, (Massacci and Marraro 2000) discusses an
application in logical cryptanalysis, that is, the verification of
properties of cryptographic algorithms expressed as SAT problems. They
demonstrate how finding a key with a cryptographic attack is analogous
to finding a model -- assignment -- for a Boolean formula; the formula
in their application encodes the commercial version of the U.S Data
Encryption Standard (DES) with the encoding requiring 60,000 clauses
and 10,000 variables.

Although on the surface the SAT question is conceptually very simple, its inner nature is not well understood -- there are no criteria that can be generally applied to answer as to why one SAT problem is harder than another. It should then come as no surprise that algorithms that tend to do well on some SAT instances do not perform so well on others, and efforts are being spent in designing hybrid algorithmic solutions that combine the strength of complementary approaches -- see (Prasad, Biere and Gupta 2005) for an application of this hybrid approach in the functional verification of hardware digital design.

The DPLL search procedure has been extended to quantified logic. MACE is a popular program based on the DPLL algorithm that searches for finite models of first-order formulas with equality. As an example (McCune 2001), to show that not all groups are commutative one can direct MACE to look for a model of the group axioms that also falsifies the commutation law or, equivalently, to look for a model of:

(G1) ex=x(left identity) (G2) i(x)x=e(left inverse) (G3) ( xy)z=x(yz)(associativity) (DC) ab≠ba(denial of commutativity)

When asked to go to work, MACE succeeds in a little over one second
with a six-element model, where *e* = 0, *a* = 1,
*b* = 2, and where
and
*i* are defined as:

0 1 2 3 4 5 0 0 1 2 3 4 5 1 1 0 4 5 2 3 2 2 3 0 1 5 4 3 3 2 5 4 0 1 4 4 5 1 0 3 2 5 5 4 3 2 1 0

x0 1 2 3 4 5 i(x)0 1 2 3 4 5

This example also illustrates the benefits of using an automated deduction system: How long would have taken the human researcher to come up with the above or a similar model? For more challenging problems, the program is being used as a practical complement to the resolution-based theorem prover Otter, with Otter searching for proofs and MACE jointly looking for (counter) models.

## 14. Deductive Computer Algebra

To prove automatically even the simplest mathematical facts requires a significant amount of domain knowledge. As a rule, automated theorem provers lack such rich knowledge and attempt to construct proofs from first principles by the application of elementary deduction rules. This approach results in very lengthy proofs (assuming a proof is found) with each step being justified at a most basic logical level. Larger inference steps and a significant improvement in mathematical reasoning capability can be obtained, however, by having a theorem prover interact with a computer algebra system, also known as a symbolic computation system. A computer algebra system is a computer program that assists the user with the symbolic manipulation and numeric evaluation of mathematical expressions. For example, when asked to compute the improper integral

a competent computer algebra system would quickly reply with the answer

Essentially, the computer algebra system operates by taking the
input expression entered by the user and successively applies to it a
series of transformation rules until the result no longer changes (see
the section *Term Rewriting* in this article for more details).
These transformation rules encode a significant amount of domain
(mathematical) knowledge making symbolic systems powerful tools in the
hands of applied mathematicians, scientists, and engineers trying to
attack problems in a wide variety of fields ranging from calculus and
the solving of equations to combinatorics and number theory.

Problem solving in mathematics involves the interplay of deduction
and calculation, with decision procedures being a reminder of the fuzzy
division between the two; hence, the integration of deductive and
symbolic systems, which we coin here as **Deductive Computer
Algebra (DCA)**, is bound to be a fruitful combination.
Analytica (Bauer, Clarke and Zhao 1998) is a theorem prover built on
top of Mathematica, a powerful and popular computer algebra system.
Besides supplying the deductive engine, Analytica also extends
Mathematica's capabilities by defining a number of rewrite rules --
more precisely, identities about summations and inequalities -- that
are missing in the system, as well as providing an implementation of
Gosper's algorithm for finding closed forms of indefinite
hypergeometric summations. Equipped with this extended knowledge,
Analytica can prove semi-automatically some nontrivial theorems from
real analysis, including a series of lemmas directly leading to a proof
of the Bernstein Approximation Theorem. Here is the statement of the
theorem simply to give the reader a sense of the level of the
mathematical richness we are dealing with:

Bernstein Approximation Theorem. Let I = [0, 1] be the closed unit interval,fa real continuous function on I, andB_{n}(x,f) thenthBernstein polynomial forfdefined as

Then, on the interval I, the sequence of Bernstein polynomials for

fconverges uniformly tof.

To be frank, the program is supplied with key information to
establish the lemmas that lead to this theorem but the amount and type
of deductive work done by the program is certainly nontrivial. (Clarke
and Zhao 1994) provides examples of fully automated proofs using
problems in Chapter 2 of *Ramanujan's Notebooks* (Berndt 1985)
including the following example that the reader is invited to try. Show
that:

where *A _{0}*=1,

*A*=3

_{n+1}*A*+1 and (

_{n}*x*,

*n*) is Ramanujan's abbreviation for

Analytica's proof of this identity proceeds by simplifying both the
left- and right-hand sides of the equality and showing that both sides
reduce to the same expression, -*H _{n}* +

*H*. The simplification uses the added summation identities mentioned before as well as some elementary properties of the harmonic numbers,

_{A}rThe resulting proof has 28 steps (some of which are nontrivial) taking about 2 minutes to find.

In (Kerber, Kohlhase and Sorge 1998) the authors use the Ωmega planning system as the overall way to integrate theorem proving and symbolic computation. (Harrison and Théry 1998) is an example of the integration of a higher-order logic theorem proving system (HOL) with a computer algebra system (Maple).

Their great power notwithstanding, symbolic algebra systems do not enforce the same level of rigor and formality that is the essence of automated deduction systems. In fact, the mathematical semantics of some of the knowledge rules in most algebra systems is not entirely clear and are, in cases, logically unsound (Harrison and Théry 1998). The main reason for this is an over-aggressiveness to provide the user with an answer in a timely fashion at whatever cost, bypassing the checking of required assumptions even if it means sacrificing the soundness of the calculation. (This is strongly reminiscent of most Prolog implementations that bypass the so-called "occurs-check" also abandoning logical soundness in the name of efficiency.) This serious problem opens the opportunity for a deduction system to provide a service to the computer algebra system: Use its deductive capabilities to verify that the computer algebra's computational steps meet the required assumptions. There is a catch in this, however: For sufficiently large calculation steps, verifying is tantamount to proving and, to check these steps, the deduction system may well need the assistance of the very same system that is in need of verification! The solution to the soundness problem may then well require an extensive modification of the chosen symbolic algebra system to make it sound; an alternative approach is to develop a new system, entirely from scratch, in conjunction with the development of the automated theorem prover. In either case, the resulting combined deductive computer algebra system should display a much improved ability for automated mathematical reasoning.

## 15. Conclusion

Automated reasoning is a growing field that provides a healthy interplay between basic research and application. Automated deduction is being conducted using a multiplicity of theorem-proving methods, including resolution, sequent calculi, natural deduction, matrix connection methods, term rewriting, mathematical induction, and others. These methods are implemented using a variety of logic formalisms such as first-order logic, type theory and higher-order logic, clause and Horn logic, non-classical logics, and so on. Automated reasoning programs are being applied to solve a growing number of problems in formal logic, mathematics and computer science, logic programming, software and hardware verification, circuit design, and many others. One of the results of this variety of formalisms and automated deduction methods has been the proliferation of a large number of theorem proving programs. To test the capabilities of these different programs, selections of problems has been proposed against which their performance can be measured (McCharen, Overbeek and Wos 1976, Pelletier 1986). The TPTP (Sutcliffe and Suttner 1998) is a library of such problems that is updated on a regular basis. There is also a competition among automated theorem provers held regularly at the CADE conference (Pelletier, Sutcliffe and Suttner 2002); the problems for the competition are selected from the TPTP library.

Initially, computers were used to aid scientists with their complex and often tedious numerical calculations. The power of the machines was then extended from the numeric into the symbolic domain where infinite-precision computations performed by computer algebra programs have become an everyday affair. The goal of automated reasoning has been to further extend the machine's reach into the realm of deduction where they can be used as reasoning assistants in helping their users establish truth through proof.

## Bibliography

- Andrews, P. B., M. Bishop, S. Issar, D. Nesmith, F. Pfenning and H.
Xi, 1996, "TPS: A Theorem-Proving System for Classical Type Theory",
*Journal of Automated Reasoning*, Vol. 16, No. 3, pp.321-353. - Andrews, P. B., 1981, "Theorem-Proving via General Matings",
*JACM*, Vol. 28, No. 2, pp. 193-214. - Baader, F. and T. Nipkow, 1998,
*Term Rewriting and All That*, Cambridge University Press. - Bauer, A., E. Clarke and X. Zhao, 1998, "Analytica: An Experiment
in Combining Theorem Proving and Symbolic Computation",
*Journal of Automated Reasoning*, Vol. 21, pp. 295-325. - Berndt, B., 1985,
*Ramanujan's Notebooks*, Part I, pp. 25-43, Springer-Verlag. - Bibel, W., 1981, "On Matrices with Connections",
*JACM*, Vol. 28, No. 4, pp. 633-645. - Bledsoe, W. W., 1977, "Non-resolution Theorem Proving",
*Artificial Intelligence*, Vol. 9, pp. 1-35. - Bledsoe, W. W. and M. Tyson, 1975, "The UT Interactive Prover",
*Memo ATP-17A*, Dept. of Mathematics, University of Texas. - Boyer, R. S., M. Kaufmann and J. S. Moore, 1995, "The Boyer-Moore
Theorem Prover and its Interactive Enhancement",
*Computers and Mathematics with Applications*, Vol. 29, pp. 27-62. - Boyer, R.S. and J. S. Moore, 1979,
*A Computational Logic*, Academic Press, New York. - Bundy, A., F. van Harmelen, J. Hesketh and A. Smaill, 1991,
"Experiments with Proof Plans for Induction",
*Journal of Automated Reasoning*, Vol. 7, No. 3, pp. 303-324. - Bundy, A., A. Stevens, F. van Harmelen, A. Ireland and A. Smaill,
1993, "Rippling: A Heuristic for Guiding Inductive Proofs",
*Artificial Intelligence*, Vol. 62, pp. 185-253. - Basin, D. A. and T. Walsh, 1996, "A Calculus for and Termination of
Rippling",
*Journal of Automated Reasoning*, Vol. 16, No. 1/2, pp. 147-180. - Church, A., 1940, "A Formulation of the Simple Theory of Types",
*Journal of Symbolic Logic*, Vol. 5, pp. 56-68. - Chang, C. L. and R. C. T. Lee, 1973,
*Symbolic Logic and Mechanical Theorem Proving*, Academic Press. - Clarke, E. and X. Zhao, 1994, "Combining Symbolic Computation and
Theorem Proving: Some Problems of Ramanujan",
*Proceedings of the 12th International Conference on Automated Deduction*, Nancy, France, Lecture Notes in Artificial Intelligence, A. Bundy, ed., Vol. 814, pp.758-763, Springer-Verlag. - Clocksin, W. F. and C. S. Mellish, 1981,
*Programming in Prolog*, Springer-Verlag. - Colmerauer, A., H. Kanoui, R. Pasero and P. Roussel, 1973,
*Un Système de Communication Homme-machine en Français*, Rapport, Groupe Intelligence Artificielle, Université d'Aix Marseille. - Constable, R. L., S. F. Allen, H. M. Bromley, W.R. Cleaveland, J.
F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P.
Panangaden, J. T. Sasaki and S. F. Smith, 1986,
*Implementing Mathematics with the Nuprl Proof Development System*, Prentice Hall. - Cook, S. A., 1971, "The complexity of Theorem-Proving Procedures",
*Proceedings of the 3rd Annual ACM Symposium on Theory of Computing*, Association for Computing Machinery, New York, pp. 151-158. - Davis, M., G. Logemann and D. Loveland, 1962, "A Machine Program
for Theorem-Proving",
*Communications of the ACM*, Vol. 5, No. 7, pp. 394-397. - Davis, M. and H. Putnam, 1960, "A Computing Procedure for
Quantification Theory",
*JACM*, Vol. 7, No. 3, pp. 201-215. - Fitting, M., 1990,
*First-Order Logic and Automated Theorem Proving*, Springer-Verlag. - Farmer, W. M., J. D. Guttman and F. J. Thayer, 1993, "IMPS: An
Interactive Mathematical Proof System",
*Journal of Automated Reasoning*, Vol. 11, No. 2., pp. 213-248. - Gentzen, G., 1935, "Investigations into Logical Deduction", in Szabo (1969), pp.68-131.
- Gordon, M. J. C. and T. F. Melham, eds., 1993,
*Introduction to HOL: A Theorem Proving Environment for Higher Order Logic*, Cambridge University Press. - Gordon, M. J. C., A. J. Milner and C. P. Wadsworth, 1979,
*Edinburgh LCF: A Mechanised Logic of Computation*, LNCS 78, Springer-Verlag. - Harrison, J., 2001, "High-Level Verification Using Theorem Proving
and Formalized Mathematics",
*Proceedings of the 17th International Conference on Automated Deduction*, Lecture Notes in Artificial Intelligence, D. McAllester, ed., Vol. 1831, pp. 1-6, Springer-Verlag. - Harrison, J. and L. Théry, 1998, "A Skeptic's Approach to
Combining HOL and Maple",
*Journal of Automated Reasoning*, Vol. 21, pp. 279-294. - Hilbert, D. and W. Ackermann, 1928,
*Principles of Mathematical Logic*, Trans. from 1938 ed., Chelsea Publishing Co., New York, 1950. - Herbrand, J., 1930,
*Recherches sur la Theorie de la Demonstration*, Travaux de la Societé des Sciences at des Lettres de Varsovie, Classe III, Science Mathématique et Physique, No. 33, 128. - Huet, G. P., 1975, "A Unification Algorithm for Typed
λ-calculus",
*Theoretical Computer Science*, Vol. 1, pp. 27-57. - Kerber, M., Kohlhase and V. Sorge, 1998, "Integrating Computer
Algebra into Proof Planning",
*Journal of Automated Reasoning*, Vol. 21, pp. 327-355. - Knuth, D. and P. B. Bendix, 1970, "Simple Word Problems in
Universal Algebras",
*Computational Problems in Abstract Algebra*, J. Leech, ed., pp. 263-297, Pergamon Press. - Kleene, S. C., 1962,
*Introduction to Metamathematics*, North-Holland. - Kowalski, R., 1974, "Predicate Logic as a Programming Language",
*Proc. IFIP 74*, North Holland, pp. 569-574. - Küchlin, W. and C. Sinz, 2000, "Proving Consistency Assertions
for Automotive Product Data Management",
*Journal of Automated Reasoning, Special Issue: Satisfiability in the Year 2000*, I. P. Gent and T. Walsh, eds., Vol. 24, Nos. 1-2, pp. 145-163. - Lloyd, J. W., 1984,
*Foundations of Logic Programming*, Springer-Verlag. - Loveland, D. W., 1969, "A Simplified Format for the Model
Elimination Procedure",
*JACM*, Vol. 16, pp. 349-363. - Loveland, D. W., 1970, "A Linear Format for Resolution",
*Proc. IRIA Symp. Automatic Demonstration*, Springer-Verlag, New York, pp. 147-162. - Loveland, D. W., 1978,
*Automated Theorem Proving: A Logical Basis*, North Holland. - Luckham, D., 1970, "Refinements in Resolution Theory",
*Proc. IRIA Symp. Automatic Demonstration*, Springer-Verlag, New York, pp. 163-190. - Martin-Löf, P., 1982, "Constructive Mathematics and Computer
Programming",
*Logic, Methodology and Philosophy of Science*, Vol. IV, pp. 153-175, North-Holland. - Massacci, F. and L. Marraro, 2000, "Logical Cryptanalysis: Encoding
and Analysis of the U.S. Data Encryption Standard",
*Journal of Automated Reasoning, Special Issue: Satisfiability in the Year 2000*, I. P. Gent and T. Walsh, eds., Vol. 24, Nos. 1-2, pp. 165-203. - McCune, W., 1997, "Solution of the Robbins Problem",
*Journal of Automated Reasoning*, Vol. 19, No. 3, pp. 263-276. - McCune, W., 2001,
*MACE 2.0 Reference Manual and Guide*, Mathematics and Computer Science Division, ANL/MSC-TM-249, Argonne National Laboratory. - McRobie, M. A., 1991, "Automated Reasoning and Nonclassical Logics:
Introduction",
*Journal of Automated Reasoning*, Vol. 7, No. 4, pp. 447-451. - Miller, D. and G. Nadathur, 1988, "An Overview of λProlog",
*Proceedings of the Fifth International Logic Programming Conference — Fifth Symposium in Logic Programming*, R. Bowen and R. Kowalski, ed., MIT Press. - McCharen, J. D., R. A. Overbeek and L. A. Wos, 1976, "Problems and
Experiments for and with Automated Theorem-Proving Programs",
*IEEE Transactions on Computers 25 (8)*, pp. 773-782. - Nivens, A. J., 1974, "A Human-Oriented Logic for Automatic Theorem
Proving",
*JACM*, Vol. 21, No. 4, pp. 606-621. - Paulson, L. C., 1994,
*Isabelle: A Generic Theorem Prover*, Lecture Notes in Computer Science, Vol. 828, Springer-Verlag. - Paulson, L. C. and K. Grabczewski, 1996, "Mechanizing Set Theory",
*Journal of Automated Reasoning*, Vol. 17, No. 3, pp.291-323. - Pelletier, F. J., 1986, "Seventy-Five Problems for Testing
Automatic Theorem Provers",
*Journal of Automated Reasoning*, Vol. 2, No. 2, pp.191-216. - Pelletier, F. J., 1998, "Natural Deduction Theorem Proving in
THINKER",
*Studia Logica*60, No. 1, pp. 3-43. - Pelletier, F. J., G. Sutcliffe, and C. Suttner, 2002, "The
Development of CASC",
*AI Communications*, Vol. 15, No. 2-3, pp. 79-90. - Portoraro, F. D., 1994, "Symlog: Automated Advice in Fitch-style
Proof Construction",
*Proceedings of the 12th International Conference on Automated Deduction*, Nancy, France, Lecture Notes in Artificial Intelligence, A. Bundy, ed., Vol. 814, pp. 802-806, Springer-Verlag. - Portoraro, F. D., 1998, "Strategic Construction of Fitch-style
Proofs",
*Studia Logica*, Vol. 60, No. 1, pp. 45-66. - Prasad, M., A. Biere and A. Gupta, 2005, "A Survey of Recent
Advances in SAT-Based Formal Verification",
*Software Tools for Technology Transfer*, Manuscript No. not yet assigned. - Prawitz, D., 1965,
*Natural Deduction: A Proof Theoretical Study*, Almqvist & Wiksell, Stockholm. - Robinson, J. A., 1965, "A Machine Oriented Logic Based on the
Resolution Principle",
*JACM*, Vol. 12, pp. 23-41. - Robinson, J. A., 1965, "Automatic Deduction with Hyper-resolution",
*Internat. J. Comput. Math.*, Vol. 1, pp. 227-234. - Sieg, W. and J. Byrnes, 1996,
*Normal Natural Deduction Proofs (in Classical Logic)*, Report CMU-PHIL 74, Department of Philosophy, Carnegie-Mellon University. - Slaney, J. K., 1984, "3,088 Varieties: A Solution to the Ackerman
Constant Problem",
*Journal of Symbolic Logic*50, pp. 487-501. - Stickel, M. E., 1992, "A Prolog Technology Theorem Prover: A New
Exposition and Implementation in Prolog",
*Theoretical Computer Science 104*, pp. 109-128. - Suppes, P.,
*et al*., 1981, "Part I: Interactive Theorem Proving in CAI Courses",*University-Level Computer-Assisted Instruction at Stanford: 1968-1980*, P. Suppes, ed., IMSSS, Stanford University. - Sutcliffe, G. and C. Suttner, 1998, "The TPTP Problem Library - CNF
Release v1.2.1",
*Journal of Automated Reasoning*, Vol. 21, No. 2, pp. 177-203. - Szabo, M. E., 1969,
*The Collected Papers of Gerhard Gentzen*, M. E. Szabo ed., North-Holland. - Wallen, L. A., 1990,
*Automated Deduction in Nonclassical Logics*, MIT Press. - Wang, H., 1960, "Toward Mechanical Mathematics", in
*Automation of Reasoning*, J. Siekmann and G. Wrightson, eds., Vol. 1, pp. 244-264, Springer-Verlag, 1983. - Wos, L., 2001,
*Journal of Automated Reasoning, Special Issue: Advances in Logic Through Automated Reasoning*, L. Wos, ed., Vol. 27, No. 2. - Wos, L., D. Carson and G. R. Robinson, 1965, "Efficiency and
Completeness of the Set of Support Strategy in Theorem Proving",
*JACM*, Vol. 12, pp. 698-709. - Wos, L., R. Overbeek, E. Lusk and J. Boyle, 1984,
*Automated Reasoning: Introduction and Applications*, Prentice-Hall. - Zhang, L. and S. Malik, 2002, "The Quest for Efficient Boolean
Satisfiability Solvers",
*Proceedings of the 18th International Conference on Automated Deduction*, Lecture Notes in Artificial Intelligence, A. Voronkov, ed., Vol. 2392, pp. 295-313, Springer-Verlag.

## Other Internet Resources

- Automated Reasoning at Argonne
- Automated Reasoning Group at ANU
- HOL Automated Reasoning Group
- Isabelle
- The Nuprl Project
- TPS Theorem Proving System
- The CADE ATP System Competition