===================================================================================================
Ideas about syntactic and algorithmic aspects of Constraint and Logic Programming in Python (CLPiP)
===================================================================================================

** WORK IN PROGRESS **

Introduction
============

Scope
-----

This document tries to shed some light on integration of logic and
constraint programming into Python using the PyPy framework.

This takes place in Work Packages 09 and 10 of the EU PyPy funding
project. The logic and constraint programming features are to be added
to PyPy (WP9). An ontology library will be provided and will serve as
our first use case for logic programming.

PyPy has been progressively equipped with a parser and compiler
flexible enough that developers should be able to leverage them to
extend the language *at runtime*. This is quite in the spirit of Lisp
macros, if not in the exact manner. It is expected that an
aspect-oriented programming toolkit will be built using the compiler
and parser infrastructure (WP10). This will serve the needs of WP9.

Clarifications
--------------

This work was described as the integration of logic programming *and*
constraint programming into PyPy. Both are obviously related and we
have settled on the concurrent logic and constraint programming (CCLP)
model present in the Oz programming language. It allows writing logic
(Prolog-style) programs and using constraint solving techniques in an
integrated manner (as opposed to the use of an external toolkit with a
high impedance mismatch between the language runtime and the
constraint solving package). The relational way will be built on the
constraint solving machinery (much like, in Oz, the *choice* operator
is built on top of *choose*).

This will allow 

Lastly, we mainly discuss syntactic issues here: those are probably
the least difficult aspects of getting CLP into Python; getting an
efficient implementation of the canonical algorithms into PyPy will be
the bulk of the work.

Constraint programming
======================

In constraint programming, a 'problem' is a set of variables, their
(finite discrete) domains, and the constraints that restrict their
possible values (or define the relations between the values). When all
these have been given to a constraint solver, it is possible to find
all possible solutions, that is, the valuations that simultaneously
satisfy all constraints. The solver is solely responsible for finding
solutions (or establishing that there are none).
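
To make these notions concrete, here is a minimal sketch of a naive
solver, written for a current Python 3 interpreter. It simply
enumerates the cartesian product of the domains and filters it, which
is not how a real solver works (real solvers interleave propagation
and search). The ``solve`` function and the toy problem are
assumptions made up for this illustration::

 from itertools import product

 def solve(variables, domains, constraints):
     """Yield every valuation (a dict) that satisfies all constraints.

     constraints is a list of (names, predicate) pairs; the predicate
     receives the values of the named variables, in order.
     """
     for values in product(*(domains[v] for v in variables)):
         valuation = dict(zip(variables, values))
         if all(pred(*(valuation[n] for n in names))
                for names, pred in constraints):
             yield valuation

 # Toy problem: x and y range over 1..3, with x < y and x + y == 4.
 variables = ('x', 'y')
 domains = {'x': [1, 2, 3], 'y': [1, 2, 3]}
 constraints = [(('x', 'y'), lambda x, y: x < y),
                (('x', 'y'), lambda x, y: x + y == 4)]
 solutions = list(solve(variables, domains, constraints))
 # solutions == [{'x': 1, 'y': 3}]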

Overview in python
------------------

At the moment, there exists a *constraints* package made by Logilab
and written in pure Python, which implements some parts of the solver
found in Mozart (the reference Oz implementation). We use it to
illustrate where we want to go, syntax-wise.

Let's start with a quite standard example (the problem being solved
here is fully described on
http://www.logilab.org/projects/constraint/documentation)::


 # import the Repository and Solver classes and the fd module
 from logilab.constraint import *
 variables = ('c01','c02','c03','c04','c05','c06','c07','c08','c09','c10')

Variables are designated by arbitrary string objects::

 values = [(room,slot) 
           for room in ('room A','room B','room C') 
           for slot in ('day 1 AM','day 1 PM','day 2 AM','day 2 PM')]

Values can be freely pre-computed using standard python constructs;
they can be any object; here, tuples of strings::

 domains = {}
 for v in variables:
     domains[v]=fd.FiniteDomain(values)

The relationship between variables and their possible values is set in
a dictionary whose keys are variable designators (strings). Values
are wrapped into FiniteDomain instances (FiniteDomain has set
behaviour, plus some implementation subtleties)::

 groups = (('c01','c02','c03','c10'),
           ('c02','c06','c08','c09'),
           ('c03','c05','c06','c07'),
           ('c01','c03','c07','c08'))
 constraints = []  # filled below, then handed to the Repository
 for g in groups:
     for conf1 in g:
         for conf2 in g:
             if conf2 > conf1:
                 constraints.append(fd.make_expression((conf1,conf2),
                                                       '%s[1] != %s[1]'%\
                                                         (conf1,conf2)))

Constraints are built by make_expression, which takes a tuple of one
or two variables and a string representing a unary or binary
relationship. The complete example, with all its constraints, is
provided at the URL mentioned above.

Then, when everything has been settled, comes the last step::

 r = Repository(variables,domains,constraints)
 solutions = Solver().solve(r)
 print solutions

Remarks
-------

Due to the compactness of Python syntax, this sample problem
specification remains quite small and readable. It is not obvious what
could be done to make it smaller and still readable.

Variables are not first-class (but close ...) and have nothing to do
with Python standard variables. The good side of this is that we
cannot mistake a CSP variable for an ordinary variable.

Specifying a constraint is clunky: variables and operator have to be
provided separately, and the operator has to be a string. This last
restriction exists because Python doesn't allow passing builtin infix
operators as function arguments.

(the following sub-chapters are considered deprecated)

Special operator for CLP variable bindings
------------------------------------------

First, promote variables to second-class citizenry. Be able to write
something like::

 domain = [(room,slot) 
           for room in ('room A','room B','room C') 
           for slot in ('day 1 AM','day 1 PM','day 2 AM','day 2 PM')]
 c01 := domain
 c02 := domain

This introduces a special operator ``:=`` which binds a logical variable
to a domain. More generally::

 var := <any iterable>

With respect to normal assignment, we can imagine the following::

 c01 = 'foo' # raises a NotAssignable or ReadOnly exception
 bar = c01   # takes a reference to the current value of c01 into bar
             # also, meaningless (so ... None) before the solver has run

The problem is that we can no longer write::

 for conf in ('c01','c05','c10'): ...

It would be good to define a kind of first-class designator for this
kind of variable. A specially-crafted class representing variables
(in the manner of Lisp's symbols) would suffice::

 for conf in (c01, c05, c10): ...

Is it worth the price? We are quite unsure.
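
For reference, such a designator class could look like the following
sketch (Python 3). The ``Var`` name and its interning behaviour are
assumptions made for the sake of illustration; nothing of the sort
exists in the constraints package::

 class Var:
     """A named designator for a constraint variable (hypothetical)."""
     _interned = {}

     def __new__(cls, name):
         # Intern by name so that Var('c01') is Var('c01'), as with
         # Lisp symbols.
         try:
             return cls._interned[name]
         except KeyError:
             self = super().__new__(cls)
             self.name = name
             cls._interned[name] = self
             return self

     def __repr__(self):
         return 'Var(%r)' % self.name

 c01, c05, c10 = Var('c01'), Var('c05'), Var('c10')
 names = [conf.name for conf in (c01, c05, c10)]
 # names == ['c01', 'c05', 'c10']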

Domain-specific blocks
----------------------

An alternative which avoids the special operator and uses a keyword
instead could be::

 domain:
    c01 = <iterable>
    c02 = <iterable>

It makes us reuse ``=``, with twisted (non-standard Python) semantics
but under a clear lexical umbrella (a ``domain:`` block).

It is possible to go further in this direction::

 problem toto:
     D1 = <domain definition>
     a,b,c in D1
     def constraint1(a,b,c):
         a == b
     
 for sol in toto:
     print sol

There, we put a full constraints mini-language under a named 'problem'
block. The *problem* becomes a first class object (in the manner of
Python classes) and we can (lazily) extract solutions from it.


Stuffing constraints
--------------------

The ugly aspect of py-constraints is the definition of custom
unary/binary constraints through make_expression, as in::

 fd.make_expression(('var1', 'var2'), "frob(var1,var2)")

One solution might be to parse the string at runtime to recover the
variable names::

 fd.make_expression ('frob(var1,var2)')

A simple hand-written parser could be sufficient for this.  On the
other hand, the lexically bounded mini-language proposed above helps
solve this more uniformly.
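
As a rough sketch of the string-parsing option, the standard ``ast``
module can stand in for a hand-written parser (this is a substitute
technique, not what the constraints package does). The ``free_names``
helper below is hypothetical; it collects the names used in an
expression and skips those that only appear in call position, such as
``frob``::

 import ast

 def free_names(expression):
     """Return the names used in expression, in source order,
     ignoring names that are only called as functions."""
     tree = ast.parse(expression, mode='eval')
     called = {id(node.func) for node in ast.walk(tree)
               if isinstance(node, ast.Call)}
     nodes = [node for node in ast.walk(tree)
              if isinstance(node, ast.Name) and id(node) not in called]
     # ast.walk gives no ordering guarantee, so sort by position.
     nodes.sort(key=lambda node: (node.lineno, node.col_offset))
     names = []
     for node in nodes:
         if node.id not in names:
             names.append(node.id)
     return names

 variables = free_names('frob(var1,var2)')
 # variables == ['var1', 'var2']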

.. _`logic programming`:

Logic Programming, Prolog and Oz-style
======================================

Integrating search seamlessly into an already grown-up imperative
programming language might cause some headaches. For instance, with a
view to benefiting from the Prolog way of doing logic programming, we
considered embedding a specific mini-language into Python with strong
and well-defined borders between the logic world and the 'normal', or
usual imperative world of standard Python.

Fortunately, such a twist might not be needed. The designers of Oz
have devised another way of doing logic programming that is certainly
much easier to integrate into current Python than bolting Prolog onto
it.

Criticism of Prolog
-------------------

The Prolog style of logic programming, while successfully applied to
many real-world problems, is not without defects. These can be
summarized as follows:

* Prolog is a compromise between the relational and algorithmic
  styles of programming. It does not go far enough on the side of
  relations/constraints (Horn clauses have limited expressiveness) and
  makes it unnatural to write algorithmic (non-searching) code.

* It is completely unsuitable for concurrent programming, which is
  gaining importance as new languages provide high-level, hassle-free
  constructs for it, and which is a necessity for building reactive
  applications.


Integration in PyPy
-------------------

The PyPy sprint in Belgium, which focused on constraint and logic
programming, produced an implementation of a so-called 'Logic
Objectspace'. It extends PyPy's standard object space (the one
implementing standard Python operations) with two things:

* micro threads (which despite their names are merely, currently,
  greenlets, that is a kind of coroutine)

* Logic Variables, that is a new datatype acting as a box for normal
  Python values. 


Logic Variables
---------------

Logic variables have two states: free and bound. A bound logic
variable is indistinguishable from the normal Python value which it
wraps. A free variable can only be bound once (it is also said to be a
single-assignment variable).

The operation that binds a logic variable is known as
"unification". Unify is an operator that takes two arbitrary data
structures and checks whether they are the same, much in the sense
of the == operator, but with one twist: unify is a "destructive"
operator when it comes to logic variables.

Unifying an unbound variable with some value means assigning the
value to the variable (which then satisfies equality); unifying two
unbound variables aliases them (they are constrained to reference the
same future value). Thus unify can change the state of the world, and
it raises a UnificationError exception whenever it fails, instead of
returning False like an equality predicate.

Assignment or aliasing of variables is provided by the 'bind' operator.
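
The following Python 3 sketch illustrates these rules outside of
PyPy. It is not the Logic Objectspace implementation: the ``LogicVar``
class and the ``deref`` helper are assumptions for illustration, only
``unify`` is shown (not ``bind``), and bindings made before a failure
are not undone::

 class UnificationError(Exception):
     pass

 class LogicVar:
     """Single-assignment variable (illustrative sketch only)."""
     _FREE = object()

     def __init__(self):
         self._value = LogicVar._FREE  # the value, once bound
         self._alias = None            # variable we are aliased to

     def _root(self):
         var = self
         while var._alias is not None:
             var = var._alias
         return var

     def is_free(self):
         return self._root()._value is LogicVar._FREE

     def value(self):
         return self._root()._value

 def deref(term):
     """Replace a bound variable by its value, a free one by its root."""
     if isinstance(term, LogicVar):
         root = term._root()
         return root if root._value is LogicVar._FREE else root._value
     return term

 def unify(a, b):
     a, b = deref(a), deref(b)
     if a is b:
         return
     if isinstance(a, LogicVar):
         if isinstance(b, LogicVar):
             a._alias = b              # two free variables: alias them
         else:
             a._value = b              # free variable and value: bind
     elif isinstance(b, LogicVar):
         b._value = a
     elif (isinstance(a, (list, tuple)) and type(a) is type(b)
           and len(a) == len(b)):
         for x, y in zip(a, b):        # unify structures element-wise
             unify(x, y)
     elif a != b:
         raise UnificationError('%r does not unify with %r' % (a, b))

 X, Y = LogicVar(), LogicVar()
 unify(X, Y)                # aliases X and Y
 unify((X, 2), (1, 2))      # binds X, and therefore Y, to 1
 # unify(Y, 3) would now raise UnificationError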

Threads and dataflow synchronisation
------------------------------------

When a piece of code tries to access a free logic variable, the thread
in which it runs is blocked (suspended) until the variable becomes
bound. This behaviour is known as "dataflow synchronisation" and
mirrors the behaviour of dataflow variables in Oz. With respect to
behaviour under concurrency conditions, logic variables come with
three operators:

* wait: suspends the current thread until the variable is bound, then
  returns its value; if the variable is already bound, the value is
  returned immediately (in the logic objectspace, all operators make
  an implicit, transparent wait on their value arguments)

* wait_needed: suspends the current thread until the variable has
  received a wait message. It has to be used explicitly, typically by
  a producer thread that wants to produce data only when needed.

* bind: binding a variable to a value makes runnable all the threads
  suspended on this variable.

Wait and wait_needed make it possible to write efficient, lazily
evaluated code.
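
The interplay of the three operators can be sketched with ordinary
operating system threads (Python 3, ``threading`` module). This is
only an approximation under our own assumptions: the Logic Objectspace
relies on micro threads and makes the wait implicit, and the
``DataflowVar`` class and the ``ValueError`` raised on rebinding are
made up for the example::

 import threading

 class DataflowVar:
     """Sketch of a dataflow variable on top of OS threads."""
     def __init__(self):
         self._lock = threading.Lock()
         self._bound = threading.Event()
         self._needed = threading.Event()
         self._value = None

     def bind(self, value):
         with self._lock:
             if self._bound.is_set():
                 raise ValueError('variable is already bound')
             self._value = value
             self._bound.set()    # wakes every thread blocked in wait()

     def wait(self):
         self._needed.set()       # tell producers the value is wanted
         self._bound.wait()       # block until bind() has been called
         return self._value

     def wait_needed(self):
         self._needed.wait()      # block until some thread calls wait()

 def producer(var, log):
     var.wait_needed()            # stay idle until the value is requested
     log.append('producing')
     var.bind(42)

 X = DataflowVar()
 log = []
 worker = threading.Thread(target=producer, args=(X, log))
 worker.start()
 log.append('asking')
 result = X.wait()                # triggers the producer, then blocks
 worker.join()
 # result == 42 and log == ['asking', 'producing']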

Relation with logic programming
-------------------------------

All of this is not sufficient without a specific non-deterministic
primitive operator added to the language. In Oz, the 'choice' operator
makes it possible to statically enumerate a set of possible actions,
leaving the actual decision between the branches to the solver.

Let us look at a small relational program written respectively in
Prolog, Oz and extended Python.

Prolog ::

 soft(beige).
 soft(coral).
 hard(mauve).
 hard(ochre).

 contrast(C1, C2) :- soft(C1), hard(C2).
 contrast(C1, C2) :- hard(C1), soft(C2).

 suit(Shirt, Pants, Socks) :- contrast(Shirt, Pants),
   contrast(Pants, Socks), Shirt \= Socks.

Oz ::

 fun {Soft} choice beige [] coral end end
 fun {Hard} choice mauve [] ochre end end

 proc {Contrast C1 C2}
     choice C1={Soft} C2={Hard}[] C1={Hard} C2={Soft} end
 end

 fun {Suit}
     Shirt Pants Socks
 in
     {Contrast Shirt Pants}
     {Contrast Pants Socks}
     if Shirt==Socks then fail end
     suit(Shirt Pants Socks)
 end

Python ::

 def soft():
     choice: 'beige' or: 'coral'

 def hard():
     choice: 'mauve' or: 'ochre'

 def contrast(C1, C2):
     choice:
        unify(C1, soft())
        unify(C2, hard())
     or:
        unify(C1, hard())
        unify(C2, soft())

 def suit():
     let Shirt, Pants, Socks:
         contrast(Shirt, Pants)
         contrast(Pants, Socks)
         if Shirt == Socks: raise UnificationError
         return (Shirt, Pants, Socks)

Since our variables (those created by the let declaration) really are
logic variables, and thus can be bound only once, the solver must take
special measures to get all the solutions. The trick is that the
solver uses the Computation Space machinery for constraint solving.
Basically, a computation space is like an independent world in which
a specific, unique combination of choices is tried and eventually a
(locally) unique solution is produced. The solver uses as many
computation spaces as necessary to eventually enumerate all possible
solutions.
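
The enumeration itself can be imitated in plain Python 3 with
generators, which is a different technique from computation spaces:
there are no logic variables here, each ``for`` loop plays the role of
a choice point, and skipping an iteration plays the role of a failure.
The sketch below only shows which solutions the relational program
above is expected to produce::

 def soft():
     yield 'beige'
     yield 'coral'

 def hard():
     yield 'mauve'
     yield 'ochre'

 def contrast():
     """Yield every (C1, C2) pair of one soft and one hard colour."""
     for c1 in soft():            # first alternative of the choice
         for c2 in hard():
             yield c1, c2
     for c1 in hard():            # second alternative
         for c2 in soft():
             yield c1, c2

 def suit():
     for shirt, pants in contrast():
         for pants2, socks in contrast():
             if pants2 != pants:  # both calls must agree on Pants
                 continue
             if shirt == socks:   # stands for the failure
                 continue
             yield shirt, pants, socks

 solutions = list(suit())
 # 8 solutions, the first one being ('beige', 'mauve', 'coral')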


Done and To Do
==============

What has been done
------------------

For constraint programming:

* A prototype concurrent solver using Computation Spaces, CPython
  threads and the existing logilab constraints package has been built
  and shown to work. Note that it is not complete: the part that
  allows logic programming to be used is missing, as it was only
  recently figured out.

* A simulation of dataflow/logic variables has been built.

For logic programming:

* A prototype logic object space has been built, which provides
  integrated logic variables with dataflow semantics and an incomplete
  implementation of micro threads.

What needs to be done 
---------------------

For constraint programming:

* Reimplement the computation space as an entity of the logic object
  space (we think that reusing thread pickling abilities would be an
  interesting start).

* Adapt the core algorithms to RPython

For logic programming:

* Provide a basic solver (depth-first), and a working choose() space
  operator to allow search as provided in logic programming.

For both:

* Adapt the parser/lexer to the syntactic requirements (which should
  be light).


