============================================================
                          RCtypes
============================================================

.. contents:: :depth: 3
.. sectnum::


------------------------------------------------------------
                        Using RCtypes
------------------------------------------------------------

ctypes_ is a library for CPython that has been around for some years,
and that is now integrated in Python 2.5.  It allows Python programs to
directly perform calls to external libraries written in C, by declaring
function prototypes and all necessary data structures dynamically in
Python.  It is based on the platform-dependent libffi_ library.

*RCtypes* is ctypes restricted to be used from RPython programs.
Basically, an RPython program that is going to be translated to C or a
C-like language can use (a subset of) ctypes more or less normally.

During the translation of the RPython program to C, the source code --
which contains dynamic foreign function calls written using ctypes --
becomes static C code that performs the same calls.  In other words,
once compiled within an RPython program, ctypes constructs are all
replaced by regular, static C code that no longer uses the *libffi*
library.  This makes ctypes a very good way to interface with any C
library -- not only from the PyPy interpreter, but from any program or
extension module that is written in RPython.

This document assumes some basic familiarity with ctypes.  It is
recommended to first look up the `ctypes documentation`_.  As an
appendix to this document we also point out the `Finest issues in the
behavior of ctypes`_, which are not explicitly documented anywhere else
to the best of our knowledge.


Restrictions
============

The main restriction that RCtypes adds over ctypes is similar to the
general restriction that RPython_ adds over Python: you cannot be "too
dynamic" at run-time, but you can be as dynamic as you want during
bootstrapping, i.e. when the source code is imported.

For RCtypes, this means that all ctypes type and function declarations
must be done while bootstrapping.  You cannot declare new types and
functions from inside RPython functions that are going to be translated
to C.  In practice, this is a fairly soft limitation: most modules using
ctypes would typically declare all functions globally at module-level
anyway.
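As an illustration of this split, here is a minimal sketch in plain
ctypes on CPython.  The names ``Point``, ``PointPtr``, ``manhattan`` and
``entry_point`` are made up for the example.  All types are declared at
module level, i.e. while bootstrapping, and the functions that would be
translated only use them::

    from ctypes import POINTER, Structure, c_int

    # Bootstrapping: every ctypes type is declared once, at module level.
    class Point(Structure):
        _fields_ = [("x", c_int), ("y", c_int)]

    PointPtr = POINTER(Point)

    def manhattan(p):
        # Run-time code only uses the types declared above.
        return abs(p.contents.x) + abs(p.contents.y)

    def entry_point():
        pt = Point(3, -4)
        return manhattan(PointPtr(pt))

Declaring ``Point`` or ``PointPtr`` inside ``entry_point()`` instead
would be the kind of run-time dynamism that the restriction rules out.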

Apart from this, you can use many ctypes constructs freely: all
primitive types except ``c_float``, ``c_wchar`` and ``c_wchar_p`` (they
will be added at a later time), pointers, structures, arrays and external
functions.  The following ctypes functions are supported at run-time:
``create_string_buffer()``, ``pointer()``, ``POINTER()``, ``sizeof()``
and ``cast()``.  (Adding support for more is easy.)  There is some
support for unions, for callbacks and for a few more obscure ctypes
features, but e.g. custom allocators and return-value error checkers are
not supported.
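The following sketch exercises the run-time functions listed above in
plain ctypes on CPython.  The function name ``demo`` is made up, and the
example has only been written against regular ctypes, not run through
the translation toolchain::

    from ctypes import POINTER, c_char, c_int, cast
    from ctypes import create_string_buffer, pointer, sizeof

    def demo():
        n = c_int(41)
        p = pointer(n)                  # instance of POINTER(c_int)
        p.contents.value += 1           # writes through to "n"

        buf = create_string_buffer(8)   # 8 zero-initialized chars
        buf[0] = b"A"
        # POINTER() is called with a constant type argument.
        first = cast(buf, POINTER(c_char))[0]
        return n.value, sizeof(buf), first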

Remember to use ``None`` as the return type in the function declaration
if it returns void.
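A minimal sketch of the ``None`` convention in plain ctypes.  To stay
independent of any particular shared library, it builds the function
from a ``CFUNCTYPE`` prototype wrapping a Python callback (the names
``VoidIntPtrFunc`` and ``increment`` are made up).  For a function loaded
from a library, the same ``None`` would be assigned to its ``restype``
attribute::

    from ctypes import CFUNCTYPE, POINTER, c_int, pointer

    # Prototype of a C function "void f(int *)": the first argument is
    # the return type, and None stands for void.
    VoidIntPtrFunc = CFUNCTYPE(None, POINTER(c_int))

    def _increment(p):
        p[0] += 1                   # modify the int in place

    increment = VoidIntPtrFunc(_increment)

    counter = c_int(41)
    result = increment(pointer(counter))    # a void call returns None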

There is special support for the special variable ``errno`` of C: from
the RPython program, access it with the helper function
``pypy.rpython.rctypes.aerrno.geterrno()``.

Note that the support for the POINTER() function is an exception to the
rule that types should not be manipulated at run-time.  You can only
call it with a constant type, as in ``POINTER(c_int)``.

Another exception is that
an expression like ``c_int * n`` can appear at run-time.  It can
be used to create variable-sized array instances, i.e. arrays whose
length is not a static constant, as in ``my_array = (c_int * n)()``.
Similarly, the create_string_buffer() function returns a variable-sized
array of chars.
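For instance, in plain ctypes a helper like the following (the name
``make_squares`` is made up) builds an array whose length is only known
when the function is called::

    from ctypes import c_int, sizeof

    def make_squares(n):
        # The array type "c_int * n" is built at run-time from a
        # length that is not a static constant.
        my_array = (c_int * n)()
        for i in range(n):
            my_array[i] = i * i
        return my_array

    squares = make_squares(5)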

NOTE: in order to translate an RPython program using ctypes, the module
``pypy.rpython.rctypes.implementation`` must be imported!  This is
required to register to the annotator/rtyper how the ctypes constructs
should be translated.  Just import this module from your RPython
program, in addition to importing ctypes.


.. _installation:

ctypes version and platform notes
=================================

`ctypes-0.9.9.6`_ or later is required.

On Mac OS X 10.3 (at least with ctypes-0.9.9.6)
you need to change the ``RTLD_LOCAL`` default
in ``ctypes/__init__.py``, line 293, to::

    def __init__(self, name, mode=RTLD_GLOBAL, handle=None):

otherwise it will complain with "unable to open this file with RTLD_LOCAL"
when trying to load the C library. 


------------------------------------------------------------
                    Implementation design
------------------------------------------------------------

We have experimented with two different implementation approaches.  The
first one is relatively mature, but does not do the right thing about
memory management issues in the most advanced cases.  Moreover, it
prevents the RPython program from being compiled with a Moving Garbage
Collector, which has been the major factor stopping us from
experimenting with advanced GCs so far.  It is described in the chapter
`RCtypes implemented in the RTyper`_.

The alternative implementation of RCtypes is work in progress; its
approach is described in the chapter `RCtypes implemented via a pure
RPython interface`_.


RCtypes implemented in the RTyper
=================================

The currently available implementation of RCtypes works by integrating
itself within the annotation and most importantly RTyping_ process of
the translation toolchain: the ctypes objects that the RPython program
uses, and the operations it performs on them, are individually replaced
during RTyping by sequences of low-level operations performing the
equivalent operation in a C-like language.


Annotation
----------

Ctypes on CPython tracks all the memory it has allocated by itself,
whether it is referenced by pointers and structures or only by pointers.
Thus memory allocated by ctypes is properly garbage collected and no
dangling pointers should arise.

Pointers to structures returned by an external function are a different
story. For such pointers we have to assume that they were not allocated
by ctypes, and just keep a pointer to the memory, hoping that it will
not go away.  This is the same as in ctypes.

For objects that are known to be allocated by RCtypes, a simple GC-aware
box suffices to hold the needed memory.  For other objects, an extra
indirection is required.  Thus we use the annotator to track which
objects are statically known to be allocated by RCtypes.

RTyping: Memory layout
----------------------

In ctypes, all instances are mutable boxes containing either some raw
memory with a layout compatible to that of the equivalent C type, or a
reference to such memory.  The reference indirection is transparent to
the user; for example, dereferencing a ctypes object "pointer to
structure" results in a "structure" object that doesn't include a copy
of the data, but only a reference to that data.  (This is similar to the
C++ notion of reference: it is just a pointer at the machine level, but
at the language level it behaves like the object that it points to, not
like a pointer.)
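The reference semantics can be observed directly in ctypes on CPython.
In this small sketch (the ``Pair`` structure is made up for the
example), ``p.contents`` is a new Python object that shares the memory
of ``pair`` rather than copying it::

    from ctypes import Structure, c_int, pointer

    class Pair(Structure):
        _fields_ = [("a", c_int), ("b", c_int)]

    pair = Pair(1, 2)
    p = pointer(pair)

    view = p.contents       # a new box, but no copy of the data
    view.a = 99             # writes into the memory owned by "pair"

    same_object = view is pair      # False: two distinct boxes
    shared_value = pair.a           # 99: one piece of raw memory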

The rtyper maps this to the LLType model as follows.  For boxes that
embed the raw memory content (i.e. are known to own their memory)::

    Ptr( GcStruct( "name_owner",
           ("c_data", Struct(...) ) ) )

where the raw memory content and layout is specified by the
"Struct(...)" part.

The key point here is that the "Struct(...)" part follows exactly the C
layout, with no extra headers.  If it contains pointers, they point to
exact C data as well, without headers, and so on.  From C's point of
view this is all perfectly legal C data in the format expected by the
external C library -- because the C library never sees pointers to the
whole GcStruct boxes.

In other words, we have a two-level picture of what data looks like in a
translated RPython program: at the RPython level, it contains a graph of
GcStructs, with GC headers, and sometimes fields pointing at each other
for the purpose of keeping the necessary GcStructs alive.  At the same
time, *inside* these GcStructs, there is a part that looks exactly like C
data structures.  This part can contain pointers to other C data
structures, which may live within other GcStructs.  So the C libraries
will only see and modify the C data that is embedded in the "c_data"
substructure of GcStructs.  This is somewhat similar to the idea that if
you call malloc() in a C program, you actually get a block of memory
that is embedded in some larger block with malloc-specific headers; but
at the level of C it doesn't matter because you only pass around and
store pointers to the "embedded sub-blocks" of memory, never to the
whole block with all its headers.

Owner/alias boxes
~~~~~~~~~~~~~~~~~

Here is again the case of boxes that embed the raw memory content (owner
boxes)::

    Ptr( GcStruct( "name_owner",
           ("c_data", Struct(...) ) ) )

For boxes that don't embed the raw memory content (alias boxes)::

    Ptr( GcStruct( "name_alias",
           ("c_data_owner_keepalive", Ptr( GcStruct( "name_owner" ) ) ),
           ("c_data", Ptr( Struct(...) ) ) ) )

In both cases, the outer GcStruct is needed to make the boxes tracked by
the GC automatically.  The "c_data" field either embeds or references
the raw memory (we reuse the same field name to simplify writing ll
helpers).  The nested "Struct(...)" definition specifies the exact C
layout expected for that memory.

In the alias case, the "c_data_owner_keepalive" pointer is used for
cases where the memory was actually allocated by Ctypes, even though the
annotator didn't figure it out statically.  This pointer field is set in
this case to the owner GcStruct, to keep it alive.  Note that the
"c_data" pointer will then point inside the *same* owner GcStruct -- it
will point to the structure "c_data_owner_keepalive.c_data".

Of course, the "c_data" field is not visible to the RPython-level user.
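The owner/alias distinction can be imitated in plain ctypes on CPython,
which may help to see why the keepalive field is needed.  In this toy
model the class names ``OwnerBox`` and ``AliasBox`` are made up, and
``from_address()`` stands in for a raw pointer into the owner's memory:
it neither copies the data nor keeps the owner alive, so the alias box
has to hold an explicit reference::

    from ctypes import Structure, addressof, c_int

    class CData(Structure):             # plays the role of Struct(...)
        _fields_ = [("value", c_int)]

    class OwnerBox(object):
        """Toy 'name_owner': embeds the raw memory."""
        def __init__(self, value):
            self.c_data = CData(value)

    class AliasBox(object):
        """Toy 'name_alias': references memory owned elsewhere."""
        def __init__(self, owner):
            self.c_data_owner_keepalive = owner
            # Points inside the same owner box, without copying.
            self.c_data = CData.from_address(addressof(owner.c_data))

    owner = OwnerBox(5)
    alias = AliasBox(owner)
    alias.c_data.value = 6              # visible through the owner too

This is only an analogy at the Python level; the real boxes are the
LLType structures shown above.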

Primitive Types
~~~~~~~~~~~~~~~
Ctypes' primitive types are mapped directly to the corresponding PyPy
LLType: Signed, Float, etc.  For the owned-memory case, we get::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data"
                    (Struct "C_Data_<TypeName>"
                            ( "value", Signed/Float/etc. ) ) ) ) )

Note that we don't make "c_data" itself a Signed or Float directly because
in LLType we can't take pointers to Signed or Float, only to Struct or
Array.

The non-owned-memory case is::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data_owner_keepalive" ... ),
            ( "c_data"
                    (Ptr(Struct "C_Data_<TypeName>"
                            ( "value", Signed/Float/etc. ) ) ) ) ) )

Pointers
~~~~~~~~

::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data"
                    (Struct "C_Data_<TypeName>"
                            ( "value", Ptr(...) ) ) ) ) )

or::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data_owner_keepalive" ... ),
            ( "c_data"
                    (Ptr(Struct "C_Data_<TypeName>"
                            ( "value", Ptr(...) ) ) ) ) ) )

However, there is a special case here: the pointer might point to data
owned by another CtypesBox -- i.e. it can point to the "c_data" field of
some other CtypesBox.  In this case we must make sure that the other
CtypesBox stays alive.  This is done by adding an extra field
referencing the gc box (this field is not otherwise used)::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data"
                    (Struct "C_Data_<TypeName>"
                            ( "value", Ptr(...) ) ) )
            ( "keepalive"
                    (Ptr(GcStruct("CtypesBox_<TargetTypeName>"))) ) ) )

Note that the above example shows the memory-owning case.  As usual, its
memory-aliasing equivalent also contains the "c_data_owner_keepalive"
field::

    Ptr( GcStruct( "CtypesBox_<TypeName>"
            ( "c_data_owner_keepalive" ... ),
            ( "c_data"
                    (Ptr(Struct "C_Data_<TypeName>"
                            ( "value", Ptr(...) ) ) ) )
            ( "keepalive"
                    (Ptr(GcStruct("CtypesBox_<TargetTypeName>"))) ) ) )

The two keepalive fields are easy to confuse with each other, but they
have different types and goals.  The "c_data_owner_keepalive" is used if
the place where the *pointer itself* is stored needs to be kept alive.
The "keepalive" field is used if whatever it is that the pointer *points
to* needs to be kept alive.

Structures
~~~~~~~~~~
Structures have the following memory layout (owning their raw memory)
if they were allocated by ctypes::

    Ptr( GcStruct( "CtypesBox_<StructName>"
            ( "c_data" 
                    (Struct "C_Data_<StructName>"
                            *<FieldDefinitions>) ) ) )

For structures obtained by dereferencing a pointer (by reading its
"contents" attribute), the structure box does not own the memory::

    Ptr( GcStruct( "CtypesBox_<StructName>"
            ( "c_data_owner_keepalive" ... ),
            ( "c_data" 
                    (Ptr(Struct "C_Data_<StructName>"
                            *<FieldDefinitions>) ) ) ) )

One or several keepalive fields might be necessary in each case: more
precisely, for every pointer field that the structure contains, we need
an extra keepalive field.  It is set when the value stored into the
pointer field of the structure is a pointer to RCtypes-owned memory.
In this case, the corresponding keepalive field of the GcStruct is made
to point to the target GcStruct.
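Ctypes on CPython has to solve the same problem, and does so with hidden
references.  In the following sketch (the ``Node`` structure is made
up), the ``c_int`` stored through the pointer field has no other
reference once ``make_node()`` returns, yet it stays valid for as long
as the structure does::

    import gc
    from ctypes import POINTER, Structure, c_int, pointer

    class Node(Structure):
        _fields_ = [("count", c_int), ("data", POINTER(c_int))]

    def make_node():
        node = Node()
        node.count = 1
        # The c_int and the pointer object are temporaries; storing
        # the pointer in the field makes the structure keep them alive.
        node.data = pointer(c_int(7))
        return node

    node = make_node()
    gc.collect()
    value = node.data[0]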

Arrays
~~~~~~
Arrays behave like structures, but use an Array instead of a Struct in
the "c_data" declaration.

For similar reasons, arrays of pointers have an additional array of
keepalives.


RCtypes implemented via a pure RPython interface
================================================

As of January 2007, a second implementation of RCtypes is being
developed.  The basic conclusion on the work on the first implementation
is that it is quite tedious to have to write code that generates
detailed low-level operations for each high-level ctypes operation - and
most importantly, it is not flexible enough for our purposes.  The major
issue is that problems were recently found in the corner cases of memory
management, i.e. when ctypes objects should keep other ctypes objects
alive and for how long.  Fixing this would involve some pervasive
changes in the low-level representation of ctypes objects, which would
require all the RCtypes RTyping code to be updated.  Similarly, to be
able to use a Moving Garbage Collector, the memory that is visible to
the external C code needs to be separated from the GC-managed memory that
is used for the RCtypes objects themselves; this would also require a
complete upgrade of all the RTyping code.


The rctypesobject RPython library
---------------------------------

To solve this, we started by writing a complete RPython library that
offers the same functionality as ctypes, but with a more regular,
RPython-compatible interface.  This library is in the
`pypy/rlib/rctypes/rctypesobject.py`_ module.  All operations use either
static or instance methods on classes following a structure similar to
the ctypes classes.  The flexibility of writing a normal RPython library
allowed us to use internal support classes freely, together with data
structures appropriate to the memory management needs.

This interface is more tedious to use than ctypes' native interface, but
it is not meant for manual use.  Instead, the translation toolchain maps
uses of the usual ctypes interface in the RPython program to the more
verbose but more regular rctypesobject interface.

The rctypesobject library is mostly implemented as a family of functions
that build and return new classes.  For example, ``Primitive(Signed)``
returns the class corresponding to C objects of the low-level type
``Signed`` (corresponding to ``long`` in C),
``RPointer(Primitive(Signed))`` is the class corresponding to the type
"pointer to long" and ``RVarArray(RPointer(Primitive(Signed)))`` returns
the class corresponding to arrays of pointers to longs.

Each of these classes exposes a similar interface: the ``allocate()``
static method returns a new instance with its own memory storage
(allocated with a separate ``lltype.malloc()`` as regular C memory, and
automatically freed from the ``__del__()`` method of the instance).  For
example, ``Primitive(Signed).allocate()`` returns an instance that
manages a word of C memory, large enough to contain just one ``long``.
If ``x`` designates this instance, the expression ``pointer(x)``
allocates and returns an instance of class
``RPointer(Primitive(Signed))`` that manages a word of C memory, large
enough to hold a pointer, and initialized to point to the previous
``long`` in memory.
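The "function that builds and returns a class" pattern can be imitated
in a few lines of ordinary Python.  The sketch below is only a toy built
on top of ctypes: it borrows the names ``Primitive`` and ``allocate()``
from the text, but the ``storage`` attribute, the cache and the use of
``ctypes.c_long`` in place of ``Signed`` are assumptions of this example
and do not describe the actual rctypesobject code::

    import ctypes

    _primitive_classes = {}

    def Primitive(c_type):
        """Toy factory: build (once) and return a class for 'c_type'."""
        if c_type not in _primitive_classes:
            class PrimitiveBox(object):
                def __init__(self, storage):
                    self.storage = storage      # C-compatible memory

                @staticmethod
                def allocate():
                    # New instance with its own zero-initialized storage.
                    return PrimitiveBox(c_type())

            _primitive_classes[c_type] = PrimitiveBox
        return _primitive_classes[c_type]

    LongBox = Primitive(ctypes.c_long)
    x = LongBox.allocate()
    x.storage.value = 5

Calling the factory twice with the same argument returns the same class,
so that all objects of a given C type share one implementation.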

The details of the interface can be found in the library's accompanying
test file, `pypy/rlib/rctypes/test/test_rctypesobject.py`_.


ControllerEntry
---------------

We implemented a generic way in the translation toolchain to map
operations on arbitrary Python objects to calls to "glue" RPython
classes and methods.  This mechanism can be found in
`pypy/rpython/controllerentry.py`_.  In short, it allows code to
register, for each full Python object, class, or metaclass, a
corresponding "controller" class.  During translation, the controller is
invoked when the Python object, class or metaclass is encountered.  The
annotator delegates to the controller the meaning of all operations -
instantiation, attribute reading and setting, etc.  By default, this
delegation is performed by replacing the operation by a call to a method
of the controller; in this way, the controller can be written simply as
a family of RPython methods with names like:

* ``new()`` - called when the source RPython program tries to
  instantiate the full Python class

* ``get_xyz()`` and ``set_xyz()`` - called when the RPython program
  tries to get or set the attribute ``xyz``

* ``getitem()`` and ``setitem()`` - called when the RPython program uses
  the ``[ ]`` notation to index the full Python object

If necessary, the controller can also choose to entirely override the
default annotating and rtyping behavior and insert its own.  This is
useful for cases where the method cannot be implemented in RPython,
e.g. in the presence of a polymorphic operation that would cause the
method signature to be ill-typed.
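The naming convention for controller methods listed above can be
pictured with a plain-Python imitation.  Nothing below uses the real
``controllerentry`` module: the ``PointController`` class, the
``delegate()`` helper and the exact method signatures are assumptions
made for the sake of illustration::

    class PointController(object):
        """Toy controller for a hypothetical full-Python Point class."""

        def new(self, x, y):
            return [x, y]               # stand-in low-level object

        def get_x(self, obj):
            return obj[0]

        def set_x(self, obj, value):
            obj[0] = value

        def getitem(self, obj, index):
            return obj[index]

    def delegate(controller, operation, obj, *args):
        # Stand-in for the default delegation: an operation is replaced
        # by a call to the controller method of the matching name.
        return getattr(controller, operation)(obj, *args)

    ctl = PointController()
    pt = ctl.new(1, 2)
    delegate(ctl, "set_x", pt, 10)      # plays the role of "pt.x = 10"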

For RCtypes, we implemented controllers that map the regular ctypes
objects, classes and metaclasses to classes and operations from
rctypesobject.  This turned out to be a good way to separate the
RCtypes implementation issues from the (sometimes complicated)
interpretation of ctypes' rich and irregular interface.

Performance
-----------

The greatest advantage of the ControllerEntry approach over the direct
RTyping approach of the first RCtypes implementation is its higher
level, giving flexibility.  This is also potentially a disadvantage:
there is for example no owner/alias analysis done during annotation;
instead, all ctypes objects of a given type are implemented identically.
We think that the general optimizations that we implemented - most
importantly `malloc removal`_ - are either good enough to remove the
overhead in the common case, or can be made good enough with some more
effort.

XXX work in progress.


------------------------------------------------------------
                        Appendix
------------------------------------------------------------

.. _cpython-behavior:

Finest issues in the behavior of ctypes
=======================================

For reference, this section describes ctypes behaviour on CPython, as
far as it is known to the author and relevant for rctypes.  Of course,
you should consult the `ctypes documentation`_ for more information.
The test file `pypy/rpython/rctypes/test/test_ctypes.py`_ also exercises
many corner cases of ctypes that were relevant for the rctypes
implementation design.

Primitive Types
---------------
All primitive types behave like their CPython counterparts.
Some types like ``c_long`` and ``c_int`` are identical on some hardware
platforms, depending on the size of C's ``int`` type on those platforms.
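This can be checked on any given platform.  In ctypes on CPython,
``c_int`` is an alias for ``c_long`` when both C types have the same
size, so the two flags below always agree::

    from ctypes import c_int, c_long, sizeof

    same_size = sizeof(c_int) == sizeof(c_long)
    same_type = c_int is c_long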


Heap Allocated Types
--------------------
Ctypes deals with heap allocated instances of types in a simple
and straightforward manner.  However, its documentation leaves
some corners unclear when it comes to heap allocated types.

Structures
~~~~~~~~~~
Structures allocated by ctypes

Structure instances in ctypes are actually proxies for C-compatible
memory. The behaviour of such instances is illustrated by the following
example structure throughout this document::

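    from ctypes import Structure, c_float, c_int, pointer
    from ctypes import memmove, sizeof    # used by the examples below

    class TS( Structure ):
        _fields_ = [ ( "f0", c_int ), ( "f1", c_float ) ]

    ts0 = TS( 42, -42 )
    ts1 = TS( -17, 17 )

    p0 = pointer( ts0 )
    p1 = pointer( ts1 )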

You cannot assign structures by value as in C or C++.
But ctypes provides a ``memmove()`` function just like C does.
You do not need to pass ctypes pointers to the structures to
``memmove()``.  Instead you can pass the structures directly, as in the
example below::

    memmove( ts0, ts1, sizeof( ts0 ) )
    assert ts0.f0 == -17
    assert ts0.f1 == 17 

Structures created from foreign pointers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The author is currently not sure whether the text below
is correct in all aspects, but carefully planned trials did
not provide any evidence to the contrary.

Structure instances can also be created by dereferencing pointers
that were returned by a call to an external function.

More on pointers in the next section.

Pointers
--------

Pointer types are created by a factory function named ``POINTER``.
Pointers can be created either by calling, and thus instantiating, a
pointer type, or by calling another function named ``pointer`` that
creates the necessary pointer type on the fly and returns
a pointer to the instance.

Pointers only implement one attribute, named ``contents``, which
references the ctypes instance the pointer points to.  Assigning to
this attribute only changes the pointer's notion of the object it points
to.  The instances themselves are not touched; in particular, structures
are not copied by value.
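A short sketch of this behaviour, reusing the ``TS`` structure of the
previous section (redeclared here so that the example is
self-contained)::

    from ctypes import Structure, c_float, c_int, pointer

    class TS(Structure):
        _fields_ = [("f0", c_int), ("f1", c_float)]

    ts0 = TS(42, -42)
    ts1 = TS(-17, 17)
    p0 = pointer(ts0)

    p0.contents = ts1       # retargets the pointer, copies nothing

After the assignment ``p0`` points to ``ts1``, while the data stored in
``ts0`` is unchanged.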

Pointers to Structures Allocated by The Ctypes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pointers to structures that were allocated by ctypes contain
a reference to the structure in the ``contents`` attribute
mentioned above.  This reference is known to the garbage collector,
which means that even if the last structure instance is deallocated,
the C-compatible memory is not, provided a pointer still points to it.
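The following sketch shows the effect on CPython (the names ``Rec`` and
``make_pointer`` are made up).  The only structure instance goes out of
scope when the function returns, but its memory remains usable through
the pointer::

    import gc
    from ctypes import Structure, c_int, pointer

    class Rec(Structure):
        _fields_ = [("f0", c_int), ("f1", c_int)]

    def make_pointer():
        rec = Rec(42, -42)
        return pointer(rec)     # "rec" itself goes out of scope here

    p = make_pointer()
    gc.collect()
    still_there = (p.contents.f0, p.contents.f1)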

Pointers to Structures Allocated by Foreign Modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case the structure was probably allocated by the foreign module,
or at least ctypes must assume it was.  The pointer's reference to the
structure should therefore not be made known to the GC.  In the same
sense, the structure itself must record the fact that its C-compatible
memory was not allocated by ctypes itself.


.. _`ctypes documentation`: http://docs.python.org/dev/lib/module-ctypes.html
.. _ctypes: http://starship.python.net/crew/theller/ctypes/
.. _libffi: http://sources.redhat.com/libffi/
.. _RPython: coding-guide.html#restricted-python
.. _`ctypes-0.9.9.6`: http://sourceforge.net/project/showfiles.php?group_id=71702&package_id=71318&release_id=411554
.. _RTyping: rtyper.html
.. _`malloc removal`: translation.html#malloc-removal

.. include:: _ref.txt
