Summary of "Extension Building" Session
"Extension Building Considered Painful": Session Summary
by Greg WardThe "Extension Building Considered Painful" session at IPC7 was very productive, and there was a good consensus in the room as to what's needed, what will work for various classes of users, and what ideas to steal from other related systems (the closest being Red Hat's RPM and Perl's MakeMaker).
Decisions made
Everyone seemed to agree with my proposed mode of operation: the
user downloads a package, unpacks it, and runs a Python script
tentatively called setup.py
(the fact that this name
overlaps somewhat with the module description file in the Python
distribution and in many large module distributions is deliberate,
and may be construed as a bug or a feature). See below for
arguments to setup.py
and what happens when it's run.
Also, nobody disagreed with my contention that the system should
work on "extensions" (ie. modules written in C) or plain ol'
"modules" (written in Python) with no difference in the user
interface. People also seemed to accept the idea of "building"
everything -- C modules, Python modules, eventually documentation --
to a temporary "build library" directory, tentatively called
blib/
. (This is one of the few implementation details
of Perl's MakeMaker
that survived the session.) The
blib/
directory serves (at least) two purposes: it makes
installation near-trivial, and it provides a realistic-looking
pseudo-installation tree for running test scripts. The actual
structure of the blib/
tree has yet to be decided
(although I have definite ideas of how it should look!)
We talked about terminology a bit. This is going to be a sticky
issue, and will have to be one of the first things we thrash out on
a SIG. First, module, extension module
(extension for short), and package will keep their
conventional Pythonic meanings: a single .py
file meant
to be imported by other modules or scripts, a module written in C,
and a collection of modules grouped under a directory containing an
__init__.py
file.
I proposed distribution (or module distribution
when absolute clarity is needed) as a stand-in for package,
which is already taken. A module distribution contains one or more
modules (including extension modules), their documentation, test
suites, and a setup.py
file. (The documentation and
test suites are optional but strongly recommended, especially if
anybody ever comes up with a standard way of documenting Python
modules. But a distribution must have at least one module
and a setup.py
). A single distribution will present
many faces to the world: the developer's source tree, a source
distribution, various binary distributions, the final installed
product, etc. (As I understand things, the source and binary
distributions are what Trove will call resources.)
One thing we didn't have time to decide on was a name for the
silly thing. For a long and formal name, my vote is Module
Distribution Utilities, or distutils for
short. The short name will be needed to group all the various
modules into a package: for instance, we plan to have
distutils.build
, distutils.install
,
distutils.gen_make
, and so forth.
Roles of setup.py
The planned interface to and tasks for setup.py
evolved rapidly throughout the session, mainly driven by the
division of labour identified by Eric Raymond and clearly written
down by Greg Stein, and the workflow diagram drawn by ??? and
expanded upon by me (see below).
It's clear that setup.py
has to contain 1) metadata
about the package, and 2) any custom code needed to configure the
distribution for the current machine. It's not yet clear how these
should be expressed. The first idea is to steal
MakeMaker
's idea of calling a function with a bunch of
named arguments, which then does all the work behind the scenes. An
example setup.py
using this scheme might be:
#!/usr/bin/env python import distutils distutils.setup (name = 'mymod', version_from = 'mymod.py', pyfiles = ['mymod.py', 'othermod.py'], cfiles = ['myext.c'])Obviously, custom configuration code would just be written in Python before or after the call to
distutils.setup
; it's not
clear how to make this code interact with everything that lies
behind distutils.setup
.
The main competing idea is to do things in a more OO manner, by defining a subclass and overriding various attributes and methods. Here's an off-the-cuff illustration of the concept:
#!/usr/bin/env python from distutils import Setup class MySetup (Setup): name = 'mydist version_from = 'mymod.py' pyfiles = ['mymod.py', 'othermod.py'] cfiles = ['myext.c']In this case, it's a bit clearer how to override specific behaviour of all the distutils classes: just subclass and override as needed. Obviously, all of the classes would then have to be well-documented!
Workflow for module distribution
The figure below illustrates the workflow involved in developing, packaging, building, and installing a module distribution (but not testing or documenting it, which ultimately should also be part of the plan).
Note the three kinds of people present in the diagram:
- developer
- creator/maintainer of the base source tree
- packager
- a) someone (presumably the developer, wearing his "packager" hat) who turns the base source tree into the source distribution; or b) someone (possibly the developer, possibly a friend with a machine running X, possibly an archive robot with access to a machine running X) who builds the source distribution to create a "built distribution" for architecture X
- user
- the poor sod who wants to install the distribution on his machine; not necessarily someone who knows (or wants to know) anything about Python
Developer utilities
Obviously, the workflow starts at the top, with the developer's
source tree. While the developer is toiling away, he will probably
want a Makefile
that knows about building Python
modules and extension modules (especially the latter). Rather than
writing his own, he can ask setup.py
to generate one
for him (presumably using the distutils.gen_make
module):
./setup.py gen_makeThen the developer can run
make
, make
test
, and so forth, just as he's probably used to doing
(assuming he's an old-fashioned Unix weenie!). If he doesn't like
Makefiles, or doesn't need one because this is a tiny little
project, he can just ask setup.py
to build, test,
etc. directly:
./setup.py build ./setup.py test(The idea is that
setup.py
will support "commands" --
build
, test
, etc. -- that correspond to
Makefile
targets. That way, nobody ever has to depend
on a Makefile
, but one can be generated for the
developer's convenience and efficiency (especially when working on
large distributions with lots of extension modules to be compiled).)
Packager utilities
When the developer is happy with the current state of his module(s) and it's time for a release, he puts on his "packager" hat and creates a source release:
make dist # or, equivalently ./setup.py distThis will bundle up all the files in the distribution (as listed in a
MANIFEST
file) into an archive file of some sort --
perhaps .tar.gz
under Unix, .zip
under
Windows, etc. The name of the archive file would be derived from
the name and version of the module distribution:
mydist-1.2.3.tar.gz
, for instance.
If he wishes, the developer can stop there and upload his source
release to an archive. Or, he can create built
distributions for all the architectures to which he has access.
(Note that I'm explicitly avoiding use of the more familiar term
binary distribution. That is because a module
distribution might well contain nothing more than .py
files and their associated documentation. Even in those cases,
though, there are reasons for a downloadable resource that can be
immediately installed. The main reason is consistency: it's nice if
naive users only have to deal with one kind of file for Python
module distributions (eg. Red Hat Linux users can just download and
install a bunch of RPMs; whether those RPMs contain .py
or .so
files or both is immaterial). Second, there
might be non-binary files that are generated from files in the
source release, such as man pages generated from SGML source. The
built distribution for a Unix platform might include man pages ready
for installation, so no documentation processing would be
necessary.)
It is important to underscore the concept of packager as a person separate from the developer. This is necessary to support built distributions for multiple platforms, since not many developers have access to a couple of Unix variants, Windows, and Mac -- they'll presumably need some help to make built distributions for one or more platforms. This help may come in the form of a friend (down the hall or around the world) who does have access to a particular platform; it might come in the form of someone who volunteers to keep certain distributions up-to-date for certain platforms; or it might take the form of an archive robot that automates the procedure. Security concerns become increasingly more relevant traversing that list.
I have in mind a couple of possible interfaces for creating built
distributions; furthermore, the idea of "dumb" vs "smart" built
distributions has been forming in my head since Developer's Day.
(Thus it probably doesn't really belong here, since this is meant to
be a summary of the Developer's Day session. So sue me.) First,
consider the creation of a traditional Unix built distribution: a
.tar.gz
file to be unpacked under
/usr/local
(or, more likely in the Python library
context, /usr/local/lib/python1.x
). This could be
accomplished with:
./setup.py bdistwhich would do a build (to put a mock installation tree into
./blib/
) and package the build tree to an archive file
named after the distribution name and version number, and the
current platform, e.g. mydist-1.2.3-sunos5.tar.gz
or mydist-1.2.3-win32.zip
.
However, there's a lot of interest in "smart" installers like Red
Hat's RPM (and I got the impression that there are a couple of
competing possibilities for the Windows world -- someone from the
dark side will have to fill me in on that). My current thinking is
that there should be a separate command (or Makefile
target) for each of these, so you might run
make rpmon a Red Hat Linux box, and
setup.py xxxon a Windows machine (where xxx is the abbreviated name of some smart installer for Windows). Supporting the old-fashioned "dumb" built distribution model is important, though -- not everyone will have that fancy new installer (or they might have a different smart installer).
User utilities
Finally, and perhaps most importantly, we most consider the lucky user who wishes to install a Python module distribution on his computer. Users come in all shapes and sizes, but we're mainly concerned with two distinctions:
- built distribution users: anyone on a popular platform for which a built distribution is available (or necessary: many Mac and Windows people won't have a compiler)
- source distribution users: people on less-popular platforms for which a compiler (and other possibly necessary tools) will most likely be available
Makefile
s, or supply the locations of
needed libraries, rarely want to do such things.
Where to go next
First, I think this topic is big enough to warrant a new sig, which I'm tentatively calling the distutils-sig. The proposed charter for that {will be|has been} posted to the meta-sig, so run over there if you think the whole concept is hopeless and you want to shoot me down in flames before this even gets started (or if you think the name sucks).
Once the sig is created, I'd like to spend some time discussing meta-issues: does anyone violently disagree with the whole idea? is 'distutils' a good enough name? what functionality should be present in the first pass, and what's needed for a full release? Then we can dive into nitty-gritty design and implementation issues.