Packing It All In: Distributing Python With an App
Tuesday, May 13th, 2008
Python has lovely built-in distribution tools. They’re great if you need a nice, repeatable, easy way to distribute your source code and have it install cleanly on a platform that has its $PATH set up correctly. However, if you want to distribute Python as part of a commercial software package, to platforms that may not even have Python installed, the procedure is not as clean or clear-cut. We devised a way to do it that mostly works, though we have to tweak it somewhat for each release. Here I’ll show you our method, which uses the Snakefood program for dependency extraction and a custom script to fill in the gaps that Snakefood can’t quite bridge.
Python is an interpreted language, which means, very basically, that it will not compile down to something that will run natively on any platform. The standard way to get Python to operate is to use the CPython interpreter, a program written in C that reads Python code and performs the actions it describes (called “interpreting” it). There are other options, too, like Jython and IronPython, which do basically the same thing as CPython except that they run the Python code on the Java virtual machine and the .NET runtime, respectively. We stick with C. After all, the whole reason we’re doing any of this is that we can’t count on Python being installed. We certainly can’t count on Java or .NET being installed.
As a very basic step one, we need to bundle the CPython interpreter with our app. It’s only about 15MB and is highly compressible, so we can easily include the interpreter, but the standard libraries in Python make for a fairly large installation: the estimated size of Python 2.5.2 is about 180MB. Even if we compress that, it’s still a huge download and a not-so-inconsequential amount of hard drive space. The good news is that we don’t use all of the standard libraries. The even better news is that there’s a pretty simple way of extracting only the files you do need and packaging them into a much smaller distribution. The trick up our sleeve is a small program written in Python called Snakefood. It’s not perfect, but I’ll show ways to get the most out of it.
The first step, of course, is getting Snakefood and installing it. If Python is in your $PATH, just extract the source, then run:
% python setup.py install
from the Snakefood directory, which will install Snakefood to wherever your current Python installation is. You can then run it with:
% python sfood <target file>
from any directory. The target file is the main script of your program. With just that command, it will pull the dependencies from the ‘import’ statements in your main script. That’s probably not good enough, so use the option --follow, which follows all the import statements in each of the imported modules to their leaves. That gets most of what you need.
The output of running Snakefood on a target is not entirely intuitive. It is a list of tuples like the following:
((<source_package_root>, <source_file.py>), (<dest_package_root>, <dest_file.py>))
But sometimes the entry looks like this:
((<source_package_root>, <source_file.py>), (None, None))
It may be tempting, but you can’t skip these lines: the source file they name still has to ship, even though no dependency is listed for it.
The format of the dependencies tells you that <source_file.py> depends on <dest_file.py>, so you need to preserve it in your pared-down distribution. For us, this is as simple as making a new directory called dist/, and copying the file at path os.path.join(<dest_package_root>, <dest_file.py>) into it. You can make a list of these files directly from the Snakefood output (piped from stdin) with the following script:
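import sys
import os

files = set()
# Each line of Snakefood output is a Python tuple literal, so eval() parses it.
# Only feed this script output you generated yourself.
for dep in map(eval, sys.stdin):
    if dep[1][0] is not None:
        # Normal entry: keep the file being depended on.
        path = os.path.join(dep[1][0], dep[1][1])
    else:
        # (None, None) entry: keep the source file itself.
        path = os.path.join(dep[0][0], dep[0][1])
    files.add(path)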
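If calling eval() on piped input makes you nervous, the same parsing can be sketched with ast.literal_eval, which only accepts literals such as tuples, strings and None. This is a minimal variant, not part of Snakefood; the function name collect_files is made up for illustration, and it assumes every non-blank line has the two-pair shape shown above.

```python
import ast
import os


def collect_files(lines):
    """Return the set of file paths named in Snakefood dependency lines."""
    files = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        # literal_eval parses tuple/string/None literals without running code.
        (src_root, src_file), (dst_root, dst_file) = ast.literal_eval(line)
        if dst_root is not None:
            # Normal entry: keep the file being depended on.
            files.add(os.path.join(dst_root, dst_file))
        else:
            # (None, None) entry: keep the source file itself.
            files.add(os.path.join(src_root, src_file))
    return files
```

Calling collect_files(sys.stdin) gives the same kind of set as the loop above.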
Now take this set of files and copy them into your new directory. Preserving the directory hierarchy is nontrivial, but not that hard. Hopefully, you have already created a custom Python installation so that all of the relevant files are in one place anyway. From there, you must find the root of the dependency tree. My custom Python installation is at /Users/matthewmoskwa/ExpanDrive/python, so on each path in the file set, I split on 'python' and copy the new path into the dist/ directory (making sure to create new directory nodes first):
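import shutil

for fi in files:
    # Split only on the first 'python' in the path (the installation root) and
    # drop the leading separator; otherwise os.path.join() discards 'dist'.
    relPath = fi.split("python", 1)[1].lstrip(os.sep)
    distPath = os.path.join('dist', relPath)
    if not os.path.exists(os.path.dirname(distPath)):
        os.makedirs(os.path.dirname(distPath))
    shutil.copy(fi, distPath)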
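Splitting on a directory name is fragile if that name shows up elsewhere in the path. A slightly sturdier sketch, assuming you know the installation root and pass it in explicitly, uses os.path.relpath (available from Python 2.6 on). The function name copy_preserving_tree and its parameters are hypothetical, chosen only for this example.

```python
import os
import shutil


def copy_preserving_tree(files, root, dest="dist"):
    """Copy each file under `root` into `dest`, keeping its relative path."""
    copied = []
    for path in sorted(files):
        rel = os.path.relpath(path, root)
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            continue  # skip anything that lives outside the installation root
        target = os.path.join(dest, rel)
        target_dir = os.path.dirname(target)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)  # create the directory nodes first
        shutil.copy(path, target)
        copied.append(target)
    return copied
```

With the layout described above, the call would look like copy_preserving_tree(files, '/Users/matthewmoskwa/ExpanDrive/python').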
At this point, the writer of Snakefood claims 99% accuracy. I haven’t measured that claim, but I have found a major drawback: Snakefood misses all __init__.py files, and therefore any import statements in those files. Rather than being smart about it, I just use os.walk() to find all the __init__.py files and copy them into dist/. I then run my code from dist/ and look for ImportErrors. When I see one, I modify my script to manually copy the missing file to dist/. Not perfect, but it works, and it’s still much faster than doing the whole thing by hand.
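The os.walk() pass might look roughly like the following. This is a sketch under a couple of assumptions: the function name copy_init_files is invented for the example, and dest is taken to live outside root so the walk never wanders into its own output.

```python
import os
import shutil


def copy_init_files(root, dest="dist"):
    """Copy every __init__.py under `root` to the same relative spot in `dest`."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "__init__.py" not in filenames:
            continue  # not a package directory
        rel_dir = os.path.relpath(dirpath, root)
        target_dir = os.path.normpath(os.path.join(dest, rel_dir))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        shutil.copy(os.path.join(dirpath, "__init__.py"), target_dir)
        copied.append(os.path.join(target_dir, "__init__.py"))
    return copied
```

Note that this copies the __init__.py of every package in the installation, used or not, which is the "not being smart about it" trade-off: a few extra tiny files in exchange for not having to work out which packages matter.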
The final step is to compile all of the files down to .pyo and remove all the .py and .pyc files. We use compileall.py, a script in the standard library, to compile, and then
% find . -type f \( -name '*.py' -o -name '*.pyc' \) -print0 | xargs -0 rm -f
to remove the files. Make sure to run compileall.py under an interpreter started with the -OO option, which writes .pyo files and gets rid of docstrings and other unnecessary stuff.
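For readers on Python 3, where .pyo files no longer exist (they were folded into .pyc as of Python 3.5), the same compile-then-strip step can be sketched from inside Python with compileall.compile_dir. This is a rough modern equivalent rather than the procedure described above; the function name compile_and_strip is made up, and optimize=2 stands in for the -OO flag.

```python
import compileall
import os


def compile_and_strip(dist_dir):
    """Byte-compile `dist_dir` with docstrings stripped, then delete sources."""
    # legacy=True writes mod.pyc beside mod.py instead of into __pycache__,
    # which is where the interpreter looks once the source file is gone.
    ok = compileall.compile_dir(
        dist_dir, quiet=1, legacy=True, optimize=2, force=True
    )
    if not ok:
        raise RuntimeError("some files failed to compile; sources left in place")
    removed = []
    for dirpath, dirnames, filenames in os.walk(dist_dir):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Refusing to delete anything when a compile fails is a deliberate choice here: a source file that would not compile is better left in place than silently lost.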
Until someone writes an OS in Python or all OSes are guaranteed to have Python installed, this is a pretty good way to distribute Python code to the masses. The next step, actually getting it to run like an application, is up to you, though py2app and py2exe can certainly help.



