Extending Python for Numerical Computation
Submitted for the
December 1995 Python Workshop by Jim
Hugunin
This work is based on Jim Fulton's original implementation of a matrix
object as a container class.
Yet Another Numeric Language?
There are a huge collection of existing numeric programming languages, both
commercial (Matlab, S-PLUS,
IDL)
and free ( Octave, RLaB, Yorick , BASIS , Gnudl ,
...). Why on earth would I want to go out and create a new one?
I've used almost all of the available numerical languages at one time or
another over the past 8 years. One thing I've noticed is that over time, the
designers of these languages are steadily adding more of the features that one
would expect to find in a general-purpose programming language.
"Octave has a real mechanism for handling functions that take an
unspecified number of arguments, so it is no longer necessary to place an upper
bound on the number of optional arguments that a function can accept." Octave
FAQ
"However, the most significant improvement is RLaB's list class. A list
is a heterogeneous associative array that can contain any data type, including
other lists. The list gives RLaB users the opportunity to structure their data
as necessary." Why Use RLaB?
"The eval function now provides an optional mechanism for detecting and
trapping error conditions that occur during the evaluation of the argument
expression." Matlab 4.1
Release Notes
"S-PLUS now supports a number of features from the object-oriented
programming paradigm, including classes, inheritance, and methods." S-Plus
3.0 Release Notes
By starting with the Python programming
language as a base, I already have functions with optional arguments (and
keyword arguments), heterogeneous lists (and dictionaries), a powerful exception
system for evals (and everywhere else), and a strong object system that was
built into the language from the start; all in a robust, well-designed and
portable implementation (thanks Guido!). Rather than trying to retrofit an
existing numerical language to support the wealth of features found in a
powerful, modern, general-purpose programming language, it makes much more sense
to attack the problem from the other direction and add the features of a
powerful numerical programming language to Python.
These issues are important even if the only use for this language is to be a
numeric lab. However, they become overwhelming if you want to build applications
that contain a blend of numeric and more traditional computational needs. I've
been implementing a speech recognition system completely within my matrix
extended version of python. This task would be nearly impossible in any other
numerical language. I'm making extensive use of the sophisticated object and
module system of python, as well as its ability to handle sockets, audio i/o,
general purpose UI's, etc.
Design Goals
Easily Extensible with FORTRAN and C libraries
This object is based on
an original implementation by Jim
Fulton. His basic design already interfaced nicely with existing libraries,
and I just had to be careful not to break anything.
Close to the Speed of Optimized C for Vector Operations
Testing the
system on a Sparc-10 by multiplying a 10000 length vector by itself 1000 times,
I found that the python implementation was 20% slower than fully optimized (-O4)
hand-coded C. When compared to existing numerical languages, I found the python
system to be 30% faster than matlab (70% faster if I only need "float"
precision), and 1000% faster than octave.
All Array Operators Supported
All of the basic numeric operators are
supported element-wise for matrices. In addition, the basic array comparison and
logical operators are provided as methods (because these operators can't be
overloaded within python). The jury is still out as to how best to implement
matrix multiplication. The two likeliest possibilities are as a method (ie.
a.matrixMultiply(b)) or by replacing the modulo operator (ie. a % b).
Arbitrarily High-Dimensional Arrays
Many existing numerical languages
only support two-dimensional arrays. This sort of arbitrary restriction is
ridiculous in the current era of modern programming languages and practice. The
matrix object was designed from the ground up to support arrays of arbitrary
dimensions.
Powerful Generalized Product Form Indexing
Mapping semantics are used to
support multi-dimensional product form indexing for matrix objects.
Multi-dimensional indices are a sequence of python objects, where the first
object corresponds to the first dimension of the matrix, and so on. If there are
fewer objects in the sequence than dimensions in the matrix, each remaining
dimension is selected in it's entirety. The following objects are possible for
each dimension:
- A single integer, indicating one element from that dimension and a
reduction in the number of dimensions in the returned matrix.
- A sequence of integers, not necessarily unique, selecting an arbitrary
collection of elements along that dimension
- A slice object which selects a range of elements along the dimension with
optional start index, stop index and step size.
Function Objects for Flexibility
Every arithmetic operator is
implemented as a special ofunc object within python. These functions can be
called with matrix or scalar arguments to return the normal result. In addition,
these functions can be subscripted to indicate that they should be applied at a
specific rank, and they can be modified to indicate that instead of a direct
product, they should be applied as a generalized outer, or inner product, or as
a reduction or accumulation.
Full Range of Primitive Data Types
The matrix object supports arrays of
chars, bytes, shorts, ints, longs, floats, doubles, complex floats, complex
doubles, and raw python objects. This is essential to allow interfacing with the
full range of existing numerical libraries. I know of no other numerical
language that supports such a complete collection of data types.
No Changes to the Python Core Required
I was shocked at how much of
this system could be elegantly implemented by designing two new object types
(one for matrices, and one for functions on matrices) and a module.
Nevertheless, this effort did suggest two relatively small patches to the python
core to make numeric operations more convenient, and these have been implemented
by Konrad Hinsen. These include a**b <--> pow(a,b) and a[2,3,4] <-->
a[(2,3,4)]. Both of these additions have already received Guido's preliminary
sanction.
Current Status
All of the original design goals have been met. The
matrix object is currently in its second alpha release, and there have been
minimal reported bugs. Current work includes:
- Implementing linear-algebra style multiplication properly
- Supporting sparse matrices
- Cleaning up printing and string representations of matrices
- Extending PyArgs_ParseTuple to handle matrices
- Experimenting with Yorick style indexing
- General cleanup and bug squashing
Acknowledgements
The initial structure of the matrix object was created
by Jim Fulton. Many of the ideas in this object are stolen from the members of
the python matrix-sig. In particular, Konrad Hinsen, Paul Dubois, Chris Chase,
Thomas Schwaller, Tser-Yuan Ya, David Ascher, Dong Gweon Oh, and of course,
Guido van Rossum (I'm sure that I'm missing people here) have been filled with
good ideas. I still would have made this object without any of them, but it
wouldn't be nearly so well designed.
Jim Hugunin
November 13,
1995