Extending Python for Numerical Computation

Submitted for the December 1995 Python Workshop by Jim Hugunin

This work is based on Jim Fulton's original implementation of a matrix object as a container class.

Yet Another Numeric Language?

There are a huge collection of existing numeric programming languages, both commercial (Matlab, S-PLUS, IDL) and free ( Octave, RLaB, Yorick , BASIS , Gnudl , ...). Why on earth would I want to go out and create a new one?

I've used almost all of the available numerical languages at one time or another over the past 8 years. One thing I've noticed is that over time, the designers of these languages are steadily adding more of the features that one would expect to find in a general-purpose programming language.

"Octave has a real mechanism for handling functions that take an unspecified number of arguments, so it is no longer necessary to place an upper bound on the number of optional arguments that a function can accept." Octave FAQ

"However, the most significant improvement is RLaB's list class. A list is a heterogeneous associative array that can contain any data type, including other lists. The list gives RLaB users the opportunity to structure their data as necessary." Why Use RLaB?

"The eval function now provides an optional mechanism for detecting and trapping error conditions that occur during the evaluation of the argument expression." Matlab 4.1 Release Notes

"S-PLUS now supports a number of features from the object-oriented programming paradigm, including classes, inheritance, and methods." S-Plus 3.0 Release Notes

By starting with the Python programming language as a base, I already have functions with optional arguments (and keyword arguments), heterogeneous lists (and dictionaries), a powerful exception system for evals (and everywhere else), and a strong object system that was built into the language from the start; all in a robust, well-designed and portable implementation (thanks Guido!). Rather than trying to retrofit an existing numerical language to support the wealth of features found in a powerful, modern, general-purpose programming language, it makes much more sense to attack the problem from the other direction and add the features of a powerful numerical programming language to Python.

These issues are important even if the only use for this language is to be a numeric lab. However, they become overwhelming if you want to build applications that contain a blend of numeric and more traditional computational needs. I've been implementing a speech recognition system completely within my matrix extended version of python. This task would be nearly impossible in any other numerical language. I'm making extensive use of the sophisticated object and module system of python, as well as its ability to handle sockets, audio i/o, general purpose UI's, etc.

Design Goals

Easily Extensible with FORTRAN and C libraries

This object is based on an original implementation by Jim Fulton. His basic design already interfaced nicely with existing libraries, and I just had to be careful not to break anything.

Close to the Speed of Optimized C for Vector Operations

Testing the system on a Sparc-10 by multiplying a 10000 length vector by itself 1000 times, I found that the python implementation was 20% slower than fully optimized (-O4) hand-coded C. When compared to existing numerical languages, I found the python system to be 30% faster than matlab (70% faster if I only need "float" precision), and 1000% faster than octave.

All Array Operators Supported

All of the basic numeric operators are supported element-wise for matrices. In addition, the basic array comparison and logical operators are provided as methods (because these operators can't be overloaded within python). The jury is still out as to how best to implement matrix multiplication. The two likeliest possibilities are as a method (ie. a.matrixMultiply(b)) or by replacing the modulo operator (ie. a % b).

Arbitrarily High-Dimensional Arrays

Many existing numerical languages only support two-dimensional arrays. This sort of arbitrary restriction is ridiculous in the current era of modern programming languages and practice. The matrix object was designed from the ground up to support arrays of arbitrary dimensions.

Powerful Generalized Product Form Indexing

Mapping semantics are used to support multi-dimensional product form indexing for matrix objects. Multi-dimensional indices are a sequence of python objects, where the first object corresponds to the first dimension of the matrix, and so on. If there are fewer objects in the sequence than dimensions in the matrix, each remaining dimension is selected in it's entirety. The following objects are possible for each dimension:

A single integer, indicating one element from that dimension and a reduction in the number of dimensions in the returned matrix.
A sequence of integers, not necessarily unique, selecting an arbitrary collection of elements along that dimension
A slice object which selects a range of elements along the dimension with optional start index, stop index and step size.

Function Objects for Flexibility

Every arithmetic operator is implemented as a special ofunc object within python. These functions can be called with matrix or scalar arguments to return the normal result. In addition, these functions can be subscripted to indicate that they should be applied at a specific rank, and they can be modified to indicate that instead of a direct product, they should be applied as a generalized outer, or inner product, or as a reduction or accumulation.

Full Range of Primitive Data Types

The matrix object supports arrays of chars, bytes, shorts, ints, longs, floats, doubles, complex floats, complex doubles, and raw python objects. This is essential to allow interfacing with the full range of existing numerical libraries. I know of no other numerical language that supports such a complete collection of data types.

No Changes to the Python Core Required

I was shocked at how much of this system could be elegantly implemented by designing two new object types (one for matrices, and one for functions on matrices) and a module. Nevertheless, this effort did suggest two relatively small patches to the python core to make numeric operations more convenient, and these have been implemented by Konrad Hinsen. These include a**b <--> pow(a,b) and a[2,3,4] <--> a[(2,3,4)]. Both of these additions have already received Guido's preliminary sanction.

Current Status

All of the original design goals have been met. The matrix object is currently in its second alpha release, and there have been minimal reported bugs. Current work includes:

Implementing linear-algebra style multiplication properly
Supporting sparse matrices
Cleaning up printing and string representations of matrices
Extending PyArgs_ParseTuple to handle matrices
Experimenting with Yorick style indexing
General cleanup and bug squashing

Acknowledgements

The initial structure of the matrix object was created by Jim Fulton. Many of the ideas in this object are stolen from the members of the python matrix-sig. In particular, Konrad Hinsen, Paul Dubois, Chris Chase, Thomas Schwaller, Tser-Yuan Ya, David Ascher, Dong Gweon Oh, and of course, Guido van Rossum (I'm sure that I'm missing people here) have been filled with good ideas. I still would have made this object without any of them, but it wouldn't be nearly so well designed.

Jim Hugunin
November 13, 1995