Tutorial on using Numeric packages and Numarray packages in Python

The first thing to know about Numerical Python packages is that Numerical Python won't let you do anything that standard Python can't do. It just allows you to complete the same tasks that standard Python can accomplish. In fact, it's not the case; many array operations are much more elegant to express with Numeric or Numarray than standard Python data types and syntax. However, amazing speed is the main reason that attracts users to use Numerical Python.

In fact, Numerical Python just implements a new data type: array. Unlike lists, tuples, and dictionaries that can contain different types of elements, Numarray arrays can only contain data of the same type. Another advantage of Numarray arrays is that it can be multi-dimensional -- but the dimensions of the array are slightly different from the simple nesting of the list. Numerical Python draws on the practical experience of programmers (especially those with scientific computing backgrounds who abstract the best features of arrays in languages such as APL, FORTRAN, MATLAB, and S) to create arrays that can flexibly change shapes and dimensions. We will be back soon to continue the topic.

Operations on arrays in Numerical Python are performed by elements. Although two-dimensional arrays are similar to matrices in linear algebra, their operations (such as multiplication) are completely different from those in linear algebra (such as matrix multiplication).

Let's look at a specific example of the above question. In pure Python, you can create a "two-dimensional list" like this:
Listing 1. Python's nested arrays

>>> pyarr = [[1,2,3],
...     [4,5,6],
...     [7,8,9]]
>>> print pyarr
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> pyarr[1][1] = 0
>>> print pyarr
[[1, 2, 3], [4, 0, 6], [7, 8, 9]]

Good, but all you can do with this structure is to set and retrieve elements through separate (or multi-dimensional) indexes. Compared to this, Numarray arrays are more flexible:
Listing 2. Numerical Python array

>>> from numarray import *
>>> numarr = array(pyarr)
>>> print numarr
[[1 2 3]
 [4 0 6]
 [7 8 9]]

The change is not big, but how about using Numarray? Here is an example:
Listing 3. Element operations

>>> numarr2 = numarr * 2
>>> print numarr2
[[ 2 4 6]
 [ 8 0 12]
 [14 16 18]]
>>> print numarr2 + numarr
[[ 3 6 9]
 [12 0 18]
 [21 24 27]]

Change the shape of the array:
Listing 4. Change shape

>>>  = (9,)
>>> print numarr2
[ 2 4 6 8 0 12 14 16 18]

The difference between Numeric and Numarray

Overall, the new Numarray package is API-compatible with earlier Numeric. However, the developers have made some improvements that are not compatible with Numric based on user experience. Instead of breaking any applications that depend on Numeric, the developer created a new project called Numarray. Numarray is missing some of Numeric's features at the time of completing this article, but it is planned to implement them.

Some improvements made by Numarray:

Organize element types with a hierarchical class structure to support the isinstance() test. Numeric uses only character type encoding when specifying data types (but the initialization software in Numarray still accepts old character encoding).
Changes the type coercion rules to keep types in arrays (more common) rather than converting them to Python scalar types.
Additional array properties appear (no more than getters and setters).
More flexible exception handling is achieved.

New users don't have to worry about these changes, and for this point it's better to use Numarray instead of Numeric from the beginning.

Example of timing

Let's experience the speed advantages of operating in Numerical Python over standard Python. As a "demo task", we will create a sequence of numbers and then double them. First are some variations of the standard Python method:
Listing 5. Timer for pure Python operations

def timer(fun, n, comment=""):
  from time import clock
  start = clock()
  print comment, len(fun(n)), "elements",
  print "in %.2f seconds" % (clock()-start)
def double1(n): return map(lambda n: 2*n, xrange(n))
timer(double1, 5000000, "Running map() on xrange iterator:")
def double2(n): return [2*n for n in xrange(n)]
timer(double2, 5000000, "Running listcomp on xrange iter: ")
def double3(n):
  double = []
  for n in xrange(n):
    (2*n)
  return double
timer(double3, 5000000, "Building new list from iterator: ")

We can see the speed difference between the map() method, list comprehension and traditional loop methods. So, what about standard array modules of similar element types? It might be a little faster:
Listing 6. Timing of standard array modules

import array
def double4(n): return [2*n for n in ('i',range(n))]
timer(double4, 5000000, "Running listcomp on : ")

Finally, let’s see how fast Numarray is. As an additional comparison, let's see if the array has the same advantages if it has to be restored to a standard list:
Listing 7. Timer for Numarray operations

from numarray import *
def double5(n): return 2*arange(n)
timer(double5, 5000000, "Numarray scalar multiplication: ")
def double6(n): return (2*arange(n)).tolist()
timer(double6, 5000000, "Numarray mult, returning list:  ")

Now run it:
Listing 8. Comparison results

$ python2.3 
Running map() on xrange iterator: 5000000 elements in 13.61 seconds
Running listcomp on xrange iter: 5000000 elements in 16.46 seconds
Building new list from iterator: 5000000 elements in 20.13 seconds
Running listcomp on : 5000000 elements in 25.58 seconds
Numarray scalar multiplication:  5000000 elements in 0.61 seconds
Numarray mult, returning list:  5000000 elements in 3.70 seconds

The speed difference between different techniques for processing lists is not large, and perhaps it is worth noting, as this is a problem with the method when trying standard array modules. However, Numarray can usually complete the operation within less than 1/20. Restoring an array to a standard list loses a lot of speed advantage.

It should not be concluded by such a simple comparison, but this acceleration may be typical. For large-scale scientific calculations, it is very valuable to reduce the time of calculation from a few months to a few days or from a few days to a few hours.

System Modeling

Typical use cases for Numerical Python are scientific modeling, or perhaps related fields such as graphics processing and rotation, or signal processing. I will illustrate many of Numarray's features through a more practical question. Suppose you have a three-dimensional physical space with variable parameters. In abstractly, any parameterized space, no matter how many dimensions it has, Numarray is applicable. In fact, it is easy to imagine that, for example, a room has different temperatures at different points. My home in New England is already winter, so this question seems to be more realistic.

For simplicity, the example I gave below uses a smaller array (although this may be obvious, it is still necessary to point it out clearly). However, even if you process an array of millions of elements instead of just dozens of elements, Numarray is still very fast; the former may be more common in real scientific models.

First, let's create a "room". There are many ways to accomplish this task, but the most common one is to use the callable array() method. Using this method, we can generate Numerical arrays with multiple initialization parameters, including initial data from any Python sequence. However, for our room, using the zeros() function can generate a cold room with uniform temperature:
Listing 9. Initialize the temperature of the room

>>> from numarray import *
>>> room = zeros((4,3,5),Float)
>>> print room
[[[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
 [[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
 [[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
 [[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]]

Each two-dimensional "matrix" from top to bottom represents a horizontal level of a three-dimensional room.

First, we raised the temperature of the room to a more comfortable 70 degrees Fahrenheit (about 20 degrees Celsius):
Listing 10. Turn on the heater

>>> room += 70
>>> print room
[[[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]]

Note that there is an important difference when we next operate on Numarray arrays and Python lists. When you select the level of the array -- we will see that the hierarchy method in multidimensional arrays is very flexible and powerful -- you get not a copy but a "view". There are multiple ways to point to the same data.

Let's look at it in detail. Suppose our room has a ventilation device that will reduce the temperature on the ground by four degrees:
Listing 11. Changes in temperature

>>> floor = room[3]
>>> floor -= 4
>>> print room
[[[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 66. 66. 66. 66. 66.]
 [ 66. 66. 66. 66. 66.]
 [ 66. 66. 66. 66. 66.]]]

In contrast, the fireplace on the north wall raises the temperature at each adjacent location by 8 degrees, while the temperature at its location is 90 degrees.
Listing 12. Use the fireplace to keep heating

>>> north = room[:,0]
>>> near_fireplace = north[2:4,2:5]
>>> near_fireplace += 8
>>> north[3,2] = 90 # the fireplace cell itself
>>> print room
[[[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 70. 78. 78. 78. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]]
 [[ 66. 74. 90. 74. 66.]
 [ 66. 66. 66. 66. 66.]
 [ 66. 66. 66. 66. 66.]]]

Here we use some clever indexing methods, which can specify the level along the direction of multiple dimensions. These views should be retained and will be used later. For example, you might want to know the current temperature on the entire north wall:
Listing 13. View the north wall

>>> print north
[[ 70. 70. 70. 70. 70.]
 [ 70. 70. 70. 70. 70.]
 [ 70. 78. 78. 78. 70.]
 [ 66. 74. 90. 74. 66.]]

More operations

The above is just a small part of the convenient functions and array methods/properties in Numarray. I hope to give you some preliminary understanding; Numarray documentation is an excellent reference for in-depth learning.

Since the temperatures in our rooms are no longer the same everywhere, we may need to judge the global status. For example, the average temperature in the current room:
Listing 14. View the averaged array

>>> ()/len()
70.066666666666663

I need to explain it here. All operations you can perform on arrays have corresponding general functions (ufunc). So, the floor -= 4 we used in the previous code can be replaced by subtract(floor,4,floor) . Specify the three parameters of subtract() and the operation can be completed correctly. You can also use floor=subtract(floor,4) to create a copy of floor , but this may not be what you expect, as the change will happen in a new array, not a subset of room .

However, unfunc is more than just a function. They can also be callable objects with their own methods: where .reduce() is probably the most useful one. reduce() works like the built-in function reduce() in Python, and each operation is basic ufunc (but these methods are much faster when applied to Numerical arrays). In other words, () means sum() and () means product() (these shortcut names are also defined).

Before finding the sum of temperatures of each unit in the room, you need to get a one-dimensional view of the data first. Otherwise, you get the sum of the first dimension and generate a new array with reduced dimensions. For example:
Listing 15. Error results for non-planar arrays

>>> (room)
array([[ 276., 292., 308., 292., 276.],
    [ 276., 276., 276., 276., 276.],
    [ 276., 276., 276., 276., 276.]])

Such a space and may be useful, but it is not what we want to get here.

Since we are modeling a physical system, let's make it more realistic. There is a tiny airflow in the room, causing the temperature to change. When modeling, we can assume that for every small time period, each unit will be adjusted according to the temperature around it:
Listing 16. Micro-air Flow Simulation

>>> def equalize(room):
...  z,y,x = map(randint, (1,1,1), )
...  zmin,ymin,xmin = maximum([z-2,y-2,x-2],[0,0,0]).tolist()
...  zmax,ymax,xmax = [z+1,y+1,x+1]
...  region = room[zmin:zmax,ymin:ymax,xmin:xmax].copy()
...  room[z-1,y-1,x-1] = sum()/len()
...  return room

This model certainly has something unrealistic: a unit does not adjust only based on the temperature around it without affecting its adjacent units. Nevertheless, let's take a look at how it performs. First we choose a random unit -- or in fact we choose the index value of the unit itself on each dimension plus 1, because we get the length instead of the largest index value through the .shape call. zmin , ymin and xmin ensure that our minimum index value is 0 and we will not get negative numbers; zmax , ymax and xmax are not actually needed, because the index value after minusing 1 in each dimension of the array is used as the maximum value (like a list in Python).

Then we need to define the area of the neighboring unit. Since our room is small, we often choose to the surface, edge or corner of the room -- the region of the unit may be smaller than the largest subset of 27 elements (3x3x3). It doesn't matter; we just need to use the correct denominator to calculate the average. This new average temperature value is assigned to the previously randomly selected unit.

You can perform as many averaging processes in your model. Only one unit is adjusted per call. Multiple calls will use some parts of the room to gradually become averaged. Even if the array is dynamically changed, the equalize() function can still return its array. This is very useful when you want to average only one copy of the model:
Listing 17. Execute equalize()

>>> print equalize(())
[[[ 70.    70.    70.    70.    70.   ]
 [ 70.    70.    70.    70.    70.   ]
 [ 70.    70.    70.    70.    70.   ]]
 [[ 70.    70.    71.333333 70.    70.   ]
 [ 70.    70.    70.    70.    70.   ]
 [ 70.    70.    70.    70.    70.   ]]
 [[ 70.    78.    78.    78.    70.   ]
 [ 70.    70.    70.    70.    70.   ]
 [ 70.    70.    70.    70.    70.   ]]
 [[ 66.    74.    90.    74.    66.   ]
 [ 66.    66.    66.    66.    66.   ]
 [ 66.    66.    66.    68.    66.   ]]]

Conclusion

This article only introduces some of the features of Numarray. Its functions far more than that. For example, you can use a fill function to fill an array, which is very useful for physical models. You can specify a subset of an array not only through layers but also by indexing an array -- this allows you to not only operate on discontinuous fragments in an array, but also redefine the dimensions and shapes of an array in various ways through the take() function.

Most of the operations I described earlier are aimed at scalars and arrays; you can also perform operations between arrays, including between arrays of different dimensions. This involves a lot of things, but all of them can be done intuitively through the API.

I encourage you to install Numarray and/or Numeric on your system. It's not difficult to get started, and the quick operations on arrays it provides can be applied to a wide range of fields -- often what you'll never expect at the beginning.