This chapter presents three interesting case studies in which more complex PDL constructs are used. Starting out will be an example of PDL's use in plotting a weather map. Next is an in-depth discussion of object-oriented PDL in which complex numbers are added as a data type. A numerical simulation example rounds out this tour of hairy PDL examples.
[or some such intro paragraph]
The object of this example is to produce a contour plot of global temperatures
overlaid upon a linear projection of a world map. Let's take a look at the finished product in
fig.
.
The first difficulty is to draw the world map. This is actually quite a trick, as map databases are large and complex. A good mapping facility is generally an add-on costing hundreds of dollars in packages such as MATLAB. Fortunately, there exists a perl module which is an interface to the popular GMT 6.1map plotting package. This perl extension, PDL::Graphics::PGPLOT::Map, uses PGPLOT to plot maps extracted from the GMT coastline database.
worldmap ({WEST => $minlon, EAST => 180, SOUTH => -90, NORTH => 90,
PROJECTION => 'LINEAR'});
The above command sets the boundaries of the map and requests a simple linear projection. A linear projection is convenient for plotting contours.
Next we will find something to plot. In this case, we choose National Center for Environmental Prediction 'AVN' model data6.2. The AVN model data consist of various atmospheric parameters (temperature, humidity, etc.) stored on latitude/longitude grid points and pressure levels (1000 millibars, 850 millibars, etc.) for the entire world. These data files are generated by complex weather models initialized using input data from weather balloons, airplane measurements, satellite measurements and other sources. They are stored in netCDF (network Common Data Format), a binary format used to store gridded data of all types in a system-independent, easy-to-access fashion. Here we open up an example file and grab a global surface temperature grid:
my $infile = "avn_97.032.00_cdf";
my $nc = PDL::NetCDF->new($infile);
my $tsfc = $nc->get("T");
$tsfc = $tsfc->slice(":,:,(0),(0)");
We open this file using an extension to PDL called PDL::NetCDF which allows one to easily read and write PDL data objects to/from netCDF files. Then we grab a four dimensional grid of temperature data (longitude by latitude by pressure level by time) and slice out the surface data for a single time step into $tsfc.
We must now determine how the gridded data we just read is oriented on the globe. Fortunately these netCDF data files have several variables included along with the actual weather data which tell where the grid starts and ends and how widely spaced the data are. These values are used:
Once these variables are read in, we rotate the surface temperature grid so that it starts at 180 degrees west longitude (the edge of the map), instead of 30 degrees west:
$tsfc = $tsfc->rotate(int(($lo1 - $minlon)/$di));
In the final lines of the script, we generate contour labels for the graph and plot the temperature contours:
my $ncont = 8;
my $range = $tsfc->max - $tsfc->min;
my $interval = int($range/$ncont);
my $contours = (sequence ($ncont) * $interval) + int($tsfc->min) + 1;
my @labels = list $contours;
cont ($tsfc, {LABELS => \@labels,
LABELCOLOR => 3,
CHARSIZE => 0.7,
LINEWIDTH => 1,
CONTOURS => $contours,
COLOR => 2,
TRANSFORM => pdl ([$minlon, $di, 0, $la1, 0, $dj])});
The contour labels are determined dynamically from the data in the temperature field. Only the number of contour levels (12) is hard-coded. A linear transform based on the grid specification variables read in earlier is used to line up the contour map with the coastlines. See perldl> help cont and the PGPLOT library documentation6.3 for more information.
The entire script is shown in figure
.
This example used a convenient PDL interface to the popular GMT mapping package. It also used the PDL extension PDL::NetCDF to read in a large global grid of temperature data. Finally, some of the more advanced features of PDL::Graphics::PGPLOT were used to overlay a contour map of the data. All the software used in this example is freely available and open source. As is typical in the open source world, half the battle can be installing all the necessary software. Here is a list of necessary packages to get this example to work:
PDL provides a very powerful datatype: Vectors and matrices6.4 can be used to represent almost any real-world problem. However, often one wants to extend the information stored in a piddle or even the functionality that PDL provide.
Extending PDL is easy, you just have to put your new method into the PDL package and it can then be used just like any other PDL method:
sub PDL::ring_modulate {
my $carrier = shift;
my $signal = shift;
$carrier * $signal;
}
Now ring_modulate can be used just like any other pdl method:
$car->ring_modulate($sig)->clip(-1,1);
Attaching data is also very easy, as every piddle has an attached attribute or header hash6.5 that you can fill with any data you like. The PDL::Audio-module for example attaches information like sampling rate and filetype to a piddle:
sub raudio {
my $audio = <... read some audio stream into a piddle ...>
my %hdr = (
rate => $samplingrate,
filetype => $filetype,
channels => $audiochannels,
);
# attach the header (actaully a reference to it)
$audio->sethdr(\%hdr);
$audio;
}
This information can then be queried or changed by anybody who is interested:
$srate = $audio->gethdr->{rate};
$audio->gethdr->{filetype} = FILE_AIFF;
If you want to attach this information not only to the piddle itself but also to all copies of it, you can set the headercopy flag, using hdrcpy:
# enable header-copying $piddle->hdrcpy(1); $otherpiddle = $piddle + 1; # now both $piddle and $otherpiddle share the same header hash.
This is not as useful as it sounds, however, since all piddles created that way share the same hash, so one must be careful when changing parts the header as this change will be effective in many piddles.
Both methods of extending PDL have drawbacks: Putting another function into the already crowded PDL namespace can be difficult6.6, and using verbose method names (e.g. audio_phaseshift instead of simply shift) is counter-productive: After all, we already know that our piddle represents some audio signal, and so should perl.
Similarly, attaching extra data to a piddle doesn't change its nature: All operations still treat it as a regular piddle.
Neither of the above methods lets us create piddles that ``know'' that they are audio files, or ``know'' that they should be multiplied like complex numbers instead as of vectors.
While we are at complex numbers... I once needed complex arithmetic for PDL myself, but couldn't find good support for it, so I sat down and tried to write a module implementing complex numbers using regular piddles. This was not only a very useful way to learn more about PDL, after experimenting and a lot of help from others it also became quite natural to use, even more than I initially expected. As it turned out, subclassing PDL is not difficult, but it is not always obvious how to do it.
The most natural representation for complex numbers in PDL is a vector
with two elements. I decided to put the real part into the first element
and the imaginary part into the second, so
becomes the piddle [ 5 -3 ].
The first thing we need is a way to create complex numbers. Instead of providing a specialised constructor as one would like for other objects6.7, I opted for typecasts:
package PDL::Complex;
# cast from PDL to PDL::Complex
sub cplx($) {
bless $_[0]->slice('');
}
*PDL::cplx = \&cplx;
# cast back
sub real($) {
bless $_[0], PDL::;
}
# provide 'i'
$i = cplx pdl 0, 1;
sub i() { $i->copy }
What's going on here? The cplx function expects a piddle (it doesn't really matter wether it's already a complex number of type PDL::Complex or a regular piddle (type PDL)) and re-blesses it into the PDL::Complex namespace. This has the effect of changing it into a complex number.
The unusual slice('') creates a new piddle object that references the same data as the original one. It is this operation that really makes it a cast. Without it, the original piddle would be changed, and using copy would destroy the dataflow a PDL-user expects.
The line
*PDL::cplx = \&cplx;
is a bit tricky: It puts a copy of the cplx function into the PDL namespace, so that regular piddles can be converted to complex numbers. I decided that it makes sense to call cplx on complex numbers as well, as it is a no-op then and can be used safely to ensure that a pdl is of complex type, much like the standard pdl constructor also accepts a piddle as argument.
The same could be said about real, but since this is less often used on nirmal piddles it makes sense to provide only a version that works on complex numbers.
The next line of code contains the first call to cplx: It creates
the complex constant
(
) and puts it into the global variable
$i. To make matters even easier a function with the same name is
created. This allows users to write code like the following:
5+3*i pdl(1,2,3) + pdl(3,4,5)*i $somepiddle *= 1-2*i;
Which looks very natural and makes the casting operators less neccessary.
Apart from implementations for the complex arithmetic operators, there is another detail one must know when subclassing PDL: PDL functions need a way to create result values of the right type. While PDL knows how to create an object of type PDL, it has no idea how to create an Icky::Yicky::Data. Instead, it calls the initialize method of the class, e.g. Icky::Yicky::Data->initialize.
All standard PDL functions always generate a result with the same type as their first argument (as if they were method calls, which, in some sense they all are).
If the new datatype can be implemented using a plain PDL, you do not have to supply an initialize method; PDL's default initialize will return an empty piddle blessed into the correct class. This is adequate for simple classes like PDL::Complex.
For more complex cases you can use a special feature of PDL: When PDL sees that the object it got passed is not a piddle, but a hash, it tries to find the real piddle in $object->{PDL}, that is, you can create a fairly standard perl object which still acts as a PDL by writing a constructor like the following:
sub initialize {
bless {
mydata => ...,
PDL => PDL->null,
datax => ...,
}, __PACKAGE__;
}
Unless the derived objects act just like plain piddles6.8, you also need to override the perl operators for the new type.
The first step is to implement these new operators as normal functions. Here are two examples, that implement Csub (addition)6.9. and Csin (the complex sine):6.10
pp_def 'Csub',
Pars => 'a(m=2); b(m=2); [o]c(m=2)',
Code => q^
$GENERIC() ar = $a(m=>0), ai = $a(m=>1);
$GENERIC() br = $b(m=>0), bi = $b(m=>1);
$c(m=>0) = ar + br;
$c(m=>1) = ai + bi;
^
;
pp_def 'Csin',
Pars => 'a(m=2); [o]c(m=2)',
Code => q^
$GENERIC() ar = $a(m=>0), ai = $a(m=>1);
double s, c;
SINCOS (ar, s, c);
$c(m=>0) = s * cosh (ai);
$c(m=>1) = c * sinh (ai);
^
;
The obvious (but, as usual, imperfect) way to make '-' act like 'Csub' is to use the overload pragma:
use overload
'-' => sub {
my ($a, $b, $swapped) = @_;
$b = r2C $b unless UNIVERSAL::isa $_[1], PDL::Complex::;
return $swapped ? Csub ($b, $a) : Csub ($a, $b);
};
'sin' => \&Csin,
Now you can see why I used Csub for this example and not Cadd:
subtraction is not symmetric, so the implementation must take care to swap
the operands when neccessary6.11. To support cases like 5
- $complex and $regularpiddle - $complex, where one of the
operands is a plain number or a regular piddle consisting of all real
numbers, the operator also tries to convert the operand to the complex
domain first (r2C maps each number
to the vector/complex number
).
This is also the reason why even operators like Cadd and Csub have to be overloaded - although their mathematical behaviour is identical to the standard '+' and '-' operators, the type of the result differs.
Fortunately, the sine function is much simpler, it is nothing more than a direct call to Csin.
If the world were perfect we would be finished. But of course it is not: while the overloaded definitions seem to work in most cases6.12, perl sometimes ignores our definitions and uses the regular PDL-operator. For example, when the first operand is a regular piddle, perl chooses the standard operation6.13. Sometimes, for example in 5*i+pdl(6), perl calls the standard operator for no obvious reason.
Fortunately, this can be fixed in the same generic way for each extension module, so there is hope that PDL will take care for this problem in the future. Until then, however, you also have to replace all binary operators in PDL, which is a bit tricky. Here is how it's done for addition:
{
package PDL; # place definitions in the PDL namespace
no warnings; # perl might complain about redefinitions
sub cp($;@) {
my $method;
if (ref $_[1]
&& (ref $_[1] ne 'PDL')
&& defined ($method = overload::Method($_[1], '+')))
{ &$method($_[1], $_[0], !$_[2])}
else { PDL::plus (@_)}
}
use overload
'+' => \&cp,
;
}
For subtraction, multiplication and division you have to replace the '+' by the corresponding operator name and the call to PDL::plus by a call the PDL::minus, PDL::mult or PDL::divide. Just look at the source of the Complex module6.14 for appropriate definitions.
However, once all this is in place, everything should work smoothly:
perldl> use PDL::Complex; perldl> p 1 + i * r2C pdl 1,7 [ [1 1] [1 7] ] perldl> p sin 5 * i ** 0.5 [-6.5908487 -15.829068]
The majority of this book has described the capabilities of PDL as a fast, flexible language for data analysis. In the present section, I will describe some facilities and methods for data generation.
Low level languages such as Fortran or C are often used for numerical analysis, mainly for speed and because of the availability of numerical libraries. As data sets become larger, the speed advantage of a compiled program becomes less important -- so long as array handling is done in a low-level fashion, as in PDL, there is little advantage in using a compiled language for the high-level part of the algorithm.
Using PDL has significant advantages over a traditional compiled language. The interpreted shell allows for an adaptable toolkit approach to the development of algorithms and for close integration between numerical algorithms and a flexible graphics front-end. The slicing methods intrinsic to PDL can also make the structure of the high-level algorithm clearer in the implementation.
My aim here is to introduce the use of PDL in numerical simulations -- techniques specific to PDL, and methods available in the PDL distribution -- rather than to provide a tutorial in numerical methods. As a result, the examples are chosen for clarity. Better algorithms for some problems are included in the PDL distribution, while the methods illustrated are easily extended using improved algorithms from elsewhere (e.g. [xxx]).
We now discuss a simple example, the numerical integration of a
one-dimensional wave equation,
| (6.1) |
| (6.2) | |||
![]() |
(6.3) |
The code implements the first-order Lax method, a simple algorithm for this time-dependent problem. In effect, the code assumes that the gas within each numerical grid cell is of uniform density. This leads to quite high numerical diffusion, which is added to further to maintain stability. More complex algorithms which approximate the data within the cells as linear, quadratic or higher power-laws maintain waves for longer without damping. However, the Lax method illustrates well the way in which these more complex algorithms can be coded in PDL.
The guiding principle is to avoid the generation of temporary arrays within the simulation loop. This means that some obscure preparatory work is needed before the loop is reached. Having got past this, the algorithm is clearly displayed once the simulation loop is reached.
In this code, I have used separate arrays for primitive data (the values used to calculate the fluxes) and conserved data (the values to which the fluxes are added). This distinction is not really necessary in the present example, but makes it easier to generalize the code to more complex systems.
![]() |
The first statements in the code allocate all the array space which will be needed: the data array (allowing ghost cells for the implementation of boundary conditions), the flux array (which stores rate of transfer of material across the edges of the cells), and an array to accumulate the change in the data over a time step before it is added back to the data.
Next, various slices of these arrays are prepared. By retaining these
slice PDLs, no calls of `slice' need to be made within the loop. The
names of these slices are chosen to reflect their contents:
$eleftp contains the values of the data to the left of the
current edge, $crghtp to the right of the current cell.
Various constants characterizing the simulation are then set up: the
boundary conditions, the maximum wave speed (which limits the allowed
timestep of the algorithm). The initial contents of the simulation
grid are set in the primitive (or physical) array $p, and
transferred to the conserved variable array $u by the
subroutine ptou(). Here this is a rather pointless direct copy
- the relation between primitive and conserved variables is less
trivial in many more interesting examples.
![]() |
Having laid out this ground work, we now reach the simulation loop. First, the primitive variables at the start of the timestep and the allowed size of this step are calculated. The timestep is a constant for this simple example, but often it will depend on the contents of the simulation grid. One common limiting factor for the timestep is the so-called `Courant condition'. This stability limit may be interpreted as meaning that, in the physical problem, the waves carrying perturbations from any cell will not have propagated beyond its immediate neighbours within the timestep. Methods for determining the limits which apply to any specific problem are described in most textbooks on numerical analysis.
Next, the values in the `ghost' cells beyond the edge of the
simulation are set. The numerical algorithm uses data values on
either side of each interface to calculate the flux across the
interface, so to calculate the fluxes at the edge of the grid, extra
data has to be `made up' for fictitious cells outside it. Here, I
have set the cells beyond the grid to have the same value of
,
but the opposite value of
: the effect of this is that waves are
relected from the ends of the grid.
Then, the flux subroutine is called, using two different `views' of
the grid data: $eleftp, the data value in the cell at the left
side of each interface, and $erghtp, the data at the right
side. The flux function `injects' the values of the flux into a
pre-existing pdl, $f. In the present example, the flux is a
simple function of the data and so we have coded the flux function in
perl: often it will be far more efficient to do this using PP.
The reason for `injecting' the flux data into $f is made clear
in the subsequent lines. We could have let the flux function
automatically generate a fresh PDL with the flux values. However, we
have already prepared two different views of the original flux array
at the beginning of the routine: $leftf, the flux through the
left edge of each cell, and rghtf, the flux through the right
edge. The difference between these fluxes is then calculated using
minus: both $df = $leftf-$rghtf and $df .= $leftf-$rghtf
would allocate a new array at each timestep, the second
as an intermediate array which would then have to be copied into
$df.
The flux difference is then multiplied by the timestep and added to the conserved value PDL to update the contents for the new tiemstep; the timestep and step counter are incremented to correspond to this evolved state. Finally, a plot of the data is made every 10 timesteps, to illustrate how the grid data is evolving.
Studying the development of the solution in this example, you will see that the total amount of gas within the grid remains constant to high accuracy. The algorithm was designed to achieve this, which is satisfying. The waves which move apart also have reasonably constant amplitude, which is good. However, the initially sharp shape of the solution rapidly decays to something rather smoother, where in the equations we are modelling the waveforms should remain sharp. All numerical algorithms of the form we have discussed have some of this broadening, called numerical diffusion: somewhat more complex higher order algorithms can nevertheless improve this performance dramatically.
The simple algorithm defined in the previous subsection can be improved significantly by changing the algorithm to assume that there is a gradient of density of the conserved variables within the grid. The effect of this is to maintain sharp features in the grid data with far less artificial diffusion.
The version I will now show has also been generalized to two spatial dimensions - the further extension of this to even higher dimensional spaces is fairly simple, although the computational time requirements increase so rapidly that even going as far as three dimensions will slow things down dramatically.
I should emphasise again that the algorithm used here is not really of a quality to be used in practise. In addition, there are various parts of the algorithm presented here which might better be transferred into PP for `production' runs.
![]() |
I have tried to keep the method as similar as possible to the first
example. In Table
, the first part is shown, in which
various data arrays are defined: the dimensions of the computational
grid are set in the array @d, a flag variable $second
says whether the code should run with second order accuracy (in space
and time). An additional conserved value is required to treat the
two-dimensional wave equation. Additional arrays are used to save
partial sums of fluxes and gradient values. The boundary constant
becomes an array, with an element for each dimension.
![]() |
In Table
, many slices of the data piddles are defined.
In particular, one slice is defined for each dimension of interest, so
we can treat the several dimensions of the physical problem by looping
over the elements in these arrays. Most if the slices are fairly
obviously related to those used in the one-dimensional algorithm. I
move the dimension of interest to be the first spatial dimension in
the first line so that algorithm looks the same in each dimension:
this move is unwound in the definition of the flux arrays
@leftf and $rghtf, so the correct components of the conserved
values are updated.
Numerous extra slices are also defined to help in the calculation of
the gradient arrays for the second-order algorithm. The components of
$ga contain the differences between physical values between
cells neighbouring a given edge of the grid. The values on opposite
edges of each grid cell are then combined in the elements of
$gb to give an average gradient within the grid cell.
![]() |
In Table
, we show the hub of the algorithm. The first
part is just the same as the first-order algorithm, iterated over
dimensions to turn it into a multidimensional code. The is all that
happens if $second is not set.
If $second is set, however, the first-order result is just used
to update the physical values to provisional half time-step values, so
a leapfrog scheme can then update the results with a correction which
is second-order in time. The second order step is similar to the
first order one, except that in the flux calculation, the primitive
variables on either side of the grid edge are corrected for the
gradient calculated within the grid cell.
The gradient is calculated from those at each edge of the grid cell
using an averaging function. This could just be the arithmetic mean
(in which case the gradient within the cell would just be a centred
difference between the primitive values in the neighbours of the
cell). A somewhat different, `limited', value is actually defined in
the function av (see Table
: such limited
averages have better properties in the presence of sharp
discontinuities in the solution, preventing ringing by decreasing the
order of the simulation just in the neighbourhood of the
discontinuity.
![]() |
In Table
, gives a few additional functions for this
algorithm. While utop(), ptou() and maxv() are
unchanged, flux() has had to be altered to take an extra
argument, the direction of the current sweep of integration, and to
cater for the extra conserved variable. The awkwardness of this
function and the new function av() (not to mention their
slowness in practise!), indicate that they would best be implemented
in PP.
The dynamics of a perfect gas is treated in the PDL::Hydro package in
the PDL distribution. By putting use PDL::Hydro; at the head
of the evolution codes above, I can load the
flux(), utop(), ptou(), maxv() and av()
routines provided by this package -- the versions provided for the
wave equation must be removed. The value of $nq needs to be
increased (to provide space for density, velocity and pressure
variables), and $bc must be changed to an appropriate value.
Finally, it is better to change the initial conditions to having a
step in pressure rather than density (
($t =$p->slice("(2),0:".($ncell/2-1))) .= 2;
) -- otherwise, the initial
grid contains are an equilibrium, and the values in the grid will just
remain constant!
The flux() routine treats the interaction between neighbouring
cells as a shock tube problem between uniform states, and calculates
the flux across the interface between the cells by an exact method.
This approach, based on a method introduced by Godunov, is used in
many widely used gas- and hydrodynamics algorithms. By default,
PDL::Hydro assumes the gas being studied is adiabatic, with a
adiabatic constant of
appropriate for monatomic gas. A value of
is perhaps more typical for air, for instance, and this may be
set by including setadc(1.4) at the start of the routine.
In a typical one-dimensional simulation (Figure XXX), the high pressure to the left of the grid drives a sharp shock to the right, and a rarefaction moves to the left (this rarefaction is smooth in the full solution). In between there remains a contact discontinuity between hot and cool gas. In reality, this contact discontinuity should remain sharp: its smooth form is again the result of the numerical viscosity introduced by this low order scheme.
- initial value problems by RK
- boundary value problems, via PDL::Func
lapeyre 2006-07-23