mdarrays - a library to simplify using multidimensional arrays in C


Table of Contents

1. Quick Start
2. About
Introduction
Other bounds-checking implementations
Features of mdarrays
How it works
3. Nitty gritty of C multidimensional arrays
4. List of routines
Allocation / Deallocation
Tests and Assertions
File Storage
Printing
Formatting
Iterators
Copy
Bibliography

Chapter1.Quick Start

mdarrays is a library to simplify using multidimensional arrays in C.

The three most important reasons to use mdarrays are:

  • easy to allocate arrays of pointers to pointer of any dimension

  • arrays are dynamically allocated

  • off-by-one errors in any dimension are guaranteed to be detected by a good memory checker such as Valgrind (http://valgrind.org).

Example code, showing creation of a 4D array of doubles, the dimension sizes being decided at run-time:

// uint is typedef to unsigned int
void a_function(uint dimension1_size, uint dimension2_size, 
                uint dimension3_size, uint dimension4_size)
{
  double ****cc;
  cc = MDARRAYS_ALLOCATE(double, dimension1_size, dimension2_size, 
                         dimension3_size, dimension4_size);
  cc[1][2][1][3] = 5.0;
  MDARRAYS_FREE(cc);
}
    

Chapter2.About

Introduction

Two of the biggest pitfalls creating mathematical software in C are the lack of array bounds checking, and the difficulty in working with multidimensional arrays. Bounds checking is desirable during debugging, but lack of bounds checking is desirable after a program has been debugged, because the speed penalty for bounds checking is enormous. This package addresses these two problems.

Other bounds-checking implementations

Several solutions are currently available:

  • A C language interpreter would easily monitor run-time exceptions including bounds violations. EiC ( http://eic.sourceforge.net [link currently broken] ) is a freely available C interpreter. However, in the case of a large or complex project, this is not a good solution for bounds checking.

  • gcc has been patched to include bounds checking [jones] [brugge]. However, this does not catch all bounds violations for multidimensional arrays. Also, it is not yet part of the official gcc distribution.

  • Many different run-time memory checking schemes have been developed over the years. There are none that can detect all bounds violations for all multidimensional arrays, because basic multidimensional arrays in C are allocated as a single block of memory.

As of the writing of this paper (December 2003), I know of no other Free software that easily implements bounds checking for all dimensions of a multidimensional array in C.

Features of mdarrays

This package adds bounds-checking to C programs by allocating arrays as pointers to pointers and using a memory checking program such as Valgrind. Unfortunately, Valgrind currently only works on i386 processors, so another memory checking library must be used for other architectures.

mdarrays has the following features:

  • provides complete bounds checking for multi-dimensional arrays

  • has very low overhead (normally < 10%) when the memory-checker isn't running

  • simplifies allocation of high-dimensional arrays

  • simplifies saving and loading high-dimensional arrays to files

  • simplifies passing high-dimensional arrays to functions and pointers

  • able to check bounds for an array by adding a simple macro function. This allows runtime bounds checking for some arrays even when the program is not running under the memory checker.

How it works

mdarrays works by allocating a chunk of memory large enough to hold a metadata structure plus the pointer array for the highest dimension. It then recurses through the array allocating as needed. The pointer returned to the user is moved forward to the start of the array, so the user can access the array like a normal array, but the metadata is still sitting there behind it. mdarrays functions move back and read the metadata, so that's how mdarrays can keep track of things like array dimensions and formatting styles. So if you don't use MDARRAYS_FREE() to free your arrays, you'll leak memory, even for 1D arrays.

Chapter3.Nitty gritty of C multidimensional arrays

In C, multidimensional arrays can be implemented in at least two ways:

  1. The built-in multidimensional array mechanism:

    int aa[3][4];
            
  2. An array of pointers to arrays:

    int **bb;
    (some code here to allocate storage for bb)
            

Both of these methods can access elements using index notation:

aa[1][1] = 4;
bb[1][1] = 4;
    

However, from the compiler's point of view, they are quite different.

Assume aa and bb are defined as above. A "true" array (int aa[3][4]) is allocated a single block of memory before the program starts if the variable is global, or at the start of the function if the variable is local. Let's say you want to know the value of aa[1][2]. The compiler first has to find the memory location holding the value. The compiler knows that each "column" has 4 elements, so it finds the memory location from the base aa pointer using &aa + 1*4 + 2. Now you can probably see why a function which receives array parameters must know the dimensions of the array. If it didn't, the compiler wouldn't know how to calculate the offsets when accessing the array.

In the case of a "true" array, the compiler asks for a single block of memory, so a machine-code memory checker like Valgrind will not realize if you go past a bound, as long as the resulting pointer points within the block of memory for the whole array. For example aa[1][5] will not cause an error with Valgrind.

A "pointer" 2D array, on the other hand, is a pointer to a pointer. At the ** (pointer to pointer) level, bb points to the base of a 1D array of pointers. For example, bb[2] would be the third pointer in that array. bb[2] would point to a 1D array of integers. bb[2][1] would point to the second integer in that array.

In the case of a 2D "pointer" array, each "column" is a separately-allocated block of memory, so if any dimension is accessed out of bounds Valgrind will catch it.

Advantages of "true" arrays:

  • Faster

  • Simpler to allocate

Advantages of "pointer" arrays:

  • Bounds checking using a memory checker works for multidimensional arrays

  • Simpler to pass to functions and other pointers

  • Easier to change bounds at run-time

  • Multi-dimensional arrays can have odd sizes (they don't have to be rectagular)

The speed advantage of using "true" arrays can be significant (200%) in some cases, but a typical simplified test is included in this distribution as array_access_test.c. On my 366 Mhz Pentium II laptop, the speed difference was about 10%. Based on this result I decided to focus entirely on pointer arrays.

Because arrays based on pointers to pointers have a separate memory allocation for each element in all but the last dimension, an off-by-one error at the boundary of any dimension will be detected by a good runtime memory checker.

Chapter4.List of routines

Allocation / Deallocation

void* MDARRAYS_ALLOCATE(type, dimension size 1, dimension size 2, ...)

returns a void pointer to a new array. See below for a description of type. Dimension sizes are the sizes of the first, second, and so on dimension.

void* MDARRAYS_ALLOCATE_AND_CLEAR(type, dimension size 1, dimension size 2, ...)

same as MDARRAYS_ALLOCATE, but fill the array with value zero.

void* MDARRAYS_ALLOCATE_AND_FILL(type, value_pointer, dimension size 1, dimension size 2, ...)

same as MDARRAYS_ALLOCATE, but fill the array with the value pointed to by value_pointer. value_pointer must point to the same type as the array, and this is not checked at compile-time or run-time!

void MDARRAYS_FREE(cc);

free array cc.

The type parameter is actually used two ways internally by the library. The first is as a macro stringification, which means that the macro compares what you use in the type parameter to a list of strings. The strings allowed are: char, int, bool, unsigned int, uint, long int, float, double, long double, real. Internally the bool type is really unsigned char type. The second way is to use sizeof(type) to determine the allocation needed for each array element.

If you are creating an array of pointers to objects, use MDARRAYS_ALLOCATE_OBJECT, MDARRAYS_ALLOCATE_AND_CLEAR_OBJECT, MDARRAYS_ALLOCATE_AND_FILL_OBJECT, which are identical to the above routines except that they allow type to be anything (internally, mdarrays just creates an array of void pointers and allocates space based on what sizeof(type) says).

Tests and Assertions

bool MDARRAYS_IS_MDARRAY(array)

Return 1 if array is an allocated mdarray pointer array, 0 otherwise

MDARRAYS_ASSERT_BOUNDS(array, ...)

Check that the indices given by ... are within bounds for array. Abort with error if not.

bool MDARRAYS_EQUAL(array1, array2)

If all the elements in array1 are the same as the elements in array2, return true, otherwise return false. Use DM_*_EQUALS() functions for comparing floating types. Only works for basic types.

File Storage

void MDARRAYS_SAVE(char *filename, void *array);

Save an mdarray to a file.

void MDARRAYS_LOAD(char *filename, void *array);

Load an array from a file into array.

void MDARRAYS_LOAD_2D_DYNAMIC(char *filename, void **array, type, uint *rows, uint *columns);

Load a 2D array from a file, and creates a new 2D mdarray with the correct sizes. Type is the type such as "unsigned int" or "real", much like the type used in MDARRAYS_ALLOCATE(). Type is stringified by the preprocessor, so it is whitespace-sensitive (so don't put two spaces between "long" and "int" in "long int". The number of rows and columns loaded is returned in *rows and *columns.

void MDARRAYS_SAVE_NONMDARRAY(filename, array, type, type_size, dimensions, sizes, delimiter, line_delimiter, print_type, print_width, print_precision);

Save a normal array to a file. type is one of the types from mdarrays.h such as MDARRAYS_DOUBLE_TYPE. sizes is an array of the dimension sizes. type_size should be "sizeof(type)". See the section called “Formatting” for info on the rest of the parameters.

void MDARRAYS_LOAD_NONMDARRAY(char *filename, void *array, uint dimensions, uint type, uint *sizes);

Load a normal array from a file. type is one of the types from mdarrays.h such as MDARRAYS_DOUBLE_TYPE. sizes is an array of the dimension sizes.

For the special case of arrays to type gpointer, a direct memory image of each array element is written sequentially to a binary file. Note that this could case non-portability problems for several reasons.

All other arrays are represented as ASCII text numbers, with enough decimal places to hold their type. For all the file storage functions, the order of array output is the same as the order of array elements in memory for C. A simple example will probably make this clear. Assume a 2D array with dimensions [2][2]. The output will be:

aa[1][1] aa[1][2] aa[2][1] aa[2][2]
    

If you write a function to read these files, assume that the separator between each array element is "any amount of whitespace".

Printing

The printing functions are very similar to the save functions in the previous section, although they are modified to make arrays look better to humans. Also, the GPOINTER type is printed instead of writing a memory image like with the save functions. There are currently no non-mdarray print functions.

void MDARRAYS_PRINT(FILE *file, void *array);

Print an mdarray to a file. Notice that a FILE* is used here, and not a filename as in the save functions. Normally you would just use stdout, but this gives you the option to save to a file if you want.

void MDARRAYS_PRINT_TAGGED(FILE *file, void *array, char **tags);

Print an mdarray to a file with tags (1D only).

Formatting

If the default print or save formatting style is not what you want, use the style macro functions. For style macros, the following definitions apply:

delimiter

the character to be placed between elements of the last index.

line_delimiter

the character to the placed between elements of the next to last index. For example, with a 2d array, setting delimiter to space and line_delimiter to a line feed will create a square grid like you would expect.

type, width, and precision

These parameters mirror the printf type, width, and precision. For example, type=MDARRAYS_PRINTF_TYPE_f, width=4, precision=2 will end up printing with "%4.2f". MDARRAYS_PRINTF_TYPEs are in mdarrays.h.

The style functions are:

void MDARRAYS_SET_PRINT_STYLE_DEFAULTS(void *array)

Set the print style to the default. The style defaults are located in mdarrays/mdarrays.h. When a new array is allocated, it is set to the default style, so this function is only needed if you have changed the style and want to revert back to defaults.

void MDARRAYS_SET_SAVE_STYLE_DEFAULTS(void *array)

Same as MDARRAYS_SET_PRINT_STYLE_DEFAULTS except that it sets the save style instead of the print style.

void MDARRAYS_SET_PRINT_STYLE(void *array, char delimiter, char line_delimiter, uint type, uint width, uint precision)

Set the print styles for array.

void MDARRAYS_SET_PRINT_STYLE(void *array, char delimiter, char line_delimiter, uint type, uint width, uint precision)

Set the save styles for array.

Iterators

void MDARRAYS_FOREACH(void *array, mdarrays_foreach_function foreach_function,
                      void *user_data);
    

where mdarrays_foreach_function is defined:

typedef void (*mdarrays_foreach_function)(void *array, uint *indices, void *user_data);
    

Call the user-defined foreach_function once for each element of array. user_data is passed directly to foreach_function. Note that foreach_function should not assume anything about the order it will receive the elements of the array.

An example using MDARRAYS_FOREACH:

void foreach_function(void *array, uint *indices, void *user_data)
{
  // Assign the value pointed to by user_data for all elements where the 
  // array indices are the same for both dimensions
  if (indices[0] == indices[1]) 
    array[indices[0]][indices[0]] = *((double *)user_data);
}

void main(void)
{
  double **aa;
  double value = 3;
  aa = (double **)MDARRAYS_ALLOCATE(double, 5, 6);
  MDARRAYS_FOREACH(aa, foreach_function, &value);
}
    

Copy

MDARRAYS_COPY(void *from, uint from_offset, void *to, uint to_offset)
    

Copy elements of from array to to array starting at element from_offset in the from array and to_offset in the to array. Stop if end of either array is reached.

Bibliography

[jones] R. W. M. Jones and P. H. J. Kelly. in "Third International Workshop on Automated Debugging". M., Byers, D. Kamkar. Copyright 1997. Linkoping University Electronic Press. Backwards-compatible bounds checking for arrays and pointers in C programs.

[brugge] http://web.inter.nl.net/hcc/Haj.Ten.Brugge[dead link].