Guide to GAUSS Programming - a basic introduction


 
Code refinements

Up to now the guide has concentrated on technical aspects. Despite leaving a large part of the GAUSS language uncovered, the guide now moves on to improve your programming skills rather than expanding your technical knowledge. The hope is that a deeper rather than a broader understanding of programming techniques makes it easier to solve problems, read manuals, and write programs.

Should programs be efficient?

This section concentrates on how to improve the performance of programs, rather than how to write them, and the advice is much more case-dependent. When to use procedures and parameters depends on the circumstances. The time and memory constraints on programs will rarely be binding, and procedures can be used with little regard for their physical implementation. The order in which variables are declared and accessed is unlikely to slow a program down dramatically, and if it does, the remedy (where one exists) is often straightforward.

However, some consideration should be given to programs using very large variables or lots of loops. A simple way of testing the efficiency of a program is to add timings to runs (as sketched below); this gives a simple benchmark for comparing the effect of different solutions. As a general rule, a faster program will also use resources more efficiently (although this is not necessarily the case), and the first draft of a complex program can almost always be improved. Whether the improvement is worth the time spent re-coding is a matter of judgment. A program can always be tweaked to improve efficiency, but the law of diminishing returns can take effect rapidly.
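
For example, GAUSS's HSEC function returns the time in hundredths of a second since midnight, so a run (or just the section of interest) can be bracketed with two calls; the variable names here are purely illustrative:

startTime = HSEC;                            /* time at the start, in hundredths of a second */

/* ... the code being timed goes here ... */

elapsed = (HSEC - startTime) / 100;          /* elapsed time in seconds */
PRINT "Section took " elapsed " seconds";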


1 GAUSS vs user-defined procedures


GAUSS has a large number of standard functions. Many of these could be replicated by code written by the user; however, the built-in GAUSS functions are almost always faster than a user-written version - usually a great deal faster.

The main reason for this is that the maths co-processor has vector processing instructions built into it, which the GAUSS standard functions were designed to exploit fully. A user-defined procedure always has to go through an extra level of abstraction (GAUSS code being translated into machine instructions). This means that a user program is unlikely to be more efficient than the GAUSS function, and will probably be a good deal less so.

The general rule is that if a GAUSS command exists to solve a problem, then using that command will be the quickest and most efficient solution.
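
As a simple illustration, consider summing the columns of a matrix. A hand-coded loop such as the sketch below (the procedure name is invented) produces the same answer as the built-in SUMC function, but SUMC will normally be very much quicker:

PROC (1) = MySumC (x);
    LOCAL total, i;
    total = ZEROS(COLS(x), 1);
    i = 1;
    DO WHILE i <= ROWS(x);
        total = total + x[i, .]';    /* add each row in turn */
        i = i + 1;
    ENDO;
RETP (total);
ENDP;

x = RNDU(1000, 10);                  /* some example data */
s1 = MySumC(x);                      /* user-written version */
s2 = SUMC(x);                        /* built-in version - almost always much faster */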

There are two exceptions to this. The first arises because there is a core of GAUSS functions upon which the other standard functions are based. These "secondary" functions are to be found in the \GAUSS\SRC directory, in files with the extension ".SRC". Most of these are procedures much like any the user might write, and they can be edited as such, although editing the originals is not recommended. However, a user may copy these programs and tailor them to their own needs; the fact that these procedures are written by the GAUSS programmers does not necessarily make them the best available. In particular, many of these routines are wasteful of memory (I have already rewritten some routines to operate more efficiently). Other reasons to alter these standard procedures might be to remove excess code which the user knows is not needed, or to make them operate better on a particular form of data.

While these standard routines will generally serve their purpose well, there may be situations where some modification is beneficial. Although the routines are supplied by the manufacturer, they are not unalterable; however, the cases where the standard routines are inadequate or unacceptably inefficient are rare.

The second exception is where the "basic" functions are themselves not the most appropriate to the task. For example, the function SUBMAT, which extracts blocks from a matrix, can often be replaced by a simple concatenation command, which removes an extra procedure call.
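
For example, pulling two columns out of a matrix with SUBMAT can equally be done by direct indexing or by concatenation, which avoids the extra procedure call. A sketch, with an arbitrary matrix and column choice:

x = RNDU(5, 4);                      /* some example data */
colIdx = 1|3;                        /* the columns wanted */
y1 = SUBMAT(x, 0, colIdx);           /* standard function; 0 means "all rows" */
y2 = x[., colIdx];                   /* direct indexing */
y3 = x[., 1] ~ x[., 3];              /* simple concatenation */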

Alternatively, consider calculating xx' and adding it to a matrix where x is a sparse Nx1 vector of ones and zeroes and total is the NxN totals matrix. These two solutions will produce identical results:

total = total + MOMENT(x', 0);
  or
colNums = SEQA(1, 1, N);             /* column numbers 1 to N */
colNums = SELIF(colNums, x);         /* keep only the columns where x is non-zero */
i = ROWS(colNums);
DO WHILE i > 0;
    total[., colNums[i]] = total[., colNums[i]] + x;
    i = i - 1;
ENDO;

Generally, the matrix expression "x'*x" is quicker than calculating the multiplication explicitly element by element, and MOMENT(x', 0) is quicker still - often twice as fast. However, if N in the above example is large, our version is quicker - especially if the vector of column numbers does not have to be created. The above code is used in a number of our programs with a more efficient replacement for SELIF; when N is around 80 and the number of non-zero dummies is around 11, the time saving is substantial and increases with N. The dataset for which I devised this routine had around four million observations, with up to 1000 variables. This little bit of code took a couple of hours out of a run time of eight to ten hours.

This is a special example; the combination of a sparse matrix and the dummy variables makes this solution a significant improvement on the standard function. However, if the data is in a known format, then a non-standard solution might be worth considering.


2 Procedure calls

It was remarked in Procedures that there is always an overhead involved in setting up procedures. The importance of this depends on how often the procedure is called and what variables are passed to it. It was mentioned that copies are taken of all the variables passed into the procedure as parameters. When the procedure is completed, these copies are deleted from memory, but while the procedure is running they take up memory space. There will also be a time delay as the procedure structure is set up, parameters are copied, and local variables are created. Therefore using procedures involves more memory and more time.

The time delay is not often a problem. GAUSS is very quick at creating the necessary structure for the procedure to run, and even with moderately large variables the delay is insignificant. However, in some cases, the security of passing information through parameters may be outweighed by the time taken to copy very large parameters. This is where the global variable makes its comeback. Because it is visible inside the procedure, it can be accessed directly with no need to take parameter copies. A preferable alternative (but one often not applicable in GAUSS) is to pass a marker between procedures which indicates where the data may be found but does not contain the information itself.

Where the variables are only moderately large, memory space is more often a problem than the time delay. It usually arises from highly nested procedures. While a large variable itself may not cause any memory problems, once it has been passed as a parameter to procedure A, which passes it as a parameter to procedure B, which passes it as a parameter to procedure C... it can rapidly take up a lot of space.

For example, we do much work on large cross-product matrices - up to 15Mb. These are created using information in a dataset, and the data held in the cross-product matrices are abstracted and analysed. When the cross-product matrices are being created, the updating procedure may be called 240,000 times, and around 1.6 million vectors are added into the matrix. Asking GAUSS to copy a 15Mb variable a quarter of a million times seems less than efficient, and so in this case the totals matrix is made a global variable. The variables being passed to the updating procedure then total around 8Kb, but making these global has almost no effect on the running time - it might save roughly one minute per hour. Therefore these variables are kept as parameters to keep the program manageable.
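
A minimal sketch of that arrangement (the names are hypothetical, not the actual program): the totals matrix is created in the main program, so it is global and visible inside the updating procedure, and only the small vector is passed as a parameter.

n = 100;                                 /* illustrative size */
total = ZEROS(n, n);                     /* created at the top level, so global */

PROC (0) = AddToTotal (x);               /* only the Nx1 vector x is copied */
    total = total + MOMENT(x', 0);       /* the global totals matrix is updated directly */
RETP;
ENDP;

x = (RNDU(n, 1) .> 0.9);                 /* an example sparse 0/1 vector */
AddToTotal(x);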

In another program, data is extracted from the cross-product matrices and analysed. The analytical matrices are much smaller than the cross-products. However, the cross-products are not held in memory; instead, the name of the file containing the cross-product is passed around the program. When data is wanted, one procedure takes the filename as a parameter, reads in the cross-product matrix, extracts the necessary bits and pieces, deletes the cross-product from memory, and returns from the procedure, so that the full matrix is only in memory while it is actually being accessed. This program has no global variables at all, which makes maintaining its 6,000-odd lines of code much easier.
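
A minimal sketch of that pattern, with invented names and assuming the cross-product has been saved as a GAUSS matrix file (the caret tells LOADM to take the file name from the string variable):

PROC (1) = GetBlock (fileName, colIdx);
    LOCAL xprod, block;
    LOADM xprod = ^fileName;         /* read the full cross-product from disk */
    block = xprod[., colIdx];        /* extract only the columns needed */
    xprod = 0;                       /* release the memory held by the big matrix */
RETP (block);
ENDP;

block = GetBlock("xprod1", 1|2|5);   /* "xprod1" is a hypothetical file name */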

3 Declaring and using variables

When and how many variables are declared will affect the efficiency of programs. As they are declared or created, we can imagine variables being added to a stack in the main program, with the most recently declared ones on top. Whenever a variable changes size, the stack must be adjusted. If the variable is on top of the stack, this is no problem; if, however, the variable is at the bottom of the stack, then changing its size may involve a lot of shuffling around.

The practical upshot of this is twofold. First, variables should not have their sizes changed unnecessarily; second, variables which do change their sizes should be declared after more stable variables. For example, consider the following procedure definition:

PROC (1) = Concat (vec, numTimes);

    LOCAL outMat;                    /* this matrix grows on each pass through the loop */
    LOCAL i;

    outMat = vec;
    i = 2;
    DO WHILE i <= numTimes;
        outMat = outMat ~ vec;       /* concatenation changes the size of outMat */
        i = i + 1;
    ENDO;

RETP (outMat);

ENDP;

When the procedure is called, outMat will be placed on the stack and i on top of it. The size of outMat will keep changing as the concatenation proceeds, and the location of i in memory will shift accordingly. Declaring outMat second would have made a more efficient program, albeit marginally so in this case.
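
In other words, the declarations would simply be swapped so that the variable which changes size sits on top of the stack:

LOCAL i;                             /* the fixed-size counter goes underneath */
LOCAL outMat;                        /* the growing matrix sits on top, so resizing it is cheap */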

The same will be true of parameters and global variables.

A related issue is the number of variables declared. Unnecessary variable declarations may slow down adjustments to the stack, and they will increase the pressure on memory. Declaring variables within the smallest scope - using local variables in preference to global variables - will avoid some of this. Using local variables also ensures a measure of tidying up after the procedure has completed.


4 Workspace use

As has been mentioned, GAUSS augments memory with disk space used as virtual memory. This makes program storage space effectively unlimited. However, disk access is very slow compared to memory access. GAUSS manages this by keeping all the currently accessed variables in memory and dumping any variables not currently in use to disk if there is insufficient memory.

If a program spends a lot of time using the workspace on disk, then two questions should be asked:

  • is the program using too many variables?
  • is the program accessing variables inefficiently?

The first question has been dealt with in sections 2 and 3. In some cases there will be no alternative to using disk space as auxiliary memory, in which case the order in which variables are accessed should be considered.

Suppose a program has two matrices, matA and matB. The first column in each matrix is to be replaced by the first column of the other, and the two columns are also to be stored. Assume that there is enough memory to store the two columns and one (but only one) of the matrices. Consider the following two pieces of code:

col1A = matA[., 1];
col1B = matB[., 1];
matA[., 1] = col1B;
matB[., 1] = col1A;
  or
col1A = matA[., 1];
col1B = matB[., 1];
matB[., 1] = col1A;
matA[., 1] = col1B;

If there is insufficient memory space to store both matrices, then the first piece of code leads to: (i) matA is loaded; (ii) matA is unloaded and matB is loaded; (iii) matB is unloaded and matA is loaded; (iv) matA is unloaded and matB is loaded. The code finishes with matB loaded. The second piece of code leads to: (i) matA is loaded; (ii) matA is unloaded and matB is loaded; (iii) matB is unloaded and matA is loaded. The code finishes with matA loaded. Assuming the program is unconcerned about whether matA or matB is currently loaded, then by doing as much work as possible on each matrix before moving to another, the second option avoids one swap to disk.

With much lower memory prices and the resulting increases in capacity, this is less of an issue than it was five years ago. It is still most relevant on shared machines using a common memory core (e.g. on a Unix setup). Even on PCs it is not difficult to run out of memory several layers of procedures deep. Moreover, GAUSS takes time to manipulate these large matrices; if you can avoid creating them, you can improve the efficiency of your programs. The second version above not only lessens the demands on the workspace, it also runs faster.

5 Logical improvements

It was mentioned that GAUSS is a strict language when it comes to multiple logical operations. In other words, when it comes across a logical expression, it will evaluate all the components, regardless of whether it already has enough information to determine the result. For example, the expression

(mat1 > mat2) AND (mat2 > mat3) AND (mat3 > mat4)

is "false" whenever the first comparison fails; there is no need to calculate the second and third parts of the expression. However, GAUSS will do so anyway. Often this makes little difference - if the above had all been scalars, with an equal probability of any condition being true, then this would have been an efficient way to carry out the comparison. However, suppose the operation had been

a = (DET(mat1)>DET(mat2)) AND (DET(mat2)>DET(mat3)) AND (DET(mat3)>DET(mat4));

DET is a slow operation, and if the matrices are large this statement as it stands is horribly inefficient. A much more efficient solution is

a = 0;
IF DET(mat1) > DET(mat2);
    IF DET(mat2) > DET(mat3);
        IF DET(mat3) > DET(mat4);
            a = 1;
        ENDIF;
    ENDIF;
ENDIF;

This seems longer, but it is clearly a much more efficient operation, and its advantage increases as the size of the matrices grows. The code could still be greatly improved by using temporary variables to avoid the repeated calculation of the determinants. In addition, if prior information indicated that one of the statements had a higher chance of being false than the others, then testing that statement first would decrease the expected time to complete the sequence.
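
A sketch of that refinement: each determinant is stored in a temporary variable and calculated at most once, and the later determinants are not calculated at all unless the earlier tests succeed.

a = 0;
d1 = DET(mat1);
d2 = DET(mat2);
IF d1 > d2;
    d3 = DET(mat3);                  /* only calculated if it can still matter */
    IF d2 > d3;
        d4 = DET(mat4);
        IF d3 > d4;
            a = 1;
        ENDIF;
    ENDIF;
ENDIF;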

The same principle obviously applies to other logical operators, and to the IF statement in a more general way. Consider

IF (RANK(x) == ROWS(x)) AND (RANK(y) == ROWS(y));
    DoThings;
ELSE;
    PRINT "Matrices not of full rank";
ENDIF;

If x and y are large (and there is a more than negligible possibility of either being of less than full rank), then this is inefficient. A better solution is

IF RANK(x) == ROWS(x);
    IF RANK(y) == ROWS(y);
        DoThings;
    ELSE;
        PRINT "Matrix y not of full rank";
    ENDIF;
ELSE;
    PRINT "Matrix x not of full rank";
ENDIF;

which has the added advantage that a more helpful error message can be printed.

This issue is also related to the workspace issue discussed in section 4. If x and y are too large to fit into memory at the same time, then the one-line solution will involve x being loaded, then x unloaded and y loaded, whether x is of full rank or not. By contrast, the two-step test means that x will only be unloaded and y loaded if the second test is necessary.
