Guide to GAUSS Programming - a basic introduction


 
Basic Operations

1. Variables

GAUSS variables are of two types: matrices and strings. There are also two ways of grouping variables: structures and string arrays.

Matrices obviously include vectors (row and column) and scalars as sub-types, but these are all treated the same by GAUSS. For example

a = b + c;

is valid whether a, b, and c are scalars, vectors, or matrices, assuming the variables are conformable. However, the results of the operation may differ depending on the variable type.
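As a small sketch of what "conformable" means here (the values are invented for illustration), the same statement works when c is a scalar or a row vector with as many columns as b:

```gauss
/* Illustrative values only: b is 2x3 throughout */
b = { 1 2 3, 4 5 6 };

c = 10;                  /* scalar: added to every element of b */
a = b + c;
print a;

c = { 100 200 300 };     /* 1x3 row vector: added to each row of b */
a = b + c;
print a;
```

In both cases a comes out as a 2x3 matrix, but the values differ because of the shape of c.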

Matrices may contain numerical data or character data or both. Eight bytes are used to store each element of a matrix. Hence, each cell in a matrix can contain up to eight text characters, or a double-precision number with a magnitude of up to about 1.0E±308. If you enter text of more than eight characters into a cell of a matrix, the text will be truncated. Numerical data are stored to around 15 or 16 significant digits of precision.

Strings are pieces of text of unlimited length. These are used to give information to the user. If you try to assign a string value to an element of the matrix, all but the first eight characters will be lost.

1.1 Examples of data types

  • 4x3 Numerical matrix
    1 2.2 -3
    9 99 100
    6.29E-6 5 7
    1000 -5.3E+29 4

  • 2x4 Character matrix
    Will Will Harry Steve
    Harry Dick John HarryIII

  • 5x3 Mixed matrix
    Edinburg 40 EH
    Glasgow 25 G
    Heriot-W 43 EH
    Stirling 0 FK
    Strathcl 23 G

  • Strings
    "Hello Mum!"
    "Strings are pieces of text of unlimited length"
    "2.2"
    ""
Note the truncation of text in the character and mixed matrices. The null string "" is a valid piece of text for both strings and matrices.

Because GAUSS treats all matrix data the same, GAUSS sometimes must be told that it is dealing with character data. The $ sign identifies text and is used in a number of places. For example, to display the value of the variable "v1" requires
PRINT v1;                   if v1 is a numerical matrix
PRINT $v1;                  if v1 is a character matrix
PRINT v1; or PRINT $v1;     if v1 is a string

Strings are identified by GAUSS and don't need the $. You can put one in if you like but it makes no difference to printing.
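A minimal sketch of the three cases follows. The variable names and values are hypothetical, and the quoted form of LET is assumed here as the way to build a small character matrix:

```gauss
/* Hypothetical names and values, for illustration only */
let nums = 1 2 3;                      /* numerical column vector */
let names = "Will" "Harry" "Steve";    /* character column vector */
greeting = "Hello Mum!";               /* string */

print nums;          /* numerical matrix: no $ needed */
print $names;        /* character matrix: $ tells PRINT to show text */
print greeting;      /* string: recognised automatically */
```

Printing names without the $ would not fail; GAUSS would simply display the stored bytes as if they were numbers.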

Variables need to have names to reference them. Names can be any length (except in very old versions of GAUSS where they must be eight characters or less). Acceptable names for variables can contain alphanumeric characters and the underscore "_", and must not begin with a number. Reserved words may not be used; standard procedure names may be reassigned, but this is not generally a good idea. Variable names are not case-sensitive.
  • Acceptable variable names:
        eric    Eric    eric1   eric_1    _eric1   _e_r_i_c
  • Unacceptable variable names:
        1eric    100    if (reserved word)    delif (GAUSS procedure - legal, but foolish)
Using good variable names can make a big difference to your programming. Having variables called "m" and "t" may be quick to write, but "max_value" and "total_obs" would be more meaningful, and hence easier to interpret when you come back to look at a program later when you've forgotten what it does.

1.2 Grouping variables

String arrays are, as the name suggests, a convenient way of grouping strings. They are similar to a character matrix, but the strings they contain can be of unlimited length. Thus this is a valid string array:

Aberdeen Dundee
Edinburgh Glasgow
Heriot-Watt St. Andrews
Stirling Strathclyde

Note how the data fields are more than eight characters long. One difference between a character matrix and a string array is that GAUSS treats the former as a standard array so you can carry out any matrix operation on it, whether it makes sense or not. In contrast, a lot of operations will not be allowed on a string array because GAUSS 'understands' the string data type.

String arrays are therefore more flexible in storing characters. However, they have some disadvantages. First, they only store strings, and therefore you cannot mix character and numeric data. Second, because the length of the element is variable, GAUSS will handle them less efficiently. If all your character strings are eight characters or less, then keeping them in a character matrix may be marginally quicker. Third, string arrays take up more memory. For example, a 32768-element character matrix takes roughly 270Kb, irrespective of the number of characters. A string array with an average string length of four characters takes 400Kb; with an average length of eight characters that rises to 560Kb, twice as much as the equivalent character matrix.

Structures allow the grouping of variables of different types. They were introduced in version 4.0. Suppose you are running repeated regressions and for each regression you want to store the following information:

Scalars:       TSS, ESS, RSS, σ, N
Vectors:       Coefficients, standard errors
String array:  List of variable names

By placing these into a structure, they could be passed around between procedures, simplifying the program. This could also mean lower maintenance, by minimising changes to procedure calls if the structure form changes; see Writing for Posterity.

Because these are grouping concepts rather than new data types, we will not deal with these any further until the later sections of the guide when we discuss better programming methods. For details on declaring string arrays and structures, see the rather opaque descriptions in the GAUSS manuals. Note that there is no indication (as at time of writing) how to create arrays of structures.

2. Creating matrices

New matrices can be defined at any point (except inside procedures). The easiest way is to assign a value to one. There are two ways to do this - by assigning a constant value or by assigning the result of some operation.

2.1 Creating a matrix using constants: LET

The keyword LET creates matrices. The format for creating a matrix called varName is

LET varName = constant-list;
LET varName[r,c] = constant-list;

In the first form, the type of matrix created depends on how the constants were specified. A list of constants separated by spaces will create a column vector. If, however, the list of constants is enclosed in braces {}, then a row vector will be produced. When braces are used, inserting commas in the list of constants instructs GAUSS to form a matrix, breaking the rows at the commas. If braces are not used, then adding commas has no effect. With the first form, the actual word 'LET' is optional.

If the second form is used, then an r by c matrix will be created; the constants will be allocated to the matrix on a row-by-row basis. If only one constant is entered, then the whole matrix will be filled with that number.

Note the square brackets. This is the standard way to tell GAUSS either the dimensions of a matrix or the coordinates of a block, depending on context. The first number refers to the row, the second the column. Curly braces generally are used within GAUSS to group variables together.

2.2 Examples of LET

Command                              Shape of x
LET x = 1 2 3 4 5 6;                 Column vector 6x1
LET x = 1,2,3, 4,5, 6;               Column vector 6x1
LET x = 1 2, 3 4, 5 6;               Column vector 6x1
LET x = {1 2 3 4 5 6};               Row vector 1x6
LET x = {1,2,3, 4,5, 6};             Column vector 6x1
LET x = {1 2, 3 4, 5 6};             Matrix 3x2
LET x[3,2] = 1 2 3 4 5 6;            Matrix 3x2
LET x[3,2] = 1, 2, 3, 4, 5, 6;       Matrix 3x2
LET x[3, 2] = 5;                     Matrix 3x2

If we have two variables "a" and "b" then the command

LET x = a*b;

is illegal as "a*b" is a value and not a constant. In practice, GAUSS will interpret "a*b" as a string constant and will create a string variable containing the letters and figures "a*b".
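The shapes listed in the table of LET examples can be checked with the standard ROWS and COLS functions. The following is only a sketch using three of the forms from that table:

```gauss
let x = 1 2 3 4 5 6;          /* no braces: column vector */
print rows(x) cols(x);        /* expect 6 and 1 */

x = { 1 2, 3 4, 5 6 };        /* braces with commas: rows break at the commas */
print rows(x) cols(x);        /* expect 3 and 2 */

let x[3,2] = 5;               /* one constant fills the whole 3x2 matrix */
print x;
```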

2.3 Creating a matrix using values

The results of any operation can be placed into a matrix without an explicit LET declaration. The result of the operation

m1 = m2 + m3;

will be that the value "m2+m3" is contained in a variable called "m1". If the variable m1 did not exist before this statement, it will have been created.

The size and type of a variable depends entirely on the last thing done with it. Suppose m1 existed prior to the last operation. If m2 and m3 are both scalars, then m1 will now be a scalar - regardless of whether it was previously a matrix, vector, scalar, or string. Variables have no fixed size or type in GAUSS - they can be changed at will simply by assigning a different value to them. It is up to the programmer to make sure he has the correct variable for any operation, as GAUSS will rarely check.

Assigning a value is done by writing down the equation. Any correct (for GAUSS's syntax) mathematical expression is acceptable, as are strings or the results of procedures.

2.4 Examples of assigning values to a variable

The routines ZEROS and ONES create matrices of 0s and 1s. The transpose operator ' can be used as in any normal equation. Examining the impact of various assignment statements on matrices m1, m2 and m3 we get

Command                 m1        m2          m3
m1 = ZEROS(2,3);        2x3       undefined   undefined
m2 = ONES(1, 3);        2x3       1x3         undefined
m3 = m1*m2';            2x3       1x3         2x1
m1 = "Hello Mum!";      String    1x3         2x1
LET m2 = 5 2;           String    2x1         2x1
m3 = m3'*m2;            String    2x1         1x1

Note that LET statements can appear anywhere constants are used. The final size of m3 will be governed by the result of the last operation; in this case, it becomes a scalar.
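The sequence in the table above can be run as a short program. This sketch uses ROWS and COLS to confirm how m3 changes shape, as an illustration rather than a required technique:

```gauss
m1 = zeros(2,3);
m2 = ones(1,3);
m3 = m1*m2';                  /* (2x3)(3x1) gives 2x1 */
print rows(m3) cols(m3);      /* expect 2 and 1 */

m1 = "Hello Mum!";            /* m1 is now a string, not a matrix */
let m2 = 5 2;                 /* m2 is now a 2x1 column vector */
m3 = m3'*m2;                  /* (1x2)(2x1) gives a scalar */
print rows(m3) cols(m3);      /* expect 1 and 1 */
```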


Why use constant assignments rather than just creating matrices as a result of mathematical or other operations? The answer is that sometimes it is awkward to create matrices of appropriate shapes. It also allows for increased security, as constant assignment is finicky about what values are appropriate, and will trap more errors.

However, you cannot rely on this. The above example of LET x = a*b giving a string variable rather than a numeric variable is a simple example of how GAUSS will do the correct thing, by its definition, and happily produce a meaningless result.

In practice the main place you will use constant assignment will be at the beginning of programs where you set initial values and environment variables (like the name of an output file, or font to use for graphing). During the program you will be using variable assignment most of the time and you can ignore the strict rules on constant assignment. However, this is one of those areas where unnoticed errors creep in, and you need to be aware that GAUSS assigns values in different ways depending upon the context.


3. Referencing matrices

3.1 Direct references

Referencing strings is easy. They are one unit, indivisible. Matrices, on the other hand, are composed of the individual cells, and access to these might be required. GAUSS provides ways of accessing cells, columns, rows and blocks of the matrix as well as referring to the whole thing.

The general format is

mat[r1:r2,c1:c2]

where mat is the matrix and r1, r2, c1, and c2 may be constants, values, or other variables. This will reference a block from row r1 to row r2, and from column c1 to column c2 of the matrix mat. A value could be assigned to this block; or this block could be extracted for output or transfer to some other location. For example,

mat = {1 2 3, 4 5 6, 7 8 9, 10 11 12};
PRINT mat[2:3,1:2];

would print the columns 1 to 2 of rows 2 to 3 of the matrix mat:

4 5
7 8

To reference only one row or one column, only one coordinate is needed in that dimension:

mat[r1,c1:c2] or mat[r1:r2,c1]

For example, to reference the cell in the third row and fourth column of the matrix mat, these terms are all equivalent:

mat[3:3,4:4] mat[3,4:4] mat[3:3,4] mat[3,4]

Entering "." or 0 as a co-ordinate instructs GAUSS to take the whole row or column of the matrix. For example

mat[r1:r2,.]

means "rows r1 to r2 and all columns of matrix mat", while

mat[0, c1:c2]

references all rows for columns c1 to c2. A whole matrix could then be referred to identically as

mat or mat[.,.]

This particular feature of GAUSS causes a number of unexpected problems, particularly when using loops to access columns or rows in sequence. If your counter drops to zero (or some unspecified values) then you will find the program operating on all rows or columns instead of just one.
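One way to protect against this is to check the counter before using it as an index. The sketch below assumes a counter i that has run down to zero; the test and the message are illustrative only:

```gauss
mat = { 1 2 3, 4 5 6, 7 8 9, 10 11 12 };
i = 0;                        /* e.g. a counter that has dropped to zero */

/* Only index the matrix when i names a real row */
if i >= 1 and i <= rows(mat);
    print mat[i,.];
else;
    print "row index out of range:" i;
endif;
```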

For vectors only one co-ordinate is needed. For a column vector, say, these are all identical

mat[r1:r2,.] mat[r1:r2,0] mat[r1:r2,1] mat[r1:r2]

For scalars there is obviously no need for co-ordinates. However, because a scalar is a subclass of matrix,

mat[1,1] mat[.,.] mat[1] mat[1,0]

or a number of other variations are acceptable.

This similarity in accessing matrices of zero, one, or two dimensions allows you to program loops to access matrices without necessarily knowing the dimensionality of the matrix in advance.

A last way to identify a set of rows or columns is to list them sequentially. For example, to refer to columns 1, 3, and 22 and rows 2 to 4 inclusive of the matrix mat we could use

mat[2:4,1 3 22]

Note that there are no separating commas in the list of columns; GAUSS treats everything up to the comma as a row reference, everything afterwards as a column reference. If it finds two or more commas within square brackets, it treats this as an error.

These different methods can be combined:

mat[1 3:5 9, .]

will select every column on rows 1, 3 to 5, and 9. The order is also important:

mat[1 2 3, .]
mat[3 2 1, .]

will give two matrices with the row order reversed in the second one.
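A brief sketch of list references, reusing the 4x3 matrix from the earlier example (the particular rows and columns chosen are arbitrary):

```gauss
mat = { 1 2 3, 4 5 6, 7 8 9, 10 11 12 };

print mat[2:4, 1 3];     /* rows 2 to 4, columns 1 and 3 only */
print mat[1 2 3, .];     /* rows 1, 2, 3 in that order */
print mat[3 2 1, .];     /* the same rows in reverse order */
```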

3.2 Indirect references


Elements of matrices can also be referred to indirectly. Instead of explicitly using a constant to indicate a row or column number, a variable can also be used. For example,

PRINT mat[1:5, .];

and

endRow = 5;
PRINT mat[1:endRow, .];

are equivalent. This is a key feature in all but the most simple programs, as it avoids having to write out references explicitly. For example, suppose the program is to print out ten lines of a matrix. One solution would be to write a command to print each line:

PRINT mat[1,.];
PRINT mat[2,.];
...

This is clearly a tedious process. But one could write a loop to change the value of a variable i from 1 to 10. Then, only one PRINT statement is needed in the loop:

PRINT mat[i,.];

Even more usefully, this feature will work even if you are unsure how many lines there are in the matrix. You can set the loop to go as many times round as there are lines in the matrix. The PRINT statement does not have to be changed at all.
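Such a loop might be sketched as follows, using DO WHILE and the ROWS function so that the number of lines never has to be written into the program. The matrix contents are illustrative:

```gauss
mat = { 1 2 3, 4 5 6, 7 8 9, 10 11 12 };

i = 1;
do while i <= rows(mat);     /* runs once for every row, however many there are */
    print mat[i,.];
    i = i + 1;
endo;
```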

Similarly, instead of entering explicitly a list of column or row numbers to be selected, if you enter a vector then GAUSS will use these as the indexes. For example, if rowv is a vector containing (1, 2, 3) then

mat[1 2 3, .]; and mat[rowv,.];

are equivalent.

3.3 Nested references

This section is in here to complete coverage of referencing matrices. It is more advanced, and can be skipped at this point.

Indirect references can be nested. If rowv and colv are vectors of numbers, then

mat[rowv[1]:rowv[2], .]

is legal. So is

mat[rowv[r1,c1]:rowv[r2,c2], colv[rowv[r3, c3], rowv[r4,c4]]]

if values have been assigned to r1, c1... and the matrices rowv and colv have the relevant dimensions. This process can be carried on ad infinitum.

However, one problem with this flexibility in referencing is that GAUSS will always try to find a solution. For example, to access the first row of matrix mat using the vector rowv (above), one could use

mat[rowv[1],.]

However, if you omit the index

mat[rowv,.]

then GAUSS will interpret this vector as a list of rows to be selected, as in the previous section. It will not report an error, as this construct is perfectly acceptable.
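The difference is easy to miss but easy to detect. In this sketch (oneRow and allRows are hypothetical names), counting the rows of each result shows what was actually selected:

```gauss
mat = { 1 2 3, 4 5 6, 7 8 9, 10 11 12 };
rowv = { 1, 2, 3 };

oneRow = mat[rowv[1],.];     /* index given: only row 1 is selected */
allRows = mat[rowv,.];       /* index omitted: rows 1, 2 and 3 are selected */

print rows(oneRow) rows(allRows);    /* expect 1 and 3 */
```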

4. Managing data - SHOW, PRINT, FORMAT, NEW, CLEAR, DELETE

These commands are introduced at this point as they are the basic ones for managing data. DELETE may only be used at the command line, but all the others can be included in programs.

4.1 SHOW

SHOW displays the name, size and memory location of all global variables and procedures in memory at any moment (see Section 6 for an explanation of global variables). The format is

SHOW varName; or SHOW/m varName ;

where varName is the variable of interest. The "wild card" symbol "*" can be used, so that

SHOW er* ;

will find all references beginning with "er". The /m parameter means that only matrices are displayed.

4.2 PRINT and FORMAT

PRINT displays the contents of matrices and strings. The format is

PRINT var1 var2 var3... varx ;

which prints the list of variables. How it prints depends on the data. If the data fits on one line (all row vectors, scalars, or strings) then PRINT will display one after the other on the same line. If, however, one of the variables is a matrix or column vector, then the variable immediately following the matrix will be printed on a new line.

PRINT wraps round when it reaches the end of the line. Each PRINT command will start off on a new line. To display without going on to a new line, the PRINT statement must be ended with two semi-colons; this stops PRINT adding a carriage return to the variable list. For example, consider

PRINT "Hello";
PRINT "Mum";

and

PRINT "Hello";;
PRINT "Mum";

and

PRINT "Hello" "Mum";

These display, respectively,

Hello
Mum

and

HelloMum

and

HelloMum

If strings or string constants (as above) are used, PRINT will recognise that this is character data. If, however, PRINT is given a matrix name, it must be informed if this matrix is to be printed as character data. This is done by prefixing the variable name with the dollar sign $. Hence

a = 1;
b = 3;
c = "Some string data";
d = "char." | "matrix";
PRINT a b c $d;

prints everything correctly. Matrices composed entirely of character data are shown in the same way; however, mixed matrices need special commands, PRINTFM and PRINTFMT, of which more later.

Warning

Once GAUSS comes across a $ sign indicating character data, it prints all the rest of that line as text. Thus

PRINT a $c b;

would lead to 'b' being treated as if it were text. To get round this, 'b' must be printed in a separate statement, perhaps using the double semi-colon:

PRINT a $c;;
PRINT b;

PRINT style is controlled by the FORMAT command, which sets the way matrices (but not strings) are printed. There are options to print numbers and character data with varying field widths, decimal expansion, justification, spacing and punctuation. These are covered in the manual and are all similar in form to:

FORMAT /RD 6, 0;

where, in this case, we have numbers right-justified (/RD), separated by spaces (/RDC would do commas), with 6 spaces left for writing the number and 0 decimal places. If the number is too large to fit into the space, then the field will be expanded but for that number only - not the whole matrix. Strings are given as much space as they need, but no spaces are inserted between them (see the "HelloMum" example, above).

The print styles set by FORMAT operate from the time they are set until the next FORMAT command is received.
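As a small sketch of how a FORMAT setting persists until it is replaced (the matrix values and the second field specification are chosen only for illustration):

```gauss
x = { 1.5 22.25, 333.125 4 };

format /rd 6,0;      /* right-justified, field width 6, no decimal places */
print x;

format /rd 10,3;     /* wider field, three decimal places; stays in force from here on */
print x;
```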


4.3 NEW, CLEAR, and DELETE

These three all clean up memory. They do not affect files on disk. NEW clears all references from memory. It can be called from inside a program, but obviously this is rarely a smart move. The exception is at the start of a program. A call to NEW will remove any junk left over from previous work, leaving all memory free for the new program. NEW has no parameters and is called by

NEW;

Calling NEW at the start of a program ensures that the workspace is cleared of unwanted variables, and is good practice. Calling NEW at any other point is usually disastrous and not so highly recommended.

CLEAR sets particular variables to zero, and it can also be called by a program. It is useful for tidying up data and initialising variables:

CLEAR var1 var2 ... varN ;

Because it sets the variable to the scalar zero, CLEAR is exactly equivalent to a direct assignment:

CLEAR x;    is equivalent to x = 0;
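A typical opening for a program might therefore look like the sketch below; runTotal and nObs are hypothetical variable names:

```gauss
new;                  /* first statement: clear out anything left from earlier work */

clear runTotal;       /* creates runTotal as the scalar 0 */
nObs = 0;             /* direct assignment has the same effect */

print runTotal nObs;
```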

DELETE clears variables from memory, and so is a better option than CLEAR for tidying up unwanted variables. However, it cannot be called from inside a program. The delete command is like SHOW:

DELETE varName;
DELETE/n varName;

where varName can include the wild card character. The /n option stops GAUSS double-checking the deletion is wanted. The special word "ALL" can be used instead of varName; this deletes all references, and so

DELETE/N ALL;

is equivalent to NEW.


5. Using procedures

The library functions in GAUSS work like library routines in other packages - a procedure is called with some parameters, something happens, and a result may be returned. The parameters may be constants or variables; any returned values must be placed in variables. There may be any number of input and output parameters, including none. The general format is

{outVar1, ...outVarN} = ProcName (inVar1, ... inVarN);

The inVar parameters are giving information to the procedure; the outVar variables are collecting information from the procedure. The input parameters will be unaffected by the action of the procedure (unless, of course, they also feature in the output list). The outVar parameters will be affected, and so obviously constants can not be used:

{outVar1, "eric"} = ThisProc (inVar1, inVar2);

is incorrect.

Note that we have curly brackets {} to group variables together for the purposes of collecting results, but that we have round brackets () to delineate the input parameters. The former is GAUSS's usual way of grouping things together, the latter is a near-universal programming syntax. They're mixed in together just to keep you on your toes.

If there is one or no parameter, then the form can be simplified:

{outVar1, ... outVarx} = ProcName (inVar);        one input parameter
{outVar1, ... outVarx} = ProcName;                no input parameter
ProcName (inVar1, ... inVarx);                    no returned result
outVar = ProcName (inVar1, ...inVarx);            one result returned

For example, the procedure DELIF requires two input parameters (a matrix and a column vector), and returns one output, a matrix:

outMat = DELIF (inMat, colVec);
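To make this concrete, here is a hedged sketch with invented data. DELIF deletes those rows of the matrix for which the corresponding element of the column vector is 1, so the vector is built here with an element-by-element comparison:

```gauss
inMat = { 1 10, 2 20, 3 30, 4 40 };

colVec = inMat[.,1] .> 2;         /* 4x1 vector: 1 where column 1 exceeds 2, else 0 */
outMat = delif(inMat, colVec);    /* rows flagged with 1 are dropped */

print outMat;                     /* the two rows whose first column is 1 or 2 remain */
```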

The procedure EIGCG requires two input parameters and returns two outputs:

{eigsReal, eigsImag} = EIGCG(matReal, matImag);

The procedure SORTD needs four input parameters but returns no result:

SORTD (inFile, outFile, keyName, keyType);

If the program is not concerned with the results from a procedure then the keyword CALL tells GAUSS to throw away any returns. This can save time and memory in some cases. For example, the quickest way to find the determinant of a large matrix is through a Cholesky decomposition. Running the procedure CHOL sets a global variable which can be read by the procedure DETL to give the matrix's determinant. However, the actual result of the decomposition is not wanted, only a side effect. So, to find the determinant of "mat" most quickly use

CALL CHOL(mat);
determ = DETL;

As input and returned parameters are both lists, you can pass the whole list of returned parameters to a new function, along with any other parameters that are necessary. This means that you do not need to have any intermediate variables to store the results from one procedure before passing them to another, and it will make your code shorter. However, it will not necessarily make it more readable, and you can run into maintenance problems - if you change the list of parameters for one procedure you need to change it for the other as well.

Warning

For all procedures, it is the programmer's responsibility to ensure that the right sort of data is used. If a procedure is expecting a scalar as a parameter and you pass it a row vector, for example, this will not be flagged as an error when GAUSS checks the program syntax. It may or may not cause the procedure to crash but this will not be apparent until the program is running. All GAUSS will check is that the correct number of parameters is being passed back and forth.
