- Optimisation and Keywords:
- near, far [non-standard, in common use]
These keywords specify how global and static variables are accessed and how functions are called. At the machine code level, they direct the compiler towards a particular balance between code compactness and the range of effective addresses supported.
In general terms, code is more compact when it uses near addressing modes, which embed a relatively small offset within each instruction, i.e. when we agree to have the compiler generate code that only accesses locations "nearby" some default location or location 0. More compact code also means fewer instruction fetches for the same work, so performance is generally better.
On the other hand, when large data objects must be accessed or program branching must occur over a large address space, code compactness must be sacrificed. If the underlying architecture is RISC, i.e. word-sized instructions are used to load a word-sized pointer, then more than one instruction is needed to form a full 32-bit value that can be used as an address in a register. This is the compromise that you accept with the far modifier keyword.
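The trade-off described above might be expressed in source code roughly as follows. This is a minimal sketch, not taken from the original text: the macro names NEAR_DATA and FAR_DATA, the variable names and the buffer size are invented for illustration, and it assumes that the TI C6000 compiler predefines _TMS320C6X so that the non-standard keywords vanish on any other compiler.

```c
/* NEAR_DATA and FAR_DATA are hypothetical wrapper names for this sketch.
   Assumption: _TMS320C6X is predefined only by the TI C6000 compiler. */
#if defined(_TMS320C6X)
#define NEAR_DATA near
#define FAR_DATA  far
#else
#define NEAR_DATA          /* keywords disappear on a host compiler */
#define FAR_DATA
#endif

#define CAPTURE_LEN 4096

/* Small, frequently used variable: a candidate for compact near access. */
NEAR_DATA int sample_count = 0;

/* Large object: a candidate for far access, where a full address is formed. */
FAR_DATA int capture_buffer[CAPTURE_LEN];

/* Store one sample; returns the new count, or -1 if the buffer is full. */
int record_sample(int value)
{
    if (sample_count >= CAPTURE_LEN) {
        return -1;
    }
    capture_buffer[sample_count] = value;
    sample_count = sample_count + 1;
    return sample_count;
}
```

On a host compiler both macros expand to nothing, so the same file can be unit tested there; only the target build sees the addressing hints.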
C6000 example: The compiler has default code generation aims associated with data and function accesses. The -mrN; N=0,1 compiler switch can be used to set the default handling of calls to the run-time library functions. The -mlN; N=0..3 compiler switch can be used to set the default handling of data access, which in effect specifies whether a small or large memory model is in use. In all cases, the programmer's use of the {near, far} keywords overrides default compiler settings.
- volatile [standard C/C++]
This modifier keyword forces the compiler never to optimise out accesses to a variable. A very common use is memory-mapped input and output (IO), where a programmer relies on the data transfers associated with reads and writes, and on side effects wired into the behaviour of the peripherals, all of which depend on program execution following the program statements precisely. Note that volatile variables are allocated to an uninitialised memory section (RAM) and not to a CPU register.
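A polling loop is the classic case where this matters. The sketch below is illustrative only: REG_STATUS, TX_READY and fake_status_reg are invented names, and the "register" is an ordinary variable standing in for a device address so that the code can also run on a host. On real hardware the macro would instead cast a fixed address taken from the device data sheet.

```c
#include <stdint.h>

/* Stand-in for a peripheral status register (hypothetical). */
uint8_t fake_status_reg = 0;

/* Access the register through a pointer to volatile, so that every
   dereference in the source becomes a real memory read. */
#define REG_STATUS ((volatile uint8_t *)&fake_status_reg)
#define TX_READY   0x80u   /* hypothetical "transmitter ready" bit */

/* Poll the status register up to max_polls times.
   Returns 1 if the ready bit was seen, 0 if the polls ran out. */
int wait_tx_ready(unsigned int max_polls)
{
    while (max_polls > 0) {
        if ((*REG_STATUS & TX_READY) != 0) {
            return 1;
        }
        max_polls = max_polls - 1;
    }
    return 0;
}
```

Without the volatile qualifier, an optimiser would be entitled to read the location once and reuse that value for every loop iteration, which is exactly what memory-mapped IO cannot tolerate.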
MC68HC11 example: see sample files hc11.h and conio.h for some clean use of volatile pointers to provide putc(), puts() and similar C functions (other related files have links in section Embedded C/C++: Last Words, below).
- const [standard C/C++]
In C/C++, this keyword instructs the compiler that the data item so specified cannot be changed i.e. it cannot be written to, so data flow analysis can test and enforce this condition at compile-time.
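As a small illustration of compile-time enforcement, the following sketch uses a const lookup table and a pointer-to-const parameter. The function and table names are invented for this example.

```c
#include <stddef.h>

/* File-scope const table: number of set bits in each 4-bit value.
   Being const and static, it is a natural candidate for read-only storage. */
static const unsigned char bit_count[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* The parameter is a pointer to const: the function promises not to
   write to the caller's buffer through it. */
unsigned int count_set_bits(const unsigned char *buf, size_t len)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        total += bit_count[buf[i] & 0x0Fu];        /* low nibble  */
        total += bit_count[(buf[i] >> 4) & 0x0Fu]; /* high nibble */
        /* buf[i] = 0;  <- a write like this is rejected by the compiler */
    }
    return total;
}
```

Uncommenting the assignment to buf[i] should produce a compile-time diagnostic, which is the enforcement the paragraph above describes.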
C6000 example: The compiler will normally arrange for const data items to be allocated in a memory region that the compiler calls .const unless the volatile modifier is also used (overrides storage to RAM) or the item is allocated at function call time (automatic storage implies the use of the stack i.e. RAM). The .const memory region can be allocated to ROM storage in the target system.
- cregister [non-standard]
This keyword is typically used to alert the compiler that a variable corresponds to a special register of the same name.
C6000 example: For the TMS320C6711 DSP, these registers (and special variable names) are
AMR, CSR, FADCR, FAUCR, FMCR, ICR, IER, IFR, IRP, ISR, ISTP, NRP.
These variable names are usually predefined via system include files.
- interrupt [non-standard, in common use]
This keyword is used to specify that a function is an interrupt handler. In effect, this means that the function is executed with no input arguments and returns no result, and in fact typically uses alternative registers to handle return addresses and register preservation.
It has the prototype
void interrupt _my_irq_handler( void );
C6000 example: Here, it means that the function uses a register such as IRP or NRP to hold the return address, and that accesses to the function use the special interrupt call and return instructions. Interrupt handler functions must take no arguments and have return type void.
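A handler that follows these rules might look like the sketch below. The names IRQ_HANDLER, tick_count, timer_isr and ticks_elapsed are invented for illustration, the keyword placement copies the prototype shown above, and the guard assumes that _TMS320C6X is predefined only by the TI C6000 compiler. How the handler is attached to an interrupt vector is target specific and is not shown.

```c
/* IRQ_HANDLER is a hypothetical wrapper so the file also builds on a host. */
#if defined(_TMS320C6X)
#define IRQ_HANDLER interrupt
#else
#define IRQ_HANDLER
#endif

/* Shared between the handler and normal code, hence volatile. */
volatile unsigned int tick_count = 0;

/* No arguments, no return value: record the event and return quickly. */
void IRQ_HANDLER timer_isr(void)
{
    tick_count = tick_count + 1;
}

/* Called from normal code: ticks seen since an earlier reading. */
unsigned int ticks_elapsed(unsigned int since)
{
    return tick_count - since;
}
```

Because the handler cannot take arguments or return a result, all communication with the rest of the program goes through shared variables such as tick_count.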
C6000 example: For correct handling of interrupts, the compiler must be instructed to generate interrupt safe code. For an interrupt function interrupt_srv(), we can use the pragma directive
#pragma FUNC_INTERRUPT_THRESHOLD( interrupt_srv, 1 )
Triggering of an interrupt causes the contents of the identified interrupt's instruction fetch packet to be inserted into the instruction packet fetch queue. If the service routine is longer than 8 instructions, an instruction fetch stall can occur.
- register [standard C]
If there is no compiler optimisation in effect, then this storage modifier allows the programmer to strongly request that a variable be allocated to a CPU register.
C6000 example: When any compiler optimisation is enabled, this keyword is ignored and the compiler uses its own algorithm to optimise CPU register use for all variables and temporary values. See SPRU187 section 7-5 for more information.
- restrict [standard C]
The restrict keyword is a type qualifier that may be applied to a pointer to guarantee to the compiler that, within the scope of the function, any object accessed through that pointer is accessed only through that pointer. That is, each restrict-qualified pointer in the function accesses its own non-overlapping memory region. The keyword is standard in C from C99 onwards but is not part of standard C++; compilers that support it as an extension may accept it on pointers (C, C++), references (C++) and arrays (C, C++).
When a compiler knows that the pointers used in a function each access unique data, it knows that manipulation via one pointer cannot affect data accessed by any other pointer and is therefore able to guarantee that it knows the location of all current variable/object values via data flow analysis. This allows additional optimisations to be performed.
Example: If a variable is written through a restrict-qualified pointer and its contents are read again soon afterwards, the compiler is able to reuse the value still held inside the CPU. The use of restrict has guaranteed that the value in memory must be consistent with the last value copied from the CPU register, so a load from memory into a register is avoided if the desired variable is still resident in a register. In a real-time or multi-threaded environment, the programmer must make sure that this guarantee really holds.
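The register reuse described above can be sketched in C99 as follows. The function names are invented for illustration, and whether a given compiler actually performs the optimisation depends on the compiler and its settings.

```c
#include <stddef.h>

/* The caller promises that a and b point to different objects.
   After the store through b, the compiler may therefore return the
   value 1 directly instead of reloading *a from memory. */
int store_then_read(int *restrict a, int *restrict b)
{
    *a = 1;
    *b = 2;
    return *a;
}

/* The same promise applied to arrays: dst and src must not overlap,
   which leaves the compiler free to reorder or overlap iterations. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        dst[i] += gain * src[i];
    }
}
```

Calling either function with overlapping arguments breaks the promise and gives undefined behaviour, so restrict is a contract on the caller rather than something the compiler checks.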
C6000 details: See SPRU187 section 7-4-5 for more information.
- Summary
Code density is an important issue for system performance. The near keyword may allow critical code and data to be located in internal RAM, where accesses run at the system clock rate.
Code reuse is also important. The far keyword may allow a processor to access ROM code in a memory region distant from the address location of executable code held in RAM.
The restrict keyword is critical to compiler code optimisation as it allows a compiler to assume (and presumably have) total control of the state of memory contents which then allows full data flow based optimisation.
- Portable Programming
The non-standard reserved words may be necessary to support specific architectural features or provide the compiler with additional information for optimisation. The C/C++ programmer should consider the use of macro wrappers around these non-standard words if the code is to be tested on alternative platforms.
For example, cregister is used to alert the C6000 compiler that a variable corresponds to a special CPU register of the same name. As noted earlier, these registers include CSR, ICR and IER. A macro wrapper for extern cregister follows:
#if defined( _TestHost_ )
#define ECreg
#else
#define ECreg extern cregister
#endif
The programmer can use the macro ECreg so that the non-standard cregister keyword is ignored when compiling on a non-DSP host, where the macro _TestHost_ would be defined. Here is a code fragment showing the use of ECreg:
ECreg volatile unsigned int CSR;
ECreg volatile unsigned int IER;
ECreg volatile unsigned int ICR;

int main()
{
    CSR = 0x100;   /* P&D cache control cleared, little endian, GIE bit 0 cleared (disable IRQs) */
    IER = 1;       /* disable all interrupts, leave RESET on bit 0 enabled and clear bits 4-15 to disable IE4-IE15 */
    ICR = 0xffff;  /* clear any pending interrupts */
    /* ... cut ... */
}
- Optimisation and Pragma Directives
C programming ultimately involves code generation (data, instructions) for various memory regions e.g. executable code, constant data, static variables, stack, and heap. If there is no operating system present, as is typical for embedded systems, then some run-time code is also generated by the compiler to initialise "constant data variables". How all of this is organised must depend on characteristics of the underlying architecture e.g. should/can commonly used variables be located in high-speed locked-down sections of a data cache memory system internal to a CPU, should the run-time code for a switch() statement be located in high-speed locked-down sections of a program cache memory system internal to a CPU, and for multi-layer caches which layer should be used. If the system has a Harvard architecture i.e. physically separate memory systems, some of which might use different technologies (SDRAM, flash, etc.) with different access characteristics, how can we direct the compiler to implement these design choices?
The usual answer is to make use of the pragma directive at the compiler level, and to also use specific linker control. In this section, we limit ourselves to some common pragma directives taken from coding for the Texas Instruments very long instruction word digital signal processors TMS320C6211 and TMS320C6711 (SPRU187 section 7-7). The ideas discussed here apply to other VLIW DSP systems and down to simple micro-controllers.
- FUNC_CANNOT_INLINE
The FUNC_CANNOT_INLINE pragma instructs the compiler to never expand the function in-line. For the C language, the syntax is
#pragma FUNC_CANNOT_INLINE( func )
and it means that a call to the function always involves an underlying subroutine/function access.
- FUNC_INTERRUPT_THRESHOLD
The FUNC_INTERRUPT_THRESHOLD pragma allows interrupt support to be disabled in software pipelined loops for up to a specified number of machine cycles (or for interrupts to be assumed to be disabled, which allows additional optimisations). For the C language, the syntax is
#pragma FUNC_INTERRUPT_THRESHOLD( func,threshold )
In systems where a code execution stall causes data loss, interrupts must be disabled for correct operation. If interrupts must still be serviced at least every threshold cycles, this directive causes the compiler to generate code that provides a short, regular window in which an interrupt can be taken. For a function to be always interruptible, threshold=1. For a function to be assumed never interruptible, threshold=-1. When some other positive threshold is selected, loops are pipelined in a way that schedules clean interrupt handling between highly optimised loop fragments.
- FUNC_IS_PURE
The FUNC_IS_PURE pragma specifies to the optimiser that the named function has no side effects i.e. it does not matter if it is never called should the optimiser determine that the result is not used, and it is permissible to remove duplicate calls. For the C language, the syntax is
#pragma FUNC_IS_PURE( func )
- FUNC_NEVER_RETURNS
The FUNC_NEVER_RETURNS pragma specifies to the optimiser that the named function never returns. Presumably this means that code normally generated after the function call, such as stack restore code, can be omitted. For the C language, the syntax is
#pragma FUNC_NEVER_RETURNS( func )
This pragma can be used when a user code fragment calls a boot loader that ultimately replaces the user code.
- FUNC_NO_IND_ASG
The FUNC_NO_IND_ASG pragma specifies to the optimiser that the function makes no assignments through pointers and contains no asm statements (asm statements allow in-line assembly code to be embedded in C code). For the C language, the syntax is
#pragma FUNC_NO_IND_ASG( func )
In effect, this allows more aggressive optimisation as the compiler is able to completely track data flow in and out of the function.
- UNROLL
The UNROLL pragma specifies to the optimiser how many times a loop may be unrolled. For the C language, the syntax is
#pragma UNROLL( n )
If possible, the compiler unrolls the loop so that there are n copies of the original loop body. It assumes that we know the loop count at compile time. This pragma should be located immediately before the loop to which it corresponds. Two further pragmas, MUST_ITERATE and PROB_ITERATE, supply the loop count information that helps the optimiser here.
- MUST_ITERATE
The MUST_ITERATE pragma provides the optimiser with loop iteration properties so that the compiler may choose the optimal loop control strategy. For the C and C++ languages, the syntax is
#pragma MUST_ITERATE( min,max,multiple )
where min and max are the minimum and maximum iteration counts and multiple is a value of which the iteration count is known to be a multiple; arguments that are not needed may be omitted. If the loop iteration count, also called the trip count, is known to be n or greater, then the MUST_ITERATE pragma can specify this minimum trip count. This directive is useful for improving loop efficiency when we do not know the loop count at compile-time. For example, if the loop trip count is 5 or greater, the statement is
#pragma MUST_ITERATE( 5 )
If it is known that the trip count is a multiple of 4, this may be specified with the statement
#pragma MUST_ITERATE( 4, ,4 )
where the unknown maximum trip count has been left unspecified. In this example, the compiler would be able to unroll 4 instances of the loop body code for more efficient loop control. If the compiler has redundant CPU resources available, it might also be able to schedule loop body executions simultaneously (if the iterations are data flow independent).
- PROB_ITERATE
The PROB_ITERATE pragma specifies to the optimiser the probable loop iteration properties of minimum and/or maximum trip count. For the C and C++ languages, the syntax is
#pragma PROB_ITERATE( min,max )
and it is valid to specify either or both arguments. For example, a loop that usually iterates 8 times (but may iterate fewer or more times) is specified via
#pragma PROB_ITERATE( 8,8 )
- DATA_MEM_BANK
The DATA_MEM_BANK pragma aligns a symbol or variable to a particular internal memory bank boundary. For the C language, the syntax is
#pragma DATA_MEM_BANK( symbol,constant )
where constant can have the values 0..3 to select the desired memory bank for variable symbol. In effect, some padding of variables occurs and the memory allocation pointer of the relevant memory system is adjusted.
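To show how some of these directives sit in real source, here is a minimal sketch that combines FUNC_IS_PURE and MUST_ITERATE. The function names are invented for illustration. The guard assumes that _TMS320C6X is predefined only by the TI C6000 compiler, and the placement of FUNC_IS_PURE ahead of the function's first declaration reflects my understanding of the compiler's rule, so check both against SPRU187 before relying on them.

```c
/* The TI pragmas are guarded so that a host compiler never sees them. */
#if defined(_TMS320C6X)
#pragma FUNC_IS_PURE( clamp16 )
#endif
/* No side effects: the result depends only on the argument, so unused
   or duplicate calls may be removed by the optimiser. */
int clamp16(int x)
{
    if (x > 32767) {
        return 32767;
    }
    if (x < -32768) {
        return -32768;
    }
    return x;
}

/* Contract for callers: n is at least 4 and is a multiple of 4.
   The pragma passes that promise on to the optimiser. */
int dot_product(const short *a, const short *b, int n)
{
    int i;
    int sum = 0;

#if defined(_TMS320C6X)
#pragma MUST_ITERATE( 4, ,4 )
#endif
    for (i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The pragmas only pass promises to the optimiser. If a caller passes n = 6 to dot_product on the target, the promise is broken and the generated code may misbehave, so such contracts are worth stating in the function's comment as well.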
- Most architectures have unique features that must be carefully studied. Assembly language addressing modes are influenced by the keywords near and far. Register hints (and optimisation hints) are provided by the keyword register. The modifier volatile allows us to enforce access exactly as programmed (often because we are relying on side effects).
- For the Texas Instruments TMS320C6000 family, it must be emphasised that before any practical work can be done you must read at least chapters 3 and 7 of SPRU187 TMS320C6000 Optimizing Compiler User's Guide.