As part of its optimization package, Dalsoft provides ports of the GNU compilers to the Digital Signal Processor (DSP) or CPU of your choice (referred to as the target). Please click here for some non-technical details, and/or contact us if you would like to discuss the possibilities of retargeting the GNU compilers to a processor of your choice.
Introduction
The GNU Compiler Collection (gcc) offers front ends for several languages (C, C++, Objective-C, Fortran, Java) and a means of retargeting the compiler to a new architecture. The advantages of using gcc are too numerous to mention here; however, the high execution speed of the code it generates is not one of them.
Numerous studies (for example, this one) have shown that code generated by gcc does not match the speed achieved by other compilers for the same DSP/CPU target. We believe this is due to the nature of the technology available for retargeting (e.g. the machine description, target description macros, etc.), which we found inherently restrictive and incapable of supporting the tight interface between compiler and hardware architecture that is necessary to achieve high utilization of the available resources.
To address this problem, we at Dalsoft took another approach: we use the superb front ends supported by gcc and fully utilize the vast number of machine-independent (middle-end) optimizations it provides, but leave the final code optimization to our optimizer.
The study we performed shows that such an approach produces code on par with, or better than, code generated by compilers specially crafted for a particular target. See this for the results of another study comparing code generated by the gcc compiler, code generated by the icc compiler (Intel's compiler specifically designed for x86), and gcc-generated code optimized by dco.
General Overview
From the user's point of view, our port looks exactly like most other ports of gcc. We offer the legendary compiler driver gcc and enable all of its available options. When invoked, the compiler produces assembly code for the compiled program, provided no compilation errors have been detected. The assembly code produced by the compiler may be processed by gcc to generate an object file. However, in order to take full advantage of the options and features provided by the target processor, an additional stage may be performed: processing the compiler-generated assembly code with our optimizer (dco). Note that the use of dco is optional and is needed only if extra optimizations are desired.
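The build flow described above can be sketched roughly as follows. This Python sketch only assembles the command lines (it does not run them). The driver name, the output file names and especially the `dco <input> -o <output>` invocation are hypothetical placeholders, since the actual dco command-line syntax is not described here. Only `-O3`, `-S` (stop after generating assembly) and `-c` (assemble into an object file without linking) are standard gcc driver options.

```python
from __future__ import annotations

from pathlib import Path


def build_pipeline(source: str, use_dco: bool = True, driver: str = "gcc") -> list[list[str]]:
    """Return the command lines for compile -> (optional) dco -> assemble.

    Assumptions (hypothetical): `driver` names the target's gcc driver,
    and dco is invoked as `dco <input.s> -o <output.s>`.
    """
    src = Path(source)
    asm = src.with_suffix(".s")  # assembly produced by the compiler
    obj = src.with_suffix(".o")  # final object file

    # Stage 1: compile to assembly with maximum optimization.
    commands = [[driver, "-O3", "-S", str(src), "-o", str(asm)]]

    # Stage 2 (optional): run the post-compilation optimizer on the assembly.
    if use_dco:
        opt_asm = src.with_name(src.stem + ".dco.s")
        commands.append(["dco", str(asm), "-o", str(opt_asm)])  # hypothetical syntax
        asm = opt_asm

    # Stage 3: assemble (optimized or plain) assembly into an object file.
    commands.append([driver, "-c", str(asm), "-o", str(obj)])
    return commands


if __name__ == "__main__":
    for cmd in build_pipeline("prog.c"):
        print(" ".join(cmd))
```

Passing `use_dco=False` drops the middle stage, mirroring the fact that dco is optional: the compiler's assembly output is then assembled directly.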
The gcc-dco package is designed and implemented so as to:
- take full advantage of the functionality provided by gcc: we use the front ends supported by gcc and provide a machine description and target-specific macros that enable full and efficient use of the functionality gcc provides (e.g. its vast number of machine-independent (middle-end) optimizations, register allocation, etc.);
- provide tight integration between gcc and dco: the compiler generates code in such a way as to enable dco to perform its work in the most efficient and extensive manner.
Retargets
At this time we have successfully retargeted the gcc package to the StarCore DSP from Freescale.
Retargeting for StarCore
Currently we offer only the StarCore version of the C compiler; click here to get a copy.
StarCore is a powerful and complex Digital Signal Processor. It provides a great number of options and superb functionality, but also imposes severe restrictions on what makes a code sequence valid and/or optimal. We feel that generating correct and optimal StarCore code using only gcc is as difficult as recreating the smile of the Mona Lisa with spray paint. Integration with our StarCore optimizer (sco), however, solved the problem.
We evaluated our work using the POWERSTONE benchmark suite. Assembly code was generated by gcc run with maximum optimization (-O3). The optimizer was used with its default set of options; no special optimizations (e.g. vectorization, loop unrolling) were attempted. The results were verified, and performance was compared to that achieved by the Metrowerks compiler (Motorola's compiler specifically designed for StarCore) run with maximum optimization (-O3).
The following table lists the execution time in clock cycles (first number) and the total object size in bytes (second number) observed during our study. For example, 13019 / 19424 means that the code ran for 13019 clocks and that the total size of the object file (text + data) was 19424 bytes. Needless to say, using sco with special optimization options may produce even better results.
Powerstone benchmarks (clock cycles / object size in bytes)

| Benchmark | gcc+sco | Metrowerks |
|---|---|---|
| compress | 11704 / 18962 | 13019 / 19424 |
| engine | 78876 / 1877 | 86671 / 3184 |
| eval2 | 437 / 2357 | 916 / 2960 |
| jpeg | 283167 / 13980 | 334757 / 13846 |
| ucbqsort | 339717 / 2540 | 363775 / 3096 |
| v42bis | 22005 / 41469 | 88295 / 36858 |
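To make the comparison in the table easier to read, the Python sketch below turns each pair of entries into two ratios: a speedup (Metrowerks cycles divided by gcc+sco cycles) and a size ratio (gcc+sco object size divided by Metrowerks object size). The figures are copied from the table; the choice of these ratios, and of a geometric mean as a single summary number, is our own illustration and not part of the original study.

```python
import math

# Figures copied from the Powerstone table:
# benchmark -> ((gcc+sco cycles, size), (Metrowerks cycles, size))
RESULTS = {
    "compress": ((11704, 18962), (13019, 19424)),
    "engine": ((78876, 1877), (86671, 3184)),
    "eval2": ((437, 2357), (916, 2960)),
    "jpeg": ((283167, 13980), (334757, 13846)),
    "ucbqsort": ((339717, 2540), (363775, 3096)),
    "v42bis": ((22005, 41469), (88295, 36858)),
}


def compare(results):
    """Return {benchmark: (speedup, size_ratio)} for gcc+sco vs Metrowerks.

    speedup > 1 means gcc+sco needed fewer clock cycles;
    size_ratio < 1 means the gcc+sco object file is smaller.
    """
    table = {}
    for name, ((sco_cycles, sco_size), (mw_cycles, mw_size)) in results.items():
        table[name] = (mw_cycles / sco_cycles, sco_size / mw_size)
    return table


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios across benchmarks."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = compare(RESULTS)
    for name, (speedup, size_ratio) in ratios.items():
        print(f"{name:9s} speedup {speedup:5.2f}x  size ratio {size_ratio:4.2f}")
    overall = geometric_mean(s for s, _ in ratios.values())
    print(f"geometric-mean speedup: {overall:.2f}x")
```

On these figures, gcc+sco takes fewer cycles on all six benchmarks (about 4x fewer on v42bis) and produces a smaller object file on four of them; jpeg and v42bis are slightly larger.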