Getting Familiar with GCC Parameters
Options Controlling Compilation Stages
For learning purposes, you may sometimes want to see how your source code is transformed into an executable. Fortunately, gcc provides options that stop after a given processing stage. Recall that gcc works through several stages: preprocessing, compilation, assembly, and linking. The options are:
-c: stops after the assembly stage and skips linking. The result is object code.
-E: stops after the preprocessing stage. All preprocessing directives are expanded, so you see only plain code.
-S: stops after compilation proper, before assembling. It leaves you with assembler code.
-c is mostly used when you have multiple source files and combine them to create the final executable. So, instead of:
$ gcc -o final-binary test1.c test2.c
it would be better to split them as:
$ gcc -c -o test1.o test1.c
$ gcc -c -o test2.o test2.c
and then:
$ gcc -o final-binary ./test1.o ./test2.o
You will probably notice the same sequence when you build a program with a Makefile. The advantage of using -c is clear: only the changed source files need to be recompiled. The only phase that must always be redone is linking all the object files together, which saves a great deal of time, especially in large projects. An obvious example of this is the Linux kernel.
-E is useful if you want to see how your code really looks after macros, definitions, and such are expanded. Take Listing 3 as an example.
#include <stdio.h>
#define A 2
#define B 4
#define calculate(a,b) a*a + b*b
void plain_dummy(void)
{
    printf("Just a dummy\n");
}
static inline void justtest(void)
{
    printf("Hi!\n");
}
int main(int argc, char *argv[])
{
#ifdef TEST
    justtest();
#endif
    printf("%d\n", calculate(A,B));
    return 0;
}
Listing 3. Code containing #define and #ifdef
We compile it like this:
$ gcc -E -o listing3.e listing3.c
Notice that we don't pass any -D parameters, which means TEST is undefined. So what do we have in the preprocessed file?
void plain_dummy(void)
{
    printf("Just a dummy\n");
}
static inline void justtest(void)
{
    printf("Hi!\n");
}
int main(int argc, char *argv[])
{
    printf("%d\n", 2*2 + 4*4);
    return 0;
}
Where is the call to justtest() inside main()? Nowhere. Because TEST is undefined, that code is eliminated. You can also see that the calculate() macro has already been expanded into multiplication and addition of constant numbers. In the final executable, this expression will typically be replaced by its result, 20. As you can see, -E is quite handy for double-checking that your directives do what you intended.
Notice that plain_dummy() is still there even though it is never called. This is no surprise: no compilation happens at this stage, so no dead code elimination takes place either. The contents of stdio.h are also expanded into the file, but they are omitted from the listing above.
I found an interesting application of -E as an HTML authoring tool [11]. In short, it lets you bring common programming practices such as code modularization and macros into the HTML world, something that cannot be done with plain HTML coding.
-S gives you assembly code, much like what you see with objdump -d/-D. However, with -S, you still see directives and symbol names, which makes it easier to study the code. For example, a call like printf("%d\n", 20) could be transformed into:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d\n"
...
movl $20, 4(%esp)
movl $.LC0, (%esp)
call printf
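You can see that the format string "%d\n" is placed in a read-only data section (.rodata.str1.1, a mergeable-strings variant of .rodata). You can also confirm that the arguments are laid out on the stack in right-to-left order: the last argument (20) sits at the higher address, 4(%esp), and the format string sits at the top of the stack, (%esp). Note that gcc stores them into pre-reserved stack slots with movl rather than using push instructions, but the resulting layout is the same.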
Conclusion
gcc gives us many useful options for shaping how our code is compiled. By understanding what these options really do, we can make the program faster and slimmer. However, do not depend entirely on them: you should pay more attention to writing efficient and well-structured code.
Acknowledgments
I would like to thank the communities in the OFTC chat room (#kernelnewbies and #gcc) and #osdev (Freenode) for their valuable ideas.
References
[1] Wikipedia's article about Preprocessing
[2] Wikipedia's article about Compilation
[3] Wikipedia's article about Assembler
[4] Wikipedia's article about Linker
[5] An example of code reordering using gcc
[6] Frame pointer omission (FPO) optimization and consequences when debugging, Part 1 and Part 2
[7] Explanation of DWARF
[8] Explanation of stabs
[9] Explanation of COFF
[10] Explanation of XCOFF (a COFF variant)
[11] Using a C preprocessor as an HTML authoring tool
[12] gcc online documentation
[13] AMD Athlon Processor x86 Code Optimization Guide
Mulyadi Santosa is a freelance writer who lives in Indonesia.
Return to ONLamp.com.
Forwarded: Comments from Jeff Dike (User Mode Linux main developer)
2007-06-01 03:15:44 mulyadi_santosa
Re: Forwarded: Comments from Jeff Dike (User Mode Linux main developer)
2007-06-12 01:36:49 mulyadi_santosa
Dear readers...
I made mistakes when composing the HTML-ized version of the comment. Please check http://the-hydra.blogspot.com/2007/06/copy-of-feedback-jeff-dike-gave-me.html for the revised version.
Sorry for the trouble.
regards,
Mulyadi
Forwarded: Comments from Henrik Nordstrom (Squid developer)
2007-05-31 23:42:47 mulyadi_santosa
Recently, some people gave me feedback via e-mail. Since I think their comments are valuable, I am copying them here. Please consider them as both errata and critique:
Page 1, at the end. You talk about branch prediction. GCC can do better if you give it feedback from profiling the actual use of the code, without any need to manually instrument the code with hints. See the -fprofile-use and -fprofile-generate options.
Also, -O3 does not always generate faster code than -O2. In most cases it does, but it also enlarges the code size through more aggressive inlining and loop unrolling, which might cost a bit in cache misses on modern CPUs.
Page 2: GDB actually works pretty well with -fomit-frame-pointer if the program is built with gdb debug data using -ggdb, or just -g on platforms where gdb is the default debugger, such as Linux. But it's true that a backtrace will not show you the full trace. This is also true in general when using -O3 (or -O2 with manually inlined functions), which gets even more confusing to debug.
Page 4: Using -E as a general preprocessor for things like HTML is not such a great idea. It is easy to get bitten by various implicitly defined symbols and some C preprocessor assumptions. I would advise against recommending this as one possible application of -E.
What -E is meant to be used for is checking what the preprocessor did to your code, or in other words, what the compiler actually compiled.
With complex macros and the like, this is not always obvious, and it's easy to get badly bitten by a missing () in a macro or something similar that isn't obvious when reading the original source.
Trivial example where -E helps in explaining what happened:
#define ADD(a,b) a + b
int a = ADD(5,6) * 4;   /* -E shows: int a = 5 + 6 * 4; */
Regards
Henrik
gcc (GNU C Compiler) is actually a collection of frontend tools that
Actually, gcc == GNU Compiler Collection - the whole family is referred to as gcc.
Preprocessing: Producing code that no longer contains directives. Things like "#if" cannot be understood directly by the compiler, so this must be translated into real code. Macros are also expanded at this stage, making the resulting code larger than the original.
It also pulls in headers.
..manipulate them further. This work is done in multipass style, which demonstrates that it sometimes takes more than one scan through the source code to optimize.
It doesn't scan the source - it scans the intermediate format, which used to be RTL, but which is something else now.
...As you may already be aware, registers can be accessed hundreds or thousands of times faster than RAM cells.
Exaggeration - Maybe ~100 cycles for going out to main memory, but these things will be in cache, so might cost a few cycles.
0x7530 is 30,000 in decimal form, so we can quickly guess the loop is..
0x7530 is hex; it should read "0x7530 is 30,000 in hexadecimal form" or "0x7530 in decimal is 30,000".
simplified. This code represents the innermost loop and the outermost loop ("for(j=0;j<5000;j++) ... for(k=0;k<4;k++)") because that is literally a request to do 30,000 loops. Note that you just need to...
5000 * 4 = 20000 loops.
Author's note: I admit this is solely my own mistake; I confused the number of loops with the current value of the accumulator (the acc variable). The correct sentence should be: "This code represents the middle and the innermost loops (for(j=0;j<5000;j++) ... for(k=0;k<4;k++)). At the end of these loops, the accumulator has been increased by 30,000."
To illustrate this better, here is the code with inline comments. First check #1, then #2, and so on to understand the flow.
So, instead of originally looping 200,000,000 (10,000 * 5,000 * 4) times, it now does 50,000,000 (10,000 * 5,000) times only.
Now, on to parameter passing. In x86 architectures, parameters are pushed to the stack and later popped inside the function for further processing.
Sometimes popped, often they are left on the stack.
By using -mregparm, you basically break the Intel x86-compatible Application Binary Interface (ABI). Therefore, you should mention it when you distribute your software in binary-only form.
Why? I see no problem shipping source with Makefiles that say -mregparm. The ABI problem comes if you were to redeclare a library function as regparm and call it.