OS/2 eZine - http://www.os2ezine.com
16 December 2001
 
Andrew Zabolotny is a long time OS/2 user and developer, the maintainer of the OS/2 port of Pentium GCC and now the GCC compiler suite.

Platon Fomichev aka Stauff is a long time OS/2 user and developer, and a porter of a wide range of UNIX software to OS/2 VIO and XFree86/2.




GCC 3.0.2 - Installation, Usage and Tips

Survey

Are you interested in further GCC 3.0.x improvements, including a switch to the ELF object format (.obj) as the next major step towards better cross-UNIX compatibility? OMF will be preserved too (a.out will be dropped).





Preface

This happened in July. One rainy day I (Stauff) was on my way to work and suddenly remembered that Andy, whom I knew from university, works right next to my office, in the next building. It is a great thing for OS/2 users to talk to each other, since there are few of us today (sigh), so we talked a lot about OS/2 news, my new job, Linux and GNU, and then I remembered one thing that had been on my porting schedule for a long, long time. "Andy, Qt 3 (Qt is a C++ framework/GUI toolkit for X and Windows) requires GCC 3.0 and I can't build it with your PGCC. Can you take a look?" I asked. "No problem," was his answer. However, we soon faced tons of problems. Qt 3 refused to build, had problems during linking, the generated code was bad, etc. At some stage we asked the Mozilla developers to test the new compiler to ensure everything was OK. Only now, after weeks and months during which Andrew debugged and fixed and fixed again (as GCC was also moving), can you see GCC 3.0.2 in the shape we want it to be. I do hope OS/2 developers will have as much fun and make as many interesting discoveries as we did while working on GCC. Good luck!

How This Article is Organized

Both Andy and I are developers and not very skilled at writing professional documentation, but we have tried to structure this article so that it is easy to read and understand for both newbie OS/2 developers and real gurus. The sections are as follows:

Getting started

Why GCC? This may be the first question that comes to your mind. Every OS/2 developer who uses it has their own answer, but in my (and only my) opinion it is a very decent compiler, fast and robust, that allows porting of UNIX applications to OS/2. Besides this, GCC 3.0.2 has several interesting features that distinguish it from other compilers:
  • High portability. GCC runs today on a huge number of architectures and operating systems. Here's a list of only the CPU types supported by gcc (this is a simple listing of gcc/config/*):
    1750a, a29k, alpha, arc, arm, avr, c4x, clipper, convex, d30v, dsp16xx, elxsi, fr30, h8300, i370, i386, i860, i960, ia64, m32r, m68hc11, m68k, m88k, mcore, mips, mn10200, mn10300, ns32k, pa, pdp11, pj, romp, rs6000, s390, sh, sparc, v850, vax, we32k
    Not to mention that for many CPU types (e.g. x86), gcc supports more than one operating system.
  • Standards. GCC developers try to keep close to the respective language standards (unlike some other C compilers which have a whole heap of doubtful language extensions), thus ensuring a high degree of language compatibility. Besides, gcc has the "-ansi" switch, which disables the GNU extensions that conflict with the ANSI C/C++ standards.
  • Optimizer. Of all the compilers I have seen, GCC has the most sophisticated optimizer. For the x86 platform, gcc has switches for optimizing code for every major CPU type - from i386 to Athlon and Pentium Pro.
  • Features. Of all the compilers I have seen, GCC has the largest number of switches, allowing you full control of how exactly the code is being generated, of language dialects, which warnings are emitted, and how preprocessing is done.
  • Open source. GCC is one of the first free C/C++ compilers. You can get its source code and see exactly how something works, if you still have questions after reading the very detailed and high-quality documentation. You can fix a bug if you find one, without having to wait months for "service packs". You can participate in its development.

Installation and Configuration:

Download site

GCC 3.0.2 is currently in beta, but this does not mean it is not working. Indeed, the beta period for something as complex as a compiler can last quite a long time! We think it is usable now, although a beta should not be assumed to be bug-free. Try it now; don't wait for a release version, because one may not appear until next year. As this product is currently in beta, please do not upload it to Hobbes or any other OS/2 application archive.

Get GCC 3.0.2 from Netlabs Server

Requirements

You will need the following pre-requisites before installing and using GCC 3.0:
  • OS/2 Warp 3.0 or later. If you're going to use the standard C++ library (libstdc++) you will also need Unicode support (service pack 26 for Warp 3.0.)
  • EMX 0.9d4 is required. Please note that EMX is not just a bunch of DLLs: it is a development toolkit with its own compiler (GCC 2.8.1), tools and headers. So when we speak about EMX we actually mean the development part of this package. That's why fix number 4 is required not only for the runtime but for the development part as well.
  • 32 MB of RAM for C and 64 MB of RAM for C++ development, because the GCC optimizer is very memory hungry.
  • Correct config.sys SET entries are very important, because incorrect ones can lead to compilation errors. Please ensure you understand everything in the EMX part of config.sys before proceeding any further.
  • EMX single-threaded C runtime DLL fix - this is a requirement because it removes a potential source of problems for you with mixing DLLs linked against the multi-threaded and single-threaded versions of the EMX C runtime DLLs. The archive is called emx-strt-fix-0.0.2.zip and can be found on Hobbes.
  • GNU gettext 0.10.40 - another must-have package. A previous version (0.10.35) will work as well, but 0.10.40 is highly recommended, and it is fully backward compatible.
  • Binutils 2.11.2 (binary utilities) - new binutils with the new assembler are required for new GCC. Older GAS won't work.
    WARNING: GAS 2.9.1 will silently generate invalid code.
  • We will also need a wide range of UNIX utilities for our daily work. GCC itself needs GNU Make; other useful packages are GNU File Utilities, GNU Text Utilities, GNU patch/diff and others. We recommend getting them from Hobbes.

What We Are Going to Install

The installation process of GCC is pretty straightforward (although it comes without "advanced" `Next->Next->Finish' technology): copy the archives to the directory where your emx/ tree is located and unpack them, making sure to preserve the directory structure in the archives.

All the GCC packages have a common directory structure that will help you keep the old EMX GCC alongside the new one. All the binaries are installed under emx/bin.new/ and do not overwrite anything in emx/bin/. The following packages are available:

  • gcc-os2-3.0.2-docs.zip - Documentation in INF format for compilers
  • gcc-os2-3.0.2-gcc.zip - GNU C Compiler
  • gcc-os2-3.0.2-gcj.zip - GNU Java Compiler. The GNU Java runtime has not been ported (yet?) to OS/2, so the compiler is almost useless for now. However, anyone who wants to can continue this work. The compiler itself seems fully functional.
  • gcc-os2-3.0.2-goc.zip - GNU Objective C compiler
  • gcc-os2-3.0.2-gpp.zip - GNU C++ Compiler (C compiler required)
  • gcc-os2-3.0.2-g77.zip - GNU Fortran Compiler
  • gcc-os2-3.0.2-diff.zip - The differences between mainstream GCC and the OS/2 version.

Unpack the compiler packages in the emx directory. For C/C++ development you must install both the C and C++ compilers. Please ensure that you have the 'emx/dll/' directory in your LIBPATH.

Move the file /emx/bin.new/newgcc.cmd into /emx/bin and change it to suit your needs; all instructions are inside newgcc.cmd. To switch to the new gcc, just execute newgcc.cmd. To switch back, exit from your shell. To completely replace gcc 2.8.x with the new version, merge all the files from /emx/bin.new with the files in /emx/bin, overwriting any existing files with the same name in /emx/bin. Then remove the bin.new subdirectory.

Now we will update EMX and create the libraries. Switch to the new GCC with newgcc.cmd (please make sure you have adjusted it to your own preferences) and check that the basic compilers work. Type gcc -v and g++ -v to see the compilers' versions. You should see 3.0.2 as the GCC version in both cases. Launching the compilers also checks that everything is OK, so if you get any trouble, errors or even core dumps at launch, it is time to go back and re-read the installation instructions.

Building OMF libraries

(Before proceeding, ensure you have switched to GCC 3.0.2 and its tools!)

Since EMX 0.9 was written, IBM has added several useful APIs to OS/2, one of which is the Unicode API. To allow use of the Unicode APIs in EMX programs (programs linked with libstdc++ will need them) we had to write the missing API definitions (.h and import definitions). Now, to add the missing functions to os2.a and os2.lib (the libraries defining all OS/2 system-specific functions), you must go to /emx/lib and type 'make'. This will rebuild os2.a so that it contains all the functions defined in unicode.imp, and then it will re-convert all outdated OMF libraries (.lib) from their a.out counterparts (.a).

You must also go to the /emx/lib/gcc-lib/i386-pc-os2_emx/3.0.2/ directory and type 'make' there, for the same reason (to convert the supplied a.out libraries into OMF format in case you use the -Zomf switch).

Setting variables

Now stop and review everything above. Do you really have fix 4 for the EMX development part installed? Do you know where your EMX tree is located? Do you know your LIBPATH, and is it free of DLL clashes? Did you update the EMX libraries? These are simple but very important questions, because if something is wrong at this basic level you can face severe consequences, including GCC core dumps and invalid generated code.

Now check your config.sys; there are a number of variables to look at. First, C_INCLUDE_PATH: this one is not really important, just check whether it exists and, if it does, that it points to the right directory. Note that this variable may not exist at all. CPLUS_INCLUDE_PATH, on the other hand, is very important! The new C++ compiler uses a whole bunch of its own files and does not need any of the old EMX headers. Please place the new include path, i.e. emx\include\g++-v3, before the other paths specified in this variable. Note that you may not have this variable either.

Compiling our First C++ Application with New GCC

We will use a very simple program to show some issues with OMF format in the new GCC. Let's create a file called test.cpp:
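#include <iostream>   // standard header; <iostream.h> is the old pre-standard name
using namespace std;

int main()
{
    // Print a greeting and flush the output stream
    cout << "Hello World" << endl;
    return 0;
}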

Now let's compile this by running g++ test.cpp. No problems, but in this example the object code is in a.out format. Let's use OMF.

g++ test.cpp -Zomf

You will get a lot of linking errors, for example:
X:\......\gcc-lib\i386-pc-os2_emx\3.0.2\st\stdcxx.lib(functexcept.obj):
error L2025: _ZTSSt9exception : symbol defined more than once

Bad indeed. This happens because linking with the libstdc++ library requires special handling when using the OMF format. Erase weaksyms.omf from the current directory, then go to emx\lib\gcc-lib\i386-pc-os2_emx\3.0.2\ and copy the weak symbol list file (weaksyms.omf) from there to your directory. Issue the compilation command once again. It should work as expected this time.

Here is how it works. As many people know, gcc never generates OMF files directly. Instead, it generates a.out files, which are then run through emxomf to convert them from a.out to OMF. Finally, at the link stage, LINK386 or ILINK is run.

The weak symbol attribute takes effect at the link stage. Marking a symbol as weak means that if there is another symbol with the same name that is also marked as weak, no duplicate symbol error is generated. Instead, the linker arbitrarily chooses one of them (usually the first one encountered). (In fact, the "real" weak attribute also means that if somebody refers to a weak external symbol and that symbol is not found anywhere, it is resolved to NULL. But these weak symbol semantics are supported only in the a.out format, not in OMF.)

The only way to emulate this for OMF is to keep a list of your project's weak symbols in an external file, which survives between runs of emxomf. Suppose emxomf converts an a.out file to OMF and finds a weak symbol. It then looks into that table: have we encountered this weak symbol already, or is this the first time? If we have, the symbol is converted to a non-exported local symbol (so that LINK386 won't barf about duplicate symbols). If not, the symbol is marked as a normal symbol and is added to the external table, along with the name of the object file where it was first encountered. The file name is needed so that when the same file is recompiled (for example after you modify the source), emxomf can detect that the symbol should be declared as external and not as a local non-exported symbol, even though it is already listed in the weak symbol table.

This file is called weaksyms.omf. If you see such a file in your project's directory, it means your project uses weak symbols. Don't delete it; after some time it will cease to grow and will contain a complete list of the weak symbols in your project.

If your project uses multiple directories and your makefile changes the current directory, emxomf can miss the weaksyms.omf file and start the list from scratch. In this case you can get duplicate symbol errors. The solution is to maintain a single weak symbol list by setting the GCC_WEAKSYMS environment variable to point to a file where all your weak symbols will be collected:

SET GCC_WEAKSYMS=d:/myproject/weaksyms.omf

libstdc++ contains a lot of weak symbols, and if your application uses libstdc++, emxomf must know about them in advance. For this reason, before starting the compilation of your project, you must copy the weaksyms.omf file supplied with gcc (the file name is lib/gcc-lib/i386-pc-os2_emx/x.x.x/weaksyms.omf) to your project's directory.

Frequently Asked Questions

Q)Compiling speed is slow.
A)Use SET GCCLOAD=x, where x is a number of minutes to keep GCC pre-loaded in memory. Also use -pipe switch while compiling, or SET GCCOPT=-pipe.

Q)Fork does not work anymore in GCC 3.0.2. Why?
A)Not exactly. It does not work when using the -Zcrtdll switch. This happens because EMX's fork() black magic gets confused by the new gcc302*.dll. If you add the "-lgcc" switch after -Zcrtdll, your application won't be linked against gcc302*.dll, and fork() will work.

Q)I can't allocate more than 32 MB of memory. What to do?
A)That's an EMX feature. Usually EMX applications are limited to using a 32MB heap because the fork() logic expects the heap to be allocated in one big segment. If you don't use fork(), you can use the _uflags() function to tell the EMX runtime that you don't want your heap to be contiguous.

Q)What OS/2 specific macro symbols does GCC 3.0.2 define?
A)You can see these by running gcc -v on a .c or .cpp file. The OS/2-specific symbol is __OS2__, also __EMX__ is defined; other symbols like __i386__ are defined depending on CPU type.

Q)GDB does not find my function names!
A)g++ 3.0 and later has a new name mangler. This means that gdb and other old apps cannot demangle the mangled C++ names correctly. There is nothing you can do about it other than trying to port a newer gdb (:-).

Q)-mcpu=pentiumpro does not accelerate my app. OS/2 bug?
A)The speed increase is marginal and becomes significant only for computational algorithms. Normal programs are limited by lots of other factors, so pure execution speed is not a requirement for most programs. In fact, I'd recommend that everyone compile with the -O2 -march=i386 switches, because this combination favours small code size over speed, and you won't observe any difference anyway.

Q)Executable size is sooooo huge. How to reduce it?
A)You can try to use the -Zcrtdll switch. Also if your project is written in C++ and you don't use exceptions, you should use the -fno-exceptions switch: this will reduce your executable size a lot. Also -fno-rtti could give a small gain. Finally, try -Os switch which tells gcc to optimize by favouring smaller code over faster but larger code.

Q)Can I mix object files from old and new GCC?
A)Only for plain C. As I mentioned above, the C++ mangler has changed in gcc 3.x, thus you will get linking errors when linking with old C++ object files and libraries.

Technical hints and tips

  • The OS/2 port of gcc (and only it) has a new flag: -malign-strings. By default gcc used to align all string constants on a 32-byte boundary (!), which is far too much IMHO. I've added a CPU-type-dependent alignment (4 bytes for 386/486 and 32 bytes for Pentium and later) and also added the -malign-strings=x switch to override this default. `x' is the power-of-two exponent, e.g. -malign-strings=5 will align on 32-byte boundaries (the default value with -mcpu=pentium). The compiler itself is built with -malign-strings=0, which alone made the C++ compiler cc1plus.exe about 150K smaller! It looks like there is someone in the GCC team who works for Intel and has the goal of making gcc generate extremely bloated code. This 32-byte alignment is said to be `extremely' useful on Pentium III when doing inline strlen's and strcmp's. Phuf. If someone cares about that `gain', use -mcpu=pentium or -mcpu=pentiumpro. You can still optimize for Pentium without aligning strings (because you either know strlen speed is not critical or you simply don't do strlen on static strings) by also passing -malign-strings=0 (in fact, all the compilers were built this way, since most static strings there are just messages).
  • gcc 3.0 has the annoying habit of aligning the stack to a 128-bit (i.e. 16-byte) boundary. This generally produces larger files. To avoid this, use -mpreferred-stack-boundary=2 or whatever you like (the argument is a power-of-two exponent, so 2 means 4 bytes and the default of 4 means 16 bytes). The 16-byte stack alignment makes sense only when using SSE instructions (but it is always enabled, whether you use them or not).
  • The default CPU type is Intel 386. You can generate faster (but generally larger) code by using -mcpu=xxx where xxx is one of (as of gcc 3.0.2): i386, i486, i586, pentium (=i586), i686, pentiumpro (=i686), k6, athlon.
  • For all CPU types later than i386, including Athlon, the default alignment for functions, loops and jump labels is 16 bytes, and for K6 it is 32 bytes! This can generate lots of unused space (NOPs) in your executables, especially if you use lots of small functions and many loops and labels. If you use -mcpu=xxx and you care more about space than about speed, use the -malign-loops=2, -malign-jumps=2 and -malign-functions=2 switches, which will align everything on a double word (4-byte) boundary.
  • The semi-legendary MMX and SSE support mentioned in the documentation does NOT mean that gcc generates SSE or MMX instructions from C/C++ code. It just means that you can use a number of special gcc builtin functions such as __builtin_ia32_emms(), __builtin_ia32_psllw() and so on instead of directly writing __asm("emms") etc. Of course, it is nicer to write MMX code in a C-like manner than directly in assembly language (especially since GAS uses AT&T syntax, which is hard to understand for those who are used to Intel syntax), but you have to get the respective header files and/or docs from somewhere (I believe they should exist somewhere in the Linux world).



Copyright (C) 2001. All Rights Reserved.