
LDC 0.9 released

Christian — Thu, 08 Jan 2009 23:21:47 +0000

The first version of LDC, the LLVM-based compiler for version one of the D programming language, has been released for x86-32 Linux. Get it here!

We had already announced this release during the Tango conference in September (we hope the video of our presentation will be out soon), but – as was to be expected – it took a bit longer than planned.

LDC ships with a precompiled Tango rev 4237 and passes all except two of Tango’s unittests (io.digest.Md2 and text.locale.Posix fail). DStress results also look favorable and the GtkD demos work. The chances are good that your code will work with it too!

There are several known issues, the most severe being:

LDC does not compile DWT successfully for unknown reasons
an LLVM 2.4 bug sometimes leads to linker errors when unreachable code is optimized away
LDC doesn’t fully follow the D calling convention (but it’s close!)
we inherit every DMD-frontend related bug in the D bug tracker
… there are more in our tracker

If you encounter a bug, please check our bug tracker and create a new ticket if the issue isn’t listed yet. Maybe you are feeling adventurous and want to try fixing it yourself; in that case take a look at our getting started guide.

LDC could support other platforms. Furthest along so far are

x86-64 Linux: needs people to start fixing smaller bugs, exception bug (LLVM issue)
x86-32 Mac: small runtime issues, needs tests
x86-32 Windows: exceptions not supported (LLVM issue)

but support for these platforms won’t improve on its own! Several friendly people have offered their help – we need more of those!

For those with big CTFE memory needs, we have an experimental version of LDC available which has the compile-time garbage collector enabled. In the future, we’re going to experiment with a forward reference hack, but right now it still introduces too many regressions.

Feedback and questions are appreciated and should go to the mailing list. Alternatively, we’re often seen in #ldc on FreeNode.

Tomas Lindquist Olsen, Christian Kamm

Exception handling in LDC using LLVM

Christian — Tue, 19 Aug 2008 13:24:39 +0000

Exception handling is an integral part of the D programming language. Naturally LDC, aiming to be a conforming compiler, needs to provide it. Here I describe how exactly user code, generated LLVM IR, the unwinding library and the LDC runtime interact to make it all work – at least on x86 Linux.

There is some documentation on exception handling with LLVM and the pages linked from there contain further information, in particular the details on the unwinding runtime. Unfortunately, examples of actual use are hard to find, so trial and error has played a major role in learning the workings of LLVM EH. I’ll try to present a complete example here, but will assume you’ve at least skimmed through both documents.

First, the throw statement. Its basic job is simple: invoke the exception handling runtime by calling _Unwind_RaiseException with the address of an _Unwind_Exception struct. This struct contains, among some private data, an eight-byte exception class to identify the language and vendor it originates from (for LDC we set the first four bytes to “LLDC” and the last four to “D1\0\0”) and a cleanup callback. Since it is necessary to communicate the exception object that is being thrown to the handler code, this struct is embedded in a larger one. Later, the address of this surrounding struct can be computed from the address of the unwind_info member.

Consequently, the outer struct looks like this

    struct _d_exception
    {
        Object exception_object;
        _Unwind_Exception unwind_info;
    }

and the code to invoke the unwinding runtime is straightforward:

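    void _d_throw_exception(Object e)
    {
        if (e !is null)
        {
            _d_exception* exc_struct = new _d_exception;
            // exception class: vendor in the first four bytes, language in the last four
            exc_struct.unwind_info.exception_class[0..4] = "LLDC";
            exc_struct.unwind_info.exception_class[4..8] = "D1\0\0";
            exc_struct.exception_object = e;
            _Unwind_RaiseException(&exc_struct.unwind_info);
        }
        // reached if e is null or if _Unwind_RaiseException returns,
        // which it only does on error
        abort();
    }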
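The remark above, that the address of the surrounding struct can be computed from the address of its unwind_info member, can be sketched as follows. This is only an illustration written for a current D compiler: the _Unwind_Exception layout is a simplified stand-in for the real one from the unwinding library, and outerFromUnwindInfo is a hypothetical helper name, not a function from the LDC runtime.

```d
// Simplified stand-in; the real struct is defined by the unwinding library.
struct _Unwind_Exception
{
    char[8] exception_class;
    void* exception_cleanup; // cleanup callback slot, type simplified here
    size_t private_1;
    size_t private_2;
}

struct _d_exception
{
    Object exception_object;
    _Unwind_Exception unwind_info;
}

// Hypothetical helper: step back from the embedded member to the
// enclosing struct by subtracting the member's offset.
_d_exception* outerFromUnwindInfo(_Unwind_Exception* info)
{
    auto raw = cast(ubyte*) info;
    return cast(_d_exception*)(raw - _d_exception.unwind_info.offsetof);
}

void main()
{
    auto exc = new _d_exception;
    exc.exception_object = new Object;
    exc.unwind_info.exception_class[0 .. 4] = "LLDC";
    exc.unwind_info.exception_class[4 .. 8] = "D1\0\0";

    // The unwinder only ever hands back &exc.unwind_info.
    auto recovered = outerFromUnwindInfo(&exc.unwind_info);
    assert(recovered is exc);
    assert(recovered.exception_object is exc.exception_object);
}
```

The unwinding library only knows about the inner struct, so this pointer arithmetic is what lets handler code get back to the D exception object.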

What happens on a throw is essentially the following:

_Unwind_RaiseException traces the stack by looking at the unwind tables and, for each frame that has a landing pad set up, calls a ‘personality function’, asking it whether it can handle the exception object.
Once one is found, it traces the stack again, this time telling the personality functions to execute the code in any intervening finally blocks.
In the end, it calls the function for the final landing pad with arguments indicating that control is to be transferred to the catch handler.
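The three steps above can be mimicked with a toy model. The sketch below is not how the unwinding library is implemented: Frame and raise are hypothetical names, frames are plain array entries instead of unwind table data, and a handler matches by class name only.

```d
// Toy model of one stack frame; index 0 is the innermost frame.
struct Frame
{
    string name;
    bool hasFinally; // frame has a finally block to run while unwinding
    string catches;  // name of the class it catches, "" if none
}

// Returns a trace of what the simulated unwinder did.
string[] raise(Frame[] stack, string thrown)
{
    // First pass: only ask each frame whether it can handle the exception.
    size_t handler = stack.length;
    foreach (i, f; stack)
    {
        if (f.catches == thrown)
        {
            handler = i;
            break;
        }
    }
    if (handler == stack.length)
        return ["no handler found"];

    // Second pass: run the finally blocks of the intervening frames.
    string[] trace;
    foreach (f; stack[0 .. handler])
    {
        if (f.hasFinally)
            trace ~= "finally in " ~ f.name;
    }

    // Finally, transfer control to the catch handler.
    trace ~= "catch in " ~ stack[handler].name;
    return trace;
}

void main()
{
    auto stack = [
        Frame("inner", true, ""),
        Frame("middle", false, ""),
        Frame("outer", false, "MyError"),
    ];
    assert(raise(stack, "MyError") == ["finally in inner", "catch in outer"]);
    assert(raise(stack, "OtherError") == ["no handler found"]);
}
```

Note that nothing is unwound in the second call: because the first pass finds no handler, no finally block runs.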

Luckily, exception handling in D can be implemented using only a single personality function for all landing pads. This personality function decides what to do for each individual landing pad by parsing the language specific area of the unwind data. This area contains three tables: the callsite table, the action table and the classinfo table.

The callsite table maps instruction address ranges to indices into the action table. These address ranges mark the beginning and end of the code in a try block.
The action table contains chains of indices into the classinfo table and values that will be used to identify the action to the handler code later. An action corresponds to a catch or finally block.
The classinfo table holds the addresses to the classinfos of each class used in a catch parameter.

When the personality function is called and given the context of a certain landing pad, it looks up the instruction pointer, finds the right entry in the callsite table and then walks the corresponding action chain. For each possible action, it checks whether the thrown exception object is derived from the class specified by the respective classinfo. Once a match is found, it knows that this landing pad is responsible for the exception. When it is called again with instructions to transfer control to the handler, the personality function passes the exception object and the index from the action table to the handler code.
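The lookup just described can be sketched with plain arrays in place of the three tables. This is a simplified model under stated assumptions, not the LDC personality function: the real tables are stored in a compact binary encoding, and the names Lsda, CallSite, Action, Result, lookup and derivesFrom are made up for this example. A type index of 0 stands for a finally block and an index n > 0 refers to entry n - 1 of the classinfo table.

```d
struct CallSite { size_t begin, end; int firstAction; } // [begin, end) is try-block code
struct Action   { int typeIndex; int next; }            // next == -1 ends the chain

struct Lsda
{
    CallSite[] callsites;
    Action[] actions;
    const(ClassInfo)[] classinfos;
}

enum Found { nothing, cleanup, handler }

struct Result { Found kind; int selector; }

// Walk up the base class chain looking for the target classinfo.
bool derivesFrom(const ClassInfo c, const ClassInfo target)
{
    if (c is null)
        return false;
    return c is target || derivesFrom(c.base, target);
}

Result lookup(ref const Lsda lsda, size_t ip, Object thrown)
{
    foreach (cs; lsda.callsites)
    {
        if (ip < cs.begin || ip >= cs.end)
            continue;
        bool sawCleanup = false;
        for (int a = cs.firstAction; a >= 0; a = lsda.actions[a].next)
        {
            immutable t = lsda.actions[a].typeIndex;
            if (t == 0)
                sawCleanup = true;
            else if (derivesFrom(typeid(thrown), lsda.classinfos[t - 1]))
                return Result(Found.handler, t); // t is handed on to the handler code
        }
        return Result(sawCleanup ? Found.cleanup : Found.nothing, 0);
    }
    return Result(Found.nothing, 0);
}

class Base {}
class Derived : Base {}
class Unrelated {}

void main()
{
    // One try block covering addresses 100..119: catch (Base), then finally.
    auto lsda = Lsda([CallSite(100, 120, 0)],
                     [Action(1, 1), Action(0, -1)],
                     [typeid(Base)]);

    assert(lookup(lsda, 110, new Derived) == Result(Found.handler, 1));
    assert(lookup(lsda, 110, new Unrelated) == Result(Found.cleanup, 0));
    assert(lookup(lsda, 50, new Derived) == Result(Found.nothing, 0));
}
```

The value returned for a match plays the role of the index that the handler code later switches on.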

If you’re interested in the code that accomplishes this, take a look here.

The last step required to make EH work is to provide the handler code and to write out the correct unwind tables. Let’s look at some user code and what it is essentially turned into by LDC (of course the actual output is LLVM IR). The situation grows considerably more complex when there are nested try-catch-finallys in the same stack frame, but I hope this snippet illustrates the basic ideas.

    try {
        code_try();
    } catch(ExceptionClass ec) {
        code_catch(ec);
    } finally {
        code_finally();
    }

    // this is an invoke with
    // 'handler' as exception target
    code_try();
    goto end;

    handler:
    ehptr = llvm.eh.exception();
    ehsel = llvm.eh.selector(
        ehptr,
        &_d_eh_personality,
        ExceptionClass.classinfo,
        0);
    switch(ehsel) {
    case 1:
        code_catch(ehptr);
        goto end;
    default: // ehsel == 0
        code_finally();
        _Unwind_Resume(&ehptr.unwind_info);
    }
    // unreachable

    end:
    code_finally();

The llvm.eh.* intrinsics get the exception object and the action table index that are passed in by the personality function as mentioned above. But there’s more going on here: the selector intrinsic also tells LLVM what the data in the unwind tables should be. In particular, the personality function and the exception classinfos are set here. The zero indicates the finally block. The call to code_try() has been turned into an invoke, which makes LLVM emit an entry in the callsite table for it.
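To see which source-level behavior the lowered code has to reproduce, here is a small program that records the order in which the blocks run on the three possible paths: no exception, a matching exception, and a non-matching one. It is a sketch for a current D compiler; OtherException, run and trace are names invented for this example.

```d
string[] trace;

class ExceptionClass : Exception { this(string msg) { super(msg); } }
class OtherException : Exception { this(string msg) { super(msg); } }

void run(Exception toThrow)
{
    try
    {
        trace ~= "try";
        if (toThrow !is null)
            throw toThrow;
    }
    catch (ExceptionClass ec)
    {
        trace ~= "catch";
    }
    finally
    {
        trace ~= "finally";
    }
}

void main()
{
    // No exception: try body, then the finally code at 'end'.
    trace = null;
    run(null);
    assert(trace == ["try", "finally"]);

    // Matching exception: selector 1, catch code, then 'end'.
    trace = null;
    run(new ExceptionClass("x"));
    assert(trace == ["try", "catch", "finally"]);

    // Non-matching exception: finally code runs, then unwinding resumes.
    trace = null;
    try
    {
        run(new OtherException("y"));
    }
    catch (OtherException e)
    {
        trace ~= "outer catch";
    }
    assert(trace == ["try", "finally", "outer catch"]);
}
```

Each assertion corresponds to one route through the snippet above: the fall-through to end, the case 1 branch, and the default branch ending in _Unwind_Resume.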

As you can see, the unwinding runtime and LLVM code generator are tied closely via the two intrinsics and thus supporting other runtimes such as Windows structured exception handling will be nigh-impossible without changes to LLVM. Hopefully, getting llvm-gcc to support exception handling on Windows will be enough of an incentive for the LLVM team to provide that feature eventually.

Another thing to bear in mind is that LLVM’s exception support is, at the moment, very C++ specific. The code generator can fill the language specific data area only with the three C++ style tables mentioned above. Fortunately, D’s exceptions are similar enough that we can get the right behavior by inserting suitable values into these tables.

For now, the implementation in LDC has only been tested on x86 Linux, though the PowerPC target should work as well. EH on x86-64 Linux will supposedly be enabled in the next LLVM release. The remaining issues should be solved as LLVM matures, enabling LDC to provide correct exception handling support on more platforms.

Compilers for the D programming language: DMD, GDC and LDC

Christian — Wed, 09 Jul 2008 17:12:26 +0000

Currently, users of the D programming language can essentially choose between two mature compilers: DMD and GDC. I’ve started helping with the development of a third, LDC, and think the first public alpha version can’t be far off. In this post, I’ll give a short overview of the compilers available and say a few words about LDC.

First, there is the reference compiler DMD that is maintained and improved by D’s inventor Walter Bright. It has a stable branch for D 1.0 as well as a bleeding edge counterpart for version 2.0 of the D specification that’s still under development. One part of it, the front-end responsible for parsing code, building a syntax tree and running semantics, is open source, available under the GPL or Artistic license. Its back-end, however, is not available and thus only Walter can compile DMD.

This is not a failing in itself. Indeed most users will never have the need or desire to compile their compiler from source. Yet it also prohibits testing patches for the front-end, makes debugging nigh impossible and generally complicates helping with DMD’s development.

The alternative – and often the only alternative if your target is not x86 Linux or Windows – is GDC. David Friedman took the open front-end and tied it to the GCC back-end; he even supports D 2.0 to some extent. While it works well and is fully open source, it has, unfortunately, remained a one-man effort. For me it was a combination of GCC’s daunting code base and the fact that there had been no activity for several months that stopped me from helping out.

Instead I eventually joined a different project: LDC. It is similar to GDC in that it also takes the open DMD front-end and aims to combine it with an equally free code generator. The back-end, though, is different: as the name suggests, LDC emits LLVM bitcode, which can be compiled to native code, but could also be used for Just-in-Time compilation among other things. I’m also happy to say that even though LLVM is a large and ambitious project, it remains surprisingly easy to learn and work with.

Almost the first thing I did when I started contributing to LDC a few months back was to integrate Thomas Kühne’s exhaustive DStress test suite with LDC: it is very useful for finding bugs and regressions and can even serve as a sort of crude progress indicator (crude because there is no relationship between number of bugs and number of tests; there are about 1000 tests for inline assembly and only 30 or so for exception handling). As you can see, there are still a handful (517 at the last count) of regressions with respect to DMD, but their number has been decreasing steadily.

In the last months, Tomas Lindquist Olsen, who started the project about a year ago and is responsible for almost all of its existing functionality, and I have added the last major missing parts to the compiler: inline assembly (thanks to David Friedman for the asm parsing and rewriting code!), exception handling and the synchronized statement are now supported. With these out of the way, goals for the future are squashing bugs, fixing some linking issues and getting LDC to work properly on Windows.

Speaking of other platforms: As Tomas and I both develop on x86 Linux, other configurations haven’t received much testing yet. Some people reported partial successes on FreeBSD and SPARC, but more testers would be welcome. You can generally contact us by email or in #ldc on freenode. All in all, we’ve made great progress and if we can keep it up, LDC will be in the same league as DMD and GDC soon.