[Talk] State of dynamic linking in various platforms...

Fri Aug 23 08:03:29 EST 2002

[I suspect this thread has significantly degenerated from what Luke
 was hoping for].

On 2002-Aug-22 16:14:26 +1000, Chris Maltby <chris at sw.oz.au> wrote:
>On Thu, Aug 22, 2002 at 03:20:57PM +1000, Peter Jeremy wrote:
>> Runtime linking generally implies PIC for the shared libraries - on
>> most systems, this means you lose a register (it's needed to support
>> PIC).  On the iA32, the loss of a register is a further performance
>> hit (since there aren't many to start with).
>
>That's mostly a compiler issue. I don't want to debate the strengths
>and weaknesses of ia32, but there are ways to minimise the impact of
>PIC.

Agreed, but AFAIK the most popular ia32 compiler doesn't do an
especially good job here, so this does have an impact.  Good L1 cache
performance and hardware register renaming minimise the impact on
current ia32 implementations, and it's not relevant on sane
architectures.

>> And as for speed, we have some applications that take 10 seconds of
>> CPU time (on a fast Alpha) to start - courtesy of the runtime loader
>> (and that's with lazy function binding, so the 10 seconds is just to
>> bind variables).  We would love to be able to statically link them...
>
>That'll be tru64 then, wouldn't it. Are you sure there aren't linker
>options to speed up the binding? What about ld -msym... You can also
>set the LD_BIND_NOW environment variable and put some logging args in
>_RLD_ARGS to gather information. See loader(5) and ld(1).

It is Tru64.  I've experimented with the various options and whilst it
was possible to improve the binding time with "-depth_ring_search",
there were a number of cases where this would seem to change the
binding result (eg "clog" is defined as a function in libm and as a
variable in libcxx).  I can't remember if we experimented with
"-msym".  Overall, I think this application is just an instance where
the designers made a decision without considering the ramifications.
I hope there aren't too many other similar applications out in the
wild.
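
To illustrate why a different search order can change which definition
a name binds to, here is a toy model in C.  Everything in it is an
assumption made up for the illustration: the library graph (an "app"
depending on "libA" and "libm", with "libA" depending on "libcxx"), the
struct layout and both search functions.  It is not Tru64's actual
loader algorithm, just a sketch of the general effect.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a loaded object: its name, the symbols it defines and
 * the objects it depends on.  Both lists are NULL-terminated. */
struct lib {
    const char *name;
    const char *symbols[4];
    const struct lib *deps[4];
};

/* Hypothetical dependency graph: "clog" is defined twice. */
static const struct lib libcxx = { "libcxx", { "clog" },   { NULL } };
static const struct lib lib_m  = { "libm",   { "clog" },   { NULL } };
static const struct lib lib_a  = { "libA",   { "helper" }, { &libcxx } };
static const struct lib app    = { "app",    { "main" },   { &lib_a, &lib_m } };

/* Return 1 if the object defines the symbol, 0 otherwise. */
static int defines(const struct lib *l, const char *sym)
{
    for (size_t i = 0; i < 4 && l->symbols[i] != NULL; i++)
        if (strcmp(l->symbols[i], sym) == 0)
            return 1;
    return 0;
}

/* Depth-first: follow each dependency chain to the bottom before
 * moving on to the next sibling. */
const struct lib *find_depth_first(const struct lib *l, const char *sym)
{
    if (defines(l, sym))
        return l;
    for (size_t i = 0; i < 4 && l->deps[i] != NULL; i++) {
        const struct lib *hit = find_depth_first(l->deps[i], sym);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Breadth-first: check all direct dependencies before any of their
 * own dependencies. */
const struct lib *find_breadth_first(const struct lib *root, const char *sym)
{
    const struct lib *queue[16];
    size_t head = 0, tail = 0;

    queue[tail++] = root;
    while (head < tail) {
        const struct lib *l = queue[head++];
        if (defines(l, sym))
            return l;
        for (size_t i = 0; i < 4 && l->deps[i] != NULL && tail < 16; i++)
            queue[tail++] = l->deps[i];
    }
    return NULL;
}
```

In this made-up graph, a breadth-first search from "app" reaches libm
before libcxx, while a depth-first search goes through libA into libcxx
first, so the same name "clog" ends up bound to a different object.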

I agree this is an extreme case, but runtime linking does incur a
cost on every exec.  For executables that are regularly exec'd (eg
sh(1)), this cost may be significant.  Maybe we need a way for the
kernel to keep pre-bound executables around and to resurrect the
sticky bit as an indicator of when it should be used.

>> >All that being said, I reckon that the memory efficiency issue trumps
>> >all for system executables. Make everything dynamic.
>
>> Actually, memory efficiency is generally higher for static executables.
>> The rtld needs to scribble in the text segment to do address fixups
>> so you lose most of the sharability gains.  Also, if you only have
>> one application using a shared library, you wind up paying for the
>> bloat of the complete .so, instead of just that bit that the application
>> actually uses.
>
>I haven't noticed anyone implementing dynamic binding by writing on
>the code. Are you sure about the need for the rtld to do it, because it
>doesn't fit with any system I'm familiar with. The usual approach is a
>writable jump table, with each slot initialised to point to the dynamic
>binder, which then replaces itself with a pointer to the actual call.

I may need to re-check my facts here.  This might only be relevant
to old a.out shared executables/libraries.
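
For reference, the writable jump table approach described in the quoted
text can be sketched in plain C roughly as follows.  This is only a
simulation using function pointers, not what any real rtld does
internally: the names (jump_table, add_binder, real_add, call_add,
binder_runs) are hypothetical, and a real implementation does the
lookup by symbol name and usually in assembler.

```c
/* Signature shared by the stub and the real function. */
typedef int (*binop_fn)(int, int);

/* Stands in for a function that lives in a shared library. */
static int real_add(int a, int b)
{
    return a + b;
}

/* Counts how often the binder ran, so the effect is observable. */
static int resolve_count = 0;

static int add_binder(int a, int b);

/* The writable jump table: a single slot that initially points at
 * the binder.  It lives in data, so the code itself stays untouched. */
static binop_fn jump_table[1] = { add_binder };

/* First call lands here: "resolve" the symbol, overwrite the slot
 * with the real address, then complete the original call. */
static int add_binder(int a, int b)
{
    resolve_count++;
    jump_table[0] = real_add;
    return jump_table[0](a, b);
}

/* What a call site does: always jump indirectly through the slot. */
int call_add(int a, int b)
{
    return jump_table[0](a, b);
}

/* Number of times the binder has been invoked so far. */
int binder_runs(void)
{
    return resolve_count;
}
```

Calling call_add() twice runs the binder only on the first call; the
second goes straight to real_add via the patched slot.  Only the table
is written to, which is why the code pages can stay shared.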

>I'm surprised that no-one has mentioned cache performance so far in
>this thread.

That's a good point (and one I hadn't considered).  I thought the last
(FreeBSD) figures I'd seen posted still suggested that a fully static
system was faster than a dynamically linked one.  If I get a round
tuit, I might try it myself.

Peter