This document tries to collect all kinds of information related to thread-local storage (TLS) and serves as both a design document and an implementation guide. Nothing fancy, just something to help us flesh out the details.
We seem to have a three-dimensional problem space:
Complete vs. shared. Complete means statically linked; the term is used in certain environments, and adopting it here frees up "static" to mean something else. The big difference between complete and shared is whether a runtime linker is present: a shared executable has one, a complete executable does not.
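Without a runtime linker, the startup code of a complete executable has to set up the initial thread's TLS block on its own, using the information in the PT_TLS program header: copy the initialization image (.tdata) and zero-fill the remainder (.tbss). The following is a minimal sketch under that assumption. The names sim_tls_segment and sim_alloc_static_tls are hypothetical, the segment description is passed in rather than read from real program headers, and alignment beyond what calloc() guarantees is ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of the PT_TLS program header. */
struct sim_tls_segment {
    const void *tdata;  /* initialization image (.tdata) */
    size_t filesz;      /* bytes of initialized data */
    size_t memsz;       /* filesz plus the zero-filled .tbss */
};

/* Allocate and initialize the initial thread's TLS block without a runtime
   linker. Returns NULL on bad input or allocation failure. */
void *sim_alloc_static_tls(const struct sim_tls_segment *seg)
{
    if (seg->filesz > seg->memsz)
        return NULL;
    /* calloc() zero-fills, which takes care of .tbss. */
    unsigned char *block = calloc(1, seg->memsz ? seg->memsz : 1);
    if (block != NULL && seg->filesz > 0)
        memcpy(block, seg->tdata, seg->filesz);  /* copy .tdata */
    return block;
}
```

In a shared executable the runtime linker would do this work instead, for every module that has a PT_TLS segment.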
Static vs. dynamic TLS. This refers to the TLS model in use. A process can use both models at the same time, but certain technical restrictions apply. The big difference between static and dynamic TLS is whether the virtual address of a thread-local variable is obtained by calling the __tls_get_addr() function (dynamic TLS) or computed directly from the thread pointer (static TLS).
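Under the ELF TLS ABI, dynamic TLS code passes __tls_get_addr() a (module ID, offset) pair; the function looks up the module's block in the calling thread's dynamic thread vector (DTV), allocating and initializing the block on first use. The sketch below simulates that for a single thread with a fixed-size DTV. The tls_index layout follows the ABI; the module table, SIM_MAX_MODULES and the sim_* functions are hypothetical, and a real implementation would also deal with per-thread DTVs and modules loaded or unloaded at run time.

```c
#include <stdlib.h>
#include <string.h>

/* Argument layout described by the ELF TLS ABI. */
typedef struct {
    unsigned long ti_module;  /* module ID (1 = executable) */
    unsigned long ti_offset;  /* offset of the variable in the module's block */
} tls_index;

#define SIM_MAX_MODULES 4  /* hypothetical fixed limit for the sketch */

/* Hypothetical per-module TLS template. */
struct sim_module {
    const void *init_image;  /* .tdata contents */
    size_t init_size;        /* size of .tdata */
    size_t block_size;       /* size of .tdata + .tbss */
};

static struct sim_module sim_modules[SIM_MAX_MODULES + 1];
static void *sim_dtv[SIM_MAX_MODULES + 1];  /* one thread's DTV; slot 0 unused */

void sim_register_module(unsigned long id, const void *image,
                         size_t init_size, size_t block_size)
{
    if (id == 0 || id > SIM_MAX_MODULES || init_size > block_size)
        abort();
    sim_modules[id].init_image = image;
    sim_modules[id].init_size = init_size;
    sim_modules[id].block_size = block_size;
}

/* Dynamic TLS lookup: allocate the module's block lazily, then add the offset. */
void *sim_tls_get_addr(const tls_index *ti)
{
    if (ti->ti_module == 0 || ti->ti_module > SIM_MAX_MODULES)
        abort();
    void *block = sim_dtv[ti->ti_module];
    if (block == NULL) {
        const struct sim_module *m = &sim_modules[ti->ti_module];
        block = calloc(1, m->block_size ? m->block_size : 1);  /* zeroes .tbss */
        if (block == NULL)
            abort();
        if (m->init_size > 0)
            memcpy(block, m->init_image, m->init_size);        /* copies .tdata */
        sim_dtv[ti->ti_module] = block;
    }
    return (char *)block + ti->ti_offset;
}
```

Static TLS avoids this call entirely: the offset from the thread pointer is fixed at link or load time.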
With vs. without pthread. This refers to whether a threads library (libthr or libkse) is present and/or in use. The use of the __thread keyword does not imply that the process will be multi-threaded, so we have to deal with TLS accesses outside the context of a threaded application. The big difference between the two cases is the ability to actually have multiple threads.
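Because __thread does not imply a threads library, code like the following has to work even in a program that never links libthr or libkse; the variable then simply lives in the initial thread's TLS block. A minimal sketch (tls_calls and count_call are illustrative names):

```c
/* Thread-local counter, initialized from the TLS image of each thread. */
int __thread tls_calls = 3;

/* Each call increments the calling thread's own copy. */
int count_call(void)
{
    return ++tls_calls;
}
```

In a single-threaded program the first two calls return 4 and 5. With a threads library in use, every thread would start from its own copy initialized to 3.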
Of the current tier 1 and tier 2 platforms, only i386 and ia64 have full toolchain support, with GCC 3.3. Even so, on ia64 the current version of binutils (2.13.2) is buggy with respect to TLS; the bugs seem to affect dynamic TLS relocations. On alpha, the TLS access sequences are not generated at all: the __thread keyword seems to be ignored. On sparc64, the compiler emits an error when the __thread keyword is used. GCC 3.4 claims to support TLS on alpha and sparc64, but this has not been tested or verified. On amd64, the assembler (binutils 2.13.2) does not support thread-local access relocations in 64-bit mode. When generating 32-bit (ILP32) code on amd64 the assembler does support TLS, but this has no practical value.
Below are typical TLS access sequences, both static and dynamic, for the
platforms that do support TLS. The C code from which the access
sequences are generated is:
int __thread i = 3;
int x() { return i; }
i386, static TLS:
movl %gs:0, %eax
movl i@NTPOFF(%eax), %eax
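The static i386 sequence relies on the TLS variant II layout: the thread pointer points at the thread control block (TCB), whose first word is a pointer to itself (hence movl %gs:0, %eax), and the executable's TLS block sits immediately below the TCB, so i@NTPOFF is a negative offset. The sketch below simulates that arithmetic in plain C. The buffer, I386_TLS_SIZE and the i386_* names are hypothetical, and the block size is assumed to be already rounded up to its alignment.

```c
#include <stddef.h>
#include <string.h>

#define I386_TLS_SIZE 16  /* hypothetical block size, already rounded to its alignment */

/* Variant II layout: [ TLS block | TCB ]. The TCB's first word points to
   itself; on i386, %gs:0 reads that word. */
static unsigned char i386_area[I386_TLS_SIZE + sizeof(void *)];

static unsigned char *i386_tcb(void) { return i386_area + I386_TLS_SIZE; }

/* NTPOFF: offset of a variable relative to the thread pointer (negative). */
static ptrdiff_t i386_ntpoff(size_t off_in_block)
{
    return (ptrdiff_t)off_in_block - (ptrdiff_t)I386_TLS_SIZE;
}

/* Store the TCB self-pointer and place 'i' at offset 0 of the block. */
void i386_init_thread(int initial_i)
{
    unsigned char *tp = i386_tcb();
    memcpy(tp, &tp, sizeof tp);
    memcpy(i386_area, &initial_i, sizeof initial_i);
}

int i386_read_i(void)
{
    unsigned char *eax;
    int value;
    memcpy(&eax, i386_tcb(), sizeof eax);                 /* movl %gs:0, %eax */
    memcpy(&value, eax + i386_ntpoff(0), sizeof value);   /* movl i@NTPOFF(%eax), %eax */
    return value;
}
```

The self-pointer is what lets the sequence get a usable linear address for the TCB with a single segment-relative load.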
i386, dynamic TLS:
pushl %ebx
call .L2
.L2:
popl %ebx
addl $_GLOBAL_OFFSET_TABLE_+[.-.L2], %ebx
leal i@TLSGD(,%ebx,1), %eax
call ___tls_get_addr@PLT
movl (%eax), %eax
popl %ebx
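In the dynamic i386 sequence, i@TLSGD names a pair of consecutive GOT entries that the runtime linker fills through the R_386_TLS_DTPMOD32 and R_386_TLS_DTPOFF32 relocations; leal passes the address of that pair in %eax to ___tls_get_addr (the GNU variant with three underscores, which takes its argument in a register). The sketch below models the pair and the lookup in C for one thread. struct sim_tlsgd_pair, the fixed-size gd_dtv and the gd_* functions are hypothetical stand-ins for the runtime linker and the threads library.

```c
#include <stdint.h>
#include <stdlib.h>

/* GOT pair referenced by i@TLSGD: two consecutive 32-bit entries. */
struct sim_tlsgd_pair {
    uint32_t dtpmod;  /* set via R_386_TLS_DTPMOD32: module ID */
    uint32_t dtpoff;  /* set via R_386_TLS_DTPOFF32: offset within the module's block */
};

#define GD_DTV_SLOTS 4  /* hypothetical fixed DTV size */
static char *gd_dtv[GD_DTV_SLOTS];  /* one thread's DTV; slot 0 unused */

/* Stand-in for installing a module's TLS block for the current thread. */
void gd_set_block(uint32_t module, char *block)
{
    if (module == 0 || module >= GD_DTV_SLOTS)
        abort();
    gd_dtv[module] = block;
}

/* Stand-in for the runtime linker processing the two relocations. */
void gd_relocate(struct sim_tlsgd_pair *got, uint32_t module, uint32_t offset)
{
    got->dtpmod = module;
    got->dtpoff = offset;
}

/* Stand-in for ___tls_get_addr; the real one receives 'got' in %eax. */
void *gd_tls_get_addr(const struct sim_tlsgd_pair *got)
{
    if (got->dtpmod == 0 || got->dtpmod >= GD_DTV_SLOTS || gd_dtv[got->dtpmod] == NULL)
        abort();
    return gd_dtv[got->dtpmod] + got->dtpoff;
}
```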
ia64, static TLS:
addl r14 = @tprel(i), tp
;;
ld4 r8 = [r14]
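ia64 uses TLS variant I instead: the thread pointer (r13, written tp in the sequence) points at a 16-byte TCB made of two 8-byte words, the first being the DTV pointer, and the executable's TLS block follows it. @tprel(i) is therefore a positive offset and a single addl suffices. The sketch below simulates that arithmetic; the buffer, IA64_TLS_SIZE and the ia64_* names are hypothetical, and the block's alignment is assumed not to exceed 16 bytes so that no padding follows the TCB.

```c
#include <stddef.h>
#include <string.h>

#define IA64_TCB_SIZE 16  /* TCB: DTV pointer plus one reserved 8-byte word */
#define IA64_TLS_SIZE 16  /* hypothetical size of the executable's TLS block */

/* Variant I layout: [ TCB | TLS block ]; the thread pointer points at the TCB. */
static unsigned char ia64_area[IA64_TCB_SIZE + IA64_TLS_SIZE];

/* @tprel: positive offset from the thread pointer (alignment <= 16 assumed). */
static size_t ia64_tprel(size_t off_in_block)
{
    return IA64_TCB_SIZE + off_in_block;
}

/* Place 'i' at offset 0 of the TLS block. */
void ia64_init(int initial_i)
{
    memcpy(ia64_area + ia64_tprel(0), &initial_i, sizeof initial_i);
}

int ia64_read_i(void)
{
    unsigned char *tp = ia64_area;             /* r13 */
    unsigned char *r14 = tp + ia64_tprel(0);   /* addl r14 = @tprel(i), tp */
    int r8;
    memcpy(&r8, r14, sizeof r8);               /* ld4 r8 = [r14] */
    return r8;
}
```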
ia64, dynamic TLS:
addl r14 = @ltoff(@dtpmod(i)), gp
addl r15 = @ltoff(@dtprel(i)), gp
;;
ld8 out0 = [r14]
ld8 out1 = [r15]
br.call.sptk b0 = __tls_get_addr
ld4 r8 = [r8]
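Unlike i386, the ia64 dynamic sequence loads the module ID and the offset from two separate GOT entries (@dtpmod and @dtprel) and passes them to __tls_get_addr as two arguments in out0 and out1; the result comes back in r8. The sketch below mirrors the sequence step by step for a single thread. The fixed-size ia64_dtv and the ia64_* functions are hypothetical, and the two pointer arguments stand in for the GOT entries.

```c
#include <stddef.h>
#include <stdlib.h>

#define IA64_DTV_SLOTS 4  /* hypothetical fixed DTV size */
static char *ia64_dtv[IA64_DTV_SLOTS];  /* one thread's DTV; slot 0 unused */

void ia64_set_block(size_t module, char *block)
{
    if (module == 0 || module >= IA64_DTV_SLOTS)
        abort();
    ia64_dtv[module] = block;
}

/* Two-argument form used on ia64: module ID and offset passed separately. */
void *ia64_tls_get_addr(size_t module, size_t offset)
{
    if (module == 0 || module >= IA64_DTV_SLOTS || ia64_dtv[module] == NULL)
        abort();
    return ia64_dtv[module] + offset;
}

/* Mirrors the dynamic access sequence for 'i'. */
int ia64_read_dyn(const size_t *got_dtpmod, const size_t *got_dtprel)
{
    size_t out0 = *got_dtpmod;                       /* ld8 out0 = [r14] */
    size_t out1 = *got_dtprel;                       /* ld8 out1 = [r15] */
    const int *r8 = ia64_tls_get_addr(out0, out1);   /* br.call.sptk b0 = __tls_get_addr */
    return *r8;                                      /* ld4 r8 = [r8] */
}
```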