Learning linkers

In this lab, you will gain hands-on experience with relocations, how linkers resolve them, as well as get some knowledge about static / dynamic linking. Navigate to labs/03-linkers to see the examples we've prepared for you.

Definitions and declarations

Declaration in C introduces identifier and describes its type, whether it is a type, object or a function.

Definition in C instantiates / implement the identifier. It is what linker needs in order to make references to those entities.

Take a look at following declarations:

extern int bar;
extern int mul(int a, int b);
double sum(int a, double b);
struct foo;

Take a look at main.c and fact.c provided.

int main() {
  unsigned f = fact(5);
  printf("%u\n", f);
  return 0;
}
unsigned fact(unsigned x) {
  if (x < 2)
    return 1;

  return x * fact(x - 1);
}

First, let's from here use only RISC-V toolchain:

source /opt/sc-dt/env.sh # NOTE: if you are using something other than bash, this might not work. If so, try the old fashioned path export
export CC=/opt/sc-dt/riscv-gcc/bin/
export PATH=${PATH}:/opt/sc-dt/riscv-gcc/bin
export PATH=${PATH}:/opt/sc-dt/tools/bin # For QEMU

main here does not know that fact function exists. If we try to compile main to the executable make exec, we will get following error:

/opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/../../../../riscv64-unknown-linux-gnu/bin/ld: /tmp/ccqIh9oC.o: in function `main':
main.c:(.text+0xa): undefined reference to `fact'
collect2: error: ld returned 1 exit status

Linker failed to find definition for the definition for fact function.

Task 3.1

Use readelf and file utilities to investigate main.o file and its contents and answer following questions:

Format for the following assignment: answer the questions in markdown file.

  • What is the type of the file?
  • How many sections are there?
  • List all entries in the same format readelf prints it
  • Why does entries for print and fact functions have NOTYPE type?
  • Modify the following example so executable is produced correctly.

Relocations

Let's take a look at objdump output:

riscv64-unknown-linux-gnu-objdump -d main.o

We will notice that we have address of factorial function is all zeroes:

    e:	000080e7          	jalr	ra # a <main+0xa>

Now compile both files and link into a single executable and look at the call address:

make bin
riscv64-unknown-linux-gnu-objdump -d fact | grep fact

You will see that fact now has been assigned an address and main nows how to call it:

fact:     file format elf64-littleriscv
   105fc:	028000ef          	jal	ra,10624 <fact>
0000000000010624 <fact>:
   1063c:	00e7e463          	bltu	a5,a4,10644 <fact+0x20>
   10642:	a839                	j	10660 <fact+0x3c>
   1064e:	fd7ff0ef          	jal	ra,10624 <fact>

The linker managed to find fact function and insert the correct address for it. It used relocations to do it.

riscv64-unknown-linux-gnu-readelf -r main.o

Possible output:

Relocation section '.rela.text' at offset 0x268 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000c00000012 R_RISCV_CALL      0000000000000000 fact + 0
00000000000a  000000000033 R_RISCV_RELAX                        0
00000000001e  00080000001a R_RISCV_HI20      0000000000000000 .LC0 + 0
00000000001e  000000000033 R_RISCV_RELAX                        0
000000000022  00080000001b R_RISCV_LO12_I    0000000000000000 .LC0 + 0
000000000022  000000000033 R_RISCV_RELAX                        0
000000000026  000d00000012 R_RISCV_CALL      0000000000000000 printf + 0
000000000026  000000000033 R_RISCV_RELAX                        0

From the output we see that both fact and printf names calls have their relocations. These relocations are provided by compiler to asssist linker in resolving symbols.

Static libraries

The following command:

riscv64-unknown-linux-gnu-gcc main.c fact.c  -o fact

compiles program to executable. But no linker here is invoked? Or is it?

Pass --verbose flag to dive deeper into what gcc actually calls under the hood.

Find collect2 call line:

/opt/sc-dt/riscv-gcc/bin/../libexec/gcc/riscv64-unknown-linux-gnu/12.2.1/collect2 -plugin /opt/sc-dt/riscv-gcc/bin/../libexec/gcc/riscv64-unknown-linux-gnu/12.2.1/liblto_plugin.so -plugin-opt=/opt/sc-dt/riscv-gcc/bin/../libexec/gcc/riscv64-unknown-linux-gnu/12.2.1/lto-wrapper -plugin-opt=-fresolution=/tmp/cc1yCRZY.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/opt/sc-dt/riscv-gcc/bin/../sysroot --eh-frame-hdr -melf64lriscv -dynamic-linker /lib/ld-linux-riscv64-lp64d.so.1 -o fact /opt/sc-dt/riscv-gcc/bin/../sysroot/usr/lib64/lp64d/crt1.o /opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/crti.o /opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/crtbegin.o -L/opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1 -L/opt/sc-dt/riscv-gcc/bin/../lib/gcc -L/opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/../../../../riscv64-unknown-linux-gnu/lib/../lib64/lp64d -L/opt/sc-dt/riscv-gcc/bin/../sysroot/lib/../lib64/lp64d -L/opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/../../../../riscv64-unknown-linux-gnu/lib -L/opt/sc-dt/riscv-gcc/bin/../sysroot/lib64/lp64d -L/opt/sc-dt/riscv-gcc/bin/../sysroot/usr/lib64/lp64d -L/opt/sc-dt/riscv-gcc/bin/../sysroot/lib /tmp/ccOEmH9J.o /tmp/cc6v5KHP.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/crtend.o /opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/crtn.o

collect2 is the actual command called in the process of linking.

Try examining every argument and describe what it is responsible for.

Mostly all of the arguments are paths to libraries.

Static libraries are embedded to applications code directly.

Let's create or own little static library:

riscv64-unknown-linux-gnu-ar cr libfact.o fact.o

Use nm utility to get the list of symbols available in the archive:

riscv64-unknown-linux-gnu-nm libfact.o
fact.o:
0000000000010294 r __abi_tag
0000000000012040 B __BSS_END__
0000000000012038 B __bss_start
0000000000012038 b completed.0
0000000000012000 D __DATA_BEGIN__
0000000000012000 D __data_start
0000000000012000 W data_start
000000000001048a t deregister_tm_clones
00000000000104d0 t __do_global_dtors_aux
0000000000011e18 d __do_global_dtors_aux_fini_array_entry
0000000000012030 D __dso_handle
0000000000011e20 d _DYNAMIC
0000000000012038 D _edata
0000000000012040 B _end
0000000000010524 T fact
00000000000104f2 t frame_dummy
0000000000011e10 d __frame_dummy_init_array_entry
0000000000010608 r __FRAME_END__
0000000000012020 d _GLOBAL_OFFSET_TABLE_
0000000000012800 A __global_pointer$
00000000000105cc r __GNU_EH_FRAME_HDR
0000000000011e18 d __init_array_end
0000000000011e10 d __init_array_start
0000000000012028 D _IO_stdin_used
00000000000105c2 T __libc_csu_fini
000000000001056a T __libc_csu_init
                 U __libc_start_main@GLIBC_2.27
000000000001047e t load_gp
00000000000104f4 T main
                 U printf@GLIBC_2.27
0000000000010410 t _PROCEDURE_LINKAGE_TABLE_
00000000000104a8 t register_tm_clones
0000000000012028 D __SDATA_BEGIN__
0000000000010450 T _start
0000000000012000 D __TMC_END__

To link with your static or dynamic library, pass -llib argument to compilation flags. lib is the name of library.

Note that linking directly with ld is strongly discouraged, instead, use gcc or clang driver and pass additional options to linker if needed.

Task 3.2

  • Create a separare directory with files for your static library
  • Write Makefile target which creates static library
  • Use nm to find out what
  • Write Makefile target which links
  • Create your own static library for RISC-V. It would be even better if application was useful, for instance, a custom C logging library.

Dynamic linking

Static linking is portable, because all library code is embedded in the application and no platform support required. But this makes application's code size increase dramatically.

The solution to this problem is dynamic libraries

Let's create our little dynamic library and link our application against it:

CFLAGS=-fPIC make fact
riscv64-unknown-linux-gnu-gcc -shared fact.o -o libfact.so
$ file libfact.so
libfact.so: ELF 64-bit LSB shared object, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, not stripped

Now link your library against libfact.so:

riscv64-unknown-linux-gnu-gcc main.o -o fact -lfact
/opt/sc-dt/riscv-gcc/bin/../lib/gcc/riscv64-unknown-linux-gnu/12.2.1/../../../../riscv64-unknown-linux-gnu/bin/ld: cannot find -lfact: No such file or directory
collect2: error: ld returned 1 exit status

This happened because our libfact.so is in the current working directory, and linker does not now it should look here.

You can pass paths where linker should search for the libraries with -L option:

riscv64-unknown-linux-gnu-gcc main.o -o fact -L. -lfact

We told the linker to search for libfact inside our current working directory.

file fact
fact: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0, not stripped

Now let's run it with qemu:

❯ qemu-riscv64 ./fact
./fact: error while loading2 shared libraries: libfact.so: cannot open shared object file: No such file or directory

What is wrong? We linker the library, didn't we?

The reason is that though we specified where to look for dynamic library, we didn't put that information in the binary.

Let's do it using rpath:

riscv64-unknown-linux-gnu-gcc main.o -o fact -L. -lfact -Wl,-rpath,.

Now let's try again:

qemu-riscv64  ./fact
./fact: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

Now qemu failed to find the C standard library. We already know how to fix it, let's pass path to glibc:

❯ qemu-riscv64 -L . -L /opt/sc-dt/riscv-gcc/sysroot/ ./fact
120

Our factorial finally works, and we learned to create dynamic libraries.

Task 3.3

  • Create your own little dynamic library. First do it with x86 toolchain, then for RISCV.
  • Link application with the library and make it run on QEMU and on LicheePi (when available).