= Intro to MOP2 programming
Kamil Kowalczyk
2025-10-14
:jbake-type: post
:jbake-tags: MOP2 osdev
:jbake-status: published

This post is an introduction to user application programming for MOP2 (my-os-project2).

All source code (kernel, userspace and other files) is available at https://git.kamkow1lair.pl/kamkow1/my-os-project2.

Let's start by doing the most basic thing ever: quitting an application.

== AMD64 assembly

.Quitting an application in AMD64 assembly
[source,asm]
----
.section .text

.global _start
_start: // our application's entry point
  movq $17, %rax    // select proc_kill() syscall
  movq $-1, %rdi    // -1 means "self", so we don't need to call proc_getpid()
  int $0x80         // perform the syscall
  // We are dead!!
----

As you can see, even though we're on AMD64, we use `int $0x80` to perform a syscall.

The technically correct and better way would be to implement support for `syscall/sysret`, but `int $0x80` is
just easier to get going and requires way less setup. Maybe in the future the ABI will move towards
`syscall/sysret`.

`int $0x80` is not ideal, because it's a software interrupt, and every software interrupt goes through the full
interrupt entry and return path, which is slower than a dedicated syscall instruction. Intel tried to solve this
earlier with `sysenter/sysexit`, but they've fallen out of fashion due to complexity.

For the purposes of a silly hobby OS project, `int $0x80` is completely fine. We don't need the world's best
performance (yet ;) ).
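
To make the register convention concrete, here is a rough sketch of how a C wrapper around `int $0x80` could
look. It is not taken from `ulib`: `syscall1()`, `kill_self()`, the `MOP2_TARGET` macro and the idea that the
result comes back in `%rax` are all assumptions made for illustration. Only the syscall number and the use of
`%rax` and `%rdi` come from the listing above. When built for any other target, the wrapper just records the
request, so it can be tried out on a regular desktop OS.

[source,c]
----
#include <stdint.h>

// Syscall number taken from the assembly listing above.
enum { SYS_PROC_KILL = 17 };

// What the off-target stub saw last, so the wrapper can be inspected on a host OS.
static uint64_t last_num;
static uint64_t last_arg;

// Hypothetical one-argument syscall wrapper (not part of ulib).
int64_t syscall1(uint64_t num, uint64_t arg) {
#if defined(MOP2_TARGET) /* hypothetical macro, set only when building for MOP2 */
  int64_t ret;
  __asm__ volatile("int $0x80"
                   : "=a"(ret)          /* assumption: result is returned in %rax */
                   : "a"(num), "D"(arg) /* number in %rax, argument in %rdi */
                   : "memory");
  return ret;
#else
  // Host build: remember the request instead of trapping into a kernel.
  last_num = num;
  last_arg = arg;
  return 0;
#endif
}

// Same request as the assembly version: proc_kill() with -1 ("self").
int64_t kill_self(void) {
  return syscall1(SYS_PROC_KILL, (uint64_t)-1);
}
----

The `"a"` and `"D"` constraints pin the values to `%rax` and `%rdi`, which is exactly what the two `movq`
instructions do by hand.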

=== "Hello world" and the `debugprint()` syscall

Now that we have our first application, which can quit at a blazingly fast speed, let's try to print something.
For now, we're not going to discuss IPC and pipes, because that's a little complex.

The `debugprint()` syscall was the first syscall ever added (it even has an ID of 1), and it was used for
printing long before pipes were added to the kernel. It's still useful for debugging, when we literally just
want to print a string without going through the entire pipeline of printf-style formatting and only then
writing something to a pipe.

.Usage of `debugprint()` in AMD64 assembly
[source,asm]
----
.section .data

STRING:
  .string "Hello world!!!"
STRING_LEN:
  .quad . - STRING

.section .text

.global _start
_start:
  movq $1, %rax     // select debugprint()
  lea STRING(%rip),     %rdi    // load the address of STRING
  lea STRING_LEN(%rip), %rsi    // load the address of STRING_LEN
  int $0x80

  // quit
  movq $17, %rax
  movq $-1, %rdi
  int $0x80
----

Why are we using `lea` to load stuff? Why not `movq`? Because we can't...

We can't just `movq`, because the kernel doesn't support relocatable code: everything is loaded at a fixed
address in a process's address space. Because of this, we address everything relative to `%rip`
(the instruction pointer), so each reference is encoded as an offset from the instruction pointer. We're
essentially writing position-independent code (PIC) by hand. This is the kind of code GCC generates with the
`-fPIC` flag, BTW.
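
One more detail of the `debugprint()` listing is worth spelling out: `.string` appends a terminating NUL byte,
so `. - STRING` counts 15 bytes for the 14 visible characters. The same difference shows up in C between
`sizeof` and `strlen()`. The snippet below is only a host-side illustration; the `hello` array and the
`hello_size()` and `hello_len()` helpers are hypothetical names, not part of `ulib`.

[source,c]
----
#include <stddef.h>
#include <string.h>

// Same bytes as STRING in the assembly listing, including the trailing NUL.
static const char hello[] = "Hello world!!!";

// Counterpart of `. - STRING`: size of the whole array, NUL included.
size_t hello_size(void) { return sizeof(hello); }

// Number of visible characters, NUL excluded.
size_t hello_len(void) { return strlen(hello); }
----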

== Getting into C and some bits of `ulib`

Now that we've gone over how to write some (very) basic programs in assembly, let's untangle how we get
into C code and look at some portions of `ulib`, the userspace programming library.

This code snippet should be understandable by now:

._start.S
[source,asm]
----
.extern _premain

.global _start
_start:
  call _premain
----

Here `_premain()` is a C startup function that gets executed before running `main()`. `_premain()` is also
responsible for quitting the application.

._premain.c
[source,c]
----
// Headers skipped.

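extern void main(void);
// Bounds of the .bss section, defined by the linker script.
extern uint8_t _bss_start[];
extern uint8_t _bss_end[];

// Zero every byte between _bss_start and _bss_end.
void clearbss(void) {
  uint8_t *p = _bss_start;
  while (p < _bss_end) {
    *p++ = 0;
  }
}

#define MAX_ARGS 25
// One preallocated buffer per possible argument.
static char *_args[MAX_ARGS];

// Argument count, written by proc_argv() in _premain().
size_t _argslen;

char **args(void) {
  return (char **)_args;
}

size_t argslen(void) {
  return _argslen;
}

// ulib initialization goes here
void _premain(void) {
  clearbss();

  // Allocate a PROC_ARG_MAX-byte buffer for each argument slot.
  for (size_t i = 0; i < ARRLEN(_args); i++) {
    _args[i] = umalloc(PROC_ARG_MAX);
  }

  // Ask the kernel to fill the preallocated buffers with the arguments.
  proc_argv(-1, &_argslen, _args, MAX_ARGS);

  main();
  // main() has returned: kill our own process.
  proc_kill(proc_getpid());
}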
----

First, in order to load our C application without UB from the get-go, we need to clear the `.bss` section of the
ELF file (ELF is the executable format MOP2 uses). C code assumes that uninitialized globals start out zeroed.
We use the `_bss_start` and `_bss_end` symbols for that, which come from a linker script defined for user apps:

.link.ld - linker script for user apps
[source]
----
ENTRY(_start)

SECTIONS {
  . = 0x400000;

  .text ALIGN(4K):
  {
    *(.text .text*)
  }
  
  .rodata (READONLY): ALIGN(4K)
  {
    *(.rodata .rodata*)
  }

  .data ALIGN(4K):
  {
    *(.data .data*)
  }

  .bss ALIGN(4K):
  {
    _bss_start = .;
    *(.bss .bss*)
    . = ALIGN(4K);
    _bss_end = .;
  }
}
----
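
The `ALIGN(4K)` expressions round an address up to the next 4096-byte boundary. Because `_bss_end` is taken
after `. = ALIGN(4K)`, `clearbss()` zeroes everything up to the end of the last `.bss` page. The arithmetic can
be sketched in C; `align_up()` is a hypothetical helper written for this post, not something from `ulib` or the
linker.

[source,c]
----
#include <stdint.h>

// Round addr up to the next multiple of align (align must be a power of two).
uint64_t align_up(uint64_t addr, uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}
----

For a made-up address such as `0x401234`, `align_up(0x401234, 4096)` gives `0x402000`, while an already aligned
`0x402000` stays as it is.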

After that, we need to collect our application's command-line arguments (like `argc` and `argv` in UNIX-derived
systems). To do that we use the `proc_argv()` syscall, which fills a preallocated memory buffer with the
arguments. The main limitation of this approach is that the caller must ensure that enough space within the
buffer was allocated. 25 arguments is enough for pretty much all applications on this system, but this may
become a little problematic in the future.
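
The caller-allocates pattern and its limit can be tried out on a host with a stand-in for the syscall.
Everything in this sketch is hypothetical: `fake_proc_argv()` only mimics the shape of the `proc_argv()` call
above, `ARG_MAX_BYTES` is a placeholder for `PROC_ARG_MAX` (whose real value isn't shown here), `malloc()`
replaces `umalloc()`, and silently dropping extra arguments is an assumption about the behavior, not a
documented fact.

[source,c]
----
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 25
#define ARG_MAX_BYTES 128 /* placeholder for PROC_ARG_MAX */

// Stand-in for proc_argv(): copies up to maxargs strings into caller-owned buffers.
void fake_proc_argv(const char *const *src, size_t srclen,
                    size_t *outlen, char **bufs, size_t maxargs) {
  size_t n = srclen < maxargs ? srclen : maxargs; // extra arguments are dropped
  for (size_t i = 0; i < n; i++) {
    strncpy(bufs[i], src[i], ARG_MAX_BYTES - 1);
    bufs[i][ARG_MAX_BYTES - 1] = '\0'; // overlong arguments are truncated
  }
  *outlen = n;
}

// Mirrors _premain(): preallocate every slot, then let the callee fill them.
size_t collect_args(const char *const *src, size_t srclen, char **bufs) {
  size_t len = 0;
  for (size_t i = 0; i < MAX_ARGS; i++) {
    bufs[i] = malloc(ARG_MAX_BYTES);
    if (bufs[i] == NULL) {
      abort(); // keep the sketch simple: no recovery on allocation failure
    }
  }
  fake_proc_argv(src, srclen, &len, bufs, MAX_ARGS);
  return len;
}
----

With 30 source strings, `collect_args()` reports 25: the callee can't hand back more than the caller made room
for, which is the limitation described above.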

After `main()` returns, we just gracefully exit the application by calling `proc_kill()`.