= Intro to MOP2 programming
Kamil Kowalczyk
2025-10-14
:jbake-type: post
:jbake-tags: MOP2 osdev
:jbake-status: published

This is an introductory post on MOP2 (my-os-project2) user application programming.

All source code (kernel, userspace and other files) is available at https://git.kamkow1lair.pl/kamkow1/my-os-project2.

Let's start by doing the most basic thing ever: quitting an application.

== AMD64 assembly

.Hello program in AMD64 assembly
[source,asm]
----
.section .text

.global _start
_start: // our application's entry point
    movq $17, %rax // select proc_kill() syscall
    movq $-1, %rdi // -1 means "self", so we don't need to call proc_getpid()
    int $0x80 // perform the syscall
    // We are dead!!
----

As you can see, even though we're on AMD64, we use `int $0x80` to perform a syscall.

The technically correct and better way would be to implement support for `syscall/sysret`, but `int $0x80` is
just easier to get going and requires way less setup. Maybe in the future the ABI will move towards
`syscall/sysret`.

`int $0x80` is not ideal, because it's a software interrupt, and these come with a lot of interrupt overhead.
Intel tried to solve this earlier with `sysenter/sysexit`, but those have fallen out of fashion due to their
complexity.

For the purposes of a silly hobby OS project, `int $0x80` is completely fine. We don't need to have the world's
best performance (yet ;) ).

=== "Hello world" and the `debugprint()` syscall

Now that we have our first application, which can quit at a blazingly fast speed, let's try to print something.
For now, we're not going to discuss IPC and pipes, because that's a little complex.

The `debugprint()` syscall came about as the first syscall ever (it even has an ID of 1) and it was used for
printing way before pipes were added into the kernel. It's still useful for debugging purposes, when we want to
literally just print a string and not go through the entire pipeline of printf-style formatting and only then
writing something to a pipe.

.Usage of `debugprint()` in AMD64 assembly
[source,asm]
----
.section .data

STRING:
    .string "Hello world!!!"

.section .text

.global _start
_start:
    movq $1, %rax // select debugprint()
    lea STRING(%rip), %rdi // load the address of STRING
    int $0x80

    // quit
    movq $17, %rax
    movq $-1, %rdi
    int $0x80
----

Why are we using `lea` to load stuff? Why not `movq`? Because we can't...

We can't just `movq`, because the kernel doesn't support relocatable code - everything is loaded at a fixed
address in a process' address space. Because of this, we have to address everything relative to `%rip`
(the instruction pointer). We're essentially writing position-independent code (PIC) by hand. This is what
the `-fPIC` GCC flag does, BTW.

== Getting into C and some bits of `ulib`

Now that we've gone over how to write some (very) basic programs in assembly, let's try to untangle how we get
into C code and understand some portions of `ulib` - the userspace programming library.

This code snippet should be understandable by now:

._start.S
[source,asm]
----
.extern _premain

.global _start
_start:
    call _premain
----

Here `_premain()` is a C startup function that gets executed before running `main()`. `_premain()` is also
responsible for quitting the application.

._premain.c
[source,c]
----
// Headers skipped.

extern void main(void);
extern uint8_t _bss_start[];
extern uint8_t _bss_end[];

void clearbss(void) {
    uint8_t *p = _bss_start;
    while (p < _bss_end) {
        *p++ = 0;
    }
}

#define MAX_ARGS 25
static char *_args[MAX_ARGS];

size_t _argslen;

char **args(void) {
    return (char **)_args;
}

size_t argslen(void) {
    return _argslen;
}

// ulib initialization goes here
void _premain(void) {
    clearbss();

    for (size_t i = 0; i < ARRLEN(_args); i++) {
        _args[i] = umalloc(PROC_ARG_MAX);
    }

    proc_argv(-1, &_argslen, _args, MAX_ARGS);

    main();
    proc_kill(proc_getpid());
}
----

First, in order to load our C application without UB from the get-go, we need to clear the `BSS` section of an
ELF file (which MOP2 uses as its executable format). We use the `_bss_start` and `_bss_end` symbols for that,
which come from a linker script defined for user apps:

.link.ld - linker script for user apps
[source]
----
ENTRY(_start)

SECTIONS {
    . = 0x400000;

    .text ALIGN(4K):
    {
        *(.text .text*)
    }

    .rodata (READONLY): ALIGN(4K)
    {
        *(.rodata .rodata*)
    }

    .data ALIGN(4K):
    {
        *(.data .data*)
    }

    .bss ALIGN(4K):
    {
        _bss_start = .;
        *(.bss .bss*)
        . = ALIGN(4K);
        _bss_end = .;
    }
}
----

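To see what the `clearbss()` loop does without booting MOP2, here is a minimal hosted sketch. It assumes a plain
array standing in for the `.bss` region; `fake_bss`, `clear_range()` and `fake_bss_is_zero()` are made-up names
for illustration and are not part of `ulib`. In a real user app the two bounds would be the linker script symbols
`_bss_start` and `_bss_end`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the .bss region. On MOP2 the bounds would be the linker
 * symbols _bss_start and _bss_end instead of this array (assumption). */
uint8_t fake_bss[64];

/* Same loop as clearbss(), but with the bounds passed in explicitly. */
void clear_range(uint8_t *start, uint8_t *end) {
    assert(start <= end);
    for (uint8_t *p = start; p < end; p++) {
        *p = 0;
    }
}

/* Returns 1 if every byte of fake_bss is zero, 0 otherwise. */
int fake_bss_is_zero(void) {
    for (size_t i = 0; i < sizeof(fake_bss); i++) {
        if (fake_bss[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling `clear_range(fake_bss, fake_bss + sizeof(fake_bss))` after scribbling into the array brings it back to
all zeroes, which is the state C code expects for uninitialized globals.
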
After that, we need to collect our application's commandline arguments (like argc and argv in UNIX-derived
systems). To do that we use the `proc_argv()` syscall, which fills a preallocated memory buffer with the
arguments. The main limitation of this approach is that the caller must ensure that enough space within the
buffer was allocated. 25 arguments is enough for pretty much all applications on this system, but this is
something that may be a little problematic in the future.

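The caller-allocates contract described above can be sketched on a hosted system. The function below is a
hypothetical stand-in for what filling the argument slots might look like (the real `proc_argv()` implementation
is not shown in this post): it copies at most `maxargs` strings into caller-provided slots of `slot_size` bytes
each and truncates anything longer. `SLOT_SIZE` plays the role of `PROC_ARG_MAX`, whose real value is an
assumption here, and `fill_args()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 16 /* stands in for PROC_ARG_MAX; the real value is assumed */

/*
 * Hypothetical model of filling caller-allocated argument slots.
 * Copies up to maxargs strings from src into dst, each truncated to
 * slot_size - 1 characters plus a terminating NUL.
 * Returns how many arguments were stored; anything beyond maxargs is dropped.
 */
size_t fill_args(const char *const *src, size_t nsrc,
                 char **dst, size_t maxargs, size_t slot_size) {
    assert(slot_size > 0);
    size_t n = nsrc < maxargs ? nsrc : maxargs;
    for (size_t i = 0; i < n; i++) {
        strncpy(dst[i], src[i], slot_size - 1);
        dst[i][slot_size - 1] = '\0';
    }
    return n;
}
```

With two slots and three source strings, the third argument is dropped and an overlong one is cut short. That is
the kind of limit a fixed `MAX_ARGS` table with fixed-size slots implies.
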
After we've exited from `main()`, we just gracefully exit the application.

=== "Hello world" but from C this time

Now we can program our applications the "normal"/"human" way. We've gone over printing in assembly using the
`debugprint()` syscall, so let's now try to use it from C. We'll also try to do some more advanced printing
with (spoiler) `uprintf()`.

.Calling `debugprint()` from C
[source,c]
----
// Import `ulib`
#include <ulib.h>

void main(void) {
    debugprint("hello world");
}
----

That's it! We've just printed "hello world" to the terminal! How awesome is that?

.`uprintf()` and formatted printing
[source,c]
----
#include <ulib.h>

void main(void) {
    uprintf("Hello world %d %s %02X\n", 123, "this is a string literal", 0xBE);
}
----

`uprintf()` is provided by Eyal Rozenberg's (eyalroz) printf library, which originates from Marco Paland's
printf. This printf library is super easily portable and doesn't require much in terms of standard C functions
and headers. My main nitpick, and a dealbreaker, with other libraries was that they advertise themselves as
"freestanding" or "made for embedded" or something along those lines, but in reality they need so much of the
C standard library that you might as well link with musl or glibc and use printf from there. Generally speaking,
this is an issue with quite a few "freestanding" libraries that you can find online ;(.

Printf rant over...

==== Error codes in MOP2 - a small anecdote

You might've noticed that `main()` looks a little different from the standard C `main()`. There's no
return/error code, because MOP2 simply does not implement such a feature. This is because MOP2 doesn't follow
the UNIX philosophy.

The UNIX workflow consists of combining many small/tiny programs into one big commandline, which transforms text
into some more text. For example:

.Example bash command (Linux) to get the name of a /proc/meminfo field
[source,shell]
----
cat /proc/meminfo | awk 'NR==20 {print $1}' | rev | cut -c 2- | rev
----

Personally, I dislike this type of workflow. I prefer to have a few programs that perform tasks grouped by topic,
so e.g. in MOP2 we have `$fs` for working with the filesystem or `$pctl` for working with processes. When we
approach things the MOP2 way, it turns out error codes are kind of useless (or at least they wouldn't get much
use), since we don't need to connect many programs together to get something done.

=== Printing under the hood - intro to pipes

Let's take a look into what calling `uprintf()` actually does to print the characters post formatting. The printf
library requires the user to define a `putchar_()` function, which is used to render a single character.
Personally, I think that this way of printing text is inefficient and it would be better to output an entire
buffer of memory, but oh well.

.putchar.c
[source,c]
----
#include <stdint.h>
#include <system/system.h>
#include <printf/printf.h>

void putchar_(char c) {
    ipc_pipewrite(-1, 0, (uint8_t *const)&c, 1);
}
----

To output a single character, we write it into a pipe. -1 means that the pipe belongs to the calling process, and
0 is an ID into the table of the process' pipes - 0 is precisely the output pipe. In UNIX, the standard streams
are numbered 0 = stdin, 1 = stdout and 2 = stderr. In MOP2 there's no stderr; everything the application outputs
goes into the out pipe (0), so we can just drop stderr entirely. We're left with the stdin/in pipe and the
stdout/out pipe, but I've decided to swap them around, because the out pipe is used more frequently and it made
sense to get it working first and only then worry about getting input.

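One way to cut down on per-character pipe writes would be to batch characters and flush them in a single call.
The sketch below is not how `ulib` does it; it is a hosted illustration in which `sink_write()` stands in for
`ipc_pipewrite(-1, 0, ...)` and merely records what it was given. `OUTBUF_SIZE`, `buffered_putchar()` and
`buffered_flush()` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OUTBUF_SIZE 8 /* deliberately tiny so the flush-on-full path is easy to see */

/* Mock sink: stands in for ipc_pipewrite(-1, 0, buf, len) on MOP2 (assumption). */
char sink_data[256];
size_t sink_len = 0;
size_t sink_calls = 0;

void sink_write(const uint8_t *buf, size_t len) {
    assert(sink_len + len <= sizeof(sink_data));
    memcpy(sink_data + sink_len, buf, len);
    sink_len += len;
    sink_calls++;
}

static uint8_t outbuf[OUTBUF_SIZE];
static size_t outbuf_used = 0;

/* Push everything collected so far to the sink in a single write. */
void buffered_flush(void) {
    if (outbuf_used > 0) {
        sink_write(outbuf, outbuf_used);
        outbuf_used = 0;
    }
}

/* Collect one character; flush when the buffer fills up or a line ends. */
void buffered_putchar(char c) {
    outbuf[outbuf_used++] = (uint8_t)c;
    if (outbuf_used == OUTBUF_SIZE || c == '\n') {
        buffered_flush();
    }
}
```

The trade-off is that output without a trailing newline stays in the buffer until `buffered_flush()` is called,
so a program using something like this would need to flush before it exits.
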
== Pipes

MOP2 pipes are a lot like UNIX pipes - they're a stream of bytes between processes - but there's a slight
difference in the interface. Let's take a look at what ulib defines:

.Definitions for ipc_pipeXXX() calls
[source,c]
----
int32_t ipc_piperead(PID_t pid, uint64_t pipenum, uint8_t *const buffer, size_t len);
int32_t ipc_pipewrite(PID_t pid, uint64_t pipenum, const uint8_t *buffer, size_t len);
int32_t ipc_pipemake(uint64_t pipenum);
int32_t ipc_pipedelete(uint64_t pipenum);
int32_t ipc_pipeconnect(PID_t pid1, uint64_t pipenum1, PID_t pid2, uint64_t pipenum2);
----

In UNIX you have 2 processes working with a single pipe, but in MOP2 a pipe is exposed to the outside world and
anyone can read from and write to it. This explains why these calls require a PID: it identifies the owner of
the pipe.

.Example of ipc_piperead() - reading your application's own input stream
[source,c]
----
#include <stddef.h>
#include <stdint.h>
#include <ulib.h>

void main(void) {
    PID_t pid = proc_getpid();

#define INPUT_LINE_MAX 1024

    for (;;) {
        char buffer[INPUT_LINE_MAX];
        string_memset(buffer, 0, sizeof(buffer));
        int32_t nrd = ipc_piperead(pid, 1, (uint8_t *const)buffer, sizeof(buffer) - 1);
        if (nrd > 0) {
            uprintf("Got something: %s\n", buffer);
        }
    }
}
----

`ipc_pipewrite()` is a little boring, so let's not go over it. Creating, deleting and connecting pipes is where
things get interesting.

A common issue I've encountered while programming in userspace for MOP2 is that I'd want to spawn some external
application and collect its output, e.g. into a ulib `StringBuffer` or some similar structure. Since everything
is polling-based, the obvious thing to do would be to spawn an application, poll its state (until it's
`PROC_DEAD`) and, while polling, read its out pipe (0) and save the data into a `StringBuffer`. The code to do
this would look something like this:

.Pipe lifetime problem illustration
[source,c]
----
#include <stddef.h>
#include <stdint.h>
#include <ulib.h>

void main(void) {
    StringBuffer outsbuf;
    stringbuffer_init(&outsbuf);

    char *appargs[] = { "-saystring", "hello world" };
    int32_t myapp = proc_spawn("base:/bin/myapp", appargs, ARRLEN(appargs));

    proc_run(myapp);

    // 4 == PROC_DEAD
    while (proc_pollstate(myapp) != 4) {
        int32_t r;
        char buf[100];
        string_memset(buf, 0, sizeof(buf));

        r = ipc_piperead(myapp, 0, (uint8_t *const)buf, sizeof(buf) - 1);
        if (r > 0) {
            stringbuffer_appendcstr(&outsbuf, buf);
        }
    }

    // print entire output
    uprintf("%.*s\n", (int)outsbuf.count, outsbuf.data);

    stringbuffer_free(&outsbuf);
}
----

Can you spot the BIG BUG? What if the application dies before we manage to read data from the pipe, taking the
pipe down with itself? We're then stuck in this weird state of having incomplete data and the app being reported
as dead by `proc_pollstate()`.

This can be easily solved by changing the lifetime of the pipe we're working with. The *parent* process shall
allocate a pipe and connect it to its *child* process, so that the child writes into a pipe managed by its
parent.

.Pipe lifetime problem - the solution
[source,c]
----
#include <stddef.h>
#include <stdint.h>
#include <ulib.h>

void main(void) {
    PID_t pid = proc_getpid();

    StringBuffer outsbuf;
    stringbuffer_init(&outsbuf);

    char *appargs[] = { "-saystring", "hello world" };
    int32_t myapp = proc_spawn("base:/bin/myapp", appargs, ARRLEN(appargs));

    // take a free pipe slot. 0 and 1 are already taken by default
    ipc_pipemake(10);
    // connect pipes
    // myapp's out (0) pipe --> pid's 10th pipe
    ipc_pipeconnect(myapp, 0, pid, 10);

    proc_run(myapp);

    // 4 == PROC_DEAD
    while (proc_pollstate(myapp) != 4) {
        int32_t r;
        char buf[100];
        string_memset(buf, 0, sizeof(buf));

        // read from our own pipe 10, which myapp's out pipe now feeds
        r = ipc_piperead(pid, 10, (uint8_t *const)buf, sizeof(buf) - 1);
        if (r > 0) {
            stringbuffer_appendcstr(&outsbuf, buf);
        }
    }

    // print entire output
    uprintf("%.*s\n", (int)outsbuf.count, outsbuf.data);

    ipc_pipedelete(10);

    stringbuffer_free(&outsbuf);
}
----

Now, since the parent is managing the pipe and it outlives the child, everything is safe.

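The difference between the two listings can be modelled on a hosted system. The following is a toy simulation,
not MOP2 kernel code: it assumes that a pipe is destroyed when its owning process dies and that
`ipc_pipeconnect()` makes one process' slot refer to another process' pipe. All `sim_*` names, the `Pipe` and
`Proc` structs and the size constants are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPROCS 2
#define NSLOTS 16
#define PIPE_CAP 64

typedef struct {
    int alive;            /* 0 once the pipe has been torn down */
    int owner;            /* pid of the process that created the pipe */
    char data[PIPE_CAP];
    size_t len;
} Pipe;

typedef struct {
    int alive;
    Pipe *slots[NSLOTS];  /* per-process pipe table; a slot may refer to another process' pipe */
} Proc;

Pipe pipe_pool[NPROCS * NSLOTS];
size_t pipe_pool_used = 0;
Proc procs[NPROCS];

/* Create a pipe owned by pid in the given slot. */
void sim_pipemake(int pid, int slot) {
    assert(pipe_pool_used < sizeof(pipe_pool) / sizeof(pipe_pool[0]));
    Pipe *p = &pipe_pool[pipe_pool_used++];
    p->alive = 1;
    p->owner = pid;
    p->len = 0;
    procs[pid].slots[slot] = p;
}

/* Start a process with its default out pipe in slot 0. */
void sim_spawn(int pid) {
    procs[pid].alive = 1;
    for (int i = 0; i < NSLOTS; i++) {
        procs[pid].slots[i] = NULL;
    }
    sim_pipemake(pid, 0);
}

/* Redirect pid1's slot so that it refers to pid2's pipe. */
void sim_pipeconnect(int pid1, int slot1, int pid2, int slot2) {
    procs[pid1].slots[slot1] = procs[pid2].slots[slot2];
}

/* Append bytes to a pipe; returns the number of bytes written or -1. */
int sim_pipewrite(int pid, int slot, const char *buf, size_t len) {
    Pipe *p = procs[pid].alive ? procs[pid].slots[slot] : NULL;
    if (p == NULL || !p->alive || p->len + len > PIPE_CAP) {
        return -1;
    }
    memcpy(p->data + p->len, buf, len);
    p->len += len;
    return (int)len;
}

/* Drain up to cap bytes; returns the count or -1 if the process or pipe is gone. */
int sim_piperead(int pid, int slot, char *buf, size_t cap) {
    Pipe *p = procs[pid].alive ? procs[pid].slots[slot] : NULL;
    if (p == NULL || !p->alive) {
        return -1;
    }
    size_t n = p->len < cap ? p->len : cap;
    memcpy(buf, p->data, n);
    memmove(p->data, p->data + n, p->len - n);
    p->len -= n;
    return (int)n;
}

/* Kill a process: pipes it owns die with it, borrowed pipes are left alone. */
void sim_kill(int pid) {
    for (int i = 0; i < NSLOTS; i++) {
        Pipe *p = procs[pid].slots[i];
        if (p != NULL && p->owner == pid) {
            p->alive = 0;
        }
        procs[pid].slots[i] = NULL;
    }
    procs[pid].alive = 0;
}
```

In this model, data the child writes into its own pipe is lost once the child is killed, while data written
through a slot connected to the parent's pipe 10 can still be read by the parent afterwards. It also suggests
reading the pipe one more time after the polling loop ends, in case the child wrote something just before dying.
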