# The making of aboba (this website)

In this article I'd like to walk you through the internals of this website:
how the code is structured and some cool tricks that are used throughout the project.

## Our "engine"

![the engine](/etc/tmoa-engine.jpg)

This image is a joke, obviously.

The "engine" here is the web server that we're going to be using. I've decided to pick mongoose (-> [their page](https://mongoose.ws)),
partly because I didn't know about/couldn't find other solutions, but I'm really happy with my pick. The only "downside" here is that
mongoose is not HTTP-specific: it also has WebSockets, MQTT, SNTP and even RPC. While that's really cool, I only need HTTP and not
much else. I haven't dug deeper into mongoose, but it'd be cool if they provided some \`#ifdef\`s to just disable these protocols
(i.e. strip out the code that implements them). That way I could make mongoose even more lightweight and only use the features that I need.

Here's roughly how we work with mongoose. Refer to their documentation for more context.

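Let's start with \`main()\`:
\`\`\`
#include <signal.h>
#include "mongoose.h"

// sig_atomic_t is the type that is safe to write from a signal handler
volatile sig_atomic_t alive = 1;

void graceful_shutdown(int no) { (void)no; alive = 0; }

int main(int argc, char ** argv)
{
    signal(SIGINT, &graceful_shutdown);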

    // skip BS

    mg_log_set(MG_LL_DEBUG);
    struct mg_mgr mgr;
    mg_mgr_init(&mgr);

    // skip BS

    mg_wakeup_init(&mgr); // We need this for multithreading
    mg_http_listen(&mgr, CONFIG_LISTEN_URL, &event_handler, NULL);

    while (alive) {
        mg_mgr_poll(&mgr, 1000);
        // skip BS
    }

    mg_mgr_free(&mgr);

    // skip BS

    return 0;
}
\`\`\`

As you can see, it's quite simple to set up mongoose. Here's what the functions used above do:
- \`mg_log_set()\` - sets the log level. \`MG_LL_DEBUG\` is very verbose, but it's good for
when the application breaks and we have no clue why.
- \`struct mg_mgr\` & \`mg_mgr_init()\` - this is the mongoose "manager". The detailed explanation
can be found [here](https://mongoose.ws/documentation/#2-minute-integration-guide), but it
essentially boils down to "the overall state of the web server".
- \`mg_wakeup_init()\` - this is needed to make our application multithreaded. In the docs it says
that it's used to "initialize the *wakeup scheme*". This basically means that we can now talk between
multiple threads using \`mg_wakeup()\`, which is the only thread-safe function provided by mongoose.
- \`mg_mgr_poll()\` - runs one iteration of the event loop: it waits for I/O on all connections and
calls our event handler for whatever happened. The second argument is the maximum time to wait in
milliseconds, so here the loop comes back at least once per second (1000 ms) and gets to re-check \`alive\`.
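
To get a feel for what a poll timeout means, here's a minimal sketch that uses plain POSIX \`poll()\` as a stand-in for \`mg_mgr_poll()\`. The helper \`wait_for_input()\` is made up for this example and is not part of mongoose. The point is only that the call returns as soon as there's something to do, and otherwise gives up once the timeout expires.

```c
#include <poll.h>

// Waits up to timeout_ms for activity on fd.
// Returns 1 if fd is ready (readable, closed by the peer, or in error),
// 0 if the timeout expired first, and -1 if poll() itself failed.
static int wait_for_input(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0) return -1;
    return ready > 0 ? 1 : 0;
}
```

Called on the read end of an empty pipe with a 50 ms timeout, this returns 0 after roughly 50 ms. Once a byte has been written to the other end, it returns 1 right away, no matter how long the timeout is.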

That's all you really need to know to get started with mongoose. Let's get to the \`event_handler()\` now.

\`\`\`
void event_handler(struct mg_connection *conn, int ev, void *ev_data)
{
    if (ev == MG_EV_HTTP_MSG) {
        // Run handler in a new thread
    } else if (ev == MG_EV_WAKEUP) {
        // We've woken up from a handler by mg_wakeup(). Send the reply back to the client
    }
}
\`\`\`

I've removed a lot of code here, because it's irrelevant at this point. This allows us to look at
a simplified picture of the \`event_handler()\` function.

Let's stop to talk about the parameters for a second.

- \`struct mg_connection *conn\` - the structure that describes the incoming connection. We will also
use it to send back our reply.
- \`int ev\` - this is the event enumeration. Basically tells us what event we're currently handling
inside of mongoose's event loop.
- \`void *ev_data\` - additional event data. The value of this parameter differs based on the value of \`int ev\`.
More on that a little bit later.
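
The \`ev\` / \`ev_data\` pair is the usual "tag plus untyped pointer" pattern: the tag tells the handler which type it may cast the pointer to. Here's a small sketch of that idea. The event codes and payload structs are hypothetical stand-ins, not mongoose's real definitions.

```c
#include <stdio.h>

// Hypothetical event codes, standing in for MG_EV_HTTP_MSG and MG_EV_WAKEUP.
enum { EV_HTTP_MSG = 1, EV_WAKEUP = 2 };

// Hypothetical payloads; which one ev_data points to depends on ev.
struct http_msg { const char *uri; };
struct wakeup_msg { const char *reply; };

// Writes a short description of the event into out and returns out.
static const char *describe_event(int ev, void *ev_data, char *out, size_t size)
{
    if (ev == EV_HTTP_MSG) {
        struct http_msg *msg = ev_data;    // cast is valid only for EV_HTTP_MSG
        snprintf(out, size, "request for %s", msg->uri);
    } else if (ev == EV_WAKEUP) {
        struct wakeup_msg *msg = ev_data;  // cast is valid only for EV_WAKEUP
        snprintf(out, size, "reply: %s", msg->reply);
    } else {
        // Unknown event: ev_data is never touched.
        snprintf(out, size, "ignored event %d", ev);
    }
    return out;
}
```

The compiler can't check any of this, so the handler must never read \`ev_data\` before it has looked at \`ev\`.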

What goes on inside the \`MG_EV_HTTP_MSG\` branch?

\`\`\`
if (ev == MG_EV_HTTP_MSG) {
    struct mg_http_message *msg = (struct mg_http_message *)ev_data;

    Route_Thread_Data *data = calloc(1, sizeof(*data));
    data->message = mg_strdup(msg->message);
    data->conn_id = conn->id;
    data->mgr = conn->mgr;
    run_in_thread(&route_thread_function, data);
}
\`\`\`

If we have an "HTTP Message" event incoming, \`ev_data\` is a pointer to \`struct mg_http_message\`.
This structure contains things like the message body, query parameters, the URI, the method and so on. Here we
duplicate the \`message\` field, which encompasses the entire HTTP message, because the thread needs its own copy
that stays valid after the handler returns. We also save the connection id and a reference to the
mongoose manager for later, when we want to wake up the event loop from the thread.
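
The hand-off described above can be sketched without mongoose. This is a toy under stated assumptions, not aboba's code: \`struct job\`, \`struct result\`, \`worker()\` and \`submit()\` are hypothetical names, and a plain pipe plays the role that \`mg_wakeup()\` plays in the real thing.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Everything the worker needs. The worker owns it once the thread starts.
struct job {
    char *request;          // private copy of the request text
    unsigned long conn_id;  // an id, not a pointer: the connection may be gone later
    int wake_fd;            // write end of the pipe that wakes the main loop
};

// Fixed-size message sent back to the main loop.
struct result {
    unsigned long conn_id;
    char reply[64];
};

static void *worker(void *param)
{
    struct job *job = param;
    struct result res = { .conn_id = job->conn_id };
    snprintf(res.reply, sizeof(res.reply), "handled %s", job->request);
    // Pipe writes of at most PIPE_BUF bytes are atomic, so results coming
    // from several workers cannot interleave.
    ssize_t written = write(job->wake_fd, &res, sizeof(res));
    (void)written;  // a real server would log a failed wakeup here
    free(job->request);
    free(job);
    return NULL;
}

// Copies the request and hands it to a detached thread. Returns 0 on success.
static int submit(const char *request, unsigned long conn_id, int wake_fd)
{
    struct job *job = calloc(1, sizeof(*job));
    if (job == NULL) return -1;

    size_t len = strlen(request);
    job->request = malloc(len + 1);
    if (job->request == NULL) { free(job); return -1; }
    memcpy(job->request, request, len + 1);
    job->conn_id = conn_id;
    job->wake_fd = wake_fd;

    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, worker, job);
    pthread_attr_destroy(&attr);
    if (rc != 0) { free(job->request); free(job); return -1; }
    return 0;
}
```

The main loop would read one \`struct result\` from the other end of the pipe and use \`conn_id\` to find the connection to answer. Since \`submit()\` has already made its copy by the time it returns, the caller is free to reuse or overwrite its request buffer immediately.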

Here's how the thread is spawned. This code is taken from the mongoose tutorial: https://github.com/cesanta/mongoose/blob/master/tutorials/core/multi-threaded/main.c#L11

\`\`\`
void run_in_thread(void *(*f)(void *), void *p)
{
    pthread_t tid = 0;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&tid, &attr, f, p);
    pthread_attr_destroy(&attr);
}
\`\`\`

And then here's what goes on inside a thread:

\`\`\`
void *route_thread_function(void *param)
{
    Route_Thread_Data *data = (Route_Thread_Data *)param;

    struct mg_http_message http_msg = {0};
    int r = mg_http_parse(data->message.buf, data->message.len, &http_msg);
    if (r <= 0) {
        // Unparsable HTTP request
    }

    if (mg_match(http_msg.uri, mg_str("/etc/*"), NULL)) {
        // Request for static resource
    }

    // Request for a dynamic page
    return NULL;
}
\`\`\`

This is quite simple, so there's not much to explain here. One cool thing I'd like to mention is the
\`mg_match()\` function. Normally I'd have to implement URI string matching myself, but I'm glad that
this functionality comes built into mongoose. God bless you mongoose!
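
For a sense of what "implementing it myself" would involve, here's a minimal recursive glob matcher. It is a sketch and not mongoose's implementation. The wildcard rules (\`?\` for one character, \`*\` for anything except \`/\`, \`#\` for anything at all) follow what the mongoose docs describe for \`mg_match()\` as far as this example assumes, and \`glob_match()\` is a made-up name.

```c
#include <stdbool.h>

// Returns true if text matches pattern.
// '?' matches any one character, '*' matches zero or more characters
// except '/', and '#' matches zero or more characters of any kind.
static bool glob_match(const char *pattern, const char *text)
{
    if (*pattern == '\0') return *text == '\0';

    if (*pattern == '*' || *pattern == '#') {
        // First try matching nothing, then let the wildcard eat one character.
        if (glob_match(pattern + 1, text)) return true;
        if (*text == '\0') return false;
        if (*pattern == '*' && *text == '/') return false;
        return glob_match(pattern, text + 1);
    }

    if (*text == '\0') return false;
    if (*pattern == '?' || *pattern == *text)
        return glob_match(pattern + 1, text + 1);
    return false;
}
```

With these rules \`/etc/*\` matches \`/etc/me.jpg\` but not \`/etc/a/b\`, while \`/etc/#\` matches both. The naive recursion can get slow on patterns with many wildcards, which is one more reason to be glad a library already handles it.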

## Dynamic pages and static assets

![the assets](/etc/tmoa-garbage.jpg)

Let's stop here now and talk about something different entirely (don't worry, we'll come back later).
I'd like to show you how assets/pages are implemented in aboba, so we get the whole picture.

In most applications you typically have a distribution model that goes like this: you have your application's
binary ("aboba" or "aboba.exe" or "aboba.bin"...) and then you have an "assets"/"resources" directory, placed
somewhere within the filesystem. There are 2 ways to go about this. You can have a fixed path to the assets
directory or let the user configure it based on how they've installed the application. In the first case the
app could require that its assets be located in \`/usr/share/my-app/assets\`, and otherwise it will bail
out with an error like "Err: Could not find assets directory blah blah". In the other case the application
would require that the user configures the path to the assets directory themselves, which is not a bad solution
(even if it may sound like one at first). This really depends on what audience we're targeting. "Are our users tech-savvy
enough to do it themselves?" is the question we'd have to ask ourselves.

**Enter embedded assets**

The downside of both solutions is that we still have to distribute our application with an assets directory. We
have to manage 1 executable file + an entire directory. The *obvious* solution would be to just create a
Windows-style setup wizard or a \`make install\` or some other install script thingy. But this problem is more easily
solved by literally compiling the bytes of the assets into our program. That way we can distribute only the
executable, since it's self-contained and doesn't need to reach out to the filesystem at runtime to get its resources.
This technique is fairly old and the most notable uses I can think of are: [XPM Format](https://en.wikipedia.org/wiki/X_PixMap),
[Windows resources](https://en.wikipedia.org/wiki/Resource_(Windows)) and raylib's [rres](https://github.com/raysan5/rres).
Even C23 has acknowledged the significance of embedded assets with \`#embed\` (-> [article](https://thephd.dev/finally-embed-in-c23)).

Of course there are some pitfalls to this approach. What if the files we're trying to embed are too *thicc*? We obviously
don't want to end up with a 5GB executable. Imagine the OS trying to load that program into RAM. That would just eat up 5
gigs before anything meaningful ever happens. This pattern of asset distribution is most suitable for embedded applications
where there's no underlying filesystem to work with, games that are not too large, and GUI apps that need some icons and fonts.

So how does aboba go about embedding its assets? [incbin](https://github.com/graphitemaster/incbin) comes to the rescue!

Here are the external declarations for our resources:
\`\`\`
#include "incbin/incbin.h"

INCBIN_EXTERN(gpp1);

INCBIN_EXTERN(home_html);
INCBIN_EXTERN(page_missing_html);
INCBIN_EXTERN(template_blog_html);
INCBIN_EXTERN(blog_html);

INCBIN_EXTERN(simple_css);
INCBIN_EXTERN(favicon_ico);
#if MY_DEBUG
INCBIN_EXTERN(hotreload_js);
#endif
INCBIN_EXTERN(me_jpg);
INCBIN_EXTERN(tmoa_engine_jpg);
INCBIN_EXTERN(tmoa_garbage_jpg);

INCBIN_EXTERN(blog_welcome_md);
INCBIN_EXTERN(blog_weird_page_md);
INCBIN_EXTERN(blog_curious_case_of_gebs_md);
INCBIN_EXTERN(blog_the_making_of_aboba_md);
\`\`\`

And here's where the actual inclusion happens:

\`\`\`
INCBIN(gpp1, "./gpp1");

INCBIN(home_html, "./tmpls/home.html");
INCBIN(page_missing_html, "./tmpls/page-missing.html");
INCBIN(template_blog_html, "./tmpls/template-blog.html");
INCBIN(blog_html, "./tmpls/blog.html");

INCBIN(simple_css, "./etc/simple.css");
INCBIN(favicon_ico, "./etc/favicon.ico");
#if MY_DEBUG
INCBIN(hotreload_js, "./etc/hotreload.js");
#endif
INCBIN(me_jpg, "./etc/me.jpg");
INCBIN(tmoa_engine_jpg, "./etc/tmoa-engine.jpg");
INCBIN(tmoa_garbage_jpg, "./etc/tmoa-garbage.jpg");

INCBIN(blog_welcome_md, "./blog/welcome.md");
INCBIN(blog_weird_page_md, "./blog/weird-page.md");
INCBIN(blog_curious_case_of_gebs_md, "./blog/curious-case-of-gebs.md");
INCBIN(blog_the_making_of_aboba_md, "./blog/the-making-of-aboba.md");
\`\`\`

As you can see, embedding assets with incbin is extremely easy. In the past I've worked with my
own asset packer utility \`x2h.c\`, but it was kind of sloppy and I don't have the will to
rewrite it (maybe one day ;) ).
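
For comparison, the lower-tech way of baking a file in is to generate a C array from its bytes, which is what \`xxd -i\` does. Below is a rough sketch of such a generator. It is not \`x2h.c\`; the function name \`bytes_to_c_array()\` and the \`_data\` / \`_size\` naming are choices made just for this example.

```c
#include <stdio.h>

// Renders size bytes (size must be > 0) as C source that declares
// <name>_data[] and <name>_size. Returns the number of characters
// written to out, or -1 if out is too small.
static int bytes_to_c_array(const char *name, const unsigned char *data,
                            size_t size, char *out, size_t cap)
{
    size_t used = 0;
    int n = snprintf(out, cap, "const unsigned char %s_data[] = {", name);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    for (size_t i = 0; i < size; i++) {
        // Every byte becomes a hex literal, separated by ", ".
        n = snprintf(out + used, cap - used, "%s0x%02x",
                     i == 0 ? "" : ", ", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used,
                 "};\nconst unsigned int %s_size = %zu;\n", name, size);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;
    return (int)used;
}
```

The generated text gets compiled like any other source file. The catch is that every byte turns into several characters of source code, so large assets make compilation slow, which is exactly the problem the assembler-based approach avoids.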

Incbin works (on GCC, which I'm using here) by calling into the assembler via \`__asm__()\` and using the \`.incbin\`
directive to include the binary file. More on that can be found in the docs: https://sourceware.org/binutils/docs/as/Incbin.html
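
Here's roughly what that looks like when written by hand. This is a sketch that assumes GCC or Clang targeting ELF (e.g. Linux) and a source file compiled from disk; it is not copied from incbin. The symbol names are made up, and to stay self-contained the file simply embeds its own source via \`__FILE__\`.

```c
#include <stddef.h>
#include <stdint.h>

// Switch to the read-only data section, drop a label, pull in the raw bytes
// of a file with the assembler's .incbin directive, drop a second label and
// switch back. __FILE__ makes this translation unit embed its own source.
__asm__(
    ".pushsection .rodata\n"
    ".global embedded_start\n"
    "embedded_start:\n"
    ".incbin \"" __FILE__ "\"\n"
    ".global embedded_end\n"
    "embedded_end:\n"
    ".popsection\n"
);

// The labels defined above are visible to C as ordinary arrays.
extern const unsigned char embedded_start[];
extern const unsigned char embedded_end[];

// Number of embedded bytes: the distance between the two labels.
static size_t embedded_size(void)
{
    return (size_t)((uintptr_t)embedded_end - (uintptr_t)embedded_start);
}
```

The compiler never looks at the embedded bytes; the assembler copies them straight into the object file, so even big files cost next to nothing at compile time.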

Now let the **trickery** begin...

## Tangent about templating

Don't worry, in this section I'll briefly explain how templating is implemented, so that we can circle back to our embedded assets system to better understand it.

How are templating engines implemented in general? You usually have a pseudo-HTML page (e.g. \`my-page.tmpl\`), which
contains plain old HTML, but also has some escaped blocks that can embed dynamic variables, loop over lists, include other pages
and whatnot. You then take that pseudo-page and run it through some kind of (keyword) *preprocessor*, which, given an environment,
can expand variable *macros*, unfold loops and do other tricks.

Do you already see where I'm going with my wording?

I want to keep this *mostly* a pure C project, where every bit of it that can be written in C is written in C. Yes, even the templates
(*in a way*). In order to expand the page templates I've decided to use GPP, the Generic Preprocessor, written by Denis Auroux and maintained
by Tristan Miller. Here's a link to GPP's website (-> [click](https://logological.org/gpp)). Why not use the regular CPP that comes with the GCC
suite, you may ask. This is because GPP has a special mode, the HTML mode, which makes it better suited for working with HTML.
For example \`\#define\` vs. \`<\#define>\` or \`\#ifdef\` and \`\#endif\` vs. \`<\#ifdef>\` and \`<\#endif>\`.
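
To make the "preprocessor plus environment" idea concrete, here's a toy expander that replaces placeholders with values from an environment. The \`{{name}}\` syntax, \`struct var\` and \`expand_template()\` are all invented for this sketch. GPP's real syntax and feature set are different and much richer.

```c
#include <stddef.h>
#include <string.h>

struct var { const char *name; const char *value; };

// Looks up a variable by name (pointer + length). Returns NULL if unset.
static const char *lookup(const struct var *env, size_t count,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < count; i++) {
        if (strlen(env[i].name) == len && memcmp(env[i].name, name, len) == 0)
            return env[i].value;
    }
    return NULL;
}

// Appends len bytes to out at *used. Returns 0 on success, -1 if out is full.
static int append(char *out, size_t cap, size_t *used, const char *s, size_t len)
{
    if (*used + len + 1 > cap) return -1;  // keep room for the terminator
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

// Expands every {{name}} in tmpl using env. Unknown names expand to nothing.
// Returns 0 on success, -1 if out is too small or a placeholder is unterminated.
static int expand_template(const char *tmpl, const struct var *env, size_t count,
                           char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return -1;
    out[0] = '\0';

    while (*tmpl != '\0') {
        const char *open = strstr(tmpl, "{{");
        if (open == NULL)  // no more placeholders: copy the tail verbatim
            return append(out, cap, &used, tmpl, strlen(tmpl));

        const char *close = strstr(open + 2, "}}");
        if (close == NULL) return -1;

        // Copy the literal text before the placeholder, then its value.
        if (append(out, cap, &used, tmpl, (size_t)(open - tmpl)) != 0) return -1;
        const char *value = lookup(env, count, open + 2, (size_t)(close - (open + 2)));
        if (value != NULL && append(out, cap, &used, value, strlen(value)) != 0)
            return -1;
        tmpl = close + 2;
    }
    return 0;
}
```

Loops, conditionals and includes are where a real preprocessor earns its keep, and that's the part that would be painful to write and maintain by hand.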

How do we interact with GPP then? To preprocess our templates, we can call GPP via its command line interface (CLI): set HTML mode, give it the path
to the template file and collect the output, which we can then send back to the client.
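
Calling a CLI and collecting its output boils down to a pipe, \`fork()\` and \`exec\`. Here's a minimal POSIX sketch of that; \`run_collect()\` is a hypothetical helper written for this example and not the one aboba uses.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs the program named by argv[0] with the NULL-terminated argv, stores up
// to cap - 1 bytes of its stdout in out (always NUL-terminated) and returns
// the program's exit status, or -1 if it could not be run or waited for.
static int run_collect(char *const argv[], char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: make the pipe's write end our stdout, then become the program.
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }

    // Parent: read until the child closes its end of the pipe. Output that
    // doesn't fit is still drained so the child never blocks on a full pipe.
    close(fds[1]);
    size_t used = 0;
    char chunk[256];
    ssize_t n;
    while ((n = read(fds[0], chunk, sizeof(chunk))) > 0) {
        for (ssize_t i = 0; i < n && used + 1 < cap; i++) out[used++] = chunk[i];
    }
    out[used] = '\0';
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Passing an argv array instead of a shell command string means no quoting problems: paths and arguments reach the program exactly as given.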

But wait, if we call out to an external executable, doesn't that mean that we'll have to ship GPP alongside aboba? The answer is yes! But since
we follow the *principle of a single-executable-deployment*, why don't we pack GPP into our binary, just like any other asset? Another question
arises then: how do we call a packed program and how do we work with its CLI? I'll answer this in the next part...

## Back to the assets

Now that we've established why and how we embed assets into an executable and the way we work with templates, we can discuss the
**embedded assets system**.

Here's the core API of the system:

\`\`\`
typedef struct {
    char *key;  // path
    int value;  // memfd
} Baked_Resource;

void init_baked_resources(void);
void free_baked_resources(void);
// skip BS
bool get_baked_resource_path(char *key, char *buf, size_t size);
// skip BS
\`\`\`

The \`Baked_Resource\` struct is defined in such a way that it works with \`stb_ds.h\`'s string hashmap. stb_ds.h can be found here: https://nothings.org/stb_ds/.
Let's take a closer look at the fields:
- \`key\` - a key to a file within our hashmap of baked resources
- \`value\` - a memfd associated with the baked resource

Here's the initialization and deinitialization of baked resources:

\`\`\`

void add_baked_resource(char *key, const uchar *data, size_t size)
{
    int fd = memfd_create(key, 0);
    if (fd < 0) {
        LOGE("Could not create resource %s. Aborting...", key);
        abort();
    }
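    // write() can fail or come up short, so make sure the whole blob landed
    if (write(fd, data, size) != (ssize_t)size) {
        LOGE("Could not write resource %s. Aborting...", key);
        abort();
    }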
    shput(baked_resources.value, key, fd);
}

void init_baked_resources(void)
{
    lockx(&baked_resources);
    add_baked_resource("home.html", home_html_data, home_html_size);
    add_baked_resource("page-missing.html", page_missing_html_data, page_missing_html_size);
    add_baked_resource("template-blog.html", template_blog_html_data, template_blog_html_size);
    add_baked_resource("blog.html", blog_html_data, blog_html_size);
    add_baked_resource("gpp1", gpp1_data, gpp1_size);
    add_baked_resource("simple.css", simple_css_data, simple_css_size);
    add_baked_resource("favicon.ico", favicon_ico_data, favicon_ico_size);
#if MY_DEBUG
    add_baked_resource("hotreload.js", hotreload_js_data, hotreload_js_size);
#endif
    add_baked_resource("me.jpg", me_jpg_data, me_jpg_size);
    add_baked_resource("tmoa-engine.jpg", tmoa_engine_jpg_data, tmoa_engine_jpg_size);
    add_baked_resource("tmoa-garbage.jpg", tmoa_garbage_jpg_data, tmoa_garbage_jpg_size);
    add_baked_resource("blog-welcome.md", blog_welcome_md_data, blog_welcome_md_size);
    add_baked_resource("blog-weird-page.md", blog_weird_page_md_data, blog_weird_page_md_size);
    add_baked_resource("blog-curious-case-of-gebs.md", blog_curious_case_of_gebs_md_data, blog_curious_case_of_gebs_md_size);
    add_baked_resource("blog-the-making-of-aboba.md", blog_the_making_of_aboba_md_data, blog_the_making_of_aboba_md_size);
    unlockx(&baked_resources);
}

void free_baked_resources(void)
{
    lockx(&baked_resources);
    for (size_t i = 0; i < shlen(baked_resources.value); i++) {
        close(baked_resources.value[i].value);
    }
    shfree(baked_resources.value);
    unlockx(&baked_resources);
}
\`\`\`

Here we use the memfd API to turn a baked-in blob into a file that has a file descriptor associated with it. Why? Because
otherwise we have no way of passing files down to GPP's CLI: we only have the raw bytes of a file, and a CLI wants paths. \`memfd_create()\`
gives us an anonymous file that lives purely in RAM, along with its file descriptor. Using said file descriptor we can then write our file's
bytes into that file. Linux exposes every open descriptor of a process under procfs, so the file can be reached via a path like \`/proc/<PID>/fd/<FD>\`.
Now that we've successfully converted a baked-in file into a "*pathed*" file, we can pass the path down to GPP. Heck, we can even run
GPP itself from an in-memory file!
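
The whole trick fits in a few lines and can be tried in isolation. Below is a Linux-only sketch; \`bake_to_memfd()\` is a made-up helper for this example and not aboba's API.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Copies size bytes into an anonymous in-memory file and writes a path that
// other code (or a program we spawn) can open into path_out.
// Returns the descriptor, which must stay open for as long as the path is
// in use, or -1 on failure.
static int bake_to_memfd(const char *name, const void *data, size_t size,
                         char *path_out, size_t path_cap)
{
    int fd = memfd_create(name, 0);  // name only shows up in /proc, for debugging
    if (fd < 0) return -1;
    if (write(fd, data, size) != (ssize_t)size) {
        close(fd);
        return -1;
    }
    snprintf(path_out, path_cap, "/proc/%d/fd/%d", (int)getpid(), fd);
    return fd;
}
```

Opening the resulting path creates a fresh open file description, so a reader starts at offset 0 even though our own descriptor is positioned at the end after the \`write()\`.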

Here's how we get the in-memory file's path in aboba:

\`\`\`
bool get_baked_resource_path(char *key, char *buf, size_t size)
{
    bool found = false;
    // Take the lock before touching the hashmap and release it on every path
    lockx(&baked_resources);
    if (shgeti(baked_resources.value, key) != -1) {
        int fd = shget(baked_resources.value, key);
        snprintf(buf, size, "/proc/%d/fd/%d", getpid(), fd);
        found = true;
    }
    unlockx(&baked_resources);
    return found;
}
\`\`\`

And then we run GPP like so:

\`\`\`
bool gpp_run(char *path, NString_List *env, String_Builder *out)
{
    Cmd cmd = {0};
    defer { cmd_free(&cmd); }

    char gpp1[PATH_MAX];
    if (!get_baked_resource_path("gpp1", gpp1, sizeof(gpp1))) {
        return false;
    }

    cmd_append(&cmd, gpp1);
    cmd_append(&cmd, "-H");         // HTML-like syntax mode
    cmd_append(&cmd, "-x");         // allow the exec meta-macro
    cmd_append(&cmd, "--nostdinc"); // don't search /usr/include for includes
    cmd_append(&cmd, path);

    for (size_t i = 0; i < env->count; i++) {
        cmd_append(&cmd, env->items[i]);
    }

    return cmd_run_collect(&cmd, out) == 0;
}
\`\`\`

In the logger we can now see commands like this: \`Info: cmd /proc/1210675/fd/8 -H -x --nostdinc /proc/1210675/fd/6 ...\`, where
/proc/1210675/fd/8 is the in-memory file for GPP and /proc/1210675/fd/6 is the in-memory file for the template. Pretty cool, eh?

## Left out topics

I think this is out of scope for this article, so I'm not going to talk about it here, but a big part of this project was implementing
live hot reloading. I can basically edit the website inside of my editor and it auto-refreshes in the browser, kinda like a Vite
project. A video of this can be found here: https://www.reddit.com/r/C_Programming/comments/1lbzjvi/webdev_in_c_pt2_true_live_hotreloading_no_more/

## Summary

During this project I've learned a lot about Linux, the memfd API and webdev in general. Normally I wouldn't pick up a website project
simply because I'm tired of webdev. I've been through a webdev phase and it sucked: using JavaScript, 300MB of node_modules for a bare
React hello world project, npm installing countless slop libraries and so on.
Then on the backend you have the same amount of C# or Java or TypeScript slop, but now you can call yourself a *backend engineer* or
whatever the hell. Writing this website in C gave me a different view of webdev and made it actually fun to write.

You can go check out the code for aboba @ http://git.kamkow1lair.pl/kamkow1/aboba.git.