
> Blame UNIX for that, and the fork() system call.

At least that design failure of UNIX has been fixed long ago. There are posix_spawn(3) and various clone(2) flavours which allow spawning a new process without copying the old one. And a lot of memory-intensive software actually uses them, so modern Linux distros can be used without memory overprovisioning.
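
For example, spawning a child without duplicating the parent looks roughly like this (minimal sketch, most error handling omitted):

    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void) {
        pid_t pid;
        char *argv[] = { "ls", "-l", NULL };

        /* posix_spawnp() creates the child directly; the parent's
           address space is never duplicated the way fork() would. */
        int err = posix_spawnp(&pid, "ls", NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
            return 1;
        }
        waitpid(pid, NULL, 0);
        return 0;
    }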

I'd rather blame people who are still using fork(2) for anything that can consume more than 100MB of memory.



I'm someone who likes to use fork() and then actually use both processes as they are, with shared copy-on-write memory. I'm happy to use it on things consuming much more than 100MB of memory. In fact that's where I like it the most. I'm probably a terrible person.

But what would be better? This way I can massage my data in one process, and then fork as many other processes that use this data as I like, without having to serialise it to disk and then load it again. If the data is not modified after fork it consumes much less memory (only the page tables). Usually a little is modified, consuming only a little extra memory. If all of it is modified it doesn't consume more memory than I would have otherwise (hopefully; not sure if the Linux implementation still keeps the pre-fork copy around).

(And no, not threads. They would share modifications, which I don't want. Also since I do this in python they would have terrible performance.)
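
In C terms the pattern is roughly this (just a sketch; in Python it's the same idea via os.fork() or multiprocessing's fork start method):

    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 4
    #define NITEMS   (64 * 1024 * 1024)

    int main(void) {
        /* expensive one-time setup in the parent */
        double *table = malloc(NITEMS * sizeof *table);
        if (!table)
            return 1;
        for (size_t i = 0; i < NITEMS; i++)
            table[i] = i * 0.5;

        for (int w = 0; w < NWORKERS; w++) {
            if (fork() == 0) {
                /* child: reads the table via copy-on-write pages;
                   as long as it mostly reads, almost nothing is copied */
                double sum = 0;
                for (size_t i = w; i < NITEMS; i += NWORKERS)
                    sum += table[i];
                _exit(sum >= 0 ? 0 : 1);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }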


So if I got it right, you're using fork(2) as a glorified shared memory interface. If my memory is (also) right, you can allocate a shared read-only mapping with shm_open(3) + mmap(2) in the parent process, and open it as a private copy-on-write mapping in the child processes.
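
Something like this, if I remember the API right (untested sketch, error handling omitted; link with -lrt on older glibc):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SIZE (16 * 1024 * 1024)

    int main(void) {
        /* parent: create the shared memory object and fill it */
        int fd = shm_open("/mydata", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, SIZE);
        char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        memset(p, 0x42, SIZE);            /* "massage the data" here */

        if (fork() == 0) {                /* stand-in for any worker process;
                                             it could just as well shm_open()
                                             the name after posix_spawn() */
            char *q = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE, fd, 0);
            q[0] = 1;                     /* copy-on-write: only this page
                                             gets a private copy */
            _exit(0);
        }
        wait(NULL);
        shm_unlink("/mydata");
        return 0;
    }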


No, he's using fork the way it is intended.

Shared memory came much later than fork did.


I have used fork as a stupid simple memory arena implementation. fork(); do work in the child; only malloc, never free; exit. It is much, much heavier than a normal memory arena would be, but also much simpler to use. Plus, if you can split the work into independent batches, you can run multiple children at a time in parallel.
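
The whole trick fits in a few lines (rough sketch):

    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void do_batch(int batch) {
        /* allocate with abandon, never free; the real work on this
           batch would go here */
        for (int i = 0; i < 100000; i++)
            (void)malloc(64 + batch % 8);
    }

    int main(void) {
        for (int batch = 0; batch < 8; batch++) {
            if (fork() == 0) {        /* batches run in parallel children */
                do_batch(batch);
                _exit(0);             /* exit frees the whole "arena" at once */
            }
        }
        while (wait(NULL) > 0)        /* reap the children */
            ;
        return 0;
    }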

As with all such stupid simple mechanisms, I would not advise its use if your program spans more than one .c file and more than a thousand lines.


This isn't advisable in many more contexts than that: for example, your calls to malloc can block indefinitely if locks were held at the time of fork.


posix_spawn() is great, but Linux doesn't implement it as a system call; glibc implements it on top of fork()+exec(). Other Unix(-like) OSes do implement posix_spawn() as a system call. Also, while posix_spawn() covers the vast majority of cases, if it doesn't support some process setup option that you need, you still have to fall back to fork()+exec(). But yeah, it would be good if Linux had it as a system call. It would probably help PostgreSQL.


glibc uses vfork+exec to implement posix_spawn, which makes it much faster than fork+exec.


Yes, of course. I did remember it's not fork() but some other *fork(), and couldn't remember the name. But it's just that kind of thing it does. It's also not exec(), but probably execvp() or execvpe() or something like that.


It's important to note because vfork is exactly what makes it close to posix_spawn (it does not copy the page tables), as opposed to regular fork.
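
The safe use of vfork() is pretty much limited to this shape (sketch):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char *argv[] = { "ls", "-l", NULL };

        pid_t pid = vfork();
        if (pid == 0) {          /* child borrows the parent's address space */
            execvp("ls", argv);  /* must immediately exec... */
            _exit(127);          /* ...or _exit if exec failed */
        } else if (pid < 0) {
            perror("vfork");
            return 1;
        }
        /* parent is suspended until the child has exec'd or exited */
        return 0;
    }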


When using vfork(), what are you allowed to do? Can you even do IO redirection (piping)? The man page says:

> ... the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork() ...

But the time between fork and exec is exactly where you do a lot of setup, like IO redirection, dropping privileges with setuid(), setting a signal mask (nohup), etc., and I don't think you can do that without setting any variables. You certainly write to the stack when calling a function.

If you can't do these things you can't really use it to implement posix_spawn(). I guess it could use vfork() in the case where no actions are required, but only then.
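
For comparison, here's how posix_spawn() lets the caller express at least the redirection and signal-mask parts of that setup declaratively, so no user code has to run between fork and exec (rough sketch):

    #include <fcntl.h>
    #include <signal.h>
    #include <spawn.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void) {
        posix_spawn_file_actions_t fa;
        posix_spawnattr_t attr;
        sigset_t empty;
        pid_t pid;
        char *argv[] = { "ls", "-l", NULL };

        posix_spawn_file_actions_init(&fa);
        /* equivalent of open()+dup2() after fork: stdout -> out.txt */
        posix_spawn_file_actions_addopen(&fa, 1, "out.txt",
                                         O_WRONLY | O_CREAT | O_TRUNC, 0644);

        posix_spawnattr_init(&attr);
        sigemptyset(&empty);
        posix_spawnattr_setsigmask(&attr, &empty);   /* reset the signal mask */
        posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGMASK);

        if (posix_spawnp(&pid, "ls", &fa, &attr, argv, environ) == 0)
            waitpid(pid, NULL, 0);

        posix_spawnattr_destroy(&attr);
        posix_spawn_file_actions_destroy(&fa);
        return 0;
    }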


Can a modern distro really be used without overprovisioning? Because the last time I tried it, either the DE or the display server hard-locked immediately and I had to reboot the system.

Having this ridiculous setting as the default has basically ensured that we can never turn it off because developers expect things to work this way. They have no idea what to do if malloc errors on them. They like being able to make 1TB allocs without worrying about the consequences and just letting the kernel shoot processes in the head randomly when it all goes south. Hell, the last time this came up many swore that there was literally nothing a programmer could do in the event of OOM. Learned helplessness.

It's a goddamned mess and like many of Linux's goddamned messes not only are we still dealing with it in 2023, but every effort to do anything about it faces angry ranty backlash.


Almost everything in life is overprovisioned, if you think about it: Your ISP, the phone network, hospitals, bank reserves (and deposit insurance)...

What makes the approach uniquely unsuitable for memory management? The entire idea of swapping goes out of the window without overprovisioning as well, for better or worse.


Perhaps there is some confusion because I used "overprovision" when the appropriate term here is "overcommit", but Windows manages to work fine without Unix-style overcommit. I suspect most OSes in history do not use Unix's style of overcommit.

> What makes the approach uniquely unsuitable for memory management?

The fact that something like OOM killer even needs to exist. Killing random processes to free up memory you blindly promised but couldn't deliver is not a reasonable way to do things.

Edit: https://lwn.net/Articles/627725/


What an absurdly whataboutism-filled response. Meanwhile, Windows has been doing it the correct way for 20 years or more and never has to kill a random process just to keep functioning.


So you're saying the correct way to support fork() is to... not support it? This seems pretty wasteful in the majority of scenarios.

For example, it's a common pattern in many languages and frameworks to preload and fully initialize one worker process and then just fork that as often as required. The assumption there is that, while most of the memory is theoretically writable, practically, much of it is written exactly once and can then be shared across all workers. This both saves memory and the time needed to uselessly copy it for every worker instance (or alternatively to re-initialize the worker every single time, which can be costly if many of its data structures are dynamically computed and not just read from disk).

How do you do that without fork()/overprovisioning?

I'm also not sure whether "giving other examples" fits the bill of "whataboutism", as I'm not listing other examples of bad things to detract from a bad thing under discussion – I'm claiming that all of these things are (mostly) good and useful :)


> How do you do that without fork()/overprovisioning?

You use threads. The part that fork() would have kept shared is still shared, the part that would have diverged is allocated inside each thread independently.

And if you find dealing with locking undesirable, you can use some sort of message system, like Qt signals, to minimize that.
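
Sketched out, something like this:

    #include <pthread.h>
    #include <stdint.h>

    #define NITEMS   (1 << 20)
    #define NWORKERS 4

    static double table[NITEMS];        /* built once, then only read */
    static double results[NWORKERS];    /* one private slot per worker */

    static void *worker(void *arg) {
        long id = (long)(intptr_t)arg;
        double sum = 0;
        for (long i = id; i < NITEMS; i += NWORKERS)
            sum += table[i];            /* shared read-only data, no lock */
        results[id] = sum;              /* per-thread state, no lock */
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < NITEMS; i++)
            table[i] = (double)i;       /* the expensive one-time setup */

        pthread_t tids[NWORKERS];
        for (long id = 0; id < NWORKERS; id++)
            pthread_create(&tids[id], NULL, worker, (void *)(intptr_t)id);
        for (int id = 0; id < NWORKERS; id++)
            pthread_join(tids[id], NULL);
        return 0;
    }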


> the part that would have diverged is allocated inside each thread independently

That’s exactly my criticism of that approach: It’s conceptually trickier (fork is opt-in for sharing; threads are opt-out/require explicit copying) and requires duplicating all that memory, whether threads end up ever writing to it or not.

Threads have their merits, but so do subprocesses and fork(). Why force developers to use one over the other?


> Threads have their merits, but so do subprocesses and fork(). Why force developers to use one over the other?

I used to agree with you, but fork() seems to have definitely been left behind. It has too many issues.

* fork() is slow. This automatically makes it troublesome for small background tasks.

* passing data is inconvenient. You have to futz around with signals, return codes, socketpair or shared memory. It's a pain to set up. Most of what you want to send is messages, but what UNIX gives you is streams.

* Managing it is annoying. You have to deal with signals, reaping, and doing a bunch of state housekeeping to keep track of what's what. A signal handler behaves like an annoying, really horribly designed thread.

* Stuff leaks across easily. A badly designed child can feed junk into your shared filehandles by some accident.

* It's awful for libraries. If a library wants to use fork() internally that'll easily conflict with your own usage.

* It's not portable. Using fork() automatically makes your stuff UNIX only, even if otherwise nothing stops it from working on Windows.

I think the library one is a big one -- we need concurrency more than ever, but under the fork model different parts of the code that are unaware of each other will step on each other's toes.


Unless you're sure that you're going to write to the majority of the copy-on-write memory resulting from fork(), this seems like overkill.

Maybe there should be yet another flavor of fork() that does copy-on-write, but treats the memory as already-copied for physical memory accounting purposes? (Not sure if "copy-on-write but budget as distinct" is actually representable in Linux's or other Unixes' memory model, though.)


> Maybe there should be yet another flavor of fork() that does copy-on-write, but treats the memory as already-copied for physical memory accounting purposes?

How about a version which copies the pages but marks them read-only in the child process, except for a set of ranges passed to fork (which would be copy-on-write as now). The child process then has to change any read-only pages to copy-on-write (or similar) to modify them.

This allows the OS to double-count and hence deny fork if the range of pages passed to fork would lead to an out-of-memory situation. It also allows the OS to deny the child process changing any read-only pages if that would lead to an out-of-memory situation. Both of those scenarios could be handled gracefully by the processes if they wish.

It would also keep the current positive behavior of the forked process having read access to the parent memory for data structures or similar.
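
As a purely hypothetical interface (nothing like this exists; it's just to illustrate the idea):

    #include <stddef.h>
    #include <sys/types.h>

    struct fork_writable_range {
        void   *start;
        size_t  length;
    };

    /* Hypothetical: like fork(), but only the listed ranges are
       copy-on-write; everything else is mapped read-only in the child.
       The kernel charges the listed ranges against available memory up
       front, so this can fail with ENOMEM instead of overcommitting. */
    pid_t fork_ranges(const struct fork_writable_range *ranges,
                      size_t nranges);

    /* Hypothetical: the child later asks to upgrade a read-only region
       to copy-on-write; the kernel may refuse with ENOMEM, which the
       child can handle gracefully. */
    int make_range_writable(void *start, size_t length);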


What's the benefit here?



