One exercise in pwn.college's kernel security module has a program that forks; the child opens the flag file (/flag, owned by root), reads its contents, and then exits.
The program continues to run and lets you load shellcode that will be run by the kernel.
Your task is basically to write some shellcode to scan memory for the flag. So now I know that at least some Linuxes don't clean up after a process exits, and you can get the contents of memory when you have kernel privileges. This is not so easy if you're scanning memory as root, since /dev/mem or whatever won't reveal that process's memory.
By default, memory is wiped before it's handed out again, not when it's freed. This improves performance, but it means secrets can remain in RAM for longer than necessary, where they can be accessed by privileged attackers (software running as root, DMA on a system without an IOMMU, or hardware attacks). To unprivileged processes, eager and lazy zeroing look the same.
Apparently there's a kernel config flag to zero memory on free (CONFIG_INIT_ON_FREE_DEFAULT_ON), but it has a fairly expensive performance cost (3-5% according to the docs). I wonder in what kind of scenario it would make sense to enable it.
You want to enable this if you're concerned about forensic attacks. A simple example: someone has physical access to your device. They're able to power it down and boot it with their own custom kernel. If the memory has not been eagerly zeroed, they may be able to extract sensitive data from RAM.
This flag puts an additional obstacle in the attacker's path. If you have private key material protecting valuable property, you definitely want to throw up as many roadblocks as possible.
Yes it would - either when memory is released back to the kernel (e.g. via munmap) or on process exit. This is a defense-in-depth strategy and not 100% perfect. If you yank the power cord while a long-lived process has sensitive data in memory, you're still vulnerable. But with a clean power-down, or very short lifetimes for sensitive data in RAM, it would afford you additional security.
I think it's a little bit of column A and a little bit of column B, but I admit that while I remember reading about this technique being used a long time ago, I'm not sure of the history of the nomenclature. From StackExchange:
> For those who think this is only theoretical: They were able to use this technique to create a bootable USB device which could determine someone's Truecrypt hard-drive encryption key automatically, just by plugging it in and restarting the computer. They were also able to recover the memory-contents 30 minutes+ later by freezing the ram (using a simple bottle of canned-air) and removing it. Using liquid nitrogen increased this time to hours.
The idea is that well-written software releases memory as soon as possible, so with this enabled you'd have the secret in memory for as little time as possible.
Though in my mind, well-written software that held sensitive data should be zeroing the memory out before freeing it.
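For example, a minimal sketch in C (assuming glibc 2.25+ or a BSD libc that provides explicit_bzero; the handle_secret function is made up for illustration):

    #include <stdlib.h>
    #include <string.h>

    void handle_secret(const char *secret, size_t len) {
        char *buf = malloc(len);
        if (!buf)
            return;
        memcpy(buf, secret, len);

        /* ... use the secret ... */

        /* wipe the buffer before giving the memory back */
        explicit_bzero(buf, len);
        free(buf);
    }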
Yes, I thought that was common practice - I remember reading patch notes for something years ago where the program had been updated to always zero the password buffer after checking that it matched (I think in some cases it had been kept around for a bit).
From a defense-in-depth perspective you definitely want the implementation to be robust (zero the memory after reading it). However, you should also consider:
1) abnormal program termination due to signals, memory pressure/the OOM killer, aborts in other threads serving different requests, and so on. These events can race with the memory zeroing.
2) bugs in the implementation where memory isn't zeroed in all paths
3) interactions between compilers, standard libraries, language runtimes and optimization passes causing memory zeroing to be skipped (see the sketch after this list).
All of these cases have happened time and time again in the wild, hence having additional safety nets is useful.
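To make point 3 concrete, here's a minimal sketch of the classic dead-store problem (check_password and matches are hypothetical; a typical optimizing C compiler is assumed). Because buf is never read after the memset, the compiler is allowed to drop that call entirely at -O2, which is exactly why explicit_bzero/memset_s exist:

    #include <string.h>

    int check_password(const char *input, int (*matches)(const char *)) {
        char buf[64];
        strncpy(buf, input, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';

        int ok = matches(buf);

        /* dead store: may be optimized away, leaving the password
           on the stack until something else overwrites it */
        memset(buf, 0, sizeof(buf));
        return ok;
    }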
These patches were endorsed by folks working on ChromeOS and Android security. I would suppose they want them as additional safeguards for full-disk-encryption keys, and may also be concerned about quality-of-implementation issues in third-party or vendor blobs.
If your PC is connected to a power strip, it's my understanding that law enforcement can attach a live male-to-male power cable to the power strip and then unplug the strip from the wall while still powering the computer. That, and yeah, freezing the RAM.
So technically that's removing power, not "powering it down". I guess you'd then warm-boot with your own kernel and hope that the contents of RAM are mostly untouched?
So, if it's only 3-5% slower, then for $50-100 I could buy a slightly faster processor and never know the difference?
Just trying to check my understanding of what the 3-5% delta is. Seems like a tiny tradeoff for any workstation (I wouldn't notice the difference at least). The tradeoff for servers might vary depending on what they are doing (shared versus owned, etc)
This seems beneficial in systems where security concerns trump performance concerns. The above poster has probably made many such trade-offs already and would likely make more. (Full-disk encryption, virtualization, protection rings, Spectre mitigations, IOMMU, ECC, etc.)
Given ever-increasing processor performance, it does make sense for workstations where physical access should be considered part of the threat model.
> I don't understand why it is slower. It has to be zeroed anyway.
Memory pages freed from userspace might be reused in kernelspace.
If, for instance, the memory is re-used in the kernel's page cache, then the kernel doesn't need to zero it out before copying the to-be-cached data into the page.
Edit: I seem to remember that back in the 1990s the kernel, at least in some cases, wouldn't zero out pages previously used by the kernel before giving them to userspace, sometimes resulting in kernel secrets being leaked to arbitrary userspace processes. Maybe I'm misremembering, and it was just leakage of secrets between userspace processes. In any case, in the 1990s, Linux was way too lax about leaking data from freed pages.
And if the system isn't idle but also doesn't use all of physical memory, pages might not be zeroed for a very long time.
> Is it not zeroed if the memory is assigned to the same process???
I don't know what the current state of this is in Linux, but at least in the past, for some systems and some use cases related to mapped memory, this was the case.
As far as I know, the Linux kernel never inspects the userspace thread to adjust behavior based on what the thread is going to do next. This would be a very brittle sort of optimization.
More importantly, it's not safe. Another thread in the same process can see ptr between the malloc and the memcpy!
Edit: also, of course, malloc and memcpy are C runtime functions, not syscalls, so checking what happens after malloc() would require the kernel to do much more sophisticated analysis than just looking a few instructions ahead of the calling thread's %eip/%rip. While handling malloc()'s mmap() or brk() allocation, the kernel would need to be able to look one or two call frames up the stack, past the metadata accounting that malloc does to keep track of the newly acquired memory, perhaps look at a few conditional branches, trace through the GOT and PLT entries to see where the memcpy call is actually going, and do so in a way that is robust to changes in the C runtime implementation. (Of course, in practice, most C compilers will inline a memcpy implementation, so in the common case it wouldn't have to chase the GOT and PLT entries, but even then, it's way too complicated for the kernel to figure out whether anything non-trivial is happening between mmap()/brk() and the memory being overwritten.)
Edit 2: To be robust in the completely general case, even if it were trivial to identify the inlined memcpy implementation, and "something non-trivial happens" were clearly defined, determining whether "something non-trivial happens" between mmap()/brk() and memcpy() would involve solving the halting problem. (Impossible in the general case.)
malloc() gives you a 'reservation': the memory isn't actually paged in yet. Only once it's touched/updated does it get paged in.
A copy _might_ not even become a real copy if the kernel is smart enough / able to set up a hardware trigger that forces a copy on writes to that area (copy-on-write), at which point the physical memory backing the two distinct logical memory regions gets duplicated and only then diverges.
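A rough way to see the reservation behaviour on Linux (a sketch, using mmap directly to sidestep allocator details; exact numbers vary by kernel and libc): the peak resident set size only grows once the pages are actually written.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    static long peak_rss_kb(void) {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_maxrss;   /* peak resident set size, in kB on Linux */
    }

    int main(void) {
        size_t len = 256UL * 1024 * 1024;   /* 256 MiB "reservation" */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        printf("after mmap:  ~%ld kB peak RSS\n", peak_rss_kb());

        memset(p, 1, len);                  /* fault every page in */
        printf("after touch: ~%ld kB peak RSS\n", peak_rss_kb());

        munmap(p, len);
        return 0;
    }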
That's a good point that Linux doesn't actually allocate the pages until they're faulted in by a read or write. So, if it were doing some kind of thread inspection optimization, it would presumably just need to check if the faulting thread is currently in a loop that will overwrite at least the full page.
However, that wouldn't solve the problem of other threads in the same process being able to see the page before it's fully overwritten, or debugging processes, or using a signal handler to invisibly jump out of the initialization loop in the middle, etc. There are workarounds to all of these issues, but they all have performance and complexity costs.
malloc gets memory from the heap, which may or may not be paged in or reused. That means you may get reused memory back from the heap (which is up to the C runtime).
If you want to make sure it is zero, you will want calloc. If you know you are going to copy something in on the next step, like in your example, you can probably skip calloc and just use malloc. calloc is nice when you are doing things like linked lists/trees/buffers and do not want extra steps to clean out the pointers or data.
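For instance (a trivial sketch; the node struct is made up):

    #include <stdlib.h>

    struct node {
        int value;
        struct node *next;
    };

    struct node *make_node_calloc(int value) {
        /* calloc hands back zeroed memory, so next is already NULL */
        struct node *n = calloc(1, sizeof *n);
        if (n)
            n->value = value;
        return n;
    }

    struct node *make_node_malloc(int value) {
        /* malloc'd contents are indeterminate, so every field
           has to be initialized explicitly */
        struct node *n = malloc(sizeof *n);
        if (n) {
            n->value = value;
            n->next = NULL;
        }
        return n;
    }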
Just a guess but since apps can fail to free memory correctly, you probably have to zero it on allocation and deallocation (to be secure) when you enable the feature. So you aren't swapping one for the other; you are now doing both.
> Just a guess but since apps can fail to free memory correctly
That's not relevant here; from the perspective of the kernel pages are either assigned to a process, or they're not. If an application fails to free memory correctly, that only means it'll keep having pages assigned to it that it no longer uses, but eventually those pages will always be released (by the kernel upon termination of the process, in the worst case).
That is the worst case if the process had leaked that part of the heap, but it is the optimal case on process exit. On an OS with any kind of process isolation, walking over most of the heap before exiting just to "correctly free it" is a pure waste of CPU cycles and, in the worst case, even of I/O bandwidth (when it causes parts of the heap to be paged in).
Paging the pages in can be avoided entirely if the intention is just to zero them. The kernel could either just "forget" them, or use copy-on-write with a properly zeroed-out page as a base.
The point is that you do not want to do any kind of heap cleanup before exit. The intention isn't to zero the pages, but to outright discard all of them (which is going to be done by the kernel anyway).
I'd like the ability to control this at a per-process or even per-allocation level (i.e. as a flag on mmap). That way a password manager could enable it, while a game could disable it.
Do you mean init_on_alloc=1 and init_on_free=1? Here [1] is a thread on the options and their performance impact. FWIW I use them on all my workstations, but these days I'm not doing anything that would be greatly impacted by them. I've never tried them on a gaming machine and never tried them on a large-memory hypervisor.
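For reference, a rough sketch of enabling it (the exact GRUB workflow varies by distro; the boot parameters and Kconfig options below are documented in the kernel's admin guide):

    # kernel command line, e.g. appended to GRUB_CMDLINE_LINUX in /etc/default/grub
    init_on_alloc=1 init_on_free=1

    # or bake the defaults into a custom kernel build (.config)
    CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
    CONFIG_INIT_ON_FREE_DEFAULT_ON=y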
I wish there were flags similar to this for GPU memory. Even something that zeroes GPU memory on reboot would be nice. After a reboot I can always see the previous desktop for a brief moment.
When running code that is security-sensitive but not performance-sensitive, even protections that add up to much higher performance penalties can be very acceptable.
E.g. on a crypto key server. Less so if it's a server that encrypts data en masse, but think of one that signs longer-lived auth tokens, or one that holds intermediate certificates which are used once every few hours to create a certificate that is then used to encrypt/sign data en masse on a different server, etc.
"Real quick" is human speak. For large amounts of memory it's still bound by RAM speed for a machine, which is much lower (a couple orders of magnitude I believe) than, say, cache speed. Things might be different if there was a RAM equivalent of SSD TRIM (making the RAM module zero itself without transferring lots of zeros across the bus), but there isn't.
I'm completely unfamiliar with how the CPU communicates with the memory modules, but is there not a way for the CPU to tell the memory modules to zero out a whole range of memory rather than one byte/sector/whatever-the-standard-unit-is at a time?
As I type this, I'm realizing how little I know about the protocol between the CPU and the memory modules--if anyone has an accessible link on the subject, I'd be grateful.
That's what I referred to as "TRIM for RAM". I'm not aware of it being a thing. And I don't know the protocol, but I'm also not sure it's just a matter of protocol. It might require additional circuitry per bit of memory that would increase the cost.
'TRIM' for RAM is a virtual-to-physical page table hack. Memory that isn't backed by a page just reads as zero; it doesn't need to be initialized. Offhand, it's supposed to be zeroed before it's handed to a process, but I don't know if there are e.g. mechanisms that use spare cycles to proactively zero non-allocated memory that's a candidate for being attached to a VM space.
No. memset (and bzero) aren't HW-accelerated. There is a special CPU instruction that can do it, but in practice it's often faster to do it in a loop. In user space you can frequently leverage SIMD instructions to speed it up (those aren't available in the kernel, because it avoids saving/restoring the SIMD and FP registers on every syscall; they're only saved when switching contexts).
What could be interesting is a CPU instruction to tell the RAM itself to do it. Then you would avoid the memory-bandwidth impact of freeing the memory. But I don't think any such instruction exists in the CPU/memory protocol even today. Not sure why.
That seems wild to be honest. I know how easy it is to say "well they can just.."
But...wouldn't it be relatively trivial to have an instruction that tells the memory controller "set range from address y to x to 0" and let it handle it? Actually slamming a bunch of 0's out over the bus seems so very suboptimal.
> But...wouldn't it be relatively trivial to have an instruction that tells the memory controller "set range from address y to x to 0" and let it handle it?
Having the memory controller or memory module do it is complicated somewhat because it needs to be coherent with the caches, needs to obey translation, etc. If you have the memory controller do it, it doesn't save bandwidth. But, on the other hand, with a write back cache, your zeroing may never need to get stored to memory at all.
Further, if you have the module do it, the module/sdram state machine needs to get more complicated... and if you just have one module on the channel, then you don't benefit in bandwidth, either.
A DMA controller can be set up to do it... but in practice this is usually more expensive on big CPUs than just letting a CPU do it.
It's not really tying up a processor, because of superscalar execution, hyperthreading, etc., either; modern processors have an abundance of resources, and what slows things down is work that must be done serially, or resources that are most contended (like the bus to memory).
Really quick still doesn't mean it's free, especially if you always have to zero all the allocated pages even when the process may have used only part of a page.
Also, the question is: what is this percentage in relation to?
Probably that freeing gets up to 5% slower, which is reasonable given that before, you could often use idle time to zero many of the pages, or might not have zeroed some of the pages at all (as they were never reused).
Exactly. The only guarantee is that things are zeroed before being handed out to a different process, but there is a potential time gap between releasing memory back to the kernel and it being cleaned, a gap which can outlive the life of a process.
> and you can get the contents of memory when you have kernel privileges. This is not so easy [..] as root
Yes, root has far fewer privileges than the kernel, but it can often gain kernel privileges.
But this is where e.g. lockdown mode comes in, which denies the root user such privilege escalation (oversimplified; it's complicated). The main problem is that lockdown mode is not yet compatible with suspend-to-disk (hibernation), even though its documentation implies it is if you have encrypted hibernation. (This is misleading, as it refers to a not-yet-existing feature where the kernel creates an encrypted image that is also tamper-proof even against root. Suspending to an encrypted partition, on the other hand, is possible in Linux, but that isn't enough for lockdown mode to allow it.)
edit: apparently the runtime option is called `init_on_free` and the compile-time option (which determines the default of the runtime option) is called `CONFIG_INIT_ON_FREE_DEFAULT_ON`.