You have to consider what kind of risk you are protecting yourself against.
It's highly unlikely that you would be the target of such a sophisticated attack, but a hacker could get into a place where you left your computer unattended (such as your home or a hotel room) and, in about 15 minutes, install a sniffing device inside it.
If you think you could be the target of such an attack, you could enable a chassis-intrusion alert in your UEFI settings (I know my ThinkPad has that option), or, better, always keep your laptop with you.
I'm mostly asking because the original poster was painting a process that can be sniffed off the bus (that is: buy a stolen laptop off eBay, try to boot it, sniff the key off the bus) as equivalent to a process that requires active targeting and multiple break-ins to work.
It seems like these security discussions always devolve into rather comical goalpost-moving, without ever considering how much work each exploit requires.
The goalposts haven't moved in my mind, but I suppose I didn't make them clear in my first post.
Basically the TPM provides a set of features that are really useful for corporate Windows deployments. No more forgotten passwords, because the self-unlocking disk encryption sends the user straight to the Windows login screen, and helpdesk can reset forgotten Windows passwords remotely.
And for casual home Windows users, it lets them log in with a 4-digit PIN or with biometrics, so it's got usability benefits for them too. If every OS now needs Microsoft's signature of approval, or a really fiddly setup process? Well they were running Windows anyway, so no problem.
These usability/support benefits rely on self-unlocking disk encryption, which is vulnerable to sniffing if someone buys a stolen laptop on eBay.
For the kind of technically sophisticated, security enthusiast users who comment on blog posts about TPMs? We're more than happy to key in a strong unique password at every boot, and if we forget the password and lose access to everything on that disk that's just the system working as it's supposed to.
For us, the benefits of TPMs and measured boot for personal use are a lot more obscure. You'll sometimes hear people claim it protects against 'evil maid attacks' where an attacker repeatedly gets physical access to your laptop. The truth is it provides no such protection.
> For us, the benefits of TPMs and measured boot for personal use are a lot more obscure. You'll sometimes hear people claim it protects against 'evil maid attacks' where an attacker repeatedly gets physical access to your laptop. The truth is it provides no such protection.
TPMs give you fine and adequate protections in many scenarios, even physical ones.
They also provide you with better protection for private key material.
> TPMs give you fine and adequate protections in many scenarios [...] my `ssh-tpm-agent` project
I agree that's adequate, in the sense that keeping an SSH key as a password-protected file on disk is adequate, and having it be a password-protected secret in the TPM is no less secure than that.
But the whole point of binding a key to hardware is to be secure even if a remote attacker has gotten root on your machine. An attacker with root can simply replace the software that reads your PIN with a modified version that also saves it somewhere. Then they can use the key whenever your computer is online, even if they can't copy the key off. And although that's a bit limiting, once they've SSHed to a host as me once they can add their own key to authorized_keys in many cases.
That's why Yubikeys and U2F keys and suchlike have a physical button.
TPMs would be a lot more useful if the spec had mandated a physical button for user presence.
> But the whole point of binding a key to hardware is to be secure even if a remote attacker has gotten root on your machine. An attacker with root can simply replace the software that reads your PIN with a modified version that also saves it somewhere. Then they can use the key whenever your computer is online, even if they can't copy the key off.
It protects against extraction, not usage on the machine itself. Of course they can use the secret on the compromised machine.
> And although that's a bit limiting, once they've SSHed to a host as me once they can add their own key to authorized_keys in many cases.
Assuming they can edit the file.
> That's why Yubikeys and U2F keys and suchlike have a physical button.
The TPM spec includes a policy setup to account for a fingerprint reader that can be used to authenticate. I haven't been able to figure out the how/what/why of the implementation here, but this is very much a thing.
> It protects against extraction, not usage on the machine itself. Of course they can use the secret on the compromised machine.
Yes, this is why I was careful to say that the benefits are obscure, rather than saying they're entirely nonexistent.
I'll admit that's a benefit, but it seems a very small one considering the far-reaching changes it has required: kernel lockdown mode, the Microsoft-signed shim, distro-signed initrds, the difficulties it creates with DKMS, and so on.
Whereas people who need to bind their SSH key to hardware can get a higher degree of security with a far smaller attack surface by simply spending an hour's wages on a Yubikey.
> I'll admit that's a benefit, but it seems a very small one considering the far-reaching changes it has required: kernel lockdown mode, the Microsoft-signed shim, distro-signed initrds, the difficulties it creates with DKMS, and so on
None of this is needed to take advantage of TPMs.
> Whereas people who need to bind their SSH key to hardware can get a higher degree of security with a far smaller attack surface by simply spending an hour's wages on a Yubikey.
Yubikeys are expensive devices, and TPMs are ubiquitous. Better tooling solves this problem.
> None of this is needed to take advantage of TPMs.
You're not binding the secret to PCR values? I thought TPM fans loved those things?
I don't blame you - they look like a design-by-committee house of cards to me, with far too many parties involved and far too much attack surface. Just like the rest of the TPM spec.
> You're not binding the secret to PCR values? I thought TPM fans loved those things?
Binding things to PCR values doesn't imply you need Secure Boot, signed initrd, lockdown mode, shim and signed kernel modules. All of these things are individual security measures that can be combined depending on your threat model.
> I don't blame you - they look like a design-by-committee house of cards to me, with far too many parties involved and far too much attack surface. Just like the rest of the TPM spec.
TPM 2.0 doesn't really make PCR policies easier to use, so I've had trouble getting them properly integrated into the tools I write, as you need to deal with a key to sign updated policies. `systemd-pcrlock` might solve parts of this, but it's all a bit... ugly to deal with, really.
The entire TPM spec is not great, but I find TPMs too useful to ignore.
> Basically the TPM provides a set of features that are really useful for corporate Windows deployments. No more forgotten passwords, because the self-unlocking disk encryption sends the user straight to the Windows login screen, and helpdesk can reset forgotten Windows passwords remotely.
It's unclear why this requires a TPM. Boot the system from a static unencrypted partition containing no sensitive data and display the login screen; when the user authenticates, the system uses their credentials to fetch the FDE decryption key from the directory server. Bonus: the FDE keys are now stored in the directory server, so if the system board in the laptop fails you can remove the drive and recover the data.
An attacker with physical access could modify the unencrypted partition to compromise the user's password the next time the user logs in, but they could do the same thing with a hardware keylogger.
> And for casual home Windows users, it lets them log in with a 4-digit PIN or with biometrics, so it's got usability benefits for them too.
This could be implemented the same way using Microsoft's servers, given that they seem to insist you create a Microsoft account these days anyway.
It's not clear that unsophisticated users actually benefit from default-FDE though. They're more likely to lose their data to it than have it protect them from theft, and losing your family photos is generally more of a harm than some third party getting access to your family photos.
If the machine is already on but asleep, the keys are in memory, they only have to be downloaded from the server on first login. If the machine has been off and you have no network connection then you need the long password to unlock it instead of the short one, but for most users that is already irrelevant because everything else requires a network connection too.
Ah ok, so I'll need to memorize the super long password whenever I'm out and about and want to just check something real quick. I guess I'll just put that on the sticky note on the bottom of the computer.
You want to check something real quick on what... the internet? Then you have internet access. You also have access to the local data on the machine as long as it was asleep rather than off, which will be the case the vast majority of the time.
Keeping the key stored on the machine, TPM or no, is also less secure than keeping it somewhere else. If someone steals your laptop, you deny all access to the key on the server and they can't get it even if they guess the PIN (or the user wrote it on the bottom of the computer), and there is no offline method for extracting the key from the TPM, because it isn't there.
So the sole legitimate use case for a TPM is when you're somewhere with neither cellular service nor Wi-Fi (rare) and your portable device is off rather than asleep (rare) and you can't remember a long passphrase (which doesn't have to be unmemorable; it's just less convenient to type).
This seems like it isn't worth the cost in authoritarianism?
For that matter you could still implement even that with just a secure enclave that will only release the key given the correct PIN (and then rate limits attempts etc.), but then does actually release the key in that case and doesn't do any kind of remote attestation or signing.
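The contract being described can be made concrete with a small sketch (purely illustrative C, with hypothetical names, not real enclave code): the key comes out only on a correct PIN, failures are counted, and the device locks out after a limit, with no attestation of anything running on the host.

```c
#include <string.h>
#include <stdbool.h>

#define MAX_ATTEMPTS 8

/* Toy model of the enclave's state: it lives inside the enclave, and
 * the host only ever sees the key after presenting a correct PIN. */
struct enclave {
    char pin[16];
    unsigned char key[32];
    unsigned failures;
};

/* Returns true and copies the key out on a correct PIN; otherwise
 * counts the failure. Once the limit is hit, even the right PIN
 * is refused (rate limiting / lockout). */
bool enclave_unseal(struct enclave *e, const char *pin, unsigned char out[32])
{
    if (e->failures >= MAX_ATTEMPTS)
        return false;
    if (strcmp(e->pin, pin) != 0) {
        e->failures++;
        return false;
    }
    e->failures = 0;
    memcpy(out, e->key, 32);
    return true;
}
```

The point of the sketch is what's absent: no PCR checks, no remote attestation, no signing of what the host booted; just PIN-gated release with lockout.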
> a secure enclave that will only release the key given the correct PIN
So...a TPM?
> This seems like it isn't worth the cost in authoritarianism?
You know what's really authoritarian? Having your computer practically only decryptable by some remote directory server, potentially not even under your control.
A similar project is rr[0], which is freely available. Like you said, I find that reversible debuggers are a huge improvement over regular debuggers because of the ability to record an execution and then effectively bisect the trace for issues.
The memory is already mapped by the BIOS/EFI firmware, before the kernel takes control.
By default, whenever the memory modules in the different channels have the same size (e.g. two 8 GB modules), the firmware maps the modules with interleaved addresses, which doubles throughput with 2 channels, or triples/quadruples it on workstation/server motherboards with more memory channels.
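As an illustration of the mapping (the interleave granularity is firmware- and chipset-specific; 64-byte cache-line granularity is just an assumption here), consecutive lines simply alternate between channels:

```c
#include <stdint.h>

/* Toy model of N-way channel interleaving at 64-byte granularity:
 * consecutive cache lines alternate between channels, so a streaming
 * read keeps every channel busy. Real memory controllers choose the
 * granularity (and may hash address bits), so this is illustrative only. */
static unsigned channel_of(uint64_t phys_addr, unsigned channels)
{
    return (unsigned)((phys_addr / 64) % channels);
}
```

With 2 channels, addresses 0..63 land on channel 0, 64..127 on channel 1, 128..191 back on channel 0, and so on, which is why a sequential copy sees roughly double the bandwidth of a single module.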
Someone as far ahead of the curve as you clearly are might enjoy the second chapter of Coders at Work, which is an interview with Brad Fitzpatrick. bradfitz wrote everything from memcached to big chunks of Tailscale and much in between.
He studied CS at university but would have been bored to sleep if he hadn't had something else going on, so he founded and ran LiveJournal simultaneously.
I bet someone on this thread knows him and I bet he’d take the time to offer some pointers to an up-and-comer like yourself. I’ve never met him, but I did some business with Six Apart in a previous life and people say he’s a really nice guy.
Unfortunately I haven’t had the time to do a proper benchmark, and the fpng test executable only decodes/encodes a single image which produces very noisy/inconclusive results. However, I’m under the impression that it doesn’t make a large difference in terms of overall time.
fpnge (which I wasn’t aware of until now) appears to already be using a very similar (identical?) algorithm, so I suspect the relative performance of fpng and fpnge would not be significantly impacted by this change.
As someone who has recently been optimising fpnge: Adler32 computation is pretty much negligible in terms of overall runtime.
The Huffman coding and filter search take up most of the time. (IIRC fpng doesn't do any filter search, but its Huffman encoding isn't vectorized, so I'd expect that to dominate fpng's runtime.)
Not sure if any of these would result in meaningful performance gains, but a few ideas I had:
* An avx96/avx128 version. This requires more care than avx32/avx64, because simply extending the coefficient vectors from 0..32 to 0..96/128 would overflow a 16-bit signed number (e.g. 255*96 + 254*96 > 32767). Looking at it now, though, I realize you shouldn't actually need more than one 0..32 coefficient vector.
* The chunk length could be longer because there are 8 separate 32 bit counters in each vector, which can be summed into a uint64_t instead of a uint32_t when computing the modulo.
* As you said, aligning the loads and deferring the `_mm256_madd_epi16` to outside the loop. For deferring the madd specifically, use two separate sum2 vectors and split the `mad` vector in two with `_mm256_and_si256(mad, _mm256_set1_epi32(0xFFFF))` and `_mm256_srli_epi32(mad, 16)`, which should avoid the 5-cycle latency hit incurred by the madd.
Plus I am sure there are many other opportunities to optimize this I have not thought of :)
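For anyone following along, the quantity all of these variants compute is plain Adler-32: two running sums reduced mod 65521. A minimal scalar reference (not the SIMD code under discussion; the vectorized versions compute the same two sums with per-lane counters and coefficient vectors, deferring the modulo):

```c
#include <stdint.h>
#include <stddef.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Scalar reference Adler-32: s1 sums the bytes, s2 sums the running
 * value of s1. Any chunked/SIMD variant must match this bit for bit. */
uint32_t adler32(const uint8_t *data, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + data[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```

The fast implementations only take the modulo periodically (zlib uses a chunk size of 5552 bytes for the scalar case) rather than per byte, which is where the overflow bookkeeping in the bullets above comes from.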
> For deferring the madd specifically, using two separate sum2 vectors and splitting the `mad` vector into two
Actually, the idea was to accumulate into 16-bit sums, and only do madd to 32-bit every 4 loop cycles.
I'm not sure splitting it up like that actually helps: the latency can be easily hidden by an OoO processor, and adding more uops could actually be detrimental.
One thing to note is that you've got a dependent add chain on sum2_v, so using two independent sums instead of one could help.
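The dependent-chain point is easiest to see in scalar form (illustrative code, not the actual fpnge loop, which does the same thing with `__m256i` vectors): with one accumulator every add waits on the previous one, while two independent accumulators let adjacent iterations overlap.

```c
#include <stdint.h>
#include <stddef.h>

/* One accumulator: each add depends on the previous result, so the
 * loop is serialized on the add latency. */
uint64_t sum_single(const uint32_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent accumulators: the two adds per iteration have no
 * dependency on each other, so an OoO core can issue them in
 * parallel. The final result is identical. */
uint64_t sum_split(const uint32_t *v, size_t n)
{
    uint64_t a = 0, b = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += v[i];
        b += v[i + 1];
    }
    if (i < n)
        a += v[i]; /* odd tail element */
    return a + b;
}
```

The same transformation applies to the vector sum2 accumulator: two independent sum vectors, combined once after the loop.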
> Plus I am sure there are many other opportunities to optimize this I have not thought of :)
You can debug programs that ran in the past using debuggers like rr[0], which supports both recording an execution for later debugging and stepping backward in a running process.