More

timClicks · 2025-12-09T08:13:07 1765267987

Sorry if this is somewhat pedantic, but I believe that only US companies (and possibly only Delaware corporations?) are bound by the requirement to maximize shareholder value and then only by case law rather than statue. Other jurisdictions allow the directors more discretion, or place more weight on the company's constitution/charter.

timClicks · 2025-11-25T03:37:48 1764041868

References only have a single bit available as a niche (the null byte), which Option makes use of for null pointer optimization (https://doc.rust-lang.org/std/option/index.html#representati...).

In principle, you Rust could create something like std::num::NonZero and its corresponding sealed trait ZeroablePrimitive to mark that two bits are unused. But that doesn't exist yet as far as I know.

Rusky · 2025-11-25T04:32:18 1764045138

There are also currently the unstable rustc_layout_scalar_valid_range_start and rustc_layout_scalar_valid_range_end attributes (which are used in the definition of NonNull, etc.) which could be used for some bit patterns.

Also aspirations to use pattern types for this sort of thing: https://github.com/rust-lang/rust/issues/135996

timClicks · 2025-11-23T09:05:53 1763888753

Asciidoc corresponds directly to DocBook XML. They're two formats with exactly the same semantics.

timClicks · 2025-11-17T06:28:16 1763360896

MCP is an example of "worse is better". Everyone knows that it's not very good, but it gets the job done.

timClicks · 2025-08-24T20:17:33 1756066653

It's been a while since I've played in the area, but is PCA still the go to method for dimensionality reduction?

wenc · 2025-08-24T20:41:33 1756068093

PCA (essentially SVD) the one that makes the fewest assumptions. It still works really well if your data is (locally) linear and more or less Gaussian. PLS is the regression version of PCA.

There are also nonlinear techniques. I’ve used UMAP and it’s excellent (particularly if your data approximately lies on a manifold).

https://umap-learn.readthedocs.io/en/latest/

The most general purpose deep learning dimensionality reduction technique is of course the autoencoder (easy to code in PyTorch). Unlike the above, it makes very few assumptions, but this also means you need a ton more data to train it.

ChadNauseam · 2025-08-25T06:33:00 1756103580

> PCA (essentially SVD) the one that makes the fewest assumptions

Do you mean it makes the *strongest* assumptions? "your data is (locally) linear and more or less Gaussian" seems like a fairly strong assumption. Sorry for the newb question as I'm not very familiar with this space.

wenc · 2025-08-25T07:01:37 1756105297

You’re correct in a mathematical sense: linearity and Gaussian are restrictive assumptions.

However I meant it colloquially in that those assumptions are trivially satisfied by many generating processes in the physical and engineering world, and there aren’t a whole lot of other requirements that need to be met.

Lerc · 2025-08-25T12:36:51 1756125411

From 2019 but a decent overview by Leland McInnes https://www.youtube.com/watch?v=9iol3Lk6kyU

There's a newer thing called PacMap which is an interesting thing that handles difference cases better. Not as robustly tested as UMAP but that could be said of any new thing. I'm a little wary that it might be overfitted to common test cases. To my mind it feels like PacMap seems like a partial solution of a better way of doing it.

The three stage process of PacMap is either asking to be developed into either a continuous system or finding a analytical reason/way to conduct a phase change.

wenc · 2025-08-26T03:04:21 1756177461

Leland McInnes is amazing. He's also the author of UMAP.

baq · 2025-08-24T20:42:25 1756068145

PCA is nice if you know relationships are linear. You also want to be aware of TSNE and UMAP.

wenc · 2025-08-24T20:47:52 1756068472

A lot of relationships are (locally) linear so this isn’t as restrictive as it might seem. Many real-life productionized applications are based on it. Like linear regression, it has its place.

T-SNE is good for visualization and for seeing class separation, but in my experience, I haven’t found it to work for me for dimensionality reduction per se (maybe I’m missing something). For me, it’s more of a visualization tool.

On that note, there’s a new algorithm that improves on T-SNE called PaCMAP which preserves local and global structures better. https://github.com/YingfanWang/PaCMAP

a_bonobo · 2025-08-25T00:42:39 1756082559

There's also Bonsai, it's parameter-free and supposedly 'better' than t-SNE, but it's clearly aimed at visualisation purposes (except that in Bonsai trees, distances between nodes are 'real' which is usually not the case in t-SNE)

https://www.biorxiv.org/content/10.1101/2025.05.08.652944v1....

energy123 · 2025-08-25T06:57:48 1756105068

I'd add that PCA/OLS is linear in the functional form (linear combination), but the input variables can be non-linear (X_new := X_{old,1}*X_{old,2}^2), so if the non-linearities are simple, then basic feature engineering to strip out the non-linearities before fitting PCA/OLS may be acceptable.

timClicks · 2025-08-06T03:42:44 1754451764

This reminds me of when I provided some impressions of Erlang as a newcomer to their mailing list.

One of my suggestions was that they include hash tables, rather than rely on records (linked lists with named key). Got flamed as ignorant, and I've never emailed that mailing list again. A while later, they ended up adding hash tables to the language.

timClicks · 2025-04-15T09:53:58 1744710838

There are a few commercially valid strategies.

1. Goodwill and mindshare. If you're known as "the best" or "the most innovative", then you'll attract customers.

2. Talent acquisition. Smart people like working with smart people.

3. Becoming the standard. If your technology becomes widely adopted, and you've been using it the longest, then you're suddenly be the best placed in your industry to make use of the technology while everyone retools.

4. Deception. Sometimes you publish work that's "old" internally but is still state of the art. This provides your competition with a false sense of where your research actually is.

5. Freeride on others' work. Maybe experimenting with extending an idea is too expensive/risky to fund internally? Perhaps a wave of startups will try. Acquire one of them that actually makes it work.

6. Undercut the market leader. If your industry has a clear market leader, the others can use open source to cooperate to erode that leadership position.

timClicks · 2025-04-14T16:19:02 1744647542

That's a curious remark, although I guess it doesn't look high level from the eyes of someone looking at programming languages today.

C has always been classed as a high level language since its inception. That term's meaning has shifted though. When C was created, it wasn't assembly (middle) or directly writing CPU op codes in binary/hex (low level).

timClicks · 2025-04-09T05:20:30 1744176030

Perhaps the most generous interpretation is that the authors were writing an article for people who do the naïve thing without reading the docs. There are quite a few people in that category.

timClicks · 2025-04-08T20:58:14 1744145894

> The secretive nature of Cyc has multiple causes. Lenat personally did not release the source code of his PhD project or EURISKO, remained unimpressed with open source, and disliked academia as much as academia disliked him.

One thing that's not mentioned here, but something that I took away from Wolfram's obituary of Lenat (https://writings.stephenwolfram.com/2023/09/remembering-doug...) was that Lenat was very easily distracted ("Could we somehow usefully connect [Wolfram|Alpha and the Wolfram Language] to CYC? ... But when I was at SXSW the next year Doug had something else he wanted to show me. It was a math education game.").

My armchair diagnosis is untreated ADHD. He might have had had discussing the internals of CYC on his todo list since its first prototype, but the draft was never ready.