Whoops, sorry about that. Should be fixed now. Happy New Year!

Thanks! I confirm they're all working for me now!

Happy New Year!


PyPI responses are cached at a 99% hit rate or higher, which means less infrastructure to run.

Search is an unbounded context and does not lend itself to caching very well, as any query can contain anything.


PyPI has fewer than one million projects. The searchable content for each package is what, 300 bytes? That's a 200 MB index. You don't even need fancy full-text search; you could literally split the query by word and do a grep over a text file. No need for Elasticsearch or anything fancy.

And anyway, hit rates are going to be pretty good. You're not taking arbitrary queries; the domain is pretty narrow. Half the queries are going to be for requests, pytorch, numpy, httpx, and the other usual suspects.
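
To make the naive approach concrete, here's a rough sketch (not a proposal for PyPI's actual implementation): it pulls the full project-name list once from the simple index in its PEP 691 JSON form, then just splits the query by word and filters. It only covers project names, not summaries or READMEs, and ignores ranking entirely.

```python
import requests

# One-time download of the full project-name list from PyPI's simple index
# (PEP 691 JSON form), not a per-query call.
resp = requests.get(
    "https://pypi.org/simple/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
    timeout=60,
)
resp.raise_for_status()
names = [p["name"] for p in resp.json()["projects"]]


def naive_search(query: str, limit: int = 20) -> list[str]:
    """Split the query by word and keep project names containing every word."""
    words = [w.lower() for w in query.split()]
    return [n for n in names if all(w in n.lower() for w in words)][:limit]


print(naive_search("http client"))
```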


I wonder how a PyPI search index could be statically served and locally evaluated by `pip search`?

PyPI servers would have to be constantly rebuilding a central index and making it available for download. Seems inefficient.

That depends on whether it can be downloaded incrementally.

Debian is somehow able to manage it for apt.

1. Debian is local-first: apt keeps a client-side cache of the package metadata.

2. apt repositories are cryptographically signed, centrally controlled, and legally accountable.

3. apt search is understood to be approximate, distro-scoped, and slow-moving. Results change slowly and rarely break scripts. PyPI search rankings change frequently by necessity.

4. Turning PyPI search into an apt-like experience would require distributing a signed, periodically refreshed global metadata corpus to every client. At PyPI’s scale, that is nontrivial in bandwidth, storage, and governance terms.

5. apt search works because the repository is curated, finite, and opinionated.


Isn't this an incrementally updatable index that could be managed with a Merkle tree? Git-like, essentially?

The install side is basically Merkle-friendly (immutable artifacts, append-only metadata, hashes, mirrors). Search isn’t. Search results are derived, subjective, and frequently rewritten (ranking tweaks, spam/malware takedowns, popularity signals). That’s more like constantly rebasing than appending commits.

You can Merklize “what files exist”; you can’t realistically Merklize “what should rank for this query today” without freezing semantics and turning CLI search into a hard API contract.
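
To illustrate the "what files exist" half: the per-file hashes PyPI already publishes could in principle be folded into a Merkle root that clients verify and update incrementally. The construction below is a generic sketch, not anything PyPI actually exposes; the filenames and digests are made up.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Fold a list of leaf hashes into a single Merkle root."""
    level = list(leaf_hashes) or [_h(b"")]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Leaves are immutable facts that never get rewritten: filename plus its sha256.
files = {
    "example-1.0.0-py3-none-any.whl": "ab" * 32,  # hypothetical digests
    "example-1.0.0.tar.gz": "cd" * 32,
}
leaves = [_h(f"{name}:{digest}".encode()) for name, digest in sorted(files.items())]
print(merkle_root(leaves).hex())
```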


The searchable context for a distribution on PyPI is unbounded in the general case, assuming the goal is to allow search over READMEs, distribution metadata, etc.

(Which isn’t to say I disagree with you about scale not being the main issue, just to offer some nuance. Another piece of nuance is the fact that distributions are the source of metadata but users think in terms of projects/releases.)


>The searchable context for a distribution on PyPI is unbounded in the general case, assuming the goal is to allow search over READMEs, distribution metadata, etc.

Even including those, it's what? Sub-20-30 GB.


> assuming the goal is to allow search over READMEs, distribution metadata, etc.

Why would you build a dedicated tool for this instead of just using a search engine? If I'm looking for a specific keyword in some project's very long README, I'm searching Kagi, not npm.

I'd expect that the most you should be indexing is the data in the project metadata (setup.py). That could be unbounded, but I can't think of a compelling reason not to truncate it at a reasonable length.


You would definitely use a search engine. I was just responding to a specific design constraint.

(Note that PyPI can’t index metadata from a `setup.py`, however, since that would involve running arbitrary code. PyPI needs to be given structured metadata, and not all distributions provide that.)
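
For a concrete sense of what "structured metadata" means here: a wheel ships its core metadata as a static `.dist-info/METADATA` file in RFC 822-style headers, which can be read without running any packaging code. A minimal sketch (the wheel path is just a placeholder):

```python
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_path: str) -> dict[str, str]:
    """Read core metadata (Name, Version, Summary) from a wheel's .dist-info/METADATA."""
    with zipfile.ZipFile(wheel_path) as zf:
        meta_name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        msg = Parser().parsestr(zf.read(meta_name).decode("utf-8"))
    return {"name": msg["Name"], "version": msg["Version"], "summary": msg["Summary"]}


# print(read_wheel_metadata("example-1.0.0-py3-none-any.whl"))  # path is hypothetical
```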


How does the big white search box at https://pypi.org/ work? Why couldn’t the same technology be used to power the CLI? If there’s an issue with abuse, I don’t think many people would mind rate limiting or mandatory authentication before search can be used.

The PyPI website search is implemented using a real search backend (historically Elasticsearch/OpenSearch-style infrastructure) behind the application logic. Queries are tokenized, ranked, filtered, logged, and throttled. That works fine for humans interacting through a browser.

The moment you expose that same service to a ubiquitous CLI like pip, the workload changes qualitatively.

PyPI has the /simple endpoint, which the CDN can handle.

It’s PyPI’s philosophy that search happens on the website, and pip has aligned with that. Understandably, pip doesn’t want to build a web scraper, so the search command remains disabled.
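
For scripted lookups of a known project name, as opposed to open-ended search, PyPI's JSON API is the cache-friendly route. A small sketch using `requests` (the field selection here is just illustrative):

```python
import requests


def project_info(name: str) -> dict[str, str]:
    """Fetch a project's metadata from PyPI's JSON API (cacheable, unlike search)."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    resp.raise_for_status()
    info = resp.json()["info"]
    return {"name": info["name"], "version": info["version"], "summary": info["summary"]}


print(project_info("requests"))
```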


Alpha Omega is proud to share the newly published white paper from Seth Larson of the Python Software Foundation, titled “Slippery Zips and Sticky Tar Pits: Security and Archives.” Seth serves as Python’s Security Developer in Residence, a role sponsored by Alpha Omega, focused on improving the safety and trustworthiness of open source software that powers systems and applications everywhere.


Incident report of a recent attack campaign targeting GitHub Actions workflows to exfiltrate PyPI tokens, our response, and steps to protect your projects.


This largely depends on the ICANN policies and their definitions of Renewal and Registration Grace Periods. The Renewal period is variable, but the Registration Grace Period is pretty much 30 days everywhere.

https://www.icann.org/en/contracted-parties/consensus-polici...

Here's an example from denic.de: https://www.denic.de/en/domains/de-domains/domain-deletion#c...


The ERRP only covers gTLDs, right? Have you seen any ICANN policies requiring ccTLDs to adopt the same grace periods? As far as I know, ccTLDs can do whatever they want.


ICANN policies only govern global domains. Country domains set their own policies; for example, .eu expiration period is 45 days, not 30.

WHOIS policies also vary wildly; for example, .de domains do not show the registration date in WHOIS, so it's not possible to know if a domain was dropped and re-registered.


PyPI now checks for expired domains to prevent domain resurrection attacks, a type of supply-chain attack where someone buys an expired domain and uses it to take over PyPI accounts through password resets.


There is an active phishing attack targeting PyPI users.

• Threat: Emails from noreply@pypj.org (with a 'j') link to a fake login page.

• Action: Do not click any links. If you already did, change your PyPI password ASAP.

• Note: PyPI itself has not been breached.


Sadly the majority of this data is not externally visible.


Thank you!


Excellent idea, and something I tried a little while back. The `pytest-postgresql` plugin we use has the ability to do this natively, but when we tried it out we ran into other issues with developing on a Linux machine.

Attempt: https://github.com/pypi/warehouse/pull/15365

Revert: https://github.com/pypi/warehouse/pull/15444

If you've got experience with making this kind of thing work on Linux development machines, it'd be great to have some help getting that back.
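
For anyone inclined to pick this up, here's roughly the shape of the pytest-postgresql factory setup I mean, assuming the "native" capability referenced above is the plugin spawning its own throwaway PostgreSQL process; the fixture names are only illustrative.

```python
# conftest.py -- minimal pytest-postgresql setup (sketch; names are illustrative)
from pytest_postgresql import factories

# Spawn a dedicated PostgreSQL process for the test session on a random free port.
postgresql_proc = factories.postgresql_proc(port=None)

# Per-test connection to a freshly created database on that process.
postgresql = factories.postgresql("postgresql_proc")


def test_connection_works(postgresql):
    with postgresql.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone() == (1,)
```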

