Hacker News | froh's comments

I wonder how a PyPI search index could be statically served and locally evaluated on `pip search`?

PyPI servers would have to be constantly rebuilding a central index and making it available for download. Seems inefficient

Debian is somehow able to manage it for apt.

1. Debian is local first via client side cache

2. apt repositories are cryptographically signed, centrally controlled, and legally accountable.

3. apt search is understood to be approximate, distro-scoped, and slow-moving. Results change slowly and rarely break scripts. PyPI search rankings change frequently by necessity

4. Turning PyPI search into an apt-like experience would require distributing a signed, periodically refreshed global metadata corpus to every client. At PyPI’s scale, that is nontrivial in bandwidth, storage, and governance terms

5. apt search works because the repository is curated, finite, and opinionated
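
For what "locally evaluated" could look like on the client side, here is a minimal sketch. It assumes a hypothetical static dump of name/summary records; PyPI does not publish an index in this form, and `INDEX` and `local_search` are made-up names for illustration.

```python
# Hypothetical index format: one record per package, as it might appear in a
# periodically refreshed static dump. This is an assumption, not a PyPI format.
INDEX = [
    {"name": "requests", "summary": "Python HTTP for Humans."},
    {"name": "httpx", "summary": "The next generation HTTP client."},
    {"name": "numpy", "summary": "Fundamental package for array computing in Python."},
]


def local_search(index, query):
    """Return sorted package names whose name or summary contains every query term."""
    terms = query.lower().split()
    hits = []
    for record in index:
        # Plain substring matching, no ranking: roughly the apt-style
        # "approximate" search discussed above.
        haystack = f"{record['name']} {record['summary']}".lower()
        if all(term in haystack for term in terms):
            hits.append(record["name"])
    return sorted(hits)


print(local_search(INDEX, "http"))
```

The sketch deliberately has no ranking at all, which is the part that is hard to distribute statically.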


isn't this an incrementally updatable index that could be managed with a Merkle tree? git-like, essentially?

The install side is basically Merkle-friendly (immutable artifacts, append-only metadata, hashes, mirrors). Search isn’t. Search results are derived, subjective, and frequently rewritten (ranking tweaks, spam/malware takedowns, popularity signals). That’s more like constantly rebasing than appending commits.

You can Merklize “what files exist”; you can’t realistically Merklize “what should rank for this query today” without freezing semantics and turning CLI search into a hard API contract.
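
To make the "what files exist" half concrete, here is a rough sketch of a Merkle root over a list of artifact names. The file names are invented, and duplicating the last node on odd levels is just one simple convention, not anything PyPI or apt is known to use.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Hash adjacent pairs to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical artifact names, for illustration only.
files = [b"pkg-1.0.tar.gz", b"pkg-1.1.tar.gz", b"other-0.3.whl"]
before = merkle_root(files)
after = merkle_root(files + [b"pkg-1.2.tar.gz"])
print(before.hex() != after.hex())
```

Appending an artifact changes the root in a verifiable way, while every existing leaf hash stays the same. A ranked result list has no such stable leaves to hash.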


that depends on how it can be downloaded incrementally.

Hamburg has a similar arrangement, however they make a very clear and unmistakable audio announcement in both English and German.

I'm surprised this is not the case in Munich?


> _lazy_ search function developers

doing non-ascii first needs awareness and then quickly becomes tricky (encodings yay).

getting combining characters and/or homoglyphs right is hard.

and if you're still bored: have fun with Unicode confusables.txt ...

with this in mind I dare to give them lazy bums the benefit of the doubt and rather call them something between naïve and scared.


ok, fine. :)

Isn't there a library out there for this common set of problems? I know Unicode provides normalization tables, though I don't know how good they are and I don't know if Unicode also provides a library.
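
As one data point, Python ships the Unicode normalization forms in its standard library as `unicodedata.normalize`. A minimal sketch of accent-insensitive folding built on it follows; the helper name `fold` is made up, and this handles combining characters and compatibility forms only, not homoglyphs.

```python
import unicodedata


def fold(text: str) -> str:
    """Normalize for loose matching: NFKD, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a nonzero class for combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Precomposed and decomposed spellings fold to the same string.
print(fold("na\u00efve") == fold("nai\u0308ve"))
# Cyrillic U+0430 looks like Latin "a" but is left untouched.
print(fold("\u0430") == "a")
```

The second check is the confusables problem: normalization alone does not map look-alike letters from different scripts onto each other.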


this discussion took off, however...


Isn't pipx addressing exactly that?

once the script is non-trivial, 'install' it using pipx: in editable mode while you work on the script, and as a normal pipx-installed CLI utility otherwise.

the venv part is then completely under the hood.


thank you for this uplifting thread!

---

I had some young family drama which kept me from studying for my first oral university exam. so I talked with the prof about it. he told me to bring a sick leave attestation from Dr such and such, or to come and give it a shot. gave it a shot. "you can do much better, that's obvious. I'll give you the weakest passing grade or I fail you and you redo the exam. your choice." wow.


Is there some literate programming LSP server around, which under the hood tangles the code chunks for language specific child LSP servers, and proxies those? so you have LSP support in the litprog source?

it would probably also semi-weave the source into a standard, say, markdown or latex or asciidoc and proxy that LSP server on those woven files.
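
To sketch the tangling half of that idea: a proxy would need the tangled code plus a map from tangled line numbers back to source lines, so positions reported by the child LSP server can be translated. The following is only a rough illustration assuming markdown-style fenced chunks; `tangle` and `line_map` are hypothetical names, not part of any existing tool.

```python
FENCE = "`" * 3  # markdown code fence marker


def tangle(source: str, lang: str):
    """Extract fenced chunks for one language.

    Returns (code, line_map), where line_map[i] is the 0-based source line
    number of tangled line i.
    """
    code_lines, line_map = [], []
    inside = False
    for lineno, line in enumerate(source.splitlines()):
        stripped = line.strip()
        if not inside and stripped == FENCE + lang:
            inside = True
        elif inside and stripped == FENCE:
            inside = False
        elif inside:
            code_lines.append(line)
            line_map.append(lineno)
    return "\n".join(code_lines), line_map


doc = "\n".join([
    "Intro text",
    FENCE + "python",
    "x = 1",
    FENCE,
    "More prose",
    FENCE + "python",
    "y = x + 1",
    FENCE,
])
code, line_map = tangle(doc, "python")
print(line_map)
```

A diagnostic on tangled line 1 would then be reported at source line `line_map[1]`. Column offsets and chunk reordering, which real literate programming tools support, are left out here.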


That's The Original Uneditorialized Title Copy And Paste...

just sayin'...


Fair Enough


> I've never heard of Little Prince before.

interesting. may I ask which region of the world you live in?

> I don't think it's as popular as the article claims.

that may say more about the region you live in, or about yourself.

I suggest you go to its Wikipedia article and check the book's impact.

ps: and to get a physical copy and to read it...


It was even taught in all schools in Iran!


America, could this be a European thing?


I am also American (born and raised in Chile). The Little Prince is extremely well-known over there. I am personally very fond of it.


I lived in America (NJ and CA) for 25 years and plenty of people knew about The Little Prince.

I've even seen people wearing shirts with the drawing of the snake that ate the elephant.


It's not that obscure, even in the US. Anyone who takes French in US high school has probably read it in French (it's very easy to read), and even in English it's one of the most common classic children's books.


Apparently James Dean loved the book from an early age, so guessing it must have had some popularity in the USA.


Ohio, extremely popular. My son's playroom is all Le Petit Prince and a neighbor teaches it at school


I think it's rather a kind-of-schooling-and-education thing.

for schools in a "humanistic" tradition I dare to bet it's canon.

it's a very beautiful read. when you have time, go and grab a sweet illustrated full-text paper copy in your language of choice. it has been translated into hundreds of languages, and there are wonderful editions of the book. I treasure a large pop-up one.

At first glance it looks and feels like a children's book, but really, is it? Antoine de Saint-Exupéry offers a unique and poetic look at humankind in a truly timeless masterpiece. it touches on not-so-childlike topics: different types of vanity, several perspectives on the rat race, addiction, love of course (both "caritas" and "amor", and at an idealistic level also "eros"), responsibility for nature, and even assisted suicide. all of these little essays are woven into a story arc and told with deep love, tenderness and clarity.

fine dining, if you wish, a gourmet story, really.

you can tell I like it :-D


I have lived in the Netherlands for almost 50 years and have never heard of it either.


> The spirit of the GPL is the freedom of the user, not the code being freely shared.

who do you mean by "user"?

the spirit is that the person who actually uses the software also has the freedom to modify it, and that the users receiving these modifications have the same rights.

is that what you meant?

and while technically that's the spirit of the GPL, the license is not only about users, but about a _relationship_, that of the user and the software and what the user is allowed to do with the software.

it thus makes sense to talk about "software freedom".

last but not least, about a single GPL function: many GPL-family _libraries_ are licensed less restrictively, under the LGPL.


I don't think you understand the GPL.

> "the user is allowed to do with the software"

The GPL does not restrict what the user does with the software.

It can be USED for anything.

But it does restrict how you redistribute it. You have responsibilities if you redistribute it. You must provide the source code, and pass on the same freedoms you received to the users you redistribute it to.


Thinking on though, if the models are trained on any GPL code then one could consider that they contain that GPL code, and are constantly and continually updating and modifying that code, thus everything the model subsequently outputs and distributes should come under the GPL too. It’s far from sufficient that, say, OpenAI have a page on their website to redistribute the code they consume in their models if such code becomes part of the model’s training data that is resident in memory every time it produces new code for users. In the spirit of the GPL all that derivative code seems to also come under the GPL, and has to be made available for free, even if upon every request the generated code is somehow novel or unique to that user.


Riffing on this:

If the LLM can reproduce the entire GPL'd code, with licence and attribution intact, then that would satisfy the GPL, correct?

If the LLM can invent new code, inspired by but not copied from the GPL'd code, that new code does not require a GPL licence.

This is essentially the same as we humans do: I read some GPL code and go "huh, neat architecture!" and then a year later solve a similar problem using an architecture inspired by that code. This is not copying, and does not require me to GPL the code I'm producing. But if I copy-paste a function from the GPL code into my code base, I need to respect the licence conditions and GPL at least part of my code base.

I think the argument that the author is talking about is if the model itself should be GPL'd because it contains copies of GPL'd code that can be reproduced. I don't buy this because that GPL code is not being run as part of the model's functioning. To use an analogy: if I create a code storage system, and then use it to store some GPL code, I don't have to GPL the code storage system itself. As long as it can reproduce the GPL code together with its licence and attribution, then the GPL is not being infringed at any point. The system is not using or running the GPL code itself, it is just storing the GPL code. This is what the LLM is doing.


> Thinking on though, if the models are trained on any GPL code then one could consider that they contain that GPL code, and are constantly and continually updating and modifying that code, thus everything the model subsequently outputs and distributes should come under the GPL too.

If you ask a model to output a task scheduler in C, and the training data contained a GPL-licensed implementation of the Fibonacci function in Haskell, the output isn't likely to bear a lot of resemblance to that input. It might even be unrelated enough that adding that function to the training data doesn't affect what the model outputs for that prompt at all.

The nasty thing in terms of using code generated by these things is that if you ask the model to output a task scheduler in C and the training data contained a GPL-licensed implementation of a task scheduler in C, the output plausibly could bear a strong resemblance to that input. Without you knowing that. And then if you go incorporate that into something you're redistributing, what happens?


the fundamental architectures of networks, compilers, disk operating systems, databases and more are implemented in code under GPL-family licenses; high-value targets to acquire and master.


first I thought you'd go into the nuance of gpl2 vs 3 or lgpl vs gpl vs agpl? patents, tivoization, cloud use?

:-)

I agree, I didn't make any statement about what you can do with the software as long as you are licensed to use it.

you are allowed to build atomic bombs, nuclear power plants, tanks, whatever.

but only as long as you comply i.e. give your downstream the freedom you've received.

if you fail at that, you're no longer allowed to use the software for anything.

see section 8 Termination for details

https://www.gnu.org/licenses/gpl-3.0.html#license-text


> first I thought you'd go into

... I doubt that would clarify the clarity in clearness.

