Why would you need to know anything but hits/month and ad-revenue/month? Why don...

mvdtnz · on May 18, 2023

He gave several valid examples in the comment you're replying to.

kmlx · on May 18, 2023

that's easy: to improve the service.

austinjp · on May 18, 2023

This is the generic reason that is always given, but I don't think I've ever seen exactly how the service will be improved and why cookies (or any uniquely-identifying data) are the only way to achieve the desired outcomes.

janejeon · on May 18, 2023

I'm sorry, but have you tried making and hosting a website? Product Analytics is a very popular category for a reason.

austinjp · on May 18, 2023

Many, many times :)

My point is that it's never described. Even in the detailed options for those cookie banners that permit you to tweak things. Sure, there are (some) details about "our partners" and advertising etc, but exactly how the service will be improved is never explained.

The reason is simple and obvious: nobody knows. Companies collect data in the belief it will be useful in improving the service, but generally chuck it into a data-swamp and occasionally rake it over to extract basic info like navigation routes.

I'm also rarely convinced that improvements can't be better determined by focus groups and other similar methods.

"Improving our service" is a glib catch-all that rarely stands up to scrutiny.

There may be specific examples where a cookie is genuinely the best method to improve a feature -- in which case: name the feature, list the metrics, declare success/fail criteria, and stop collecting the cookie after the decision has been made.

Edit: typos.

dingledork69 · on May 18, 2023

Okay, how? Provide details, thanks.

seri4l · on May 18, 2023

>hash the IP-address

How would that work? I can't think of any approach where getting the original IP back from the hash isn't trivial.

jackdoe · on May 18, 2023

You don't need to get the original IP back, you just need to know how many unique IPs there are, so sha(ip) is good enough.
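
A minimal sketch of that idea, assuming Python's standard `hashlib`. The helper names and the sample log are hypothetical and use documentation-range addresses:

```python
import hashlib

def hashed_ip(ip: str) -> str:
    """Return the hex SHA-256 digest of an IP address string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def count_unique_visitors(request_ips) -> int:
    """Count distinct visitors while storing only hashes, never raw IPs."""
    seen = set()
    for ip in request_ips:
        seen.add(hashed_ip(ip))
    return len(seen)

# Hypothetical request log: three distinct addresses, one repeated.
log = ["192.0.2.1", "192.0.2.7", "192.0.2.1", "198.51.100.4"]
print(count_unique_visitors(log))
```

This only shows the counting part. Whether the stored hashes are actually private is a separate question.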

quesomaster9000 · on May 18, 2023

There are only a little over 4 billion IPv4 addresses.

From a stackoverflow post from 12 years ago:

> I know I do 622 million SHA-256's per sec on a Radeon HD5830.

At that rate, brute-forcing the whole 32-bit address space would take around 7 seconds (2^32 / 622 million per second).
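
The back-of-the-envelope arithmetic, assuming one SHA-256 per candidate address and the hash rate quoted above (the function name is just for illustration):

```python
ADDRESS_SPACE = 2 ** 32           # every possible IPv4 address
HASHES_PER_SECOND = 622_000_000   # rate quoted above for a Radeon HD5830

def brute_force_seconds(space: int, rate: float) -> float:
    """Time to hash every candidate once at the given rate."""
    return space / rate

seconds = brute_force_seconds(ADDRESS_SPACE, HASHES_PER_SECOND)
print(f"{seconds:.1f} s")  # 6.9 s
```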

jackdoe · on May 18, 2023

you can just salt it with some stable random thing

toast0 · on May 18, 2023

In which case anyone who knows the salt can just take the same few seconds to generate all the new hashes and build a new lookup table.
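
A rough sketch of such a lookup table, assuming the salt is known and is simply prepended to the IP before hashing (both are assumptions, as is the salt value). It covers a single /24 to keep it fast, but the same loop works over the whole address space:

```python
import hashlib
import ipaddress

def salted_hash(ip: str, salt: bytes) -> str:
    """SHA-256 over salt + IP, hex encoded."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()

def build_lookup_table(network: str, salt: bytes) -> dict:
    """Map hash -> IP for every address in the given network."""
    table = {}
    for addr in ipaddress.ip_network(network):
        table[salted_hash(str(addr), salt)] = str(addr)
    return table

salt = b"stable-random-thing"  # hypothetical salt, assumed known to the attacker
table = build_lookup_table("192.0.2.0/24", salt)

# A stored hash is reversed with a single dictionary lookup.
stored = salted_hash("192.0.2.77", salt)
print(table[stored])  # 192.0.2.77
```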

operator-name · on May 18, 2023

You can further add bucketing, and eventually move closer to FLoC.

But this is beside the point, as the spirit of the law only allows "processing for legitimate interests". The technology used, whether cookies or server-side processing, is irrelevant. If the thread OP has evaluated their collection[0] as legitimate, they can use whatever technology they like within the guidelines. Otherwise, even cookieless data collection would require consent.

[0]: https://ico.org.uk/for-organisations/guide-to-data-protectio...

seri4l · on May 18, 2023

I apologize for the double negative. What I meant is that hashing doesn't improve privacy, because if you know the hash and the hashing function it's easy to build a hashmap of all the possible IPv4 addresses (around 3.5B). The exception would be a hash built on some sort of expensive key derivation function, but that doesn't scale.
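
To put rough numbers on the "doesn't scale" part, here is a sketch that times a deliberately slow hash and extrapolates. PBKDF2 with 100,000 iterations, the salt value, and the 1M requests/day figure are all assumptions chosen for illustration, and the timings depend on the machine:

```python
import hashlib
import time

def kdf_hash(ip: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Deliberately expensive hash of an IP using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), salt, iterations)

def seconds_per_hash(samples: int = 3) -> float:
    """Average wall-clock cost of one KDF call on this machine."""
    start = time.perf_counter()
    for i in range(samples):
        kdf_hash(f"192.0.2.{i}", b"hypothetical-salt")
    return (time.perf_counter() - start) / samples

cost = seconds_per_hash()
table_years = cost * 2 ** 32 / (3600 * 24 * 365)
daily_hours = cost * 1_000_000 / 3600

print(f"one hash: {cost * 1000:.1f} ms")
print(f"full IPv4 table: ~{table_years:.1f} CPU-years")
print(f"1M requests/day: ~{daily_hours:.1f} CPU-hours/day of extra work")
```

The same cost that slows down building the table is paid by the site on every request, which is the scaling problem.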

GordonS · on May 18, 2023

You could simply salt the hash, though you'd need to treat the salt as a secret.

Alternatively, you could use a new salt every day, which would only allow you to track an individual for a 24-hour period (likely enough for many use cases).
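
One way the daily salt could be sketched, assuming the salt lives only in memory and is thrown away when the date changes. The class name is hypothetical, and using HMAC as the keyed hash is my choice rather than anything specified above:

```python
import datetime
import hashlib
import hmac
import secrets

class DailySaltedHasher:
    """Hashes IPs with a secret salt that is replaced when the day changes."""

    def __init__(self):
        self._day = None
        self._salt = b""

    def _salt_for(self, day: datetime.date) -> bytes:
        if day != self._day:
            # The old salt is overwritten, so earlier hashes can no longer
            # be linked to hashes produced with the new salt.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def hash_ip(self, ip: str, day: datetime.date) -> str:
        salt = self._salt_for(day)
        return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()

hasher = DailySaltedHasher()
today = datetime.date(2023, 5, 18)
tomorrow = today + datetime.timedelta(days=1)

same_day = hasher.hash_ip("192.0.2.1", today) == hasher.hash_ip("192.0.2.1", today)
first = hasher.hash_ip("192.0.2.1", today)
next_day = hasher.hash_ip("192.0.2.1", tomorrow)
print(same_day, first == next_day)
```

This only helps if the old salts are really discarded. Anyone who keeps them can still rebuild the lookup table for that day.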

andirk · on May 18, 2023

?? sha256 the string and you are not going to be able to get back to the original from that output.

Edit: The small number of IP addresses makes it easy to brute force through all of them.

tourmalinetaco · on May 18, 2023

The hashing doesn’t matter when IPv4 has such a limited address space. IPv4 has a little under 4.3B addresses, and a cheaper GPU such as the 1080 Ti has a hash rate of around 4300 MH/s, so it gets through all of them in a few seconds at most.

From there, you have a direct mapping between each IP and its resulting hash, meaning you can easily see what the original input was.

You don’t need to break a hash to know what the original input was.

titaniczero · on May 18, 2023

The IPv4 space is very limited and you can easily compute all the hashes. Salting, combined with rotating salts, could work, but no one guarantees that you’re not storing them.