Hacker News | dhruvdh's comments

El Capitan can also do FP8. HPC generally requires double precision, but people are trying to make low precision work.


I'm particularly fond of the Ozaki scheme https://arxiv.org/html/2306.11975v4 and its recent refinements. Hopefully it trickles down to standard HPC libraries soon.
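For anyone curious what the trick looks like, here is a rough NumPy sketch of the splitting-and-accumulation idea (a simplified illustration only; the paper's error-free formulation is more careful with the constants, and in a real implementation each slice-by-slice product would be dispatched to the low-precision matrix units rather than NumPy's FP64 GEMM):

    import numpy as np

    def ozaki_split(A, n_inner, num_splits=3):
        # Split A row-wise into slices with short significands so that
        # products of compatible slices accumulate without rounding error.
        beta = np.ceil((53 - np.log2(n_inner)) / 2)   # ~bits kept per slice
        slices, R = [], A.astype(np.float64).copy()
        for _ in range(num_splits):
            mu = np.max(np.abs(R), axis=1, keepdims=True)
            mu[mu == 0] = 1.0
            sigma = 2.0 ** (np.ceil(np.log2(mu)) + 53 - beta)
            hi = (R + sigma) - sigma          # extracts ~beta leading bits
            slices.append(hi)
            R = R - hi                        # remainder feeds the next slice
        return slices

    def ozaki_matmul(A, B, num_splits=3):
        n = A.shape[1]
        As = ozaki_split(A, n, num_splits)
        Bs = ozaki_split(B.T, n, num_splits)  # split B column-wise via B^T
        C = np.zeros((A.shape[0], B.shape[1]))
        for Sa in As:
            for Sb in Bs:
                C += Sa @ Sb.T                # stand-in for a low-precision GEMM
        return C

The point is that each slice keeps few enough significand bits that the pairwise products are (nearly) exact, so summing a handful of cheap low-precision GEMMs recovers an FP64-quality result.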


To be fair, you can buy ~3 of these for the price Nvidia charges for 24GB/32GB models.


If people want more VRAM the 24GB 7900xtx is right there and has been there for years.


Yes, but you can't easily put 3 GPUs in 1 PC.


To add, AMD only makes _parts_ of an MI300X server.

It's like asking a tire manufacturer to give you a car for free.


Great analogy!

Just uploaded some pictures of how complex these machines really are...

https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr


He explained the reasoning:

> Now, why don't they send me the two boxes? I understand when I was asking for firmware to be open sourced that that actually might be difficult for them, but the boxes are on eBay with a simple $$ cost. It was never about the boxes themselves, it was a test to see if software had any budget or power. And they failed super hard


I know this is someone else's reasoning, so you can't answer this question, but doesn't this just test whether they want to spend the budget on this specific thing?

If I ask a company for a $100,000 grant, and they're not willing, it doesn't seem like correct logic to assume that means they don't have the budget for it. Maybe they just don't want to spend $100,000 on me.

Why does this mean they don't have a budget or power?


He assumes the software department wants to do this, which - yes - seems to be flawed logic on his part.

Let's imagine he's indeed correct. He receives the hardware, gets hacking and solves all of AMD's problems, the stock surges and tinygrad becomes a major deep learning framework.

That would be a colossal embarrassment for AMD's software department.



FWIW that login panel is controlled by a single feature flag


They should be more interested in selling product than ego


"and they failed" from his PoV... but not from us looking at things from the other side of the table.


Chip vendors regularly send out free hardware to software developers. In this case I don't think the cost is the issue; AMD simply doesn't want what Geohot is offering.


Considering that AMD only really supports their datacenter GPUs with ROCm, this is the worst possible response. It means compute on AMD GPUs is only meant for the elite of the elite and forever out of reach for the average consumer, and that Nvidia is not only outcompeting AMD on quality but also on cost.


I wish more people would just try to do things just like this and blog about their failures.

> The published version of a proof is always condensed. And even if you take all the math that has been published in the history of mankind, it’s still small compared to what these models are trained on.

> And people only publish the success stories. The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it. But they only publish the successful thing, not the process.

- Terence Tao (https://www.scientificamerican.com/article/ai-will-become-ma...)

Personally, I think failures on their own are valuable. Others can come in and branch off from a decision you made that instead leads to success. Maybe the idea can be applied to a different domain. Maybe your failure clarified something for someone.


Thank you for saying this. I agree, which is why I wrote this up.


> I wish more people would just try to do things just like this and blog about their failures.

Came here to say the same thing. Actually, I guess I did say the same thing, just in a much more long-winded form. Needless to say, I concur with you 100%.


Disappointed that there wasn’t anything on inference performance in the article at all. That’s what the major customers have announced they use it for.


Which algorithm is fastest differs by matrix shape, and it's not straightforward to figure out. AMD currently wants you to “tune” ops, likely searching for the right algorithm for your shapes, while Nvidia has accurate heuristics for picking the right algorithm.
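As a rough sketch of what that tuning amounts to (the candidate names and callables below are stand-ins, not any vendor API): benchmark every available kernel on your exact shapes and keep the winner, caching the choice per shape.

    import time
    import numpy as np

    def benchmark(fn, iters=10):
        fn()                                   # warm-up call
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - start) / iters

    def tune_gemm(A, B, candidates):
        # Search all candidate implementations for this particular (M, K, N)
        # and return the fastest; a real tuner would cache the result per shape.
        name, _ = min(candidates.items(),
                      key=lambda kv: benchmark(lambda: kv[1](A, B)))
        return name

    # Stand-ins for a library's algorithm variants (illustration only).
    candidates = {
        "plain": lambda A, B: A @ B,
        "transpose_trick": lambda A, B: (B.T @ A.T).T,
    }
    A = np.random.rand(1024, 512)
    B = np.random.rand(512, 2048)
    print(tune_gemm(A, B, candidates))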


Nvidia's heuristics are not accurate, and it's not possible to achieve peak performance without search.


> despite them being fabless

That's not how it works. You need to pump money into fabs to get them working, and Intel doesn't have money. If AMD had fabs burning through their money, they would also have a much lower valuation.

The market is completely irrational on AMD. Their 52-week high is ~$225 and 52-week low is ~$90. $225 was hit when AMD was guiding ~$3.5B in datacenter GPU revenue. Now, they're guiding to end the year at $5B+ datacenter GPU revenue, but the stock is ~$140?

I think it's because of how early Nvidia announced Blackwell (it isn't any meaningful volume yet), and the market thinks AMD needs to compete with GB200 while they're actually competing with H200 this quarter. And for whatever reason the market thinks that AMD will get zero AI growth next year? I don't know how to explain the stock price.

Anyway, they hit record quarterly revenue this Q3 and are guiding to beat this record by ~$1B next quarter. Price might move a lot based on how AMD guides for Q1 2025.


> That's not how it works.

Being fabless does have an impact because it caps AMD's margins and makes x86 their only moat. They can only extract value if they remain competitive on price. Sure that does not impact Nvidia, but they get to have fat margins because they have virtually no competition.

> The market is completely irrational on AMD. Their 52-week high is ~225$ and 52-week low is ~90$.

That's volatility, not irrationality. As I wrote, AMD's valuation is built on the basis that they will keep executing in the DC space, Intel will keep shitting the bed, and their MI series will eventually be competitive with Nvidia. These facts make investors skittish, and any news about AMD causes the stock to move.

> the market thinks AMD needs to compete with GB200 while they're actually competing with H200 this quarter. And for whatever reason the market thinks that AMD will get zero AI growth next year?

The only hyperscaler that picked up MI300X is Azure, and they GA'ed it 2 weeks ago; both GCP and AWS are holding off. The uncertainty on when (if) it will catch on is a factor, but the growing competition from those same hyperscalers building their own chips means that the opportunity window could be closing.

It's ok to be bullish on AMD the same way that I am bearish on it, but I would maintain that the swings have nothing to do with irrationality.


> The only hyperscaler that picked up MI300X is Azure and they GA'ed it 2 weeks ago

What does “GA” mean in this context?

I’m usually pretty good at deciphering acronyms, but in this case, I have no idea.


Sorry, the corporate lingo is eating into my brain.

GA means Generally Available. To GA something is a shorthand for "to make X generally available".


AMD keeps projecting a message that it's all about hardware.

Many "influencers" have been convinced that: it is all about software - especially in AI. (I happen to agree, but my opinion doesn't matter).

It doesn't matter how well a company is doing if they are targeting the wrong point - their future will be grim. And stock is all about the future.


> Performance per watt was better for Intel

No, it's not even close. AMD is miles ahead.

This is a Phoronix review for Turin (current generation): https://www.phoronix.com/review/amd-epyc-9965-9755-benchmark...

You can similarly search for Phoronix reviews covering the Genoa, Bergamo, and Milan generations (the previous generations).


You're thinking strictly about core performance per watt. Intel has been offering a number of accelerators and other features that make perf/watt look a lot better when you can take advantage of them.

AMD is still going to win a lot of the time, but Intel is better than it seems.


That is true, but the accelerators are disabled in all cheap SKUs and they are enabled only in very expensive Xeons.

For most users it is like the accelerators do not exist, even if they increase the area and the cost of all Intel Xeon CPUs.

This market segmentation policy is exactly as stupid as the removal of AVX-512 from the Intel consumer CPUs.

All users hate market segmentation, and it is an important reason for preferring AMD CPUs, which are differentiated only on quantitative features like core count, clock frequency, or cache size. Intel CPUs are differentiated on qualitative features, so you must deploy different program variants depending on the cost of the CPU, which may or may not provide the features required to run the program.

Intel marketing has always hoped that by showing nice features available only in expensive SKUs they will trick customers into spending more for the top models. However, any wise customer has preferred to buy from the competition instead of choosing between cheap crippled SKUs and complete but too expensive SKUs.


I think Intel made a strategic mistake in recent years by segmenting its ISA variants. E.g., the many flavors of AVX-512.

Developers can barely be bothered to recompile their code for different ISA variants, let alone optimize it for each one.

So often we just build for 1-2 of the most common, baseline versions of an ISA.

Probably doesn't help that (IIRC) ELF executables for the x86-64 System V ABI have no way to indicate precisely which ISA variants they support. So it's not easy at program-loading time to notice if you're going to have a problem with unsupported instructions.

(It's also a good argument for using open source software: you can compile it for your specific hardware target if you want to.)
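About the best a program can do today is detect CPU features itself at startup and dispatch to the right build. A Linux-only sketch of that dispatch (the shared-library names are hypothetical, purely for illustration):

    import platform

    def cpu_flags():
        # Linux-only: parse the feature flags the kernel exposes; elsewhere
        # you would query CPUID directly or via a helper library.
        flags = set()
        try:
            with open("/proc/cpuinfo") as f:
                for line in f:
                    if line.startswith("flags"):
                        flags.update(line.split(":", 1)[1].split())
                        break
        except OSError:
            pass
        return flags

    def pick_build(flags):
        # Hypothetical build names: pick the most specific variant the CPU
        # actually supports, falling back to baseline x86-64.
        if "avx512f" in flags and "avx512vl" in flags:
            return "libkernels_avx512.so"
        if "avx2" in flags:
            return "libkernels_avx2.so"
        return "libkernels_baseline.so"

    if platform.system() == "Linux":
        print(pick_build(cpu_flags()))

All of this is the workaround for the problem described above: the binary itself doesn't tell the loader which ISA level it needs.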


Wise customers buy the thing that runs their workload with the lowest TCO, and for big customers on some specific workloads, Intel has the best TCO.

Market segmentation sucks, but people buying 10,000+ servers do not do it based on which vendor gives them better vibes. People seem to generally be buying a mix of vendors based on what they are good at.


Intel can offer a low TCO only to the big customers you mention, who buy 10,000+ servers and have the leverage to negotiate big discounts from Intel, buying the CPUs at prices several times lower than their list prices.

On the other hand, for small businesses or individual users, who have no choice but to buy at list prices or above, the TCO for Intel server CPUs has become unacceptably bad. Before 2017, up through the Broadwell Xeons, the TCO for Intel server CPUs could be very good, even when bought at retail for a single server. However, starting with the Skylake Server Xeons, the price of the non-crippled Xeon SKUs has increased so much that they are no longer a good choice, except for the very big customers who buy them for much less than the official prices.

The fact that Intel must discount its server CPUs so heavily for big customers likely explains a good part of its huge financial losses over the last few quarters.


Are generic web server workloads going to use these features? I would assume the bulk of e.g. EC2 spends its time doing boring non-accelerated “stuff”.


Intel does a lot of work developing SDKs to take advantage of its extra CPU features and works with the open source community to integrate them so they are actually used.

Their acceleration primitives work with many TLS implementations/nginx/SSH amongst many others.

Possibly AMD is doing something similar, but I'm not aware of it.


ICC, IPP, QAT, etc are definitely an edge.

In AI world they have OpenVINO, Intel Neural Compressor, and a slew of other implementations that typically offer dramatic performance improvements.

Like we see with AMD trying to compete with Nvidia, software matters - a lot.


AMD is not doing similar stuff yet.


But those accelerators are also available for AMD platforms - even if how they're provided is a bit different (often on add-in cards instead of a CPU "tile").

And things like the MI300A mean that isn't really a requirement now either.


They are not, at the moment. Google "QAT" for one example - I'm not talking about GPUs or other add-in cards at all.


You might not be, but the parent poster is.

QAT is an integrated offering by Intel, but there are competing products delivered as add-in cards for most of the things it does, and they have more market presence than QAT. As such, QAT provides much less advantage to Intel than Intel marketing makes it seem. Because yes, Xeon (including QAT) is better than bare Epyc, but Epyc + a third-party accelerator beats it handily. Especially on cost: the appearance of QAT seems to have spooked the vendors, and prices came down a lot.


I've only used a couple of QAT accelerators and I don't know that field well... What relatively easy-to-use and not-super-expensive accelerators are available?


Is AMD behind hyperscaler in-house efforts? Outside of Google I don't think so.


In many cases they are doing their own designs (done by engineers they hired away from the various electronics companies) and having either TSMC or Intel fab the chips for them. They can design specifically for their workloads to get better price/performance than they can buy from AMD or Nvidia.


Oh, maybe also change the title? I flagged it because of the title/url not matching.

