Hacker Newsnew | past | comments | ask | show | jobs | submit | jauws's commentslogin

Thanks! Anecdotally, I'd tend to say that Claude 3.7 tends to improve the most, but it seems like (via the leaderboard), some people really prefer Grok-3 lol.


Thanks for the comment! Do you mind linking the site - would love to check it out! That's a very fair point about the technical error aspect. Though with all the confounding variables (author skill differences, model selection based on price/speed, etc.) I'd say it's probably the most mature signal we have right now, but still far from ideal.

Really interested in what you've been working on for the past year! Are you doing custom fine-tuning or more on the prompting/post-processing side? Also I definitely need to check out the Midjourney onboarding, it sounds super interesting for inspo regarding your point about personalization + taste!


My 2nd most recent submission has a link to it

Most of it has been fine-tuning (SFT/DPO/GRPO), but also a lot of prompting and adding steps between the user's prompt and the output


This is an amazing suggestion! Will definitely try to figure out a way to incorporate this into the leaderboard without making it a constant each time. I'm currently using OpenRouter's default parameters which is totally a brainfart on my part.


Thanks Johnny! I totally agree with you, really appreciate you for checking out my project!


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: