>Requirements
>A will to live (optional but recommended)
>LLVM is NOT required. BarraCUDA does its own instruction encoding like an adult.
>Open an issue if theres anything you want to discuss. Or don't. I'm not your mum.
>Based in New Zealand
Oceania sense of humor is like no other haha
The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.
The sheer amount of knowledge required to even start such a project is really something else, and proving the manual wrong at the machine-language level is something else entirely.
When it comes to AMD, "no CUDA support" is the biggest "excuse" to join NVIDIA's walled garden.
Godspeed to this project; the more competition there is, the less NVIDIA can keep destroying PC parts pricing.
> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.
The project owner is talking about LLVM, a compiler toolkit, not an LLM.
They also said "hand written", implying that no LLMs whirred, slopped and moonwalked all over the project.
I mean.. I'm one of the staunchest skeptics of LLMs as agents, but they're amazing as supercharged autocomplete and I don't see anything wrong with them in that role. There's a position between hand-written and slopped that's Pareto-optimal.
When I want to autocomplete my code with IP scraped from github with all licensing removed, nothing beats an LLM.
They can take away our jobs, but by god they cannot take away our autism!
It's actually quite easy to spot whether LLMs were used or not.
A very small total number of commits, AI-like documentation and code comments.
But even if LLMs were used, the overall project does feel steered by a human, given some decisions like not using bloated build systems. If this actually works then that's great.
Since when is squashing noisy commits an AI activity instead of good manners?
The first commit was 17k lines. So this was either developed without using version control or at least without using this gh repo. Either way I have to say certain sections do feel like they would have been prime targets for having an LLM write them. You could do all of this by hand in 2026, but you wouldn't have to. In fact it would probably take forever to do this by hand as a single dev. But then again there are people who spend 2000 hours building a cpu in minecraft, so why not. The result speaks for itself.
> The first commit was 17k lines. So this was either developed without using version control or at least without using this gh repo.
Most of my free-time projects are developed by me shooting the shit with code on disk for a couple of months until it's in a working state, and then I make one first commit. Alternatively, I commit a bunch iteratively, but before making it public I fold it all into one commit, which becomes the init. 20K lines in the initial commit is not that uncommon; it depends a lot on the type of project, though.
I'm sure I'm not alone in this sort of workflow.
Can you explain the philosophy behind this? Why do this, what is the advantage? Genuinely asking, as I'm not a programmer by profession. I commit often, irrespective of the state of the code (it may not even compile). I understand git commit as a snapshot system; I don't expect each commit to be a pristine, working version.
A lot of people in this thread have argued for squashing, but I don't see why one would do that for a personal project. In large-scale open-source or corporate projects I can imagine they would like to have clean commit histories but why for a personal project?
I do that because there's no point in anyone seeing the pre-release versions of my projects. They're a random mess that changed the architecture 3 times. Looking at that would not give anyone useful information about the actual app. It doesn't even give me any information. It's just useless noise, so it's less confusing if it's not public.
I don't care about anyone seeing or not seeing my unfinished hobby projects, I just immediately push to GitHub as another form of backup.
I don't care about backing up unfinished hobby projects, I just write/test until arbitrarily sharing, or if I'm completely honest, potentially abandoning it. I may not 'git init' for months, let alone make any commits or push to any remotes.
Reasoning: skip SCM 'cost' by not making commits I'd squash and ignore, anyway. The project lifetime and iteration loop are both short enough that I don't need history, bisection, or redundancy. Yet.
Point being... priorities vary. Not to make a judgement here, I just don't think the number of commits makes for a very good LLM purity test.
You should push to a private working branch, and frequently. But when merging your changes to a central branch, you should squash all the intermediate commits and just provide one commit with the asked-for change.
Enshrining "end of day" commits, "oh, that didn't work" mistakes, etc. is not only demoralizing for the developer(s), but it makes tracing changes all but impossible.
> I don't expect each commit to be a pristine, working version.
I guess this is the difference: I expect the commit to represent a somewhat working version, at least when it's upstream; locally it doesn't matter that much.
> Why do this, what is the advantage?
Cleaner, I suppose. It doesn't make sense to have 10 commits where 9 are broken and half-finished and the 10th is the only one that works; I'd rather just have one larger commit.
> they would like to have clean commit histories but why for a personal project?
Not sure why it'd matter whether it's personal, open source, corporate or anything else; I want my git log clean so I can do `git log --short` and actually understand what I'm seeing. If there are 4-5 commits with "WIP almost working" between each proper commit, then that's too much noise for me, personally.
But this isn't something I'm dictating that everyone follow, it's just my personal preference after all.
> If there are 4-5 commits with "WIP almost working" between each proper commit, then that's too much noise for me, personally.
Yep, no excuse for this; feature branches exist for this very reason. wip commits -> git rebase -i master -> profit
> I guess this is the difference: I expect the commit to represent a somewhat working version,
On a solo project I do the opposite: I make sure there is an error where I stopped last. Typically I put in a call to the function that is needed next, so I get a linker error.
Six months later, when I go back to the project, that linker error tells me all I need to know about what comes next.
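Concretely, the trick looks something like this (names made up, just to illustrate; the point is the deliberately undefined function):

    /* sim.c -- a deliberate "bookmark": compiles fine, but won't link until
       the next piece of work exists. */
    typedef struct { double t; } World;

    static void integrate_forces(World *w) { w->t += 0.016; }

    /* Declared on purpose with no definition anywhere in the project yet. */
    void resolve_collisions(World *w);

    void step(World *w) {
        integrate_forces(w);
        resolve_collisions(w);  /* linker: undefined reference to `resolve_collisions' */
    }

    int main(void) { World w = { 0 }; step(&w); return 0; }

The name in the "undefined reference" error is effectively the to-do list.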
How does that work out if you want to use `git bisect` to find regressions or similar things?
I don't do bisects on each individual branch. I'll bisect on master instead and find the offending merge.
From that point, bisect is not needed.
Fair enough. Thanks for the clarification. Personally, I think everything before a versioned release (even something like 0.1) can be messy. But from your point I can see that a cleaner history has advantages.
Further, I guess if the author is expecting contributions to the code in the future, it might be more "professional" for the commits to be only the ones that are relevant.
I consider my own projects to be just for my own learning and understanding, so I never cared about this, but I do see the point now.
Regardless, I think it still remains a reasonable sign of someone doing one-shot agent-driven code generation.
One point I missed that might be the most important, since I don't care about it looking "professional" or not, only about how useful and usable something is: if you have commits where the codebase is in a broken state, then `git bisect` becomes essentially useless (or very cumbersome to use), which makes it kind of tricky to track down regressions unless you'd like to go back to the manual way of tracking those down.
> Regardless, I think it still remains a reasonable sign of someone doing one-shot agent-driven code generation.
Yeah, why change your perception in the face of new evidence? :)
I see the point.
Regarding changing the perception, I think you did not understand the underlying distrust. I will try to use your examples.
It's a moderate-size project. There are two scenarios: the author used git/some VCS, or they did not. If they did not use it, that's quite weird, but maybe fine. If they did use git, then perhaps they squashed commits. But at some point those commits did exist. Let's assume all of them were pristine. It's 16K loc, so there must be a decent number of these pristine commits that were squashed. But what was the harm in leaving them?
So the history must have been made of both clean commits and broken commits. But we have seen this author likes to squash commits. Hmm, so why didn't they do it along the way, instead of only towards the end?
Yes, I have been introduced to a new perception, but the world does not work on "if X, then not Y" principles. And this is a case where the two things being discussed are not mutually exclusive, as you are assuming. But I appreciate this conversation, because I learnt the importance and advantages of keeping a clean commit history, and I will take that into account next time before reaching the conclusion that something is just another one-shot LLM-generated project. Nevertheless, I will always consider the latter a reasonable possibility.
I hope the nuance is clear.
Or the first thousand commits were squashed. The first public commit tells nothing about how this was developed. If I were to publish something that I have worked on alone for a long time, I would definitely squash all the early commits into a single one, just to be sure I don't accidentally leak something that I don't want to leak.
>leak what
For example, when the commits were made. I would not like to share publicly, for the whole world, when I have worked on some project of mine. The commits themselves, or the commit messages, could also contain something you don't want to share.
At least, I approach stuff differently depending on whether I am sharing it with the whole world, with myself, or with people I trust.
Scrubbing git history when going from private to public should be seen as totally normal.
Hmm I can see that. Some people are like that. I sometimes swear in my commit messages.
For me it's quite funny to sometimes read my older commit messages. To each their own.
But my opinion on this is the same as with other things that have become tell-tale signs of AI-generated content: if something you used to do starts getting questioned as AI-generated, it's better to change that approach, assuming you find being labelled as AI-generated offensive.
Leak what?
If you have, for example, a personal API key or credentials that you are using for testing, you throw them in a config file or hard-code them at some point. Then you remove them. If you don't clean your git history, those secrets are now exposed.
Timestamps
A lot of people don't use git, and just chuck stuff in there willy-nilly when they want to share it.
People are too keen to say something was produced with an LLM if they feel it's something they couldn't readily produce themselves.
I would be very concerned about someone working on a 16k loc codebase without a VCS.
Can you prove that this is what happened?
This type of project is the perfect project for an LLM; LLVM and CUDA work as harnesses, easy to compare against.
What do you mean by harnesses?
agentic ai harness for harness (ai)
Says the clawdbot
It's quite amusing that the one time I did not make an anti-AI comment, I got called a clanker myself.
I'm glad the mood here is shifting towards the right side.
This project very most definitely has significant AI contributions.
Don't care though. AI can work wonders in skilled hands and I'm looking forward to using this project
Hello! I didn't realise my project was posted here but I can actually answer this.
I do use LLMs (specifically Ollama), particularly for test summarisation and writing up some boilerplate, and I've also used Claude/ChatGPT on the web when my free tier allows. It's good for when I hit problems such as AMD SOP prefixes being different than I expected.
I looked through several of the source files and if you had said it's 100% handrolled I would have believed you too.
It looks like a project made by a human and I mean that in a good way.
since nobody else seems to have said it, this is exciting! keep up the fun work!
> Oceania sense of humor is like no other haha
Reminded me of the beached whale animated shorts[1].
[1]: https://www.youtube.com/watch?v=ezJG0QrkCTA&list=PLeKsajfbDp...
> /* 80 keywords walk into a sorted bar */
https://github.com/Zaneham/BarraCUDA/blob/master/src/lexer.c...
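For anyone curious what that comment is pointing at: a keyword table kept sorted so lookup can be a binary search over strings. A minimal sketch of the idea (not BarraCUDA's actual code, just the general pattern):

    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *name; int tok; } Keyword;

    /* Must stay sorted by name, or bsearch() will silently miss entries. */
    static const Keyword keywords[] = {
        { "__global__", 1 }, { "__shared__", 2 }, { "float", 3 },
        { "if", 4 },         { "int", 5 },        { "return", 6 },
        { "while", 7 },
    };

    static int cmp_keyword(const void *key, const void *elem) {
        return strcmp((const char *)key, ((const Keyword *)elem)->name);
    }

    /* Returns the keyword's token id, or -1 for an ordinary identifier. */
    static int lookup_keyword(const char *ident) {
        const Keyword *kw = bsearch(ident, keywords,
                                    sizeof keywords / sizeof keywords[0],
                                    sizeof keywords[0], cmp_keyword);
        return kw ? kw->tok : -1;
    }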
LLVM, nothing to do with LLMs
> >LLVM is NOT required. BarraCUDA does its own instruction encoding like an adult.
> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.
"Has tech literacy deserted the tech insider websites of silicon valley? I will not beleove it is so. ARE THERE NO TRUE ENGINEERS AMONG YOU?!"
i loled hard in public transport
> LLVM is NOT required. BarraCUDA does its own instruction encoding like an adult.
This is not an advantage since you will now not benefit from any improvements in LLVM.
Nor will they be restricted by the LLVM design. That project is huge and generic, trying to be everything and target everything (and it takes ages to rebuild if you need some changes). Sometimes it's better to go simple and targeted - time will tell if that's the right choice.
Zluda used LLVM and ended up bundling a patched version to achieve what they wanted https://vosen.github.io/ZLUDA/blog/zluda-update-q4-2025/
> Although we strive to emit the best possible LLVM bitcode, the ZLUDA compiler simply is not an optimizing, SSA-based compiler. There are certain optimizations relevant to machine learning workloads that are beyond our reach without custom LLVM optimization passes.
That's a much better argument than "we did it because we are adults".
(except that it applies to Zluda, not necessarily this project)
I think Zig is trying to get rid of it as well, much harder to debug iirc.
> in a world of AI slope
The scientific term for this is “gradient descent”.
The Descent of (artificial) Man.
>A will to live (optional but recommended)
Ah I'm glad it's just optional, I was concerned for a second.
I'm still blown away that AMD hasn't made it their top priority. I've said this for years. If I was AMD I would spend billions upon billions if necessary to make a CUDA compatibility layer for AMD. It would certainly still pay off, and it almost certainly wouldn't cost that much.
They've been doing this the whole time; it's called HIP. Nowadays it works pretty well on a few supported GPUs (CDNA 3 and RDNA 4).
Please. If HIP worked so well they would be eating into Nvidia's market share.
First, it's a porting kit, not a compatibility layer, so you can't run arbitrary CUDA apps on AMD GPUs. Second, it only runs on some of their GPUs.
This absolutely does not solve the problem.
HIP is just one of many examples of how utterly incompetent AMD is at software development.
GPU drivers, Adrenalin, Windows chipset drivers...
How many generations into the Ryzen platform are they, and they still can't get USB to work properly all the time?
AMD doesn't do USB, they source the controller IP from ASMedia, who also developed most of their chipsets.
it's astounding to me how many people pop off about "AMD SHOULD SUPPORT CUDA" not knowing that HIP (and hipify) has been around for literally a decade now.
Please explain to me why all the major players are buying Nvidia then? Is HIP a drop in replacement? No.
You have to port every piece of software you want to use. It's ridiculous to call this a solution.
Major players in China don't play like that. MooreThreads, Lisuan, and many other smaller companies all have their own porting kits, which are basically copied from HIP. They just port every piece of software and it just works.
If you want to fight against Nvidia monopoly, then don't just rant, but buy a GPU other than Nvidia and build on it. Check my GitHub and you'll see what I'm doing.
> Is HIP a drop in replacement? No.
You don't understand what HIP is. HIP is AMD's runtime API; it resembles the CUDA runtime APIs, but it's not the same thing and it doesn't need to be - the hard part of porting CUDA isn't the runtime APIs. hipify is the thing that translates both runtime calls and kernels. Now, is hipify a drop-in replacement? No, of course not, but that's because the two vendors have different architectures. So it's absolutely laughable to imagine that some random could come anywhere near a "drop-in replacement" when AMD can't (again: because of fundamental architecture differences).
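To make that concrete: the host-side part of the translation is mostly mechanical renaming. A rough sketch (ordinary CUDA runtime calls, with the hipify renames in comments; not taken from any particular project):

    #include <cuda_runtime.h>

    /* Copy n floats to the device. hipify rewrites each cuda* call to its
       hip* equivalent; the signatures line up one-to-one. */
    int upload(const float *host, float **dev, size_t n) {
        if (cudaMalloc((void **)dev, n * sizeof(float)) != cudaSuccess)  /* -> hipMalloc */
            return -1;
        if (cudaMemcpy(*dev, host, n * sizeof(float),
                       cudaMemcpyHostToDevice) != cudaSuccess) {         /* -> hipMemcpy */
            cudaFree(*dev);                                              /* -> hipFree */
            return -1;
        }
        return 0;
    }

The kernels themselves, and the architecture-specific tuning underneath them, are the part that does not translate mechanically, which is the point.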
Who said "some random"? Read the whole thread. I was suggesting AMD invest BILLIONS to make this happen. You're aguing with a straw man.
I think you misunderstand what's fundamentally possible with AMD's architecture. They can't wave a magic wand for a CUDA compatibility layer any better than Apple or Qualcomm can, it's not low-hanging fruit like DirectX or Win32 translation. Investing billions into translating CUDA on raster GPUs is a dead end.
AMD's best option is a greenfield GPU architecture that puts CUDA in the crosshairs, which is what they already did for datacenter customers with AMD Instinct.
This is a big part of AMD still not having a proper foothold in the space: AMD Instinct is quite different from what regular folks can easily put in their workstation. In Nvidia-land I can put anything from mid-range gaming cards, over a 5090 to an RTX 6000 Pro in my machine and be confident that my CUDA code will scale somewhat acceptably to a datacenter GPU.
This is where I feel like Khronos could contribute, making a Compute Capability-equivalent hardware standard for vendors to implement. CUDA's versioning of hardware capabilities plays a huge role in clarifying the support matrix.
...but that requires buy-in from the rest of the industry, and it's doubtful FAANG is willing to thread that needle together. Nvidia's hedged bet against industry-wide cooperation is making Jensen the 21st century Mansa Musa.
I do not misunderstand.
Let's say you put 50-100 seasoned devs on the problem, and within 2-3 years, probably get ZLUDA to the point where most mainstream CUDA applications — ML training/inference, scientific computing, rendering — run correctly on AMD hardware at 70-80% of the performance you'd get from a native ROCm port. Even if its not optimal due to hardware differences, it would be genuinely transformative and commercially valuable.
This would give them runway for their parallel effort to build native greenfield libraries and toolkits and get adoption, and perhaps make some tweaks to future hardware iterations that make compatibility easier.
Before the "ZLUDA" project completion, they would be facing a lawsuit for IP infringement, since CUDA is owned by NVIDIA.
They would win, compatibility layers are not illegal.
Win against who? AMD is the one that asked them to take it down: https://www.tomshardware.com/pc-components/gpus/amd-asks-dev...
And while compatibility layers aren't illegal, they ordinarily have to be a cleanroom design. If AMD knew that the ZLUDA dev was decompiling CUDA drivers to reverse-engineer a translation layer, then legally they would be on very thin ice.
ROCm is supported by the minority of AMD GPUs, and is accelerated inconsistently across GPU models. 70-80% of ROCm's performance is an unclear target, to the point that a native ROCm port would be a more transparent choice for most projects. And even then, you'll still be outperformed by CUDA the moment tensor or convolution ops are called.
Those billions are much better-off being spent on new hardware designs, and ROCm integrations with preexisting projects that make sense. Translating CUDA to AMD hardware would only advertise why Nvidia is worth so much.
> it would be genuinely transformative and commercially valuable.
Bullshit. If I had a dime for every time someone told me "my favorite raster GPU will annihilate CUDA eventually!" then I could fund the next Nvidia competitor out of pocket. Apple didn't do it, Intel didn't do it, and AMD has tried three separate times and failed. This time isn't any different, there's no genuine transformation or commercial value to unlock with outdated raster-focused designs.
No I'm arguing with someone who clearly doesn't understand GPUs
> invest BILLIONS to make this happen
As I have already said twice, they already have; it's called hipify, and it works as well as you'd imagine it could (i.e. poorly, because this is a dumb idea).
Wow you're so very smart! You should tell all the llm and stablediffusion developers who had no idea it existed! /s
HIP has been dismissed for years because it was a token effort at best. Linux only until the last year or two, and even now it only supports a small number of their cards.
Meanwhile CUDA runs on damn near anything, and both Linux and Windows.
Also, have you used AMD drivers on Windows? They can't seem to write drivers or Windows software to save their lives. AMD Adrenalin is a slow, buggy mess.
Did I mention that compute performance on AMD cards was dogshit until the last generation or so of GPUs?
AMD did hire someone to do this and IIRC he did, but they were afraid of Nvidia lawyers and he released it outside of the company?
Just allow me to doubt that one (1) programmer is all AMD would need to close up the software gap to NVIDIA...
Are you suggesting that CUDA is the entirety of the "software gap", because it's a lot more than that. That seems like a strawman argument.
Andrzej Janik.
He started at Intel working on it; they passed because there was no business there.
AMD picked it up and funded it from 2022. They stopped in 2024, but his contract allowed the release of the software in such an event.
Now it's ZLUDA.
Surely they could hire some good lawyers if that means they make billions upon billions? AFAIK there's nothing illegal about creating compatibility layers. Otherwise WINE would have shut down long ago.
Depends on what code they wrote. If they used LLMs to write it, it could contain proprietary nvidia parts. Someone would then have to review that, but can't, because maybe the nvidia code that came from the LLM isn't even public.
So the strategy of publishing independently and waiting to see if Nvidia's lawyers have anything to say about it would be a very smart move.
The Oracle case was about the stubs of the API being considered copyrighted, I believe. The argument wasn't that Google used any of their code; it was that, by using the same function names, they were committing a thought crime.
The last time something similar happened (Google vs Oracle), the legal battles lasted more than a decade. It would be a very bold decision by AMD to commit to this strategy (implement "CUDA" and fight it out in the courts).
That means Google got to use Java for a decade. A decade-long legal battle is great news for whoever seems to be in the wrong, as long as they can still afford lawyers. Remember, they don't claw back dividends or anything.
Moving target; honestly, just get PyTorch working fully (loads of stuff just doesn't work on AMD hardware) and make it work on all graphics cards from a certain generation onwards. The support matrix of GFX cards, architectures and software needed is quite astounding, but yes, they should at least have that working, plus equivalent custom kernels.
The headline that PyTorch has full compatibility on all AMD GPUs would increase their stock by > $50 billion overnight. They should do it even if it takes 500 engineers and 2 years.
Does anybody really understand why this hasn't been done? I know about ongoing efforts but is it really THAT difficult?
You know, it's probably a combination of things, but mostly that AMD do not have a capable software team… probably not the individuals, but the managers likely don't have a clue.
That would be a great start.
it's so funny to watch the people who pearl clutch over AI expose that they don't even know the difference between LLVM and LLM rofl
Unrelated: just returned from a month in NZ. Amazing people.
Hope you enjoyed it!!
"If this doesn't work, your gcc is broken, not the Makefile." ... bruh.. the confidence.
> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.
Huh? This is obvious AI slop from the readme. Look at that "ASCII art" diagram with misaligned "|" at the end of the lines. That's a very clear AI-slop tell; anyone editing by hand would instinctively delete the extra spaces to align those.
Hello!
Didn't realise this was posted here (again lol), but where I originally posted, on the r/Compilers subreddit, I do mention I used ChatGPT to generate some ASCII art for me. I was tired, it was 12am, and I then had to spend another few minutes deleting all the emojis it threw in there.
I've also been open about my AI use with people who know me and who I work with in the OSS space. I have a lil Ollama model that helps me from time to time, especially with test result summaries (if you've ever seen what happens when a mainframe emulator explodes on a NIST test you'd want AI too lol, 10k lines of individual errors ain't fun to walk through), and you can even see some ChatGPT-generated CUDA in notgpt.cu, which I mixed and mashed a little bit. All in all, I'm of the opinion that this is perfectly acceptable use of AI.
>No LLVM. No HIP translation layer. No "convert your CUDA to something else first." Just ......
Another obvious tell.
https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#...
https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#...
Oh gosh, em dashes are already ruined for me and now I can't use that too? I've already had to drop boldface in some of my writing because it's become prolific too.
This is also just what I intentionally avoided when making this by the way. I don't really know how else to phrase this because LLVM and HIP are quite prolific in the compiler/GPU world it seems.
My mistake, sorry. HN has me feeling extra paranoid lately.
For what it's worth - for people who're in this space - this project is awesome, and I hope you keep going with it! The compiler space for GPUs really needs truly open-source efforts like this.
Code/doc generators are just another tool. A carpenter uses power tools to cut or drill things quickly, instead of screwing everything manually. That doesn't mean they're doing a sloppy job, because they're still going to obsessively pore over every detail of the finished product. A sloppy carpenter will be sloppy even without power tools.
So yeah, I don't think it's worth spending extra effort to please random HN commenters, because the people who face the problem that you're trying to solve will find it valuable regardless. An errant bold or pipe symbol doesn't matter to people who actually need what you're building.
TIL: I'm an LLM.
> This is obvious AI slop from the readme
I keep hoping that low-effort comments like these will eventually get downvoted (because it's official HN policy). I get that it's fashionable to call things AI slop, but please put some effort into reading the code and making an informed judgment.
It's really demeaning to call someone's hard work "AI slop".
What you're implying is that the quality of the work is poor. Did you actually read the code? Do you think the author didn't obsessively spend time over the code? Do you have specific examples to justify calling this sloppy? Besides a misaligned "|" symbol?
And I doubt you even read anything because the author never talked about LLMs in the first place.
My beef isn't with you personally, it's with this almost auto-generated trend of comments on HN calling everyone's work "AI slop". One might say, low-effort comments like these are arguably "AI slop", because you could've generated them using GPT-2 (or even simple if-conditionals).
I was responding to the person above, who confused LLVM with LLM and who brought up the slop term. I was surprised that they didn't think it was slop, because of the obvious tells (even with the fixed diagram formatting, there's a lot about that README and ASCII art that says it was generated or formatted by an LLM).
One of the reasons slop gets such an immediate knee-jerk reaction is that it has become so prolific online. It is really hard to read any programming message board without someone posting something half-baked, entirely generated by Claude, and asking you to spend more effort critiquing it than they ever did prompting for it.
I glanced through the code, but I will admit that the slop in the README put me off digging into it too deeply. It looked like even if it was human written, it's a very early days project.
Yeah, calling something slop is low effort. It's part of a defense mechanism against slop; it helps other folks evaluate if they want to spend the time to look at it. It's an imperfect metric, especially judging if it's slop based only on the README, but it's gotten really hard to participate in good faith in programming discussions when so many people just push stuff straight out of Claude without looking at it and then expect you to do so.
While I would not call this AI slop, the probability that LLMs were used is high.
> It's really demeaning to call someone's hard work "AI slop".
I agree. I browsed through some files and found AI-like comments in the code. The readme and several other places have AI-like writing. Regarding the author not spending time on this project: this is presumably a 16k loc project that was committed in a single commit two days ago, so the author never committed any draft/dev version along the way. I find that quite hard to believe. Again, my opinion is that LLMs were used, not that the code is slop. It may be. It may not be.
Yes, this whole comment chain stems from the top commenter misreading LLVM as LLMs, which is hilarious.
> My beef isn't with you personally, it's with this almost auto-generated trend of comments on HN calling everyone's work "AI slop".
Now, this isn't necessarily about this particular project, but if you post something on a public forum for reactions, then you are asking for the time of the people who will read and interact with it. So if they encounter something that the original author did not even bother to write, why should they read it? You're seeing many comments like that because there's just a lot of slop like that out there. And I think people should continue calling that out.
Again, this project specifically may or may not be slop, so here the reactions are a bit too strong.
Hello, I'm the project author. I don't think that, in any of this or in some of the criticisms I've received on this forum, people have realised I'm not the original poster. I posted this on r/Compilers, and as of now that's pretty much it. In terms of the comments: I use IntelliSense from time to time, and I put my own humour into things because that's who I am. I'm allowed to do these things.
I'm self-taught in this field. I was posting on r/Compilers and shared this around with some friends who work within this space, for genuine critique. I've been very upfront with people about where I use LLMs. It's actually getting a bit "too much" with the overwhelming attention.
I understand your position. If I were in your place, where someone else posted my personal (?)/hobby project on a public forum and it got discussed more for whether it was LLM-generated or not than for the more interesting technical bits, I would also be frustrated.
Regarding the writing style, it's unfortunate that LLMs have claimed a lot of writing styles from us. My personal opinion is to avoid using these AI-isms but I completely get that for people who wrote like that from the start, it's quite annoying that their own writing is now just labelled as LLM generated content.
> this is presumably a 16k loc project that was committed in a single commit two days ago, so the author never committed any draft/dev version along the way
It's quite common to work locally and publish a "finished" version (even if you use source control). The reasons can vary, but I highly doubt that Google wrote Tilt Brush in 3 commits - https://github.com/googlevr/tilt-brush
All I'm saying is that assuming everyone one-shots code (and insulting them like people do on HN) is unnecessary. I'm not referring to you, but it's quite a common pattern now, counter to HN's commenting guidelines.
> found AI-like comments in the code
Sure, but respectfully, so what? Like I posted in a [separate comment](https://news.ycombinator.com/item?id=47057690), code generators are like power tools. You don't call a carpenter sloppy because they use power tools to drill or cut things. A sloppy carpenter will be sloppy regardless, and a good carpenter will obsess over every detail even if they use power tools. A good carpenter doesn't need to prove their worth by screwing in every screw by hand, even if they can. :)
In some cases, code generators are like sticks of dynamite - they help blow open large blocks of the mountain in one shot, which can then be worked on and refined over time.
The basic assumption that annoys me is that anyone who uses AI to generate code is incompetent and that their work is of poor quality. Because that assumes people just one-shot the entire codebase and release it. An experienced developer will mercilessly edit code (whether written by an AI or by a human intern) until it fits the overall quality and sensibility. And large projects have tons of modules in them; it's sub-optimal to one-shot them all at once.
For example, with tests: I've written enough tests in my life that I don't need to type every character from scratch each time. I list the test scenarios, hit generate, and then mercilessly edit the output. The final output is exactly what I would've written anyway, but I'm done with it faster. Power tool. The final output is still my responsibility, and I obsessively review every character that's shipped in the finished product - that is my responsibility.
Sure plenty of people one-shot stuff, just like plenty of Unity games are asset flips, and plenty of YouTube videos are just low-effort slop.
But assuming everything that used AI is crap is just really tiring. Like [another commenter said](https://news.ycombinator.com/item?id=47054951), it's about skilled hands.
> something that the original author did not even bother to write
Again, this is an assumption. If I give someone bullet points (the actual meat of the content) and they put them into sentences, do the sentences not reflect my actual content? And is the assumption that the author didn't read what was finally written and edit it until it reflected the exact intent?
In this case, the author says they used AI to generate the ASCII art in question. How does that automatically mean that the author AI-generated the entire readme, let alone the entire project? I agree, the knee-jerk reactions are way out of proportion.
Where do you draw the line? Will you not use grammar tools now? Will you not use translation tools (to translate to another language) in order to communicate with a foreign person? Will that person argue back that "you" didn't write the text, so they won't bother to read it?
Should we stop using Doxygen for generating documentation from code (because we didn't bother with building a nice website ourselves)?
Put simply, I don't understand the sudden obsession with hammering every nail and pressing every comma by hand, whereas we're clearly okay with other tools that do that.
Should we start writing assembly code by hand now? :)
I mostly agree with what you said. The comparison with a Google project is bad, though. That's a corporate business with a lot of people that might touch that codebase. Why are you comparing that to someone's personal project?
Also I can see you and I both agree that it's disingenuous to call all LLM generated content slop. I think slop has just become a provocative buzzword at this point.
Regarding drawing the line: in the end, it comes down to the person using the tools. What others think, as these tools become more and more pervasive, will become irrelevant. If you as a person outsource your thinking, then it's you who will suffer.
In all my comments I personally never used the word slop for this project, but maintained that LLMs were used significantly. I still think that. Your comparison of LLMs with things like Doxygen or translation tools is puzzling to me, and the points about hammering every nail and pressing every comma are just strawmen. People used these things 5-6 years ago and nobody had any issues. There's a reason people dislike LLM use, though. If you cannot understand why it frustrates people, then I don't know what to say.
Also people do write assembly by hand when it is required.
> If you as a person outsource your thinking, then it's you who will suffer.
Using a code generator != outsourcing your thinking. I know that's the popular opinion, and yes, you can use it that way. But if you do that, I agree you'll suffer. It'll make sub-optimal design decisions, and produce bloated code.
But you can use code generators and still be the one doing the thinking and making the decisions in the end. And maintain dictatorial control over the final code. It just depends on how you use it.
In many ways, it's like being a tech lead. If you outsource your thinking, you won't last very long.
It's a tool, you're the one wielding it, and it takes time, skill and experience to use it effectively.
I don't really have much more to say. I just spoke up because someone who built something cool was getting beat up unnecessarily, and I've seen this happen on HN way too many times recently. I wasn't pointing fingers at you at any point, I'm glad to have had this discussion :)
> anyone editing by hand would instinctively delete the extra spaces to align those
I think as a human I am significantly more likely to give up on senseless pixelpushing like this than an LLM.
Parent confused LLVM with LLM
> and proving the manual wrong at the machine-language level
I'll be the party pooper here, I guess. The manual is still right, and no amount of reverse-engineering will fix the architecture AMD chose for their silicon. It's absolutely possible to implement a subset of CUDA features on a raster GPU, but we've been doing that since OpenCL and CUDA is still king.
The best thing the industry can do is converge on a GPGPU compute standard that doesn't suck. But Intel, AMD and Apple are all at-odds with one another so CUDA's hedged bet on industry hostility will keep paying dividends.
The first issue created by someone other than the author is from geohot himself.. the goat: https://github.com/Zaneham/BarraCUDA/issues/17
I would love to see these folks working together on this to break apart Nvidia's stranglehold on the GPU market (which, according to the internet, allows them to have insane 70% profit margins, thereby raising costs for all users, worldwide).
> # It's C99. It builds with gcc. There are no dependencies.
> make
Beautiful.
You gotta love it, simple and straight to the point.
Wouldn't it be funny and sad if a bunch of enthusiasts pulled off what AMD couldn't :)
The lack of CUDA support on AMD is absolutely not that AMD "couldn't" (although I certainly won't deny that their software has generally been lacking), it's clearly a strategic decision.
Supporting CUDA on AMD would only build a bigger moat for NVidia; there's no reason to cede the entire GPU programming environment to a competitor and indeed, this was a good gamble; as time goes on CUDA has become less and less essential or relevant.
Also, if you want a practical path towards drop-in replacing CUDA, you want ZLUDA; this project is interesting and kind of cool but the limitation to a C subset and no replacement libraries (BLAS, DNN, etc.) makes it not particularly useful in comparison.
Even disregarding CUDA, NVidia has had like 80% of the gaming market for years without any signs of this budging any time soon.
When it comes to GPUs, AMD just has the vibe of a company that basically shrugged and gave up. It's a shame because some competition would be amazing in this environment.
What about PlayStation and Xbox? They use AMD graphics and are a substantial user base.
Because AMD has the APU category that mixes x86_64 cores with powerful integrated graphics. Nvidia does not have that.
Nvidia has a sprawling APU family in the Tegra series of ARM APUs, that span machines from the original Jetson boards and the Nintendo Switch all the way to the GB10 that powers the DGX Spark and the robotics-targeted Thor.
There has been a rumor that some OEMs will be releasing gaming-oriented laptops with an Nvidia N1X Arm CPU plus some form of 5070/5080-ballpark GPU; obviously not x86 Windows, so it would be pushing the latest compatibility layer.
Aren't their APUs sufficient for a gaming laptop?
PlayStation and Xbox are two extremely low-margin, high volume customers. Winning their bid means shipping the most units of the cheapest hardware, which AMD is very good at.
Agreed on ZLUDA being the practical choice. This project is more impressive as a "build a GPU compiler from scratch" exercise than as something you'd actually use for ML workloads. The custom instruction encoding without LLVM is genuinely cool though, even if the C subset limitation makes it a non-starter for most real CUDA codebases.
ZLUDA doesn't have full coverage though, and that means only a subset of CUDA codebases can be ported successfully - they've focused on 80/20 coverage for core math.
Specifically:
cuBLAS (limited/partial scope), cuBLASLt (limited/partial scope), cuDNN (limited/partial scope), cuFFT, cuSPARSE, NVML (very limited/partial scope)
Notably Missing: cuSPARSELt, cuSOLVER, cuRAND, cuTENSOR, NPP, nvJPEG, nvCOMP, NCCL, OptiX
I'd estimate it's around 20% of CUDA library coverage.
They've already ceded the entire GPU programming environment to their competitor. CUDA is as relevant as it always has been.
The primary competitors are Google's TPU which are programmed using JAX and Cerebras which has an unrivaled hardware advantage.
If you insist on an hobbyist accessible underdog, you'd go with Tenstorrent, not AMD. AMD is only interesting if you've already been buying blackwells by the pallet and you're okay with building your own inference engine in-house for a handful of models.
Many projects turned out to be far better than proprietary ones because open source doesn't have to please shareholders.
What sucks is that such projects at some point become too big and make so much noise that big tech is forced to buy them, and then everybody gets fuck all.
All it takes to beat a proprietary walled garden is somebody with the knowledge and the will to make things happen. Linus, with git and Linux, is the perfect example of it.
Fun fact: BitKeeper said fuck you to the Linux community in 2005, and Linus created git within 10 days.
BitKeeper made their code open source in 2016, but by then nobody knew who they were lol
So give it time :)
I think it was the other way around. It was the community that told BitKeeper to fuck off.
It all ended up good because of one man's genius, but let's not rewrite history.
> couldn't
More like wouldn't* most of the time.
Well isn't that the case with a few other things? FSR4 on older cards is one example right now. AMD still won't officially support it. I think they will though. Too much negativity around it. Half the posts on r/AMD are people complaining about it.
Because FSR4 is currently slower on RDNA3 due to lack of support of FP8 in hardware, and switching to FP16 makes it almost as slow as native rendering in a lot of cases.
They're working the problem, but slandering them over it isn't going to make it come out any faster.
> Because FSR4 is currently slower on RDNA3 due to lack of support of FP8 in hardware, and switching to FP16 makes it almost as slow as native rendering in a lot of cases.
It works fine.
> They're working the problem, but slandering them over it isn't going to make it come out any faster.
You have insider info everyone else doesn't? They haven't said any such thing yet last I checked. If that were true, they should have said that.
> It works fine.
That is incorrect, and the FP8 issue is both the officially stated reason and the reason that the community has independently verified.
> You have insider info everyone else doesn't?
AMD has been rather open about it.
We have HIP at home.
Hah, the capitalization of the title of this post only just now made me realize why the GPU farm at my job is called "barracuda". That's pretty funny.
This is likely supremely naive, but I would think the lift in getting coverage for an entire library on a target hardware's native assembly is largely a matter of mapping/translating functions, building acceptance tests, and benchmarking/optimization - all three of those feel like they should be greatly assisted by LLM-augmented workflows.
Is OpenCL a thing anymore? I sorta thought that's what it was supposed to solve.
But I digress; just a quick poke around... I don't know what I'm looking at. But it's impressive.
> Is OpenCL a thing anymore?
I guess CUDA got a lot more traction and there isn't much of a software base written for OpenCL. Kind of what happened with Unix and Windows - you could write code for Unix and it'd (compile and) run on 20 different OSs, or write it for Windows and it'd run on one second-tier OS that managed to capture almost all of the desktop market.
I remember Apple did support OpenCL a long time ago, but I don't think they still do.
Nah, it's all Vulkan now. SYCL is the spiritual successor to OpenCL but unfortunately it doesn't work reliably on anything.
Not familiar with CUDA development, but doesn't CUDA support C++? Skipping Clang/LLVM and going "pure" C seems quite limiting in that case.
I'm parsing the C++ features CUDA code actually uses, not the full C++ spec, as that would take a very large amount of time. The compiler itself being written in C99 is just because that's how I write my C, and is a separate thing.
What lift would be required to support runtime-delayed compilation (like shader compilation in a game/emulator) and have this be the NVRTC component inside ZLUDA?
I'm also wondering this. The compiler itself is written in C99, but looking at the tests, it can parse some C++ features such as templates.
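For the curious, the kind of C++ feature meant is something like a templated kernel (illustrative only; I haven't checked this exact code against BarraCUDA's parser):

    // One kernel source, instantiated per element type.
    template <typename T>
    __global__ void axpy(T a, const T *x, T *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // Used like: axpy<float><<<blocks, threads>>>(2.0f, d_x, d_y, n);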
Honestly, I'm not sure how good LLVM's support for AMD GFX11 machine code is. It's a pretty niche backend. Even if it exists, it may not produce ideal output. And it's a huge dependency.
Quite good; it's first-party supported by AMD (ROCm LLVM, with a lot upstreamed as well) and fairly widely used in production.
This project is a super cool hobby/toy project but ZLUDA is the “right” drop in CUDA replacement for almost any practical use case.
Real developers never depended on AI to write good-quality code; in fact, the amount of slop code flying left and right is due to LLMs.
Open-source projects are being inundated with PRs from AIs; not depending on them doesn't limit a project.
The project owner seems pretty knowledgeable about what is going on, and keeping it free of dependencies is not an easy skill. Many developers would have written the code with tons of dependencies and copy/paste from an LLM. Some call the latter coding :)
LLVM and LLM are not the same thing
LLVM (Low Level Virtual Machine) != LLM (Large Language Model)
How feasible is it for this to target earlier AMD archs down to even GFX1010, the original RDNA series aka the poorest of GPU poor?
Hey, I am actually working on making this compatible with earlier AMD cards as well, because I have an old gaming laptop with an RX 5700M, which is GFX10. I'm reading up on the ISA documentation to see where the differences are, and I'll have to adjust some binary encoding to get it to work.
I mean this with respect to the other person, though: please don't vibe-code this if you want to contribute, or keep the compiler for yourself. This isn't because I'm against using AI assistance when it makes sense; it's because LLMs will really fail in this space. There are things in the specs you won't find until you try them, and LLMs find it really hard to get things right when literal bits matter.
I really like the minimal approach you've taken here - it's refreshing to see this built completely from the ground up and it's clearly readable and for me, very educational.
But help me understand something. BarraCUDA does its own codegen and therefore has to implement its own optimisation layer? It's incredibly impressive to get "working" binaries, but will it ever become a "viable" alternative to Nvidia's CUDA if it has to re-invent decades of optimisation techniques? Is there a performance comparison between the binaries produced by this compiler and Nvidia's? Is this something you're working on as an interesting technical project, to learn from and prove that it "can be done"? Or are you trying to create something that can make CUDA a realistic option on AMD GPUs?
Don't let anyone dissuade you; it's going to be annoying, but it can be done. When diffusion was new and ROCm was still a mess, I was manually patching a lot to get a VII, 1030, then 1200 working well enough.
It's a LOT less bad than it used to be; AMD deserves serious credit. Codex should be able to crush it once you get the env going.
> No LLVM. No HIP translation layer. No "convert your CUDA to something else first."
What is the problem with such approaches?
> No HIP translation layer.
Storage capacity everywhere rejoices
Perusing the code, the translation seems quite complex.
Shout out to https://github.com/vosen/ZLUDA which is also in this space and quite popular.
I got Zluda to generally work with comfyui well enough.
This, this and this! Was really inspired by ZLUDA when I made this.
I was hoping AMD would keep making gaming cards, now that NVIDIA is an AI company. Somebody has to, right?
They have their APUs, as does Intel. I guess that, for gaming, they are adequate.
Nowadays, all you need is Vulkan 1.2 compliance and Linux to run most of Steam's library. A lot of AI-oriented hardware is usable for gaming.
At overpriced retail prices while the AI companies get their sweet deals. Hell no. Personal computing is being destroyed.
That's news to me. I just bought a 1080p gaming card for $50 a few weeks ago.
We run AI inference workloads on Nvidia GPUs, and the cost is a real pain point. Projects like this matter because GPU vendor lock-in directly affects what startups can afford to build. Would love to see how this performs on common inference ops like conv2d and attention layers.
No, please no! AMD GPUs are still somewhat affordable. Does this mean their cards become compatible with CUDA based AI software? Don't ruin the market for desktop GPUs completely, please don't. AI is costing me hundreds of extra Euros in hardware already. I hate this so much.
AMD should sponsor this. The world needs to get rid of this NVIDIA monopoly.
<checks stock market activity>
In the old days we had these kinds of wars with CPU instruction sets & extensions (SSE, MMX, x64). In a way I feel that CUDA should be opened up & generalized so that other manufacturers can use it too, the same way CPUs equalled out on most instruction sets. That way the whole world won't be beholden to one manufacturer (Big Green), and it would calm down the scarcity effect we have now. I'm not an expert on GPU tech; would this be something that is possible? Is CUDA a driver feature or a hardware feature?
Nice! It was only a matter of time until someone broke Nvidia's software moat. I hope Nvidia's lawyers don't know where you live.
This isn't a production grade effort though.
Love to see just a simple compiler in C with a Makefile, instead of some amalgamation of 5 languages, 20 libraries, and some autotools/cmake shit.
I think ChipStar is better: fewer IP issues.
Note that this targets GFX11, which is RDNA3. Great for consumer, but not the enterprise (CDNA) level at all. In other words, not a "cuda moat killer".
Hello,
I'm not the one who posted this to HN, but I am the project author. I'm working my way towards doing multiple architectures as well as more modern GPUs too. I only targeted GFX11 because I used LLVM to check my work and I have an AMD GFX11 card in my partner's desktop (which I use to test on sometimes when it's free).
If you do have access to this kind of hardware and you're willing to test my implementations on it, then I'm all ears! (You don't have to, obviously :-) )
I can give you access to MI300x. Also, reach out to Anush please. He commented on here.
Putting a registered trademark in your project's name is quite a brave choice. I hope they don't get a c&d letter when they get traction...
BarraCUDA is also a bioinformatics toolset? https://www.biocentric.nl/biocentric/nvidia-cuda-bioinformat...
Maybe a rename to Barra. Everyone will still get the pun :)
... or Baccaruda or Baba-rara-cucu-dada (https://youtu.be/2tvIVvwXieo)
Or bacaruda.
I wonder if they could change the name to Barracuda if pressed. The capitalization is all that keeps it from being a normal English word, right?
Are you thinking of Seagate Barracuda?
They mean the CUDA part
There are a lot of people in this thread who don't seem to have caught up with the fact that AMD has worked very hard on their CUDA translation layer, and for the most part it just works now; you can build CUDA projects on AMD just fine on modern hardware/software.
Nice repeat of history, given that AMD started out emphasizing x86 compatibility with Intel's CPUs. It's a good strategy. And open-sourcing it means it might be adapted to other hardware platforms too.
Also, in this world of accelerator programming, people are writing very specialized code that targets a specific architecture, datatype, and even input shape. So with that in mind, how useful is it to have a generic kernel? You still need to do all the targeted optimization to make it performant.
If you want portability you need a machine-learning compiler à la TorchInductor, tinygrad or OpenXLA.
What's the benefit of this over tinygrad?
Completely different layer; tinygrad is a library for performing specific math ops (tensor, nn), while this is a compiler for general CUDA C code.
If your needs can be expressed as tensor operations or neural network stuff that tinygrad supports, might as well use that (or one of the ten billion other higher order tensor libs).
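A rough illustration of the difference (generic CUDA, not specific to either project): a kernel with arbitrary per-thread control flow and atomics is natural to write in CUDA C, whereas a tensor library wants the same thing phrased as ops it already knows.

    /* Bin 8-bit values into a 256-entry histogram. */
    __global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicAdd(&bins[data[i]], 1u);
    }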
Will this run on cards that don't have ROCm/latest ROCm support? Because if not, it's only gonna be a tiny subset of a tiny subset of cards that this will allow CUDA to run on.
Yes. It outputs a hsaco binary that just runs on the GPU (as long as you have the driver). No ROCm needed.
Wow!! Congrats to you on launch!
Seeing insane investments (in time/effort/knowledge/frustration) like this makes me enjoy HN!!
(And there is always the hope that someone at AMD will see this and actually pay you to develop the thing.. Who knows)
I don’t understand the elitism about avoiding LLMs.
Good luck -
He's avoiding LLVM, which is a compiler framework, not LLMs, as has been stated a few times in the comments already.
Great work!
See also: https://scale-lang.com/
Write CUDA code. Run Everywhere. Your CUDA skills are now universal. SCALE compiles your unmodified applications to run natively on any accelerator, ending the nightmare of maintaining multiple codebases.