I’m hardly surprised; this is primarily a programming discussion website, and the highest voltage read on an average day here is in mV. It’s natural to be leery of things you have no experience with.
Actually, putting each PSU on its own circuit is crazy dangerous. In the scenario you’re suggesting, if one circuit trips and the other PSU tries to carry the whole load, you are in for a fire. I highly recommend against that.
This is funny as a European, since we have many, many circuits where we regularly run 2 kW loads, and some run more. Really no issue, but I guess the lower voltage makes it a problem.
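For reference, a rough sketch of the per-circuit capacity math behind that (breaker ratings assumed as typical values, not taken from the thread):

```python
# Max wattage per branch circuit: P = V * I.
# 120 V / 15 A is a common North American circuit; 230 V / 16 A is common in Europe.
for label, volts, amps in [("US 120 V / 15 A", 120, 15),
                           ("EU 230 V / 16 A", 230, 16)]:
    print(f"{label}: {volts * amps} W")
# US 120 V / 15 A: 1800 W
# EU 230 V / 16 A: 3680 W
```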
A single 3090 won’t even fit the model. OP is talking about running a 405B model; it needs close to 200 GB of memory just to load, which is why we’re talking about the Mac Studio and its 192 GB of unified memory.
This is exactly like when the AMD fanboys got a burr up their ass about the “$50k Mac Pro” with 2 TB of memory… when you could do the same thing with a Threadripper with 256 GB of memory for $5k, and it’s just as fast in Cinebench! https://old.reddit.com/r/Amd/comments/f1a0qp/15000_mac_pro_v... Your gaming scores on your 3090 with 24 GB are just as irrelevant to this 200 GB workload as the Threadripper is to those Mac Pro workloads lol
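For the curious, a back-of-the-envelope sketch of where that ~200 GB figure comes from (weights only, quantization levels assumed; KV cache and activations are extra):

```python
# Memory footprint of a 405B-parameter model at common precisions.
params = 405e9
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:,.0f} GB")
# fp16: 810 GB
# int8: 405 GB
# int4: 202 GB  <- roughly the ~200 GB figure being discussed
```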
Yes, though for inference memory bandwidth matters more than TFLOPS. Not sure how Apple compares in that regard. The OP does train models too, though, which is more compute-heavy.
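A minimal sketch of why bandwidth is the bottleneck for single-stream decoding: each generated token has to stream (roughly) all of the model weights through memory once, so bandwidth divided by model size upper-bounds tokens/sec. The numbers below are illustrative, not measured:

```python
# Upper bound on decode speed for a memory-bandwidth-bound model.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# e.g. ~800 GB/s of unified memory bandwidth against a ~200 GB model:
print(max_tokens_per_sec(800, 200))  # -> 4.0 tokens/sec, best case
```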
It's kinda hard to believe that someone would stumble onto the landmine of AI performance comparisons between Apple Silicon and Nvidia hardware by accident. People are going to be rude because this kind of behavior is genuinely indistinguishable from bad-faith trolling. From benchmarks alone, you can easily tell that the performance-per-watt of any Mac Studio gets annihilated by a 4090: https://browser.geekbench.com/opencl-benchmarks
If Apple Silicon were in any way a more scalable, better-supported, or more ubiquitous solution, then OpenAI and the rest of the research community would use Apple's hardware instead of Nvidia's. Given Apple's very public denouncement of OpenCL and the consequences of their refusal to sign Nvidia drivers, Apple falling behind in AI is like the #1 topic in the tech sector right now. Apple Silicon for AI training is a waste of time and a headache beyond the capacity of professional, productive teams. Apple Silicon for AI inference is too slow to compete against the datacenter incumbents fielded by Nvidia and even AMD. Until Apple changes things and takes the datacenter market seriously (and doesn't just advertise that it does), this status quo will remain. Datacenters don't want to pay the Apple premium just to be treated like a traitorous sideshow.
Yes, and they get deep discounts which we don't. It can be 40% or more!
Of course the vendor can't make a profit with such discounts, so they inflate the RRP. But we do end up paying it.
Most of the cheap drives here are refurbs of questionable quality. And those Exos drives are much more expensive here, sadly, especially if you stick to legit vendors on Amazon.
| "Why PCIe Risers suck and the importance of using SAS Device Adapters, Redrivers, and Retimers for error-free PCIe connections."
I'm a believer! Can't wait to hear more about this. |
An adjacent project for 8 GPUs could convert used 4K monitors into a borderless mini-wall of pixels, for local video composition with rendered and/or AI-generated backgrounds: https://theasc.com/articles/the-mandalorian
> the heir to rear projection — a dynamic, real-time, photo-real background played back on a massive LED video wall and ceiling, which not only provided the pixel-accurate representation of exotic background content, but was also rendered with correct camera positional data… “We take objects that the art department have created and we employ photogrammetry on each item to get them into the game engine”
I have a similar setup in my basement! Although it's multiple nodes, with a total of 16x 3090s. I also needed to install a 30 A 240 V circuit.
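A quick sanity check on why a circuit that size is needed (the 80% continuous-load derating follows common US practice, and the per-GPU draw is an illustrative figure, not measured):

```python
circuit_w    = 240 * 30         # 7200 W nominal capacity
continuous_w = circuit_w * 0.8  # 5760 W usable for a continuous load
gpu_load_w   = 16 * 350         # ~5600 W if all 16 3090s pull ~350 W
print(continuous_w >= gpu_load_w)  # True, but with little headroom
```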
That last part is often overlooked. This is also why sometimes it's just not worth going local, especially if you don't need all that compute power beyond a few days.
The motherboard has 7 PCIe slots and there are 8 GPUs. So where does the eighth one connect? Is he running two GPUs off the same slot, limiting the bandwidth?
It’s an EPYC server board; it probably has actual U.2/MCIO PCIe ports on the board that can be merged back into an x16 slot in the BIOS. I had/have several boards like that.
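If anyone wants to verify what actually got negotiated after a BIOS change like that, here is a minimal sketch that walks the Linux PCI sysfs tree (Linux-only; assumes the standard `current_link_width`/`current_link_speed` attributes are exposed):

```python
import glob, os

# Print the negotiated link width and speed for every PCI device that reports one.
for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    try:
        with open(os.path.join(dev, "current_link_width")) as f:
            width = f.read().strip()
        with open(os.path.join(dev, "current_link_speed")) as f:
            speed = f.read().strip()
    except OSError:
        continue  # not every PCI function exposes link attributes
    print(f"{os.path.basename(dev)}: x{width} @ {speed}")
```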
Folding@home does not use a blockchain, further proving non-grifters don’t need one. That was the point being discussed, not distributed computing as a concept.
This is something people often say without even attempting a major AI task. If the Mac Studio were that great, they’d be sold out completely. It’s not even cost-efficient for inference.
You are probably swapping. On an M3 Max with similar memory bandwidth, the output is around 4 t/s, which is normally on par with most people's reading speed. Try different quants.
Even if you are running it constantly, the per-token power consumption is likely going to be in a similar range, not to mention you'd need 10+ Macs for the throughput.
A major reason I built this was data privacy: I do not want to hand over my private data to any company to further train their closed-weight models. And given the recent drop in output quality on different platforms (ChatGPT, Claude, etc.), I don't regret spending the money on this setup.
I was also able to do a lot of cool things with this server by leveraging tensor parallelism and batch inference: generating synthetic data and experimenting with finetuning models on my private data. I am currently building a model from scratch, mainly as a learning project, but I am also finding some cool things along the way, and if I can iron out the kinks, I might release it and write a tutorial from my notes.
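To give a flavor of the tensor-parallel batch-inference side, here is a minimal sketch assuming a vLLM-style stack (the model ID, parallelism degree, and sampling parameters are illustrative; the post doesn't specify the exact serving framework):

```python
from vllm import LLM, SamplingParams

# Shard one large model across 8 GPUs and generate for a batch of prompts.
llm = LLM(model="meta-llama/Llama-3.1-405B-Instruct",  # example model id
          tensor_parallel_size=8)
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Summarize the trade-offs of running LLMs locally.",
           "Write one synthetic Q&A pair about PCIe retimers."]
for out in llm.generate(prompts, params):  # batched in a single call
    print(out.outputs[0].text)
```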
So I finally had the time this weekend to get my blog up and running, and I am planning to follow up this post with a series of posts on my learnings and findings. I am also open to topics and ideas to experiment with on this server and write about, so feel free to shoot your shot if you have ideas you want to experiment with but don't have the hardware; I am more than willing to do that on your behalf and share the findings.
Please let me know if you have any questions. My PMs are open, and you can also reach me on any of the socials I have posted on my website.