Did I say the NanoPCs were in a junk drawer? The original heatsinks from them were tossed in a bin, but the devices themselves are humming away in a prior version of my 5-node cluster (proudly wearing the active heatsinks FriendlyElec released later).
This is all part of a journey to find the happy place of SBC-based clusters. It started with the rpi-4, but the poor storage options became a bottleneck. I then built out a 5-node NanoPC-T4 cluster, and its NVMe storage made things worlds better - but it still stumbles because the 4GB RAM capacity proves too limiting for me, mostly due to the large-ish memory footprint Prometheus requires. So it's still running, but not doing anything useful right now.
No pictures of this one because my 3d design skills were still forming and the result is too ugly to share.
Next stop was an rpi CM4-based cluster. I wanted to use the TuringPi-2, but their Kickstarter is taking (expletive)-forever to deliver. Even if they do manage to get something out this year, the compute modules are just flat-out unavailable right now.
I also did a 6-node CM4-based cluster using the DeskPi Super6c:
This one was better: 8GB of RAM each and an NVMe drive per node. It really works well, and Kube-Prometheus-Stack runs nicely (though whichever node Prometheus lands on is still under some memory pressure). Longhorn is a bit high-latency (not sure why, but disk latency averages about 4ms with frequent jitters >50ms). The bigger issue is the lack of access to each node for troubleshooting. If any node other than node 1 faults, the only available repair is to power-cycle the whole cluster; there is no way to reset them individually (ugh!). This defeats a lot of the resiliency value of clustering. It's complicated further by the fact that the nodes seem to randomly drop their NVMe drive. No idea why this happens - it could be something about the CM4, the Super6c, or rpi Linux - but it averages one faulted node per week, and that makes it hard to commit the cluster to "real work" you might actually count on.
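If you want to sanity-check where that latency is coming from, a quick fio run against the raw NVMe mount (outside of Longhorn) is a reasonable first step. This is just a sketch - the mount path and sizes below are placeholders, not exactly what I ran:

```sh
# 4k random writes, queue depth 1, direct I/O, 30 seconds
# /mnt/nvme/fio-test.bin is a placeholder - point it at the node's NVMe mount
fio --name=nvme-latency --filename=/mnt/nvme/fio-test.bin --size=256M \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
    --runtime=30 --time_based --group_reporting
```

The completion-latency percentiles in the output are the interesting part: if the raw device is already jittery, Longhorn isn't the culprit.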
I really believe that this cluster with the Rock5s is hitting the sweet spot for me. So far it's rock-solid reliable. If I do need to work on any node, I can plug in HDMI and a keyboard from the front. For more aggressive resets I just drop power to the node, which I can even do remotely from the PoE switch. With 16GB of memory per node, the monitoring stack just fits nicely. Longhorn also performs much better, due, I think, to the combination of PCIe x4 and 2.5GbE networking.
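Since Prometheus memory keeps coming up, it's worth pinning its footprint down explicitly so the scheduler places it sensibly. Roughly, with kube-prometheus-stack that means setting the chart's prometheusSpec resources block - the numbers below are illustrative placeholders, not my exact config, and they assume the prometheus-community Helm repo is already added:

```sh
# Cap Prometheus memory so one pod can't squeeze a 16GB node
# (2Gi/3Gi and 15d retention are placeholder figures - tune to your scrape load)
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.resources.requests.memory=2Gi \
  --set prometheus.prometheusSpec.resources.limits.memory=3Gi
```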
I am really happy with this result. I may do some experiments with adding the NanoPCs and Super6c nodes in as additional compute in the future.