Rock Pi 4a failure - only me? Problem solved


#1

I’m building a 48-node Pi cluster, and to get the highest reliability I’m testing a couple of board brands with my application. The application does calculations 24/7, so all cores are at 100% all the time. I have two Rock Pi 4a boards running alongside original Pi3b+ boards and two versions of the Asus Tinkerboard, 13 boards in total. The only board to fail is one of the Rock Pi 4a boards.

This setup is well cooled with heatsinks and fans. The core temp in the rock pi 4a is around 68-70 degrees celsius so that should not be any problem.

Has anyone else had a board fail?

I just ordered four more Rock Pi 4a boards to see if the failed board was just an unfortunate mishap. I really want to use this board because it’s the fastest board I’ve found so far. But if reliability is an issue I will have to look elsewhere.


#2

Interesting. How are you going to connect them? Through ethernet? I’m wondering if it’s possible to make a USB chain, as I have Rock Pi 4 and Google Coral board connected through a 6 inch USB cable, the connection may go on and on, but need more Pi 4 to try out.


#3

By failing, are you referring to a hardware defect? So you mean it was working before but isn’t any more?


#4

Yes, hardware failure. I had it running for about a month, but a couple of nights ago it died. I tried connecting a screen and keyboard, but there was no HDMI signal, and the simple test of toggling caps-lock to see if the LED on the keyboard reacts did nothing. I reflashed the emmc but it made no difference, and I tried the emmc from the working Rock Pi board, also with no difference, so I really think it’s dead for good. And yes, the green LED is on, so it has power :slight_smile:

I’ve made a backplane board with some intelligence built in. Each backplane holds 16 Pi boards, which makes it fit perfectly in a 19" rack box (40cm wide). The box has three backplane boards, one standard Pi3 used as the management board, and a 48+4 switch built in. Each backplane has a small ucontroller controlling 16 5V switches, as there is no reset in the 20pin Pi header, and it also measures the board temp. The ucontroller on each backplane talks with the management Pi through SPI, and the management board controls the fans as well. My plan is to have each Pi log its status via syslog to the management board, which also acts as a ‘watch-dog’. If my application stops responding on one Pi and can’t be restarted, or if one Pi stops responding totally, I can power-cycle the individual board that causes the trouble.
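As a rough illustration of the watch-dog idea described above, here is a minimal Python sketch of the timeout logic the management Pi could run. Everything in it is an assumption for illustration: the `Watchdog` class, the 120 second timeout, the slot numbering and the `power_cycle` callback are hypothetical names, and the actual syslog parsing and the SPI command to the backplane ucontroller are left out because the thread does not describe them.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    # Assumed value: a board that stays silent longer than this is treated as hung.
    timeout_s: float = 120.0
    # Slot number on the backplane -> time of the last status message seen.
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, slot, now=None):
        """Record that a status message arrived from the board in `slot`."""
        self.last_seen[slot] = time.monotonic() if now is None else now

    def stale_slots(self, now=None):
        """Return the slots whose last message is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def check(self, power_cycle, now=None):
        """Call power_cycle(slot) for every stale slot.

        `power_cycle` is a hypothetical callback standing in for whatever
        command toggles that slot's 5V switch. The timer is reset afterwards
        so the board gets a full timeout period to boot before the next check.
        """
        now = time.monotonic() if now is None else now
        for slot in self.stale_slots(now):
            power_cycle(slot)
            self.last_seen[slot] = now
```

In use, the syslog receiver would call `heartbeat(slot)` for each incoming message and a periodic timer would call `check(...)`. Resetting the timer after a power-cycle is one possible design choice; it avoids cycling a board again while it is still booting.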

I got the backplane boards from the circuit board maker a couple of days ago, so now I’ve got some work to do :slight_smile: It’s a very simple design with no complex software to write, and it’s only for fun. I’m not selling anything.


#5

For the defective unit, please ask your distributor for RMA. We will check what happened.


#6

Finally I got some time to look at the faulty Rock Pi. I ran it through a circuit board cleaner to be sure that dust and moist air were not causing the problem. Reflashed the old emmc again and … it’s alive :slight_smile: Dust and moisture can do strange things when the board is running without any cover.

The first backplane is up and running with 2 original Pi3b, 4 Asus Tinkerboard, 5 Asus Tinkerboard S and 5 Rock Pi 4. It has been running for three days now at 100% on all cores without any issues. Comparing calculation speeds, I get 68 calculations ready/hour with the Pi3b, 267/hour with the Tinkerboard and 342/hour with the Rock Pi 4, making the Rock Pi a clear winner on performance.
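To put those figures side by side, here is a small Python sketch that works out the relative speed of each board and a rough total throughput. It only uses the per-board rates quoted above. Two things are assumptions: the Tinkerboard S is taken to run at the same 267/hour as the Tinkerboard (the post gives a single Tinkerboard figure), and throughput is assumed to add up linearly across boards.

```python
# Calculations finished per hour, per board, as quoted in the post.
RATES = {"Pi3b": 68, "Tinkerboard": 267, "Rock Pi 4": 342}


def speedup(board, baseline="Pi3b"):
    """How many times faster `board` is than `baseline`."""
    return RATES[board] / RATES[baseline]


def total_rate(counts):
    """Combined calculations/hour for a mix of boards (assumes linear scaling)."""
    return sum(RATES[board] * n for board, n in counts.items())


# First backplane: 2 Pi3b, 4 Tinkerboard + 5 Tinkerboard S (assumed equal), 5 Rock Pi 4.
first_backplane = {"Pi3b": 2, "Tinkerboard": 9, "Rock Pi 4": 5}
# Planned cluster: the same backplane plus 32 more Rock Pi 4 boards.
planned_cluster = {"Pi3b": 2, "Tinkerboard": 9, "Rock Pi 4": 37}

for board in RATES:
    print(f"{board}: {speedup(board):.2f}x the Pi3b")
print("first backplane:", total_rate(first_backplane), "calculations/hour")
print("planned cluster:", total_rate(planned_cluster), "calculations/hour")
```

On these numbers the Rock Pi 4 is about 5 times as fast as the Pi3b and roughly 28% faster than the Tinkerboard.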

I had to modify the Rock Pi cooler to get the board density I wanted, but it’s just a minor reduction in cooling. Now I have to put the other two backplanes together so I get my 48-node ‘cluster’. I’ll have to dig deep in my wallet to buy another 32 Rock Pi 4 boards :frowning: