PENTA SATA KIT - Fan randomly stops causing overheating

Hi,

I recently purchased a ROCK Pi 4A+ and the corresponding PENTA SATA KIT. I got it all assembled and set up running Armbian 21.08.1 and OMV5. I ran the script for the fan and OLED display. The OLED display works flawlessly, but the fan is giving me some trouble. My rockpi-penta.conf is:

```ini
[fan]
# When the temperature is above lv0 (35’C), the fan at 25% power,
# and lv1 at 50% power, lv2 at 75% power, lv3 at 100% power.
# When the temperature is below lv0, the fan is turned off.
# You can change these values if necessary.
lv0 = 25
lv1 = 35
lv2 = 45
lv3 = 50

[key]
# You can customize the function of the key, currently available functions are
# slider: oled display next page
# switch: fan turn on/off switch
# reboot, poweroff
# If you have any good suggestions for key functions,
# please add an issue on https://setq.me/rockpi-penta
click = slider
twice = switch
press = none

[time]
# twice: maximum time between double clicking (seconds)
# press: long press time (seconds)
twice = 0.7
press = 7.0

[slider]
# Whether the oled auto display next page and the time interval (seconds)
auto = true
time = 10

[oled]
# Whether rotate the text of oled 180 degrees, whether use Fahrenheit
rotate = true
f-temp = false
```
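The comments in the [fan] section describe a simple step function from temperature to fan duty. As a rough illustration only (this is not the rockpi-penta source, and `fan_duty` is a hypothetical name), the mapping with the thresholds above might look like this:

```python
# Duty cycle (%) for levels lv0..lv3, as described in the [fan] comments.
LEVEL_DUTY = (25, 50, 75, 100)


def fan_duty(temp_c: float, thresholds: tuple[float, float, float, float]) -> int:
    """Return the fan duty (%) for a temperature, given the lv0..lv3 thresholds."""
    duty = 0  # below lv0 the fan is off
    for threshold, level_duty in zip(thresholds, LEVEL_DUTY):
        if temp_c > threshold:  # "above" the level, so strictly greater
            duty = level_duty
    return duty


# Thresholds from this config: lv0=25, lv1=35, lv2=45, lv3=50.
MY_LEVELS = (25, 35, 45, 50)
```

With these values, an idle reading of 30-40C should already give 25-50% duty, so a fan sitting at 0% while the CPU is at 50C suggests something other than the temperature thresholds is switching it off.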

While idle, the Pi runs at about 30-40C; under load it quickly jumps to about 50C. The problem is that despite these temperatures, after a while the fan slows and then stops (it actually just happened again while I was typing this). I have to manually double-press the button to start the fan back up, and if I forget or I’m out of the room, the device quickly heats up to 80+C.

What do I need to check or do to troubleshoot or fix this issue?

Thanks,
Jaysin

You indicate that you have a Rock Pi 4A+, so there is only a top-board fan, and the CPU is cooled primarily by the case bottom (a hefty aluminum heatsink) and secondarily by the air being pulled through the heat sink slots and the case holes. Did you use the heat sink pad or some grease to thermally bond the processor to the heat sink? You may not have good thermal contact, and that, fan aside, may be why it is getting that hot. Is the heat sink noticeably warm to the touch when the processor indicates it is hot?

When you see the fan slowing and stopping, does the display indicate that the fan is supposed to be running at a high speed, or does it show the same speed the fan is actually running at?

I am presuming that the fan is fairly quiet when running, and is not noisy or at all difficult to spin with a finger when it is off. This is just to rule out a bad fan.

Hey oket, I will admit I didn’t notice the thermal pad when building it. It looks like I tossed it somehow; the YouTube assembly video doesn’t mention the pad, so I didn’t think to look for one. That’s my bad. It’s almost done building a RAID, so I will report back once the RAID is built and I have the pad installed. The heatsink is warm, but not 50+C, so I think you’re right about the thermal contact.

When the fan stops (it just did again; I restarted it manually by pressing the button twice), the OLED display does not indicate anything about it. The only relevant info it shows is the CPU temp, which is sitting around 50C right now. When the fan stops, CPU temps rise very fast, but again, I don’t have the thermal pad installed.

The fan itself isn’t silent; it’s actually a little louder than I expected when running at what I assume is full speed. When it was cool, after setting it up but before the RAID build, it was a lot quieter. When I was building it I did play with the fan a bit and it was easy to spin. If the thermal pad doesn’t do it, I’ll look into replacing the fan just in case.

Strangely, it just stopped again (the second time while writing this) and started back up on its own. I’ll keep an eye on it until it’s done building, take it apart, and get that thermal pad in there (would thermal grease be better than the pad? I have some lying around), and I’ll update again.

Thanks for the help, hopefully I’m not wasting your time with my silly mistake.

I used the pad when I assembled mine rather than grease, on the assumption that a layer of grease as thick as the pad (to fill the gap) might not conduct heat as well as the pad does. The pad seemed to leave no gaps, but I don’t know how much it compressed (compression is good), since I haven’t disassembled it since I assembled it.

The stock software runs the fan against the CPU temperature. I have updated the software to show more information on the OLED (à la @RayMondDrakon) and to pick up the disk temperatures for fan control as well, since the fan does more to evacuate the disk vault than it is needed to cool the CPU.
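One simple way to fold disk temperatures into the fan decision is to drive the fan from whichever sensor is hottest. This is a minimal sketch under that assumption, not necessarily how oket's updated software does it; `read_cpu_temp` and `control_temp` are hypothetical names, and the disk temperatures are assumed to be collected elsewhere (for example from smartctl output).

```python
from pathlib import Path

# Standard Linux thermal-zone file. Zone 0 is often the SoC, but which zone
# is the CPU varies by board (assumption: check on your own system).
CPU_TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"


def read_cpu_temp(path: str = CPU_TEMP_PATH) -> float:
    """Read a thermal-zone temperature; the kernel reports millidegrees C."""
    return int(Path(path).read_text().strip()) / 1000.0


def control_temp(cpu_c: float, disk_temps_c: list[float]) -> float:
    """Return the temperature the fan curve should follow: the hottest sensor."""
    return max([cpu_c, *disk_temps_c])
```

The result of `control_temp(read_cpu_temp(), disk_temps)` would then be fed into the same lv0..lv3 step function the stock software applies to the CPU temperature alone.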

I have posted that software at: Penta SATA HAT enhanced display (oled, I/O, temp) and operation (button, fan) (though I have updated it a bit since then). It has been solidly plugging along for me.

I have both a Quad and a Penta, and actually prefer the Penta, since it does not need a dedicated CPU fan and has always-on controllers, so stopping and starting the service does not drop the controllers.

My Penta (running four 2TB Seagates in RAID 5) is the working storage; the Quad is a play system sharing single disks.

Well, I feel really stupid. The difference once the pad was installed was night and day. As soon as I got it back together, I noticed the CPU sitting cool at around 30C.

I went ahead and applied your updates to the software, and I’m very pleased with the performance and added functionality to the OLED and fan controls, so thank you greatly for that.

I am getting the gpio146 message. In your other post you suggest using a 10K resistor rather than the 1K you initially used. Would that still be the case? Do I forfeit anything major by not applying this fix? I know how to solder, but if it’s for something minor I’d rather just leave it.

My Penta is also running 4x 2TB Seagates in RAID 5, with the plan of using a 12+TB drive via eSATA for backing up my data.

Thank you again for your help. This is a huge step up from the Raspberry Pi 4B I was using with an old external USB hard drive.

Thanks very much, I am glad you liked it. It is interesting that the button worked for you and that you could turn the fan on and off without the resistor; it certainly didn’t work for me without the passive pull-up. So if it is working, and you can control the fan and other things, you don’t need the resistor.

If you can’t get the button to work, then see if the resistor fixes it for you. 10K and 1K aren’t much different in practice, since they only draw current while the button is pressed: 1K draws about 3 milliamps, 10K just 0.3 milliamps. I just grabbed a 1K and didn’t look through my parts beyond that.
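Those current figures follow from Ohm's law (I = V / R) across the pull-up while the button holds the pin at ground. The quick check below assumes a 3.3 V pull-up rail, which is an assumption rather than something stated in the thread; `pullup_current_ma` is a hypothetical helper.

```python
# Assumed supply for the pull-up resistor (3.3 V GPIO logic level).
SUPPLY_V = 3.3


def pullup_current_ma(resistance_ohms: float, supply_v: float = SUPPLY_V) -> float:
    """Current through a pull-up while the button holds the pin at 0 V, in mA."""
    return supply_v / resistance_ohms * 1000.0


for r in (1_000, 10_000):
    print(f"{r:>6} ohm -> {pullup_current_ma(r):.2f} mA")
```

This prints 3.30 mA for 1K and 0.33 mA for 10K. Either is negligible for a button that is rarely pressed; a lower value holds the line more firmly against induced noise, at the cost of slightly more current during a press.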

I’d be interested to know if yours continues to work with the button and does not need the resistor. If so I’ll take a look at code differences and see if I did anything silly.

Did the fan still exhibit its problems, or did it work without problems with the new code?

So the button is working without issue. I’ve tested each function and it works. I still see the gpio146 message when I check the status of rockpi-penta.service.

The fan however, is still stopping when it should be running. It stopped running a little bit ago (I was away for a bit and didn’t notice) and both the drives and CPU had raised to 50+C and the fan was at 0%.

It was working on some tasks in the background, which I did notice was causing a rise in temps, but the fan was running (~50-70% speed) and seemed to have it under control. I left the room, came back and the fan was off. I checked the temps and each were at 50+C. I double pressed the button and the fan kicked on at 100%.

I guess I’ll keep researching, if you know of anything I should be looking into, let me know. I may look into getting a new fan and try that just in case. Thanks again.

Are you pushing the button only twice to get the fan back on, or twice (to turn it off) and then twice again (to turn it on)?

If the OLED fan % reads 0% when you notice that the fan is not turning, and you only have to press twice to turn it back on, this suggests that the software thought the fan had been turned off explicitly, and you are only turning it back on.

You might try disabling the fan switch entry in the conf file, so that the fan cannot be turned off, and then see whether it still turns off on its own, or whether the software is picking up noise and acting as if the button had been pressed.

Change "twice = switch" to "twice = none", and then see if the fan stays on. If it does, the code is getting a signal that makes it think the button has been pressed twice. In that case you may need the resistor after all, as a reliable pull-up on the switch to stop electrical noise and false readings.
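To see why a couple of noise spikes could switch the fan off, here is a hypothetical sketch of how press timings might be classified using the [time] values (twice = 0.7, press = 7.0) and then mapped through the [key] section. It is not the actual rockpi-penta code; `classify` and `apply_actions` are illustrative names.

```python
# Timing values from the [time] section of rockpi-penta.conf.
TWICE_S = 0.7  # max gap between the two clicks of a double click
PRESS_S = 7.0  # hold time that counts as a long press


def classify(presses: list[tuple[float, float]]) -> list[str]:
    """Turn (down_time, up_time) pairs, sorted by time, into key events."""
    events = []
    i = 0
    while i < len(presses):
        down, up = presses[i]
        if up - down >= PRESS_S:
            events.append("press")
            i += 1
        elif i + 1 < len(presses) and presses[i + 1][0] - up <= TWICE_S:
            events.append("twice")  # two clicks close together
            i += 2
        else:
            events.append("click")
            i += 1
    return events


def apply_actions(events: list[str], key_map: dict[str, str], fan_on: bool = True) -> bool:
    """Replay events against the [key] mapping and return the final fan state."""
    for event in events:
        if key_map.get(event) == "switch":
            fan_on = not fan_on  # 'switch' toggles the fan on/off
    return fan_on


# Two 10 ms glitches 0.4 s apart look exactly like a double click.
noise = [(100.00, 100.01), (100.40, 100.41)]
print(classify(noise))
print(apply_actions(classify(noise), {"click": "slider", "twice": "switch"}))
print(apply_actions(classify(noise), {"click": "slider", "twice": "none"}))
```

With "twice = switch" the two glitches turn the fan off; with "twice = none" the same glitches leave it running, which is exactly what the test change is meant to expose. A lone glitch classifies as a click, which with "click = slider" would advance the OLED page early.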

I wasn’t sure why one would want to turn off the fan in the first place. It is even more questionable for the Quad hat, since it also turns off the CPU fan, and then you have a buried heat source that can’t cool itself even by convection.

I am only pressing it twice to turn it back on, I don’t have to turn it off first.

It stopped on its own about 15 minutes ago while I was watching a movie off it. I paused my film, made the change and restarted the service. Fan kicked on as expected, I’ll monitor it for the next few hours and update later.

I agree, the only thing I can see turning the fan off being useful for is if there’s a noise and you’re trying to isolate whether it’s a hard drive or the fan. Other than that, I can’t think of any reason for a manual shutoff.

Just a quick update. I’ve noticed a couple things since the change, one of which I noticed before but I thought was a software issue.

  1. My OLED display sometimes changes before the configured 4 seconds have passed (I reverted the settings to what’s in your conf while I test). I noticed this when I first had issues but thought it was a software problem. I kept an eye on it and timed it, and it seems totally random: sometimes the full 4 seconds pass, but a lot of the time it changes in under 4 seconds. So far it looks like you’re right and there are false readings.

  2. Since the change my fan has not stopped, but it’s much quieter because it’s running at a much lower speed. My CPU sits between 30-35C when idle, about 39-43C under light load, and can reach upwards of 50C under heavy load. The drives do seem to heat up a little alongside the CPU, but they sit comfortably between 40-50C when in use, so I’m not worried there.

So, given the OLED is changing before the countdown sometimes, and the fans have ceased their stopping nonsense after the conf change, I think you’re 100% correct and the system is getting either electrical noise or false readings from the switch. At this time, I don’t need the manual on/off for the fan, and I don’t mind the OLED changing like that. However, for the sake of knowing, I will look into installing a resistor for reliable pullup and see if that makes a difference, it just may not be today.
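Alongside the hardware pull-up, a software debounce can reject single-sample spikes. The sketch below is a generic debounce filter, not something from the thread's software; it assumes an active-low button (1 = released, 0 = pressed) sampled at a fixed rate, and `debounce` and `stable` are hypothetical names.

```python
def debounce(samples: list[int], stable: int = 3) -> list[int]:
    """Filter raw GPIO samples (assumed active-low: 1 = released, 0 = pressed).

    The filtered state only changes after `stable` consecutive identical raw
    samples, so a one-sample spike never reaches the output.
    """
    if not samples:
        return []
    state = samples[0]                   # filtered output level
    run_value, run_len = samples[0], 0   # current run of identical raw samples
    out = []
    for s in samples:
        if s == run_value:
            run_len += 1
        else:
            run_value, run_len = s, 1    # a new run starts
        if run_len >= stable:
            state = run_value            # run is long enough to trust
        out.append(state)
    return out
```

At a hypothetical 10 ms sampling interval, `stable = 3` means a level must persist for roughly 30 ms before it counts, which a single spike does not do but a real finger press easily does.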

Oket, thank you again for all your assistance, suggestions, and your sick update to the software.

Glad to hear it was useful. I had wondered if you might be getting single-spike noise too; the random early advance sort of clinches that.

Tom