Mainline kernel now working on ROCK 5 ITX as a server

Hi,

I’ve been annoyed for quite a long time seeing mainline hang hard on my ROCK 5 ITX during boot since 6.10-rc-something, without ever finding what the problem was, even after disabling drivers, changing options, playing with various DTBs and patches, etc. Given that I’m still not seeing distros shipping mainline for this device, I thought it was simply not yet supported.

Today, after exchanging with Heiko Stübner, who committed the DTS to mainline, he sent me his various settings and images, allowing me to test various combinations, and we could figure out the problem. It was stupid but not trivial to spot.

It turns out that the Rockchip kernels use ttyFIQ0 as the console, while mainline uses ttyS2. So if you reuse the working cmdline from the Rockchip kernel, you end up with an apparent hard hang somewhere after USB/PCIe initialization. It’s not trivial to spot when using earlycon, because a lot of messages are emitted before switching to that faulty console; when the console is not properly set, the kernel just goes silent and then panics a bit later for other reasons.
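In short, the difference boils down to the console= argument; roughly (the rest of the cmdline can stay the same):

        # Rockchip/BSP kernels expect:
        console=ttyFIQ0
        # mainline expects:
        console=ttyS2,1500000n8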

Figuring this out allowed me to retest previous versions (some 6.10 and 6.11-rc), and I found that those failed to detect SATA, NVMe, or both, because PCIe apparently wasn’t working correctly. Heiko pointed me to very recent patches he sent for 6.13 which are available here, and with which PCIe works fine, including NVMe+SATA.

In my case, the server is equipped with 4 SATA SSDs, a 10GbE NIC connected to one M.2 slot, and NVMe storage on the other M.2 (via an adapter). In addition, /boot is on the eMMC. All of this now works fine, which is super encouraging; I feel like I’ll soon be able to switch that system to production.

It’s also worth noting that the kernel’s defconfig now looks sufficient to support all of this (I just don’t use defconfig because I prefer to have storage drivers built into the kernel instead of as modules).
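For the curious, the kind of options this means switching from =m to =y looks roughly like this (exact symbol names may vary depending on the kernel version):

        CONFIG_PCIE_ROCKCHIP_DW_HOST=y
        CONFIG_BLK_DEV_NVME=y
        CONFIG_SATA_AHCI=y
        CONFIG_AHCI_DWC=y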

For more details, here’s what I’m loading in extlinux.conf:

label 612
        menu label 6.12.0-rc6-rknas-1
        linux ../6.12.0-rc6-rknas-1/Image
        #initrd ../initrd.img
        fdtdir ../dtb/
        append root=/dev/nvme0n1p2 console=ttyS2,1500000n8 debug splash loglevel=7 ro earlycon

Hoping this can help someone facing the same difficulties. Thanks to Heiko for his help, today my setup progressed a lot :wink:


I’ve battled consoles on a few different boards and yes, they’re horrible to debug :slight_smile: That’s great news, I really want a ROCK 5 ITX for my home server, but my dealbreaker is needing to use forks of the kernel/firmware.


Try this one, it works for me:

root=/dev/nvme0XXX console=tty1 earlycon=uart8250,mmio32,0xfeb50000 console=ttyS2,1500000n8 rootwait ignore_loglevel

It works with the kernel from Collabora.


You literally just described my exact target config: the 10GbE NIC on the M.2 2280 and a dual SATA-to-NGFF adapter on the 2230 to use a SATA SSD and have another port for a 5th HDD (likely a spare or parity drive), for a possible low-power, always-on NAS running a server-only distro. I didn’t plan on using the 2230 E-key for NVMe, but I did get an adapter to try it. Basically I only wanted this for SMB, maybe some NFS, and I was also going to look into iSCSI options on this setup. I’ll admit I purchased the 16GB model without doing enough research to realize that the software and firmware support for this device is still very sparse, as it is for most non-Raspberry-Pi SBCs. But to my surprise, Roobi has become an invaluable tool for testing all available images, including the 3rd-party Armbian ones; I cannot express how relieved I am not to have to deal with rkdevtool.
So far, though, I have yet to find an install and config that doesn’t present issues. First, I was unable to bring up the 10GbE NIC even though it showed up in lspci on every image I tried, so that went out first, and I replaced it with a Realtek 8126 5GbE M.2 NIC. That one seemed to work, at least in Roobi, to grab images off the net, but in the booted OS it also never showed up as an available NIC. I also tried an NGFF 2230 8125 2.5GbE NIC out of curiosity, since I know that’s the same chip used for both integrated NICs, and it acted the same way: it would work in Roobi to grab an image (with some errors that required restarting the download, basically resuming where it left off) and then was unusable in the installed environment. So I basically gave up; essentially every PCIe device I used ended up having issues, except for an NVMe M.2 on the 2280 with SPI+NVMe boot, which seemed stable enough, though there were still intermittent errors waking drives. I had also read on a few forums about ASPM issues that may be causing the PCIe problems, but I’m no hardware guy, so that analysis was completely over my head; still, I felt like a simple ASPM issue wouldn’t present itself as these peripheral issues. I don’t know.
So I swapped back in a Shenzhen-special ITX board that I picked up off AliExpress, which has an AQC113 copper 10GbE NIC but only an N100 SoC, and tries to cram in way too many port options (dual M.2 2280 slots and a bunch of USB with some Thunderbolt support), ending up unable to supply enough bandwidth for the 10GbE NIC to reach its full capabilities. But at least I could install a modern kernel and distro and have a usable low-cost appliance. Alas, it was not what I wanted for the venerable N2 case by Jonsbo: an overkill yet low-power NAS/firewall appliance... well, just because, lol. I already have all of these things as other standalone x86-based appliances (actually several, as I have a problem stopping myself from buying things I see on my feeds), but this whole ARM desktop/server experience had me intrigued. That’s why I purchased the ROCK 5 ITX, and I also have a Milk-V variant coming that is absolutely arriving with no expectations from me; it will likely just go into a cheap thin USFF case mounted on a wall and be used for tinkering, not deployed in any other capacity.
I was hoping someone smarter than me was looking into these kernel-level issues on this ITX ARM board, because it has the potential to become a very competitive alternative in the market, not just for us nerds and our insatiable homelab “needs” but for the general market, where there is real demand for low-power NAS solutions that aren’t proprietary or overpriced Intel-based boxes.
I’m so glad to see someone actually working on this, and on my next day off I will give myself a crash course on bootstrapping this 6.13 kernel. If I end up with a board that works as well as you have described here with your config, I think I’ll have a satisfactory setup to redeploy in that appliance and put the slow Intel ITX board back in its box to go onto a shelf, lol, or into some other low-cost server I’ll probably build once this project is done. I can’t stop, lol.

Also, I have an M.2-to-SFP+ 10GbE NIC coming from AliExpress that I want to try. It’s Intel-based, not another AQC NIC like these overpriced 10GbE M.2 copper ones, and if it works well, that’s the 10GbE solution I will use. I probably have misplaced high hopes, as the X520 NICs are older than the Marvell AQC ones, so we shall see. I’m also OK using the 5GbE Realtek NIC, because I didn’t expect +/- 800 Mbps read/write performance out of this HDD config. I don’t want to just reuse the same old RAID0 setup that I use on a couple of my higher-spec machines, which can actually take advantage of the roughly 1.5 Gbps read/write of an 8-HDD RAID0 array with 16TB Exos drives; instead I was going to use smaller 12TB HGST helium drives and a modest ZFS pool with some parity, which might actually stretch the legs of this RK3588, and keep the whole appliance fairly low-TDP and always available on the network, kind of being a network in itself (or at least a switch), so I can start shutting down some other appliances when they’re not being heavily used, lol.

I would also like to see some M.2 SFP+ cards with a 2x PCIe 3.0 upstream link. For now, there is no such card available (while there are many regular 4x PCIe 2.0 cards).

AQC113 is gen4x4: https://www.marvell.com/products/ethernet-adapters-and-controllers/fastlinq-edge-ethernet-controllers.html
And there are M.2 modules with it:
https://www.aliexpress.com/item/1005007499639661.html

I’m using an older AQC107, which is Gen3, and on the Rock 5 ITX it runs at Gen3 x2:

0000:01:00.0 Ethernet controller: Aquantia Corp. AQtion AQC107 NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] (rev 02)
        Subsystem: Aquantia Corp. Device 0001
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
...
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x2 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

So that’s fine :slight_smile:
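For anyone wanting to run the same check, something along these lines prints just the link capability versus what was actually negotiated (assuming the NIC sits at 01:00.0 as above):

        sudo lspci -s 01:00.0 -vv | grep -E 'LnkCap:|LnkSta:'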

Hmmm, I didn’t notice that you insisted on SFP+ (this one is RJ45); I thought your concern was more about any NIC in general. SFP+ controllers are less common and often older. Intel’s 82599 has been ubiquitous for more than 10 years and is Gen2. Maybe we’ll find new M.2 NICs based on i40e, but that chip draws a lot of power and might be a bit much for M.2.

As far as I’ve tested the connectivity side… 10G copper uses more energy and produces more heat than SFP+. Also, it’s easy to use 10G RJ45 inserts where needed, so I decided to get a fiber switch (all ports SFP+) instead of copper. Right now there are many cheap 2.5G switches with one or two SFP+ uplinks.

For m.2 cards I found only this:


1 Gbit SFP only :frowning:

And there are cards like these:
https://www.lr-link.com/product/industrialapplicationsnic/M2NIC.html

All based on old Intel or Mellanox ConnectX-3 chips with no ASPM support and a SATA power requirement. All are old and power-hungry :confused:
It would be great to get something new with better power efficiency. An 8-port SFP+ switch idles at 6W, and each SFP+ 10G port adds about 1.2W when linked, so why isn’t this possible as a modern M.2 card? :slightly_smiling_face:

The same thing is missing for laptops: there is an AQC113-to-Thunderbolt/USB4 adapter, but it’s large, with a hot heatsink. I got a really cheap (about $25!) 2x SFP+ + 4x 2.5G switch there; it would be awesome to connect it with a DAC.

Nice find! I agree with you regarding power efficiency. Twisted pair is horrible at 10G speeds while DAC over SFP is great (and low latency). All the 10G in my home lab runs over SFP direct-attach with no switch, except my PC and the next file server (Rock 5 ITX), which are on RJ45 via the LAN switch.

I think the market for M.2-to-SFP+ is so small that vendors reuse good old designs that are super stable, known to work well, and available at a very low cost. You would have more chances of finding a modern controller by looking for 25Gbps SFP+, as those controllers are generally Gen3/4 and could run over two lanes at 10G speeds. But does such an adapter exist yet? I doubt it. They’re also hard to find on search engines due to results falling back to “2.5G”. Maybe searching for SFP28 would help.

Another option is to use an M.2-to-PCIe adapter and plug in a low-profile card. But it takes way more space and usually requires a Molex power adapter.

For now SFP+ comes to home users as devices like this one:

QNAP released such a switch years ago, but it was really expensive. This one is about $25, which makes SFP+ available to everyone. An 8x 2.5G + 1x 10G SFP+ switch is about $50 now. It really makes sense to use SFP+ as the uplink, and it’s cheap right now.
I hope we will see more devices with SFP+ ports. That way, moving to 25G and 40G networks later shouldn’t require big changes.

As I mentioned, this year we have many affordable home devices with SFP+, so I hope other peripherals will follow, not only M.2 but also USB4/Thunderbolt designs. Same as with 2.5G Ethernet dongles: they were really expensive for quite a while until a new chip for them was released.

That’s still an option; in fact, those LR-Link cards are just a low-profile card with a SATA port for power and M.2 for data, in a slightly optimized design.

I generally agree with your points. But you need to think about having a spare switch. I didn’t want to go SFP+ for now because I’ve already had a switch die on me quite a few times, and in such cases I’m very happy to pick up whatever older switch I still have around to fix the network, even at a lower speed. With SFP+ connectivity, you need to replace your switch with another SFP+ one. And I know myself: even if I buy a second one as a spare, two months later it will be in prod :slight_smile:. Once SFP+ becomes more common at home (possibly thanks to such switch vendors), that’s no longer a problem.

Another thing to keep in mind is that direct-attached SFP cables are currently much more rigid than twisted pair. This may change of course, just like we’ve seen 3mm-thin twisted-pair cables appear, and we may even imagine their cost dropping for long distances. We’re not there yet, but we can at least hope.

And it’s clear that in terms of cost per port and power draw, SFP+ is way better than RJ45. It may also succeed if SFP28 becomes accessible on such switches and 25G NICs become more ubiquitous in consumer devices.

SFP+ is really cheap today, and I have some spare network gear with it. In case anything dies, I’ll be able to adapt; not all ports are in use anyway. My main switch has a failover, so no worries there, and the same goes for the router (I currently have an AX unit and an older AC one as backup, which will be upgraded to BE soon). I’m also sure that SFP+ switches are really cheap now, and I can’t say the same for 10G copper.

I just ordered some DACs and fiber cables. It’s still much cheaper than RJ45 for long distances; I paid about $8 for a 30m LC-LC OM4 cord. I don’t need more than that at home :slight_smile:

This all depends on hardware availability, prices and power consumption.
I once had a chance to get a 52x QSFP switch from work; it looked amazing on paper until I found out that it needs 650W at idle :wink: Of course that was pro-grade hardware, not intended for home use, but we need hardware that can do the job on less. My first 2.5G switch drew twice as much power as the one I upgraded to. The same goes for any new standard, and I’m not even sure how long it will be possible to keep Ethernet at a reasonable power consumption at home. 2.5G is OK, 10G doesn’t seem that efficient; what about 25G, 40G and 100G?

Honestly, all this discussion about fiber vs. copper is still a bit ahead of where I’m at, considering I have yet to get an image installed that leverages these kernel tweaks to enable any of these PCIe devices. I get both the Intel SFP+ NIC and the AQC107 M.2 NIC to show up as devices in lspci, but I don’t see any devices enumerated in a network manager at all except the two Realtek NICs, so I’m missing something more fundamental here with all the Debian images I’ve tried, including all of the Armbian ones... I’ll admit it’s all a bit over my head; I’m not used to going this low-level to configure a device tree, as I’m acclimated to UEFI-based systems and I really don’t even know where to look. And even if I figured that out, I would still need a walkthrough on recompiling the modified kernel... I know that to almost everyone here this is a trivial task, but that’s not a good-faith sample of the market; we need a distro option that already ships this 6.13 kernel so us smooth brains don’t have to do all the legwork to use this crazy-expensive SBC... Just sayin’.

If the NICs are listed in lspci, they should work (or you might be missing their modules, but usually modules for detected PCI devices are loaded automatically). You can check using “lsmod” to see if you have atlantic (for the AQC107) and probably ixgbe (the 82599 driver) or i40e for the Intel NIC. If not, you may want to try loading them manually and then check with “ip -br link” whether they appear. Normally the only reasons for them not to appear when they’re properly listed in lspci are missing modules or serious PCIe errors (which will then show up in “dmesg”). If everything is present, then it’s just a NetworkManager issue, but there I won’t be able to help; that’s getting more “windowsy” and far too complicated for me :slight_smile:
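Roughly, the checks could look like this (a sketch; exact module and interface names depend on your cards and kernel):

        lspci -nnk                                   # shows "Kernel driver in use" / "Kernel modules" per device
        lsmod | grep -E 'atlantic|ixgbe|i40e'        # is the driver loaded at all?
        sudo modprobe atlantic                       # or ixgbe / i40e, depending on the NIC
        ip -br link                                  # the new interface should appear here
        dmesg | grep -iE 'atlantic|ixgbe|i40e|pcie'  # driver probe messages or PCIe errors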