I could finally run some network tests with a 4x10G NIC on my M.2->PCIe adapter. The NIC is the common dual-82599 design behind a PCIe bridge. The bridge exposes a PCIe 3.0 x8 upstream port and converts it to two PCIe 2.0 x8 links for the network controllers. That’s perfect because it can convert the 8GT/s x4 coming from the M.2 slot to 5GT/s x8 with limited losses. We have 32Gbps of theoretical PCIe bandwidth (27G effective with the standard 128-byte max payload).
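For reference, here is the back-of-the-envelope calculation behind those numbers, as a minimal Python sketch; the ~24-byte per-TLP overhead is an assumption (the exact figure depends on header size, framing and flow-control traffic), so treat the result as approximate:

```python
# Rough check of the PCIe 3.0 x4 numbers quoted above.

lanes = 4
gt_per_lane = 8.0                    # PCIe 3.0: 8 GT/s per lane
raw = lanes * gt_per_lane            # 32 Gbps "theoretical"
line = raw * 128 / 130               # ~31.5 Gbps after 128b/130b encoding

payload = 128                        # max TLP payload (bytes)
overhead = 24                        # assumed per-TLP overhead (bytes)
effective = line * payload / (payload + overhead)

print(f"raw:       {raw:.1f} Gbps")        # 32.0
print(f"effective: {effective:.1f} Gbps")  # ~26.5, i.e. the ~27G mentioned above
```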
[edit: I forgot the mandatory setup photo]
Well, to make a long story short, that board is a serious bit mover! Please see the graphs below. I exchanged HTTP traffic between the board and 3 other devices (I needed that many to make it give up). One was my mcbin (10G) and the two others were HoneyComb LX2 (one 10G port each). There was not much point in going further, given the bus width.
I started a server (httpterm) on each board. The first tests consisted of using the Rock5B as a client only (download test). It reached a stable 26.1 Gbps over the 3 ports (measured at the ethernet layer, as reported by the interfaces), with still 24% idle CPU, indicating that I was saturating the communication channel rather than the CPU. It’s very close to the limit anyway:
Then I tested the traffic in bidirectional mode. The Rock5B had 3 clients (one per interface) and each other machine also had one client attacking the Rock5B. Result: 23.0 Gbps in each direction and 0% idle! So the CPU can almost saturate the PCIe in both directions at once while processing network traffic (and probably could with larger objects; these were only 1MB, which is almost 3000 requests/s when you think about it). There were 2.1 million packets/s in each direction. It’s visible on the graph that the Rx traffic dropped a little when the Tx traffic arrived:
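As a quick sanity check on those figures (this ignores HTTP, TCP and IP header overhead, so it is only a rough estimate):

```python
# Arithmetic behind the bidirectional numbers above.

gbps = 23.0                          # observed per-direction throughput
object_size = 1_000_000              # 1 MB objects (bytes)

req_per_s = gbps * 1e9 / (object_size * 8)
print(f"~{req_per_s:.0f} requests/s per direction")      # ~2875, "almost 3000"

pps = 2.1e6                          # observed packets/s per direction
avg_packet = gbps * 1e9 / 8 / pps
print(f"~{avg_packet:.0f} bytes per packet on average")  # ~1369, close to a 1500 MTU
```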
Next, I wanted to see what happens when the network is definitely the limit. I started a 20G bidirectional test (only two ports). It saturated both links in both directions, leaving 12% idle:
A few other points: I used all 8 CPUs during these tests, and they seem to complement each other well, as the 4 little cores alone are able to download 14.5 Gbps together, which is pretty good!
A final test (not graphed) consisted of attacking the board with multiple HTTP clients sending small requests. It saturated at 540000 requests/s, which is also excellent for such a device.
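Put differently, that rate leaves only a handful of microseconds of CPU time per request. A rough estimate, assuming the load spreads evenly over the 8 cores (which it doesn’t exactly, given the big/little split):

```python
# Per-request CPU budget at 540k requests/s, under the (simplifying)
# assumption that all 8 cores contribute equally.

req_per_s = 540_000
cores = 8

per_core = req_per_s / cores
budget_us = 1e6 / per_core
print(f"~{per_core:.0f} requests/s per core")              # 67500
print(f"~{budget_us:.1f} µs of CPU time per request/core")  # ~14.8
```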
Finally, there was nothing in dmesg during any of the tests, which is pretty reassuring and indicates that the instabilities I faced with the older Myricom NIC were related to that card itself.
Thus I’m pretty sure we’ll start to see network gear and NAS devices built around this chip for its PCIe capabilities and its memory bandwidth, which play in the same league as entry-level Xeons. That’s pretty cool.