USB3 Ethernet Gadget Speed and Jumbo Packet Support

Hi,

I am testing the performance of the g_ether module on a CM3I with the E25 baseboard. Currently I get 20~30MBytes/sec from radxa -> host, and close to 40MBytes/sec from host -> radxa. I tested with both iperf and scp, and the results are similar.

I was actually expecting much higher transfer rates, above 100MB/s, since the underlying physical connection is USB3. Could the Radxa team suggest what is stopping me from getting higher rates?

One possible improvement I am thinking of is to use an MTU greater than 1500, aka jumbo packets. I have read this discussion and tried the values 7148 and 9000, but the performance drops to almost zero… It seems the debos and kernel don't support changing the MTU?
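The reasoning behind trying a larger MTU can be put into numbers: for the same throughput, bigger packets mean fewer packets per second for the CPU to handle. Below is a rough sketch only. It assumes plain IPv4 + TCP headers (40 bytes, no options) and ignores any aggregation or offload the drivers may do; `packets_per_second` is a made-up helper name.

```python
def packets_per_second(throughput_bytes_per_s: float, mtu: int,
                       header_bytes: int = 40) -> float:
    """Approximate packet rate needed to carry a given TCP payload rate.

    header_bytes=40 assumes IPv4 (20) + TCP (20) without options.
    """
    payload = mtu - header_bytes
    if payload <= 0:
        raise ValueError("MTU must be larger than the header size")
    return throughput_bytes_per_s / payload


# Target from the post: 100 MB/s, taken here as 100 * 10^6 bytes per second.
target = 100 * 1000 * 1000
for mtu in (1500, 7148, 9000):
    print(f"MTU {mtu}: ~{packets_per_second(target, mtu):,.0f} packets/s")
```

This only shows why a larger MTU could help in principle. It says nothing about whether the gadget driver on this kernel handles larger frames correctly.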

Is there a chance to get > 100MB/s ethernet transmission on the E25 USB3 OTG port? Thanks!

Similarly, I have also tested the transmission speed on the native Ethernet port using iperf / scp. The speed seems to be only about 10MB/s or so. Why is it also much slower than expected?

Correction: the issue was the Ethernet cable. It was a CAT5 cable. I changed to a thicker one, and now the Ethernet port is operating at full speed. The bandwidth is now 112MB/s, which is close to the theoretical 125MB/s.
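For reference, here is a back-of-the-envelope estimate of how much TCP payload Gigabit Ethernet can carry at MTU 1500. It assumes standard Ethernet framing overhead (38 bytes per frame on the wire), IPv4 + TCP headers without options other than TCP timestamps, and that the 112MB/s figure was reported in 1024-based units as iperf3 does. None of this was stated in the post, so treat it as a sketch.

```python
LINE_RATE_BITS_PER_S = 1_000_000_000  # Gigabit Ethernet
ETH_OVERHEAD = 38        # preamble+SFD 8, header 14, FCS 4, inter-frame gap 12
IP_TCP_HEADERS = 40      # IPv4 20 + TCP 20
TCP_TIMESTAMP_OPTION = 12


def tcp_goodput_bytes_per_s(mtu: int = 1500, timestamps: bool = True) -> float:
    """Upper bound for TCP payload throughput on a saturated gigabit link."""
    payload = mtu - IP_TCP_HEADERS - (TCP_TIMESTAMP_OPTION if timestamps else 0)
    bytes_on_wire = mtu + ETH_OVERHEAD
    return LINE_RATE_BITS_PER_S / 8 * payload / bytes_on_wire


goodput = tcp_goodput_bytes_per_s()
print(f"{goodput / 1e6:.1f} MB/s (10^6 bytes)")
print(f"{goodput / 2**20:.1f} MiB/s (2^20 bytes)")
```

Under these assumptions the ceiling is about 117.7 MB/s, or roughly 112 MiB/s, so the measured value would already be at the practical limit of the link.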

Can you check the output of lsusb -vvv on your Linux host? Maybe it’s running in USB 2.0 mode?

It seems to be USB3 to me, on the host:

:~$ lsusb
Bus 004 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB Ethernet/RNDIS Gadget
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 008: ID 27c6:5395 Shenzhen Goodix Technology Co.,Ltd. Fingerprint Reader
Bus 001 Device 010: ID 8087:0025 Intel Corp. 
Bus 001 Device 009: ID 0c45:671d Microdia Integrated_Webcam_HD
Bus 001 Device 011: ID 056e:00e4 Elecom Co., Ltd ELECOM BlueLED Mouse
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
:~$ lsusb -t
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Wireless, Driver=rndis_host, 5000M
    |__ Port 2: Dev 2, If 1, Class=CDC Data, Driver=rndis_host, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 1: Dev 11, If 1, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 1: Dev 11, If 2, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 1: Dev 11, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 4: Dev 10, If 1, Class=Wireless, Driver=btusb, 12M
    |__ Port 4: Dev 10, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 7: Dev 8, If 0, Class=Communications, Driver=, 12M
    |__ Port 7: Dev 8, If 1, Class=CDC Data, Driver=, 12M
    |__ Port 12: Dev 9, If 1, Class=Video, Driver=uvcvideo, 480M
    |__ Port 12: Dev 9, If 0, Class=Video, Driver=uvcvideo, 480M

and

Bus 002 Device 002: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB Ethernet/RNDIS Gadget
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.20
  bDeviceClass            2 Communications
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         9
  idVendor           0x0525 Netchip Technology, Inc.
  idProduct          0xa4a2 Linux-USB Ethernet/RNDIS Gadget
  bcdDevice            4.19
  iManufacturer           1 Linux 4.19.193-58-rockchip-gbac1feba87f0 with dwc3-gadget
  iProduct                2 RNDIS/Ethernet Gadget
  iSerial                 0 
  bNumConfigurations      2
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x005d
    bNumInterfaces          2
    bConfigurationValue     2
    iConfiguration          0 
    bmAttributes         0xc0
      Self Powered
    MaxPower              496mA
    Interface Association:
      bLength                 8
      bDescriptorType        11
      bFirstInterface         0
      bInterfaceCount         2
      bFunctionClass        224 Wireless
      bFunctionSubClass       1 Radio Frequency
      bFunctionProtocol       3 RNDIS
      iFunction               6 RNDIS
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass       224 Wireless
      bInterfaceSubClass      1 Radio Frequency
      bInterfaceProtocol      3 RNDIS
      iInterface              4 RNDIS Communications Control
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 01
      ** UNRECOGNIZED:  04 24 02 00
      ** UNRECOGNIZED:  05 24 06 00 01
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval               9
        bMaxBurst               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              5 RNDIS Ethernet Data
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x002c
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xc0
      Self Powered
    MaxPower              496mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         2 Communications
      bInterfaceSubClass     12 Ethernet Emulation
      bInterfaceProtocol      7 Ethernet Emulation (EEM)
      iInterface              8 CDC Ethernet Emulation Model (EEM)
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength       0x0016
  bNumDeviceCaps          2
  USB 2.0 Extension Device Capability:
    bLength                 7
    bDescriptorType        16
    bDevCapabilityType      2
    bmAttributes   0x00000006
      BESL Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x00
    wSpeedsSupported   0x000f
      Device can operate at Low Speed (1Mbps)
      Device can operate at Full Speed (12Mbps)
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   1
      Lowest fully-functional device speed is Full Speed (12Mbps)
    bU1DevExitLat           1 micro seconds
    bU2DevExitLat         500 micro seconds
Device Status:     0x000c
  (Bus Powered)
  U1 Enabled
  U2 Enabled

Some more tests with iperf3, as suggested here:

rock@radxa-e25:~$ iperf3 -c 10.0.0.12 -f M
Connecting to host 10.0.0.12, port 5201
[  5] local 10.0.0.1 port 58238 connected to 10.0.0.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  29.9 MBytes  29.9 MBytes/sec    0    266 KBytes       
[  5]   1.00-2.00   sec  28.4 MBytes  28.4 MBytes/sec    0    280 KBytes       
[  5]   2.00-3.00   sec  28.2 MBytes  28.2 MBytes/sec    0    280 KBytes       
[  5]   3.00-4.00   sec  29.0 MBytes  29.0 MBytes/sec    0    280 KBytes       
[  5]   4.00-5.00   sec  28.7 MBytes  28.7 MBytes/sec    0    293 KBytes       
[  5]   5.00-6.00   sec  30.6 MBytes  30.5 MBytes/sec    0    293 KBytes       
[  5]   6.00-7.00   sec  29.5 MBytes  29.6 MBytes/sec    0    293 KBytes       
[  5]   7.00-8.00   sec  29.2 MBytes  29.2 MBytes/sec    0    293 KBytes       
[  5]   8.00-9.00   sec  29.3 MBytes  29.3 MBytes/sec    0    293 KBytes       
[  5]   9.00-10.00  sec  30.1 MBytes  30.1 MBytes/sec    0    433 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   293 MBytes  29.3 MBytes/sec    0             sender
[  5]   0.00-10.05  sec   290 MBytes  28.9 MBytes/sec                  receiver

iperf Done.
rock@radxa-e25:~$ lsusb -t
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-platform/1p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/1p, 480M


and it seems the USB device detected on the host PC has changed a bit from the previous post…

:~$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    |__ Port 2: Dev 3, If 0, Class=Wireless, Driver=rndis_host, 5000M
    |__ Port 2: Dev 3, If 1, Class=CDC Data, Driver=rndis_host, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 1: Dev 11, If 1, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 1: Dev 11, If 2, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 1: Dev 11, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 4: Dev 10, If 1, Class=Wireless, Driver=btusb, 12M
    |__ Port 4: Dev 10, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 7: Dev 8, If 0, Class=Communications, Driver=, 12M
    |__ Port 7: Dev 8, If 1, Class=CDC Data, Driver=, 12M
    |__ Port 12: Dev 9, If 1, Class=Video, Driver=uvcvideo, 480M
    |__ Port 12: Dev 9, If 0, Class=Video, Driver=uvcvideo, 480M

Have you already searched for the obvious bottlenecks via atop and htop (one A55 maxing out, either because the driver is horrible or because cpufreq scaling leaves the A55 at lower clockspeeds)?

Interesting comment. I just checked htop and atop

with iperf


(I realised that led.sh was occupying CPU too; I killed it afterwards.)

with scp


This indicates a horrible kernel driver, I guess?

What do you mean by "cpufreq scaling ending up with the A55 staying on lower clockspeeds"?

Maybe just bad settings in OS images as it’s pretty common in this SBC world. :slight_smile:

To identify the cause, please run two more tests (ofc not using scp, since the ciphers dynamically negotiated between both hosts add an unknown amount of extra CPU utilization for encryption).

First, try this please: taskset -c 3 iperf3 -c 10.0.0.15 -f M (this moves the iperf3 task away from cpu0, which currently acts as an artificial bottleneck).

Even if atop reports the A55 being at 2.0 GHz, to really check what happens with CPU clockspeeds either run sbc-bench -m 2 in another shell in parallel (sbc-bench) or switch to the performance governor using one of these commands (no idea which one is the correct one, since I have no idea which kernel you’re running):

echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance | sudo tee /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

Then please recheck with taskset -c 3 iperf3 -c 10.0.0.15 -f M. If the numbers are now higher than before, then broken cpufreq scaling (settings) might be an issue as well.
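To confirm which governor is actually active after those echo commands, the cpufreq sysfs files can be read back. A minimal sketch, assuming the kernel exposes the usual `policy*` directories under `/sys/devices/system/cpu/cpufreq` (the second path above); `read_cpufreq_policies` is just an illustrative helper name.

```python
from pathlib import Path


def read_cpufreq_policies(base="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: {attribute: value or None}} for each cpufreq policy."""
    wanted = ("scaling_governor", "scaling_cur_freq",
              "scaling_max_freq", "related_cpus")
    result = {}
    for policy in sorted(Path(base).glob("policy*")):
        entry = {}
        for name in wanted:
            try:
                entry[name] = (policy / name).read_text().strip()
            except OSError:
                entry[name] = None  # attribute missing or not readable
        result[policy.name] = entry
    return result


# Usage on the board (frequencies are reported in kHz):
#   for name, info in read_cpufreq_policies().items():
#       print(name, info)
```

If the dictionary comes back empty, this kernel probably uses a different sysfs layout and the per-CPU path from the first command is the one to check.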

Thanks!

First I tried taskset -c 3 iperf3 -c 10.0.0.15 -f M

The result doesn't change compared to running iperf3 -c 10.0.0.15 -f M directly.

Second, I switched cpu0 to the performance governor

and ran sbc-bench in parallel while executing iperf3:

rock@radxa-e25:~$ sudo ./sbc-bench.sh -m 2
Rockchip RK3568, Kernel: aarch64, Userland: arm64
CPU sysfs topology (clusters, cpufreq members, clockspeeds)
                 cpufreq   min    max
 CPU    cluster  policy   speed  speed   core type
  0        0        0      408    1992   Cortex-A55 / r2p0
  1        0        0      408    1992   Cortex-A55 / r2p0
  2        0        0      408    1992   Cortex-A55 / r2p0
  3        0        0      408    1992   Cortex-A55 / r2p0

Thermal source: /sys/devices/virtual/thermal/thermal_zone0/ (soc-thermal)

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
08:55:25: 1992MHz  0.16  18%  11%   4%   0%   0%   2%  50.0°C
08:55:27: 1992MHz  0.16   1%   0%   0%   0%   0%   0%  48.9°C
08:55:30: 1992MHz  0.16   1%   0%   0%   0%   0%   0%  49.4°C
08:55:32: 1992MHz  0.15   1%   0%   0%   0%   0%   0%  48.9°C
08:55:34: 1992MHz  0.15   9%   4%   0%   0%   0%   4%  50.0°C
08:55:36: 1992MHz  0.30  28%  11%   0%   0%   0%  16%  50.6°C
08:55:38: 1992MHz  0.30  28%  11%   0%   0%   0%  17%  51.2°C
08:55:40: 1992MHz  0.30  29%  12%   0%   0%   0%  16%  51.2°C
08:55:42: 1992MHz  0.51  29%  12%   0%   0%   0%  16%  51.9°C
08:55:44: 1992MHz  0.51  14%   6%   0%   0%   0%   6%  50.6°C
08:55:46: 1992MHz  0.47   1%   0%   0%   0%   0%   0%  50.0°C
08:55:49: 1992MHz  0.47   1%   1%   0%   0%   0%   0%  50.0°C
08:55:51: 1992MHz  0.43   1%   1%   0%   0%   0%   0%  49.4°C
08:55:53: 1992MHz  0.43   1%   1%   0%   0%   0%   0%  49.4°C
08:55:55: 1992MHz  0.43   1%   0%   0%   0%   0%   0%  49.4°C
08:55:57: 1992MHz  0.40   1%   0%   0%   0%   0%   0%  49.4°C
08:55:59: 1992MHz  0.40  10%   4%   0%   0%   0%   5%  50.6°C
08:56:01: 1992MHz  0.53  28%  11%   0%   0%   0%  16%  51.2°C
08:56:03: 1992MHz  0.53  29%  12%   0%   0%   0%  15%  51.2°C
08:56:05: 1992MHz  0.65  29%  13%   0%   0%   0%  15%  51.9°C
08:56:08: 1992MHz  0.65  29%  13%   0%   0%   0%  15%  51.9°C
08:56:10: 1992MHz  0.65  12%   5%   0%   0%   0%   6%  50.6°C
08:56:12: 1992MHz  0.59   1%   1%   0%   0%   0%   0%  50.0°C
08:56:14: 1992MHz  0.59   1%   0%   0%   0%   0%   0%  50.0°C
08:56:16: 1992MHz  0.55   1%   0%   0%   0%   0%   0%  50.0°C

The first run is with taskset, and the second one runs iperf3 directly. The numbers don't seem to change.

Btw, if I run ethtool usb0 on the host PC side, I get this:

Settings for usb0:
	Supported ports: [ ]
	Supported link modes:   Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Half
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	MDI-X: Unknown
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

Not sure why it says twisted pair, and why the speed is unknown.

On ARM there is only one cpufreq policy per cluster, so cpu0 adjusts all four A55 (which isn’t a bad thing). Anyway, it’s not cpufreq scaling; a single A55 is still the bottleneck, since %sys and %irq are both handled there.
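The sbc-bench numbers above are averaged over all four cores, so one saturated core shows up as only about 25-29% overall. To see the per-core picture, two snapshots of /proc/stat can be compared. This is a sketch assuming the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for per-core lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # guest time is already counted in user/nice
        stats[fields[0]] = (total - idle, total)
    return stats


def per_core_busy_percent(before, after):
    """Percentage of non-idle time per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for cpu, (busy, total) in second.items():
        if cpu not in first:
            continue
        d_busy, d_total = busy - first[cpu][0], total - first[cpu][1]
        result[cpu] = 100.0 * d_busy / d_total if d_total > 0 else 0.0
    return result


# Usage: read /proc/stat, run iperf3 for a few seconds, read it again,
# then pass both strings to per_core_busy_percent().
```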

Can you please make a diff between /proc/interrupts contents before/after another iperf3 run? The lines with significant differences are the interesting ones (to send this specific interrupt later to another CPU).

I see what you are suggesting here: basically, sharing the load of the USB gadget work across different CPU cores. This doesn't sound too good to me, as I actually need those cores to do quite some data processing before sending the data out.

Do you think the CPU usage for this device gadget function is abnormally high?

This is the diff btw:

diff before.txt after.txt 
2,4c2,4
<   5:      73918      50389      52377      48588     GICv3  26 Level     arch_timer
<   6:      36899      43159      32243      27454     GICv3 141 Level     rk_timer
<   7:        165          0          0          0     GICv3 260 Level     arm-pmu
---
>   5:      76946      50409      52676      48593     GICv3  26 Level     arch_timer
>   6:      36964      43327      32520      27523     GICv3 141 Level     rk_timer
>   7:        176          0          0          0     GICv3 260 Level     arm-pmu
6c6
<   9:          0          0         91          0     GICv3 262 Level     arm-pmu
---
>   9:          0          0         93          0     GICv3 262 Level     arm-pmu
41c41
<  56:      14363          0          0          0     GICv3  51 Level     mmc0
---
>  56:      14383          0          0          0     GICv3  51 Level     mmc0
56c56
< 100:    1826603          0          0          0     GICv3 201 Level     dwc3
---
> 100:    2060974          0          0          0     GICv3 201 Level     dwc3
61c61
< 122:       1867          0          0          0   ITS-MSI 524288 Edge      eth0-0
---
> 122:       1977          0          0          0   ITS-MSI 524288 Edge      eth0-0
77c77
< 139:       1405          0          0          0   ITS-MSI 524304 Edge      eth0-16
---
> 139:       1469          0          0          0   ITS-MSI 524304 Edge      eth0-16
125c125
< IPI0:     64359      72571      58520      56244       Rescheduling interrupts
---
> IPI0:     64425      74141      59267      56350       Rescheduling interrupts
129,130c129,130
< IPI4:     13989      18990      19003      20510       Timer broadcast interrupts
< IPI5:     12517       8016       6993       6893       IRQ work interrupts
---
> IPI4:     13997      19011      19020      20518       Timer broadcast interrupts
> IPI5:     12639       8016       6993       6893       IRQ work interrupts
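Instead of eyeballing the diff, the two snapshots can be compared programmatically and sorted by how much each interrupt grew. A small sketch, assuming before.txt and after.txt are complete copies of /proc/interrupts including the CPU header row; the function names are made up for illustration.

```python
def parse_interrupts(text):
    """Return {irq_label: (per_cpu_counts, description)} from /proc/interrupts."""
    lines = text.splitlines()
    ncpu = len(lines[0].split()) if lines else 0  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        table[label.strip()] = (counts, " ".join(tokens[len(counts):]))
    return table


def interrupt_deltas(before, after):
    """List (total_delta, label, description, per_cpu_delta), largest first."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    deltas = []
    for label, (counts_after, name) in new.items():
        counts_before = old.get(label, ([0] * len(counts_after), name))[0]
        per_cpu = [a - b for a, b in zip(counts_after, counts_before)]
        if any(per_cpu):
            deltas.append((sum(per_cpu), label, name, per_cpu))
    return sorted(deltas, reverse=True)


# Usage:
#   rows = interrupt_deltas(open("before.txt").read(), open("after.txt").read())
#   for total, label, name, per_cpu in rows[:5]:
#       print(total, label, name, per_cpu)
```

With the numbers posted above, the dwc3 line dominates: its count grows by 234371 (from 1826603 to 2060974), and all of that lands on CPU0.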

More or less. One last test… as root, please do this (assuming the network interface name on the E25 is usb0; adapt as needed):

echo 3 >/proc/irq/$(awk -F":" "/dwc3/ {print \$1}" </proc/interrupts | sed 's/\ //g')/smp_affinity_list
echo 7 >/sys/class/net/usb0/queues/rx-0/rps_cpus

Then run the iperf3 test again (also with -R for the reverse direction).

Thanks again!
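Note that the two files written above use different formats: smp_affinity_list takes a list of CPU numbers (so 3 means CPU 3), while rps_cpus takes a hexadecimal CPU bitmask (so 7 means CPUs 0, 1 and 2). A small helper for converting between the two; the function names are illustrative only.

```python
def cpus_to_mask(cpus):
    """Convert CPU numbers, e.g. [0, 1, 2], to a hex bitmask string, e.g. '7'."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Convert a hex bitmask string (commas allowed) to a list of CPU numbers."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]


print(cpus_to_mask([0, 1, 2]))  # the value written to rps_cpus above
print(mask_to_cpus("7"))
```

So the commands above move the dwc3 interrupt to CPU 3 and let CPUs 0-2 take part in receive packet steering for usb0.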

This time there seems to be an improvement. Two observations:

  1. The 100% load now appears on CPU 3 instead of CPU 0, but that core is still fully utilized
  2. The improvement shows up in the radxa -> host direction, from ~40 to ~48MB/s
rock@radxa-e25:~$ iperf3 -c 10.0.0.19 -f M
Connecting to host 10.0.0.19, port 5201
[  5] local 10.0.0.1 port 39970 connected to 10.0.0.19 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  50.8 MBytes  50.8 MBytes/sec    0    373 KBytes       
[  5]   1.00-2.00   sec  48.5 MBytes  48.5 MBytes/sec    0    393 KBytes       
[  5]   2.00-3.00   sec  48.5 MBytes  48.5 MBytes/sec    0    413 KBytes       
[  5]   3.00-4.00   sec  48.3 MBytes  48.3 MBytes/sec    0    413 KBytes       
[  5]   4.00-5.00   sec  48.5 MBytes  48.5 MBytes/sec    0    413 KBytes       
[  5]   5.00-6.00   sec  48.2 MBytes  48.2 MBytes/sec    0    413 KBytes       
[  5]   6.00-7.00   sec  48.0 MBytes  48.0 MBytes/sec    0    413 KBytes       
[  5]   7.00-8.00   sec  48.2 MBytes  48.2 MBytes/sec    0    413 KBytes       
[  5]   8.00-9.00   sec  49.2 MBytes  49.2 MBytes/sec    0    433 KBytes       
[  5]   9.00-10.00  sec  47.9 MBytes  47.9 MBytes/sec    0    433 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   486 MBytes  48.6 MBytes/sec    0             sender
[  5]   0.00-10.04  sec   484 MBytes  48.2 MBytes/sec                  receiver

iperf Done.
rock@radxa-e25:~$ iperf3 -c 10.0.0.19 -f M -R
Connecting to host 10.0.0.19, port 5201
Reverse mode, remote host 10.0.0.19 is sending
[  5] local 10.0.0.1 port 39974 connected to 10.0.0.19 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  71.9 MBytes  71.9 MBytes/sec                  
[  5]   1.00-2.00   sec  72.0 MBytes  72.0 MBytes/sec                  
[  5]   2.00-3.00   sec  72.1 MBytes  72.1 MBytes/sec                  
[  5]   3.00-4.00   sec  73.1 MBytes  73.1 MBytes/sec                  
[  5]   4.00-5.00   sec  72.4 MBytes  72.4 MBytes/sec                  
[  5]   5.00-6.00   sec  72.2 MBytes  72.2 MBytes/sec                  
[  5]   6.00-7.00   sec  72.3 MBytes  72.3 MBytes/sec                  
[  5]   7.00-8.00   sec  72.7 MBytes  72.7 MBytes/sec                  
[  5]   8.00-9.00   sec  72.7 MBytes  72.7 MBytes/sec                  
[  5]   9.00-10.00  sec  72.2 MBytes  72.2 MBytes/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   725 MBytes  72.3 MBytes/sec    0             sender
[  5]   0.00-10.00  sec   724 MBytes  72.4 MBytes/sec                  receiver

iperf Done.
rock@radxa-e25:~$ 

Below are the results without your echo changes:

rock@radxa-e25:~$ iperf3 -c 10.0.0.16 -f M
Connecting to host 10.0.0.16, port 5201
[  5] local 10.0.0.1 port 54712 connected to 10.0.0.16 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  42.1 MBytes  42.1 MBytes/sec    0    240 KBytes       
[  5]   1.00-2.00   sec  41.0 MBytes  41.0 MBytes/sec    0    266 KBytes       
[  5]   2.00-3.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   3.00-4.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   4.00-5.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   5.00-6.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   6.00-7.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   7.00-8.00   sec  40.8 MBytes  40.8 MBytes/sec    0    266 KBytes       
[  5]   8.00-9.00   sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
[  5]   9.00-10.00  sec  40.3 MBytes  40.3 MBytes/sec    0    266 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   406 MBytes  40.6 MBytes/sec    0             sender
[  5]   0.00-10.04  sec   405 MBytes  40.3 MBytes/sec                  receiver

iperf Done.
rock@radxa-e25:~$ iperf3 -c 10.0.0.16 -f M -R
Connecting to host 10.0.0.16, port 5201
Reverse mode, remote host 10.0.0.16 is sending
[  5] local 10.0.0.1 port 54716 connected to 10.0.0.16 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  72.3 MBytes  72.3 MBytes/sec                  
[  5]   1.00-2.00   sec  72.3 MBytes  72.3 MBytes/sec                  
[  5]   2.00-3.00   sec  72.3 MBytes  72.3 MBytes/sec                  
[  5]   3.00-4.00   sec  72.5 MBytes  72.5 MBytes/sec                  
[  5]   4.00-5.00   sec  72.6 MBytes  72.6 MBytes/sec                  
[  5]   5.00-6.00   sec  72.5 MBytes  72.5 MBytes/sec                  
[  5]   6.00-7.00   sec  71.9 MBytes  72.0 MBytes/sec                  
[  5]   7.00-8.00   sec  72.5 MBytes  72.5 MBytes/sec                  
[  5]   8.00-9.00   sec  72.2 MBytes  72.2 MBytes/sec                  
[  5]   9.00-10.00  sec  72.3 MBytes  72.3 MBytes/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   725 MBytes  72.2 MBytes/sec    0             sender
[  5]   0.00-10.00  sec   723 MBytes  72.3 MBytes/sec                  receiver

iperf Done.
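To compare runs like the ones above without reading the tables by hand, the sender/receiver summary lines can be pulled out of the iperf3 text output. A sketch that assumes the output layout shown in this thread; the regular expression and helper names are my own.

```python
import re

# Matches the final summary lines, e.g.
# "[  5]   0.00-10.00  sec   486 MBytes  48.6 MBytes/sec    0   sender"
SUMMARY_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+"
    r"[\d.]+\s+\w+\s+"
    r"(?P<rate>[\d.]+)\s+(?P<unit>\S+/sec)"
    r"(?:\s+\d+)?\s+(?P<role>sender|receiver)"
)


def summary_rates(output):
    """Return {'sender': (rate, unit), 'receiver': (rate, unit)} if present."""
    rates = {}
    for line in output.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            rates[match.group("role")] = (float(match.group("rate")),
                                          match.group("unit"))
    return rates


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Sender rates from the two radxa -> host runs above (without / with the tweak):
print(f"{percent_change(40.6, 48.6):.1f}% faster with the IRQ/RPS change")
```

For the runs posted here that is roughly a 20% gain in the radxa -> host direction, while the reverse direction stays at about 72 MBytes/sec in both cases.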

Well, then this seems to be the limit with your setup: vendor BSP kernel 4.19 (forward ported for ages from 2.6 on) and a single A55 somewhere at 1.9-2.0 GHz (those Rockchip SoCs implement PVTM, so if you have a ‘weak’ silicon variant the upper clockspeeds will be cut. If interested, simply run sbc-bench to get the real clockspeeds).

Aside from running a recent mainline kernel in the hope of a better gadget driver (most probably not possible with the E25 due to a lacking device tree?), more capable hardware is most probably needed to exceed this USB3 gadget performance. Radxa has a Rock 5A in the pipeline (RK3588s with four A55 + another four A76 that should be twice as fast here), but I haven’t heard of it for months and also have no clue whether its USB OTG is SuperSpeed or only Hi-Speed…

I have also just tested iperf3 on the CM3 at hand (USB2 OTG). It has comparable speed, but CPU 0 also shows 100% usage. Hmm, I didn’t know the USB gadget uses that much CPU?

rock@radxa-cm3-io:~$ iperf3 -c 10.0.0.16 -f M
Connecting to host 10.0.0.16, port 5201
[  5] local 10.0.0.1 port 57414 connected to 10.0.0.16 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  38.2 MBytes  38.2 MBytes/sec    0    232 KBytes       
[  5]   1.00-2.00   sec  36.8 MBytes  36.8 MBytes/sec    0    232 KBytes       
[  5]   2.00-3.00   sec  37.3 MBytes  37.3 MBytes/sec    0    246 KBytes       
[  5]   3.00-4.00   sec  37.3 MBytes  37.3 MBytes/sec    0    246 KBytes       
[  5]   4.00-5.00   sec  37.3 MBytes  37.3 MBytes/sec    0    246 KBytes       
[  5]   5.00-6.00   sec  37.3 MBytes  37.3 MBytes/sec    0    246 KBytes       
[  5]   6.00-7.00   sec  37.3 MBytes  37.3 MBytes/sec    0    246 KBytes       
[  5]   7.00-8.00   sec  37.5 MBytes  37.5 MBytes/sec    0    280 KBytes       
[  5]   8.00-9.00   sec  37.3 MBytes  37.3 MBytes/sec    0    280 KBytes       
[  5]   9.00-10.00  sec  36.8 MBytes  36.8 MBytes/sec    0    280 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   373 MBytes  37.3 MBytes/sec    0             sender
[  5]   0.00-10.05  sec   372 MBytes  37.0 MBytes/sec                  receiver

iperf Done.
rock@radxa-cm3-io:~$ iperf3 -c 10.0.0.16 -f M -R
Connecting to host 10.0.0.16, port 5201
Reverse mode, remote host 10.0.0.16 is sending
[  5] local 10.0.0.1 port 57418 connected to 10.0.0.16 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  26.1 MBytes  26.1 MBytes/sec                  
[  5]   1.00-2.00   sec  25.9 MBytes  25.9 MBytes/sec                  
[  5]   2.00-3.00   sec  26.1 MBytes  26.1 MBytes/sec                  
[  5]   3.00-4.00   sec  25.9 MBytes  25.9 MBytes/sec                  
[  5]   4.00-5.00   sec  26.3 MBytes  26.3 MBytes/sec                  
[  5]   5.00-6.00   sec  26.1 MBytes  26.1 MBytes/sec                  
[  5]   6.00-7.00   sec  25.9 MBytes  25.9 MBytes/sec                  
[  5]   7.00-8.00   sec  25.9 MBytes  25.9 MBytes/sec                  
[  5]   8.00-9.00   sec  26.0 MBytes  26.0 MBytes/sec                  
[  5]   9.00-10.00  sec  26.0 MBytes  26.0 MBytes/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   261 MBytes  26.0 MBytes/sec    0             sender
[  5]   0.00-10.00  sec   260 MBytes  26.0 MBytes/sec                  receiver

iperf Done.

It doesn’t need to be like this. I just remembered that I tested with an RPi Zero 2 W some time ago: obviously the driver situation at the other end of the USB cable also matters, and CPU utilization wasn’t that high (but ofc there are many differences: kernel, Hi-Speed vs. SuperSpeed, dwc2 vs. dwc3 and so on).

Nice doc! It is interesting that in your case device to host takes much less CPU than the other way. Looks like there are many variables to factor in.

Hi,
On the E-25, we used the USB 3.0 interface and an A-to-A data cable for testing. The following are the test results:

root@radxa-e25:/home/rock# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.0.1.2, port 47728
[  5] local 10.0.1.1 port 5201 connected to 10.0.1.2 port 47732
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  74.5 MBytes   625 Mbits/sec                  
[  5]   1.00-2.00   sec  78.2 MBytes   656 Mbits/sec                  
[  5]   2.00-3.00   sec  77.8 MBytes   653 Mbits/sec                  
[  5]   3.00-4.00   sec  79.0 MBytes   662 Mbits/sec                  
[  5]   4.00-5.00   sec  78.5 MBytes   659 Mbits/sec                  
[  5]   5.00-6.00   sec  78.7 MBytes   660 Mbits/sec                  
[  5]   6.00-7.00   sec  78.4 MBytes   658 Mbits/sec                  
[  5]   7.00-8.00   sec  78.7 MBytes   660 Mbits/sec                  
[  5]   8.00-9.00   sec  78.4 MBytes   658 Mbits/sec                  
[  5]   9.00-10.00  sec  78.9 MBytes   662 Mbits/sec                  
[  5]  10.00-10.04  sec  2.98 MBytes   663 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec   784 MBytes   655 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.0.1.2, port 45846
[  5] local 10.0.1.1 port 5201 connected to 10.0.1.2 port 45858
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  32.7 MBytes   274 Mbits/sec    0    290 KBytes       
[  5]   1.00-2.00   sec  32.5 MBytes   273 Mbits/sec    0    290 KBytes       
[  5]   2.00-3.00   sec  32.0 MBytes   269 Mbits/sec    0    290 KBytes       
[  5]   3.00-4.00   sec  32.7 MBytes   275 Mbits/sec    0    305 KBytes       
[  5]   4.00-5.00   sec  33.1 MBytes   277 Mbits/sec    0    368 KBytes       
[  5]   5.00-6.00   sec  33.1 MBytes   277 Mbits/sec    0    383 KBytes       
[  5]   6.00-7.00   sec  32.6 MBytes   274 Mbits/sec    0    402 KBytes       
[  5]   7.00-8.00   sec  33.2 MBytes   278 Mbits/sec    0    441 KBytes       
[  5]   8.00-9.00   sec  32.1 MBytes   269 Mbits/sec    0    461 KBytes       
[  5]   9.00-10.00  sec  31.3 MBytes   263 Mbits/sec    0    461 KBytes       
[  5]  10.00-10.04  sec   954 KBytes   218 Mbits/sec    0    461 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   326 MBytes   273 Mbits/sec    0             sender
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
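These last results are in Mbits/sec while the earlier runs used -f M (MBytes/sec), so they need converting before comparing. iperf3 appears to use 1024-based units for bytes and 1000-based units for bits, which matches the output above (74.5 MBytes in one second is shown as 625 Mbits/sec). A minimal conversion sketch with illustrative function names:

```python
MIB = 1024 * 1024  # iperf3 "MBytes" appear to be 2^20 bytes


def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert iperf3 MBytes/sec (1024-based) to Mbits/sec (1000-based)."""
    return mbytes_per_s * MIB * 8 / 1e6


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert iperf3 Mbits/sec (1000-based) to MBytes/sec (1024-based)."""
    return mbits_per_s * 1e6 / 8 / MIB


# The two summary lines from the E-25 acting as iperf3 server:
print(f"655 Mbits/sec = {mbits_to_mbytes(655):.1f} MBytes/sec (E-25 receiving)")
print(f"273 Mbits/sec = {mbits_to_mbytes(273):.1f} MBytes/sec (E-25 sending)")
```

That works out to roughly 78 MBytes/sec into the E-25 and about 32.5 MBytes/sec out of it, in the same range as the numbers measured earlier in the thread.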