Orion O6 Debug Party Invitation

For now we are using the development key provided in the CIX EDK2 source code to sign the output. I personally believe it should be available later.

You can also look here: https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html

  1. GRUB issues, maybe caused by the SetVariable runtime service not working. Copying debian/grubaa64.efi to boot/BOOTAA64.EFI works; BOOTAA64.EFI is the default EFI application in EDK2. There is also an issue with the SetTime runtime service, which causes a kernel error. My guess is that maybe no runtime service works at all on my board.
  2. For the RTL8126, I found driver source code from OpenWrt. I built it myself, and it works fine with the official Debian images:
    https://github.com/openwrt/rtl8126

https://www.realtek.com/Download/List?cate_id=584
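The GRUB workaround in item 1 relies on the UEFI removable-media fallback path: firmware can start \EFI\BOOT\BOOTAA64.EFI without a Boot#### NVRAM entry, which presumably sidesteps the broken SetVariable service. A minimal sketch of that copy, assuming the EFI System Partition is mounted somewhere you pass in (for example /boot/efi, which is an assumption, not something stated in the post):

```python
"""Sketch: install GRUB as the UEFI removable-media fallback loader.

Assumption: Debian's GRUB image lives at EFI/debian/grubaa64.efi on the
ESP, and the ESP mount point is passed in by the caller.
"""
import shutil
from pathlib import Path


def install_fallback_loader(esp: Path) -> Path:
    """Copy EFI/debian/grubaa64.efi to EFI/BOOT/BOOTAA64.EFI under `esp`.

    BOOTAA64.EFI is the default loader path for AArch64, so firmware can
    start it without a Boot#### variable having been written via
    SetVariable.
    """
    src = esp / "EFI" / "debian" / "grubaa64.efi"
    dst = esp / "EFI" / "BOOT" / "BOOTAA64.EFI"
    if not src.is_file():
        raise FileNotFoundError(f"GRUB image not found: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # overwrites any previous fallback loader
    return dst
```

On a real board this would be called as root, e.g. `install_fallback_loader(Path("/boot/efi"))`; the FAT filesystem on the ESP is case-insensitive, so the exact capitalization of BOOT does not matter there.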

Run my armcpuinfo.efi in the EFI Shell. It will show the ID registers and explain most of them.

Just run “mode 100 31” first, because the output is aligned for a 100-column terminal.

OK, I can check this weekend. Care to share a direct link, please?

https://github.com/hrw/edk2-armcpuinfo/releases/tag/v1.3.1 has binary.


Thank you, finally I could run it remotely since my serial port is still attached :wink:

Here it comes:

$ cat bootterm-20250206-083258.log
FS0:\EFI\> ARMCPUINFO.EFI
ArmCpuInfo v1.3.1

ID_AA64AFR0_EL1  = 0x0000000000000000
ID_AA64AFR1_EL1  = 0x0000000000000000
ID_AA64DFR0_EL1  = 0x100F11F310305719
ID_AA64DFR1_EL1  = 0x0000000000000000
ID_AA64FPFR0_EL1 = 0x0000000000000000
ID_AA64ISAR0_EL1 = 0x0221111110212120
ID_AA64ISAR1_EL1 = 0x0111111100211002
ID_AA64ISAR2_EL1 = 0x0000000001005102
ID_AA64ISAR3_EL1 = 0x0000000000000000
ID_AA64MMFR0_EL1 = 0x2100022200101122
ID_AA64MMFR1_EL1 = 0x1001111010312122
ID_AA64MMFR2_EL1 = 0x1221011110101011
ID_AA64MMFR3_EL1 = 0x0000000000000000
ID_AA64MMFR4_EL1 = 0x0000000000000000
ID_AA64PFR0_EL1  = 0x1201111123111111
ID_AA64PFR1_EL1  = 0x0000000000010321
ID_AA64PFR2_EL1  = 0x0000000000000000
ID_AA64SMFR0_EL1 = 0x0000000000000000
ID_AA64ZFR0_EL1  = 0x0000110100110021

Reg   | Name         |  Bits | Value | Feature
------|--------------|-------|-------|----------------------------------------------
MMFR0 | ECV          | 63:60 |  0010 | FEAT_ECV implemented with extras.
MMFR0 | FGT          | 59:56 |  0001 | FEAT_FGT implemented.
MMFR0 | ExS          | 47:44 |  0000 | FEAT_ExS not implemented.
MMFR0 | TGran4       | 31:28 |  0000 |  4KB granule supported.
MMFR0 | TGran4_2     | 43:40 |  0010 |  4KB granule supported at stage 2.
MMFR0 | TGran16      | 23:20 |  0001 | 16KB granule supported.
MMFR0 | TGran16_2    | 35:32 |  0010 | 16KB granule supported at stage 2.
MMFR0 | TGran64      | 27:24 |  0000 | 64KB granule supported.
MMFR0 | TGran64_2    | 39:36 |  0010 | 64KB granule supported at stage 2.
MMFR0 | SNSMem       | 15:12 |  0001 | Supports a distinction between Secure and Non-Secure Memory.
MMFR0 | BigEnd       | 11:8  |  0001 | Mixed-endian support.
MMFR0 | BigEndEL0    | 19:16 |  0000 | No mixed-endian support at EL0.
MMFR0 | ASIDBits     |  7:4  |  0010 | ASID: 16 Bits
MMFR0 | PARange      |  3:0  |  0010 | 40 Bits (1TB) of physical address range supported.
------|--------------|-------|-------|----------------------------------------------
MMFR1 | ECBHB        | 63:60 |  0001 | FEAT_ECBHB implemented.
MMFR1 | CMOW         | 59:56 |  0000 | FEAT_CMOW not implemented.
MMFR1 | TIDCP1       | 55:52 |  0000 | FEAT_TIDCP1 not implemented
MMFR1 | nTLBPA       | 51:48 |  0001 | FEAT_nTLBPA implemented.
MMFR1 | AFP          | 47:44 |  0001 | FEAT_AFP implemented.
MMFR1 | HCX          | 43:40 |  0001 | FEAT_HCX implemented.
MMFR1 | ETS          | 39:36 |  0001 | FEAT_ETS implemented.
MMFR1 | TWED         | 35:32 |  0000 | FEAT_TWED not implemented.
MMFR1 | XNX          | 31:28 |  0001 | FEAT_XNX implemented.
MMFR1 | SpecSEI      | 27:24 |  0000 | The PE never generates an SError interrupt due to an 
      |              |       |       | External abort on a speculative read.
MMFR1 | PAN          | 23:20 |  0011 | FEAT_PAN3 implemented.
MMFR1 | LO           | 19:16 |  0001 | FEAT_LOR implemented.
MMFR1 | HPDS         | 15:12 |  0010 | FEAT_HPDS2 implemented.
MMFR1 | VH           | 11:8  |  0001 | FEAT_VHE implemented.
MMFR1 | VMIDBits     |  7:4  |  0010 | FEAT_VMID16 implemented.
MMFR1 | HAFDBS       |  3:0  |  0010 | FEAT_HAFDBS implemented with dirty status support.
------|--------------|-------|-------|----------------------------------------------
MMFR2 | E0PD         | 63:60 |  0001 | FEAT_E0PD implemented.
MMFR2 | EVT          | 59:56 |  0010 | FEAT_EVT: HCR_EL2.{TTLBOS, TTLSBIS, TOCU, TICAB, TID4} 
      |              |       |       | traps.
MMFR2 | BBM          | 55:52 |  0010 | FEAT_BBM: Level 2 support for changing block size.
MMFR2 | TTL          | 51:48 |  0001 | FEAT_TTL implemented.
MMFR2 | FWB          | 43:40 |  0001 | FEAT_S2FWB implemented.
MMFR2 | IDS          | 39:36 |  0001 | FEAT_IDST implemented.
MMFR2 | AT           | 35:32 |  0001 | FEAT_LSE2 implemented.
MMFR2 | ST           | 31:28 |  0001 | FEAT_TTST implemented.
MMFR2 | NV           | 27:24 |  0000 | FEAT_NV not implemented.
MMFR2 | CCIDX        | 23:20 |  0001 | FEAT_CCIDX implemented.
MMFR2 | VARange      | 19:16 |  0000 | FEAT_LVA not implemented.
MMFR2 | IESB         | 15:12 |  0001 | FEAT_IESB implemented.
MMFR2 | LSM          | 11:8  |  0000 | FEAT_LSMAOC not implemented.
MMFR2 | UAO          |  7:4  |  0001 | FEAT_UAO implemented.
MMFR2 | CnP          |  3:0  |  0001 | FEAT_TTCNP implemented.
------|--------------|-------|-------|----------------------------------------------
PFR0  | CSV3         | 63:60 |  0001 | FEAT_CSV3 implemented.
PFR0  | CSV2         | 59:56 |  0010 | FEAT_CSV2_2 implemented.
PFR0  | RME          | 55:52 |  0000 | FEAT_RME not implemented
PFR0  | DIT          | 51:48 |  0001 | FEAT_DIT implemented.
PFR0  | AMU          | 47:44 |  0001 | FEAT_AMUv1 implemented.
PFR0  | MPAM         | 43:40 |  0001 | FEAT_MPAM v1.1 implemented.
PFR0  | SEL2         | 39:36 |  0001 | Secure EL2 implemented.
PFR0  | SVE          | 35:32 |  0001 | FEAT_SVE implemented.
PFR0  | RAS          | 31:28 |  0010 | FEAT_RASv1p1 implemented. FEAT_DoubleFault implemented.
PFR0  | GIC          | 27:24 |  0011 | System registers to versions 4.1 of GIC CPU implemented.
PFR0  | AdvSIMD      | 23:20 |  0001 | Advanced SIMD with half precision support (FEAT_FP16).
PFR0  | FP           | 19:16 |  0001 | Floating-point with half-precision support (FEAT_FP16).
PFR0  | EL3          | 15:12 |  0001 | EL3 in AArch64 only
PFR0  | EL2          | 11:8  |  0001 | EL2 in AArch64 only
PFR0  | EL1          |  7:4  |  0001 | EL1 in AArch64 only
PFR0  | EL0          |  3:0  |  0001 | EL0 in AArch64 only
------|--------------|-------|-------|----------------------------------------------
PFR1  | PFAR         | 63:60 |  0000 | FEAT_PFAR not implemented.
PFR1  | DF2          | 59:56 |  0000 | FEAT_DoubleFault2 not implemented.
PFR1  | MTEX         | 55:52 |  0000 | Canonical Tag checking and Memory tagging with Address 
      |              |       |       | tagging disabled are not supported.
PFR1  | THE          | 51:48 |  0000 | FEAT_THE not implemented.
PFR1  | GCS          | 47:44 |  0000 | FEAT_GCS not implemented.
PFR1  | MTE_frac     | 43:40 |  0000 | FEAT_MTE_ASYNC implemented.
PFR1  | NMI          | 39:36 |  0000 | FEAT_NMI not implemented.
PFR1  | RNDR_trap    | 31:28 |  0000 | FEAT_RNG_TRAP not implemented.
PFR1  | SME          | 27:24 |  0000 | FEAT_SME not implemented.
PFR1  | MTE          | 11:8  |  0011 | FEAT_MTE3 implemented.
PFR1  | SSBS         |  7:4  |  0010 | FEAT_SSBS2 implemented.
PFR1  | BT           |  3:0  |  0001 | FEAT_BTI implemented.
------|--------------|-------|-------|----------------------------------------------
ISAR0 | RNDR         | 63:60 |  0000 | FEAT_RNG not implemented.
ISAR0 | TLB          | 59:56 |  0010 | FEAT_TLBIRANGE implemented.
ISAR0 | TS           | 55:52 |  0010 | FEAT_FlagM2 implemented.
ISAR0 | FHM          | 51:48 |  0001 | FEAT_FHM implemented.
ISAR0 | DP           | 47:44 |  0001 | FEAT_DotProd implemented.
ISAR0 | SM4          | 43:40 |  0001 | FEAT_SM4 implemented.
ISAR0 | SM3          | 39:36 |  0001 | FEAT_SM3 implemented.
ISAR0 | SHA3         | 35:32 |  0001 | FEAT_SHA3 implemented.
ISAR0 | RDM          | 31:28 |  0001 | FEAT_RDM implemented.
ISAR0 | TME          | 27:24 |  0000 | TME instructions not implemented.
ISAR0 | Atomic       | 23:20 |  0010 | FEAT_LSE implemented.
ISAR0 | CRC32        | 19:16 |  0001 | CRC32 instructions implemented.
ISAR0 | SHA2         | 15:12 |  0010 | FEAT_SHA512 implemented.
ISAR0 | SHA1         | 11:8  |  0001 | FEAT_SHA1 implemented.
ISAR0 | AES          |  7:4  |  0010 | FEAT_AES and FEAT_PMULL implemented.
------|--------------|-------|-------|----------------------------------------------
ISAR1 | LS64         | 63:60 |  0000 | FEAT_LS64 not implemented.
ISAR1 | XS           | 59:56 |  0001 | FEAT_XS implemented.
ISAR1 | I8MM         | 55:52 |  0001 | FEAT_I8MM implemented.
ISAR1 | DGH          | 51:48 |  0001 | FEAT_DGH implemented.
ISAR1 | BF16         | 47:44 |  0001 | FEAT_BF16 implemented.
ISAR1 | SPECRES      | 43:40 |  0001 | FEAT_SPECRES implemented.
ISAR1 | SB           | 39:36 |  0001 | FEAT_SB implemented.
ISAR1 | FRINTTS      | 35:32 |  0001 | FEAT_FRINTTS implemented.
ISAR1 | GPI          | 31:28 |  0000 | FEAT_PACIMP not implemented.
ISAR1 | GPA          | 27:24 |  0000 | FEAT_PACQARMA5 not implemented.
ISAR1 | LRCPC        | 23:20 |  0010 | FEAT_LRCPC2 implemented.
ISAR1 | FCMA         | 19:16 |  0001 | FEAT_FCMA implemented.
ISAR1 | JSCVT        | 15:12 |  0001 | FEAT_JSCVT implemented.
ISAR1 | API          | 11:8  |  0000 | Address Authentication (API) not implemented.
ISAR1 | APA          |  7:4  |  0000 | Address Authentication (APA) not implemented.
ISAR1 | DPB          |  3:0  |  0010 | FEAT_DPB2 implemented.
------|--------------|-------|-------|----------------------------------------------
ISAR2 | ATS1A        | 63:60 |  0000 | Address Translate Stage 1 instructions without Permissions 
      |              |       |       | Checks are not implemented.
ISAR2 | LUT          | 59:56 |  0000 | FEAT_LUT not implemented.
ISAR2 | CSSC         | 55:52 |  0000 | FEAT_CSSC not implemented.
ISAR2 | RPRFM        | 51:48 |  0000 | FEAT_RPRFM not implemented.
ISAR2 | PRFMSLC      | 43:40 |  0000 | FEAT_PRFMSLC not implemented.
ISAR2 | SYSINSTR_128 | 39:36 |  0000 | FEAT_SYSINSTR128 not implemented.
ISAR2 | SYSREG_128   | 35:32 |  0000 | FEAT_SYSREG128 not implemented.
ISAR2 | CLRBHB       | 31:28 |  0000 | FEAT_CLRBHB not implemented.
ISAR2 | PAC_frac     | 27:24 |  0001 | FEAT_CONSTPACFIELD implemented.
ISAR2 | BC           | 23:20 |  0000 | FEAT_HBC not implemented.
ISAR2 | MOPS         | 19:16 |  0000 | FEAT_MOPS not implemented.
ISAR2 | APA3         | 15:12 |  0101 | FEAT_FPACCOMBINE implemented.
ISAR2 | GPA3         | 11:8  |  0001 | FEAT_PACQARMA3 implemented.
ISAR2 | RPRES        |  7:4  |  0000 | FEAT_RPRES not implemented.
ISAR2 | WFxT         |  3:0  |  0010 | FEAT_WFxT implemented.
------|--------------|-------|-------|----------------------------------------------
DFR0  | HPMN0        | 63:60 |  0001 | FEAT_HPMN0 implemented.
DFR0  | ExtTrcBuff   | 59:56 |  0000 | Trace Buffer External Mode not implemented.
DFR0  | BRBE         | 55:52 |  0000 | FEAT_BRBE not implemented.
DFR0  | MTPMU        | 51:48 |  1111 | FEAT_MTPMU not implemented.
DFR0  | TraceBuffer  | 47:44 |  0001 | FEAT_TRBE implemented.
DFR0  | TraceFilt    | 43:40 |  0001 | FEAT_TRF implemented.
DFR0  | DoubleLock   | 39:36 |  1111 | FEAT_DoubleLock not implemented.
DFR0  | PMSVer       | 35:32 |  0011 | FEAT_SPEv1p2 implemented.
DFR0  | CTX_CMPs     | 31:28 |  0001 | Number of breakpoints that are context-aware, minus 1.
DFR0  | SEBEP        | 27:24 |  0000 | FEAT_SEBEP not implemented.
DFR0  | WRPs         | 23:20 |  0011 | Number of watchpoints, minus 1.
DFR0  | PMSS         | 19:16 |  0000 | FEAT_PMUv2_SS not implemented.
DFR0  | BRPs         | 15:12 |  0101 | Number of breakpoints, minus 1.
DFR0  | PMUVer       | 11:8  |  0111 | FEAT_PMUv3p7 implemented.
DFR0  | TraceVer     |  7:4  |  0001 | Trace unit System registers implemented.
DFR0  | DebugVer     |  3:0  |  1001 | FEAT_Debugv8p4 implemented.
------|--------------|-------|-------|----------------------------------------------
ZFR0  | F64MM        | 59:56 |  0000 | FEAT_F64MM SVE not implemented
ZFR0  | F32MM        | 55:52 |  0000 | FEAT_F32MM SVE not implemented
ZFR0  | I8MM         | 47:44 |  0001 | FEAT_I8MM SVE implemented.
ZFR0  | SM4          | 43:40 |  0001 | FEAT_SVE_SM4 implemented.
ZFR0  | SHA3         | 35:32 |  0001 | FEAT_SVE_SHA3 implemented.
ZFR0  | B16B16       | 27:24 |  0000 | FEAT_SVE_B16B16 not implemented.
ZFR0  | BF16         | 23:20 |  0001 | FEAT_BF16 SVE implemented.
ZFR0  | BitPerm      | 19:16 |  0001 | FEAT_SVE_BitPerm implemented.
ZFR0  | AES          |  7:4  |  0010 | FEAT_SVE_AES and FEAT_SVE_PMULL128 implemented.
ZFR0  | SVEver       |  3:0  |  0001 | FEAT_SVE2 implemented.
------|--------------|-------|-------|----------------------------------------------
FS0:\EFI\>
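Each row of the table is just a 4-bit slice of one of the raw 64-bit values printed at the top. A minimal sketch (a hypothetical helper, not part of ArmCpuInfo) that extracts a few of those fields from the logged values, including ISAR0.RNDR, which reads 0000 (no FEAT_RNG):

```python
"""Sketch: decode 4-bit ID register fields from the ArmCpuInfo dump.

Register values are copied from the log; bit positions come from the
Bits column of the tool's table.
"""

ID_AA64ISAR0_EL1 = 0x0221111110212120
ID_AA64MMFR0_EL1 = 0x2100022200101122
ID_AA64PFR0_EL1 = 0x1201111123111111


def field(reg: int, hi: int, lo: int) -> int:
    """Return bits hi:lo (inclusive) of a register value."""
    width = hi - lo + 1
    return (reg >> lo) & ((1 << width) - 1)


# (register value, label, high bit, low bit)
FIELDS = [
    (ID_AA64ISAR0_EL1, "ISAR0.RNDR", 63, 60),
    (ID_AA64ISAR0_EL1, "ISAR0.Atomic", 23, 20),
    (ID_AA64ISAR0_EL1, "ISAR0.AES", 7, 4),
    (ID_AA64MMFR0_EL1, "MMFR0.PARange", 3, 0),
    (ID_AA64PFR0_EL1, "PFR0.SVE", 35, 32),
    (ID_AA64PFR0_EL1, "PFR0.GIC", 27, 24),
]

for reg, name, hi, lo in FIELDS:
    # Print in the same 4-digit binary form as the Value column
    print(f"{name:14} {hi:2}:{lo:<2} = {field(reg, hi, lo):04b}")
```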

Thanks.

Nice to see someone else using my tool. It shows what is and what is not implemented.

From a quick look, I see that RNG is not there. Well, so be it.


@geerlingguy Which USB PD power supply did you use? Another user reported the same issue in the WeChat group: with the 65W PD power supply from Lenovo the O6 powers off correctly, but with the 140W Legion PD power supply it reboots instead of powering off.

I was originally testing with an Apple 61W USB-C PD supply, but have switched to testing with an Anker Nano II 65W GaN supply (which I haven’t yet tested with a boot/shutdown loop… I will try that later).

It looks like CIX pushed the CPU frequency up to 3.0 GHz on their EVBs. This brings quite a performance improvement. https://browser.geekbench.com/v6/cpu/10375547


It’s 2.9 GHz (Geekbench is notoriously bad at reporting correct numbers), and this might be the CP8180 and not the CD8180 SBC version we get on the O6 :slight_smile:


What’s the difference?

Between the CD8180 and the CP8180? All I know is that different variants exist, and I suspect the CP8180 gets higher clocks (at the price of much higher peak / fully-loaded power consumption).

“more flexible on the voltage/frequency” implies that the CD8180 found in the O6 has more potential than the CP8180.


I could finally gather all the results of the quick 100 Gbps tests I ran with HAProxy this week. I first plugged in a Mellanox ConnectX-5 dual-100G NIC. It was properly recognized and negotiated an x8 link. The problem is that the card is a bit old now and limited to PCIe Gen3, so the tests were severely limited, and I replaced it. Here’s a photo and the output of lspci for the device:

0001:c1:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
        Subsystem: Mellanox Technologies Mellanox ConnectX-5 MCX516A-CCAT [15b3:0007]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 136
        Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 60300000 [disabled] [size=1M]
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn+
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [48] Vital Product Data
                Product Name: CX516A - ConnectX-5 QSFP28
                Read-only fields:
                        [PN] Part number: MCX516A-CCAT    
                        [EC] Engineering changes: A8
                        [V2] Vendor specific: MCX516A-CCAT    
                        [SN] Serial number: MT1913K00843          
                        [V3] Vendor specific: 7a98d2353b50e911800098039bcc0f74
                        [VA] Vendor specific: MLX:MODL=CX516A:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0
                        [V0] Vendor specific: PCIeGen3 x16
                        [RV] Reserved: checksum good, 2 byte(s) reserved
                End
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [1c0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [230 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

Next, I replaced it with an Intel E810 (dual 100G as well):


It consumes more than the Mellanox at idle: power jumped by 3 W (from 24.8 W to 27.8 W). This time it negotiates PCIe Gen4, so the theoretical bus limit is 126 Gbps (16 GT/s * 128/130 * 8 lanes). Here’s the lspci output:

0001:c1:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-C for QSFP [8086:1592] (rev 02)
        Subsystem: Intel Corporation Ethernet Network Adapter E810-C-Q2 [8086:0002]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 136
        Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=32M]
        Region 3: Memory at 1806000000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at 60300000 [virtual] [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable+ Count=1024 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00008000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x8 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [e0] Vital Product Data
                Product Name: Intel(R) Ethernet Network Adapter E810-CQDA2
                Read-only fields:
                        [V1] Vendor specific: Intel(R) Ethernet Network Adapter E810-CQDA2
                        [PN] Part number: K91258-006
                        [SN] Serial number: B49691B3CA78
                        [V2] Vendor specific: 0621
                        [RV] Reserved: checksum good, 1 byte(s) reserved
                End
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [150 v1] Device Serial Number b4-96-91-ff-ff-b3-ca-78
        Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 128, Total VFs: 128, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 256, stride: 1, Device ID: 1889
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 0000001804000000 (64-bit, prefetchable)
                Region 3: Memory at 0000001806020000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [1a0 v1] Transaction Processing Hints
                Device specific mode supported
                No steering table available
        Capabilities: [1b0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [1d0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [200 v1] Data Link Feature <?>
        Capabilities: [210 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [250 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: ice
        Kernel modules: ice
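The bus limits in these tests follow directly from the negotiated link state in the two lspci dumps (LnkSta: 8GT/s x8 for the ConnectX-5, 16GT/s x8 for the E810). A minimal sketch of that arithmetic, assuming 128b/130b encoding (used by Gen3 and Gen4) and ignoring all packet-level protocol overhead:

```python
"""Sketch: raw per-direction PCIe bandwidth for a 128b/130b link.

Ignores TLP/DLLP overhead, so achievable payload throughput is lower.
"""


def pcie_link_gbps(gt_per_s: float, lanes: int) -> float:
    """Per-direction bandwidth in Gbit/s after 128b/130b encoding."""
    return gt_per_s * 128 / 130 * lanes


gen3_x8 = pcie_link_gbps(8.0, 8)    # ConnectX-5 as negotiated on the O6
gen4_x8 = pcie_link_gbps(16.0, 8)   # E810 as negotiated on the O6

print(f"Gen3 x8: {gen3_x8:.1f} Gbps")  # about 63 Gbps
print(f"Gen4 x8: {gen4_x8:.1f} Gbps")  # about 126 Gbps
```

The Gen3 figure shows why the ConnectX-5 could not get anywhere near 100G here: even before protocol overhead, an x8 Gen3 link carries only about 63 Gbps per direction.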

The tuning was not trivial at all; I tried many combinations of queue counts, core bindings, etc. In the end, what worked best was:

  • when mostly sending traffic (i.e. data coming from the cache):
    • using only the 4 biggest cores for the NIC
    • using the 4 medium cores for haproxy
  • when forwarding traffic (rx then tx):
    • using all A720 cores for IRQs
    • using all A720 cores for haproxy as well
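Splitting cores between NIC interrupts and haproxy like this is usually done by pinning each queue's IRQ through /proc/irq/<N>/smp_affinity_list (or the hex mask in smp_affinity). A minimal sketch that only plans and prints those writes; the IRQ numbers and core IDs are hypothetical placeholders, so read the real ones from /proc/interrupts and lscpu:

```python
"""Sketch: spread NIC queue IRQs round-robin over a chosen set of cores.

Hypothetical values: IRQ numbers and core IDs are placeholders. The
script only prints the /proc writes one would perform as root.
"""
from typing import Dict, Sequence


def cpu_mask_hex(cpus: Sequence[int]) -> str:
    """Hex bitmask in the format used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def plan_irq_affinity(irqs: Sequence[int], cpus: Sequence[int]) -> Dict[int, int]:
    """Pin each IRQ to one CPU, cycling through `cpus` in order."""
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}


NIC_IRQS = list(range(140, 148))  # placeholder: 8 NIC queue IRQs
NIC_CORES = [8, 9, 10, 11]        # placeholder IDs for the "biggest" cores

for irq, cpu in plan_irq_affinity(NIC_IRQS, NIC_CORES).items():
    print(f"echo {cpu} > /proc/irq/{irq}/smp_affinity_list")

# Equivalent single mask if all NIC IRQs may float over the same cores
print("mask for NIC cores:", cpu_mask_hex(NIC_CORES))
```

On the haproxy side, the process can be kept on the other cores with HAProxy's `cpu-map` directive or with `taskset`. Note that irqbalance, if running, may rewrite these affinities.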

The thing is, the E810 has always been quite hard to tune CPU-wise: its Rx path is very expensive and can quickly end up in polling mode with a few ksoftirqd threads spinning. So under heavy Rx traffic, it’s better to spread the load over all 8 cores even if that means competing with userland, because that’s still the best way to avoid triggering polling mode.

By doing this, I could reach:

  • 73-78 Gbps of HTTP responses retrieved from the cache
  • 37 Gbps of HTTPS responses retrieved from the cache
  • 18.2 Gbps of forwarded HTTP traffic
  • 15.5 Gbps of forwarded HTTPS traffic

For the cache (4 cores haproxy, 4 cores NIC):

The CPU is not fully used (~15-20% idle in HTTP), which points to saturation on the PCIe side. That’s not surprising: we’re dealing with 6.5 Mpps, and E810 descriptors are large, which costs quite a lot of extra PCIe bandwidth and transactions. Also, the MaxPayload is only 128 bytes, which is very small for large transfers like this. All of this adds significant overhead, so passing 80G at the network level for 126G on the PCIe side seems reasonable to me. Here’s a graph of one of these short tests:
[Graph: cache test]
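The 6.5 Mpps and the PCIe overhead argument can be checked with back-of-the-envelope arithmetic. The E810 advertises MaxPayload 512 bytes in DevCap but runs with 128 bytes in DevCtl, so every TLP carries at most 128 bytes of data. A rough sketch, assuming ~1500-byte packets and ~24 bytes of per-TLP overhead on a Gen3/Gen4 link (4-byte framing token, 16-byte header for 64-bit addressing, 4-byte LCRC, no ECRC); these are my assumptions, not measurements from the post:

```python
"""Sketch: rough check of packet rate and PCIe payload efficiency.

Assumptions: ~1500-byte packets and ~24 bytes of overhead per TLP.
DLLPs, descriptor fetches and read requests are ignored, so real
efficiency is lower than printed here.
"""


def packets_per_second(gbps: float, packet_bytes: int) -> float:
    """Packet rate for a given bit rate and packet size."""
    return gbps * 1e9 / (packet_bytes * 8)


def tlp_efficiency(max_payload: int, overhead: int = 24) -> float:
    """Fraction of link bytes that are payload for full-size TLPs."""
    return max_payload / (max_payload + overhead)


LINK_GBPS = 16 * 128 / 130 * 8  # Gen4 x8, about 126 Gbps per direction

print(f"{packets_per_second(78, 1500) / 1e6:.1f} Mpps at 78 Gbps")
for mps in (128, 256, 512):
    eff = tlp_efficiency(mps)
    print(f"MaxPayload {mps:3}: {eff:.1%} -> {LINK_GBPS * eff:.0f} Gbps usable")
```

With 128-byte payloads, roughly one sixth of the link goes to TLP overhead alone, before descriptors are counted, which is consistent with the network side topping out near 80G.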

The largest share of CPU usage is in memory copies from userland to the NIC:

Overhead  Shared Object            Symbol
   9.47%  [kernel]                 [k] __arch_copy_from_user
   8.40%  [ice]                    [k] ice_clean_rx_irq
   8.24%  [ice]                    [k] ice_process_skb_fields
   8.09%  [kernel]                 [k] dcache_clean_poc
   5.58%  [kernel]                 [k] cpuidle_enter_state
   3.84%  libc.so.6                [.] 0x00000000000a1f08
   3.30%  [kernel]                 [k] _raw_spin_unlock_irqrestore
   2.63%  libc.so.6                [.] 0x00000000000a1f14
   2.25%  [ice]                    [k] ice_napi_poll
   1.29%  [kernel]                 [k] skb_release_data
   1.27%  libc.so.6                [.] 0x00000000000a1f10
   1.15%  [kernel]                 [k] __irqentry_text_start
   1.14%  [ice]                    [k] ice_start_xmit
   1.05%  [kernel]                 [k] ip_finish_output2

Now for forwarded traffic.
As mentioned above, Rx costs a lot of CPU, so the bandwidth is much lower, even when summing in and out.
This time the CPU is almost entirely used (2-4% total idle across 8 cores). There are still a few dead times caused by competition between userland and the driver, which can let the rings fill up and stop receiving; but increasing the ring size lowers performance (worse cache efficiency). Here’s what we’re seeing:
[Graph: forwarding test]
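One way to see why larger rings cost cache efficiency: every posted Rx descriptor points to a receive buffer, so the buffer memory that can be in flight grows linearly with ring size. A rough sketch with hypothetical numbers (8 Rx queues, one per A720 core, and 2 KiB buffers; the ice driver's actual buffer size depends on MTU and driver version):

```python
"""Sketch: Rx buffer memory in flight as a function of ring size.

Hypothetical parameters: 8 queues and 2 KiB buffers per descriptor.
"""


def rx_working_set_bytes(queues: int, ring_entries: int, buf_bytes: int) -> int:
    """Buffer memory that can be posted across all Rx rings."""
    return queues * ring_entries * buf_bytes


for entries in (512, 1024, 2048, 4096):
    mib = rx_working_set_bytes(8, entries, 2048) / 2**20
    print(f"{entries:5} descriptors/queue -> {mib:3.0f} MiB of Rx buffers")
```

Doubling the ring doubles that footprint, so beyond some point the buffers no longer stay warm in cache. Ring sizes are normally adjusted with `ethtool -G <iface> rx <n>`.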

Here’s an example of CPU usage during the HTTP transfer:

top - 16:45:16 up 29 min,  0 user,  load average: 11.65, 9.06, 6.59
Tasks: 347 total,   5 running, 340 sleeping,   2 stopped,   0 zombie
%Cpu0  : 10.7 us, 63.1 sy,  0.0 ni,  4.9 id,  0.0 wa,  1.9 hi, 19.4 si,  0.0 st 
%Cpu1  :  0.0 us,  2.9 sy,  0.0 ni, 96.1 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st 
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
%Cpu3  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
%Cpu5  :  0.0 us,  1.9 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi, 97.1 si,  0.0 st 
%Cpu6  :  1.0 us,  5.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 93.2 si,  0.0 st 
%Cpu7  :  6.7 us, 51.0 sy,  0.0 ni, 11.5 id,  0.0 wa,  1.0 hi, 29.8 si,  0.0 st 
%Cpu8  :  1.9 us, 20.4 sy,  0.0 ni,  5.8 id,  0.0 wa,  1.0 hi, 70.9 si,  0.0 st 
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,100.0 si,  0.0 st 
%Cpu10 :  1.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 98.1 si,  0.0 st 
%Cpu11 :  1.0 us, 55.3 sy,  0.0 ni,  5.8 id,  0.0 wa,  2.9 hi, 35.0 si,  0.0 st 
MiB Mem :  15222.3 total,  12697.0 free,   1469.3 used,   1291.2 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  13753.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                
  31668 willy     20   0  675156  85028  13800 S 211.5   0.5   4:36.38 haproxy                
     60 root      20   0       0      0      0 R  99.0   0.0   2:42.46 ksoftirqd/9            
     65 root      20   0       0      0      0 R  97.1   0.0   2:39.53 ksoftirqd/10           
     40 root      20   0       0      0      0 R  96.2   0.0   0:36.26 ksoftirqd/5            
     45 root      20   0       0      0      0 R  90.4   0.0   1:17.32 ksoftirqd/6            
     55 root      20   0       0      0      0 S  56.7   0.0   1:00.98 ksoftirqd/8            
     50 root      20   0       0      0      0 S   5.8   0.0   0:52.94 ksoftirqd/7       

The driver is indeed struggling, and the load varies quickly between 0 and 100% for each queue, which is also reflected in the unstable %si (softirq, mostly Rx) vs %sy (system, mostly Tx) figures in the top output above.

As expected, perf top now shows more CPU time on the Rx path (more than 50% for the first 4 functions, which are all on the Rx path):

Overhead  Shared Object     Symbol                                                             
  27.28%  [kernel]          [k] __arch_copy_to_user
   9.97%  [ice]             [k] ice_process_skb_fields
   9.18%  [ice]             [k] ice_clean_rx_irq
   4.87%  [kernel]          [k] dcache_clean_poc
   3.98%  [kernel]          [k] __arch_copy_from_user
   2.22%  [kernel]          [k] _raw_spin_unlock_irqrestore
   1.92%  [kernel]          [k] __irqentry_text_start
   1.23%  [ice]             [k] ice_start_xmit
   1.19%  [kernel]          [k] el0_svc_common.constprop.0
   1.03%  haproxy           [.] h1_fastfwd
   0.99%  [kernel]          [k] arch_local_irq_restore
   0.88%  [kernel]          [k] __skb_datagram_iter
   0.81%  [ice]             [k] ice_napi_poll
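
A quick way to check the "more than 50%" figure is to parse the perf top rows and add up the leading entries. This Python sketch does just that; it treats the first four rows as the Rx-path group only because that is how the post describes them, the script itself knows nothing about call paths.

```python
import re

PERF_TOP = """\
  27.28%  [kernel]          [k] __arch_copy_to_user
   9.97%  [ice]             [k] ice_process_skb_fields
   9.18%  [ice]             [k] ice_clean_rx_irq
   4.87%  [kernel]          [k] dcache_clean_poc
   3.98%  [kernel]          [k] __arch_copy_from_user
   2.22%  [kernel]          [k] _raw_spin_unlock_irqrestore
   1.92%  [kernel]          [k] __irqentry_text_start
   1.23%  [ice]             [k] ice_start_xmit
   1.19%  [kernel]          [k] el0_svc_common.constprop.0
   1.03%  haproxy           [.] h1_fastfwd
   0.99%  [kernel]          [k] arch_local_irq_restore
   0.88%  [kernel]          [k] __skb_datagram_iter
   0.81%  [ice]             [k] ice_napi_poll
"""

# "<pct>%  <shared object>  [k|.] <symbol>"; [k] marks kernel symbols, [.] user space.
ROW_RE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+\[([k.])\]\s+(\S+)\s*$")


def parse_perf_rows(text):
    """Return (overhead_percent, shared_object, symbol) tuples in display order."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((float(m.group(1)), m.group(2), m.group(4)))
    return rows


def top_n_overhead(rows, n):
    """Sum of the overhead of the first n rows, rounded to perf's two decimals."""
    return round(sum(pct for pct, _obj, _sym in rows[:n]), 2)


rows = parse_perf_rows(PERF_TOP)
print(top_n_overhead(rows, 4))  # 51.3
```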

That’s the best I could get, and it’s very good; I’d even say excellent for a machine of this size and power usage.

Speaking of power usage, total consumption peaked at 37 W during large transfers. The NIC doesn’t consume much more when active (a few extra watts); I think the 10 extra watts were split roughly 2/3 to the CPU and 1/3 to the NIC. By the way, consumption was about the same with SSL, where network traffic is lower but the CPU is used more.
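
The power split is simple arithmetic. As a sketch, assuming the ~10 W figure and the 2/3 vs 1/3 estimate from the post (both are estimates, not per-component measurements):

```python
from fractions import Fraction


def split_extra_power(extra_watts, cpu_share):
    """Split an extra power draw into (cpu, nic) parts given the CPU's fractional share."""
    cpu = Fraction(extra_watts) * cpu_share
    return cpu, Fraction(extra_watts) - cpu


# Figures from the post: ~10 W extra, estimated 2/3 CPU and 1/3 NIC.
cpu_w, nic_w = split_extra_power(10, Fraction(2, 3))
print(f"CPU ~{float(cpu_w):.1f} W, NIC ~{float(nic_w):.1f} W")  # CPU ~6.7 W, NIC ~3.3 W
```

Fractions keep the split exact, so the two parts always add back up to the measured 10 W.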

I might retest once mainline support is in good shape and the CPU cores can run at their target speed. Overall, though, I’m pretty impressed by what this little device can already achieve. CIX seems to take CPU performance seriously and, unlike many vendors in the past, is not focusing on outdated, inexpensive cores. I can easily imagine server variants of this CPU with 16 big cores at 2.8-3.0 GHz working wonders in the low-power and self-hosting space (NAS, etc.).


Mainline Fedora 41 with an AMD Radeon Pro WX 5100 (a low-power Polaris 10 workstation card) booted without issue and displays the UEFI. It seems pretty stable. It booted on USB-C power, but I needed to switch to an ATX supply to complete the benchmarks.


Linux fedora 6.12.11-200.fc41.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Jan 24 05:21:03 UTC 2025 aarch64 GNU/Linux

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon Pro WX 5100 Graphics (radeonsi, polaris10, LLVM 19.1.7, DRM 3.59, 6.12.11-200.fc41.aarch64)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 24.3.4
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 3028 FrameTime: 0.330 ms
[build] use-vbo=true: FPS: 3954 FrameTime: 0.253 ms
[texture] texture-filter=nearest: FPS: 3787 FrameTime: 0.264 ms
[texture] texture-filter=linear: FPS: 3708 FrameTime: 0.270 ms
[texture] texture-filter=mipmap: FPS: 3436 FrameTime: 0.291 ms
[shading] shading=gouraud: FPS: 3199 FrameTime: 0.313 ms
[shading] shading=blinn-phong-inf: FPS: 3540 FrameTime: 0.283 ms
[shading] shading=phong: FPS: 3876 FrameTime: 0.258 ms
[shading] shading=cel: FPS: 3634 FrameTime: 0.275 ms
[bump] bump-render=high-poly: FPS: 3977 FrameTime: 0.252 ms
[bump] bump-render=normals: FPS: 3519 FrameTime: 0.284 ms
[bump] bump-render=height: FPS: 4011 FrameTime: 0.249 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3744 FrameTime: 0.267 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 3312 FrameTime: 0.302 ms
[pulsar] light=false:quads=5:texture=false: FPS: 3814 FrameTime: 0.262 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 2688 FrameTime: 0.372 ms
[desktop] effect=shadow:windows=4: FPS: 2912 FrameTime: 0.343 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1007 FrameTime: 0.994 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1766 FrameTime: 0.566 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1081 FrameTime: 0.926 ms
[ideas] speed=duration: FPS: 1551 FrameTime: 0.645 ms
[jellyfish] <default>: FPS: 3700 FrameTime: 0.270 ms
[terrain] <default>: FPS: 693 FrameTime: 1.444 ms
[shadow] <default>: FPS: 3604 FrameTime: 0.277 ms
[refract] <default>: FPS: 1633 FrameTime: 0.612 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 3753 FrameTime: 0.266 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 3590 FrameTime: 0.279 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 3125 FrameTime: 0.320 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4039 FrameTime: 0.248 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3248 FrameTime: 0.308 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 3091 FrameTime: 0.324 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 3275 FrameTime: 0.305 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4083 FrameTime: 0.245 ms
=======================================================
                                  glmark2 Score: 3131 
=======================================================

=======================================================
    vkmark 2017.08
=======================================================
    Vendor ID:      0x1002
    Device ID:      0x67C7
    Device Name:    AMD Radeon Pro WX 5100 Graphics (RADV POLARIS10)
    Driver Version: 100675588
    Device UUID:    826fa47eb82102297ca50e9690d2b3c0
=======================================================
[vertex] device-local=true: FPS: 12413 FrameTime: 0.081 ms
[vertex] device-local=false: FPS: 5290 FrameTime: 0.189 ms
[texture] anisotropy=0: FPS: 11155 FrameTime: 0.090 ms
[texture] anisotropy=16: FPS: 11092 FrameTime: 0.090 ms
[shading] shading=gouraud: FPS: 10706 FrameTime: 0.093 ms
[shading] shading=blinn-phong-inf: FPS: 10385 FrameTime: 0.096 ms
[shading] shading=phong: FPS: 10085 FrameTime: 0.099 ms
[shading] shading=cel: FPS: 10058 FrameTime: 0.099 ms
[effect2d] kernel=edge: FPS: 9622 FrameTime: 0.104 ms
[effect2d] kernel=blur: FPS: 4667 FrameTime: 0.214 ms
[desktop] <default>: FPS: 8118 FrameTime: 0.123 ms
[cube] <default>: FPS: 11941 FrameTime: 0.084 ms
[clear] <default>: FPS: 10372 FrameTime: 0.096 ms
=======================================================
                                   vkmark Score: 9684
=======================================================
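
To compare runs like these, here is a small Python sketch that extracts the per-scene result lines of glmark2/vkmark output and lists the slowest scenes. The regular expression assumes the exact "[scene] options: FPS: N FrameTime: X ms" layout shown above; the log string is an excerpt of the glmark2 run.

```python
import re

# "[scene] options: FPS: N FrameTime: X ms", as printed in the runs above.
RESULT_RE = re.compile(r"^\[(\w+)\] (.*): FPS: (\d+) FrameTime: ([\d.]+) ms$")


def parse_results(text):
    """Return (scene, options, fps, frame_time_ms) tuples for each result line."""
    results = []
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            scene, options, fps, frame_time = m.groups()
            results.append((scene, options, int(fps), float(frame_time)))
    return results


def slowest(results, n=3):
    """The n scenes with the lowest FPS, slowest first."""
    return sorted(results, key=lambda r: r[2])[:n]


LOG = """\
[build] use-vbo=true: FPS: 3954 FrameTime: 0.253 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1007 FrameTime: 0.994 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1081 FrameTime: 0.926 ms
[terrain] <default>: FPS: 693 FrameTime: 1.444 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 4083 FrameTime: 0.245 ms
"""

for scene, options, fps, _ in slowest(parse_results(LOG)):
    print(f"{fps:5d} FPS  [{scene}] {options}")
```

On the glmark2 run above, the slowest scenes are [terrain] (693 FPS) and the two map-based [buffer] tests (1007 and 1081 FPS).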

Oh nice! I tried to boot a mainline 6.12 kernel on it but it failed. As usual in the Arm world, it could be due to any missing config option. I’ll download an fc41 image and try again.

So actually I was partially wrong. Your post made me want to try again. I had only tested with the provided DTB since I initially thought it was mandatory. But if a stock Fedora image works, it doesn’t use that DTB. So I tried again, removing the DTB as well as acpi=off efi=noruntime, and guess what? The generic kernel I normally use for the Rock 5 boots straight out of the box with PCIe support:

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@orion-o6:~# uname -a
Linux orion-o6 6.12.6-rk-1 #2 SMP Fri Dec 20 10:48:07 CET 2024 aarch64 GNU/Linux
root@orion-o6:~# cat /proc/cmdline 
BOOT_IMAGE=/image-6.12.6-rk-1 loglevel=0 console=ttyAMA2,115200 earlycon=pl011,0x040d0000 arm-smmu-v3.disable_bypass=0 root=/dev/nvme0n1p2 rootwait rw
root@orion-o6:~# lspci 
00:00.0 PCI bridge: Cadence Design Systems, Inc. Device 0100
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8126 (rev 01)
30:00.0 PCI bridge: Cadence Design Systems, Inc. Device 0100
31:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8126 (rev 01)
90:00.0 PCI bridge: Cadence Design Systems, Inc. Device 0100
91:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (rev 01)
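
Since the difference between the failing and working boots came down to the DTB and the acpi=off efi=noruntime parameters, here is a small Python sketch for checking a running system. The sysfs paths used for detection (acpi/tables and devicetree/base under /sys/firmware) reflect how arm64 kernels usually expose ACPI and DT; treat that mapping as an assumption rather than a guarantee.

```python
import os

# Boot parameters the post removed to get a generic kernel booting.
REMOVED_PARAMS = ("acpi=off", "efi=noruntime")


def params_still_set(cmdline):
    """Return which of REMOVED_PARAMS appear as whole tokens on a kernel command line."""
    tokens = cmdline.split()
    return [p for p in REMOVED_PARAMS if p in tokens]


def hw_description(firmware_root="/sys/firmware"):
    """Guess whether the running kernel uses ACPI or a device tree.

    Assumption: arm64 kernels expose firmware_root/acpi/tables when booted with
    ACPI and firmware_root/devicetree/base when booted with a device tree.
    """
    if os.path.isdir(os.path.join(firmware_root, "acpi", "tables")):
        return "acpi"
    if os.path.isdir(os.path.join(firmware_root, "devicetree", "base")):
        return "devicetree"
    return "unknown"
```

On the board, pass it the contents of /proc/cmdline. With the command line shown above, params_still_set() returns an empty list, i.e. neither parameter is set.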

This is extremely promising!

There are still some important missing pieces:

  • all cores run at 1800 MHz (no cpufreq driver is detected, though I could be missing some modules)
  • I don’t see any boot messages. I suspect earlycon is at fault, but I could be wrong. The console works fine once userland is reached.
  • the fan doesn’t spin. Most likely I haven’t enabled the right hwmon driver. Will check.
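
For the cpufreq and fan points, a minimal Python sketch that inspects the standard sysfs locations: cpuN/cpufreq/scaling_cur_freq (in kHz) normally only exists when a cpufreq driver has registered a policy for that CPU, and each /sys/class/hwmon/hwmonN/name names the driver behind that sensor or fan device. The root paths are parameters so the functions can be pointed at a copy of sysfs. Which hwmon driver the Orion O6 fan actually needs is not established here.

```python
import glob
import os


def cpufreq_status(cpu_root="/sys/devices/system/cpu"):
    """Map each cpuN to its current frequency in MHz, or None if it has no cpufreq policy."""
    status = {}
    for cpu_dir in sorted(glob.glob(os.path.join(cpu_root, "cpu[0-9]*"))):
        freq_file = os.path.join(cpu_dir, "cpufreq", "scaling_cur_freq")
        if os.path.exists(freq_file):
            with open(freq_file) as f:
                status[os.path.basename(cpu_dir)] = int(f.read()) // 1000  # sysfs reports kHz
        else:
            status[os.path.basename(cpu_dir)] = None
    return status


def hwmon_names(hwmon_root="/sys/class/hwmon"):
    """List the driver name exposed by each hwmon device (e.g. to look for a fan controller)."""
    names = []
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        name_file = os.path.join(dev, "name")
        if os.path.exists(name_file):
            with open(name_file) as f:
                names.append(f.read().strip())
    return names
```

A result where every CPU maps to None matches the "no cpufreq" symptom above; an hwmon list without any fan-related driver would point to the missing hwmon module.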

Edit: fixed the reported CPU frequency. I was wrong because I had used the patched “mhz” utility to measure the FPU speed.


Can someone provide dumps from a Debian or Fedora 6.12 kernel?

dmesg, lspci -vvvnn, ACPI tables
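
For anyone willing to help, a hedged Python sketch that gathers the three requested dumps into one directory. It assumes dmesg, lspci (pciutils) and acpidump (ACPICA tools) are installed and that it is run as root; a missing tool or a permission error is written into the corresponding file instead of aborting. The output directory name and file names are hypothetical defaults.

```python
import pathlib
import subprocess

# Output file -> command for each dump requested above. acpidump ships with the
# ACPICA tools; it and (on many distros) dmesg need root.
COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "lspci-vvvnn.txt": ["lspci", "-vvvnn"],
    "acpidump.txt": ["acpidump"],
}


def collect(out_dir="orion-o6-dumps"):
    """Run each command and save its output; failures are recorded in the file, not raised."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, cmd in COMMANDS.items():
        try:
            res = subprocess.run(cmd, capture_output=True, text=True,
                                 errors="replace", timeout=120)
            if res.returncode == 0:
                text = res.stdout
            else:
                text = f"# {' '.join(cmd)} exited with {res.returncode}\n{res.stderr}"
        except (OSError, subprocess.TimeoutExpired) as exc:
            text = f"# could not run {' '.join(cmd)}: {exc}\n"
        (out / filename).write_text(text, encoding="utf-8")
    return out
```

Calling collect() as root leaves one text file per dump, ready to attach to a reply.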