Fogwise Airbox: Vision Model Support for Radxa Fogwise Airbox

Hello everyone,

I am exploring the Radxa Fogwise Airbox and its compatibility with various vision models. Specifically, I’m interested in:

Phi-3 Vision https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
llava-llama3 https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf
llava https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md

Does anyone know if there are pre-converted versions of any of these models, or models with similar capabilities, available for the Radxa Fogwise Airbox?
Any information on their performance and setup would be greatly appreciated.

Thanks in advance!

Best regards,
Matthias

Not specifically the models you asked about, but you can see what is available in the model zoo.

Hi, @M_Kraft

For now we don’t have any pre-converted multimodal models, but you can still convert them yourself with the latest version of TPU-MLIR.
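
Very roughly, the flow goes through model_transform.py and model_deploy.py from TPU-MLIR; for a multimodal model you would normally export the vision and language parts separately first. The file names, shapes, and options below are placeholders rather than a tested recipe, and flag names can vary between TPU-MLIR releases:

# Placeholder names and shapes -- the ONNX export step and exact options
# depend on the model and on the TPU-MLIR release you use.
model_transform.py \
    --model_name llava_vision_tower \
    --model_def llava_vision_tower.onnx \
    --input_shapes [[1,3,336,336]] \
    --mlir llava_vision_tower.mlir

model_deploy.py \
    --mlir llava_vision_tower.mlir \
    --quantize F16 \
    --chip bm1684x \
    --model llava_vision_tower_f16.bmodel

Sophgo's LLM-TPU repository also has end-to-end examples (including Llama3_2-Vision) that show how the converted parts are wired together.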

best,
Morgan

Hello everyone,

I’m trying to run the Llama3_2-Vision model from Sophgo’s LLM-TPU GitHub repository, but I’m running into a memory allocation error that I can’t seem to resolve.

Here’s what I’ve done so far:

  1. I followed the setup instructions provided in the repository and ran the run_demo.sh script.
  2. bm-smi shows approximately 8 GB of RAM allocated on the TPU just before the process fails (the exact commands I ran are sketched after this list).
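
Concretely, the run looked roughly like this (paths are from my local setup):

cd /mnt/NVME/projekte/LLM-TPU/models/Llama3_2-Vision
./run_demo.sh    # wraps python_demo/pipeline.py with the int4 512-seq bmodel
# in a second terminal, to watch TPU memory while the model loads:
bm-smi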

Error Details

Here’s the log output around where the error occurs:

[BMRT][BMProfileDeviceBase:190] INFO: gdma=0, tiu=0, mcu=0
Model[../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel] loading ....
[BMRT][load_bmodel:1939] INFO: Loading bmodel from [../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel]
[BMRT][load_bmodel:1704] INFO: Bmodel loaded, version 2.2
[BMRT][load_bmodel:1706] INFO: pre net num: 0, load net num: 86
[BMRT][load_tpu_module:1802] INFO: loading firmware in bmodel
[BMRT][preload_funcs:2121] INFO: core_id=0, multi_fullnet_func_id=22
[BMRT][preload_funcs:2124] INFO: core_id=0, dynamic_fullnet_func_id=23
[bmlib_memory][error] bm_alloc_gmem failed, dev_id = 0, size = 0x6b9a3000
[BM_CHECK][error] BM_CHECK_RET fail /workspace/libsophon/bmlib/src/bmlib_memory.cpp: bm_malloc_device_byte_heap_mask_u64: 1121
[BMRT][Register:2019] FATAL: coeff alloc failed, size[0x6b9a3000]
python3: /mnt/NVME/projekte/LLM-TPU/models/Llama3_2-Vision/python_demo/chat.cpp:138: void Llama3_2::init(const std::vector<int>&, std::string): Assertion `true == ret' failed.
./run_demo.sh: line 28: 39708 Aborted                 python3 python_demo/pipeline.py --model ../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel --image_path python_demo/test.jpg --tokenizer ./token_config --devid 0

It looks like the TPU runtime runs out of device memory: bm_alloc_gmem fails when it tries to allocate a further 0x6b9a3000 bytes (about 1.8 GB) of coefficient memory, and the process then aborts with a coeff alloc failed error.

Troubleshooting Steps Tried

  • Ensuring that all paths and configurations are correctly set.
  • Checking memory usage on the TPU with bm-smi, which confirms that roughly 8 GB is allocated just before the error.

Has anyone encountered a similar issue when running Llama3_2-Vision on the BM1684X?

I appreciate any input and help you can provide!

best regards
Matthias

Hi, @M_Kraft

It seems 8 GB of TPU memory is not enough; try increasing the TPU memory to 12 GB.
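
To see how the memory is currently split, you can use the memory_edit tool that ships with the SOPHON SDK. The path below is from my setup and the ini file name depends on your board, so treat both as placeholders:

cd /opt/sophon/memory_edit
./memory_edit.sh -p <your_board>.ini    # prints the current NPU/VPU/VPP memory layout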

Best


Hello @Morgan ,

Thank you for the quick response!

I appreciate the suggestion to increase the TPU memory to 12 GB. I checked the documentation provided here, but I thought the maximum assignable memory for the TPU was capped at 7615 MB. Could you clarify whether there’s a way around this limit, or whether I’m missing something in the memory allocation process?

Thanks again for your help!

best regards
Matthias

hi, @M_Kraft

The TPU memory is shared with the system memory (16 GB in total), and it is split into NPU, VPU, and VPP regions. Even though the NPU region is capped at 7615 MB, when it is not large enough to load a bmodel the runtime will also use the VPU and VPP regions to avoid out-of-memory errors, so increasing all three regions avoids the error.
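
As a rough sketch, the reallocation is done with the same memory_edit tool as above (example values only, adding up to 12 GB for the TPU; the ini file name and the apply step depend on your image, so please double-check the memory layout section of the documentation):

cd /opt/sophon/memory_edit
./memory_edit.sh -c -npu 7168 -vpu 2048 -vpp 3072 <your_board>.ini    # example split, values in MB
# on my image the new layout takes effect after copying the regenerated boot file and rebooting:
sudo cp /opt/sophon/memory_edit/emmcboot.itb /boot/emmcboot.itb && sync
sudo reboot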

best,
Morgan