Hello everyone,
I’m trying to run the Llama3_2-Vision model from Sophgo’s LLM-TPU GitHub repository, and I’m running into a memory allocation error that I can’t seem to resolve.
Here’s what I’ve done so far:
- I followed the setup instructions provided in the repository and ran the `run_demo.sh` script.
- `bm-smi` shows approximately 8 GB of device memory allocated on the TPU before the process fails.
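For reference, here is a quick way to see how much of that footprint the weights alone account for, namely the bmodel's size on disk (plain Python, no TPU libraries; the path is the one from my run_demo.sh invocation, so adjust it for your layout):

```python
# Check the bmodel's size on disk (plain Python, no TPU libraries needed).
# The path matches the one passed to pipeline.py in run_demo.sh.
import os

bmodel = "../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel"
print(f"{bmodel}: {os.path.getsize(bmodel) / 1024**3:.2f} GiB on disk")
```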
Error Details
Here’s the log output around where the error occurs:
```
[BMRT][BMProfileDeviceBase:190] INFO: gdma=0, tiu=0, mcu=0
Model[../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel] loading ....
[BMRT][load_bmodel:1939] INFO: Loading bmodel from [../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel]
[BMRT][load_bmodel:1704] INFO: Bmodel loaded, version 2.2
[BMRT][load_bmodel:1706] INFO: pre net num: 0, load net num: 86
[BMRT][load_tpu_module:1802] INFO: loading firmware in bmodel
[BMRT][preload_funcs:2121] INFO: core_id=0, multi_fullnet_func_id=22
[BMRT][preload_funcs:2124] INFO: core_id=0, dynamic_fullnet_func_id=23
[bmlib_memory][error] bm_alloc_gmem failed, dev_id = 0, size = 0x6b9a3000
[BM_CHECK][error] BM_CHECK_RET fail /workspace/libsophon/bmlib/src/bmlib_memory.cpp: bm_malloc_device_byte_heap_mask_u64: 1121
[BMRT][Register:2019] FATAL: coeff alloc failed, size[0x6b9a3000]
python3: /mnt/NVME/projekte/LLM-TPU/models/Llama3_2-Vision/python_demo/chat.cpp:138: void Llama3_2::init(const std::vector<int>&, std::string): Assertion `true == ret' failed.
./run_demo.sh: line 28: 39708 Aborted python3 python_demo/pipeline.py --model ../../bmodels/llama3.2-11b-vision_int4_512seq.bmodel --image_path python_demo/test.jpg --tokenizer ./token_config --devid 0
```
It seems the TPU fails to allocate enough memory: `bm_alloc_gmem` fails when trying to allocate 0x6b9a3000 bytes (roughly 1.7 GB), and the process then exits due to a `coeff alloc failed` error.
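A rough back-of-the-envelope with the two numbers from above (plain Python, values copied verbatim from the log and the bm-smi reading):

```python
# Rough arithmetic on the values reported above (plain Python).
allocated_before_failure = 8 * 1024**3  # ~8 GB shown by bm-smi before the crash
failed_request = 0x6b9a3000             # size from the bm_alloc_gmem error

print(f"failed request: {failed_request / 1024**3:.2f} GiB")  # ~1.68 GiB
total = allocated_before_failure + failed_request
print(f"rough total needed: {total / 1024**3:.2f} GiB")       # ~9.68 GiB
```

So the model load appears to need close to 10 GB of device memory in total, which is evidently more than my card can hand out for allocation.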
Troubleshooting Steps Tried
- Ensured that all paths and configurations are set correctly.
- Checked memory usage on the TPU with `bm-smi`, which confirms that about 8 GB is allocated just before the error.
Has anyone encountered a similar issue when running Llama3_2-Vision on the BM1684X?
I appreciate any input and help you can provide!
Best regards,
Matthias