Voice assistance

Andreas_M · May 30, 2024, 7:41am

Hi,
I have the crazy idea to build a battery-powered device with offline voice assistance. Most of the time it should sit in idle mode, where a voice activity detector combined with a wake word detector waits for the wake word and then activates the general speech recognition. In idle mode, my device shouldn’t consume more than 1-2 W.
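The idle gating described above can be sketched as a small state machine. This is only an illustration under assumptions: the staging (the voice activity detector gates the wake word detector, which in turn gates the recognizer), the state names and the `step` function are hypothetical, and the boolean inputs stand in for the outputs of real detectors.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # only the cheap voice activity detector runs
    WAKE_CHECK = auto()  # speech heard, wake word detector runs
    ACTIVE = auto()      # wake word heard, full speech recognition runs


def step(state, speech, wake_word, utterance_done):
    """Return the next state for one audio frame.

    speech, wake_word and utterance_done are hypothetical booleans that
    real detectors would provide; they are placeholders here.
    """
    if state is State.IDLE:
        return State.WAKE_CHECK if speech else State.IDLE
    if state is State.WAKE_CHECK:
        if wake_word:
            return State.ACTIVE
        # Stay in the wake word stage only while someone is speaking.
        return State.WAKE_CHECK if speech else State.IDLE
    # ACTIVE: keep recognizing until the utterance ends, then go idle.
    return State.IDLE if utterance_done else State.ACTIVE


if __name__ == "__main__":
    state = State.IDLE
    # Each tuple is (speech, wake_word, utterance_done) for one frame.
    frames = [(False, False, False), (True, False, False),
              (True, True, False), (True, False, False),
              (False, False, True)]
    for frame in frames:
        state = step(state, *frame)
        print(state.name)
```

The point of the staging is that the expensive recognizer only runs in `ACTIVE`, so the idle power draw is set by the cheapest stage.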
Your AICore SG2300x may have enough computational power, but it has no GPU for a display, and the audio has to be recorded somewhere. Is it somehow possible to extend the SG2300x with a GPU? Another idea would be to use one of the smaller Rock Pi boards to pass the audio data to the SG2300x and to power it up and down. When powered down, the language model(s) should be kept in RAM. What hardware combination do you recommend? Is there something in development on your side that I should wait for? What would the idle power consumption be for your suggested combination?

Kind regards,
Andreas

jack · May 30, 2024, 12:58pm

I think the better idea is to use the Airbox as your LLM server. You can then use a battery-powered wireless MCU such as an ESP32 to stream the voice audio to the Airbox, have the Airbox send back the text of the response, and then use TTS to output the audio on the ESP32.

Andreas_M · May 30, 2024, 1:59pm

Many thanks for your answer. You seem to prefer the two-computer solution. But in my single portable device, both computers would be battery powered. Do the AICore or the Airbox have a sleep mode, and can they be woken up with a wake-up pin? If so, what is the power consumption in that state?

Kind regards,
Andreas

Morgan · August 6, 2024, 2:24am

Hi, @Andreas_M

The Airbox does not have a sleep mode.

best,
Morgan

Raspberry_Tech · November 12, 2024, 2:47pm

Hi Andreas,

I am currently building this type of project for my university dissertation.

I am using the Home Assistant platform to tie all the systems together, and I have ported Whisper TPU to work with Home Assistant, OpenWakeWord and Piper TTS.

Currently only Whisper is running on the TPU, and there is progress to be made on the TTS side of things. Furthermore, I have not yet implemented an LLM in the workflow, but I hope to do so at some point, as it will provide more complex responses to the user.

In terms of power consumption, I am also building an off-grid system, as traditional compute systems with a dedicated graphics card consume hundreds of watts, and this system seems to have suitable compute power. That being said, as this area is new, there is a lot of tweaking and messing about to get everything working.

I am currently running the Home Assistant platform and the other voice assistant pipeline software mentioned above on the Fogwise Airbox and have had minimal issues so far. OpenWakeWord is run on an ESP32-S3-Box. The Fogwise Airbox normally consumes anywhere from 20 to 30 W, so you would need to account for this.

In terms of running this off-grid, you would need a power pack that can sustain this load, combined with solar cells. By my basic calculations, to power this completely off-grid and to account for cloudy days, you will need a 250-300 W solar panel charging a power pack. I have not yet calculated the storage capacity needed for the power pack.
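The sizing arithmetic can be sketched as follows. This is a rough estimate, not a measured result: the 3 peak sun hours, 80% system efficiency, 2 days of autonomy and 80% usable battery fraction are assumed placeholder values, and `size_off_grid` is a hypothetical helper. Only the 20-30 W load comes from the figures above.

```python
from dataclasses import dataclass


@dataclass
class OffGridEstimate:
    daily_energy_wh: float  # energy the load uses in 24 hours
    panel_w: float          # rated panel power needed to replace it
    battery_wh: float       # nominal battery capacity for the autonomy period


def size_off_grid(load_w, peak_sun_hours=3.0, system_efficiency=0.8,
                  autonomy_days=2.0, usable_fraction=0.8):
    """Estimate panel and battery size for a constant load.

    All default values are assumptions for illustration; replace them
    with figures for the actual location and battery chemistry.
    """
    daily_energy_wh = load_w * 24.0
    # The panel must deliver a full day's energy in the usable sun hours.
    panel_w = daily_energy_wh / (peak_sun_hours * system_efficiency)
    # The battery must bridge the sunless days without full discharge.
    battery_wh = daily_energy_wh * autonomy_days / usable_fraction
    return OffGridEstimate(daily_energy_wh, panel_w, battery_wh)


if __name__ == "__main__":
    for load_w in (20.0, 30.0):
        est = size_off_grid(load_w)
        print(f"{load_w:.0f} W load: {est.daily_energy_wh:.0f} Wh/day, "
              f"panel ~{est.panel_w:.0f} W, battery ~{est.battery_wh:.0f} Wh")
```

With these assumed values, a 30 W load works out to roughly a 300 W panel. The real figure depends heavily on location and season.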

Regards,
Yusuf Ibn Saifullah