8-ball voice assistant

boskikak · 31 January 2025 20:47

With every new voice assistant I tell myself: this is the last one, I don't have time for this…
This time the motivation was the arrival of a stereo microphone option, and with that in mind I built this black ball:

You'll find the files for printing here:
8-ball.zip (4.3 MB)

The project ate up a huge part of my free time (almost 2 months), 1 kg of printer filament, a fair amount of cash and even more nerves. I'll try to focus on what matters most and on my own conclusions, and plenty of those have piled up. The main goal, a stereo microphone, was not achieved. So far nobody has got it running on ESPHome and nobody had reported the problem, so I opened an issue and I'm waiting for a response.
Since I managed to put together two of these talking balls, I had a chance to compare how different frameworks handle the same task. While I built the Arduino version in practically a week, ESP-IDF gave me constant problems, caused more by my own lack of knowledge than by the environment itself.

Build:

ESP32-S3 N16R8 (for ESP-IDF)
ESP32-S3 Zero (for Arduino)
2× 40 mm speakers, 3 W, 4 Ω
2× MAX98357 amplifiers
INMP441 microphone
LED strip (I used SK6812 chips, 72 LEDs/m, 5 V)
3 V buzzer
USB-C socket
3 tact switches, 12×12 mm

ESP-IDF code:

substitutions:
  leds: "28"  # number of diodes in the strip
  vol_step: "0.1" # media player volume step
  waiting_time: "8s" # duration of listening 


# PINOUT
  pin_buzzer: GPIO10
  pin_leds: GPIO38
  pin_lrclk_mic: GPIO16 #WS
  pin_bclk_mic: GPIO17 #SCK
  pin_din_mic: GPIO15
  pin_lrclk_spk: GPIO5
  pin_bclk_spk: GPIO6
  pin_din_spk: GPIO7
  pin_vol_up: "42"
  pin_vol_down: "40"
  pin_play: "41"

esphome:
  name: 8-ball
  friendly_name: 8-ball
  min_version: 2024.12.4
  platformio_options:
    board_build.flash_mode: dio

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 16MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"

      CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"

      # Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
      CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
      CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
      CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
      CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
      CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
      CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
      CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
      CONFIG_LWIP_MAX_LISTENING_TCP: "16"
      CONFIG_TCP_MAXRTX: "12"
      CONFIG_TCP_SYNMAXRTX: "6"
      CONFIG_TCP_MSS: "1436"
      CONFIG_TCP_MSL: "60000"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "65535"  # Adjusted from linked settings to avoid compilation error
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_QUEUE_OOSEQ: "y"
      CONFIG_TCP_OVERSIZE_MSS: "y"
      CONFIG_LWIP_WND_SCALE: "y"
      CONFIG_TCP_RCV_SCALE: "3"
      CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
      CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"

      CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC: "y"
      CONFIG_MBEDTLS_SSL_PROTO_TLS1_3: "y"  # TLS1.3 support isn't enabled by default in IDF 5.1.5



# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "xxxxxxxxxxxxxxxxxxxxxx"
  on_client_connected:
     - media_player.volume_set: !lambda "return 0.60;"
     - light.turn_on:
        id: ring
        red: 0%
        green: 100%
        blue: 0%
        brightness: 50%
     - delay: 3s
     - light.turn_on:
        id: ring
        red: 0%
        green: 100%
        blue: 100%
        brightness: 50%
     - script.execute: reset_ww
     - delay: 1s
     - light.turn_off: ring

ota:
  - platform: esphome
    password: "xxxxxxxxxxxxxxxxxxxxxx"

psram:
  mode: octal
  speed: 80MHz

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: xxxxxxxxxx
    gateway: xxxxxxxxxxx
    subnet: xxxxxxxxxxx

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "8-Ball Fallback Hotspot"
    password: "xxxxxxxxxxxxxx"

captive_portal:

external_components:
  - source:
      type: git
      url: https://github.com/esphome/home-assistant-voice-pe
      ref: dev
    components:
      - media_player
      - nabu
    refresh: 0s    

button:
  - platform: restart
    name: Reboot


text_sensor:  # unlock this option in Home Assistant
  - platform: homeassistant
    id: assist_satellite
    entity_id: assist_satellite.8_ball_assist_satellite
    internal: true
    on_value:
      - if:
          condition:
            lambda: 'return id(assist_satellite).state == "responding";'
          then:
            - script.execute: speaking
            - wait_until:
                lambda: 'return id(assist_satellite).state == "idle";'
            - script.execute: reset_ww
            - light.turn_off:  ring 
      - if:
          condition:
            lambda: 'return id(assist_satellite).state == "processing";'
          then:
            - light.turn_off:  ring 
            - script.execute: reset_ww
      - if:
          condition:
            lambda: 'return id(assist_satellite).state == "listening";'
          then:
            - script.execute: detecting
            - wait_until:
                 condition:
                   not:
                     lambda: 'return id(assist_satellite).state == "listening";'
                 timeout: ${waiting_time}
            - voice_assistant.stop
            - light.turn_off: ring 
            - script.execute: reset_ww



voice_assistant:
  id: va
  microphone: asr_mic
  media_player: player
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS 
  on_tts_end:
    - delay: 100ms
    - script.execute: reset_ww
  on_stt_end:
    - delay: 100ms
    - script.execute: reset_ww
  on_error:
    - delay: 100ms
    - script.execute: reset_ww
    - script.execute: error


micro_wake_word:
  id: mww
  models:
    - model: alexa
  microphone: asr_mic
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
    - delay: 100ms   
    - rtttl.play: 'two_short:d=4,o=5,b=100:16e6,16e6'
    - if:
          condition:
            lambda: return id(player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_PLAYING;
          then:
            - media_player.pause: player
            - wait_until:
                 condition:
                   lambda: 'return id(assist_satellite).state == "idle";'
                 timeout: 5s
            - media_player.play: player

i2s_audio:
  - id: i2s_spk
    i2s_lrclk_pin: ${pin_lrclk_spk}
    i2s_bclk_pin: ${pin_bclk_spk}
  - id: i2s_mic
    i2s_lrclk_pin: ${pin_lrclk_mic}
    i2s_bclk_pin: ${pin_bclk_mic}

microphone:
  - platform: i2s_audio
    id: asr_mic
    adc_type: external
    i2s_din_pin: ${pin_din_mic}
    channel: left
    i2s_audio_id: i2s_mic

speaker:
  - platform: i2s_audio
    sample_rate: 48000
    i2s_mode: primary
    i2s_dout_pin: ${pin_din_spk}
    bits_per_sample: 32bit
    i2s_audio_id: i2s_spk
    dac_type: external
    channel: mono
    timeout: never
    buffer_duration: 100ms

media_player:
  - platform: nabu
    id: player
    name: glosnik
    internal: false
    speaker:
    sample_rate: 48000
    volume_increment: ${vol_step}
    volume_min: 0.1
    volume_max: 1
    on_announcement:
      - script.execute: speaking
      - wait_until:
         condition:
             media_player.is_idle: player
      - light.turn_off: ring
      - media_player.stop: player
      - script.execute: reset_ww

switch:
  - platform: template
    id: assist
    icon: mdi:account-tie-voice
    name: "Asystent"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on: 
       - micro_wake_word.start
    on_turn_off:
       - micro_wake_word.stop


binary_sensor:
  - platform: gpio
    pin: 
      number: ${pin_vol_up}
      mode: INPUT_PULLUP
      inverted: false
    name: "Vol+"
    internal: true
    on_press:
       - if:
           condition:
             lambda: 'return id(player).volume < 1.00;'
           then:
             - media_player.volume_up
             - script.execute: volume

  - platform: gpio
    pin: 
      number: ${pin_vol_down}
      mode: INPUT_PULLUP
      inverted: false     
    name: "Vol-"
    internal: true
    on_press:
       - if:
           condition:
             lambda: 'return id(player).volume > 0.0;'
           then:
             - media_player.volume_down
             - script.execute: volume

  - platform: gpio
    pin: 
      number: ${pin_play}
      mode: INPUT_PULLUP
      inverted: false
    name: "Play/Pause"
    internal: true
    on_press:
      - media_player.toggle
      - script.execute: reset_ww

output:
  - platform: ledc
    pin: ${pin_buzzer}
    id: rtttl_out
rtttl:
  output: rtttl_out
  id: my_rtttl  

light:
  - platform: esp32_rmt_led_strip
    default_transition_length: 0.5s
    id: ring
    chipset: SK6812
    pin: ${pin_leds}
    num_leds: ${leds}
    rgb_order: GRB
    rmt_channel: 0
    name: "Ring"
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 300ms
          update_interval: 250ms
          min_brightness: 15%
          max_brightness: 60%
      - addressable_color_wipe:
          name: "Detecting"
          colors:
            - red: 10%
              green: 100%
              blue: 100%
              num_leds: 7
              gradient: false
            - red: 0%
              green: 0%
              blue: 0%
              num_leds: 19
          add_led_interval: 12ms
          reverse: true

sensor:
  - platform: homeassistant
    id: vol_led  # number of light diodes
    internal: true
    entity_id: media_player.8_ball_glosnik
    attribute: volume_level
    accuracy_decimals: 0
    filters:
      - lambda: return x / ${vol_step} * 2 ; # 2 diodes on one step volume
      - round: 0

script:
  - id: detecting
    then:
      - light.turn_on:
         id: ring
         red: 20%
         green: 80%
         blue: 100%
         brightness: 50%
         effect: Detecting

  - id: speaking
    then:
      - light.turn_on:
         id: ring
         red: 0%
         green: 100%
         blue: 0%
         brightness: 50%
         effect: Pulse

  - id: error
    then:
      - light.turn_on:
         id: ring
         red: 100%
         green: 0%
         blue: 0%
         brightness: 50%
         effect: none
      - delay: 2s
      - light.turn_off: ring


  - id: volume
    then:
      - light.addressable_set:
         id: ring
         range_from: 0    
         range_to: !lambda return id(vol_led).state;
         red: 100%
         green: 5%
         blue: 80%
      - delay: 30ms
      - light.turn_off: ring

  - id: reset_ww
    then:
     - micro_wake_word.stop     
     - delay: 250ms
     - micro_wake_word.start
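
The `volume` script takes its LED count from `vol_led`, a Home Assistant sensor that mirrors the player's volume and maps it to 2 LEDs per volume step: volume / vol_step × 2, so 0.6 / 0.1 × 2 = 12 with the default `vol_step`. Because that value makes a round trip through Home Assistant, the ring may briefly show the previous volume right after a button press. Below is a hedged sketch of a drop-in replacement for the `volume` script that computes the count on the device instead. It assumes, as the Vol+/Vol- lambdas already do, that `id(player)->volume` holds the current volume (0.0 to 1.0) and that it has been updated by the time the script runs; it also subtracts one because `range_to` is an inclusive index.

```yaml
script:
  - id: volume
    then:
      - light.addressable_set:
          id: ring
          range_from: 0
          # LED count computed on the device: 2 LEDs per volume step (volume is 0.0 to 1.0).
          # range_to is an inclusive index, so n LEDs means indices 0..n-1.
          range_to: !lambda |-
            int n = (int) (id(player)->volume / ${vol_step} * 2 + 0.5f);
            return n > 0 ? n - 1 : 0;  // at volume 0 this still lights LED 0
          red: 100%
          green: 5%
          blue: 80%
      - delay: 30ms
      - light.turn_off: ring
```

With this version the `vol_led` sensor is no longer needed for the ring, so the volume display also works when Home Assistant is unreachable.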

You could say that a media player has finally arrived on the IDF framework, thanks to the launch of Voice Assistant PE, whose code was published as well: GitHub - esphome/home-assistant-voice-pe: Home Assistant Voice PE. Some people base their projects on that code, and if you want the same thing in the 8-ball, nothing stops you. Before this I also had a media player based on esp-adf, but I decided to use the latest library from PE instead. My code may not be as polished as that one, but I didn't want to break my own concept with someone else's project.

This assistant also uses Micro Wake Word, so the “thinking” happens on the board. That is a big plus for this solution, because there is “no log spam” like there is with Arduino. As for playing music with the microphone running all the time… in my opinion the sound quality is poor (though that may also be down to the different speakers), the volume is clearly lower than with Arduino, and the microphone, which keeps listening while music plays, is drowned out practically the whole time, so waking the assistant takes a small miracle. This may be a problem with my speaker build or my code, because the original Voice PE code reportedly doesn't have this problem.
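
The Arduino version below sidesteps this by switching wake word detection off while music plays. A minimal sketch of the same “speaker or assistant” rule for the IDF config, meant to be merged into the existing `media_player` entry. It assumes the `nabu` player fires the standard media player triggers `on_play`, `on_pause` and `on_idle` (it already fires `on_announcement` above), and it checks the `assist` switch so that stopping the music does not re-enable a wake word you turned off. Note that `reset_ww`, which the Play/Pause button also calls, would still restart detection during playback.

```yaml
media_player:
  - platform: nabu
    id: player
    # ...existing options (name, speaker, sample_rate, volume_*, on_announcement)...
    # Sketch: stop on-device wake word detection while music is playing
    on_play:
      - micro_wake_word.stop
    # Resume detection when the music stops, but only if the "Asystent" switch is on
    on_pause:
      - if:
          condition:
            switch.is_on: assist
          then:
            - micro_wake_word.start
    on_idle:
      - if:
          condition:
            switch.is_on: assist
          then:
            - micro_wake_word.start
```

This trades the ability to interrupt music by voice for fewer missed or garbled wake-ups, which is the same compromise the Arduino build makes.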

Now for Arduino:

substitutions:
  leds: "24"
  vol_step: "0.1"

# PINOUT
  pin_buzzer: GPIO6
  pin_leds: GPIO5
  pin_lrclk_mic: GPIO1 #WS
  pin_bclk_mic: GPIO2 #SCK
  pin_din_mic: GPIO4
  pin_lrclk_spk: GPIO7
  pin_bclk_spk: GPIO8
  pin_din_spk: GPIO9
  pin_vol_up: "12"
  pin_vol_down: "11"
  pin_play: "13"


esphome:
  name: "8a-ball"
  friendly_name: "8a-ball"

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino


psram:
  mode: quad
  speed: 80MHz
    
# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "xxxxxxxxxxxxxxxxxxxxxx"
  on_client_connected:
     - media_player.volume_set: !lambda "return 0.40;"
     - light.turn_on:
        id: ring
        red: 0%
        green: 100%
        blue: 0%
        brightness: 100%
     - delay: 2s
     - script.execute: reset_ww
     - delay: 1s
     - light.turn_off: ring


ota:
  - platform: esphome
    password: "xxxxxxxxxxxxxx"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: on

  manual_ip:
    static_ip: xxxxxxxxxxxxxx
    gateway: xxxxxxxxxxxxx
    subnet: xxxxxxxxxxxxxxxx

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "8a-Ball Fallback Hotspot"
    password: "xxxxxxxxxxxxx"


captive_portal:

sensor:
  - platform: homeassistant
    id: vol_led  # number of LEDs to light, depending on volume
    internal: true
    entity_id: media_player.8a_ball_glosnik
    attribute: volume_level
    accuracy_decimals: 0
    filters:
      - lambda: return x / ${vol_step} * 2 ;
      - round: 0

output:
  - platform: ledc
    pin: ${pin_buzzer} 
    id: rtttl_out

rtttl:
  output: rtttl_out
  id: my_rtttl  

button:
  - platform: restart
    name: Reboot

binary_sensor:
  - platform: gpio
    pin: 
      number: ${pin_vol_up}
      mode: INPUT_PULLUP
      inverted: false
    name: "Vol+"
    internal: true
    on_press:
       - if:
           condition:
             lambda: 'return id(player).volume < 1.0;'
           then:
             - homeassistant.action:
                 action: media_player.volume_set
                 data_template:
                   entity_id: media_player.8a_ball_glosnik
                   volume_level: "{{ [state_attr('media_player.8a_ball_glosnik', 'volume_level') + ${vol_step}, 1.0] | min }}"
             - script.execute: volume

  - platform: gpio
    pin: 
      number: ${pin_vol_down}
      mode: INPUT_PULLUP
      inverted: false     
    name: "Vol-"
    internal: true
    on_press:
       - if:
           condition:
             lambda: 'return id(player).volume > 0.0;'
           then:
             - homeassistant.action:
                 action: media_player.volume_set
                 data_template:
                   entity_id: media_player.8a_ball_glosnik
                   volume_level: "{{ [state_attr('media_player.8a_ball_glosnik', 'volume_level') - ${vol_step}, 0.0] | max }}"
             - script.execute: volume

  - platform: gpio
    pin: 
      number: ${pin_play}
      mode: INPUT_PULLUP
      inverted: false
    name: "Play/Pause"
    internal: true
    on_press:
      - homeassistant.action:
          action: media_player.media_play_pause
          data_template:
            entity_id: media_player.8a_ball_glosnik

light:
  - platform: fastled_clockless
    default_transition_length: 0.5s
    id: ring
    chipset: WS2812
    pin: ${pin_leds}
    num_leds: ${leds}
    rgb_order: GRB
    name: "Ring"
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 300ms
          update_interval: 200ms
          min_brightness: 10%
          max_brightness: 100%
      - addressable_scan:
          name: "Scan"
          move_interval: 30ms
          scan_width: 8
      - addressable_color_wipe:
          name: "Detecting"
          colors:
            - red: 10%
              green: 100%
              blue: 100%
              num_leds: 8
              gradient: false
            - red: 0%
              green: 0%
              blue: 0%
              num_leds: 19
          add_led_interval: 13ms
          reverse: true

text_sensor:
  - platform: homeassistant
    id: assist_satellite
    entity_id: assist_satellite.8a_ball_assist_satellite
    internal: true
    on_value:
      - if:
          condition:
            lambda: 'return id(assist_satellite).state == "responding";'
          then:
            - script.execute: talk
            - wait_until:
                lambda: 'return id(assist_satellite).state == "idle";'
            - light.turn_off:  ring 
            - script.execute: reset_ww

i2s_audio:
  - id: i2s_mic           #INMP441
    i2s_lrclk_pin: ${pin_lrclk_mic}  #WS
    i2s_bclk_pin: ${pin_bclk_mic}   #SCK
  - id: i2s_out           #MAX98357A
    i2s_lrclk_pin: ${pin_lrclk_spk}  #LRCLK
    i2s_bclk_pin: ${pin_bclk_spk}   #BCLK


media_player:
  - platform: i2s_audio
    id: player
    name: Glosnik
    dac_type: external
    i2s_audio_id: i2s_out
    mode: stereo
    i2s_dout_pin: ${pin_din_spk} #DIN,SD
    on_play:
      - switch.turn_off: assist
    on_pause:
      - switch.turn_on: assist
    on_idle:
      - switch.turn_on: assist
    on_announcement:
      - switch.turn_off: assist
      - script.execute: talk
      - wait_until:
         condition:
             media_player.is_idle: player
      - light.turn_off: ring
      - media_player.stop: player
      - switch.turn_on: assist


microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external 
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_mic
    i2s_din_pin: ${pin_din_mic}   #DIN,SDIN,SD,SDATA
    pdm: false

voice_assistant:
  id: va
  microphone: mic
  media_player: player
  use_wake_word: true
  conversation_timeout: 60s
  noise_suppression_level: 2
  auto_gain: 10dBFS 
  on_listening:
    - script.execute: detecting
    - rtttl.play: 'two_short:d=4,o=5,b=100:16e6,16e6'  
  on_tts_start: 
    - script.execute: talk
  on_stt_end:
    then: 
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing:
    - script.execute: reset_ww
    - light.turn_off: ring
  on_tts_end:
    then: 
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing:
    - script.execute: reset_ww
    - light.turn_off: ring
  on_error:
    - light.turn_on:
        id: ring
        red: 100%
        green: 0%
        blue: 0%
        brightness: 100%
        effect: none
    - delay: 2s
    - script.execute: reset_ww
    - light.turn_off: ring

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    internal: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

  - platform: template
    id: assist
    icon: mdi:account-tie-voice
    name: "Asystent"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on: 
      - switch.turn_on: use_wake_word
    on_turn_off:
      - switch.turn_off: use_wake_word


interval:  # every 5 minutes, restart the wake word stream if the assistant is enabled
  - interval: 300s
    then:
      - if:
          condition:
            - switch.is_on: assist
          then:
           - script.execute: reset_ww
           - light.turn_off: ring

script:
  - id: reset_ww
    then:
     - switch.turn_off: use_wake_word     
     - delay: 250ms
     - switch.turn_on: use_wake_word

  - id: detecting
    then:
      - light.turn_on:
         id: ring
         red: 20%
         green: 80%
         blue: 100%
         brightness: 100%
         effect: Detecting

  - id: talk
    then:
      - light.turn_on:
         id: ring
         red: 0%
         green: 100%
         blue: 0%
         brightness: 100%
         effect: pulse

  - id: volume
    then:
      - light.addressable_set:
         id: ring
         range_from: 0    
         range_to: !lambda return id(vol_led).state;
         red: 100%
         green: 5%
         blue: 80%
      - delay: 30ms
      - light.turn_off: ring

This is simply a somewhat more elaborate version of my earlier configs. Looking at the assistant's logs, you start to realise how big an advantage IDF has.
Here I rely on an either-or rule: either the speaker or the assistant. Honestly, after so many months of using this setup, I still prefer this framework (Arduino) to the one above. Apart from the microphone being switched off during music playback, I noticed one more fairly important difference between the two assistants; the Arduino one:

responds to the wake word more often (better),
but false wake-ups are an everyday occurrence, whereas on ESP-IDF they practically never happen. That's why the config has the “Asystent” switch, which turns listening off. It comes in especially handy when guests are over, so they don't feel eavesdropped on.
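
The “Asystent” switch can also be toggled from the ball itself. A sketch for the Arduino config that replaces the existing Play/Pause item inside the `binary_sensor:` list: a short press keeps the play/pause action, a long press toggles `assist`. It assumes the button connects the pin to GND, which is why it uses `inverted: true` (with the original `inverted: false` and a pull-up, the sensor reads “on” while the button is released, so `on_click` would time the wrong phase). The 10 ms debounce and the press windows are arbitrary values.

```yaml
binary_sensor:
  - platform: gpio
    pin:
      number: ${pin_play}
      mode: INPUT_PULLUP
      inverted: true  # assumption: the tact switch pulls the pin to GND when pressed
    name: "Play/Pause"
    internal: true
    filters:
      - delayed_on: 10ms  # simple debounce
    on_click:
      # Short press: play/pause via Home Assistant, as in the original entry
      - min_length: 50ms
        max_length: 600ms
        then:
          - homeassistant.action:
              action: media_player.media_play_pause
              data:
                entity_id: media_player.8a_ball_glosnik
      # Long press: toggle the "Asystent" switch (wake word listening on/off)
      - min_length: 1s
        max_length: 5s
        then:
          - switch.toggle: assist
```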

I'm not getting into assistant engines in this thread, because I've been using the paid Nabu Casa service the whole time and see no point in testing other options.

I'm also attaching a few photos:

szopen · 31 January 2025 23:32

Here the author says they use 2 microphones and an XMOS XU-316 audio processor

microphone:
  - platform: nabu_microphone
    i2s_din_pin: GPIO44
    adc_type: external
    pdm: false
    sample_rate: 48000
    bits_per_sample: 32bit
    i2s_mode: secondary
    i2s_audio_id: i2s_input
    channel_0:
      id: nabu_mic_mww
    channel_1:
      id: nabu_mic_va

A YAML file with “the good stuff” (note: it is pulled in by an include from a higher-level file, so this isn't the whole config)

formatBCE/Koala-Satellite/blob/main/config/common/koala-base.yaml

substitutions:
  # Phases of the Voice Assistant
  # The voice assistant is ready to be triggered by a wake word
  voice_assist_idle_phase_id: '1'
  # The voice assistant is waiting for a voice command (after being triggered by the wake word)
  voice_assist_waiting_for_command_phase_id: '2'
  # The voice assistant is listening for a voice command
  voice_assist_listening_for_command_phase_id: '3'
  # The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '4'
  # The voice assistant is replying to the command
  voice_assist_replying_phase_id: '5'
  # The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # The voice assistant encountered an error
  voice_assist_error_phase_id: '11'

esphome:
  project: 
    name: formatbce.Koala Satellite

Link to the whole project

boskikak · 1 February 2025 07:03

That's exactly the part of the Voice PE code I looked through quite often. If you look further, you'll notice that one microphone is used for the wake word and the other for listening, so there are 2 microphones, but they don't work as a stereo pair. It also uses a different platform from the one ESPHome documents for stereo mode.

szopen · 1 February 2025 12:42

I didn't dig into it that deeply, especially since the component used (a new ESPHome component? nabu_microphone??) doesn't seem to have documentation yet; at least I couldn't find any, either on stable or on the documentation beta. Besides, I'm still not keen on a voice assistant unless it can be done sensibly, fully locally, simply and easily, and I lack exactly the time that you yourself put at months of work.

In any case, this solution does use some method (not stereo, but each microphone used for a different purpose). I suspect someone worked on it for quite a while (Seeed Studio developers?), given that it relies on a ready-made microphone board, the ReSpeaker Lite.

boskikak · 1 February 2025 16:21

I won't hide that it took a lot of my time. The print alone is about 10 hours, and I went through 4 versions of the enclosure.
Before the enclosure existed, I spent 2 weeks fighting to get the microphones working in stereo to widen the pickup area. I don't think I'd even have started this if it weren't for that post. Back then I didn't yet realise it might not work.
I have a gut feeling I'll come back to it anyway…

isom1266 · 1 February 2025 19:37

Mate, full respect for so much persistence. The enclosures are very clever and the whole thing looks really good. I have a question: you wrote that waking the assistant while music is playing is difficult. I don't have that problem, although I only have a prototype and the speaker is about 20 cm from the microphone. I do, however, have a big problem with speech recognition by Whisper.
It looks like in the photo

Do you get this kind of nonsense too?
I've marked the ones that worked. The commands are simple, “turn on the light”, etc.

boskikak · 1 February 2025 19:48

When I had everything wired up loose, I didn't have any trouble waking it during music either, but once I put it all in the enclosure it was a different story. Those logs just show how weak the Whisper engine is. For better recognition you should at least use Vosk. It handles word accuracy very well, but the real-world pickup range is about 2 metres. There are other STT options, but you'll have to search the forum for them; I haven't used them myself.

isom1266 · 1 February 2025 19:55

Thank you very much for the quick and helpful tip.

szopen · 2 February 2025 15:24

You've probably read this already, since you're deep into the topic

Custom Enclosure Design Recommendations

If you’re designing your own enclosure, here are some important guidelines to ensure optimal audio performance:

Ensure that the distance between the speaker and the two microphones is uniform, and minimize the transmission of speaker vibrations to the microphones to maintain audio clarity.

The enclosure should allow human voice to reach each microphone without obstruction, ensuring direct sound (as opposed to reflected sound) can evenly reach both microphones for accurate sound capture.

If you are designing a sealed microphone chamber, avoid creating a resonant cavity that is narrow at both ends and wide in the middle, as this can interfere with front-end signal processing. If a sealed chamber is not feasible, incorporate multiple perforations in the enclosure to prevent sound reflections from bouncing inside before reaching the microphones. This helps to maintain signal integrity.

Source: Enclosure Installation Guide

The topic admittedly concerns the ReSpeaker Lite, which uses an XMOS signal processor; I haven't found a description anywhere of how its firmware processes the signal from those 2 microphones

respeaker/ReSpeaker_Lite/blob/master/xmos_firmwares/changelog.md

# XMOS XU316 DFU Firmware Change Log

### I2S Firmware 
- v1.0.9: change default DAC output gain(volume) from 0dB to -2dB; support i2c read mute_status
- v1.0.8: support new flash ZB25VQ32D
- v1.0.7: support i2c control speaker mute and output channels
- v1.0.6: change PRODUCT_STR to ReSpeaker Lite, fix ws2812 control bug
- v1.0.5: support i2c read vnr value

### USB Firmware
- v2.0.7: support new flash ZB25VQ32D
- v2.0.6: set high to WS2812_PIN to give up the control of WS2812 after boot

(although, if I understand correctly, the speaker signal also passes through this processor, which I assume lets it filter the music being played out of the microphone signal during continuous listening, i.e. acoustic echo cancellation)

It's also worth noting that Seeed Studio has been using solutions built on XMOS signal processors for almost 10 years

edit: one more good document on designing enclosures for MEMS microphones
https://www.mouser.com/pdfDocs/Infineon-AN557_MEMS_microphone_mechanical_and_acoustical_implementation-AN-v01_01-EN.pdf

and another one, also on the subject of speech recognition