EDIT: Follow updates here: https://github.com/Koenkk/zigbee2mqtt/discussions/214…62#discussioncomment-8510320
> smaller PRs - he said
Alright, ducks, row, quoting one's self... ✔️ ✔️ ✔️
So, let's start with the disclaimer... **Don't use this in your live network.** It has only been tested by one (me, who'd have guessed!). Expect problems until this is thoroughly tested in a larger number of installations and nasty buggers are fixed. Otherwise, you could break the house, literally...
I'm introducing this as a completely separate adapter (can be selected via `configuration.yaml` as always, with `ember` as `adapter`). This is not only because of the complete rewrite, but also because it **currently only supports EZSP 13 / 7.4.x**. The focus was put on future support, rather than backwards compatibility; firmware updates being very easy these days. This limitation is hardcoded to prevent any misunderstanding; Z2M will not start if your adapter is not at that version and you try to use this. On the same basis, and because backup was not fully supported before, backups from adapters pre EZSP 12 will be rejected for now. _Although you can modify the backup file manually to fake the version, if you are 100% sure your backup file is fine, but in doing so, you assume the risk that you might restore a broken network (in case the backup you have is improper/incomplete)._ This might change in the future if we notice no issue with previous versions (I don't have access to older backups to know what's what...).
Since this is a separate adapter (albeit for the same protocol), and it does not alter any of the old one's logic, nor any of herdsman logic (except for types related to the introduction of the new adapter), it can be safely reviewed, introduced, and tested. Only users that explicitly chose to do so will use this.
_So... from here on, you can put a "should" in front of... well, every line..._
## Features
### Full ASH protocol support (per silabs).
- TX/RETX/RX queueing (singly-linked lists); allocation from static pool.
- Flag-driven logic.
- Support for ACK sliding window (TX_K=3).
- Counters for all frame types and errors (logged on stop).
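To illustrate the ACK sliding window from the list above, here is a minimal sketch. It assumes 3-bit ASH frame numbers and that an `ackNum` names the next frame the peer expects (my reading of the silabs docs); the class and method names are hypothetical and are not the ones used in this PR.

```typescript
// Hypothetical sketch of a TX sliding window; not the adapter's actual code.
const TX_K = 3; // max unacknowledged DATA frames in flight (value from the list above)
const FRAME_NUM_MOD = 8; // assumption: ASH frame numbers are 3 bits wide

class SlidingWindowSender {
    private nextFrmNum = 0; // frame number for the next new DATA frame
    private readonly pending: string[] = []; // payloads waiting for a window slot
    private readonly inFlight: {frmNum: number; payload: string}[] = []; // sent, not yet ACKed

    get inFlightCount(): number {
        return this.inFlight.length;
    }

    enqueue(payload: string): void {
        this.pending.push(payload);
    }

    /** Sends as many pending frames as the window allows; returns the frame numbers sent. */
    flush(): number[] {
        const sent: number[] = [];
        while (this.pending.length > 0 && this.inFlight.length < TX_K) {
            const payload = this.pending.shift()!;
            this.inFlight.push({frmNum: this.nextFrmNum, payload});
            sent.push(this.nextFrmNum);
            this.nextFrmNum = (this.nextFrmNum + 1) % FRAME_NUM_MOD;
        }
        return sent;
    }

    /** Releases every in-flight frame before ackNum; returns false if ackNum is outside the window. */
    onAck(ackNum: number): boolean {
        if (this.inFlight.length === 0) return false;
        const acked = (ackNum - this.inFlight[0].frmNum + FRAME_NUM_MOD) % FRAME_NUM_MOD;
        if (acked > this.inFlight.length) return false; // would acknowledge frames never sent
        this.inFlight.splice(0, acked);
        return true;
    }
}
```

With five payloads queued, the first `flush()` sends three frames and then stalls until an ACK frees a slot, which is the whole point of the window.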
### Full EZSP protocol support (per silabs).
- Buffalo-driven serialization (extending the existing one to fit EZSP's needs), all buffer-based, all function-based.
- Interval-based NCP callback handling
- Command priority. Callbacks are automatically set aside while waiting for a response to a command.
- All current EZSP frames supported.
_Actually, there's too much support here, we'll have to cleanup the functions not intended for Trust Center/Coordinator use. It was faster to implement all than to look closely at each of them and determine their usage... In the meantime, it has no impact since they just aren't used (except a few more lines of code)._
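The "command priority" idea above can be sketched roughly as follows: while a command is waiting for its response, incoming callbacks are parked and only handled on a later tick. All names here are hypothetical, and the real implementation has far more going on (timeouts, errors, etc.).

```typescript
// Hypothetical sketch; names and shapes are illustrative only.
type IncomingFrame = {kind: 'response' | 'callback'; id: number};

class CallbackDeferrer {
    private waitingForResponse = false;
    private readonly deferred: IncomingFrame[] = [];
    /** Order in which frames were actually handled, for demonstration. */
    readonly handled: string[] = [];

    /** Marks that a command went out and its response is pending. */
    sendCommand(): void {
        this.waitingForResponse = true;
    }

    /** Responses are handled right away; callbacks are always set aside for the next tick. */
    onFrame(frame: IncomingFrame): void {
        if (frame.kind === 'response') {
            this.waitingForResponse = false;
            this.handled.push(`response:${frame.id}`);
        } else {
            this.deferred.push(frame);
        }
    }

    /** Meant to be driven by a timer; drains parked callbacks unless a command is pending. */
    tick(): void {
        if (this.waitingForResponse) return;
        let frame = this.deferred.shift();
        while (frame !== undefined) {
            this.handled.push(`callback:${frame.id}`);
            frame = this.deferred.shift();
        }
    }
}
```

In real use, something like `setInterval(() => deferrer.tick(), 60)` would drive it; the 60ms figure is the callback dispatch interval listed in the TODOs.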
### Adapter
- Per stack-type configuration (at the moment, using TCP path detection to assume zigbeed, else default config; _config not yet implemented_).
  - Multiprotocol/Zigbeed should reap the benefits here, by having much larger values for stack settings (and Z2M knowing about it).
- Configurable intervals for Ezsp callback processing & queue dispatching (static at the moment; _config not yet implemented_).
- Request queue with support for start/stop dispatching (interval-driven) and automatic deferring on network "busy"-type errors.
- Single waitress to handle delivery check, response, callback response, event and potential error (and of course the dreaded timeout).
- NCP Reset support from adapter-level down in case the NCP or host detect an error that requires it (most do...). Should now be able to fully resume transacting after a "recoverable" (software) fail (will only disconnect at Z2M-level on failed reset).
- Energy scan support, to detect how busy channels are in your environment (_currently unavailable until Z2M frontend supports it_).
- Simple watchdog that reads (& logs) then clears NCP counters at large-ish intervals (currently every hour; haven't seen the need for a watchdog that triggers very often, this one just allows to get regular feedback on the state of the stack...).
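As a rough picture of the request queue described in the list above (start/stop dispatching, deferring on "busy"), here is a simplified, synchronous sketch. The names and the `'OK' | 'BUSY'` status type are made up for the example; the adapter's real queue is promise-based and richer than this.

```typescript
// Hypothetical sketch; the real queue differs in the details.
type SendStatus = 'OK' | 'BUSY';
type QueuedRequest = {name: string; send: () => SendStatus};

class RequestQueue {
    private readonly queue: QueuedRequest[] = [];
    private dispatching = false;
    /** Names of requests that went through, in order. */
    readonly completed: string[] = [];

    enqueue(request: QueuedRequest): void {
        this.queue.push(request);
    }

    startDispatching(): void {
        this.dispatching = true;
    }

    stopDispatching(): void {
        this.dispatching = false;
    }

    /** One dispatch step; a timer would call this at the dispatch interval. */
    tick(): void {
        if (!this.dispatching || this.queue.length === 0) return;
        const request = this.queue[0];
        if (request.send() === 'BUSY') return; // deferred: stays at the head, retried next tick
        this.queue.shift();
        this.completed.push(request.name);
    }
}
```

Keeping a busy request at the head preserves ordering; whether the real adapter re-queues at the head or the tail is not something this sketch claims to mirror.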
### Backup/Restore (gets its own header!)
Support for backup/restore using `zigpy/open-coordinator-backup` format is implemented. I've tested several combinations, and it restored everything without issue every time. See [Z2M-ember-devlogs](https://github.com/Nerivec/Z2M-ember-devlogs), I put some logs and dev/debug info in there on this and other stuff. Same as ZStack, backup has to match config for it to be used, otherwise it defaults to forming a network with config and ignores the backup. If config matches the network on the adapter, it, of course, resumes operation and bypasses restoration.
Support for NVM3 tokens backup/restore is also implemented, although Z2M+frontend will require update to support this. _Note: I've tested only the backup part. I didn't want to spend hours fixing my test adapter in the middle of all this in case restoring didn't go well..._ It creates a single buffer containing the tokens in a specific format -similar to silabs' one-, allowing it to be stored in hex format in JSON (size depends on stack config; expected 3k+ bytes, should be much larger with zigbeed).
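For the "stored in hex format in JSON" part, the round trip looks roughly like this. This only shows the hex encoding of an opaque byte buffer; it says nothing about the token layout inside the buffer, and the helper names are hypothetical.

```typescript
// Hypothetical helpers; the token layout inside the buffer is not modelled here.
function toHex(bytes: Uint8Array): string {
    let hex = '';
    for (let i = 0; i < bytes.length; i++) {
        // Pad single-digit values so every byte is exactly two characters.
        hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
    }
    return hex;
}

function fromHex(hex: string): Uint8Array {
    if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
        throw new Error('Invalid hex string');
    }
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    return bytes;
}

// A tiny stand-in for the tokens buffer (the real one is expected to be 3k+ bytes).
const tokensBuffer = new Uint8Array([0x00, 0x1f, 0xa0, 0xff]);
const tokensHex = toHex(tokensBuffer);
const restoredBuffer = fromHex(tokensHex);
```

The resulting string can then sit in a JSON field like any other value, at the cost of doubling the size.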
### General
- Typing, typing everywhere! (or everywhere I could...) It should avoid many mistakes in future updates, especially since we're dealing with varying payloads & the likes.
- Commented as much as I could, same reason as above. Can typedoc the `ember` adapter part pretty well if needed (I zipped a version matching this PR and put it in the repo linked above -didn't check if it got everything though, as always, read the code...-).
### Tested
- All the basic stuff, pairing, commands, network map, LQI reporting, etc.
- Groups (multicast).
- Adapter backup/restore to and from various states.
- Adapter NVM3 backup.
  - I can only assume it worked since I was able to identify all the relevant information in the output data (keys, IDs, etc), but I haven't tried restoring yet (hence its presence in the below category).
- Energy scan (matches my expectations on my environment -WiFi + live Zigbee network on different channels-).
- Fake-crashing the NCP to test recovery after reset (also a couple of real "forced" crashes, for good measure...)
_Tests done with devices from "the trash pile", i.e. devices that always caused trouble in my live network in the past and ended up in the closet. So, if it works with those..._
### Untested
- Socket-based (and thus multiprotocol). I don't see why it wouldn't work, unless the new ASH protocol handling doesn't like Sockets... might need some tweaking. If someone can test that and report. Since I haven't used multiprotocol (nor have any non-Zigbee device), I'm really not in the best position to test this properly.
  - NOTE: This is 7.4-only, which is known to have issues with multiprotocol...
- Touchlink (don't have any).
- GreenPower (don't have any; bit of a mess that one, can't seem to find decent docs).
- Spamming devices (some are really good at crowding the network, but I don't have any).
- Large networks (my test environment is just a few devices of different types; router/end-device/known-to-be-crappy-device).
- Adapter NVM3 restore.
- OTA, logs show it is requesting fine, however I don't currently have any device that needs updating so could not test this further...
### Dev Stuff
Being no "expert" in NodeJS, it is entirely possible I f'ed up some parts of this (though it is working, so I must have done at least some things right!). If one passes through here, some feedback/upgrades on the implementations of the various Node-specific features (related to queueing, waitressing, handling tick stuff & whatnot) would be fantastic. Promises, promises :wink:
Also, if someone wants to tackle writing some tests for all this... I did the ASH layer (the critical one) & some utilities, but I'm not that familiar with Jest; it is really slow implementing while reading the docs... Also, I'm sure the ones I've written can be improved.
NOTE: For anyone working on the code of the `ezsp` adapter, make sure to import the proper files. Don't want to import from the wrong folder; names are similar/identical in many cases but definitely different types.
#### TODOs
- ✅ Obvious, but so we don't forget... `zigbee2mqtt` repo will need to be updated to support `ember` adapter, else will fail to validate config on start (@Koenkk I'll let you do that). Should be:
  - lib/types/types.d.ts: `adapter?: 'deconz' | 'zstack' | 'ezsp' | 'zigate' | 'ember',`
  - lib/util/settings.schema.json: `"enum": ["deconz", "zstack", "zigate", "ezsp", "ember", "auto"],`
  - This PR contains the necessary changes at herdsman-level:
    - Added `EmberAdapter` to `Adapter.create`
    - Added `ember` to `SerialPortOptions`
    - Added `ember` to Controller test (error detection that needs all adapters listed)
    - Fixed `fromUnifiedBackup` to include EZSP-specific stuff.
- [Postponed] Implement various configs for `ember`. @Koenkk if I can get your input on how you'd like this done? Here's the list:
  - Ezsp callback dispatch interval, 60ms for now.
  - Request queue dispatch interval, 60ms for now.
  - Stack config (`default` or `zigbeed` available at the moment).
  - Possibly also concentrator settings (untested, for now uses static defaults).
- Might need to reduce the watchdog time to avoid rollovers on larger networks? The counter rollover callback handler will log any rollover, so we should get some feedback on this pretty quickly if there is indeed a need.
- The issue with the frame counter on the network key (per silabs docs, can't let it reach all `F`'s, uint32_t) will have to be dealt with eventually. With backup supported, I'm sure a few users with larger networks will hit it within a year (according to my guesstimates...). I've disabled the code to broadcast a new network key switch for now, since Z2M would not support it (one-way config at the adapter level). Also, I haven't tested it; and some more research needs to be done on impact, timings, etc. Until then, it will give the user a warning, if they get "too close to the sun".
- GreenPower & Touchlink need special attention, sooner rather than later. Since I couldn't test either, I did the implementation "blindfolded"...
- Cleanup ezsp commands not used by Trust Center / coordinator.
- Got some conflicting info on `ENABLE_ROUTE_DISCOVERY` in APS options while using source routing (even though it would seem to make sense not to use it with source routing...); for now I left it in there, doesn't seem to hurt; can remove if troubles arise from it.
- Test the untested stuff, and write more tests to test all this; say that three times fast :wink:
- The codebase is currently pretty drastically checking everything and erroring out, which might create issues with poorly implemented devices... Will need to see on a case by case basis I suppose...
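On the network key frame counter TODO above, the "too close to the sun" warning boils down to a check along these lines. The margin value and function name are hypothetical, picked only for the example; the threshold actually used in the code may well differ.

```typescript
// Hypothetical sketch of the frame counter warning check.
const FRAME_COUNTER_MAX = 0xffffffff; // uint32_t ceiling the counter must never reach
const WARN_MARGIN = 0x01000000; // hypothetical: warn once fewer than ~16.7M frames remain

function shouldWarnFrameCounter(counter: number, margin: number = WARN_MARGIN): boolean {
    if (!Number.isInteger(counter) || counter < 0 || counter > FRAME_COUNTER_MAX) {
        throw new RangeError(`Frame counter out of uint32 range: ${counter}`);
    }
    // Remaining headroom before the counter hits all F's.
    return FRAME_COUNTER_MAX - counter <= margin;
}
```

Working with the remaining headroom rather than a fixed threshold makes it easy to tune the margin later, once there is some feedback on how fast larger networks burn through the counter.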
### Final note
In case anyone is wondering, and to make my next point, here are some stats on the codebase to make this work (in lines):
| path | files | code | comment | blank | total |
| :--- | ---: | ---: | ---: | ---: | ---: |
| Total | 22 | 11,685 | 8,326 | 2,915 | 22,926 |
| . | 4 | 1,541 | 2,732 | 290 | 4,563 |
| adapter | 6 | 3,135 | 1,443 | 732 | 5,310 |
| ezsp | 4 | 5,517 | 3,339 | 1,491 | 10,347 |
| uart | 6 | 1,403 | 749 | 384 | 2,536 |
| utils | 2 | 89 | 63 | 18 | 170 |
My next point: bear with me if errors slipped in there or if I screwed up something... It is more than likely 😅
---
#### Unrelated dev stuff
A few things, not related to this implementation (or maybe it is...), that I've noticed while testing this (instead of opening half a dozen issues...):
- Any reason why offline devices aren't automatically excluded when doing looping actions like network map? (Guess I'd never tried that before, waiting for the fail seems counterproductive).
- Groups UI doesn't seem to always update properly on add/remove? (might be something related to next point)
- I've noticed on several occasions that frontend would not properly report some part(s) of the UI (a state, a button, etc), until a cache clear. (Seems related to [this](https://github.com/Koenkk/zigbee2mqtt/discussions/21267#discussioncomment-8387524).)
- Something with groups management seems to create errors (unbind being called without a bind being created before?).
- Controller test `Call controller constructor options mixed with default options` seems to fail rather randomly with Jest timeout. Might need a bit of tweaking. (I probably only noticed because I ran the darn tests so many times...)
- Current `ezsp` adapter backup for v13 is broken (although not enabled, so not a problem per se). The wrong keys would be exported in that code path (wrong enum used, the newer `exportKey` command doesn't use `EmberKeyType`; they use the Sec Man one).