Over time firmware stabilizes, but rework is never entirely finished, whether because of inevitable bugs or further enhancements. Manually updating firmware on a couple of stations in the field takes relatively little effort and is merely tedious. As the number of stations grows and they end up sparsely distributed or in distant locations, it becomes a significant task. Planning ahead, I decided to equip the system with a secure FOTA (firmware over-the-air) feature.
A typical FOTA architecture in a microcontroller environment is as follows: the software is composed of two pieces, a static bootloader and a modifiable application. Since the bootloader must be small, it is the application code that downloads firmware. The application checks the server for an update for this station, downloads the image to the local file system and reboots. On restart the bootloader looks for a firmware file. If it finds one, it decrypts it, flashes the application area and runs the new application code.
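The application's final step can be sketched as follows, assuming standard C stdio on top of the SD card file system. The image is accepted only if the number of received bytes equals an expected length, which is assumed here to be announced by the server. It is then written under a temporary name and only afterwards published under the name the bootloader looks for. The file names and publish_update() are hypothetical, and checksum verification and the reboot itself are left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical file names; the article does not say which ones are used. */
#define UPDATE_TMP  "update.tmp"
#define UPDATE_FILE "update.bin"

/* Publish a downloaded (still encrypted) image only if it arrived complete.
 * expected_len is assumed to be announced by the server with the update.
 * Returns 0 if the image is now waiting for the bootloader, -1 otherwise. */
int publish_update(const unsigned char *img, size_t got, size_t expected_len)
{
    if (got != expected_len)              /* partial download: reject it */
        return -1;

    FILE *f = fopen(UPDATE_TMP, "wb");    /* write under a temporary name first */
    if (f == NULL)
        return -1;
    size_t written = fwrite(img, 1, got, f);
    int close_err = fclose(f);
    if (written != got || close_err != 0) {
        remove(UPDATE_TMP);               /* never leave a truncated image behind */
        return -1;
    }

    /* Checksum verification of the embedded CRC is omitted in this sketch. */
    remove(UPDATE_FILE);                  /* rename() over an existing file is not portable */
    if (rename(UPDATE_TMP, UPDATE_FILE) != 0)
        return -1;

    /* On the device the application would now reboot into the bootloader. */
    return 0;
}
```

Writing to a temporary name first means that a power loss mid-write leaves at most a stray temporary file, never a half-written image under the name the bootloader acts on.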
The update architecture must be resistant to firmware corruption at multiple levels. The application rejects partial downloads as obviously corrupted. The application verifies a completed download against the checksum that was built into the image during development, and the bootloader performs the same check before flashing. After flashing, the checksum is calculated again from the reprogrammed memory, because flash wear-out or a brown-out of an unprotected power supply can silently corrupt the flashed data. In case of a flashing error the bootloader retries a few times. If that does not help, it halts the system as not runnable.
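The verification chain can be sketched in host-runnable C. Several details are assumptions for illustration: the CRC variant (CRC-16/CCITT-FALSE, since the article only says "CRC16"), the retry count, and the program_flash() stand-in that simulates a corrupted write. The point being illustrated is that the CRC is recomputed from the flash contents after every programming attempt instead of trusting the write.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF, no final XOR).
 * The exact CRC16 variant used by the firmware is an assumption. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

#define APP_SIZE     1024  /* toy size; about 120kB on the real device */
#define MAX_ATTEMPTS 3     /* hypothetical value for "a few times" */

static uint8_t app_flash[APP_SIZE]; /* host stand-in for the application area */
static int faults_to_inject = 0;    /* test hook: number of flaky write runs */

/* Hypothetical programming routine: on the AVR this would erase and write
 * pages with SPM; here it copies and may corrupt one byte (brown-out). */
static void program_flash(const uint8_t *img, size_t len)
{
    memcpy(app_flash, img, len);
    if (faults_to_inject > 0) {
        app_flash[len / 2] ^= 0x01;
        faults_to_inject--;
    }
}

/* Returns 0 when flash verifiably holds the image, -1 when the image is bad
 * or every attempt failed (the bootloader would then halt the system). */
int flash_and_verify(const uint8_t *img, size_t len, uint16_t expected_crc)
{
    if (len > APP_SIZE || crc16(img, len) != expected_crc)
        return -1;                        /* never flash a bad image */
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        program_flash(img, len);
        if (crc16(app_flash, len) == expected_crc) /* read back from flash */
            return 0;
    }
    return -1;
}
```

If flash_and_verify() returns -1, the bootloader stops instead of jumping into a half-written application.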
In my case the application and the bootloader communicate through files. Because bootloader flash space is very limited, the bootloader's file system library can access the card only in read-only mode. This means the bootloader cannot remove the update file after successful flashing, and the application may forget to do so after it starts. To avoid reflashing the same image on every reboot, the bootloader keeps the checksum of the last flashed image at the end of the EEPROM. If the firmware file's checksum matches it, the bootloader skips flashing and simply runs the existing application.
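The skip-if-already-flashed rule reduces to one comparison. In this sketch a RAM array stands in for the EEPROM, and the last CRC is assumed to live in its final two bytes, little-endian. On the AVR itself, avr-libc's eeprom_read_word() and eeprom_update_word() from <avr/eeprom.h> would perform the accesses. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define EEPROM_SIZE   4096u              /* the ATmega128 has 4kB of EEPROM */
#define LAST_CRC_ADDR (EEPROM_SIZE - 2u) /* assumed location of the last CRC */

/* Host stand-in for EEPROM (zero-initialised here; erased AVR EEPROM reads 0xFF). */
static uint8_t eeprom[EEPROM_SIZE];

static uint16_t eeprom_read_crc(void)
{
    return (uint16_t)(eeprom[LAST_CRC_ADDR] | (eeprom[LAST_CRC_ADDR + 1] << 8));
}

static void eeprom_write_crc(uint16_t crc)
{
    eeprom[LAST_CRC_ADDR]     = (uint8_t)(crc & 0xFFu);
    eeprom[LAST_CRC_ADDR + 1] = (uint8_t)(crc >> 8);
}

/* Does the update file still sitting on the SD card need flashing? */
bool should_flash(uint16_t file_crc)
{
    return file_crc != eeprom_read_crc();
}

/* Call only after flashing and read-back verification have succeeded. */
void remember_flashed(uint16_t file_crc)
{
    eeprom_write_crc(file_crc);
}
```

Keying on a 16-bit checksum has one side effect. A new image whose CRC happens to equal the previous one (roughly a 1 in 65536 chance) would be skipped. Storing a version number next to the CRC would close that gap if it matters.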
To protect intellectual property, FOTA must also be secure. Cloning hardware cannot be prevented, as offerings on the Chinese market prove. Even so, keeping a ready-to-flash firmware image on the SD card would give the complete solution away. For that reason the binary image downloaded from the server and stored on the SD card is strongly encrypted, and only the bootloader can decrypt it. The bootloader is flashed at the factory and reading of the MCU flash is blocked. Both the application and the bootloader use in-memory decryption so that no plaintext traces are left on the SD card.
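The in-memory decryption data flow can be sketched as follows. The ciphertext is assumed to be read from a FILE* and programmed one 256-byte flash page at a time, which is SPM_PAGESIZE on the ATmega128. placeholder_decrypt() is NOT a cipher. It only stands in for the real, unspecified algorithm so that the code runs on a PC. The page writer and the fake flash are hypothetical. Plaintext exists only in the page-sized RAM buffer and is never written back to the card.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 256u   /* SPM_PAGESIZE of the ATmega128, in bytes */

/* PLACEHOLDER, NOT ENCRYPTION: a one-byte XOR stands in for the real
 * (unspecified) cipher so that the data flow can be run on a PC. */
static void placeholder_decrypt(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Hypothetical page programmer; on the AVR it would erase and write one page. */
typedef void (*page_writer)(uint32_t addr, const uint8_t page[PAGE_SIZE]);

/* Host stand-in for the application flash area. */
static uint8_t fake_flash[4 * PAGE_SIZE];
static void fake_flash_write(uint32_t addr, const uint8_t page[PAGE_SIZE])
{
    if (addr + PAGE_SIZE <= sizeof fake_flash)
        memcpy(fake_flash + addr, page, PAGE_SIZE);
}

/* Read ciphertext page by page, decrypt it in RAM and hand each page to the
 * writer. Returns the number of bytes programmed, or -1 on a read error. */
long decrypt_to_flash(FILE *src, uint8_t key, page_writer write_page)
{
    uint8_t page[PAGE_SIZE];
    uint32_t addr = 0;
    size_t n;

    while ((n = fread(page, 1, PAGE_SIZE, src)) > 0) {
        placeholder_decrypt(page, n, key);
        if (n < PAGE_SIZE)                          /* last, partial page */
            memset(page + n, 0xFF, PAGE_SIZE - n);  /* pad like erased flash */
        write_page(addr, page);
        addr += PAGE_SIZE;
    }
    return ferror(src) ? -1L : (long)addr;
}
```

A page-sized buffer keeps RAM usage fixed regardless of image size. A real block cipher with a 16-byte block also divides a 256-byte page evenly.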
I worked on the bootloader first because I had never written one before and could not find a ready-made one that fit my needs. Surprisingly, making a first bootloader that simply logs to UART and starts the application was easy. In one evening I wrote a tiny loader and instructed the linker to relocate it to the bootloader-reserved upper memory segment, starting at the address determined by the fuse bits. The loader leaves the MCU in a state as close to hardware reset as possible and finally jumps to address zero, where the regular application starts.
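The fuse-selected start address is simple arithmetic. This sketch computes it for the ATmega128, which has 128kB of flash and whose BOOTSZ1:0 fuses select a boot section of 4096, 2048, 1024 or 512 words. The result is the byte address one would typically pass to avr-gcc, for example -Wl,--section-start=.text=0x1E000 for the 8kB option. The BOOTRST fuse must also be programmed so that reset enters the bootloader rather than address zero. The final jump to the application is commonly written as an inline "jmp 0" or as a call through a function pointer to address 0. Treat those as common idioms, not as the author's code.

```c
#include <stdint.h>

#define FLASH_BYTES 0x20000UL   /* ATmega128: 128kB of program flash */

/* Boot section size in 16-bit words, indexed by the BOOTSZ1:0 fuse value
 * (ATmega128 datasheet: 00 = 4096, 01 = 2048, 10 = 1024, 11 = 512 words). */
static const uint16_t boot_words[4] = { 4096, 2048, 1024, 512 };

/* Byte address at which the bootloader's .text section must be linked. */
uint32_t boot_start_bytes(unsigned bootsz)
{
    return (uint32_t)(FLASH_BYTES - 2UL * boot_words[bootsz & 3u]);
}
```

With BOOTSZ1:0 = 00 the bootloader starts at byte 0x1E000 (word 0xF000), leaving the lower 120kB for the application.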
Writing a bootloader is easy; writing a small bootloader is hard. This is why most bootloaders are written in assembler. I was lucky that the ATmega128 can host a bootloader of up to 8kB, compared to 512B-4kB on most smaller ATmegas. That was crucial, as I needed FAT file system access. Chan's regular FatFs library was a no-go because it takes far more than 8kB of flash, while his PetitFS with all of its few functions enabled is nearly 4kB. The remaining 4kB let me save time and stick to C instead of learning full-blown AVR assembler. PetitFS, however, is read-only, because it avoids space-hungry manipulation of disk clusters. I therefore had to design the bootloader so that it does not reflash the same update file that persists on the SD card.
The bootloader took nearly 7.5kB in total, leaving a little room for further tuning. That covers in-memory decryption, checksum calculation over both the file and the flash, EEPROM access, disk reading and UART debug traces. Of the 120kB of flash left for the application, more than 30% is still free.
The application part seemed easier, but in fact its implementation took more time than the bootloader did. Strong encryption came for free: it is already used in the code that stores and transmits data in encrypted form as an IP anti-theft measure. The CRC16 calculation was also taken from an open, royalty-free implementation. File system manipulation on the SD card has been in use for a long time.
The first problem that popped up was downloading the binary firmware data over the network. Encrypted measurement data, although binary, is base64-encoded and sent as text via HTTP over GSM. At this stage of implementation I was reluctant to introduce binary-mode TCP connections on the SIM900 module. That would require the MCU to drive the DTR line and would lead to heavy changes in my command-processor code. Once base64-encoded, a 120kB binary firmware image becomes a 160kB chunk of data, so I ran a series of download tests with larger (250kB) text files. During the tests I also simulated delays over the channel in the server-side stream. All of that worked well, without issues such as timeouts, so I stayed with textual data transfer and a RESTful API on the server side.
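The growth from 120kB to 160kB follows directly from base64 turning every 3 input bytes into 4 output characters. A minimal check follows, ignoring any line breaks an encoder might insert.

```c
#include <stddef.h>

/* Length of padded base64 output (RFC 4648), without line breaks or a
 * terminating NUL: each started group of 3 input bytes yields 4 characters. */
size_t base64_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

For the firmware image, 122880 bytes (120kB) map to exactly 163840 characters (160kB), a fixed one-third overhead on the GSM link.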
After merging the isolated FOTA code into the main trunk I observed another issue. My tests had used a single context of execution, and there it worked fine. Once I merged the code into the RTOS environment, every download was corrupted. Investigation showed that task switching created time windows in which the SIM900 buffers were not read in time, so data was lost. I had two choices: either rebuild the SIM900 driver to use hardware flow control for the buffers, or make buffer reads uninterrupted. During hardware design I had anticipated the need for buffer control and connected the DTR/CTS/RTS lines to the MCU. Nevertheless, I decided to keep the prototype implementation simple and use the non-interruptible approach. Because the OTA check is occasional, all other RTOS tasks are suspended for the duration of the download. The download code is optimized for fast buffer reading and dumping to a temporary file. I decided that a short blackout in regular periodic work is acceptable. In critical conditions such as high water, when the schedule of measurements and data delivery is very tight, OTA is simply disabled.
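The following sketch illustrates why late reads lose data. It shows a UART receive ring buffer without flow control, filled by the RX interrupt and emptied by a tight drain loop. The buffer size, structure and function names are hypothetical; this is not the author's SIM900 driver. On an 8-bit AVR the single-byte head and tail indices can be updated without locking, because there is exactly one producer (the ISR) and one consumer. The article does not name the RTOS. If it were FreeRTOS, vTaskSuspendAll() and xTaskResumeAll() around the download would stop task switching while leaving interrupts, and therefore reception, enabled.

```c
#include <stdint.h>
#include <stddef.h>

#define RX_BUF_SIZE 64u   /* hypothetical; must be a power of two */

typedef struct {
    volatile uint8_t  data[RX_BUF_SIZE];
    volatile uint8_t  head;     /* advanced by the UART RX interrupt */
    volatile uint8_t  tail;     /* advanced by the reader */
    volatile uint16_t dropped;  /* bytes lost because the reader was late */
} rx_ring;

/* What the RX interrupt does with each incoming byte. */
void rx_isr_push(rx_ring *r, uint8_t byte)
{
    uint8_t next = (uint8_t)((r->head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == r->tail) {      /* full: without flow control the byte is lost */
        r->dropped++;
        return;
    }
    r->data[r->head] = byte;
    r->head = next;
}

/* Tight drain loop: copy everything available into 'out', e.g. the block
 * about to be appended to the temporary file. Returns bytes copied. */
size_t rx_drain(rx_ring *r, uint8_t *out, size_t cap)
{
    size_t n = 0;
    while (n < cap && r->tail != r->head) {
        out[n++] = r->data[r->tail];
        r->tail = (uint8_t)((r->tail + 1u) & (RX_BUF_SIZE - 1u));
    }
    return n;
}
```

With one slot kept empty to tell "full" from "empty", this buffer holds 63 bytes. Any burst longer than that arriving while the reader is switched out is dropped, which matches the corruption observed under the RTOS.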