With the arrival of the 2nd revision of the WLS experimental hardware I got nasty system freezes on the ported firmware. As they always happened close to power cycling of the probe section, I suspected the new ADC chip on the I2C bus. Past experience with I2C drivers freezing the system led to tests that revealed bus corruption caused by non-deterministic behavior of the slave during transient power rail states, which the TXS0102 voltage shifter did not isolate in time.
The experimental architecture uses two absolute pressure transducers, one underwater and one inside the data logger above the surface. The extra complexity is easily offset by savings on irrationally expensive capillary cables, especially when the two ends are far apart. The analog transducer voltage is turned into numbers by precision ADCs. In the redesign I replaced the MCP3551 chip (21-bit ADC with SPI) with the MCP3421 (18-bit ADC with I2C) for several reasons: sufficient precision, a smaller package and a move to a less loaded bus. This change freed the MCU's hardware SPI for exclusive use by the SD card. It did, however, add a second slave device to the I2C bus, which until then was used only by the PCF8563 real-time clock chip.
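To make the two-transducer idea concrete, here is a minimal sketch of how the two absolute readings could be combined. It assumes the quantity of interest is the height of the water column above the submerged transducer, that both pressures are already converted to pascals, and that fresh water and standard gravity are good enough. The function name and constants are mine, not taken from the firmware.

```c
/* Assumed constants: fresh water and standard gravity. */
#define WATER_DENSITY_KG_M3 1000.0
#define GRAVITY_M_S2        9.80665

/* Hypothetical helper: both arguments are absolute pressures in pascals.
 * Subtracting the air pressure measured in the logger removes the
 * atmospheric part that a capillary (vented) cable would otherwise cancel. */
static double water_column_m(double p_underwater_pa, double p_air_pa)
{
    return (p_underwater_pa - p_air_pa) / (WATER_DENSITY_KG_M3 * GRAVITY_M_S2);
}
```

With these assumptions a difference of 9806.65 Pa between the two readings corresponds to one meter of water.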
While the RTC on I2C is constantly powered and works continuously, the added ADC chip belongs to the measurement subsystem, which is turned on on demand when air pressure and other parameters must be measured. To isolate the power cycling of this I2C slave I used, for the first time, the TXS0102 bidirectional voltage shifter with a high-impedance state, wired as depicted below.
Most of the time the probe section is turned off and the VCC_5V power line is close to ground, as is the OE input of U2, which cuts off the left-side section of I2C. With the probe powered on, the 5V SDA/SCL lines are connected to I2C and the MCU can talk to U1. And it worked like a charm from the beginning.
Troubles started to pop up in long runs. With the logger set up for frequent measurements (every 3 minutes) the first freeze occurred after roughly 2 hours 15 minutes. A re-run brought another freeze within 1 hour 30 minutes. With less intense measurements the freeze happened after several hours. In all cases it happened in a section of code that only does computation inside the MCU and could not cause any problem by itself. But the log files showed that it always happened less than a second after the probe was turned off.
I was puzzled, as there was no obvious reason for the MCU to freeze. And that was the moment when intuition built on past experience with I2C shone through: I recalled a cascading problem with the RTC on I2C that led to a freeze. I had a bug where many threads accessed the RTC simultaneously without exclusive access; I2C commands from multiple tasks talking to the RTC got interleaved and mangled, which in turn led to infinite waits for the end of transmission. As the RTC is used by common logger functionality, all tasks were effectively blocked forever. Something clicked in my head and I realized that the frequent RTC access was being disturbed by transient states of the ADC power rail. All because I triggered the high-Z state from the power rail (OE connected to VCC_5V) instead of driving OE from the MCU: while the 5V rail drops, the voltage shifter still passes signals and the undervolted ADC messes up the bus lines. If I had OE connected to the MCU, I would cut off the 5V section first (OE down) and then, after a short pause, power off the probe and the ADC chip. But I did not have any free pins left on the ATmega128, so the only option was to tie OE to the 5V rail.
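For comparison, the MCU-driven sequence I would have preferred can be sketched as follows. This is only an illustration of the ordering, not code from the firmware: the hooks shifter_oe(), probe_rail() and settle() are hypothetical stand-ins for a pin macro, the probe power switch and an RTOS delay, and the delay values are placeholders. Here the hooks merely record what was called so the ordering can be checked on a PC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware hooks. On the target these would be pin macros and
 * an RTOS delay; in this sketch they only record the order of operations. */
enum step { OE_LOW, OE_HIGH, PWR_ON, PWR_OFF, WAIT };
static enum step trace[8];
static size_t trace_len = 0;

static void record(enum step s) { if (trace_len < 8) trace[trace_len++] = s; }
static void shifter_oe(bool enable) { record(enable ? OE_HIGH : OE_LOW); }
static void probe_rail(bool on)     { record(on ? PWR_ON : PWR_OFF); }
static void settle(unsigned ms)     { (void)ms; record(WAIT); }

/* Keep the 5V side of the bus in high-Z whenever the rail is moving. */
static void probePowerIsolated(bool turnOn)
{
    shifter_oe(false);          /* 5V side of the bus goes high-Z first */
    if (turnOn) {
        probe_rail(true);
        settle(200);            /* placeholder: time for the rail to rise */
        shifter_oe(true);       /* reconnect only once the rail is stable */
    } else {
        settle(1);              /* placeholder: short pause before power-off */
        probe_rail(false);      /* shifter stays disabled while the rail decays */
    }
}
```

The point of the ordering is that the ADC is never connected to the shared bus while its supply is rising or falling, so no bus-level locking would be needed for this particular problem.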
I quickly crafted a test that does intense RTC access while the ADC is being powered off. This time I got a freeze in every single run. Judging by the number of printed dots, each representing a single successful RTC read, the freezes were very consistent and clustered around one specific moment in time, and thus around a specific transient voltage that trashes I2C.
Powering off was the obvious case: it takes some time to discharge the capacitors and bring the probe rails down to ground. I suspected power-on as well: even though that transient is much shorter, it could still lead to the same problem, very rarely but still threatening stability. And there is nothing worse than weeks or months of uptime ended by a cryptic freeze of a device deployed in the field. I ran the same test for the power-on moment with even more intense RTC reads. Woo-hoo! Freezes every time again. Now I knew I needed to protect I2C during both the power-on and power-off transients.
While working on embedded multi-tasking software I realized that it is good practice to have a synchronization layer between interleaving tasks and peripherals that cannot be shared. For I2C, no matter how many slaves I have (and how many drivers per slave are available to the upper layers), I had already introduced a lock object for the I2C bus itself. This way I could just add 2 lines of code to hold the lock for the time the voltage needs to stabilize. After measuring the rise and fall times, I simply wrapped the code with locks like this:
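    static void probePower(bool turnOn)
    {
        i2csync_lock();                   /* keep every other task off the I2C bus */
        if (turnOn) {
            PROBE_PWR_ON();
            vTaskDelay(T_MSEC(200));      /* wait for the 5V rail to rise */
        } else {
            PROBE_PWR_OFF();
            vTaskDelay(T_MSEC(1000));     /* wait for the rail to discharge */
        }
        i2csync_unlock();
    }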
This fixed the problem for good. The freeze that previously appeared after 2 hours of runtime did not occur in 3 weeks of continuous running at the same measurement rate. Blocking RTC reads for a short moment is an irrelevant trade-off in a solution whose processing is far from real-time.
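The bus-level lock itself is not shown above, so here is a minimal sketch of the idea under stated assumptions. A pthread mutex stands in for whatever RTOS mutex the firmware really uses, i2csync_try_lock() and the fake RTC transfer are hypothetical helpers added only for illustration, and the real i2csync_lock()/i2csync_unlock() may be implemented differently.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* One lock for the whole bus, not one per slave device. */
static pthread_mutex_t i2c_bus_lock = PTHREAD_MUTEX_INITIALIZER;

static void i2csync_lock(void)   { pthread_mutex_lock(&i2c_bus_lock); }
static void i2csync_unlock(void) { pthread_mutex_unlock(&i2c_bus_lock); }

/* Hypothetical non-blocking variant, handy for tests and diagnostics. */
static bool i2csync_try_lock(void)
{
    return pthread_mutex_trylock(&i2c_bus_lock) == 0;
}

/* Hypothetical raw transfer standing in for the real I2C register access. */
static uint8_t fake_rtc_seconds = 42;
static uint8_t i2c_raw_read_rtc_seconds(void) { return fake_rtc_seconds; }

/* Every driver entry point brackets its whole transaction with the bus
 * lock, so a task holding the lock during a power transient keeps all
 * RTC and ADC traffic off the bus until the rail has settled. */
static uint8_t rtc_read_seconds(void)
{
    i2csync_lock();
    uint8_t seconds = i2c_raw_read_rtc_seconds();
    i2csync_unlock();
    return seconds;
}
```

Because the drivers never touch the bus without the lock, probePower() only has to hold the same lock for the duration of the transient; no driver needs to know anything about power cycling.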