For the prototype I estimated that 128kB of flash would suffice, and I was right: the firmware took 75kB even though it is fattened up by embedded strings used across the code for tracing. On the other hand, the 4kB of RAM on the Atmega128 turned out to be almost insufficient: I had to tune the stack sizes of tasks with ten-byte precision. Tests were running for weeks without problems, yet one extra function added to a chain of calls could potentially corrupt memory. I had to make more room to avoid hard-to-debug problems before field deployment.
As is typical in an embedded environment, memory usage was kept under control from the very beginning. Adding feature after feature, however, I could not avoid static allocations done by referenced libraries and, to some extent, by my own code. By the end of software development 2kB of RAM was simply gone:
- FatFS is the main contributor: it needs to keep at least one 512-byte cluster in memory; with all the extra data structures it took around 650 bytes.
- FreeRTOS control structures took 200 bytes, not counting the configurable heap used later for managed tasks.
- Two hardware UARTs and one software UART needed Rx/Tx buffers; even keeping them short, they took nearly 200 bytes.
- The GSM module command parser and tokenizer took another 250 bytes.
- The tracing facility (logging) took another 200 bytes for file handles and constantly used formatting buffers.
- The remaining few hundred bytes are used by tens of small structures across the code, including some space for the stack needed to start main() and for the ISR stack.
The remaining 2kB was dedicated to the RTOS and also turned out to be a tight fit:
- The RTOS scheduler takes 200 bytes for task control blocks and other structures.
- Each task needs its own stack. The minimal size is defined by the registers saved when switching context; for an empty task on the Atmega128 it is 85 bytes. In practice I needed from 300 to 500 bytes per task.
- Synchronization objects (mutexes, semaphores) are based on queues, and each queue of length one takes around 50 bytes.
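The numbers above can be turned into a quick back-of-the-envelope budget check. This is only a sketch in portable C: the helper names are made up for illustration, the constants are the rough figures quoted in this post, and it assumes every queue costs the same 50 bytes (a longer command queue would cost more).

```c
#include <stdbool.h>
#include <stddef.h>

/* Rough figures quoted in the text; treat them as estimates. */
#define RTOS_HEAP_BYTES 2048u /* RAM left for the RTOS */
#define SCHEDULER_BYTES  200u /* task control blocks and other structures */
#define QUEUE_BYTES       50u /* one queue of length one (mutex, semaphore) */

/* Hypothetical helper: total demand for a set of task stacks and queues. */
static unsigned heap_demand(const unsigned *stack_bytes, size_t tasks,
                            unsigned queues)
{
    unsigned total = SCHEDULER_BYTES + queues * QUEUE_BYTES;
    for (size_t i = 0; i < tasks; i++) {
        total += stack_bytes[i];
    }
    return total;
}

/* Hypothetical helper: does the configuration fit into the RTOS share? */
static bool fits_in_heap(const unsigned *stack_bytes, size_t tasks,
                         unsigned queues)
{
    return heap_demand(stack_bytes, tasks, queues) <= RTOS_HEAP_BYTES;
}
```

With three 500-byte stacks and four queue-based objects the demand is 1900 bytes, which fits. Five tasks at the lower 300-byte bound with ten queues already need 2200 bytes, which does not.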
At the beginning I assumed the RTOS would let me create a task per functional block (GSM, sensors, stack monitor etc.), with the tasks communicating asynchronously. That model would require more than 5 tasks, twice as many queues, plus data structures. With the real implementation and stack monitoring I realized that 2kB of memory got me 3 tasks, 2 short queues for commands and a couple more sync objects:
- My implementation of a cron-like scheduler task works with one-minute resolution and determines the activities to perform: take the next measurements, push data to the server, check memory etc. Different schedules can be applied depending on the situation detected: high water, low battery and so on. Activities are placed in the request queue.
- The executor task carries out requests. It is a kind of dispatcher that delegates the work to lower-level libraries. The executor takes care of activities that are relatively short and independent, so they can be sequenced without impact on functional behavior. The only long-running activities are GSM-specific.
- The GSM task handles communication over the cellular network. These activities are relatively long-running and can take from 1 up to 5 minutes to complete, because the GSM modem is logged out of the network most of the time (saving power) and login takes tens of seconds; the same goes for GPRS context establishment. Transmission of even small data sets (hundreds to thousands of bytes) also takes some time, as the GSM modem has quite a short buffer (half a kilobyte) and a low-speed UART is used (due to lack of RAM for buffers).
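The core of the cron-like scheduler can be sketched as a table lookup done once per minute tick. This is not the actual firmware code: the activity flags, the table layout and the periods below are all invented for illustration, and only the idea (one-minute resolution, different schedules per detected situation) comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical activity flags, not the real firmware's set. */
enum {
    ACT_MEASURE   = 1u << 0,
    ACT_PUSH_DATA = 1u << 1,
    ACT_CHECK_MEM = 1u << 2
};

typedef struct {
    uint8_t  activity;   /* one ACT_* flag */
    uint16_t period_min; /* run every N minutes, N > 0 */
} schedule_entry;

/* Illustrative schedules; the periods are made up. */
static const schedule_entry normal_schedule[] = {
    { ACT_MEASURE, 15 }, { ACT_PUSH_DATA, 60 }, { ACT_CHECK_MEM, 5 }
};
static const schedule_entry high_water_schedule[] = {
    { ACT_MEASURE, 1 }, { ACT_PUSH_DATA, 10 }, { ACT_CHECK_MEM, 5 }
};

/* Called once per minute: returns a bitmask of the activities that are due. */
static uint8_t due_activities(const schedule_entry *s, size_t n,
                              uint32_t minute)
{
    uint8_t due = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].period_min != 0 && minute % s[i].period_min == 0) {
            due |= s[i].activity;
        }
    }
    return due;
}
```

In the firmware each due activity would then be posted to the request queue (with FreeRTOS, for example via xQueueSend()) for the executor or GSM task to pick up. Switching between schedules is just a matter of passing a different table.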
Adding memory
Looking back at this result, one could say: you do not need an RTOS at all; there is virtually no “real time” aspect at one-minute resolution, and the actions could be ordered in a single-threaded, streamlined sequence of operations. One could then rewrite the existing libraries to use dynamically allocated memory and so balance RAM usage better.
I had two goals, though: (a) finish the project with stable results AND (b) learn different techniques along the way. I was too stubborn to drop the RTOS, and I had chosen free-of-charge libraries carefully so that I would not have to look inside them unless forced to.
I had to add more memory to the platform, and there were two ways to do it. I could use the pin-compatible Atmega1281, which has 8kB of RAM among other differences, or I could add external RAM, as the chip has an XMEM interface built in. I had the latter approach in mind from the very beginning. When drafting the first schematics I kept the XMEM pins unused and reserved for future use.
The two options were not weighed evenly because of my self-learning inclinations: I was biased towards XMEM. Fortunately it was also backed by economic calculation: while the Atmega128 is a widely available $2 chip, the Atmega1281 is not available on the local market and has to be imported from Asia at $8 apiece when buying 10 pieces. I found a high-speed 32kB SRAM for $1 and a 74VHC573 for $0.85 (per unit in a batch of 10).
Once the hard way of external RAM was rationalized to my ego, I decided to do some extra checks to find out how much slower XMEM is than built-in RAM. I ran multi-level read-write loops over a 1024-byte array (doing some simple math on the array) and got the following results: internal memory clocked 50 seconds, external 54 seconds, on a 12MHz clock. This is not very surprising, as the IS61LV256AL-10TLI memory chip has a 10ns access time and runs without wait states (configurable in the XMCRA register). This told me I can use 32+4kB as flat memory without much concern about slow-down, even if RTOS stacks are located in external RAM. It does not matter for the data logger under construction, but it is good news for more time-constrained projects and can be used as a reference setup.
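For reference, a benchmark of this kind can be as simple as the sketch below. It is not the original test code: the function names and the exact "simple math" are assumptions. On the target one would run the same kernel once on an array in internal SRAM and once on an array placed in external RAM, and time both runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Read-modify-write every byte `passes` times. The checksum is returned so
 * the compiler cannot optimise the loops away. */
static uint32_t rw_kernel(volatile uint8_t *buf, size_t len, unsigned passes)
{
    uint32_t sum = 0;
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < len; i++) {
            buf[i] = (uint8_t)(buf[i] + i + p);
            sum += buf[i];
        }
    }
    return sum;
}

/* Relative slow-down in whole percent, rounded down. */
static unsigned slowdown_percent(unsigned base_s, unsigned other_s)
{
    return (other_s - base_s) * 100u / base_s;
}
```

Applied to the measured times, slowdown_percent(50, 54) gives 8, so external RAM cost about 8% in this particular read-write-heavy loop.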
Custom memory sections order
The next step was to play with memory sections and change their default locations. The typical scenarios shown below always keep the main stack below RAMEND, as advised by Atmel to keep the MCU stable.
Instead of the typical setup shown above, I decided to shrink the main stack first and let .bss cross the RAM boundary. The stack is moved close to the registers and made very short. When using FreeRTOS this stack is used only in two cases: on start, when the main() function is called, and when an ISR (interrupt) needs to copy the current stack. The stack can be compact if the main entry point is kept short and bootstraps FreeRTOS immediately, and all necessary initializations (requiring a deeper stack) are moved to an RTOS task. ISRs on AVR are not a problem either, as interrupts do not nest by default. I dedicated 256 bytes to the main stack.
Behind the main stack I placed static allocations (.data and .bss) and the regular heap. FreeRTOS allocates one large array of bytes that is placed in the .bss area. The RTOS splits this reserved memory into separate stacks (one per task) and also serves dynamic allocations from it using its own implementation. With such a setup I can freely decide whether to use the RTOS (pvPortMalloc() and vPortFree()) or libc (malloc() and free()) dynamic memory allocators.
To structure memory this way, the linker has to be instructed as follows (on one line, without line breaks):
-Wl,--defsym=__stack=0x800200,--section-start,.data=0x800201,--defsym=__heap_end=0x8090ff
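The addresses in this command line follow directly from the memory map. Here is a small sketch of the arithmetic in C; the macro and function names are mine, while the sizes are those of the Atmega128 (internal SRAM starting at 0x0100) and the 32kB external chip. The 0x800000 offset is how the AVR GNU toolchain marks data-space addresses.

```c
#include <stdint.h>

#define AVR_DATA_OFFSET 0x800000UL /* toolchain offset for data space */
#define SRAM_START      0x0100UL   /* first SRAM byte after registers and I/O */
#define INTERNAL_SRAM   0x1000UL   /* 4kB internal RAM */
#define EXTERNAL_SRAM   0x8000UL   /* 32kB external RAM */
#define MAIN_STACK      0x0100UL   /* 256 bytes for the main stack */

/* Top of the main stack; it grows down towards SRAM_START. */
static uint32_t stack_top(void)
{
    return (uint32_t)(AVR_DATA_OFFSET + SRAM_START + MAIN_STACK);
}

/* .data starts right behind the main stack. */
static uint32_t data_start(void)
{
    return stack_top() + 1u;
}

/* Last usable byte: end of internal plus external RAM. */
static uint32_t heap_end(void)
{
    return (uint32_t)(AVR_DATA_OFFSET + SRAM_START + INTERNAL_SRAM
                      + EXTERNAL_SRAM - 1u);
}
```

These evaluate to 0x800200, 0x800201 and 0x8090ff, the three values passed to the linker above. With a different external chip or main stack size only the corresponding constants change.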