I’ve been spending quite a bit of time on embedded devices recently, as both a user and a designer. It’s not my first rodeo — I’ve designed and built a variety of embedded systems over the years — and it certainly provides stark comparisons when measured against standard computing practices and design.
Just as other computing technology has evolved greatly in the past few decades, embedded systems have grown as well, but their trajectories are quite different.
The term “embedded device” itself has become rather diluted these days. A touchscreen remote could be considered an embedded device, as could any number of IoT widgets, all the way up to smartphones and quad-core ARM boxes running Linux or Android that can display 4K video. The truth is that some embedded systems today are comparable in horsepower to desktop systems from not that long ago.
One thing these devices share is their fragility. If an embedded device loses power during a firmware upgrade, it’s probably bricked. Similarly, a sufficiently severe firmware bug can render the device permanently useless. The flash storage in these devices is generally fixed and non-replaceable, and if that flash develops problems, the device is toast. Further, the nature of embedded deployments is such that updates must occur remotely, either through a phone-home process or through a manual trigger, which leaves little to no room for error anywhere. Embedded systems are a finicky business.
This is especially true for embedded systems that are integrated with other hardware, such as a sensor array that is married to an SoC running Linux. Many times, these integrated devices are custom manufactured and equipped with a specific software stack that is not easily replaced or updated. These are the sorts of devices that run BusyBox and Dropbear, in which the Linux environment is tightly coupled to the hardware. These are not single-board systems that can take a generic Ubuntu installation; these devices require special care and feeding.
When you get down to this level, you truly begin to appreciate how computing evolved over the years and what working with massive production systems was like 20 and 30 years ago. In a standard Linux system, you have your choice of scripting language, plus all kinds of I/O, storage, and compute resources, and whatever you may need that isn’t there can be installed fairly easily. You also have a console, and reboots are an afterthought. By contrast, in these small embedded systems, you generally have only the tools that are present by default, typically represented by BusyBox, and it’s unlikely you’ll have any scripting languages at all, save for ash. There may or may not be a serial console, and reboots can require ritual sacrifices during development.
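Working within ash’s limits mostly means sticking to POSIX constructs: parameter expansion and `$((...))` arithmetic instead of bash arrays, `[[ ]]`, and `${var//x/y}`. A minimal sketch, with an illustrative file path:

```shell
#!/bin/sh
# POSIX-only idioms that survive on BusyBox ash; no bashisms.
path="/data/logs/sensor.log"   # illustrative path

dir=${path%/*}      # strip shortest trailing /*  -> "/data/logs"
file=${path##*/}    # strip longest leading  */   -> "sensor.log"
ext=${file##*.}     # file extension               -> "log"

count=$((1 + 2))    # POSIX arithmetic; no (( )) or let

echo "$dir $file $ext $count"
```

Everything above runs identically under bash, dash, and BusyBox ash, which is the point: scripts written this way don’t care which shell the device actually ships.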
You may have significant storage space if the device uses SD cards, or only a small amount of fixed storage. Either way, it’s storage that cannot and should not be used like a hard disk or SSD; it must be managed differently. There are penalties for writes, and the flash itself has a finite number of write/erase cycles. If there is logging, it should probably go to a RAM disk, though you might not have much RAM to work with. These devices also buffer writes in RAM before committing them to flash, which can result in corruption if not handled correctly. For instance, if you write a file and immediately reboot, it’s likely that the file you wrote is now corrupted. If that was a new kernel or critical boot file, you’ve bricked the device. Thus, you need to make sure the buffers are flushed before any reboot.
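One common pattern is to point chatty logs at a tmpfs RAM disk and flush buffers explicitly before any reboot. A sketch of the idea follows; the mount point and size are assumptions, and the commands require root on a real device:

```shell
#!/bin/sh
# Boot-time fragment: keep logs in RAM, flush flash before reboot.
# The mount point and 4 MB size are illustrative choices.

# A small tmpfs over /var/log keeps syslog churn off the flash entirely.
mount -t tmpfs -o size=4m tmpfs /var/log

# ... normal operation; anything under /var/log vanishes on power loss ...

# Before any reboot, force dirty buffers out to the flash so freshly
# written files (a new kernel, a config) are actually committed.
sync
reboot
```

The tradeoff is deliberate: logs are treated as disposable so that the flash, which is not, lasts the life of the device.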
While working with what is essentially a skeleton crew of tools, you need to build as much fault tolerance, error checking, and automated recovery into the device as possible. This is especially critical when designing and building devices that will be deployed in remote locations, such as the aforementioned sensor array placed atop a mountain for meteorological measurements. A device like that might be physically accessible only during certain times of year, or only when the weather permits. The same is true for submersible devices.
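One concrete recovery mechanism is a boot counter: tally boot attempts, and if the device fails to come up cleanly several times in a row, revert to a known-good image. The sketch below keeps the counter in a file; the file location and the “fallback” action are hypothetical stand-ins, as real devices often store this in bootloader (e.g. U-Boot) environment variables and pair it with a hardware watchdog:

```shell
#!/bin/sh
# Sketch of a boot-attempt counter. The file-based counter and the
# "fallback" message are hypothetical stand-ins for device specifics.
check_boot_count() {
    file=$1     # where the counter lives
    max=$2      # failures tolerated before falling back
    count=$(cat "$file" 2>/dev/null)
    count=${count:-0}                 # missing/empty file counts as 0
    count=$((count + 1))
    echo "$count" > "$file"
    if [ "$count" -gt "$max" ]; then
        echo "fallback"   # here: flag the bootloader to use the known-good partition
        return 1
    fi
    echo "attempt $count"
    return 0
}

# Run early in boot; a fully successful boot writes 0 back to the file,
# so only consecutive failures ever trigger the fallback.
```

The counter must be reset only after every service is confirmed up, not merely after the kernel boots, or a wedged application will never trip the fallback.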
For every casual action you might take on a production server, an embedded device requires ten times the planning and error detection, because “small” problems such as a hung boot cycle can be either extremely expensive or impossible to fix. This is where you write code that, after many status checks and lengthy timeouts, fires up some means of external communication or falls back to a fixed network configuration in case the SSH daemon doesn’t start or something else unexpected happens, simply to have a chance of remote recovery.
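The skeleton of that kind of last-ditch hook is a generic wait-with-timeout wrapped around a status check, something like the sketch below. `dropbear`, `eth0`, and the fallback address are illustrative assumptions, not a prescription:

```shell
#!/bin/sh
# wait_for: poll a command until it succeeds or a timeout (in seconds)
# expires. Returns 0 on success, 1 if the timeout was exhausted.
wait_for() {
    timeout=$1; shift
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Late-boot usage (illustrative names and addresses): if the SSH daemon
# never appears, force a fixed network config and start it by hand so
# the device can still be reached remotely.
# wait_for 120 pidof dropbear || {
#     ifconfig eth0 192.168.1.50 netmask 255.255.255.0 up
#     dropbear
# }
```

The lengthy timeout matters: on slow flash and a slow SoC, a healthy boot can take long enough that an impatient recovery hook would fire on every startup.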
After enough time spent wrangling embedded devices, normal computing systems seem gluttonous and plain wasteful by comparison. It’s a bit like ballplayers swinging weighted bats on deck so that the normal bat feels lighter at the plate. I would argue that a trip into embedded system design would be worthwhile for anyone deep in IT, across the spectrum from admins to developers. It might provide enough juxtaposition to reduce the bloat we see in so many other areas of computing. With embedded systems, bloat is not an option.