Logistics & Sustainable Transport Vol. 7, No. 1, October 2016, 41–50 doi: 10.1515/jlst-2016-0004

For Safety and Security Reasons: The Cost of Component-Isolation in IoT

Alexander ZUEPKE, Kai BECKMANN, Andreas ZOOR and Reinhold KROEGER
RheinMain University of Applied Sciences, Wiesbaden, Germany

Abstract — The current development trend of the Internet of Things (IoT) aims for a tighter integration of mobile and stationary devices via various networks. This includes communication of vehicles to roadside infrastructure (V2I), as well as intelligent sensors/actuators in logistics and smart home environments. Compared to isolated traditional embedded systems, the exposure to open networks increases the attack surface, and errors in the networking components could compromise the safety and security of the embedded application or the whole network. However, current system architectures for mass-market IoT devices often lack the required isolation concepts. Using a partitioning microkernel and enforcing the use of a microcontroller's memory protection unit (MPU), we compare different isolation concepts for a publish/subscribe middleware implementing OMG's Data Distribution Service (DDS) standard, and we evaluate our results on an STM32F4 microcontroller. The results of this case study show moderate costs in terms of increased memory usage and additional context switches.

Key words — Component-Isolation, Microkernel, Partitioning, IoT.

I. MOTIVATION

Today, low-cost microcontrollers are quite powerful and offer wireless networking facilities. Based on a network of connected sensors and/or actuators, this allows the design of new embedded applications. The umbrella terms Cyber-Physical Systems (CPS) and Internet of Things (IoT) cover a wide range of applications in domains like Ambient Living, Smart Cities, and Intelligent Transportation Systems & Logistics. Thus, formerly isolated and domain-specific networks now become connected to the Internet.
But the first generation of networking-capable microcontrollers lacked support for isolation concepts. Any potential error in the application code or the protocol stacks could compromise the whole embedded system or open a door into a protected network. This makes early controllers unsuitable for safety- or security-critical applications, such as Vehicular Ad-hoc Network (VANET) communication to roadside infrastructure or critical transportation systems. Recently, low-cost 32-bit microcontrollers featuring memory protection units (MPUs) became available. They enable isolation of application code and protocol stacks, not only from each other, but also from the rest of the system, for example from uninvolved real-time tasks. Unfortunately, many current software platforms for IoT devices, like FreeRTOS, Contiki, or RIOT-OS, originate from real-time operating systems (RTOS) with no or only limited MPU support. On the other end of the spectrum, operating systems explicitly designed for safety-critical applications, like VxWorks, Integrity, or PikeOS, as well as general-purpose operating systems like Linux and QNX, often require a full virtual memory management unit (MMU) in the processor and therefore target higher-priced platforms.

Additional safety and security do not come for free: isolation concepts usually add costs through increased memory usage due to storing multiple copies of data. Isolation can also impose a significant performance overhead through multiple data copy operations between isolated software components or additional context switches and the related reprogramming of memory protection hardware. This work analyses these costs based on a case study of porting a data-centric middleware onto a microkernel platform for safety-critical applications. The main safety objective here is to allow multiple applications to communicate over the network via the middleware, while keeping potential errors isolated to the erroneous software components.
And for security reasons, software components must be able to prevent other unrelated components from accessing their internal data, e.g. encryption keys or personal data. This is especially important when software components of different vendors need to be integrated on a single platform. Applied to logistics, an example would be an intelligent sensor integrating two different applications on a single platform for cost reasons: one application (provided by the logistics company) tracks the shipment, while a second application (provided by the goods producer) tracks the integrity of sensitive goods, for example whether a medical product was transported at its required temperature. Isolating the applications increases trust and keeps business data separated.

In this paper, the system's software components are decomposed into isolated containers. The system comprises (1) two synthetic test applications sending and receiving data, (2) sDDS [1], a publish/subscribe middleware based on the Data Distribution Service (DDS) standard [2], (3) the lightweight TCP/IP stack (lwIP) [3], (4) an Ethernet driver for the STM32F4 microcontroller, and (5) AUTOBEST [4], a small partitioning microkernel developed for safety-critical automotive use cases. These software components form a vertically layered system model, where higher-level components rely on services of the lower-level components. This allows applying and evaluating isolation concepts at each of the layer interfaces. The proposed system is comparable to AUTOSAR [5], the established software architecture for automotive electronic control units (ECUs). AUTOSAR uses a three-layered software model comprising an application layer, the Runtime Environment (RTE) providing data exchange, and the Basic Software (BSW) hosting device drivers and software stacks.
However, in contrast to AUTOSAR, our setup is simpler and easier to analyse due to the larger size of its building blocks, but the general results are applicable to AUTOSAR as well.

The rest of this paper is organized as follows: Sec. II introduces the software components of this study in detail. In Sec. III, we discuss different concepts of where to apply isolation techniques between the component interfaces and select one approach for evaluation. This approach is compared to a baseline without isolation in Sec. IV. Sec. V summarizes and discusses the results and compares the approach to related work. Finally, Sec. VI concludes the paper.

II. SOFTWARE ARCHITECTURE

In this section we introduce the used software components in detail and discuss their interfaces. All components are customizable to a high degree regarding their resource usage, making them suitable for IoT systems. The hardware platform is an STMicroelectronics STM32F4 microcontroller which includes a Cortex-M4-based ARM core clocked at 168 MHz, 112 KiB SRAM, 1 MiB flash memory, and 100 MBit/s Ethernet.

A. AUTOBEST

As operating system, we use AUTOBEST [4], which was developed and implemented at the authors' faculty in a research project together with Easycore GmbH in Erlangen, Germany. Application areas are safety-critical automotive use cases following the standard ISO 26262 [6], which requires "freedom from interference" between independent software components. The kernel targets embedded microcontrollers with memory protection (MPU) support and is fully statically configurable. AUTOBEST is based on a small microkernel, which isolates different applications into so-called partitions. A partition entails an isolation boundary in (scheduling) time and (address) space and comprises a set of executable tasks (threads) plus necessary system resources for synchronization inside the partition.
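To give an impression of what reprogramming the MPU on a partition switch involves: on ARMv7-M cores like the one used here, each region is described by a base address and an attribute/size register (RASR), where the SIZE field encodes a power-of-two region size. The following host-testable sketch computes such a RASR value; the helper name and the chosen access-permission constant are our assumptions, not AUTOBEST code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: encode an ARMv7-M MPU RASR value for a
 * power-of-two sized region. SIZE field (bits 5:1) means a region
 * of 2^(SIZE+1) bytes; bit 0 enables the region. */
#define MPU_RASR_ENABLE      (1u << 0)
#define MPU_RASR_AP_PRIV_RW  (0x1u << 24)   /* privileged read/write only */

static uint32_t mpu_rasr(uint32_t region_bytes, uint32_t ap_bits)
{
    uint32_t size_field = 4;                /* 32 bytes, the ARMv7-M minimum */
    while ((1u << (size_field + 1)) < region_bytes)
        size_field++;                       /* round up to next power of two */
    return ap_bits | (size_field << 1) | MPU_RASR_ENABLE;
}
```

On a partition switch, the kernel writes one such value per region; since the number of regions is small and fixed, this cost is constant, which is what makes the overhead of the later measurements predictable.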
All communication and synchronization mechanisms across partitions are statically configured, allowing fine-grained access control at system compile time. AUTOBEST is not limited to automotive use cases: the kernel implements an abstract programming model supporting multiple operating system APIs implemented in domain-specific libraries inside the partitions, as Fig. 1 shows. Currently, AUTOBEST supports AUTOSAR for automotive use cases, as well as an ARINC 653 [7] application executive for avionics.

Figure 1: AUTOBEST system architecture with two different partition types. Libraries in user space implement the differences of the domain-specific APIs on an abstract microkernel interface.

Data exchange between partitions is realized by dedicated shared memory segments (SHMs) between communicating partitions. Code generators provide the structure of the SHMs at compile time. The system supports linking of arbitrary variables in SHM with kernel wait queues. This allows the construction of blocking synchronization means, like the producer-consumer pattern, by updating the variables atomically in the fast path, or calling into the kernel to wait for or wake up tasks on the other side. Also, tasks in one partition can notify other partitions asynchronously via inter-partition events. Lastly, AUTOBEST provides a synchronous Remote Procedure Call (RPC) mechanism to call functions in other partitions.

B. sDDS

The Object Management Group's (OMG) standard Data Distribution Service (DDS) [2] specifies a vendor- and platform-neutral data-centric middleware following a publish-subscribe paradigm. The standard defines a set of configurable Quality-of-Service (QoS) properties regarding real-time message delivery, redundancy, persistence, and resource constraints.
The architecture of DDS is based on the concept of a global data space (domain), in which participating nodes share data over a network as publishers and subscribers. The exchanged data is structured in application-specific data types. A topic links such a data type and its QoS properties with a globally unique name. The application interface to the data space comprises DataWriter and DataReader classes. To get access to meta-information, like the available nodes and topics in the system, DDS provides so-called built-in topics. Originally driven by military applications, the development of the DDS standard today is mainly driven by industrial applications, with a promising future in application areas like the Internet of Things and Industry 4.0.

Although DDS was developed for distributed embedded systems, existing implementations of the standard do not support resource-constrained hardware platforms like those used in wireless sensor networks and IoT environments. Therefore, for this work, sDDS (sensor network DDS) [1] is used. Applications for sensor networks or IoT typically allow for statically configured nodes and only use a limited subset of the DDS middleware functionality. sDDS is based on a model-driven software development process, which collects the application requirements regarding its specific computing and sensor capabilities and generates an individually tailored middleware implementation for each node. This allows for deploying DDS on heterogeneous platforms, ranging from 8-bit microcontrollers to standard PCs, and simplifies both horizontal and vertical integration. Currently, sDDS supports a configurable subset of the DDS standard. Besides simple data exchange with callbacks and polling, sDDS supports some QoS features important for sensor networks. Communication between application components can be static, dynamic, or a combination of both. For dynamic communication relations, sDDS uses a discovery mechanism based on built-in topics.
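To illustrate the DataWriter/DataReader interface that such a statically generated middleware exposes at C level, the following sketch mocks a topic with a single field and a one-slot input queue. All names and the queue depth are our simplification for illustration, not actual sDDS generator output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shape of generated code for a topic "Alpha" with one
 * 16-bit field; the one-slot queue stands in for the middleware's
 * sample queue. */
typedef struct { int16_t value; } Alpha;

static Alpha alpha_queue;
static bool  alpha_present;

static int DDS_AlphaDataWriter_write(const Alpha *sample)
{
    alpha_queue = *sample;        /* hand the sample to the middleware */
    alpha_present = true;
    return 0;                     /* would be DDS_RETCODE_OK */
}

static int DDS_AlphaDataReader_take_next_sample(Alpha *sample)
{
    if (!alpha_present)
        return 1;                 /* would be DDS_RETCODE_NO_DATA */
    *sample = alpha_queue;        /* fetch and acknowledge the sample */
    alpha_present = false;
    return 0;
}
```

A write followed by a take returns the sample once; a second take reports that no data is available, mirroring the non-blocking take semantics described for the test applications below.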
Focusing on platform independence, sDDS is implemented in C, and its API conforms to the DDS standard. Fig. 2 outlines the overall architecture of sDDS. Operating system and platform dependent functionality is abstracted via internal modules: sDDS requires dynamic heap memory for initialization, cyclic activation of internal callbacks, and mutual exclusion. On the networking side, sDDS depends on a datagram delivery service with routing and broad- and multicast functionality, like UDP/IP, provided by the underlying operating system. The discovery mechanism requires multicast groups. Due to the focus on IoT use cases, sDDS uses IPv6.

Figure 2: Overall architecture of sDDS, divided into three layers: platform-specific, platform-independent, and application-specific.

C. lwIP

For this work, we integrated the highly customizable open source lightweight TCP/IP stack (lwIP) [3] to connect sDDS to an Ethernet driver. sDDS only uses the UDPv6 functionality of lwIP and its IPv6 Stateless Address Auto Configuration (SLAAC) mechanism. LwIP provides two component interfaces: the interface towards a network driver sends and receives single Ethernet frames, and the interface towards the application side (in our case sDDS) handles datagrams via UDP. Both interfaces use decoupled buffers following the producer-consumer pattern. LwIP internally manages the protocols (Ethernet, IP, UDP) fully transparently for the application. LwIP also has built-in dynamic memory management for data and Ethernet frames, so-called pbufs. By reserving spare space at the front and at the end of a pbuf, lwIP can add the necessary protocol headers for outgoing data.
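The header-reserve idea can be sketched with a strongly simplified pbuf model (this is not the real lwIP API; struct layout and function names are our illustration): the payload pointer starts a few bytes into the backing buffer, so each protocol layer can prepend its header by moving the pointer backwards instead of copying the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of lwIP's header trick: spare bytes in front of the
 * payload let each layer prepend its header without copying the data. */
struct pbuf {
    unsigned char *payload;   /* current start of valid data */
    size_t len;               /* length of valid data */
    unsigned char buf[128];   /* backing storage */
};

static void pbuf_init(struct pbuf *p, size_t reserve,
                      const void *data, size_t len)
{
    p->payload = p->buf + reserve;      /* leave room for headers */
    memcpy(p->payload, data, len);
    p->len = len;
}

static int pbuf_prepend(struct pbuf *p, const void *hdr, size_t hlen)
{
    if ((size_t)(p->payload - p->buf) < hlen)
        return -1;                      /* no spare room left */
    p->payload -= hlen;                 /* grow towards the front */
    memcpy(p->payload, hdr, hlen);
    p->len += hlen;
    return 0;
}
```

Stripping a header on reception is the mirror image: the payload pointer is advanced past the header, again without a copy.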
Likewise, lwIP strips off the headers from incoming frames without copying overhead. Frames larger than 536 bytes of user data (the default TCP maximum segment size) can be split into multiple linked pbufs.

D. Test Applications

In the presented scenario, two applications run on the embedded target on AUTOBEST and communicate with a third application on a Linux host. An sDDS-specific IDL compiler generates the necessary middleware components, application-specific initialization, and application stubs according to the system model. The first application cyclically publishes data for the topic Alpha using the synchronous DDS method call DDS_AlphaDataWriter_write(), while the second application receives data from the topic Beta. To keep latency low, the application registers a listener callback at the sDDS middleware. This callback is called when new data becomes available. Then the application reads the topic-specific data (samples) using the topic-specific method DDS_BetaDataReader_take_next_sample(). This non-blocking call fetches the data out of the input queue and acknowledges the arrival to sDDS. Both topics have the same data structure with a size of only 2 bytes to keep the overhead for copy operations during the measurements small. The third application, on the Linux system, is the counterpart of the two applications on the embedded system. It implements the roles of subscriber and publisher for the topics Alpha and Beta, respectively. Besides that, the realisation is mostly identical.

III. ISOLATION CONCEPTS

In a traditional monolithic system, the software components discussed in Sec.
II would be placed in a vertically layered model, as Fig. 3 shows on the left. But here, on the microkernel AUTOBEST, the components can be isolated into horizontally ordered partitions. The primary goal is to have multiple isolated applications communicating over the network via sDDS. Isolation enables fault containment and increases information security, but induces communication overhead (in terms of memory usage and performance impact). Therefore, the secondary goal is to keep the communication overhead between partitions low. The obvious places for such splits into dedicated partitions are the component interfaces, as Fig. 3 shows. At such a split, the component interface is mapped onto the microkernel's communication mechanisms and a shared memory interface. In the following sections, we discuss multiple scenarios of where to apply a split.

Figure 3: Variants. From left to right: monolithic approach without separation, (A) monolithic approach with isolation of networking components, (B) split between Ethernet driver and lwIP, (C) split between lwIP and sDDS, and (D) split between sDDS and the applications. Adjacent boxes represent a partition. Dots refer to other, unrelated partitions.

A. Isolation of Networking Components from Rest of System

The simplest variant of isolation is to keep all networking-related software components in a single partition. This allows isolating the applications, sDDS, lwIP, and the Ethernet driver from the rest of the system. Such an approach is acceptable for scenarios where unrelated parts of the system must not be affected by faults in the network stack, for example if the network application implements remote monitoring of a real-time control application. We use this scenario as baseline for comparison, because here all component interfaces can be implemented by function calls, and data can be passed indirectly via pointers instead of costly copy operations.
This variant is also comparable to implementations in other IoT operating systems without isolation concepts.

B. Isolation between Ethernet Driver and lwIP

Starting at the bottom of the network stack, the first split can be applied between the Ethernet driver and lwIP. Besides the initial setup of Ethernet addresses and the network's link status, the component interface between the Ethernet driver and lwIP mainly comprises the exchange of Ethernet frames. Both incoming and outgoing traffic may happen in bursts. Therefore, for a robust decoupling of both components, two ring buffers in a shared memory segment can be used. This avoids unnecessary context switches on bursts, and both components can notify each other asynchronously on new incoming or outgoing frames. Also, memory management in lwIP supports fine-grained control of the memory locations of the pbufs. Additionally, the network chip could directly access the Ethernet frames in the shared memory via DMA. In case the hardware supports DMA on fragmented frames, the ring buffers could consist of pbufs of a smaller size than the maximum Ethernet frame size of 1518 bytes. However, as lwIP can handle incoming packets in arbitrary order, the assignment of frames then becomes more complex. Summarizing, this interface is a good opportunity to decouple the software components and offers options for optimization when using DMA. On the other hand, memory usage can be higher.

C. Isolation between lwIP and sDDS

The next split is possible between lwIP and sDDS. For its network connection, sDDS uses an internal interface of lwIP, which is similar to the BSD socket interface. However, incoming and outgoing datagrams are directly managed in pbufs and not copied into internal buffers. The network side of sDDS opens a multicast and a unicast server connection for UDPv6 to handle all data exchange and service discovery. The job of lwIP is therefore limited to a pass-through of data and the resolution of IPv6 addresses to Ethernet MAC addresses. Like in the split between Ethernet driver and lwIP, datagrams can be kept in ring buffers between the partitions; the internal memory management of lwIP would allow this, as long as only one client application (like sDDS) uses the IP stack. Otherwise, lwIP could asynchronously notify sDDS on datagram reception, while sDDS uses synchronous RPC for sending. This asymmetry is necessary because sDDS (client) can trust the lwIP stack (server), but conversely lwIP should not trust sDDS, as an error in a client could then affect the functionality of the whole IP stack. Compared to split (B) in Fig. 3 between Ethernet driver and lwIP, the implementation effort and runtime overhead of a split between lwIP and sDDS are higher.

D. Isolation between sDDS and Application

The third option is a split between sDDS and its applications. This isolates the applications from each other and from the rest of the system. Further, this allows the integration of applications of different origin without compromising the safety and security of the rest of the system. As described in Sec. II, the first application (publisher) uses a potentially blocking call to send its data, thus simplifying the design of sDDS when passing data down the stack to the network component, as calls to the network stack may block themselves.
Because publishing of data may block, an isolating approach should also use a synchronous communication mechanism that supports blocking semantics. The second application (subscriber) registers a callback, which sDDS notifies on incoming data. On notification, the subscriber calls sDDS synchronously and without blocking to remove the new sample from the topic's input queue. Depending on the depth of this queue, incoming data of a topic could be kept in a shared memory segment, so the application can read a new sample directly without any context switch. However, the application would then need write access to the input queue to mark the samples as read and acknowledge the reception. A split between applications and sDDS therefore requires a mapping of the function calls of the DDS API to a synchronous communication mechanism, plus an asynchronous feedback channel to notify the application on new data. The amount of transferred data only depends on the size of the topics and is usually small, e.g. updates of sensor values.

E. Selected Isolation Approach

Comparing the isolation concepts presented in Sec. III (A)–(D) shows that typically each publish operation of the application leads to a frame for transmission in the Ethernet driver (this neglects the cases where sDDS publishes data to other applications on the same node or aggregates data for multiple topics into a single datagram). However, incoming Ethernet frames do not necessarily contain data which finally reaches the application, because the frames could be used for discovery purposes in lwIP or for internal management of sDDS. Likewise, the size of the exchanged data (and hence the required memory space) increases from application to network driver. On the other hand, the interfaces at the lower levels are more decoupled and allow the use of asynchronous notifications together with ring buffers.
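Such a shared-memory ring buffer between two partitions could look roughly as follows: a single producer and a single consumer each own one index, updated on the fast path without a kernel call; a kernel event would only be raised when the peer may be blocked. This is a sketch under the assumption of a single-core microcontroller (memory barriers omitted); names and slot type are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 8u                 /* must be a power of two */

/* Layout as it could be placed in a shared memory segment. */
struct ring {
    volatile uint32_t head;           /* written by the producer only */
    volatile uint32_t tail;           /* written by the consumer only */
    uint16_t slot[RING_SLOTS];
};

static int ring_put(struct ring *r, uint16_t v)
{
    if (r->head - r->tail == RING_SLOTS)
        return -1;                    /* full: producer would raise an event */
    r->slot[r->head % RING_SLOTS] = v;
    r->head++;                        /* publish the slot (fast path) */
    return 0;
}

static int ring_get(struct ring *r, uint16_t *v)
{
    if (r->head == r->tail)
        return -1;                    /* empty: consumer would block in kernel */
    *v = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

Because each index has exactly one writer, neither side can corrupt the other's state even if it misbehaves, which matches the trust asymmetry discussed for the lwIP/sDDS split.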
With regard to the typically scarce availability of RAM on embedded microcontrollers, variant (D) with isolation between sDDS and the applications was selected (Fig. 4). This approach promises the greatest flexibility for application development, as it splits the base software (kernel, software stacks) from the applications, and keeps the applications isolated from each other, allowing the integration of software applications of different vendors. But this comes at the price of implementing the more complex API of DDS.

The selected approach combines Ethernet driver, lwIP, and sDDS into a single partition. The Ethernet driver uses the pbuf memory management of lwIP for its Ethernet frames. The Ethernet driver comprises an ISR task, which passes received frames via a message queue to lwIP. lwIP uses a single task that dispatches incoming messages from either the Ethernet or the sDDS side. sDDS consists of multiple tasks, as Fig. 4 shows: one task manages all incoming data and asynchronously notifies the applications, and dedicated tasks handle each application's synchronous RPC calls.

Figure 4: Communication between application partitions and the combined sDDS and network partition. Function calls of the DDS API are mapped to synchronous RPC calls. sDDS asynchronously signals the availability of new data samples to the applications.

The application partitions contain two tasks each: a callback manager task waiting for events from sDDS and running the registered callbacks in its context, and the client task executing the generated application code. A dedicated shared memory segment is used for data exchange between an application partition and sDDS. The shared memory segment contains the topics, the notifications, and the RPC call and reply arguments.
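One conceivable layout of such a per-application shared memory segment is sketched below; all field names, queue depths, and payload sizes are our assumptions for illustration, not the actual generator output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generated layout of the shared memory segment between
 * one application partition and the sDDS partition. */
struct rpc_args   { uint8_t opcode; uint8_t payload[16]; };  /* marshalled DDS call */
struct rpc_reply  { int32_t retcode; uint8_t payload[16]; }; /* result from sDDS */
struct topic_beta { int16_t value; uint8_t valid; };         /* one sample slot */

struct app_shm_segment {
    struct topic_beta beta_queue[4];   /* input queue for topic Beta */
    uint32_t notify_count;             /* incremented per inter-partition event */
    struct rpc_args  call;             /* written by the application stub */
    struct rpc_reply reply;            /* written back by the sDDS server task */
};
```

Since the structure is fixed at compile time, both sides' generated stubs can validate every field they read, which is what protects them from a misbehaving peer writing into the segment.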
A code generator sets up the structure of the shared memory segment, as well as the necessary stubs for both client and server side (application and sDDS). From both the application and sDDS points of view, the component split is completely encapsulated by generated code. This protects both sides from errors due to the potentially insecure communication via a shared memory segment. The code generator also knows the specific roles of the applications and creates the corresponding DataReader and DataWriter classes. These are used for the communication with other nodes.

IV. EVALUATION

For evaluating the costs of the selected isolation approach (D), both memory usage and the time for transmission and reception of a topic's sample are compared to the monolithic approach without isolation (A). In order to measure the performance overhead, trace points were added into the send and receive paths of all component interfaces. These trace points toggle GPIO pins of the microcontroller. Changes of the pins' voltage levels are recorded externally by a logic analyser. This kind of measurement adds the cost of a system call into the kernel to toggle a pin (interference). However, as the microcontroller has no caches, this overhead is constant, with a measured mean of 0.920 µs. The only sources of variance are timer interrupts and interrupts from the Ethernet controller. For data transmission, the trace points mark the different phases starting at the application, via sDDS and lwIP, down to the Ethernet driver. Comparably, data reception includes trace points from the incoming interrupt of the arriving Ethernet frame up to the visibility of the data in the application. For the split at the interface between sDDS and the applications, additional trace points were added to the kernel's event and RPC calls. The experiment consists of publishing data for a topic repeatedly, either from the microcontroller to a Linux host system, or the other way around.
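A trace point of this kind boils down to toggling one bit of a GPIO output register so the logic analyser can timestamp the phase boundary. The sketch below models the register with a plain variable; on the real STM32F4 this would be a kernel-mediated write to a GPIO port register, which is exactly the constant 0.920 µs interference mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a GPIO output data register (illustrative only). */
static uint32_t gpio_odr;

/* Toggle pin n to mark the start or end of a measured phase. */
#define TRACE_PIN(n)  (gpio_odr ^= (1u << (n)))
```

Each measured phase is then simply the time between two consecutive edges recorded by the analyser.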
Table 1 shows the results for the data reception path with minimum and maximum values (MIN, MAX), arithmetic mean (AVG), and standard deviation (STD) in microseconds for an experiment with 16500 recorded trace events.

Table 1: Measurement Data Reception (values in microseconds)

                                          without isolation               with isolation
Phase                                     MIN    AVG    MAX    STD        MIN    AVG    MAX    STD
Ethernet IRQ in kernel                    6.040  6.762  12.120 0.074      6.010  6.707  12.140 0.097
Ethernet ISR                              8.240  8.276  18.380 0.428      8.240  8.291  18.380 0.573
lwIP stack                                29.310 36.457 41.890 0.178      31.220 36.501 43.670 0.195
sDDS UDP module                           17.490 17.505 23.380 0.125      17.410 17.426 20.320 0.073
sDDS DataSink_processFrame()              8.080  9.001  25.710 1.194      7.750  8.680  25.580 1.204
Data available callback                   5.630  5.639  10.940 0.081      5.460  5.471  8.490  0.054
sDDS sends event to cb manager            –      –      –      –          1.140  1.147  3.070  0.023
Activation of callback manager            –      –      –      –          20.380 20.415 27.990 0.367
DataReader_take_next_sample();
  on isolation: RPC to sDDS               –      –      –      –          1.160  1.167  4.490  0.044
Reception of RPC in sDDS                  –      –      –      –          8.240  8.252  13.750 0.107
sDDS sends RPC reply                      –      –      –      –          3.390  3.396  6.960  0.048
Data available in application             3.340  3.352  8.820  0.075      1.160  1.167  4.490  0.044

Up to the point of the split, the data reception process shows similar values in the first phases for both approaches. Afterwards, the measured timing deviates: in the variant without split, the application can call DataReader_take_next_sample() directly from the callback, while the split variant sends an event to the callback manager first.
The activation of the callback manager requires 20.415 µs on average, because the tasks of the application have a lower priority than the tasks in the networking partition, and the network stack internally prepares for the reception of the next frame. The callback manager calls the registered callback, which in turn calls DataReader_take_next_sample() to read the data. This is realized by a synchronous RPC back into sDDS and requires two additional context switches until the data arrives in the application.

Table 2 shows the steps for publishing data. Here, both variants already differ in the first steps: the variant without split can call DataWriter_write() directly, while the splitting approach sends an RPC to the sDDS partition. Afterwards, both variants show similar timing. However, the final step of the send phase, until control returns to the application, shows a difference again: internal work of the lwIP state machine leads to visible delays, again caused by the different priorities of the tasks.

Table 2: Measurement Data Sending (values in microseconds)

                                          without isolation               with isolation
Phase                                     MIN    AVG    MAX    STD        MIN    AVG    MAX    STD
Call to DataWriter_write();
  on isolation: RPC to sDDS               –      –      –      –          6.630  8.272  8.300  0.168
Reception of RPC in sDDS                  –      –      –      –          1.180  1.390  18.070 1.093
Execution of DataWriter_write()           6.010  13.909 14.030 0.864      8.910  13.957 14.000 0.392
Generate SNPS packet                      4.090  5.482  18.060 0.736      4.130  5.409  6.230  0.103
sDDS UDP module                           7.130  7.145  7.390  0.020      7.130  7.142  7.390  0.020
lwIP stack                                23.140 23.175 23.570 0.031      23.210 23.240 23.610 0.030
Ethernet driver                           25.550 25.567 25.580 0.005      16.270 16.284 16.300 0.005
Back in application                       –      –      –      –          13.170 13.178 13.190 0.004

For the whole project of all discussed software components including the kernel, RAM usage increases from 60,592 bytes to 65,792 bytes, i.e. by 8.6%. The costs are mainly caused by the stacks for the additional tasks on the application and sDDS side.
Likewise, the overall size of the program code increases from 83,944 bytes to 90,608 bytes, i.e. by 7.9%. This is caused by the additional stubs, additional tasks, and two additional partitions.

V. DISCUSSION AND RELATED WORK

The measured average execution time of data reception is 87.0 µs in the monolithic case and 127.0 µs in the case including the selected split between sDDS and its applications. The execution times for sending are closer together: 75.3 µs for the variant without split and 88.9 µs with split. However, these values still include the overhead of 0.920 µs for each trace point. After removing this constant overhead, the adjusted values show an increase from 80 µs to 115 µs (44%) for data reception. Likewise, the time for sending data increases from 70 µs to 81 µs, i.e. an increase of 16%. The additional context switches between the tasks added for the split are the main cause of the performance overhead here. These tasks also drive the costs of RAM usage due to their stacks.

Similar results were also observed in related work on the decomposition of monolithic systems into multiple servers: SawMill [8] implements a file system as a server on top of the L4 microkernel [9]. For reading a 4K block of cached file data, the microkernel approach shows a 500-cycle overhead compared to a native Linux implementation with 3000 cycles [8]. The SawMill approach is comparable to the selected split on the application side. Comparable approaches for splits on lower levels are more often found in hypervisors: XEN [10] virtualizes disk and network accesses at the level of disk blocks and Ethernet frames. This fits well for server virtualization, because there full operating systems are virtualized. A middleware approach comparable to sDDS is offered by AUTOSAR with its Runtime Environment (RTE) [5].
The RTE layer comprises generated code and handles the data exchange between applications on different ECUs independently of the underlying network technology, e.g. CAN, FlexRay, or Ethernet. Optionally, AUTOSAR supports the spatial separation of applications from each other and from the rest of the system, the Basic Software (BSW). However, AUTOSAR is currently restricted to applications in the automotive domain.

VI. CONCLUSION

Summarizing, the results show that isolation concepts (and with them increased safety and security) have their price. The largest impact is on the performance side and results from the additional context switches. In this work, we presented an approach that splits a software architecture at the component interface between the applications and the overall network system, with the primary goal of isolating the applications from each other and from the rest of the system. A secondary goal is to keep the extra RAM usage low. Still, the presented approach requires additional RAM for the stacks of the added tasks. Therefore, the implementation of isolation concepts is always a trade-off between RAM usage (for buffers and stacks), ROM usage (shared code, generated stubs), and the performance overhead of additional context switches instead of direct function calls. In our opinion, the presented approach can help to increase the safety and security of IoT devices by isolating application code, software stacks, and other software components from each other. From a safety point of view, this limits the impact of software errors to the affected components. From a security point of view, isolation prevents the leakage of sensitive data to unrelated applications. Also, the presented approach demonstrates a reasonable solution for cost-sensitive markets like logistics, allowing different applications to be combined on a single microcontroller. In our future work, we plan to optimize the presented approach regarding memory usage and performance.
Furthermore, we intend to implement and analyse potential splits at the other discussed boundaries, with the final goal of decomposing a full AUTOSAR system.

REFERENCES

1. Beckmann, K. & Dedi, O. (2015). sDDS: A portable data distribution service implementation for WSN and IoT platforms. In 12th International Workshop on Intelligent Solutions in Embedded Systems (WISES), 29-30 October 2015 (pp. 115-120). Ancona, Italy: IEEE.
2. Object Management Group. (2015). Data Distribution Service. DDS Version 1.4. Retrieved June 6, 2016, from http://www.omg.org/spec/DDS/1.4/.
3. Various. (2016, June). Lightweight TCP/IP Stack [computer software]. Retrieved June 6, 2016, from http://savannah.nongnu.org/projects/lwip/.
4. Züpke, A., Bommert, M. & Lohmann, D. (2015). AUTOBEST: A United AUTOSAR-OS and ARINC 653 Kernel. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 13-16 April 2015 (pp. 133-144). Los Alamitos, CA: IEEE.
5. AUTOSAR Consortium. (2016). AUTomotive Open System ARchitecture. Version 4.1. Retrieved June 6, 2016, from http://www.autosar.org/.
6. International Organization for Standardization. (2011). Road vehicles - Functional safety. ISO 26262:2011.
7. Airlines Electronic Engineering Committee. (2010). Avionics Application Software Standard Interface. ARINC Specification 653.
8. Gefflaut, A., Jaeger, T., Park, Y., Liedtke, J., Elphinstone, K., Uhlig, V., Tidswell, J. E., Deller, L. & Reuther, L. (2000). The SawMill Multiserver Approach. In 9th Workshop on ACM SIGOPS European Workshop: Beyond the PC: New Challenges for the Operating System, 17-20 September 2000 (pp. 109-114). New York, NY: ACM.
9. Liedtke, J. (1993). Improving IPC by Kernel Design.
In 14th ACM Symposium on Operating Systems Principles (SOSP), 5-8 December 1993 (pp. 175-188). New York, NY: ACM.
10. Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Pratt, I., Warfield, A., Barham, P. & Neugebauer, R. (2003). Xen and the Art of Virtualization. In 19th ACM Symposium on Operating Systems Principles (SOSP), 19-22 October 2003 (pp. 164-177). New York, NY: ACM.

AUTHORS

A. Zuepke is with the Faculty of Design - Computer Science - Media, RheinMain University of Applied Sciences, Wiesbaden, Germany (e-mail: alexander.zuepke@hs-rm.de).
K. Beckmann is with the Distributed Systems Lab, Faculty of Design - Computer Science - Media, RheinMain University of Applied Sciences, Wiesbaden, Germany (e-mail: kai.beckmann@hs-rm.de).
A. Zoor is with the Distributed Systems Lab, Faculty of Design - Computer Science - Media, RheinMain University of Applied Sciences, Wiesbaden, Germany (e-mail: andreas.b.zoor@student.hs-rm.de).
R. Kroeger, PhD, is Professor at the Faculty of Design - Computer Science - Media, RheinMain University of Applied Sciences, Wiesbaden, Germany, and leads the Distributed Systems Lab (e-mail: reinhold.kroeger@hs-rm.de).