Original scientific paper
Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
Vol. 44, No. 4 (2014), 264 - 279

Heterogeneous MPSoC Technology for Modern Cyber-physical Systems

Lech Jozwiak
Eindhoven University of Technology, The Netherlands

Abstract: Spectacular progress in microelectronics and information technology has created a strong stimulus towards the development of advanced embedded and cyber-physical systems, but has also introduced unusual silicon and system complexity and heterogeneity. Moreover, many modern cyber-physical applications demand guaranteed (ultra-)high performance and/or (ultra-)low energy consumption, as well as high dependability, safety and security. All this combined results in numerous serious system development challenges. To overcome these challenges, a substantial adaptation of the system and design methodology is necessary. This paper focuses on the system technology for modern highly demanding cyber-physical systems. After a brief introduction to modern cyber-physical systems and a consideration of several serious challenges of their design, the paper discusses the heterogeneous MPSoC technology needed to implement those systems. This MPSoC technology exploits heterogeneous computation and communication resources involving application-specific instruction-set processors, hardware accelerators, distributed parallel memories and hierarchical communication structures.

Keywords: cyber-physical systems, MPSoC technology, design

Heterogena MPSoC tehnologija za moderne kibernetno-fizikalne sisteme

Izvleček: Izreden razvoj v mikroelektroniki in informatiki je ustvaril veliko vzpodbud v smeri razvoja naprednih vgrajenih in kibernetno-fizikalnih sistemih. Hkrati je vpeljal neobičajen silicij in kompleksnost in heterogenost sistemov. Nadalje, moderne kibernetno-fizikalne zagotavljajo (zelo) visoko učinkovitost in/ali (izredno) nizko porabo energije, kakor tudi visoko zanesljivost in varnost.
Vse to se odraža v številnih resnih razvojnih izzivih. Za premagovanje teh izzivov je potrebna nadomestna metodologija sistema in načrtovanja. Članek se osredotoča na sistemsko tehnologijo modernih visoko zahtevnih kibernetno-fizikalnih sistemov. Po kratkem uvodu v moderne kibernetno-fizikalne sisteme in opisu izzivov pri načrtovanju, članek opisuje heterogeno MPSoC tehnologijo, ki je potrebna za implementacijo teh sistemov. MPSoC tehnologija izkorišča heterogen računski in komunikacijski vir z vključevanjem izbranih procesorjev, pospeševalnikov, distribuiranega vzporednega pomnilnika in hierarhične komunikacijske strukture.

Ključne besede: kibernetno-fizikalni sistemi, MPSoC tehnologija, načrtovanje

Corresponding Author's e-mail: l.jozwiak@tue.nl

1 Introduction

The recent nano-dimension semiconductor technology nodes have enabled the implementation of very complex multi-processor systems on a single chip (MPSoCs) that may involve hundreds of processors and deliver much higher performance. This has facilitated further rapid progress in mobile and autonomous computing, global networking and wireless communication which, combined with progress in sensor and actuator technologies, has created important new opportunities. Many traditional applications can now be served much better but, more importantly, numerous new kinds of mobile and autonomous cyber-physical systems have become technologically feasible and economically justified. Specifically, a strong stimulus has been created towards the development of high-performance embedded and cyber-physical systems. Examples include various systems that perform monitoring, control, diagnostics, communication, visualization or a combination of these tasks, and that represent (parts of) different mobile, remote or poorly accessible objects, installations, machines, vehicles or devices, or that are even wearable or implantable in human or animal bodies.
A new wave of the information technology revolution has arrived and has started to create much more coherent and fit-for-use modern cyber-physical systems.

However, these new opportunities come at a price. On the one hand, unusual complexity has been introduced: silicon complexity (i.e. an extremely high number, diversity, small dimensions and huge density of devices and interconnects, huge length of interconnects, etc.), and system complexity (i.e. a huge number of possible system states, a large number and diversity of subsystems, extremely complex interactions and interrelations between the subsystems, etc.). On the other hand, for numerous new highly demanding embedded and cyber-physical applications in several fields (e.g. consumer, medical, well-being, communication, automotive, monitoring, control, etc.) straightforward software solutions are not satisfactory. Many of these applications require guaranteed high performance and/or (ultra-)low energy consumption. Moreover, many of the new embedded and cyber-physical applications combine different kinds of signal and information processing involving algorithms with various characteristics. They are by their very nature complex, heterogeneous and highly demanding. To serve these applications adequately, heterogeneous architectures have to be exploited. The applications require application-specific heterogeneous MPSoCs to perform their divergent real-time computations to extremely tight schedules, while satisfying their stringent energy, area and other requirements. Furthermore, due to the rapid evolution of many modern applications towards newer, improved versions, and due to the high and growing costs of application-specific circuit realization in new technology nodes, adaptable hardware solutions are needed, as provided by (re-)configurable and programmable hardware. Finally, the gap between the capability of nano-electronic technology and the productivity of system designers is widening rapidly.
The combination of the high system and silicon complexity with the applications' stringent and partly contradictory requirements results in numerous serious system development challenges, such as: satisfying the stringent requirements and ensuring high quality of the complex systems; accounting in design for more aspects and changed relationships among aspects (e.g. increased influence of interconnects on major physical system characteristics, increased leakage power, etc.); adequately addressing the need for energy reduction; accounting for the dominating influence of interconnects and communication on major system characteristics; complex multi-objective system optimization; resolution of numerous complex design tradeoffs; reduction of the system development time and costs without compromising the system quality; etc. To overcome these challenges, sophisticated system and design technologies are needed.

When considering the adaptation of the system and design methodology to the situation briefly characterized above in the field of modern embedded and cyber-physical systems, we first have to ask: what general system approach and design approach seem adequate to address the listed problems and resolve the challenges? Predicting the current situation on the basis of the commonly known Moore's law and my own observations of rapid developments in embedded systems, more than 15 years ago I proposed a system paradigm and a design paradigm that effectively address these challenges, namely the paradigms of life-inspired systems [1, 2, 3] and quality-driven system design [4, 5, 6, 7], as well as the methodology of quality-driven model-based system design based on them [6, 7, 8, 9]. Since then, our research team and our industrial and academic collaborators have been researching and applying this methodology to the multi-objective automatic architecture exploration and synthesis of MPSoCs for real-time embedded applications.
The research has confirmed the adequacy of this methodology.

This paper focuses on modern highly demanding embedded and cyber-physical systems, and on the heterogeneous MPSoC technology needed to implement those systems. The paper starts with an introduction to cyber-physical systems. Subsequently, it briefly introduces the paradigm of life-inspired systems, and discusses the issues and challenges of modern cyber-physical system design. Finally, the paper discusses the heterogeneous MPSoC technology needed to implement highly demanding embedded and cyber-physical systems. This MPSoC technology exploits heterogeneous computation and communication resources involving general purpose processors (GPPs), as well as application-specific instruction-set processors (ASIPs), HW accelerators, distributed parallel memories and hierarchical communication structures, and it is a specific practical realization of the paradigm of life-inspired systems.

2 Modern cyber-physical systems

There is much ambiguity and misunderstanding in the research area of cyber-physical systems. Specifically, many different (sometimes strange or logically incorrect) definitions of a cyber-physical system have been proposed in recent years. Therefore, this section starts with several definitions and explanations, with the aim of clarifying the background and improving understanding.

A system is a complex whole composed of interrelated, interdependent and interacting components (elements or larger parts of a system) that are so intimately composed together that they appear and operate as a single unit in relation to the external world (to other systems).

With respect to their increasing level of organization, all known systems can be sub-divided into the following three categories:
- unorganized systems - representing mechanical, unsystematic conglomerates of objects (e.g.
a dune, being a conglomerate of sand grains);
- organized systems - being systematic and law-governed compositions of parts, whose properties cannot be reduced to the properties of their parts, but involve some new emerging properties resulting from the complex composition of the parts' properties (e.g. a molecule, crystal, circuit, computer); and
- organic systems - formed not as a composition of some pre-existing parts, but being an integral whole with distinguishable parts that originate and develop together with the whole (e.g. living organisms); two characteristic features that distinguish the organic systems from all other systems are self-development and self-reproduction.

In this paper organized and life systems will be considered.

Cyber systems (information technology systems) are information collecting, processing and communicating systems. They are actually (parts of) control systems of other cyber, physical, biological or social systems. Cyber systems either:
- collect information using various sensors and other (distant) communication inputs, or
- process the collected information to compute some conclusions representing some control decisions or serving for the preparation of such control decisions, or
- communicate the results of information processing in time using various memories, or in space using different (distant) communication means and actuators, or
- implement all these functions, realizing a complete control system, or
- implement a sub-set of these functions, realizing e.g. a monitoring or diagnostic system.

For the acquisition and communication of signals from physical and biological systems, cyber systems use various sensors and (distant) communication means. For the communication of control decisions to controlled systems and for their actuation, cyber systems use divergent (distant) communication means and actuators.
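The collect, process and communicate functions listed above can be illustrated with a minimal sketch of a complete control loop. Everything in it is an assumption made for illustration and is not taken from the paper: the `CyberSystem` and `Room` classes, the toy room physics and the on/off `thermostat` rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


class Room:
    """Hypothetical physical sub-system: a room with a single temperature state."""

    def __init__(self, temperature: float) -> None:
        self.temperature = temperature

    def apply_heating(self, power: float) -> None:
        # Toy physics (assumption): the room loses 0.5 degrees per step
        # and gains 1 degree per unit of heating power.
        self.temperature += power - 0.5


@dataclass
class CyberSystem:
    """Hypothetical cyber sub-system implementing all three functions."""

    sense: Callable[[], float]          # collect information
    decide: Callable[[float], float]    # process it into a control decision
    actuate: Callable[[float], None]    # communicate the decision to the controlled system

    def step(self) -> float:
        reading = self.sense()
        command = self.decide(reading)
        self.actuate(command)
        return command


def thermostat(reading: float, setpoint: float = 20.0) -> float:
    """On/off control decision: heat only while below the setpoint."""
    return 1.0 if reading < setpoint else 0.0


room = Room(temperature=18.0)
controller = CyberSystem(
    sense=lambda: room.temperature,
    decide=thermostat,
    actuate=room.apply_heating,
)
history: List[float] = [controller.step() for _ in range(6)]
print(history, room.temperature)
```

A monitoring or diagnostic system in the sense of the last list item would correspond to using only the `sense` and `decide` parts of this sketch, without the `actuate` call.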
Sometimes we do not even realize that particular systems, or we personally, are controlled or monitored by certain cyber systems. For instance, when we work in a feedback loop with a computer game, we control the computer game, but the computer game also controls us. Similarly, when we watch TV, the TV system controls us to some degree.

Physical systems are systems in which matter or energy acquisition, processing and communication (transportation, transmission or storage) take place according to the laws of physics.

Life systems (biological organic systems, i.e. living organisms or organism populations) are compound systems performing matter and energy acquisition, processing and communication, as well as information acquisition, processing and communication. They can be considered as natural cyber-physical systems (in contrast to the engineered ones).

An embedded system (unlike a stand-alone computer) is an inseparable part of a certain larger system (e.g. a product or infrastructure). It serves a specific aim (e.g. monitoring, control, etc.) in this larger system through (repeatedly) executing specific computation and communication processes required by its application. It is application-specific. It has to be specially designed or adapted to adequately serve the execution of these specific computation processes, and to satisfy the application's requirements related to such attributes as functional behaviour, reaction speed or throughput, energy consumption, geometrical dimensions, price, reliability, safety, security, etc. Typically, embedded systems are (reactive) real-time systems, include sensing, interfacing, processing and/or actuating sub-systems, and involve in their implementation various mixtures of digital and analog hardware and embedded software. In brief, embedded systems are cyber systems that are tightly coupled with (embedded in) the systems that they control, supervise, monitor or diagnose.
Pervasive computing (ubiquitous computing) is a paradigm of seamless integration of information processing and communication into objects and environments. A pervasive system is in fact an "ideally" integrated embedded system.

As mentioned above, many different definitions of a cyber-physical system have been proposed in recent years. Below, two definitions of a cyber-physical system will be introduced. The first one defines a cyber-physical system in the broad sense, and the second one in the narrow sense of the "ideal" future cyber-physical system.

Cyber-physical systems (CPS) are compound systems engineered through close integration of cyber and physical sub-systems or components and/or pre-existing component cyber-physical systems, so that they appear and operate as a single unit in relation to the external world (to other systems). The sub-system integration involves both an appropriate spatial arrangement of the sub-systems and an adequate interconnection, communication and collaboration of the sub-systems.

For numerous reasons, the discussion of which is beyond the scope of this brief paper, much more effective and efficient infrastructure and products will have to be built in the future. In parallel with static system optimization at design time, highly effective and efficient dynamic control of the functional behaviour and resource consumption (e.g. energy consumption) of the future systems is essential to deliver these much higher future levels of system quality.
Therefore, the next generations of embedded systems are expected to be more intimately integrated and seamlessly orchestrated with the physical systems that they observe or control, and to be much more intelligent, autonomous, dynamic and adaptive. The aim is to create higher synergy among the cyber and physical sub-systems, to better combine the dynamics of the cyber and physical processes, and to obtain highly effective, efficient and robust cyber-physical systems with coherent combined real-time behaviour of the collaborating cyber and physical sub-systems. They are expected to be smarter in the sense of making better use of more, and more precise, information about various important features and laws of the part of the world that they observe, and in the sense of being more dynamic, predictive, adaptive and precise in controlling the physical part and themselves, so as to increase effectiveness and efficiency at the same time.

Additionally, more and more cyber-physical systems are becoming safety-critical. In addition to a growing number of systems in applications traditionally known as safety-critical (e.g. transportation, industrial and infrastructure automation, security control, medical and emergency management systems, etc.), many new applications are becoming safety-critical [1, 3]. Such systems must be safe, secure and dependable, and have to guarantee adequate real-time operation. Therefore, the following more demanding, narrower definition probably better reflects the character of the future advanced cyber-physical systems.

Cyber-physical systems (CPS) are smart compound systems engineered through seamless integration of cyber and physical sub-systems or components and/or pre-existing component cyber-physical systems, so that they appear and operate as a coherent single unit in relation to the external world (to other systems).
The sub-system integration involves both an optimized spatial arrangement of the sub-systems and a seamless interconnection and guaranteed real-time communication and collaboration of the sub-systems.

As cyber-physical systems may involve other cyber-physical systems as their sub-systems, the above definition also covers systems involving biological sub-systems, including human beings, or even social systems. Various medical CPS and a health-care system can serve here as a good example. For instance, an implanted pacemaker or another medical control/monitoring/diagnostic system forms a cyber-physical system together with the controlled/monitored/diagnosed (part of a) human. It can be connected to, and can communicate with, another CPS that is (a part of) a certain health-care system (a social system), forming together a higher-level CPS.

The above example shows that cyber-physical systems can communicate and collaborate with other cyber systems and cyber-physical systems (including human beings). They can be very complex: together with other CPS they can form (hierarchical) cyber-physical systems-of-systems. They can communicate using various multimodal machine-machine or human-machine interfaces, and private or shared communication media and channels, including internet and intranet communication networks, and heterogeneous short- and long-distance wireless communication.

Cyber-physical systems connected through various combinations of heterogeneous interfaces and communication media to the global Internet are starting to form the so-called Internet of Things (IoT). In this way they can use globally available data and services (e.g. cloud databases and cloud computing services) for tasks that are impossible or difficult to perform when using only local resources. The vision of "smart" systems collaborating on various scales, including the global scale, is no longer science fiction, but is quickly becoming reality.
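The recursive character of the definition above, in which a CPS may contain other CPS as sub-systems, can be pictured as a simple tree structure. The following is only an illustrative sketch: the `CPS` class, its `depth` and `flatten` helpers, and the names used in the pacemaker example are hypothetical and are not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CPS:
    """Hypothetical node in a hierarchical system-of-systems."""

    name: str
    parts: List["CPS"] = field(default_factory=list)

    def depth(self) -> int:
        # Number of hierarchy levels, counting this system as level 1.
        return 1 + max((part.depth() for part in self.parts), default=0)

    def flatten(self) -> List[str]:
        # Names of this system and all nested sub-systems, top-down.
        names = [self.name]
        for part in self.parts:
            names.extend(part.flatten())
        return names


# The pacemaker example from the text, expressed with assumed names.
pacemaker = CPS("implanted pacemaker")
heart = CPS("monitored heart")
patient_cps = CPS("patient-level CPS", [pacemaker, heart])
health_care = CPS("health-care system", [patient_cps])

print(health_care.depth())
print(health_care.flatten())
```

The same structure extends to deeper hierarchies, since any `CPS` can appear in the `parts` list of another.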
Examples of the new systems include various modern monitoring, diagnostic, control, multi-media and communication systems that can be put on or embedded in: mobile, remote, poorly accessible or dangerous objects, installations, devices, machines or vehicles; hospital, office, home or personal equipment; or even implanted in the human or animal body.

Embedded and cyber-physical systems already play a very significant role in our life. Example application sectors of these systems include: infrastructure (e.g. smart homes, buildings, towns, etc.); transportation (e.g. smart transportation monitoring, communication and control systems, aerospace systems, traffic control and collision avoidance, assisted driving, autonomous communicating cars, etc.); energy and information acquisition, storage and delivery (e.g. smart power grids, data centres, etc.); military (smart monitoring equipment, tele-operation equipment, arms, etc.); environment (e.g. smart environmental monitoring and control, rapid environmental intervention, environment exploration, etc.); extension or replacement of human capabilities (e.g. smart operation in remote, poorly accessible or dangerous environments, such as tele-robotics and robotic surgery, fire-fighting, search and rescue, military applications, sea and space exploration; artificial limbs and implants; etc.); personal assistance (e.g. in well-being, sport, monitoring, distant communication and control, etc.); social systems (e.g. smart health-care, assisted living, etc.); etc. We will use the term "modern cyber-physical systems" to designate all kinds of "smart" infrastructure, things and objects, as briefly discussed above.

In the future, embedded and cyber-physical systems will be used ever more commonly in virtually all fields of human activity, in various sorts of technical, social and biological systems, and in more and more important and demanding applications.
Our life is, and will be, dependent to an ever higher degree on their adequate operation. Therefore, the expectations of individuals and society regarding their quality are growing rapidly. Future research and development in cyber-physical systems will have to satisfy these growing expectations through enabling systems that will be much more effective, efficient, responsive, robust, safe and dependable than today's systems.

The current focus of research and development in cyber-physical systems is mainly on:
- holistic system quality assurance and heterogeneity in system design, which treats cyber, physical and biological (e.g. human) parts and sub-systems as integral components of a compound CPS to create coherent, highly optimized CPS, and addresses, among others, such aspects as intimate coupling, seamless orchestration, co-modelling, co-design, co-simulation and co-validation of the heterogeneous information-processing and physical subsystems; multi-objective multi-domain system optimization; effective and efficient dynamic, predictive and adaptive control of the system functional behaviour and resource consumption; and smart use of more, and more precise, information for system control;
- safety, security and dependability of CPS;
- generic high-performance, low-power, robust and acceptable-cost architectures, architecture templates and platforms for complex collaborating CPS, enabling guaranteed robust real-time performance and synchronization in complex networked CPS, low energy consumption, predictability, adaptability, reusability, interoperability, scalability, composability, (semi-)automatic system integration, etc.;
- miniaturized sensors and actuators that are easy to integrate with cyber and physical system parts;
- packaging technologies and smart materials seamlessly integrating miniaturized sensors, actuators, processors, memories and their interconnections;
- design methodology and design automation of (platform-based) CPS;
- systems-of-systems and
Internet-of-Things.

As the future, much more effective and efficient "smart" cyber-physical systems will have important applications in virtually all economic and social segments, their potential economic and societal impact will be enormous. Consequently, major investments are being made worldwide to develop the CPS technology, both by private enterprises (from large multi-nationals, e.g. Intel, Qualcomm, Google, etc., to small start-up SMEs) and by national governments. As Europe has an approximately 30% share in the global market of embedded systems and CPS (being especially strong in the automotive, aerospace, medical, consumer and several other sectors), the development of CPS technology is of crucial importance for Europe.

A very important subclass of cyber-physical systems is that of mobile and autonomous CPS, which can have inherent mobility or can be transported by other cyber-physical or biological systems, including humans, and/or are autonomous regarding their functioning and energy sources. Some of these systems work fully independently, but most of them collaborate with other cyber or cyber-physical systems when they require information or resources that are not available locally, or when they deliver some information or services to external systems. Examples of such systems include mobile robots, wireless sensor systems, mobile equipment transported by humans or animals, wearable systems, implantable systems, etc.

Perhaps the most popular and broadly known of these mobile and autonomous CPS is the smartphone. A smartphone is actually quite a complex CPS, including: significant heterogeneous information processing resources (several different general purpose and application-specific processors with their local and global memories, and communication structures); several different long- and short-distance communication sub-systems (3G or 4G mobile cellular, WiFi, Bluetooth, etc.)
to communicate with the mobile cellular telecommunication network, the Internet or other devices; and several sensors and actuators (touch screen, microphone, speaker, camera(s), light sensor, proximity sensor, GPS, etc.) that enable various cyber-physical applications.

Two essential characteristics of the mobile and autonomous CPS are:
- the requirement of (ultra-)low energy consumption, often combined with the requirement of high performance, and
- heterogeneity in the sense of convergence and combination of various previously separate applications, systems and technologies of different sorts in one system, or even in a single chip or package.

In the first generation of the mobile and autonomous CPS, computing, communication and media were combined, resulting, among others, in smartphones and tablets, and creating a huge market. The global smartphone market is expected to grow rapidly from $85.1 billion in 2010 to as much as $258.9 billion in 2015 (MarketsandMarkets), and the smartphone shipments forecast for the years 2014-2018 is roughly 8 billion units (Gartner, Sept. 2013). Tablet shipments are expected to rise to 315 million units in 2014 (65% of the mobile PC market), and to 455 million by 2017 (75% of the global PC market) (NPD, Feb. 2014). In the years 2014-2018 the smartphone and tablet shipments are expected to exceed all other shipments of consumer systems in which modern processors are used, such as PCs, TVs, cameras, video and audio systems, media players and adapters, game consoles, vehicles, etc. (Gartner, Sept. 2013).

In the next generations of the new CPS, more sensors/actuators and more divergent sensors/actuators will be integrated together with computing, communication and media, and the cyber sub-systems will be more tightly coupled, or even concurrently co-designed, with the physical sub-systems to better support various modern cyber-physical applications. Early versions of such products, called "convergence products", have already been on the market for some time, e.g.
iPhone-based ECG, smartphone-based USG or smartphone-based glucose monitor.

In the near future, a rapidly growing market will be that of smart wearable or implantable systems, for which miniaturized (multi-)sensors/actuators, ultra-low energy consumption, small size and appropriate form are of paramount importance. Smart wearables (often referred to just as wearables) are intelligent and often communicating (via Bluetooth) garments, accessories and other wearable devices equipped with multiple sensors and/or actuators (e.g. screens, cameras, microphones, speakers, heart-rate, pressure, temperature, light, proximity and other sensors, GPS, etc.) that constitute a sub-class of the mobile and autonomous CPS. Examples include: light, sound or physically reactive dresses, smart heated clothes, connected T-shirt, monitoring T-shirt, Bluetooth jewellery, Apple Watch, personal locator watch and other smart watches, activity tracker band, sun-exposure monitoring band, identity bracelet, smart glasses, vision-enhancement glasses, Google glass motorcycle helmet, jet fighter pilot helmets, hearing aids, wireless EEG headset, wireless ECG monitor, wireless glucose monitor, eButton (wearable health and activity monitoring system), etc. Many wearables are devoted to wireless health, personal assistance, well-being or sport applications, enabling (distant) in-action monitoring, treatment and/or communication. The application area of wearables is, however, much broader and includes many segments: from fashion and glamour, life-style computing and personal communication, through sport, well-being, medical and business, to safety, security and military.

In 2015, shipments of wearable devices are expected to exceed 90 million units and account for almost $20 billion in revenue. This market is expected to grow further at almost 40% a year over the next six years, to surpass 340 million units and account for nearly $57 billion in revenue by the end of 2020 (SNS Research, August 2014).
The rapidly growing market of more and more complex and sophisticated wearables will create a strong market pull for miniaturized sensors and actuators. Since the number of sensors in future wearables will be higher than in today's ones (in 2019 an average wearable device will include 4.1 sensors, compared to 1.4 sensors in 2013), the market of sensors for wearables will grow even faster than the wearables market. From 67 million units in 2013 it is expected to grow to about 85 million units in 2014, 175 million units in 2015, and 466 million units in 2019 (IHS Technology). So, the market of miniaturized sensors is expected to grow exponentially, by a factor of about seven in six years.

A related sub-class of the mobile and autonomous CPS, but with even more stringent ultra-low energy consumption requirements, is that of smart miniaturized implants (e.g. miniaturized pacemakers, neuro-stimulators, implantable defibrillators, ophthalmic implants, cochlear implants, drug-delivery pumps, etc.) and pill-size medical devices (e.g. endoscopic devices with sub-mm cameras). Finally, body area networks (BAN), representing somewhat more complex CPS involving several implanted or externally worn sensing, processing and communicating devices, can remotely monitor several vital bio-signals and communicate them wirelessly. It is predicted that over the period 2011-2016 the global market for microelectronic medical implants will grow at an 8.9% compound annual growth rate (CAGR), from $15.1 billion in 2011 to $23.1 billion in 2016. The fastest-growing segments will be: ear implants (18.2%), neuro-stimulators (10.5%) and implantable drug pumps (10.5%) (BCC Research, May 2011).
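The growth figures quoted above can be cross-checked with the standard compound annual growth rate formula, $\mathrm{CAGR} = (V_{end}/V_{start})^{1/n} - 1$. The short sketch below is only a consistency check on the numbers cited in the text; the helper name `cagr` is an assumption introduced here. Under this formula, the sensor forecast (67 to 466 million units over six years) corresponds to a growth factor of about seven and roughly 38% per year, and the implant forecast ($15.1 to $23.1 billion over five years) is consistent with the quoted 8.9% CAGR.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end / start) ** (1.0 / years) - 1.0


# Sensors for wearables: 67 million units (2013) to 466 million units (2019).
sensor_factor = 466 / 67
sensor_cagr = cagr(67, 466, 2019 - 2013)

# Microelectronic medical implants: $15.1 billion (2011) to $23.1 billion (2016).
implant_cagr = cagr(15.1, 23.1, 2016 - 2011)

print(f"sensor growth factor: {sensor_factor:.2f}")
print(f"sensor CAGR: {sensor_cagr:.1%}")
print(f"implant CAGR: {implant_cagr:.1%}")
```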
In parallel with the (ultra-)low energy and high-performance demands, very important issues for wearable and implantable systems are those of geometrical dimensions and form, as well as the integration of the miniaturized sensors and actuators with the information processing and communication sub-systems on one chip or in one package.

CPS connected through various combinations of heterogeneous interfaces and communication media to the global Internet are forming the so-called Internet of Things (IoT). Owing to the large-scale and rapidly growing invention of new modern CPS, the Internet of Things has begun to grow explosively. In 2013 there were approximately 12.1 billion internet-connected devices, and their number is expected to more than quadruple to nearly 50 billion devices by 2025. While in 2013 more than 87% of the internet-connected devices were in communications, computers and consumer electronics, this share is expected to decline to about 59% during the next 12 years, as the industrial market (specifically, its manufacturing, medical, automotive, military and aerospace sectors) is expected to become the fastest growing market, followed by the consumer market (IHS Technology, Q1 2014).

The huge and rapidly developing markets of modern CPS represent great opportunities both for private enterprises (from large multi-nationals, e.g. Intel, Qualcomm, Google, etc., to small start-up SMEs) and for national economies, including social systems.

Summing up, the spectacular advances in microelectronics and information technology have created the unusual new opportunities briefly discussed above.
However, at the same time they have introduced unusual complexity:
- Silicon Complexity, in the sense of an extremely high number, diversity, small dimensions and huge density of devices and interconnects, huge length of interconnects, an increased number of serious issues and changed relationships among the issues; and
- System Complexity, in the sense of a huge number of possible system states, a large number and diversity of subsystems, and extremely complex interactions and interrelations between the subsystems.

Additionally, most of the modern "smart" CPS are highly demanding in the sense of requiring (ultra-)low energy consumption and guaranteed real-time high performance, as well as high safety, security and dependability. The factors discussed above, together with some additional ones, cause the gap between the capability of nano-electronic technology and the productivity of system designers to widen rapidly.

All this results in serious system development challenges, such as:
- guaranteeing real-time high performance, while at the same time satisfying the requirements of (ultra-)low energy consumption and high safety, security and dependability;
- accounting in design for more aspects and changed relationships among aspects (e.g. leakage power, negligible in the past, is a very serious issue now; increased influence of interconnects on major physical system characteristics);
- complex multi-objective MPSoC optimization;
- adequate resolution of numerous complex design tradeoffs;
- reduction of the design productivity gap for the increasingly complex and sophisticated systems;
- reduction of the time-to-market and development costs without compromising the system quality, etc.

These challenges cannot be overcome without a substantial adaptation of the system and design methodology.
Already more than 20 years ago I predicted the current situation and started research aimed at answering the question: what system technology and design technology will be adequate to serve the development of the future complex and highly demanding embedded and cyber-physical systems? When considering the adaptation of the system and design methodology to the situation in the field of modern complex and highly-demanding systems, we first have to ask: what general system approach and design approach seem adequate to address the above-mentioned problems and resolve the challenges? More than 15 years ago I proposed such a system paradigm and design paradigm, namely the paradigms of life-inspired systems [1, 2, 3] and quality-driven design [4, 5, 6, 7]. Based on them, I developed the methodology of quality-driven model-based system design [6, 7, 8, 9]. Since then, my research team and our industrial and academic collaborators have been researching this methodology and applying it to the multi-objective (semi-)automatic architecture exploration and synthesis for real-time embedded MPSoCs [9-29], and this research confirms the adequacy of the methodology. What are the life-inspired systems?

3 Life-inspired systems

The paradigm of life-inspired systems originated from my observation that the complexity, operation domains and roles of microelectronic-based systems more and more resemble the complexity, operation domains and roles of (intelligent) living organisms or organized populations of such organisms. Based on this parallel, I formulated the hypothesis that the future microelectronic-based systems should have characteristics that resemble the characteristics of (intelligent) living organisms or their populations. Consequently, the basic concepts, principles, functional and structural organization etc. of the microelectronic-based systems should resemble those of (intelligent) living organisms.
Like a real brain, a life-inspired system should effectively and efficiently solve complex problems, take and implement difficult decisions, adapt to changing conditions, learn, etc., also in relation to itself. To achieve these diverse aims effectively and efficiently in relation to complex applications, and in the light of a changing, noisy or unreliable environment and its own interior, a life-inspired system must be largely autonomous, self-contained, dynamic and robust, and it has to include adequate self-organization, adaptation and regulation mechanisms. Like a real organism or brain, a life-inspired system should be highly decentralized and composed of largely autonomous and diverse sub-systems (organs or centres), each having its own particular aims and being optimized for these aims. To form a coherent system, the autonomous sub-systems have to be adequately (hierarchically) organized, interconnected with an appropriate network of effective and efficient communication channels, properly coordinated and adequately collaborating with each other, to synergistically achieve the global system aims. To achieve a high performance and energy efficiency of the system processing and communication:
- information and intelligence, and in consequence computation, storage and communication resources, of the life-inspired system should be properly distributed over all its sub-systems;
- effective application-specific computing and storage should be implemented in the sub-systems, and efficient application-specific communication should be provided inside and between the sub-systems; and
- application parallelism should be extensively exploited, so that all kinds of application parallelism are adequately supported both by hardware and software.
Realization of such a life-inspired system, having the basic characteristics and organized according to the basic principles sketched above, requires:
- autonomous heterogeneous sub-systems, implemented using multiple clock and/or power domains, and asynchronous or GALS techniques;
- local distributed application-specific memories for the sub-systems, enabling effective and efficient parallelism exploitation at the system and sub-system levels: multi-port, multi-bank and/or vector memories;
- (more) global (multi-port, multi-bank) memories for sharing data and for communication between the sub-systems;
- memory-centric processing for massive data: computations must come to the data;
- an adequate application-specific mixture of effective and efficient application-specific communication schemes and mechanisms of all kinds (i.e. NoCs, busses, switches, point-to-point communication, etc.);
- (massively) parallel application-specific processing sub-systems efficiently exploiting all kinds of application parallelism and involving application-specific computation operators implemented in hardware; and
- (re-)configurable hardware to realize the flexibility often required and to implement the application-specific processing and communication schemes effectively and efficiently.
Specifically, (re-)configuration plays a very important role and serves numerous purposes:
- computation speedup and energy usage reduction in comparison to standard software solutions, due to computing platform specialization involving (massively) parallel application-specific processing, as well as effective implementation of application-specific operations and (massively parallel) computation patterns directly in the (re-)configurable hardware;
- product differentiation and adaptability in relation to applications and standards;
- adaptability to changing operation conditions (e.g. adaptive control, filtering, interfacing, etc., but also self-diagnosis and fault-tolerance);
- design reuse and computational resource sharing;
- development and fabrication effort re-use, which results in reducing the design productivity gap, reducing the design costs and shortening the time-to-market.
In particular, generic system solutions, and specifically generic system platforms and architecture templates, serve the above purposes and additionally enable an efficient (semi-)automatic system architecture synthesis. Through adequate instantiation and/or extension, generic system platforms and architecture templates can be reused and adapted to (better) suit a particular application. The general form of a generic solution constrains the solution search space to such a degree that the construction of particular system solution instances for a particular application can efficiently be performed (semi-)automatically. The (semi-)automatic system construction can be performed through an appropriate instantiation and/or extension of the generic architecture platform or template, followed by computation process scheduling and mapping on the instance of the platform or template constructed in this way. Observe that the concept of a generic system solution is strictly parallel to the genotype in living organisms: the genotype is mutated to better fit particular conditions, while the generic solution is adequately instantiated to better serve a particular application. The system (re-)configuration is strictly parallel to the adaptation in living organisms. Summing up, the paradigm of life-inspired systems specifies:
- basic principles, characteristics, as well as functional and structural organization of embedded systems and CPS through analogy to (intelligent) living organisms, and
- basic mechanisms and architectural solutions of systems that are necessary to implement the principles, characteristics and organization.
More information on the paradigm of life-inspired systems can be found in [1, 8]. The heterogeneous MPSoC technology needed to implement the highly-demanding embedded and cyber-physical systems, and based on configurable and extensible application-specific instruction-set processors (ASIPs) and HW accelerators, is a specific practical realization of several architectural solutions and mechanisms proposed by the paradigm of life-inspired systems.

4 Issues and challenges of the modern cyber-physical system design

As explained in Section 2, many of the modern embedded and cyber-physical applications impose ultra-high demands that are difficult to satisfy. Applications involving big instant data generated/consumed by video sensors/actuators and other multi-sensors/actuators demand (ultra-)high performance in computing and communication. Examples of such applications include: video sensing and monitoring, computer vision, augmented reality, ultra HDTV, image and multi-sensor processing (e.g. in some wireless-health or automotive applications), etc. (see Fig. 1). For instance, 4G communication requires a throughput as high as 1 Gbps, while various HDTV standards require throughputs as high as 1 - 6 Gbps. Applications providing continuous autonomous service over a long time demand (ultra-)low energy consumption (e.g. continuous monitoring, communication or control in remote or poorly accessible places). For instance, various modern mobile communication, multimedia and medical applications require power consumption close to or below 1 W. Many of the modern applications require not only high computing power and low energy consumption, but high security, safety and reliability as well. For wearable and implantable systems, geometrical size and form also play a very important role.
The remaining part of the paper focuses on explaining which kind of cyber technology is needed to implement the embedded and cyber-physical applications that impose demands of (ultra-)high performance and/or (ultra-)low energy consumption.

Figure 1: Demands of ultra-high performance and ultra-low energy consumption.

The modern very complex applications that require very high throughput and ultra-low energy consumption usually include numerous different algorithms involving various kinds of massive parallelism: data parallelism, and task-level, instruction-level and operation-level functional parallelism. They are heterogeneous by their very nature. To adequately serve these applications, heterogeneous computation platforms have to be exploited. To adequately exploit the coarse task and data parallelism of applications, application-specific processing engines with parallel multi-processor macro-architectures have to be constructed. Multiple identical or different processors, each operating on a (partly) different data subset, have to work concurrently to realize the ultra-high throughput and/or ultra-low energy consumption. The different parts of the complex applications involving different kinds of information processing (e.g. different parallel processing structures) should be implemented using different application-part specific hardware architectures that support well the different required kinds of processing. However, the contemporary multi-core general purpose processors (GPP) are homogeneous: they involve several identical cores. Therefore, no contemporary standard general purpose processor (µP, µC, DSP, GPGPU) and no network of such processors is able to satisfy the ultra-high demands of applications that require ultra-high throughput (e.g. in the range of several Gbps) or ultra-low power consumption (e.g. close to or below 1 W).
Moreover, to realize such a high throughput they would require a clock speed in the range of hundreds of GHz, and this would result in an extremely high (impossible to realize) power dissipation. To satisfy the (ultra-)high application demands, highly-optimized application-specific heterogeneous multiprocessor systems-on-a-chip (MPSoCs) are required, often involving hardware multi-processors to execute the critical computations. The stringent requirements of the highly-demanding applications can only be addressed by highly effective and efficient application-specific heterogeneous HW/SW and HW solutions, such as application-specific heterogeneous MPSoCs based on adaptable (massively) parallel ASIPs and/or hardware processors (accelerators) (see Fig. 2).

Figure 2: Intel ASIPs in heterogeneous MPSoCs at 32 nm (by courtesy of Intel Benelux)

Moreover, many of the highly-demanding applications involve complex algorithms with data parallelism and multi-input multi-output (MIMO) operations. To satisfy their stringent throughput and energy requirements, data, operation and instruction parallelism, as well as custom hardware implementation, have to be exploited in the micro-architectures of processors for these applications. Thus, the application's parallelism has to be explored and exploited at two architecture levels in combination: the macro-architecture and the micro-architecture level. Observe that similar performance can be achieved either with fewer processors, each being more parallel and better targeted to a particular part of an application, or with more processors, each being less parallel or less application-specific. However, each of the alternatives can have different physical and economic characteristics, such as power consumption or circuit area. This results in the necessity to explore and decide various possible tradeoffs between the micro-architecture and macro-architecture design.
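The clock-speed estimate given earlier in this section can be illustrated with a back-of-the-envelope calculation. The sketch below takes the DSP figures from [32] quoted in the LDPC example of Section 5 (about 5 Mbps at 600 MHz) and assumes, purely for illustration, that throughput scales linearly with clock frequency. The function name and the linear-scaling model are assumptions made here, not part of the original analysis.

```python
def required_clock_hz(base_clock_hz, base_throughput_bps, target_throughput_bps):
    """Clock needed to reach the target throughput under naive linear scaling."""
    return base_clock_hz * target_throughput_bps / base_throughput_bps


# Figures quoted in this paper from [32]: 600 MHz DSP, about 5 Mbps LDPC decoding.
BASE_CLOCK_HZ = 600e6
BASE_THROUGHPUT_BPS = 5e6

for target_gbps in (1, 5, 6):
    clock = required_clock_hz(BASE_CLOCK_HZ, BASE_THROUGHPUT_BPS, target_gbps * 1e9)
    print(f"{target_gbps} Gbps -> about {clock / 1e9:.0f} GHz")
```

Under this simplifying assumption, throughputs of several Gbps correspond to clock rates of several hundred GHz on such a processor, which is consistent with the estimate above.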
Moreover, each micro-/macro-architecture combination requires different compatible parallel memory and communication architectures. Exploitation of data parallelism in a computing unit micro-architecture usually demands getting the data in parallel for processing. This requires simultaneous access to parallel memories and simultaneous data transmission. Furthermore, in multi-processors for ultra-high-performance applications, parallelism has to be exploited on a massive scale [22-25]. However, due to the stringent energy consumption and area requirements, partially parallel architectures usually have to be used, which are much more complex to design than the fully parallel architectures exploiting the application's maximum available parallelism in a straightforward way.

5 Heterogeneous MPSoC technology for highly-demanding CPS applications

To satisfy the demands of modern highly-demanding embedded and CPS applications, highly-optimized application-specific heterogeneous multi-processor systems-on-a-chip (MPSoCs) are required. For ultra-high throughput and/or ultra-low energy consumption, massively parallel application-specific hardware multi-processors have to be used. For less stringent throughput and/or energy requirements, programmable (massively) parallel application-specific multi-ASIP systems can be applied. In many cases, for complex applications involving parts of different character, mixed ASIP and accelerator based systems can provide the best solution. In those application-specific heterogeneous MPSoCs, the application's parallelism has to be exploited at two architecture levels in combination: the macro-architecture and the micro-architecture level.
The coarse task and data parallelism is exploited through constructing application-specific parallel multi-processor macro-architectures, while the fine-grain data, operation and instruction parallelism is exploited through constructing the application-specific parallel micro-architectures of particular processors (see Fig. 4). Also, various possible tradeoffs have to be resolved between the micro-architecture and macro-architecture designs, as well as among the processor, memory and communication architectures. Below, low-density parity-check (LDPC) code [30] decoding will be used as an example application to further explain the application-specific heterogeneous MPSoC technology.

Example: heterogeneous massively parallel multiprocessors for LDPC decoding

LDPC decoding is used as an advanced error-correcting scheme in the newest wired/wireless communication standards, like IEEE 802.11n, 802.16e/m, 802.15.3c, 802.3an, etc. [31]. Some of these standards, for instance IEEE 802.15.3c, specify throughputs as high as 5-6 Gbps [31]. The ultra-high performance demands in the Gbps range and other high demands of these applications cannot be satisfied by systems implemented using general purpose processors. For instance, an implementation of LDPC decoders on the well-known Texas Instruments TMS320C64xx DSP processor running at 600 MHz delivers a throughput of only 5 Mbps [32]. Similarly, implementations of LDPC decoders on general-purpose multi-cores result in throughputs of only 1-2 Mbps, while the throughputs on other platforms range from 40 Mbps on GPUs to nearly 70 Mbps on the CELL Broadband Engine (CELL/B.E.), as reported in [33]. For the realization of a throughput as high as several Gbps, massively parallel application-specific hardware multi-processors are indispensable. A systematic LDPC encoder encodes a message of k bits into a codeword of length n with k message bits followed by m parity-check bits. Each parity check is computed based on a sub-set of the message bits.
To define an LDPC code, a parity check matrix (PCM) of size $m \times n$ is used. In Fig. 3 an example PCM for a (7,4) LDPC code is given. Each PCM can be represented by its corresponding task graph, referred to as the Tanner graph. The Tanner graph corresponding to an (n,k) LDPC code consists of n variable nodes (VN) and m = n - k check nodes (CN), connected with each other through edges, as shown in Fig. 3. Each row $i$ of the parity check matrix represents a parity check equation $c_i$ and each column $j$ represents a code bit $v_j$. An edge exists between $CN_i$ and $VN_j$ if the corresponding value $PCM_{ij}$ is non-zero, which means that $v_j$ is involved in the parity check equation $c_i$. Usually, iterative Message Passing algorithms (MAP) are used for decoding of the LDPC codes [34]. The algorithm starts with the so-called intrinsic log-likelihood ratios (LLRs) of the received symbols based on the channel observations. During decoding, specific messages are exchanged among the check nodes and variable nodes along the edges of the corresponding Tanner graph for a number of iterations. The variable and check node processors (VNP, CNP), corresponding to the VN and CN computations, iteratively update each other's data until all the parity checks are satisfied or the maximum number of iterations is reached. Since Tanner graphs of practical LDPC codes involve hundreds of nodes and even more edges, LDPC decoding represents a massive and complex computation and communication task.

Figure 3: An example PCM for a (7, 4) LDPC code and its corresponding Tanner graph

Since each node receives several inputs and computes several outputs, the operations performed in the nodes are multi-input multi-output (MIMO) operations. The micro-architecture level exploration is related here to the more or less parallel realization of various RTL-level MIMO operations.
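The relation between a PCM, its Tanner graph and the parity checks described above can be sketched in a few lines of Python. The matrix used below is an illustrative (7,4) Hamming-style systematic matrix chosen for this sketch; it is an assumption and not necessarily the matrix of Fig. 3. The function names are likewise hypothetical.

```python
# Illustrative (7, 4) parity check matrix (assumption: a Hamming-style
# systematic code; not necessarily the matrix shown in Fig. 3).
PCM = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def tanner_graph(pcm):
    """Return (cn_to_vn, vn_to_cn) adjacency lists of the Tanner graph."""
    n = len(pcm[0])
    cn_to_vn = [[j for j, bit in enumerate(row) if bit] for row in pcm]
    vn_to_cn = [[i for i, row in enumerate(pcm) if row[j]] for j in range(n)]
    return cn_to_vn, vn_to_cn


def encode(message, pcm):
    """Systematic encoding: k message bits followed by m parity bits.

    Valid only when the last m columns of pcm form an identity matrix,
    as is the case for the illustrative PCM above.
    """
    k = len(message)
    parity = [sum(row[j] * message[j] for j in range(k)) % 2 for row in pcm]
    return list(message) + parity


def syndrome(word, pcm):
    """Evaluate every parity check; all zeros means all checks are satisfied."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in pcm]


cn_to_vn, vn_to_cn = tanner_graph(PCM)
codeword = encode([1, 0, 1, 1], PCM)
corrupted = codeword.copy()
corrupted[2] ^= 1  # flip one code bit to emulate a channel error

print("CN -> VN edges:", cn_to_vn)
print("codeword:", codeword, "syndrome:", syndrome(codeword, PCM))
print("corrupted:", corrupted, "syndrome:", syndrome(corrupted, PCM))
```

An all-zero syndrome corresponds to the stopping condition "all the parity checks are satisfied" of the iterative decoder; the adjacency lists give the edges along which the check node and variable node processors would exchange messages.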
For example, the IEEE 802.15.3c standard specifies four different LDPC codes, with variable nodes having from a single input/output up to 4 inputs/outputs and check nodes having from 5 inputs/outputs up to as many as 32 inputs/outputs. The implementation spectrum of the corresponding micro-architectures spans from the fully-serial to the fully-parallel, with numerous partially-parallel micro-architectures in between. The fully-parallel implementation (resulting in a long critical path delay and large hardware) or the fully-serial implementation (resulting in a high number of computation cycles) of MIMO operations may not satisfy the different stringent design constraints and objectives, which necessitates a careful exploration of partially-parallel architectures at the micro-architecture level (see Fig. 4). Due to the massive data parallelism at the macro-level and the task-level functional parallelism, multiple such partially-parallel processors have to be considered at the macro-architecture level to satisfy the ultra-high throughput requirements (see Fig. 4). For example, the rate-1/2 672-bit IEEE 802.15.3c LDPC code consists of 672 variable nodes and 336 check nodes, which correspond to at most the same number of variable and check node processors, respectively, in the macro-architecture level design. Consequently, depending on the actual performance requirements, different massively-parallel multi-processors have to be built from the elementary processors to satisfy the requirements, with micro-architectures of the elementary processors spanning the full spectrum from serial, through partially-parallel, to fully parallel. This results in a very high number of possible macro-architecture/micro-architecture combinations, as well as related compatible memory and communication structures, and task (node) and data mappings, defining a huge design space of various possible multi-processor architectures with different characteristics (see Fig. 4).
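The size of this design space can be illustrated with a deliberately simplified enumeration. The sketch below is a toy model and not the exploration method of [22, 25]: it assumes that the nodes are shared evenly among identical processors (so only divisors of the node count are considered), that the micro-architecture parallelism is a power of two, and that a node with d inputs needs ceil(d / width) cycles. The node counts (672 and 336) and the maximum numbers of inputs (4 and 32) are the figures quoted above; applying the maximum degrees to this particular code is an additional assumption.

```python
from math import ceil


def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def design_points(num_nodes, max_degree, micro_widths):
    """Enumerate (processors, micro_width, cycles_per_iteration) tuples.

    Toy cost model (assumption): nodes are shared evenly among the
    processors, and a node with max_degree inputs needs
    ceil(max_degree / micro_width) cycles.
    """
    points = []
    for processors in divisors(num_nodes):
        for width in micro_widths:
            cycles = (num_nodes // processors) * ceil(max_degree / width)
            points.append((processors, width, cycles))
    return points


# Figures quoted in the text: 336 check nodes with up to 32 inputs,
# 672 variable nodes with up to 4 inputs.
cn_points = design_points(336, 32, [1, 2, 4, 8, 16, 32])
vn_points = design_points(672, 4, [1, 2, 4])

print("check node options:", len(cn_points))
print("variable node options:", len(vn_points))
print("combined options:", len(cn_points) * len(vn_points))
print("fastest CN option:", min(cn_points, key=lambda p: p[2]))
print("slowest CN option:", max(cn_points, key=lambda p: p[2]))
```

Even this coarse model, which ignores memory and communication structures and node-to-processor mappings, already yields thousands of macro-/micro-architecture combinations, which suggests why manual exploration is impractical.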
Therefore, construction of the optimal massively-parallel heterogeneous multi-processor architectures is a very difficult and time-consuming task. Fortunately, this task can be automated to a high degree [22, 25].

End of Example.

More information on the architectures and design of massively parallel application-specific hardware multiprocessors for highly-demanding embedded and CPS applications, as well as on distributed parallel memories and hierarchical communication structures, can be found in our recent papers [22, 23]. However, in parallel to hardware (multi-)processors implementing the most demanding tasks, an application-specific heterogeneous MPSoC includes several different programmable processors to realize the design reuse and computational resource sharing, as well as the flexibility required for product differentiation, adaptability to changing applications and standards, and accommodation of late design changes. In general, the heterogeneous MPSoC technology exploits heterogeneous (application-specific) computation and communication resources involving: general-purpose processors (GPPs), application-specific instruction-set processors (ASIPs), HW (multi-)processors, distributed parallel memories and hierarchical communication structures. Usually, the general-purpose processors of a heterogeneous MPSoC perform some low-performance control, synchronization and communication tasks, ASIPs perform the main high-performance tasks, and application-specific HW (multi-)processors perform the most critical ultra-high performance tasks. Although the ASIPs and HW processors usually occupy an area on an MPSoC only a few times larger than that of the GPPs (see Fig. 2), they can deliver more than a hundred times higher processing power than the GPPs, due to the effective and efficient implementation of the application parallelism with their application-specific parallel hardware.
In the recently finished research project ASAM [27] (http://www.asam-project.org/) of the European Industrial Research Program in Embedded Systems ARTEMIS, we developed automated architecture synthesis and application mapping technology for heterogeneous MPSoCs based on adaptable ASIPs customizable to a particular application through instantiation and extension. Only a few companies in the world possess such a heterogeneous MPSoC technology based on adaptable ASIPs: one of them is Intel Benelux, one of the main industrial partners of ASAM.

Figure 4: Massively-parallel multi-processors with parallel processor micro-architectures

The architecture platform targeted in the ASAM project was the heterogeneous multi-ASIP MPSoC platform of Intel Benelux (previously Silicon Hive), which can be instantiated and extended for specific applications. Each ASIP of the platform is a VLIW processor capable of executing parallel software with a single thread of control. The generic ASIP architecture of the targeted MPSoC platform is graphically represented in Fig. 5. It includes a processor core (core) performing the actual data processing, and core I/O (coreio) implementing the local memories and the I/O subsystem that enables the communication of the ASIP with the rest of the system. The ASIP core includes a VLIW datapath controlled by a sequencer that uses status and control registers, and executes programs from the local program memory. The datapath contains scalar and/or vector function units organized in several parallel issue slots. The issue slots are connected via programmable input and output interconnections to registers organized in several register files. The function units perform computations on intermediate data stored in the register files. Both SIMD and MIMD processing can be realized. The coreio, implementing the local memories and I/O subsystem, enables an easy integration of the ASIP in any larger system.
Any other processors or devices of the MPSoC can access the devices in coreio via master and slave interfaces. Numerous divergent application-specific ASIP instances can be created through configuration and extension of the generic ASIP architecture. The parameters to be explored and set to create an ASIP instance include: the number and type of issue slots and of the (scalar or vector) instructions inside the issue slots; the number and type of issue slot clusters, to optimize parallelism exploitation and communication between the issue slots; the number and size of register files; the type, data width and size of local memories; the architecture and the parameters of the local communication structure; the scheduling and mapping of the application part assigned to the ASIP onto the ASIP parallel issue slots and of its data onto the local memories; etc. In particular, ASIPs can be constructed that contain both scalar issue slots and vector issue slots with different vector lengths, as well as different scalar and vector memories.

Figure 5: Generic VLIW ASIP architecture (by courtesy of Intel Benelux)

Several different ASIPs, each customized for a particular part of a complex application, can be interconnected via direct buffered connections, busses or a Network-on-Chip (NoC), including shared memories and DMAs, to form an MPSoC (see Fig. 2). The parameters to be explored and set at the MPSoC system level include: the number and types of ASIPs; the number, type and size of shared memories; the scheduling and mapping of the application parts onto the ASIPs and of their data onto the memories; and the architecture and parameters of the global communication structure. The potential of the ASIP-based MPSoC can be illustrated as follows. Several ASIPs with approximately 100 issue slots in total, each for 64-way vector processing, can be placed on a single chip implemented in the currently exploited 22 nm CMOS technology.
When operated at only 400-600 MHz, these ASIPs can deliver more than 1 Tops/s, with power consumption far below the upper limit of mobile devices. Such ASIP-based heterogeneous MPSoC platforms enable efficient exploitation of various kinds of parallelism: the multiple ASIPs enable the exploitation of coarse-grain data and task parallelism, while the ASIP's parallel issue slots and vector instructions enable the exploitation of fine-grain data, instruction and operation parallelism. This adaptable ASIP-based MPSoC technology addresses several fundamental challenges for the development of the highly-demanding embedded and CPS applications:
- it is able to deliver high performance, high flexibility and low energy consumption at the same time;
- it is relevant for a very broad range of application domains;
- it is applicable to several implementation technologies, e.g. semi-custom SoC or ASIC, structured ASIC, and FPGA.
Provided that an effective and efficient highly automated customization technology becomes available, it will become possible to build adaptable ASIP-based MPSoCs at substantially lower costs and with shorter time-to-market than for the hardwired ASICs or processors built from scratch. This was the primary target of the ASAM project.
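The throughput figure quoted above for the ASIP-based platform can be checked with a small model of an ASIP instance. The sketch below is a hypothetical data structure written for this illustration; it is not the configuration format of the Intel Benelux platform. It assumes that every vector lane of every issue slot completes one operation per clock cycle (a peak figure), and that the roughly 100 issue slots are split evenly over four ASIPs, which is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class IssueSlot:
    kind: str          # "scalar" or "vector"
    vector_lanes: int  # 1 for a scalar slot

    def ops_per_cycle(self):
        # Peak assumption: one operation per lane per cycle.
        return self.vector_lanes


@dataclass
class AsipInstance:
    name: str
    issue_slots: list

    def peak_ops_per_second(self, clock_hz):
        return clock_hz * sum(slot.ops_per_cycle() for slot in self.issue_slots)


def platform_peak_ops(asips, clock_hz):
    """Sum of the peak operation rates of all ASIPs on the chip."""
    return sum(asip.peak_ops_per_second(clock_hz) for asip in asips)


# Hypothetical platform: 4 ASIPs x 25 vector issue slots = 100 slots,
# each slot 64 lanes wide (slot count and vector width as quoted in the text).
asips = [
    AsipInstance(f"asip{i}", [IssueSlot("vector", 64) for _ in range(25)])
    for i in range(4)
]

for clock_hz in (400e6, 600e6):
    tops = platform_peak_ops(asips, clock_hz) / 1e12
    print(f"{clock_hz / 1e6:.0f} MHz -> peak of about {tops:.2f} Tops/s")
```

Under these assumptions the peak rate is about 2.6 Tops/s at 400 MHz and about 3.8 Tops/s at 600 MHz, so a sustained rate of more than 1 Tops/s is plausible even when the issue slots are only partly utilized.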
With the state of the art in the design technology before the ASAM project:
- the architecture, software and hardware of the customizable ASIP-based systems had to be designed by experts supported by only some point tools;
- these experts had to possess deep knowledge of the application analysis and restructuring, target technology, ASIP and multi-ASIP system architecture design, as well as software mapping and compilation processes;
- even for an expert, the application analysis and the construction of a high-quality software structure and corresponding hardware platform for a complex application constituted a very complex, time-consuming and error-prone task;
- the difficulty and complexity of this task dramatically reduced the ability to perform a high-quality systematic exploration of the system, hardware and software design spaces, and resulted in a low productivity and/or decreased design quality.
In the ASAM project we developed an effective automated design technology that efficiently performs the HW/SW MPSoC architecture exploration and synthesis both at the ASIP and at the system level, and implemented the prototype EDA tools of this technology. More information on the architecture of heterogeneous ASIP-based MPSoCs and on the automated HW/SW co-design technology for the MPSoCs can be found in our recently published papers [17-20], [26-29] and on the ASAM project website: http://www.asam-project.org/. By comparing the information on the application-specific heterogeneous MPSoC technology presented in this section with the information on the paradigm of life-inspired systems presented in Section 3, one can easily conclude that the heterogeneous MPSoC technology is a specific practical realization of several architectural solutions and mechanisms proposed by the paradigm of life-inspired systems.
6 Conclusion

Spectacular advances in microelectronics and information technology created unusual opportunities, and in particular a big stimulus towards the development of various kinds of high-performance embedded and cyber-physical systems. However, they also introduced unusual complexity and heterogeneity. Moreover, many modern embedded and cyber-physical applications are not only complex and heterogeneous, but highly demanding as well. All this combined results in numerous serious system development challenges, briefly discussed in this paper. To overcome these challenges, a substantial system and design methodology adaptation was necessary. This included the development of the paradigms of life-inspired systems and quality-driven design, as well as of new system and design technologies implementing them. The new system and design technologies replaced several concepts by the required new ones, e.g.: sequential computing by highly parallel computing, homogeneous architectures by heterogeneous architectures, simple flat architectures by complex hierarchical architectures, separate HW and SW design by actual coherent HW/SW co-design, separate processing, memory and communication design by their co-design, separate macro- and micro-architecture design by their co-design, simple optimization by complex multi-objective optimization and trade-off exploration, etc. In our former projects and in the recently finalized ASAM project we performed substantial pioneering R&D work towards an adequate system and design technology for the modern highly-demanding embedded and cyber-physical systems. Nevertheless, much R&D work is still needed in this rapidly developing area, which is of primary importance for individuals, societies, industries and countries.

7 References

1. L.
Jóźwiak: Life-inspired Systems, DSD'2004 - Euromicro Symposium on Digital System Design, August 31st - September 3rd, 2004, Rennes, France, ISBN 0-7695-2003-0, IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 36-43 (Invited Paper).
2. L. Jóźwiak: Life-inspired Systems: Assuring Quality in the Era of Complexity, IWSOC'2005 - 5th IEEE International Workshop on System-on-Chip for Real-Time Applications, July 20 - 24, 2005, Banff, Alberta, Canada, ISBN 0-7695-2403-6, IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 139-142 (Invited Paper).
3. L. Jóźwiak: Life-inspired Systems and Their Quality-driven Design, ARCS'06: 19th International Conference on Architecture of Computing Systems - System Aspects in Organic Computing, Frankfurt/Main, Germany, March 13 - 16, 2006, pp. 1-9 (Keynote Paper).
4. L. Jóźwiak: Modern Concepts of Quality and Their Relations to Model Libraries, IFIP/ESPRIT Workshop on Libraries, Component Modelling, and Quality Assurance, Nantes, France, 26-27 Apr., 1995.
5. L. Jóźwiak: Quality-driven Design of Hardware/Software Systems, Proc. IEEE/IFAC International Conference on Recent Advances in Mechatronics, ICRAM '95, Istanbul, Turkey, 14 August 1995, ISBN 975-518-063-X; IEEE Computer Society Press, Los Alamitos, CA, 1995, pp. 459-466.
6. L. Jóźwiak: Modern Concepts of Quality and Their Relationship to Design Reuse and Model Libraries, in the book series Current Issues in Electronic Modelling, Chapter 8, vol. 5, Kluwer Academic Publishers, Dordrecht, 1995.
7. L. Jóźwiak: Quality-driven Design in the System-on-a-Chip Era: Why and how?, Journal of Systems Architecture, vol. 47, no. 3-4, Apr. 2001, pp. 201-224 (Keynote Paper).
8. L. Jóźwiak: Life-inspired Systems and Their Quality-driven Design, Lecture Notes in Computer Science, Vol. 3894, 2006, Springer, pp. 1-16.
9. L. Jóźwiak and S.-A. Ong: Quality-driven Model-based Architecture Synthesis for Real-time Embedded SoCs, Journal of Systems Architecture, Elsevier Science, Amsterdam, The Netherlands, ISSN 1383-7621, Vol. 54, No 3-4, March-April 2008, pp. 349-368.
10. L. Jóźwiak: Quality-Driven Design Space Exploration in Electronic System Design, IEEE International Symposium on Industrial Electronics, Warsaw, Poland, June 17-20, 1996, pp. 1049-1054.
11. L. Jóźwiak and S.A. Ong: Quality-Driven Decision Making Methodology for System-Level Design, EUROMICRO'96 Conference, IEEE Computer Society Press, Prague, Czech Republic, Sept. 02-05, 1996, pp. 08-18.
12. S.A. Ong, L. Jóźwiak, K. Tiensyrja: Interactive Codesign for Real Time Embedded Control Systems: Task Graphs Generation from SA/VHDL Models, Proc. EUROMICRO 97, 23rd Conference "New Frontiers of Information Technology", Budapest, Hungary, Sept. 1-4, 1997, ISBN 0-8186-8129-2; IEEE Computer Society Press, Los Alamitos, CA, USA, 1997, pp. 172-181.
13. S.A. Ong, L. Jóźwiak, K. Tiensyrja: Interactive Codesign for Real Time Embedded Control Systems, Proc. ISIE 97, IEEE International Symposium on Industrial Electronics, Guimarães, Portugal, July 7-11, 1997, ISBN 0-7803-3334-9; IEEE Press, 1997, pp. 170-175.
14. L. Jóźwiak: Subjective Aspects of Quality in the Design of Complex Hardware/Software Systems, SCI'2001 - World Multiconference on Systemics, Cybernetics and Informatics, July 22-25, 2001, Orlando, Florida, USA, IIIS Press, ISBN 980-07-7551-X, pp. 223-228.
15. L. Jóźwiak, Sien-An Ong: Quality-driven Template-based Architecture Synthesis for Real-time Embedded SoCs, DSD'2006 - 9th Euromicro Conference on Digital System Design, August 30 - September 1, 2006, Cavtat near Dubrovnik, Croatia, ISBN 0-7695-2443-8, IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 397-406.
16. L. Jóźwiak and Y. Jan: Quality-driven Methodology for Demanding Accelerator Design, ISQED'2010 - The IEEE International Conference on Quality Electronic Design, San Jose, CA, USA, March 22-24, 2010, ISBN 978-1-4244-6454-8, IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 380-389.
17. L. Jóźwiak and M. Lindwer: Issues and Challenges in Development of Massively-Parallel Heterogeneous MPSoCs Based on Adaptable ASIPs, PDP'2011 - 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 483-487.
18. L. Jóźwiak, M. Lindwer, R. Corvino, P. Meloni, L. Micconi, J. Madsen, E. Diken, D. Gangadharan, R. Jordans, S. Pomata, P. Pop, G. Tuveri, L. Raffo: ASAM: Automatic Architecture Synthesis and Application Mapping, Proc. DSD 2012 - 15th Euromicro Conference on Digital System Design, Cesme, Izmir, Turkey, 5-7 September 2012, IEEE CPS, pp. 216-225.
19. R. Jordans, R. Corvino, L. Jóźwiak, H. Corporaal: Exploring Processor Parallelism: Estimation Methods and Optimization Strategies. In DDECS 2013 - 16th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, pages 18-23, Karlovy Vary, Czech Republic, 2013. DOI: 10.1109/DDECS.2013.6549782 (Best Paper Award).
20. R. Jordans, R. Corvino, L. Jóźwiak, H. Corporaal: An Efficient Method for Energy Estimation of Application Specific Instruction-set Processors. In DSD 2013 - 16th Euromicro Conference on Digital System Design, pages 471-474, Santander, Spain, 2013. DOI: 10.1109/DSD.2013.120
21. L. Jóźwiak, N. Nedjah, M. Figueroa: Modern Development Methods and Tools For Embedded Reconfigurable Systems - A survey, Integration - The VLSI Journal, Elsevier Science, ISSN: 0167-9260, Volume 43, No 1, January 2010, pp. 1-33.
22. Y. Jan and L. Jóźwiak: Scalable Communication Architectures for Massively Parallel Hardware Multi-Processors, Special Issue on Communication Architectures for Scalable Systems, Journal of Parallel and Distributed Computing, Elsevier Science, Vol.
72, Issue 11, doi:10.1016/j.jpdc.2012.01.017, November 2012, pp. 1450-1463. 23. Y. Jan and L. J0žwiak: Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors, VLSI Design, Vol. 2012, Hindawi Publishing Corporation, doi:10.1155/2012/794753, January 2012, pp. 1 -20. 24. J0žwiak, L. and Jan, Y.: Design of Massively Parallel Hardware Multi-Processors for Highly-Demanding Embedded Applications. Microprocessors and Microsystems, Volume 37, Issue 8, November 2013, pp. 1155-1172. 25. Jan, Y. and J0žwiak, L.: Processor Architecture Exploration and Synthesis of Massively Parallel Multi-Processor Accelerators in Application to LDPC Decoding. Microprocessors and Microsystems, Vol. 38, Issue 2, March, 2014, pp. 152-169. 26. R. Jordans, R. Corvino, L. J0žwiak, H. Corporaal: Exploring Processor Parallelism: Estimation Methods and Optimization Strategies, International Journal of Microelectronics and Computer Science, Issue 4, No 2, 2013, pp. 55-64. ISSN: 20808755. 27. J0žwiak, L.; Lindwer, M.; Corvino, R.; Meloni, P.; Micconi, L.; Madsen, J.; Diken, E.; Gangadharan, D.; Jordans, R.; Pomata, S.; Pop, P.; Tuveri, G.; Raffo, L. and Notarangelo, G.: ASAM: Automatic Architecture Synthesis and Application Mapping, Microprocessors and Microsystems journal, Vol.37, No 8, pp. 1002-1019, 2013. 28. A. S. Nery, N. Nedjah, F. Franga and L. J0žwiak: A Framework for Automatic Custom Instruction Identification on Multi-Issue ASIPs. In: International Conference on Industrial Informatics, 2014, Porto Alegre. 12th IEEE International Conference on Industrial Informatics. Los Alamitos: IEEE Computer Society Press, 2014. p. 428-433. 29. A. S. Nery, N. Nedjah, F. Franga and L. J0žwiak: Automatic Complex Instruction Identification for Efficient Application Mapping onto ASIPs. 5th IEEE Latin American Symposium on Circuits and Systems - LASCAS 2014, 2014, Santiago, Chile, Los Alamitos: IEEE Computer Society Press, 2014, pp. 1-4. 30. R. 
Gallager: Low-density Parity-Check Codes, IRE Transactions on Information Theory 8 (1), (1962), pp. 21-28, http://dx.doi.org/10.1109/ TIT.1962.1057683. 31. IEEE Standard For Information Technology - Telecommunications and Information Exchange Between Systems - Local and Metropolitan Area Networks - Specific Requirements. Part 15.3: Wireless Medium Access Control (mac) and Physical Layer (phy) Specifications for High Rate Wireless Personal Area Networks (wpans) Amendment 2: Millimeter-Wave-Based Alternative Physical Layer Extension, IEEE Std 802.15.3c-2009 (Amendment to IEEE Std 802.15.3-2003), 2009 c1 -187, doi:http:// dx.doi.org/10.1109/IEEESTD.2009.5284444. 32. G. Lechner, J. Sayir, M. Rupp: Efficient DSP Implementation of an LDPC Decoder, in: Proceedings, IEEE International Conference on, Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 4, 2004, pp. iv-665-iv-668, doi:http://dx.doi. org/10.1109/ICASSP.2004.1326914. 33. G. F. Fernandes, L. Sousa, V. Silva: Massively LDPC Decoding on Multicore Architectures, IEEE Transaction on Parallel and Distributed Systems 22 (2) (2011), pp. 309-322. 34. D. MacKay: Good Error-Correcting Codes Based on Very Sparse Matrices, IEEE Transactions on Information Theory 45 (2) (1999), pp. 399-431, http://dx.doi.org/10.1109/18.748992. Arrived: 02. 10. 2014 Accepted: 17. 11. 2014