A study of interaction modalities of an interactive multimedia system

Jože Guna, Jan Šuštar, Emilija Stojmenova, Andrej Kos and Matevž Pogačnik
University of Ljubljana, Faculty of Electrical Engineering, Tržaška 25, 1000 Ljubljana, Slovenia
E-mail: joze.guna@fe.uni-lj.si

Abstract. We present the results of a usability study of new interaction modalities applied to a standard iTV multimedia platform. To ensure the comparability of results, a functionally identical multimedia user interface based on the open XBMC platform was created. Two interaction study setups were tested - one based on a remote-control-based interactive multimedia system and the other on a touch-screen-based system. In the first study setup, a traditional TV remote-control interaction was compared to the interaction using a simplified TV remote controller in combination with voice commands and simple tactile gestures. In the second study setup, interaction using a simplified and an extended set of touch gestures was explored. Each setup was evaluated in a usability study with 14 participants. The standard SUS method was used for usability measurement. Adding other modalities, such as voice commands and tactile gestures, improved the usability score (81.8) compared to the traditional TV remote-control interaction modality (75.4). The touch-gesture-enabled second-screen device also improved the user experience (86.1 for the simple set of touch gestures). Summarized in one sentence, the results suggest that "Less and simpler is more."

Keywords: interactive television, interaction modality, user study, HCI

A study of interaction modalities on the example of an interactive multimedia application

A system for multimodal interaction is described, using the XBMC multimedia platform for interactive television as an example, and the results of a study of usability and quality of user experience are given. The research comprises a study of two scenarios. In the first scenario, we investigate the use of the multimedia system with a classic remote controller and with a more modern, simplified remote controller supporting tactile gestures and voice commands. The second scenario is based on the touch-gesture modality. To make the results comparable, the same multimedia application and a functionally identical interface were used in both cases. The study results confirm the hypothesis that the users better accepted multimodal control with the simpler remote controller (81.8). In the touch-gesture scenario, the simpler gesture set likewise performed better (86.1).

1 Introduction

Human beings are intrinsically socially and communicatively oriented. In fact, a famous quote by Dr. Paul Watzlawick, a theoretician in communication theory and radical constructivism, states: "Man kann nicht nicht kommunizieren." ("One cannot not communicate.") [1]. Communication and interactions are part of our daily lives, whether we are aware of them (verbal communication) or not (nonverbal communication). The continuous and significant technological advances, especially in the field of interactive television (iTV) and multimodal user interfaces, have enriched the television (TV) experience in a number of ways and have thus changed our communication habits significantly. TV sets are no longer used only for passive watching of linear TV programs. Instead, they allow the end users to connect to online social platforms and video sites, play games and access their home multimedia content (e.g. pictures, videos, music) [2]. Moreover, the way the users control their TV sets has changed drastically. The era of big remote controllers with dozens of buttons and controls seems to be in decline, which is a reasonable change given the poor user experience they provide.
Nowadays, more and more TV sets and set-top boxes come with intelligent remote controllers, providing a multimedia-rich and enjoyable experience through intuitive, aesthetic and simple-to-use multimodal interfaces.

Received 14 August 2014. Accepted 26 September 2014.

In general, an interaction modality in terms of Human-Computer Interaction (HCI) designates a method of communication between the human and the computer and is usually bidirectional in nature. The output modalities, such as vision, hearing or haptic modalities, coincide with the human senses through which humans receive the output of the computer. The input modalities allow the computer to receive the input from humans using various sensors and other devices, such as a keyboard, mouse, accelerometer, camera, microphone, touch-enabled interface, etc. To provide a better user experience in HCI, multiple modalities are combined and a new class of interfaces emerges, i.e. "multimodal interfaces". A definition of the multimodal interfaces is given in [3] by Sharon Oviatt: "Multimodal systems process two or more combined user input modes — such as speech, pen, touch, manual gestures, gaze, and head and body movements — in a coordinated manner with the multimedia system output." Compared to a traditional keyboard and mouse interface or a classical remote controller device, the multimodal interfaces provide a flexible use of the input modes and allow the users to choose which modality, or combination of modalities, to use and when [4]. The multimodal interfaces are thus perceived as easier to use and applicable in the entertainment and educational segments, particularly in combination with virtual environments, such as Second Life [5], as well as in strictly industrial use [6].

Following the rise of smartphone and tablet sales, a number of solutions support remote control of TV sets via these additional devices. These solutions are better known as "second-screen" applications.
Multiple software platforms are available for this purpose, such as DLNA, AirPlay, Anymote and other specific solutions from the TV manufacturers. Besides mapping the basic remote-controller functions onto the device screen, the second-screen devices add value by enabling easier browsing of the Video on Demand (VoD) catalogues, Electronic Program Guides (EPGs) and social applications [7]. Touch-based interaction is the interaction modality commonly used on them.

The idea behind our study was to evaluate the added value of new interaction modalities applied to a common multimedia-rich platform. For this purpose, the open-source XBMC platform [8] was selected, which is available for nearly all the mainstream systems (Windows, Linux, ARM, Xbox, iOS, Mac OS). XBMC provides an open, adaptable, capable and widespread platform. Functionally identical XBMC skins, adapted to the specifics of the tested interaction modalities, were developed for multiple devices. Two interaction study setups were tested - one based on a remote-control-based interactive multimedia system and the other on a touch-screen-based system.

• In the first case, a traditional TV remote interaction was compared to the interaction using a simplified TV remote controller in combination with voice commands and simple tactile gestures. For the traditional part of the study using the TV remote controller, the commercial IPTV system "INNBOX" developed by the Iskratel company [9] was used.
• In the second case, an exclusively touch-gesture-driven XBMC interface was tested as a fully functional multimedia system per se. The extended and simple touch-gesture sets were compared.

In both cases, the standard System Usability Scale (SUS) questionnaire combined with a think-aloud technique was selected as the usability estimation method.
The purpose of our study was to test the hypothesis that simplifying the remote control and including voice commands should improve the usability, and consequently the user experience, for the most frequent multimedia tasks and navigational scenarios, such as navigation between the main menus (e.g. video, music, pictures, weather info, etc.). In addition, we hypothesized that a simpler set of control touch gestures should provide a better user experience despite its limited functionality.

The rest of the paper is organized as follows. In Section 2, the related work is presented. Interaction modalities, experimental environment, method and evaluation procedure are described in Sections 3 and 4. Results are shown in Section 5, while the discussion and key conclusions are given in Sections 6 and 7, including suggestions and motivation for future work.

2 Related work

The present work is a continuation of the study in [10], where the authors present the results of a user evaluation study comparing a standard unimodal TV remote controller with a multimodal and simple-to-use WiiMote controller including tactile gestures and voice commands. A standard methodology including the SUS, AttrakDiff and think-aloud methods was used. The results clearly show that the users prefer the multimodal approach to the unimodal approach (82 average SUS score vs. 75 average SUS score).

The study [11] presents a concept of a speech-enabled TV remote controller. It is simple in design and is equipped with a built-in microphone and a press-to-talk switch. The authors focus mainly on the TV remote controller and speech-recognition system and less on the TV multimedia interface itself. The chosen speech-recognition trigger strategy is "press to talk" and is not continuous.

In [12] the authors present the results of comparing two remote controls (unimodal vs. multimodal) for a linear TV and VoD movie entertainment system.
They show that multimodality improves user experience despite the clear differences in usability, and that multimodal interfaces actually provide universal access. Different user groups show slight differences when interacting with the multimodal remote control, and older users seem to benefit from multimodality.

In [13], a gesture interface based on unicursal gestures for TV control via touchscreen devices is presented. The unicursal gestures are made without lifting the thumb from the screen. Simple TV navigation and EPG control are tested. The trial study results show good feasibility of the proposed interface, with a recognition success rate of more than 96%.

In [14], the usability of interactive TV applications is evaluated. It is shown that iTV should be treated as a unique medium with its own set of constraints and opportunities. Special attention should be paid to simplicity, consistency of the user interface, navigation and cultural aspects. A set of heuristics to be considered when evaluating the usability of iTV applications is presented.

3 Interaction modalities study setup

In our study we experimented with the following two interaction modalities:
• a remote-control-based interactive multimedia system and
• a touchscreen-based multimedia system.

A description of the interaction modalities and of the corresponding system architecture is given below.

3.1 The remote-control-based multimedia system

The TV multimedia system used in our experiment was a slightly modified prototype of the commercial IPTV system INNBOX, based on the XBMC multimedia platform. The INNBOX user-interface skin was modified in such a way that it was the same for both setups. Thus we were able to compare the interaction modalities without the comparison being affected by GUI differences.
3.1.1 Interaction modalities

The following two interaction-modality setups were compared:
• interaction using only a standard TV remote controller (IR-based), and
• interaction using a simplified TV remote controller (RF-based) in combination with voice commands and simple tactile gestures.

The standard remote controller (Figure 1-top) had directional buttons, a "HOME" button, shortcut buttons to the video, music, picture and linear TV menus, playback-related buttons (PLAY, PAUSE, FFWD, etc.), colour buttons and ten buttons for text input. The device itself was IR-based.

The simplified remote controller, on the other hand, was a programmable Nintendo WiiMote [15] remote controller (Figure 1-bottom) with directional buttons, an "OK" button, a "BACK" button and a "HOME" button. Four simple tactile gestures in free 3D space were implemented. The gestures swing-up and swing-down resulted in the "PAGE-UP" and "PAGE-DOWN" commands, respectively, allowing the user to scroll faster through the menus and multimedia item lists. The gestures swipe-left and swipe-right were used for navigating between the previous and next media items. In addition, we integrated the following voice commands: "HOME", "MUSIC", "VIDEO", "WEATHER", "PICTURES", "OK", "LEFT", "RIGHT", "UP" and "DOWN". Voice recognition was continuous and ran without any special user intervention. These interaction modalities are summarized in Table 1.

Table 1. WiiMote modalities

Action                  | WiiMote buttons     | WiiMote gestures          | Voice
------------------------+---------------------+---------------------------+-------------------------------
Basic navigation        | directional buttons | -                         | "left", "right", "up", "down"
OK                      | OK button           | -                         | "ok"
HOME                    | HOME button         | -                         | "home"
BACK                    | BACK button         | -                         | -
PageUp / PageDown       | -                   | "swing up" / "swing down" | -
PreviousItem / NextItem | -                   | "horizontal swipe"        | -
Pictures shortcut       | -                   | -                         | "pictures"
Music shortcut          | -                   | -                         | "music"
Video shortcut          | -                   | -                         | "video"
Weather shortcut        | -                   | -                         | "weather"

The Nintendo Wii remote was selected for being much simpler than the standard TV remote, which usually has a lot of different buttons in a complex layout.
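As an illustration of Table 1, the following minimal Python sketch shows how events from the three input modalities could be resolved to one shared set of actions. It is not the control script used in the study: the action identifiers, the event type `InputEvent` and the function `resolve_action` are hypothetical names, and the assignment of swipe-left to the previous item and swipe-right to the next item is an assumption based on the order in which the text lists them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action identifiers; the paper does not name the ones used internally.
BUTTONS = {
    "left": "LEFT", "right": "RIGHT", "up": "UP", "down": "DOWN",
    "ok": "OK", "home": "HOME", "back": "BACK",
}
GESTURES = {
    "swing_up": "PAGE_UP", "swing_down": "PAGE_DOWN",
    # Assumed direction mapping (left = previous, right = next).
    "swipe_left": "PREVIOUS_ITEM", "swipe_right": "NEXT_ITEM",
}
VOICE = {
    "left": "LEFT", "right": "RIGHT", "up": "UP", "down": "DOWN",
    "ok": "OK", "home": "HOME",
    "pictures": "PICTURES", "music": "MUSIC",
    "video": "VIDEO", "weather": "WEATHER",
}
TABLES = {"button": BUTTONS, "gesture": GESTURES, "voice": VOICE}


@dataclass(frozen=True)
class InputEvent:
    modality: str  # "button", "gesture" or "voice"
    value: str     # e.g. "ok", "swing_up", "MUSIC"


def resolve_action(event: InputEvent) -> Optional[str]:
    """Return the shared action for an input event, or None if unsupported."""
    table = TABLES.get(event.modality)
    if table is None:
        return None
    return table.get(event.value.strip().lower())


if __name__ == "__main__":
    for ev in (InputEvent("voice", "HOME"), InputEvent("button", "home"),
               InputEvent("gesture", "swing_up"), InputEvent("voice", "back")):
        print(ev, "->", resolve_action(ev))
```

Keeping one table per modality makes the overlap visible: "HOME" is reachable by button and by voice, while "BACK" has no voice equivalent in Table 1.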
Some of the missing buttons like "PLAY" and "PAUSE" were replaced with a contextual interpretation of the existing commands. For example, to pause or resume the video playback, the user pushed the "OK" button on the remote. In other menus, the same button had the "OK/SELECT" functionality. Judging by the user comments and our observations (see the discussion), this reduced the cognitive load of the setup. The WiiMote is RF-based and uses the standard Bluetooth connection technology.

Figure 1. Traditional TV remote controller (top) and multimodal WiiMote controller (bottom).

3.1.2 System architecture

The experimental setups were installed on a standard laptop (Intel i5 CPU, 4GB RAM). The INNBOX setup using a unimodal remote controller was running on the Linux platform, while the voice-enhanced setup was running on the Windows XP OS and was using a customized version of the XBMC multimedia system. The voice-recognition software was developed in C# using the Microsoft Kinect and Speech SDK [16], with the Microsoft Kinect sensor serving as a cost-effective and hands-free microphone-array input device. The Nintendo Wii remote was programmed using the GlovePie environment [17]. A 42" Panasonic plasma TV was used in both cases as the display device. The system architecture is presented in Figure 2.

Figure 2. System architecture. Components shown: XBMC application with GUI; multimodal control script; voice-recognition application Uhura (C#); tactile gestures (GlovePie); voice recognition (Kinect SDK); operating system and drivers (Windows); PC hardware and sound interface; Bluetooth hardware; WiiMote; user.

3.2 The touchscreen-based multimedia system

In our experiment we used an integrated all-in-one touch-enabled PC. A functionally identical, but touchscreen-adapted XBMC GUI skin was developed.

3.2.1 Interaction modalities

Only the touch gestures were used to control the multimedia system. Two scenarios were studied:
• interaction using a simplified set of the touch gestures and
• interaction using an extended set of the touch gestures.

Special attention was paid to the selection of suitable touch-gesture sets. The selected gestures can be drawn in a single stroke (i.e. they are unicursal), they are not too complicated and are therefore intuitive for the users, they have a meaningful connection to the task they represent, and they are sufficiently different from each other to ensure good memorability and recognition success. Last but not least, the gesture set itself should be sufficiently small.

The simplified set of the touch gestures included ten gestures: directional movements (up, down, left and right), volume control (volume up and volume down), play control (pause and play) and multimedia item selection (previous and next item). The extended set added gestures for the pictures, music, video and weather section shortcuts, picture control (zoom in, zoom out and rotate), media stop and volume mute. In the gesture drawings, the rounded end depicts the gesture starting point, while the arrow represents the drawing direction and the end point. These interactions are presented in Table 2.

Table 2. Touch gesture modalities (the gesture drawings of the original table are not reproduced; "yes" marks a gesture included in the set)

Action                                  | Simplified gesture set | Extended gesture set
----------------------------------------+------------------------+---------------------
Basic navigation (directional controls) | yes                    | yes
Play                                    | yes                    | yes
Pause                                   | yes                    | yes
PreviousItem                            | yes                    | yes
NextItem                                | yes                    | yes
Volume up                               | yes                    | yes
Volume down                             | yes                    | yes
Stop                                    | -                      | yes
Mute                                    | -                      | yes
Rotate                                  | -                      | yes
Zoom in / Zoom out                      | -                      | yes
Pictures shortcut                       | -                      | yes
Music shortcut                          | -                      | yes
Video shortcut                          | -                      | yes
Weather shortcut                        | -                      | yes

An adapted XBMC user interface for the touch-based interactions was developed for this part of the experiment.
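To make the idea of single-stroke directional gestures concrete, the sketch below classifies a touch path as up, down, left or right from its net displacement. This is only an illustrative simplification and not the recognizer used in the study; the function name `classify_stroke`, the `min_length` threshold and the screen-coordinate convention (y growing downwards) are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixels, y grows downwards (assumed)


def classify_stroke(points: List[Point], min_length: float = 30.0) -> Optional[str]:
    """Classify a single-stroke path by the direction of its net displacement.

    Returns "up", "down", "left" or "right", or None when the stroke is
    too short to be treated as an intentional gesture.
    """
    if len(points) < 2:
        return None
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    # Ignore taps and jitter below the (hypothetical) minimum length.
    if max(abs(dx), abs(dy)) < min_length:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


if __name__ == "__main__":
    print(classify_stroke([(0, 0), (40, 3), (100, 5)]))  # mostly horizontal
    print(classify_stroke([(50, 200), (52, 20)]))        # mostly vertical
```

Shape gestures such as letters need a template- or feature-based recognizer; only the start and end points matter here, which is sufficient for the four directional movements.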
Although the interaction method and the terminal device themselves were different, the functionalities were identical in order to ensure the comparability of the results. The main differences in the interface were the enlarged button and icon sizes. An example of the picture-selection screen adapted for the touch-based interface is shown in Figure 3.

Figure 3. Touch-adapted XBMC interface (picture-selection screen).

3.2.2 System architecture

A touch-enabled Asus ET2002T PC [18] (based on a dual-core Atom 330 and the Nvidia ION chipset) running the Windows OS and the XBMC platform was used. The "Just Gestures" application [19], configured with suitable touch-gesture definitions and their corresponding actions, was used for gesture recognition. The central part of the screen was used for the gesture input, and the gesture paths were drawn on the screen to assist the users. The system architecture is presented in Figure 4.

Figure 4. System architecture. Components shown: XBMC application with touch-input-adapted GUI; touch-gesture definition and action script; Just Gestures application; operating system and drivers (Windows); all-in-one touch-enabled PC; user.

4 Experiment setup

In this section we present the scenario, study-participant data and evaluation method.

4.1 Scenario

In each of the four experiments, the same evaluation scenario was used. It consisted of the following tasks:
1. Start with the Home screen menu.
2. Go to the "Video" menu, find the video titled "Ratatouille" and start the playback.
3. Mute the sound (optional step).
4. Unmute the sound (optional step).
5. Stop/pause the video playback.
6. Go to the "Music" menu to find the song titled "Bob Marley - Sunshine Reggae" and start playing.
7. Go to the menu "Pictures" and locate the image titled "Universe". Scroll through the picture list. Find the image titled "Stadium".
8. Go to the menu "Weather" and see the current weather forecast for the city of Ljubljana.
The scenario and interaction modalities were explained to the participants before the test. The users were also able to freely choose the interaction modalities they liked (e.g. tactile gesture, voice, etc.) to accomplish the tasks within the limits of each part of the experiment.

4.2 Remote-control-based multimedia system

Fourteen participants took part in the user evaluation study. Ten of them were male and four were female. The youngest participant was 22 years old, whilst the oldest one was 42 years old. The average age of the participants was 32 years. Their previous experience with multimedia systems was assessed on a five-point Likert scale, where 1 indicated little or no experience and 5 an expert user. An expert user has a lot of experience with the multimedia systems (e.g. IPTV, VoD, home multimedia repositories, etc.) and uses them on different devices (set-top box, smartphone, tablet PC, etc.) on a daily basis. Four participants declared themselves expert users of the multimedia systems and eight declared themselves regular users. Two participants said they were occasional users with only little experience with the multimedia systems.

In the first part of the study (RC), the participants were interacting with the remote-control-based multimedia system using only a standard remote controller. In the second part of the evaluation study (RC and voice), the whole procedure was repeated. However, in this part the participants were interacting with the remote-control-based multimedia system using a remote controller in combination with voice commands and tactile gestures.

4.3 Touchscreen-based multimedia system

Since this experiment was a continuation of the previous one, another fourteen participants took part in the user evaluation study. There were seven male and seven female participants, aged between 22 and 39 years. The average age was 30. All of them were familiar with touch-based interfaces (previous experience with touchscreen smartphones or tablet PCs).
The participants followed the same scenario using both sets of the touch gestures (simplified and extended).

4.4 Evaluation method

To evaluate the usability, the standard System Usability Scale (SUS) questionnaire was used. During the study, the participants were encouraged to think aloud, i.e. to talk about the problems they encountered and to express their general impressions of navigating the multimedia system using different modalities. At the end of the study, they were invited to suggest how the system should be improved to better fit their expectations. The evaluation procedure was counterbalanced by dividing the participants into two halves (odd-numbered and even-numbered participants) and performing the two parts of each experiment in an alternating order.

5 Results

The average SUS score for the remote-control-based multimedia system for the interaction with the standard TV remote controller was 75.4 (standard deviation 20.8). The lowest obtained SUS score was 35 (participant 8), while the highest was 97.5 (participant 6). The average SUS score for the interaction with a remote controller in combination with voice commands was 81.8 (standard deviation 13.2). The lowest obtained individual SUS score in this part of the study was 50 (participant 11), while the highest was 97.5 (participant 2). Detailed results per user are presented in Figure 5.

Figure 5. Individual SUS scores for the remote-control-based multimedia system.

The average SUS score for the touchscreen-based multimedia system when using the extended set of the touch gestures was 79.5 (standard deviation 20.4). The lowest obtained SUS score was 42.5 (participant 1), while the highest was 100 (participant 6). The average SUS score when using the simplified set of the touch gestures was 86.1 (standard deviation 5.3). The lowest obtained individual SUS score in this part of the study was 75 (participant 10), while the highest was 95 (participant 7). Detailed results per user are presented in Figure 6.

Figure 6. Individual SUS scores for the touchscreen-based multimedia system (series: extended touch gestures and simple touch gestures, participants 1 to 14).

6 Discussion

We present the results of the usability study of new interaction modalities applied to a common iTV multimedia-rich platform. To ensure the comparability of the results, a functionally identical multimedia user interface based on the XBMC platform was used. Two interaction study setups were tested - one based on a remote-control-based interactive multimedia system and the other on a touch-screen-based system. The latter is a logical continuation of the former, and it also takes into account the users' suggestions about using additional gestures ("Additional second screen device with tactile screen interface and touch gestures."). Since the touchscreen interaction modality study was conducted afterwards, the test user groups were not identical. They were, however, identical in size (14 participants) and very similar in their age distribution (average age 32 vs. 30) and in their previous technology experience. The usability evaluation method used in both cases was the SUS questionnaire, which in combination with the same XBMC multimedia platform ensured the comparability of results.

To interpret the obtained SUS scores, we used the findings of Bangor et al. [20]. According to them, a SUS result of 71.4 can be considered as "Good" and a SUS result of 85.5 as "Excellent". Therefore, in terms of the adjective rating, all the obtained average SUS scores for the different tested interaction modalities can be described as "Good" or, in the case of the interaction using the simplified set of the touch gestures, even as "Excellent".
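For readers unfamiliar with SUS, the sketch below shows the standard scoring of one completed questionnaire (ten items answered on a 1 to 5 scale, odd items positively worded, even items negatively worded) and a simple adjective lookup. The function names are hypothetical, and treating the two values quoted above (71.4 and 85.5) as lower bounds for "Good" and "Excellent" is an assumption that mirrors how the averages are read in this discussion, not a rule taken from [20].

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0 to 100) from ten answers on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            total += answer - 1   # items 1, 3, 5, 7, 9
        else:
            total += 5 - answer   # items 2, 4, 6, 8, 10
    return total * 2.5


def adjective(score: float) -> str:
    """Adjective label, assuming 71.4 and 85.5 act as lower bounds."""
    if score >= 85.5:
        return "Excellent"
    if score >= 71.4:
        return "Good"
    return "below Good"


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 1]))
    # Average scores reported in Section 5.
    for avg in (75.4, 79.5, 81.8, 86.1):
        print(avg, adjective(avg))
```

Under this reading, the averages 75.4, 79.5 and 81.8 are labelled "Good" and 86.1 is labelled "Excellent", in line with the interpretation given above.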
The average obtained SUS results can thus be sorted in the following order:
• interaction using only a standard TV remote controller - 75.4,
• interaction using the extended set of the touch gestures - 79.5,
• interaction using the simplified TV remote controller in combination with the voice commands and simple tactile gestures - 81.8, and finally,
• interaction using the simplified set of the touch gestures - 86.1.

The lowest but still "Good" score was obtained with the traditional TV remote-control interaction modality. That it is still "Good" is not surprising, as the majority of users are still very much accustomed to this kind of interaction. The addition of the voice shortcuts increased the overall usability score to 81.8. The voice interaction was accepted well, but the reliability of the voice-command recognition system also had a significant impact on the overall user satisfaction. From the comments received we concluded that a functionally simpler (fewer commands) but more reliable system was better accepted than a functionally rich but less reliable system. The users also liked the RF-based simpler remote controller (WiiMote) with fewer buttons and a better layout (some of the participants' thoughts were: "WiiMote is simpler to use", "RF remote is better than IR remote because there is no need to point it towards the TV set or set top box").

The users were able to freely choose the interaction modality to achieve the scenario goals. When using the enhanced WiiMote, however, they usually chose the buttons over the tactile gestures. This can be explained by the fact that, since gestures are still not commonly used in iTV interfaces, the users simply "forget" to use them. The scrolling tactile-gesture shortcuts were well accepted, although rarely used.

In the case of the touch gestures, the clearly visible trend is "Less is more". The simpler set of the touch gestures was better accepted although some functionalities were omitted (e.g. mute, zoom, media shortcuts). In general, the unicursal gestures were well accepted, with occasional difficulties with the more complex gestures, such as the "G" for the "Music" media shortcut. The high average SUS scores obtained for the touch-gesture modality can also be explained by the fact that this is a relatively new modality, especially when used in combination with the multimedia interfaces, and by the participants' enthusiasm for new technologies in general.

While interacting with the remote-control-based multimedia system, the study participants also reported some of the problems they had encountered:
• "Searching for the buttons on the remote control can be distractive" (traditional TV remote).
• "The remote itself is unnecessarily complex and has too many buttons" (traditional TV remote).
• "The RF remote is better than the IR remote."
• "The traditional remote still dominates the home environment, but is limiting and lacks intuitiveness."
• "The system is not reliable enough and is sensitive to the environmental sounds." Note that this user was interacting with the remote-control-based multimedia system in a room with small children in the background.
• "Sometimes the voice commands are not recognized when talking in a lower voice."
• "Voice shortcuts are cool. However, the overall reliability could be an issue."

7 Conclusion

The results of a user evaluation study comparing two interaction modalities are presented - one based on the remote-control-based interactive multimedia system, and the other on a touch-screen-based system. The overall usability of the tested modalities was evaluated using the standard SUS method. Adding other modalities (voice, tactile gestures) improved the obtained usability score. The touch-enabled second-screen device also improved the user experience. Should the results be summarized in one sentence, the conclusion would be: "Less and simpler is more."
Judging from the study results, there is still a lot of potential in multimodal interaction with the TV and other multimedia sets, particularly in combination with the second-screen devices.

References

[1] Watzlawick, P., Beavin, J. H., Jackson, D. D., Menschliche Kommunikation. Stuttgart, Wien, Hans Huber Verlag, 1969.
[2] Schatz, R., Baillie, L., Froehlich, P., Egger, S., Grechenig, T., "What Are You Viewing?" Exploring the Pervasive Social TV Experience. Human-Computer Interaction Series, 2010, Mobile TV: Customizing Content and Experience, Part 5, 255-290.
[3] Oviatt, S., Breaking the Robustness Barrier: Recent Progress on the Design of Robust Multimodal Systems. Advances in Computers, vol. 56, 2002, 305-341.
[4] Barthelmess, P., Oviatt, S., Multimodal Interfaces: Combining Interfaces to Accomplish a Single Task. HCI Beyond the GUI, 2008, 391-444.
[5] Guna, J., Kos, A., Pogačnik, M., Evaluation of the multimodal interaction concept in virtual worlds. Electrotechnical Review, Journal of Electrical Engineering and Computer Science, vol. 77(5), 2010.
[6] Sodnik, J., Dicke, C., Tomazic, S., Billinghurst, M., A user study of auditory versus visual interfaces for use while driving. International Journal of Human-Computer Studies, vol. 66(5), 2008, 318-332.
[7] Kovačič, A., Guna, J., A Platform Enabling 2nd Screen Functionality for Mobile Applications. Proceedings of the 21st International Electrotechnical and Computer Science Conference, Portorož, 2012.
[8] XBMC, http://xbmc.org/ (accessed on 23.7.2014).
[9] Iskratel Innbox, http://www.innbox.net/en/ (accessed on 23.7.2014).
[10] Guna, J., Stojmenova, E., Geerts, D., Kos, A., Pogačnik, M., FutureTV, 3rd International Workshop on Future Television, 2012.
[11] Fujita, K., Kuwano, H., Tsuzuki, T., Ono, Y., Ishihara, T., A new digital TV interface employing speech recognition. IEEE Transactions on Consumer Electronics, 2003, 765-769.
[12] Wechsung, I., Naumann, A. B., Evaluating a multimodal remote control: The interplay between user experience and usability. International Workshop on Quality of Multimedia Experience, 2009, 19-22.
[13] Aoki, R., Ihara, M., Maeda, A., Kobayashi, M., Kagami, S., Unicursal gesture interface for TV remote with touch screens. 2011 IEEE International Conference on Consumer Electronics (ICCE), 9-12 Jan. 2011, 99-100.
[14] Collazos, C. A., Rusu, C., Arciniegas, J. L., Roncagliolo, S., Designing and Evaluating Interactive Television from a Usability Perspective. Second International Conferences on Advances in Computer-Human Interactions (ACHI '09), 1-7 Feb. 2009, 381-385.
[15] Lee, J. C., Hacking the Nintendo Wii Remote. IEEE Pervasive Computing, vol. 7(3), 2008.
[16] Microsoft Kinect and Speech SDK, http://www.microsoft.com/en-us/kinectforwindows/ (accessed on 23.7.2014).
[17] GlovePie Software, http://glovepie.org/glovepie.php (accessed on 23.7.2014).
[18] Asus ET2002T PC terminal device, http://www.asus.com/ph/AllinOne_PCs/EeeTop_PC_ET2002T/ (accessed on 23.7.2014).
[19] JustGestures Software, http://justgestures.com/ (accessed on 23.7.2014).
[20] Bangor, A., Kortum, P., Miller, J., Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, vol. 4(3), 2009, 114-123.

Jože Guna received his B.Sc., M.Sc. and Ph.D. degrees from the Faculty of Electrical Engineering, University of Ljubljana. His area of research includes Internet technologies, multimedia technologies and IPTV systems, with special emphasis on user-centred design, user interaction modalities and designing the user experience, including gamification and flow aspects.
Currently he is involved in a number of projects focusing on the development of intuitive user interfaces for elderly users of eHealth applications and interactive multimedia HBBTV applications. He is an expert in Internet, ICT and IPTV technologies and holds several industrial certificates from Cisco, CompTIA and Apple, including trainer licenses from Cisco and Apple. He is an active member of the IEEE organization.

Jan Šuštar graduated in 2013 from the Faculty of Electrical Engineering, University of Ljubljana. His area of research includes programming labels for telecommunication companies, designing user interfaces and developing gestures for controlling multimedia devices with the Microsoft Kinect sensor. Currently, he is employed at Comtrade d.o.o. He is an expert in programming in various languages, Internet and network technologies, designing user interfaces, developing gestures, data storage and agile techniques.

Emilija Stojmenova is a researcher at the Laboratory for Telecommunications at the Faculty of Electrical Engineering, University of Ljubljana, where she is involved in various research and development projects. In 2013, she received her Ph.D. in Electrical Engineering for her doctoral thesis "User-centered Design for Multi-screen e-Health Applications for Elderly People". One of the major contributions of the dissertation was her proposal for improving an existing standard (ISO 9241-210:2010), demonstrating the practical application of the results in an industrial standard. Her research work mainly covers user-centred design and methodologies for evaluating user experience and usability for specific groups of users, such as the elderly, children and people with disabilities. Before joining the Laboratory for Telecommunications, she was employed with Iskratel, Ltd., as a user experience manager, where she was responsible for the overall user experience in the company. She presides over the "World Usability Day Slovenia" and the IEEE Women in Engineering (WIE) Slovenia section, and is an active member of IEEE, ACM, UxPA and IxDA. Since September 2013, she has been actively involved in the Demola Network as the head of RAZ:UM, the Demola Slovenia operator. In the Demola project, ideas and needs come from the project partners, companies and organizations or international Demola Network partners.

Andrej Kos is an associate professor at the Faculty of Electrical Engineering, University of Ljubljana, as well as deputy head of the Laboratory for Telecommunications. He began working in the field of telecommunications in 1996. Since 1999 he has specialized in modelling and design of high-speed networks and services. Currently, his work centres on broadband networks and aspects of the Internet of Things with multimedia applications.

Matevž Pogačnik received his B.Sc. and Ph.D. degrees in 1997 and 2004, respectively, in the field of telecommunication and informatics from the University of Ljubljana. His main field of interest is interactive multimedia services development, more specifically mobile applications, IPTV/smart TV applications and digital TV services. His recent research work has mostly focused on the improvement of user experience and the introduction of new interaction modalities. He is also an IEEE member.