EvaSIM - The EVA Robot Simulator Software

The use of software simulators has a long tradition in robotics, allowing the rapid prototyping of behaviors and thus saving time and cost. In recent years there has been increased interest in the development of social robots, designed to interact with humans, and Socially Assistive Robots (SARs), aimed at aiding humans through social interaction. This work describes the development and evaluation of a simulator for the EVA robot, named EvaSIM. The EVA (Embodied Voice Assistant) robot is an open-source robotics platform created to support research in human-robot interaction. It has been successfully applied in the development of therapeutic interventions for people with dementia. The EvaSIM simulator can interpret the scripts generated with the EVA visual programming language (VPL) as well as the code generated with EvaML, an XML-based language for developing interactive sessions for the EVA robot. EvaSIM is capable of simulating the robot's multimodal interaction capabilities, emphasizing its social affordances.

1. EvaSIM

1.1 The User Interface

In order to provide a test environment for EvaML scripts, we developed EvaSIM, a software tool that simulates the EVA robot. The simulator was developed in Python and relies on a few additional modules, detailed in Section 1.2.

Figure 1: EvaSIM - User Interface

As we can see in Figure 1, EvaSIM imitates, in a simplified way, the robot system. The elements of its user interface are the following: the EVA robot figure (1); the representation of the Matrix Voice board RGB LEDs (2); the smart bulb (3); the 5.5-inch AMOLED touch display (4); the webcam (5); buttons to import, turn on/off, and control the execution of an EvaML script (6); a terminal emulator (7), where important information is presented, such as the actions being performed by the robot, details of operations with variables, the colors and states of the smart bulb, the text the robot is speaking, alert messages, and possible script error messages; and the memory map tables (8) and (9). These two tables dynamically show the system and user variable values during the execution of the script. The upper table shows the values of the system variables that store the responses obtained from user interaction processes, such as voice capture and facial expression recognition. This variable set also holds the values generated by the VPL random number generation command. These system variables are indexed using the "$" character. Since the robot memory holds values from different sources, the upper table has, in addition to the index and content columns, an extra column that shows the source of each value. The second table presents the variables created by the script developer, with their names and their respective values.

1.1.2 Listening Simulation

EVA can communicate with the user through voice interaction, capturing the audio of the user's answers and converting it into text (using the Google Cloud speech-to-text API). To facilitate this process within the simulator, this type of multimodal interaction is represented by a text box, in which the user can answer using written text instead of speech. We can see, in Figure 2, the simulation of the robot's listening process.

Figure 2: EvaSIM - Listening Command Simulation
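A minimal sketch of how this listening simulation can be implemented with Tkinter (the toolkit EvaSIM's GUI is built with, see Section 1.2) is shown below; the function name simulate_listen and the way the answer is stored are illustrative assumptions, not EvaSIM's actual code.

import tkinter as tk
from tkinter import simpledialog

def simulate_listen(root, memory):
    # Instead of capturing audio and sending it to a speech-to-text
    # API, the simulator asks for the answer in a text box.
    answer = simpledialog.askstring(
        "Listen", "EVA is listening. Type your answer:", parent=root)
    # Store the response as the <listen> command would store the
    # transcribed speech in the robot's memory.
    memory.append(answer if answer is not None else "")
    return answer

root = tk.Tk()
root.withdraw()          # no main window is needed for this sketch
responses = []           # stands in for the robot's system memory
print("User said:", simulate_listen(root, responses))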

1.1.3 Facial Expression Recognition Simulation

In Figure 3, EvaSIM simulates the facial expression recognition process. This function was implemented in the simulator as a window with facial expressions represented by emojis. The answers that can be provided through this window, which represents the user's facial expression, are: "NEUTRAL", "HAPPY", "ANGRY", "SAD", and "SURPRISED". The execution of the <userEmotion> command, which on the physical robot captures the user's expression through the webcam, opens this window, and the user can indicate his/her facial expression with the mouse. The answer is processed by the simulator in the same way as on the physical robot. A short video (https://youtu.be/OfGelKZIA9c) shows the simulation of the Imitation Game (https://link.springer.com/chapter/10.1007/978-3-030-99194-4_25) in EvaSIM.

Figure 3: EvaSIM - Facial Expression Recognition Simulation
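The expression-selection window can be sketched with Tkinter as follows; the five expressions mirror the list above, while the function name and window layout are illustrative assumptions.

import tkinter as tk

EXPRESSIONS = ["NEUTRAL", "HAPPY", "ANGRY", "SAD", "SURPRISED"]

def simulate_user_emotion(root):
    # Opens a window with one button per facial expression and returns
    # the expression the user clicks, simulating <userEmotion>.
    result = {"value": None}
    win = tk.Toplevel(root)
    win.title("userEmotion")
    for expr in EXPRESSIONS:
        def choose(e=expr):
            result["value"] = e
            win.destroy()
        tk.Button(win, text=expr, width=12, command=choose).pack(pady=2)
    win.wait_window()    # block until the user makes a choice
    return result["value"]

root = tk.Tk()
root.withdraw()
print("User expression:", simulate_user_emotion(root))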

The EVA robot simulator works with only five expressions, while the physical robot's facial recognition module can return up to seven facial expressions.

1.1.4 Head Movement Simulation

EvaSIM was designed to be a lightweight tool requiring low computational power to run; therefore, no sophisticated animations were implemented in this 2D version of the simulator. The physical robot can move its head, and this movement, used in conjunction with other elements, can increase the robot's expressiveness. When executing a script and finding a <motion> element, EvaSIM uses the terminal to indicate that a movement is being executed, also indicating the type of movement performed. Figure 4 shows an example of the message indicating the execution of the <motion> element and the value of its type attribute.

Figure 4: Head Movement Messages in Terminal

1.1.5 Importing and Executing an EvaML or JSON Script

Figure 5 shows the simulator interface buttons. To start using EvaSIM, you need to click the "Power On" button (1). EvaSIM speaks a greeting text and waits for a script to be loaded into its memory. A script can be loaded by pressing the "Import Script File..." button (2), which opens a file-selection dialog. After importing the file, the script can be run by clicking the "Run" button (3). The "Stop" button (4) stops the script execution, and the "Clear Term." button (5) clears the EvaSIM terminal emulator.

Figure 5: Operating EvaSIM

Figure 6 shows the terminal emulator after executing the script "script01_EvaML.xml". In this run we can see the voice selection, the smart bulb state and color being set, the texts being spoken, the capture of the username via the <listen> command, and the manipulation of the variable x.

Figure 6: EvaSIM - Terminal Emulator

1.2 How EvaSIM Works

EvaSIM depends on a few extra modules that can be installed easily using the Python package manager pip: the ibm-watson module, used to access the IBM Watson Text-to-Speech API, and the playsound module, which allows audio files to be played. The GUI is built with the tkinter module, which ships with the standard Python distribution.
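As an illustration of how these modules fit together, the sketch below synthesizes a sentence with the IBM Watson Text-to-Speech service and plays the result; the API key, service URL, and file name are placeholders, and the exact way EvaSIM wires this up is an assumption.

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from playsound import playsound

# Placeholders: use your own IBM Cloud credentials here.
authenticator = IAMAuthenticator("YOUR_API_KEY")
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url("YOUR_SERVICE_URL")

# Synthesize the text with the same voice used in Listing 1 below.
audio = tts.synthesize(
    "Hello!", voice="en-US_AllisonV3Voice", accept="audio/wav"
).get_result().content

with open("speech.wav", "wb") as f:
    f.write(audio)
playsound("speech.wav")   # play the synthesized speech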

1.2.1 Visual Programming Language

The EVA platform includes a component for designing and deploying interactive sessions. The user can create interaction scripts using a visual programming language (VPL), defining the script flow using sequences, conditions and loops.

The VPL has several elements that can be used to create interaction scripts, including: an element to define the language to be used by the STT and TTS components, an element to control the expression of the robot's gaze, voice recognition, recognition of the user's facial expressions using the webcam, and control of light sensory effects using the smart bulb. Figure 7 shows a sample interaction constructed using the VPL.

Figure 7: An example of a small VPL script

The flow of execution of a VPL script goes from top to bottom and, in the case of conditional elements, from left to right. A VPL script is represented by a graph where the nodes are the language elements (robot capabilities) and the edges indicate the sequence of the execution flow. Through conditional elements it is possible to change the course of script execution depending on a condition. As can be seen in Figure 7, the script has five distinct elements. The Voice element configures the language and voice timbre used in TTS and STT. The Random element generates a random natural number within a range specified in its parameters. Next come two Condition elements that evaluate the number randomly generated by the previous command; depending on the result, the script execution flow follows the left or the right path. The element on the left path is Light, which controls the smart bulb, its state (on or off), and its color. The element on the right path is Talk, which makes EVA speak the text specified as a parameter. Each VPL element has a set of parameters that are configured at the moment of its insertion in the script.

Table 1: VPL Commands Simulated by EvaSIM

Table 1 shows the VPL commands that can be simulated by EvaSIM and their parameters. Briefly: the Voice command selects the voice timbre and language used in the script; the Random command generates random numbers in a user-defined range; the Wait command pauses the script for a specified amount of time; the Talk command turns text into audio that is spoken by the simulator; the Light command controls the smart bulb; the evaEmotion command determines the expression on the robot's face; the Audio command plays sounds and music specified in its source attribute; the Led command controls the simulation of the LEDs on the board on the robot's chest; the Counter command allows creating variables and performing mathematical operations on them; the Condition command evaluates the result of a logical expression and can change the script execution flow; the Listen command makes the robot capture the user's speech, converting it to text; and, finally, the userEmotion command uses the webcam to identify the user's facial expression. The last two commands, Listen and userEmotion, are simulated in EvaSIM as follows: for Listen, a window with a text box opens and the user enters the information in text form; for userEmotion, a window opens with five facial expressions in the form of emojis, allowing the user to choose the desired one. The window of the userEmotion command can be seen in Figure 3.

1.2.2 EvaSIM - General Architecture

Figure 8 shows the general architecture of EvaSIM, divided into three main modules. The Parser module receives as input the code exported by the VPL editor (in JSON format) and parses the robot's script; this process outputs a new representation of the VPL code structure, now transformed into an XML file. The XML file is passed to the Execution module, which is responsible for traversing the XML tree structure, identifying the nodes that represent the language commands, and emitting a series of signals of various types that are sent to the next module. The last module of the simulator architecture is the Graphical User Interface (GUI) module, which controls the GUI elements of the simulator.

Figure 8: EvaSIM general architecture

To understand the entire process of interpreting a VPL script and the simulation as a whole, it is necessary to first understand how a script developed with the VPL is encoded. As shown in Figure 7, a VPL script is graphically represented as a graph. The graph nodes represent the robot's capabilities (language elements) and the edges (which from now on will be called links) indicate the script execution flow. This encoding is stored in a file in JSON format. Figure 9 shows code snippets that represent: (a) the structure of an empty script, (b) an example of a Condition node, and (c) an example of a link with its from and to attributes.

Figure 9: (a) The structure of an empty VPL script. (b) A sample of a Condition type node. (c) An example of representing an edge (link)
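Based on the structure described in the remainder of this section, the JSON encoding of a script can be sketched as follows; the _id placeholder and the graphic attributes (name, color) are illustrative assumptions, while the key values reuse those of Listing 1.

{
  "_id": "<script id in the robot's database>",
  "name": "script01",
  "data": {
    "nodes": [
      {
        "type": "condition",
        "key": "1646387268473",
        "name": "Condition",
        "color": "#ffcc80",
        "text": "$ == 2"
      }
    ],
    "links": [
      { "from": "1646387248220", "to": "1646387268473" }
    ]
  }
}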

Figure 10 illustrates the VPL code processing and simulation using EvaSIM in detail. As previously discussed, and illustrated in Figure 8, running a VPL script involves two main steps before commands are sent to the graphical user interface. The first step, called JSON Data Mapping, parses the VPL JSON script; the second step, called EvaSIM XML Code Processing, runs the script. Both are discussed as follows.

Figure 10: VPL Code Processing and Simulation with EvaSIM

JSON Data Mapping - To start the simulation process, you need to run EvaSIM using Python 3. The application starts and is controlled through the buttons located at the top center of the EvaSIM GUI. When the Power On button is clicked, the robot's display lights up and the simulator speaks a presentation text, showing its status in the terminal emulator. At this point the robot's memory module is initialized. The next step is to import a VPL script in JSON format using the Import Script File button; a file-opening dialog window opens and the JSON file can be selected.

The robot's memory module in EvaSIM is responsible for storing the variables created by the user, as key-value pairs in a dictionary, and also for managing the robot's special memory that stores the responses of the Listen and userEmotion interaction commands. These values are stored in the order in which they are generated and can be accessed through an index used together with the "$" character. For example, if three facial recognition commands are executed during a script, the first response can be accessed using "$1", the second using "$2", and the third using "$3". To access the last value obtained, just use "$". The "$" character always refers to the last value obtained, whether by audio capture (Listen command), by facial expression recognition (userEmotion command), or by the random number generation command (Random command). It is also possible to access the penultimate answer using the notation "$-1", and so on.
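A minimal sketch of this memory module, assuming a simple list-backed implementation, is shown below; the class and method names are illustrative, not EvaSIM's actual code.

class RobotMemory:
    # Holds the responses of Listen, userEmotion and Random in order,
    # plus the variables created by the script developer.
    def __init__(self):
        self.system = []   # indexed via the "$" notation
        self.user = {}     # user variables: name -> value

    def store(self, value):
        self.system.append(value)

    def resolve(self, ref):
        # "$" -> last value; "$1" -> first; "$-1" -> penultimate.
        if ref == "$":
            return self.system[-1]
        return self.system[int(ref[1:]) - 1]

mem = RobotMemory()
for answer in ["HAPPY", "SAD", "NEUTRAL"]:   # three recognition results
    mem.store(answer)
print(mem.resolve("$1"), mem.resolve("$"), mem.resolve("$-1"))
# prints: HAPPY NEUTRAL SAD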

After loading the JSON file, a data mapping process begins. As shown in Figure 9(a), an encoded VPL script is composed of three attributes: the _id attribute, a code that uniquely identifies the script in the robot's database; the name attribute; and the data attribute, which contains two arrays, one of node objects and another of link objects. As previously mentioned, a node represents a command of the EVA language and, as can be seen in Figure 9(b), has a set of key-value pairs that represent both the graphic element of the VPL, such as the color and name attributes, and information related to the language command being represented, such as the type, key, and text attributes. Each VPL element has its specific set of attributes. The Voice element has an attribute that lets the user define the voice timbre and the language that the application will recognize in the TTS and STT processes. The Light element, for example, has an attribute that determines its color and another that determines whether the bulb should be turned on or off. The attribute selection process reads the list of nodes from the JSON file, selecting for each node type the attributes that are relevant for its execution in the simulator. Each node's type is indicated by its type attribute. Some attributes come in the form of a composite string, that is, there is more than one attribute within the string. These composite attributes are passed to the part of the code that uses regular expressions to decompose the string into individual attributes. For example, the text attribute of the Condition command may hold the boolean expression "$ == 2", containing two operands and a relational operator. After being sent to the decomposition function, the expression yields three attributes: var, which contains "$"; value, which contains the number "2"; and op, which contains the relational equality operator.
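The decomposition can be sketched as follows, assuming expressions of the form "operand operator operand"; the regular expression and function name are illustrative, and the operator mapping anticipates the op attribute values listed after Listing 1.

import re

# Relational operators as written in the VPL expression -> op attribute values
OP_MAP = {"==": "eq", ">": "gt", "<": "lt",
          ">=": "gte", "<=": "lte", "!=": "ne"}

def decompose(text):
    # Splits a composite string such as "$ == 2" into its
    # individual attributes: var, op, and value.
    m = re.match(r"\s*(\S+)\s*(==|>=|<=|!=|>|<)\s*(\S+)\s*", text)
    var, op, value = m.groups()
    return {"var": var, "op": OP_MAP[op], "value": value}

print(decompose("$ == 2"))
# prints: {'var': '$', 'op': 'eq', 'value': '2'}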

The output of the JSON Data Mapping step is an XML file used as input for the following step, which effectively runs the EVA script, as illustrated in Figure 10.

EvaSIM XML Code Processing - This processing step takes as input the XML file resulting from the previous step. The XML code formatted for EvaSIM can be seen in Listing 1. The <script> section of the XML code contains as child elements the nodes that represent the robot language commands. The <links> section contains the links that determine the script execution flow. As can be seen in Listing 1, when compared to Figure 9, the mapping and conversion of the JSON file to XML made the code more readable, since the graphic properties of the VPL elements were discarded and the composite attributes, such as the logical expression of the Condition element, were decomposed into three XML attributes: var="$", op="eq", and value="2". The relational operators "==", ">", "<", ">=", "<=", and "!=" correspond to the op attribute values "eq", "gt", "lt", "gte", "lte", and "ne", respectively.
Listing 1: The XML code generated from the VPL script of Figure 7

<?xml version='1.0' encoding='utf8'?>
<evaml  name="script01">
	<script>
		<voice  key="1646387240212" tone="en-US_AllisonV3Voice"/>
		<random key="1646387248220" min="1" max="2"/>
		<case   key="1646387260007" op="eq" value="1" var="$"/>
		<case   key="1646387268473" op="eq" value="2" var="$"/>
		<light  key="1646387289446" state="on" color="#00b0ff"/>
		<talk   key="1646387340575">Hello!</talk>
	</script>
	<links>
		<link from="1646387240212" to="1646387248220"/>
		<link from="1646387248220" to="1646387260007"/>
		<link from="1646387248220" to="1646387268473"/>
		<link from="1646387260007" to="1646387289446"/>
		<link from="1646387268473" to="1646387340575"/>
	</links>
</evaml>

The EvaSIM XML Code Processing step searches for and selects one or more links in the list of links and inserts them into the Link Queue. The Link Queue is then processed recursively until it is empty. Queue processing begins with the removal of the first link. The value of the key contained in the from attribute of the link is passed to the function that performs a lookup in the list of nodes (VPL commands) in the <script> section of the XML code. The function searches for a node by its key attribute. Once the node is found, it is sent to the function responsible for processing nodes. The node type (language element) is obtained from the name of the corresponding XML element. The command is then executed and can communicate with the elements of the EvaSIM graphical interface, sending several types of information: animation commands, which generate some kind of graphical animation on the robot figure and its components; status and error messages, which are sent to the EvaSIM terminal emulator; and variable values, which are related to the robot memory module and are sent to the memory map tables in the EvaSIM window. The script execution ends when the value of the key contained in the to attribute of a link does not match the from attribute of any link in the list of links. This indicates that the element referenced by the to attribute of the link being processed corresponds to a leaf node of the XML tree. The key contained in the to attribute is then passed to the node search function; the node is found and processed, terminating the script.
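A simplified sketch of this traversal, using the standard xml.etree.ElementTree module and a deque as the Link Queue, is given below; it prints each command instead of driving the GUI, and it omits the evaluation of Condition elements (which in EvaSIM selects only one of the outgoing branches). The function names are illustrative.

from collections import deque
import xml.etree.ElementTree as ET

def find_node(script, key):
    # Look up a command node in the <script> section by its key attribute.
    return next(n for n in script if n.get("key") == key)

def execute(node):
    # Stand-in for the real command execution, which would animate the
    # robot figure, write to the terminal emulator, or update memory.
    print("executing", node.tag, dict(node.attrib))

def run_script(xml_text):
    root = ET.fromstring(xml_text)
    script, links = root.find("script"), root.find("links")
    queue = deque(links)                      # the Link Queue
    done = set()
    while queue:
        link = queue.popleft()
        key = link.get("from")
        if key not in done:
            execute(find_node(script, key))   # node lookup by "from" key
            done.add(key)
        to = link.get("to")
        # If no link starts at "to", it references a leaf node:
        # execute it as well, terminating that branch.
        if not any(ln.get("from") == to for ln in links):
            execute(find_node(script, to))

Applied to Listing 1, this sketch prints the voice, random, case, light, and talk commands in flow order (with both branches taken, since condition evaluation is omitted here).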