Saluto - Process Overview

Part 1: Visualizing Conversational AI


Date: 2018

Instructor: Kyuha Shim, Daphne Peters

Collaborator: Ema Karavdic, Matt Prindible, Khushi Shah

TYPE: Conversational AI, UX, Interaction Design - project at CMU



“The secret of type is that it speaks.” — Paul Claudel


A new online counseling service is looking to improve upon existing AI counseling services by offering users richer, context-specific experiences beyond a text message interface. They are looking to you to design and develop a screen-based, interactive application that responds to content and emotions being expressed in conversations between counselors and clients.


01—Monday, October 22nd


There is a range of visualizations used to support conversation. Some animations are familiar and appear in everyday life—Cortana, Siri, and Google Assistant, for example. While Siri’s visual form is responsive to voice input, Cortana displays a bit more “personality”:


Google’s visualization of its smart assistant, on the other hand, is much more expressive and has a few specific states:

The conversation we’re supporting, however, is not about to-do lists or daily tasks; it’s a private and personal conversation between a client and a counselor. More exploratory approaches to visualizing a conversation should be considered to understand the poetic space for voice, visuals, text, and motion.

ChildLine: First Step - by Buck

There can’t be a conversation about time and motion without considering The 12 Basic Principles of Animation — and this rebuttal from Issara Willenskomer on why they need to be updated.

The illusion of life

“The 12 Animation Principles were designed to solve the problem of how do you represent physical reality in 2D space, and user experiences are not physical realities. They are their own unique medium different from anything else and they require different, unique principles that are not the 12 animation principles,” Willenskomer said.

p5.js is an easy way to start exploring the interactive generation of these visual forms. Unlike rapid prototyping tools like Storyspeaker or Sayspring, which only perform speech in and speech out, p5.js allows us to output a visual response to the user’s voice input. The provided speech library makes it trivial to spin up demos of a user manipulating DOM objects with speech.

Look, ma—no hands!
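A minimal sketch of the kind of demo described above, assuming p5.js and the p5.speech add-on are loaded in the page. The color-word mapping and the `colorForSpeech` helper are our own illustrative choices, not part of either library.

```javascript
// Hypothetical demo: change the canvas background with your voice.
// Assumes p5.min.js and p5.speech.js are included via <script> tags.

// Map a recognized utterance to a color name (demo vocabulary only).
function colorForSpeech(transcript) {
  const words = transcript.toLowerCase();
  for (const name of ['red', 'green', 'blue', 'black', 'white']) {
    if (words.includes(name)) return name;
  }
  return null; // no color word recognized
}

let bg = 'white';
let rec;

function setup() {
  createCanvas(400, 400);
  // p5.SpeechRec wraps the browser's Web Speech API;
  // the callback fires each time an utterance is recognized.
  rec = new p5.SpeechRec('en-US', () => {
    const match = colorForSpeech(rec.resultString);
    if (match) bg = match;
  });
  rec.continuous = true; // keep listening between utterances
  rec.start();
}

function draw() {
  background(bg);
}
```

Factoring the speech-to-color decision into a plain function keeps the recognizer callback trivial and makes the mapping easy to swap out as the visual language evolves.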

In order to situate our conversational AI, we were provided five transcripts of interactions between a client and a counselor. Each transcript was distinct in the kind of issue being worked through as well as the approach the therapist was taking. Ultimately, we decided on Transcript 4.


In this scenario, both the client, Mary, and the clinician, Joan, play a very active role—there are very clear personalities and quite a range of emotions at play on both sides of the table, and we think this will be important to unpack in our visualization’s expression or animism. The structured nature of the clinician’s approach, Cognitive Behavioral Therapy (CBT), was also appealing because we can spend our time generating a breadth of possible outcomes against a set of well-defined constraints. Finally, what was perhaps most appealing about this scenario was that the outcomes of CBT are focused on the individual (as opposed to their relationships with others) and are action-oriented.


02—Wednesday, October 24th


It’s not our job to evaluate the efficacy of CBT. So for this project we assume that this type of therapy is effective for a specific kind of person, and that our example transcript is representative of a productive conversation (even though at times it feels antagonistic). It’s our job, then, to carefully define our persona as a person who responds to this kind of therapy. Knowing what we know about CBT, we worked on defining Mary as a person, along with some of the contextual parameters that might be driving the larger situation and this conversation.

We talked specifically about the qualities of her personality and the situations in her life that make Cognitive Behavioral Therapy appropriate. We considered some of the things that could be happening in her life that are distorting her reality and self-image. And how that creates a vicious feedback loop that amplifies the original distress. And finally, what characteristics in her make this type of treatment effective? These are important to consider early because we will need to create a visual form to which Mary will respond favorably.

[Need Persona Here]

We’re going to involve a few experts to help validate the assumptions in our persona. We’ll be talking with two behavioral specialists who practice CBT. We also want to consider some of the critical moments that happen during this type of counseling: What are signs that this is an appropriate measure? What are signals of progress or success? What are some of the key milestones or indicators of progress? What else happens along this journey?

As we discussed what makes our transcript an example of a successful conversation, we continually came back to the importance of trust. It’s apparent that the history between our two participants has yielded some level of trust. Representing the conversation history and that trust matters a lot, and it’s something we want to make sure we carry through to the next step of the project.

We used the remaining studio time to present some of our initial thinking for feedback. On Monday, we’ll be presenting a range of visual forms and motion for peer critique, so we were urged to begin immediately considering the role of color, type, and motion and how they can elicit specific emotions.

We pushed some of our initial thinking (which consisted primarily of the actors involved) into some larger systemic considerations, specifically the role that access to text messages, phone calls, email, photos, calendar, Spotify, etc. might play in building this relationship. Access to some of these sources could create really interesting moments where the questions or provocations being generated by the AI are using them in meaningful ways.

Modeling in visual form the status of each participant was another key consideration. “Is the AI on the same page as the user?” Since CBT is the process of replacing one type of thinking (the client’s current cognitive distortion) with a more realistic type of thinking (the AI’s), it’ll be important for each to see the other’s status, progress, and intention—it’s a careful dance.

We then talked about the role voice will play in our conversation and visual form. We asked our TAs to do a quick table read of our conversation (that recording will not appear here, we promise) so we could get a dispassionate view of the conversation unfolding. We were looking for specific arcs of emotion to help consider the role of the screen when voice is the primary user input. What visuals can we use when asking questions (either generated forms or data we’ve been given access to)? What sort of visual representation could draw deeper thought and engagement from the user? We’re starting to consider the micro-interactions that support this conversation—and the ways to ask ‘why’ and ‘what’s the case?’ for each of them.

It led to our next conversation about the relationships between inputs and outputs. Too often we think about the part-to-part relationships—if this, then that. Less often do we consider the integrated whole. What is the integrated, holistic view of the entire relational model? Think more spectrum-to-spectrum or range-to-range. It might be advantageous to constrain the total number of parts and spend more time on the holistic qualities—it could be a more powerful statement than a laundry list of cause and effect.
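One way to make the range-to-range idea concrete: instead of discrete rules (“if sad, show blue”), map a continuous emotion score onto a continuous range of visual qualities. The parameter ranges below are arbitrary assumptions for illustration, not decided values.

```javascript
// Sketch of "spectrum-to-spectrum" mapping: one continuous input
// drives several visual qualities as a single holistic state.

function lerp(a, b, t) {
  return a + (b - a) * t;
}

// emotion: -1 (distressed) .. 1 (calm/positive)
function visualState(emotion) {
  const t = (emotion + 1) / 2; // normalize to 0..1
  return {
    hue: lerp(220, 140, t),         // blue-ish toward green-ish
    motionSpeed: lerp(2.0, 0.5, t), // agitated movement slows as mood lifts
    softness: lerp(0.2, 0.9, t),    // harder edges toward softer forms
  };
}
```

Because every quality is derived from the same underlying spectrum, the parts stay coherent with each other by construction, rather than through a laundry list of cause-and-effect rules.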

We speculated a bit on the role of facial expressions and tone of voice, and considered that instead of a more literal use of either of these data points, we could run sentiment analysis over the course of the user journey to find a range of emotion. Does our user have an emotional range of -1 to 1, or is it more like -394 to 750? Of course, recurring nouns, verbs, and adjectives could also be interesting over a long period of time.
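A toy sketch of that idea: score each utterance for sentiment, then rescale against the user’s own observed range rather than a fixed -1-to-1 scale. The word lists here are placeholder assumptions; a real system would use a proper sentiment model.

```javascript
// Illustrative keyword lists (assumptions, not a real lexicon).
const NEGATIVE = ['hopeless', 'overwhelmed', 'frustrated', 'flat'];
const POSITIVE = ['better', 'hopeful', 'proud', 'calm'];

// Naive per-utterance sentiment score: +1 per positive word, -1 per negative.
function score(utterance) {
  const words = utterance.toLowerCase().split(/\W+/);
  let s = 0;
  for (const w of words) {
    if (NEGATIVE.includes(w)) s -= 1;
    if (POSITIVE.includes(w)) s += 1;
  }
  return s;
}

// Rescale a session's scores to this user's own min..max range,
// so "emotional range" is relative to the person, not an absolute scale.
function normalizeToOwnRange(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (min === max) return scores.map(() => 0);
  return scores.map(s => ((s - min) / (max - min)) * 2 - 1);
}
```

The normalization step is the point: whether the raw range is -1 to 1 or -394 to 750, the visualization always works with the user’s own extremes.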

Looking toward our meeting on Saturday, we’ll be working to situate our transcript into two distinct storyboards. Our conversation seems like a formal engagement, so where might she be? What time is it? Was this scheduled ahead of time like an appointment? Was it in the moment? What’s Mary’s current state of mind? Is she wearing headphones? What’s the level of ambient noise? What are the opportunities for deep focus, reflection, and care? What other things might be vying for her attention?

Over what kind of timeframe is this occurring? CBT is known for relatively short engagements (usually 10 to 20 sessions)—so where are we in that journey? Maybe this takes 100 hours of work in the real world, but in AI land can this be done quicker—shorter, more intense bursts, maybe? The temporal dimension of the interaction could be very provocative to explore, especially when there are fewer barriers to access in an AI interaction.

03—Saturday, October 27th

The meaningful moments that drive a conversation and support a shared understanding between clinician and client should have equally meaningful visual representation. We opened our work session by sketching out what some of those moments (at the highest level) might be. We know that each of our visual explorations will need to cover a range of concrete and abstract representation because, as our conversation unfolds over time, it might be important to move back and forth between these two kinds of representation. For example, as Mary’s relationship with Joan grows, does Joan’s representation change from something abstract to something more concrete? Time plays an interesting role at both micro and macro scales.

We also needed to consider the visual representation “in session” and “out of session.” What might this interface look like during a period of intense engagement, and what might it look like day to day? There is a lot of “homework” that is part of CBT—how will we support this?

We revisited the parts of our transcript that are most important for our story: we want to focus on demonstrating a holistic view of our conversational visualizations and want to show a spectrum-to-spectrum example of visuals (rather than point to point). In order to do this, we wanted to select a part of the transcript that features more unclear, abstract emotions and, in contrast, a part of the transcript that has some more concrete feelings and concrete visuals. Ultimately, we decided on looking deeper at the discussion of grades (“I’m not getting all straight A’s”) because to us this is an ongoing and perhaps growing problem with Mary. It’s also an interesting moment because we want to focus on times when both the AI and Mary might need to make some adjustments in their routine—in this moment, Joan might need to soften up and become more empathetic toward this growing distress. What do those moments look like in visualization?

The discussion about the dishes piling up is a good concrete example because there is clear, literal imagery that might be interesting to play with in combination with the abstract visualization of our AI. Also, there is a nice moment of self-realization (“No, but it’s a downward spiral”) in the conversation that we’d like to see visualized — a kind of “Eureka!” moment with strong mutual understanding.

We want to build out our two storyboards so that we can identify the key moments that we want to focus on in our visuals and micro-interactions. We’ll use the framework we created at the beginning of this work session to drive those explorations. We used our discussion from the previous session to drive the distinction between our two storyboards: in one situation, Mary has planned “an appointment” with Joan—much the same way she might schedule an appointment with a human therapist. We chose this to consider some of the environmental factors that Mary can control: time, place, level of comfort, number of distractions, etc. All of this leads to important considerations about headspace and state of mind.

In the other scenario, Mary is using the AI in a semi-private space: a meditation room at her university. This engagement is much more in-the-moment and will provide contrasting contextual information that could inform our visual form in other interesting ways.

We’re seeking to identify specific moments in each of these storyboards where rich visualizations can provide enough bandwidth to support complex emotions and interactions—that, at a glance, Mary can infer the same amount of information she might infer from the body language of another human being.


04—Monday, October 29th

Peer critique on storyboards and form ideas!

At the beginning of today’s class, we reviewed previous IxD work on a Cigarette Vending Machine—using AI to create conversation that discourages people from smoking. Although that team had only two weeks to finish the project, their ideas were amazing and inspiring, and gave us a sense of how to use visuals to facilitate conversation.

The focus of today’s session was peer critique on form ideas and storyboards. Each of us was to bring two form ideas, and we worked on the storyboards together over the weekend.

Oct. 29th, 2018 Progress for Peer Critique

Storyboards: We developed two versions of storyboards. In the first, the scenario happens at home as a regularly scheduled weekly session of about one hour. The second happens in a meditation room at the university. It’s a quick session that happens when Mary feels frustrated and needs to talk with her personal counselor, Joan.

Storyboard 1

Storyboard 1 (continued)


Storyboard 2

Framework-input type: We went through the transcript again and sub-categorized different types of input. Some of the input is about concrete facts, such as describing activities, assigning homework, etc. Some of the input is emotional, such as when Mary says, ‘I feel absolutely flat about it…’ or ‘It feels hopeless.’


Transcript categorized

Framework-abstract emotions: We then looked at different emotions. Most of the emotions Mary expresses are negative, such as feeling overwhelmed, hopeless, or frustrated. However, there are also moments when Mary feels positive about changing, such as ‘All of it. It’s a small apartment’.


Our goals: use AI to represent and respond to the two types of input—concrete action input and abstract emotion input—as well as the transition in between, to make the whole conversation consistent and coherent.
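A rough sketch of how that routing could start: tag each utterance as leaning “concrete” or “emotion,” so the visualization knows which register to respond in (and when it is somewhere in between). The keyword lists are illustrative assumptions drawn from our transcript, not a real classifier.

```javascript
// Cue words pulled loosely from the transcript (assumptions for the demo).
const EMOTION_CUES = ['feel', 'feels', 'hopeless', 'flat', 'overwhelmed'];
const CONCRETE_CUES = ['dishes', 'homework', 'apartment', 'schedule', 'grades'];

// Classify an utterance as 'emotion', 'concrete', or 'mixed' by
// counting cue-word hits on each side.
function classifyInput(utterance) {
  const words = utterance.toLowerCase().split(/\W+/);
  const emotionHits = words.filter(w => EMOTION_CUES.includes(w)).length;
  const concreteHits = words.filter(w => CONCRETE_CUES.includes(w)).length;
  if (emotionHits === concreteHits) return 'mixed';
  return emotionHits > concreteHits ? 'emotion' : 'concrete';
}
```

The 'mixed' case is where the transition between the two visual registers would live—moments like the dishes conversation, where concrete imagery and abstract feeling overlap.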