Create human-centered AI with the Human-AI eXperience (HAX) Toolkit webinar
Hi, my name is Saleema Amershi. I'm very excited to be here today, together with my colleague Mihaela Vorvoreanu, to present this webinar on how to create human-centered AI with the Human-AI eXperience (HAX) Toolkit. This is a collaboration between Microsoft Research and Aether, which is Microsoft's advisory committee on AI, Ethics, and Effects in Engineering and Research.

So before we begin, a little bit about me. I'm a senior principal research manager at Microsoft Research, where I manage the Human-AI eXperiences group. I also co-chair the Aether working group on human-AI interaction and collaboration. I'll be sticking around after the presentation, so I hope you stay for the live Q&A.

In this webinar, we're going to talk about human-centered AI: what is it, and why do we need it? Then we'll introduce the HAX Toolkit, which contains a suite of tools we've been developing to help AI creators and teams build AI technologies in a human-centered way.

Okay, so first, what do we mean by human-centered AI, and why do we need it? Let's start with some context. Here you see a map of the 2020 data and AI landscape, published every year by the venture capital firm FirstMark. This just goes to show the breadth and reach of AI in our everyday applications and services, and how it's fundamentally shaping and reshaping our everyday technologies.

Now, we know AI and machine learning technologies are very powerful and have the potential to enhance our capabilities and improve our lives. But as AI extends into every aspect of our lives, we're also starting to see more and more evidence of its potential to cause harm. For this I'd say, you know, pick your headlines. On the left here, we see computer vision technology used to help blind and low-vision people sense who and where people are in their environment, whereas on the right we see the same technology used by law enforcement officers in New York, who used an undisclosed facial recognition system to find and harass a Black Lives Matter activist
named Derrick Ingram, whom you see here. It's because of this potential for harm that society is really starting to demand change and accountability, particularly when AI is used in critical and sometimes life-changing scenarios. So the industry is really being called upon right now to reflect and rethink how we build our AI technologies, to ensure that we do so in a responsible way.

Now, there are many challenges to creating AI responsibly. These include cultural challenges around shifting people's mindsets and embracing diversity throughout the process, and organizational challenges, including aligning business and responsible AI objectives and making room for this type of work. But today I'll be focusing on the technical challenges, meaning challenges around how we build our AI technologies in a responsible way.

Now, when I say technical, I won't be talking about algorithmic challenges or automated solutions for building AI. That's not because they're not necessary. It's because we're actually quite a long way from purely technical solutions that can automate all aspects of developing AI technologies, and in fact we may never get there, because a lot of responsible AI requires human judgment and decision making. So this is what I'll be talking about today: how we can support the people who are building our AI technologies so they can do so in a responsible way.

This is where human-centeredness comes in. That is, building AI responsibly really requires that we adopt human-centered practices. Okay, so what's that? People use this term in many different ways. I like to relate it back to its origins in human-centered design, which says that it's about ensuring that what we build benefits people and society, and that how we build it begins and ends with people in mind. Now, you may be thinking, you know, this sounds nice, and it's a very warm and fuzzy concept, but how do we actually do this in practice?

Okay, now there are some general best
practices for building AI in a human-centered way. These include, first, thinking about the user early and using that understanding to drive all other technical decisions about the system. What that means is that human-centeredness prescribes doing all that upfront understanding and research about your users and the variety of contexts in which they may be using your system, and using that to drive all other decisions about building your AI-based technology. So, for example, if your AI is supposed to work for a broad range of people, the data that you collect should be representative of all of those people. Similarly, if your AI technology is going to need to provide an explanation to a user so that they can make appropriate decisions, it's best to choose some sort of interpretable model, or a technology that can generate an explanation.

Another best practice in human-centered AI is that if your AI is intended to work for a broad range of people and scenarios, it's important to involve diverse perspectives throughout development. That includes talking to diverse sets of potential users, but it can also mean involving diverse members of your team throughout the development process, such as when making critical decisions about the system's functionality and capabilities. Similarly, human-centered AI suggests planning for failures, so that people can recover when things inevitably go wrong.

Okay, so these are just some of the best practices for human-centered AI, but it's easier said than done. So in this work, we created a toolkit to help. The HAX Toolkit is a set of tools that can help operationalize human-centered practices for creating responsible AI. In the rest of this talk, we'll be introducing you to some of the tools in the toolkit that you can start using right away.

Currently, the toolkit includes these tools, which help with different parts of the AI product development process, and more will be coming soon. All of these are
available at the link that you see on the screen. The first tool is the Guidelines for Human-AI Interaction, which prescribe best practices for how AI systems should behave during human-AI interaction. The second is the HAX Workbook, which is a tool to guide teams through planning and implementing human-AI interaction best practices. The HAX Design Patterns are a set of flexible and reusable solutions to recurring human-AI interaction problems. And finally, the HAX Playbook is a tool for generating scenarios to test, based on likely human-AI interaction failures.

Okay, so let's dive in. First, the Guidelines for Human-AI Interaction. The guidelines were a collaboration with people across Microsoft, including research and product, to synthesize and validate best practices for human-AI interaction. We created the guidelines because not only is AI extending into our everyday lives and our technologies, it's also fundamentally changing how we interact with those technologies. For example, it's introducing new methods of interaction and new sensing capabilities. Along with changing how we interact with our systems, that's also creating new challenges for the people who have to interact with them. We see this in everything from humorous AI failures, like when our conversational agents misunderstand us, as you see in this example here, to really dangerous situations, like when AI is being used in high-stakes scenarios. Over on the right, we see an image of a semi-autonomous vehicle. What happened here was that the vehicle missed a fire truck stopped on the road, and the driver was not able to intervene in time. So we created the guidelines to help AI designers and developers create safe and effective human interaction with AI.

Now, I won't get into the details of how we created the guidelines, other than to say that we did four rounds of synthesis, iteration, and evaluation of the guidelines with over 60 user experience professionals. You can learn more about this in the paper, but
the point here is just to say that we didn't just make these up. We endeavored to take a very systematic and rigorous approach to developing the guidelines, so that we ourselves could feel confident in their effectiveness.

Now, before I introduce the guidelines, just some disclaimers to follow as you're using them. The guidelines shouldn't be thought of as a checklist, and not all of the guidelines will apply to every human-AI interaction scenario. Additional guidelines may also be necessary in some scenarios. You're really using the guidelines in the right way if you consider them during the development process.

So here are the guidelines. They're broken up into four categories, roughly based on when they would apply as a user interacts with an AI system. These are rough categories, not hard assignments, that just make them easier to remember.

The first category is how an AI system should behave initially, and this is really all about setting the right expectations for our users. That is really important in AI, because the way AI is portrayed in the media, together with its complexity, sometimes sets up unrealistic expectations for people when they use AI systems, which can then lead to problems. The second set of guidelines is about how to design every regular interaction with the user. This is really about considering the context in which a user will be interacting with your system, including their environmental as well as social context. The third category is all about what to do when the AI is inevitably wrong. I can't stress this enough: your AI will fail, and so it's very important to design interactions so that people can understand and intervene when that happens. And finally, the last category is about how the AI should behave over time. This is really important in AI because one of the key benefits of AI and machine learning technologies is their ability to personalize and adapt over time, but
that also has to be done carefully, so that humans can effectively interact with them.

Now, I'm not going to get into the details of the individual guidelines, but you'll be hearing about some of them throughout the webinar. They are being used by teams across the company and externally. What some folks have said is that these guidelines are all about creating trustworthy and assistive systems, because what really matters here is the model for human interaction with a system that has been given really tremendous authority over human life in many cases. Similarly, another team that used the guidelines said, you know, it took us four years to come up with 80 percent of what's in the guidelines, and if you institutionalize this into the design parts of products, you really give people an opportunity to build a much better product in gen 1. And that's really the point: to help teams create effective human-AI interaction from the start.

Now, teams are using the guidelines and people have been eager to learn about them, so we've conducted many workshops, talks, and engagements with teams to help them use the guidelines in their everyday development processes. In working with teams, we've also learned about the challenges they're facing in applying these in their everyday work lives. For example, some people have said that the guidelines are based a lot upon engineering, more than design, and that some of the guidelines would actually require a full-scale overhaul of the back end. That means that implementing the guidelines can't be done at the UI layer or at the end; it has to be thought through up front. Another challenge we've heard is that if the spec doesn't have the guidelines or the interactions built in from the beginning, it's going to be too rigid to respond. Meaning, if you miss the boat in the beginning, you just have to sort of yell from the shore and ask them to change things
and where they're steering things, which is obviously difficult to do. So to address some of these challenges, we created other tools in the HAX Toolkit, and for this I'll hand it over to my colleague Mihaela Vorvoreanu to introduce some of our other tools.

Thanks, Saleema. I'm Mihaela Vorvoreanu. I'm director of UX research and responsible AI education for Aether, and I'm super excited to walk you through the next two tools in the HAX Toolkit. We'll start with the HAX Workbook, which is a tool to guide teams through planning and implementing best practices for human-AI interaction, in this case our Guidelines for Human-AI Interaction.

Before we launch into the next set of tools, I would like to give credit to the team. These tools, the workbook, the design patterns, and the playbook, were led by the research interns you see on the screen, and you see the mentors at the bottom. Of course, the entire set of projects was led by Saleema.

Okay, so let's take a quick look at the HAX Workbook. The HAX Workbook, just like everything we do, is grounded in observed needs and in interviews, and it was co-developed iteratively with more than 40 practitioners or teams across the company. It supports two goals for planning and implementing human-AI interaction best practices. First, it helps teams define the right breakdown and sequence of steps needed to plan the UX early, and it helps them accurately estimate the resources needed to implement the guidelines. Second, the workbook provides the right level of guidance and prompts so that teams can anticipate the impact of each guideline on the user and the user experience, and thus helps them prioritize.

The HAX Workbook is available as an Excel spreadsheet that anyone can download and use. It consists of four main steps, followed by a fifth step for taking notes and tracking. I will be walking you through each one of the steps. First, in the left-side columns, we see each guideline, along with a set of examples of how a team would
answer at each of the steps, using a fictional example for that guideline.

So let's see what step 1 entails. In step 1, the team first goes through each of the 18 guidelines and selects which ones are relevant. As Saleema explained, not all guidelines are applicable to all products. So in step 1, for each guideline, teams answer yes, no, maybe, or already done. Of course, they would do this for any feature or product they have in mind that is AI-powered and that users interact with. Once they've made that decision for each guideline, they can filter to see only the relevant guidelines before moving to step 2.

Step 2 is where teams try to imagine, or even better, conduct some user research, to estimate the impact of the relevant guidelines on users. The workbook guides them with two questions: how might it affect the user if you apply this guideline, and how might it affect the user if you choose not to apply this guideline? Teams can type in their answers below and then, taking those into consideration, estimate the impact on the user as high, medium, or low. They can also sort or filter to show only the high-impact guidelines, to help them prioritize work, and move on to step 3.

For the guidelines that they have chosen as relevant in step 1 and as high impact in step 2, they start brainstorming, in step 3, the requirements for implementing each guideline. It is really, really important to have the entire team together here, because implementing a guideline requires collaboration and has implications for data science, for program management, and for user experience design and research. So it's important to brainstorm all of these requirements in step 3 and then, using t-shirt sizing, estimate the resource commitment for implementing each guideline.

Now, in step 4, we take into consideration the impact on the user and the resource commitment, and then we determine the priority for each of the guidelines that we've
selected in the previous steps. We can assign a priority from P0 to P3. Then in step 5, we can track work, or we can just move work items into whatever program and process management software the team happens to use.

Next, I will walk you through these steps using the example of a voice assistant that can call people or remind you to call people. We're going to work through this example using Guideline 10, "Scope services when in doubt." So let's see how we would answer at each step.

For step 1, is this guideline relevant? We would answer yes. If the assistant is unsure whom to call, then asking for clarification is less costly to the user than calling the wrong person, so we would want to pay attention to this guideline. Then in step 2, how might applying this guideline, or not, impact our users? Well, a person may become frustrated or embarrassed if the assistant calls someone they didn't mean to. In step 3, we brainstorm how this guideline might be implemented, and here's what might be required. First, the AI model has to be able to compute its own uncertainty. Then, when it is uncertain, it has to act somehow. In this case, we can suggest that it disambiguate the user's request by asking them to clarify whom they intend to call. And then in step 4, we make decisions about this guideline's prioritization. In this case, we say that the user benefits outweigh the resource costs, but without the context of the other guidelines, we're not able in this example to put a P0, P1, P2, or P3 on this particular guideline.

We encourage teams to use this workbook early on, when they have an idea for a new user-facing AI feature and they're starting to define requirements, or when they have an existing user-facing AI feature or prototype that they want to evolve or improve. We also want to clarify and insist that people don't do this alone. This is a team exercise, and it's really important to involve people and
collaborators from different disciplines, because implementing the guidelines can impact not only a system's UI but also its data and AI models. We also encourage teams who use this workbook to engage with users and stakeholders through user research, to better understand their needs, their priorities, and how they might be affected by applying or not applying some guidelines.

We have piloted this workbook with a number of teams at Microsoft, and here is what they had to say. This is a short selection of quotes. A PM pointed out that it is also helpful as a team alignment tool, in trying to speak the same language as the other side, the intelligence side that they work with; having a framework to do that around is very helpful. We see a lot of research about AI practitioners that points to this difficulty of communicating across disciplines, and we are happy to see that the HAX Workbook can function as a boundary object that helps people communicate information and collaborate across disciplines. An engineer said that they found a lot of points they hadn't considered before, and so this was helpful. This is something we've observed again and again when interacting with teams that piloted the workbook: it helped them think about how the feature should behave and how it would evolve in ways that maybe wouldn't have occurred to them otherwise. Another PM said that this is helpful for framing things to bosses and to other disciplines as they make arguments for resources. And a designer appreciated finding research evidence in the guidelines to support their design decisions.

So, having talked about the HAX Workbook, which again we encourage teams to download and use from the HAX Toolkit website, we're going to move on to the next tool in the toolkit, which is the HAX Design Patterns. First of all, let's talk a little bit about design patterns. What are they, and why are they useful? Design
patterns have been used for a long time, starting with architecture, then in programming, in design, and in art as well. They are useful because they capture flexible solutions to recurring problems. So when we have a type of problem, we can match it with a type of solution, and by using these established solutions we can save time and help create consistently high-quality user experiences.

Now, I also want you to think about the step in the workbook, step 3, which asked how you might implement a certain guideline. At that particular step, it would be useful to bring in the design patterns. Once you've decided to implement, let's say, Guideline 10, you can browse through the design patterns and see ideas for how you might implement that guideline.

We have synthesized and validated 33 design patterns for a selection of 8 out of the 18 Guidelines for Human-AI Interaction. We selected guidelines that we thought were particularly unique and possibly difficult to implement in the context of human-AI interaction. We synthesized these patterns using a collection of examples for the guidelines that had been submitted through a previous user research project. You can see here the list of patterns, and I'm going to illustrate a pattern, again for Guideline 10, with the same scenario of a voice-based virtual assistant that can call people.

In this case, we have synthesized three patterns that can be used to implement Guideline 10: (a) disambiguate before acting, (b) avoid cold starts by eliciting user preferences, and (c) fall back to other strategies. As I pointed out earlier, disambiguating before acting might be the more appropriate pattern for this scenario. Let's take a look at what this pattern looks like. This pattern, just like all the other patterns, follows a systematic structure with the same headings. It has a problem statement, a solution statement, when to use, and how to use, and it proceeds further, not shown on the screen, into common pitfalls to avoid when
implementing this pattern, user benefits, related patterns, and a list of references that are useful in implementing this pattern or that have informed the writing of the pattern. Each pattern also has at least two examples. In this case, I'm going to show you one of the examples: a predictive keyboard that disambiguates before acting by asking the user which one of the three likely words below the user was intending to type. So disambiguating before acting for our voice assistant is really something like, "Who do you mean to call? Do you mean to call Satya N. or Satya K.?"

So with that, I've given you an overview of the HAX Workbook and the design patterns, and I'm going to hand it back to Saleema to talk to you about the HAX Playbook.

Thanks, Mihaela. The final tool I'll introduce today is the HAX Playbook, which is a tool for generating scenarios to test, based on likely human-AI interaction failures. We created this tool because, while we often do offline tests of model performance, we typically do less testing with humans in the loop prior to deployment. In fact, through interviews with practitioners spanning dozens of teams working on AI-based systems, we found that in many cases human-AI interaction testing was hardly happening at all. When it was happening, it was often focused on testing the ideal or MVP scenarios, sometimes called the hero scenarios, and if error cases were tested at all, it was often done in an ad hoc way, using a trial-and-error type of approach. One person said, you know, there's not much of a discipline around going through and evaluating or auditing the error states, or how you're going to handle those errors.

Now, one of the main reasons people gave for not testing AI experiences was really the difficulty of anticipating the types and range of AI failures that could potentially occur. That's because AI is probabilistic, it's designed to generalize to new scenarios, and it can learn over time, all of which
can make it difficult to know when it could fail. One person said, you know, it's so hard to know until the bad things happen, which is obviously not when we want to notice a failure. So while it's true that it's hard to know when an AI may have any particular failure, there are actually some common and foreseeable types of failures that we can plan for and test in advance. This is what the HAX Playbook is for. It's a tool intended to help teams proactively and systematically explore the various human-AI experience failures that may occur for their particular AI product or feature. The tool currently supports NLP scenarios, such as search, information retrieval, and dialog systems, but it's open source and it's extensible to other AI scenarios.

Okay, so I'll switch over to the tool. Here is what the HAX Playbook looks like. The tool is based on a taxonomy of foreseeable failures in NLP that we developed together with researchers and practitioners in natural language processing, while also examining common NLP products and tasks. The way you use the playbook is by describing your AI feature, answering a series of questions on the left.

So, going back to the example we've been using throughout this webinar, a voice-based personal assistant that you can ask to call people, you can start answering the questions based on that scenario. Which of these systems is closest to what you're designing? That would be a conversational AI system. Its primary input modality could be text or speech, so we would say speech. And will your conversational AI system have a clear way of knowing when it should trigger? In this case, maybe we'd want to use some sort of wake word, so I'd say no, as opposed to using a button or some other clear signal.

Now, as you may have noticed, as I've been answering questions, the playbook has been generating a set of test scenarios that should be tested beyond the hero or MVP
scenario, on the right. So, for example, because your system will be using speech, you will typically have different types of input errors, like transcription errors or noisy channels. You might have errors in when the system triggers. Here, for example, the system might be uncertain about when to start up: it may have missed a trigger, or it might trigger when it thought it heard a trigger but it was actually a false positive. The playbook lists these types of failure cases along with actionable and contextually relevant guidance about how to simulate those errors during testing.

If you use the tool early enough in your development process, you can actually explore different choices to help you better design your user experience. For example, if these trigger errors are something you want to avoid, you could explore the different types of errors you might have to deal with by changing your response to that question. After going through this tool and generating the set of test scenarios, you can export the scenarios to your project management tools, so your team can track the work that needs to get done.

Practitioners who co-developed this tool with us said things like, this will really standardize error case design; the playbook forces a consistent bar across different teams and different organizations. As I said earlier, teams are really exploring the error space in a very ad hoc way, because the wide range of errors that can happen makes it difficult to foresee them. Another use of the playbook is to help interdisciplinary teams communicate with each other. One designer we worked with said, you know, we sometimes feel like we're speaking different languages, and the playbook puts everybody on the same level by using terminology and information about tests that the whole team can talk about together.

Okay, so to summarize, in this webinar we introduced the
HAX Toolkit, which includes a set of tools for creating responsible and human-centered AI. You can start using these tools now at the link below, and more tools that we're developing will be coming there soon. As takeaways, I just want to remind you that responsible AI really requires human-centered practices, and the HAX Toolkit can help you build your AI technologies in a human-centered way. So we encourage you to use the tools, give us feedback, and work with us to help people and teams create responsible and human-centered AI technologies. With that, I'll end. Thank you for your time, and please stick around after this for a live Q&A.

Hi everyone, thanks for attending our webinar on creating human-centered AI with the HAX Toolkit. I'm Saleema Amershi. I'm here with Mihaela Vorvoreanu, who co-led this work, and over the next 15 minutes we're going to answer some of the top questions that have been submitted by the audience. So let's go ahead and get started.

Our first question is: are there any overarching ethical principles that are integral to the HAX Toolkit? That is a great question, and yes, there are. Microsoft has responsible AI principles that drive all of the work that we do across the company. These include fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. You can think of the HAX Toolkit as tools to help product teams build AI systems in a way that can achieve these principles, and that can allow the end users who are ultimately going to be using or impacted by an AI system to achieve these principles. Thanks so much for the question. For the next one, I'll hand it over to Mickey.

Thank you, Saleema. The next question asks whether, when training colleagues to use the HAX Toolkit, there are any key specific skills and expertise that traditional HCI practitioners need in order to learn to use the toolkit. To that question, I would
answer, based on my own experience, yes and no. For a little bit of context, believe it or not, my background is in the liberal arts. I do not have a computer science major, and yet somehow, some people might say miraculously, I ended up as a co-author on the HAX Toolkit. So what does this mean for you? I had to learn a lot along the way. I had to learn the fundamentals of machine learning and how AI systems work, without getting so technical that I would be able to do that kind of technical work myself. Even if you look at the guidelines, you see that there are implications for machine learning in applying them. I think it is important for HCI practitioners to be somewhat aware of those implications, but I don't think they necessarily need to become experts. What's really more important is for everyone to have the patience to collaborate and communicate with each other, knowing that sometimes, across disciplines, even the language that we use might be different. The HAX Toolkit helps support these cross-discipline and cross-role discussions and collaborations. I think HCI practitioners should launch into it, use it, and try to learn a little bit, but not be intimidated, because there's no intimidating amount of learning that needs to happen, in my opinion.

Thanks so much, Mickey. Let's go to our next question, which is: does your toolkit also look into planning for impact as well as interactions? I'm specifically interested in the return on investment and what the ultimate impact of implementing AI will be in terms of loss of employees and dramatic changes to workflow. Are there ethical questions or standards included or applied?

This is an extremely important question that we think about a lot at Microsoft, and we advocate for teams to think about it very carefully. This is really about whether you build an AI system or not at all, and this is something that we don't have a tool for
currently in the hax toolkit this toolkit assumes that you've made a decision to build an ai system how do you build it such that people can interact with it appropriately but these questions around should you build something at all are very important very difficult to answer because there are a lot of potential ways an ai can go wrong that's something that we do have in the hax toolkit we have some ways to help teams proactively think about ways that an ai can go wrong but even ais that don't go wrong or are not misused can have unintended consequences and sometimes these are hard to anticipate because they are long-term consequences in some cases or consequences that have accumulated through use of an ai in our everyday lives so this is an open problem that many in the industry are thinking about including ourselves if we do create guidance or tools that we think can help teams we will share them in the hax toolkit so thank you for the question mickey do you want to take the next one yes and in the next question i'm actually going to combine a couple of questions that we've received from the audience and both of them ask about how to handle cultural differences and conflicting user preferences so for example if the product is being used in very different cultural contexts and some of these contexts might be varying also and might include people from different cultural or demographic minorities and so i think that's a great question an amazing question and really i would like to direct people to guidelines five and six in our toolkit about delivering user experiences in ways that are in line with cultural norms and also mitigating social biases and avoiding promoting existing unwanted stereotypes or biases in practice of course this is not easy to do but really i cannot think of a better way than user research in order to understand these cultures right the
understanding of culture is traditionally the work of anthropology and ethnography and i think user research can really help us understand what these social norms and cultural expectations are and it's a really great start in beginning to make decisions and mitigate these differences in order to serve each population well and especially avoid underserving populations that might be already marginalized thanks mickey i'll take the next one we have a question saying is the hax toolkit designed or meant to be an evolving library in other words given the rapidly changing landscape of ai how will the hax team monitor and if needed adjust various parts of the toolkit i love this question because this is exactly what we want to do with the toolkit you should see some links to the right of your screen which point to the toolkit and on the toolkit website you can find a way to contact us with your feedback we're researchers at heart we're always looking to listen to and learn from the community and evolve as we go in particular the design library was meant to be an evolving and growing tool so as the community learns about how to develop interactions between people and ai we want people to contribute to the library innovate in new patterns and new examples and share so that others can learn and we can really advance the field so absolutely yes so i think we have time for maybe one or maybe two more questions at most i think this is the last one that i'm going to take and maybe if there's time saleema can take the very last question i would like to answer the question can you describe how aspects of the toolkit overlap with traditional user experience evaluation methods such as a cognitive walkthrough and the key differences this is an excellent question so if you think about the cognitive walkthrough for example that is more of an evaluation method as the original person who asked the question said now of course the guidelines could be used for
evaluation however we find that when there is an existing ai feature or product that's being evaluated using the guidelines it might be too late to change anything you might discover issues that you might want to fix but these could be very costly because the guidelines have deep implications all the way down to the data that you collect for the ml model that powers the system and so we don't think of the toolkit as a method for evaluation we think of the toolkit as a set of methods for planning for human-ai experiences really early on as soon as you have some idea for an ai feature or product that you would like to develop it's also fine to use it for evaluation but really if you're using these tools for evaluation only you might find that it's too late and there's not much you can do to improve the human-ai experience so now i'm going to turn it over to saleema i don't know saleema if you'd like to take one more question or just wrap up the session for us yeah thanks mickey i think i'll quickly answer one more question which i think is important and then we'll wrap up the last question is i'm a new developer in ai and ml is the hax toolkit for a single user or for a team or someone who manages a team and this is a great question the hax toolkit was designed for interdisciplinary teams building responsible human-ai experiences really requires all disciplines to come together including user research design ai and data scientists engineers pms and so it's designed for all members of the team so with that i want to thank everyone again for attending today we really appreciate your participation and your interest in the subject of creating responsible and human-centered ai please visit the hax toolkit you can see links to it to the right of your screen use it contribute to it give us feedback and we look forward to learning from you so thank you and have a great day
2021-08-09 01:20