An interview with Jo Aggarwal: Building a safe chatbot for mental health

We can now converse with machines in the form of chatbots. Some of you might have used a chatbot when visiting a company’s website. They have even entered the world of healthcare. I note that the pharmaceutical company, Lupin, has rolled out Anya, India’s first chatbot for disease awareness, in this case, for Diabetes. Even for mental health, chatbots have recently been developed, such as Woebot, Wysa and Youper. It’s an interesting concept and given the unmet need around the world, these could be an additional tool that might help make a difference in someone’s life. However, there was a recent BBC article highlighting how two of the most well known chatbots (Woebot and Wysa) don’t always perform well when children use the service. I’ve performed my own real world testing of these chatbots in the past, and gotten to know the people who have created these products. So after the BBC article got published, Jo Aggarwal, CEO and co-founder of Touchkin, the company that has made Wysa, got back in touch with me to discuss trust and safety when using chatbots. It was such an insightful conversation, I offered to interview her for this blog post as I think the story of how a chatbot for mental health is developed, deployed and maintained is a complex and fascinating journey.

1. How safe is Wysa, from your perspective?
Given all the attention this topic typically receives, and its own importance to us, I think it is really important to understand first what we mean by safety. For us, Wysa being safe means having comfort around three questions. First, is it doing what it is designed to do, well enough, for the audience it’s been designed for? Second, how have users been involved in Wysa’s design and how are their interests safeguarded? And third, how do we identify and handle ‘edge cases’ where Wysa might need to serve a user - even if it’s not meant to be used as such?

Let’s start with the first question. Wysa is an interactive journal, focused on emotional wellbeing, that lets people talk about their mood, and talk through their worries or negative thoughts. It has been designed and tested for a 13+ audience, where for instance, it asks users to take parental consent as a part of its terms and conditions for users under 18. It cannot, and should not, be used for crisis support, or by users who are children - those who are less than 12 years old. This distinction is important, because it directs product design in terms of the choice of content as well as the kind of things Wysa would listen for. For its intended audience and expected use in self-help, Wysa provides an interactive experience that is far superior to current alternatives: worksheets, writing in journals, or reading educational material. We’re also gradually building an evidence base here on how well it works, through independent research.

The answer to the second question needs a bit more description of how Wysa is actually built. Here, we follow a user-centred design process that is underpinned by a strong, recognised clinical safety standard.

When we launched Wysa, it was for a 13+ audience, and we tested it with an adolescent user group as a co-design effort. For each new pathway and every model added in Wysa, we continue to test the safety against a defined risk matrix developed as a part of our clinical safety process. This is aligned to the DCB 0129 and DCB 0160 standards of clinical safety, which are recommended for use by NHS Digital.

As a result of this process, we developed some pretty stringent safety-related design and testing steps during product design:

At the time of writing a Wysa conversation or tool concept, the first script is reviewed by a clinician to identify safety issues, specifically - any times when this could be contra-indicated, or be a trigger, and alternative pathways for such conditions.

When a development version of a new Wysa conversation is produced, the clinicians review it again specifically from an adherence to clinical process and potential safety issues as per our risk matrix.

Each aspect of the risk matrix has test cases. For instance, if the risk is that using Wysa may increase the risk of self harm in a person, we run two test cases - one where a person is intending self harm but it has not been detected as such (normal statements) and one where self-harm statements detected from the past are run through the Wysa conversation, at every Wysa node or ‘question id’. This is typically done on a training set of a few thousand user statements. A team then tags the response for appropriateness. A 90% appropriateness level is considered adequate for the next step of review.

The inappropriate statements (typically less than 10%) are then reviewed for safety, where the question asked is - will this inappropriate statement increase the risk of the user indulging in harmful behavior? If there is even one such case, the Wysa conversation pathway is redesigned to prevent this and the process is repeated.

The output of this process is shared with a psychologist and any contentious issues are escalated to our Clinical Safety Officer.

Equally important for safety, of course, is the third question. How do we handle ‘out of scope’ user input, for example, if the user talks about suicidal thoughts, self-harm, or abuse? What can we do if Wysa isn’t able to catch this well enough?

To deal with this question, we did a lot of work to extend the scope of Wysa so that it does listen for self-harm and suicidal thoughts, as well as abuse in general. On recognising this kind of input, Wysa gives an empathetic response, clarifies that it is a bot and unable to deal with such serious situations, and signpost to external helplines. It’s important to note that this is not Wysa’s core purpose - and it will probably never be able to detect all crisis situations 100% - neither can Siri or Google Assistant or any other Artificial Intelligence (AI) solution. That doesn’t make these solutions unsafe, for their expected use. But even here, our clinical safety standard would mean that even if the technology fails, we need to ensure it does not cause cause harm - or in our case, increase the risk of harmful behavior. Hence, all Wysa’s statements and content modules are tested against safety cases to ensure that they do not increase risk of harmful behavior even if the AI fails.

We watch this very closely, and add content or listening models where we feel coverage is not enough, and Wysa needs to extend. This was the case specifically with the BBC article, where we will now relax our stand that we will never take personally identifiable data from users, explicitly listen (and check) for age, and if under 12 direct them out of Wysa towards specialist services.

So how safe is Wysa? It is safe within its expected use, and the design process follows a defined safety standard to minimize risk on an ongoing basis. In case more serious issues are identified, Wysa directs users to more appropriate services - and makes sure at the very least it does not increase the risk of harmful behaviour.

2. In plain English, what can Wysa do today and what can’t it do?
Wysa is a journal married to a self-help workbook, with a conversational interface. It is a more user friendly version of a worksheet - asking mostly the same questions with added models to provide different paths if, for instance, a person is anxious about exams or grieving for a dog that died.

It is an easy way to learn and practice self help techniques - to vent and observe our thoughts, practice gratitude or mindfulness, learn to accept your emotions as valid and find the positive intent in the most negative thoughts.

Wysa doesn’t always understand context - it definitely will not pass the Turing test for ‘appearing to be completely human’. That is definitely not its intended purpose, and we’re careful in telling users that they’re talking to a bot (or as they often tell us, a penguin).

Secondly, Wysa is definitely not intended for crisis support. A small percentage of people do talk to Wysa about self harm or suicidal thoughts, who are given an empathetic response and directed to helplines.

Beyond self harm, detecting sexual and physical abuse statements is a hard AI problem - there are no models globally that do this well. For instance ‘My boyfriend hurts me’ may be emotional, physical, or sexual. Also, most abuse statements that people share with Wysa tend to be in the past: ‘I was abused when I was 12’ needs a very different response from ‘I was abused and I am 12’. Our response here is currently to appreciate the courage it takes to share something like this, ask a user if they are in crisis, and if yes, say that as a bot Wysa is not suited for a crisis and offer a list of helplines.

3. Has Wysa been developed specifically for children? How have children been involved in the development of the product?
No, Wysa hasn’t been developed specifically for children.

However, as I mentioned earlier, we have co-designed with a range of users, including adolescents.

4. What exactly have you done when you’ve designed Wysa with users?
For us, the biggest risk was that someone’s data may be leaked and therefore cause them harm. To deal with this, we took the hard decision of not taking any personally identifiable data at all from users, because of which they also started trusting Wysa. This meant that we had to compromise on certain parts of the product design, but we felt it was a tradeoff well worth making.

After launch, for the first few months, Wysa was an invite-only app, where a number of these features were tested first from a safety perspective. For example, SOS detection and pathways to helplines were a part of the first release of Wysa, which our clinical team saw as a prerequisite for launch.

Since then, design continues to be led by users. For the first million conversations, Wysa stayed a beta product, as we didn’t have enough of a response base to test new pathways. There is no one ‘launch’ of Wysa - it is continuously being developed and improved based on what people talk to it. For instance, the initial version of Wysa did not handle abuse (physical or sexual) at all as it was not expected that people would talk to it about these things. When they began to, we created pathways to deal with these in consultation with experts.

An example of a co-design initiative with adolescents was a study with Safe Lab at Columbia University to understand how at-risk youth would interact with Wysa and the different nuances of language used by these youth.

4. Can a user of Wysa really trust it in a crisis? What happens when Wysa makes a mistake and doesn’t provide an appropriate response?
People should not use Wysa in a crisis - it is not intended for this purpose. We keep reinforcing this message across various channels: on the website, the app descriptions on Google Play or the iTunes App Store, even responses to user reviews or on Twitter.

However, anyone who receives information about a crisis has a responsibility to do the most that they can to signpost the user to those who can help. Most of the time, Wysa will do this appropriately - we measure how well each month, and keep working to improve this. The important thing is that Wysa should not make things worse even when it misdetects, so users should not be unsafe ie. we should not increase the risk of harmful behaviour.

One of the things we are adding based on suggestions from clinicians is a direct SOS button to helplines so users have another path when they recognise they are in crisis, so the dependency on Wysa to recognise a crisis in conversation is lower. This is being co-designed with adolescents and clinicians to ensure that it is visible, but so that the presence of such a button does not act as a trigger.

For inappropriate responses, we constantly improve and also handle cases where the if user shares that Wysa’s response was wrong, respond in a way that places the onus entirely on Wysa. If a user objects to a path Wysa is taking, saying this is not helpful or this is making me feel worse, immediately change the path; emphasise that it is Wysa’s, not the user’s mistake; and that Wysa is a bot that is still learning. We closely track where and when this happens, and any responses that meet our criteria for a safety hazard are immediately raised to our clinical safety process which includes review with children’s mental health professionals.

We constantly strive to improve our detection, and are also starting to collaborate with other people dealing with similar issues and create a common pool of resources.

5. I understand that Wysa uses AI. I also note that there are so many discussions around the world relating to trust (or lack of it) in products and services that use AI. A user wants to trust a product, and if it’s health related, then trust becomes even more critical. What have you done as a company to ensure that Wysa (and the AI behind the scenes) can be trusted?
You’re so right about all so many discussions about AI, how this data is used, and how it can be misused. We explicitly tell users that their chats stays private (not just anonymous), that this will never be shared with third parties. In line with GDPR, we also give users the right to ask for their data to be deleted.

After downloading, there is no sign-in. We don’t collect any personally identifiable data about the user: you just give yourself a nickname and start chatting with Wysa. The first conversation reinforces this message, and this really helps in building trust as well as engagement.

AI of the generative variety will not be ready for products like Wysa for a long time - perhaps never. They have in the past turned racist or worse. The use of AI in applications like Wysa is limited to detection and classification of user free text, not generating ‘advice’. So the AI here is auditable, testable, quantifiable - not something that may suddenly learn to go rogue. We feel that trust is based on honesty, so we do our best to be honest about the technical limitations of Wysa.

Every Wysa response and question goes through a clinical safety process, and is designed and reviewed by a clinical psychologist. For example, we place links to journal articles in each tool and technique that we share with the user.

6. What could you and your peers who make products like this do to foster greater trust in these products?
As a field, the use of conversational AI agents in mental health is very new, and growing fast. There is great concern around privacy, so anonymity and security of data is key.

After that, it is important to conduct rigorous independent trials of the product and share data openly. A peer reviewed mixed method study of Wysa’s efficacy has been recently published in JMIR, for this reason, and we working with universities to further develop these. It’s important that advancements in this field are science-driven.

Lastly, we need to be very transparent about the limitations of these products - clear on what they can and cannot do. These products are not a replacement for professional mental health support - they are more of a gym, where people learn and practice proven, effective techniques to cope with distress.

7. What could regulators to foster an environment where we as a user feel reassured that these chatbots are going to work as we expect them to?
Leading from your question above, there is a big opportunity to come together and share standards, tools, models and resources.

For example, if a user enters a search term around suicide in Google, or posts about self-harm on Instagram, maybe we can have a common library of Natural Language Processing (NLP) models to recognise and provide an appropriate response?

Going further, maybe we can provide this as an open-source to resource to anyone building a chatbot that children might use? Could this be a public project, funded and sponsored by government agencies, or a regulator?

In addition, there are several other roles a regulator could play. They could fund research that proves efficacy, defines standards and outlines the proof required (the NICE guidelines recently released are a great example), or even a regulatory sandbox where technology providers, health institutions and public agencies come together and experiment before coming to a view.

8. Your website mentions that “Wysa is... your 4 am friend, For when you have no one to talk to..” – Shouldn’t we be working in society to provide more human support for people who have no one to talk to? Surely, everyone would prefer to deal with a human than a bot? Is there really a need for something like Wysa?
We believed the same to be true. Wysa was not born of a hypothesis that a bot could help - it was an accidental discovery.

We started our work in mental health to simply detect depression through AI and connect people to therapy. We did a trial in semi-rural India, and were able to use the way a person’s phone moved about to detect depression to a 90% accuracy. To get the sensor data from the phone, we needed an app, which we built as a simple mood-logging chatbot.

Three months in, we checked on the progress of the 30 people we had detected with moderate to severe depression and whose doctor had prescribed therapy. It turned out that only one of them took therapy. The rest were okay with being prescribed antidepressants but for different reasons, ranging from access to stigma, did not take therapy. All of them, however, continued to use the chatbot, and months later reported to feeling better.

This was the genesis of Wysa - we didn’t want to be the reason for a spurt in anti-depressant sales, so we bid farewell to the cool AI tech we were doing, and began to realise that it didn’t matter if people were clinically depressed - everyone has stressors, and we all need to develop our mental health skills.

Wysa has had 40 million conversations with about 800,000 people so far - growing entirely through word of mouth. We have understood some things about human support along the way.

For users ready to talk to another person about their inner experience, there is nothing as useful as a compassionate ear, the ability to share without being judged. Human interactions, however, seem fraught with opinions and judgements. When we struggle emotionally, it affects our self image - for some people, it is easier to talk to an anonymous AI interface, which is kind of an extension of ourselves, than another person. For example, this study found that US Veterans were three times as likely to reveal their PTSD to a bot as a human: But still human support is key - so we run weekly Ask Me Anything (AMA) sessions on the top topics that Wysa users propose, to discuss every week with a mental health professional. We had a recent AMA where over 500 teenagers shared their concerns about sharing their mental health issues or sexuality with their parents. Even within Wysa, we encourage users to create a support system outside.

Still, the most frequent user story for Wysa is someone troubled with worries or negative thoughts at 4 am, unable to sleep, not wanting to wake someone up, scrolling social media compulsively and feeling worse. People share how they now talk to Wysa to break the negative cycle and use the sleep meditations to drift off. That is why we call it your 4 am friend.

9. Do you think there is enough room in the market for multiple chatbots in mental health?
I think there is a need for multiple conversational interfaces, different styles and content. We have only scratched the surface, only just begun. Some of these issues that we are grappling with today are like the issues people used to grapple with in the early days of ecommerce - each company solving for ‘hygiene factors’ and safety through their own algorithms. I think over time many of the AI models will become standardised, and bots will work for different use cases - from building emotional resilience skills, to targeted support for substance abuse.

10. How do you see the future of support for mental health, in terms of technology, not just products like Wysa, but generally, what might the future look like in 2030?
The first thing that comes to mind is that we will need to turn the tide from the damage caused by technology on mental health. I think there will be a backlash against addictive technologies, I am seeing the tech giants becoming conscious of the mental health impact of making their products addictive, and facing the pressure to change.

I hope that by 2030, safeguarding mental health will become a part of the design ethos of a product, much as accessibility and privacy has become in the last 15 years. By 2030, human computer interfaces will look very different, and voice and language barriers will be fewer.

Whenever there is a trend, there is also a counter trend. So while technologies will play a central role in creating large scale early mental health support - especially crossing stigma, language and literacy barriers in countries like India and China, we will also see social prescribing gain ground. Walks in the park or art circles become prescriptions for better mental health, and people will have to be prescribed non-tech activities because so much of people’s lives are on their devices.

[Disclosure: I have no commercial ties to any of the individuals or organizations mentioned in this post]

Enter your email address to get notified by email every time I publish a new post:

Delivered by FeedBurner

Honesty is the best medicine

In this post, I want to talk about lies. It’s ironic that I’m writing this on the day of the US midterm election where the truth continues to be a rare sight to witness. Many in the UK feel they were lied to by politicians over the Brexit referendum. Apparently, politicians face a choice, lie or lose. Deception, deceit, lying, however you want to describe it, it’s part of what makes us human. I reckon we’ve all told a lie at some point, even if we’ve told a ‘white lie’ to avoid hurting someone’s feelings. Now, some of us are better at spotting when others are not telling the truth. Some of us prefer to build a culture of trust. What if we had a new superpower? A future where machines tell us in real time who is lying.

What compelled me to write this post was reading a news article about a new trial in the EU of virtual border agents powered by Artificial Intelligence (AI), which aims to “ramp up security using an automated border-control system that will put travellers to the test using lie-detecting avatars.” I was fascinated to read statements about the new system such as “IBORDERCTRL’s system will collect data that will move beyond biometrics and on to biomarkers of deceit.” Apparently, the system can analyse micro expressions on your face and include that information as part of a risk score, which will then be used to determine what happens next. At this point in time, it’s not aimed at replacing human border agents, but simply to help to pre-screen travellers. It sounds sensible right, if we can use machines to help keep borders secure? However, the accuracy rate of the system isn’t that great and some are labeling this type of system as pseudoscience and it will lead to unfair outcomes. It’s essential we all pay attention to these developments, and subject them to close scrutiny.

What if machines could one day automatically detect if someone speaking in court is lying? Researchers are working towards that. Check out the project called, DARE: Deception Analysis and Reasoning Engine, where the abstract of their paper opens with “We present a system for covert automated deception detection in real-life courtroom trial videos.“ As algorithms get more advanced, the ability to detect lies could go beyond analysing videos of us speaking, it could even spot when we our written statements are false. In Spain, police are rolling out a new tool called VeriPol which claims to be able to spot false robbery claims, i.e. where someone has submitted a report to the police claiming they have been robbed, but the tool can find patterns that indicate the report is fraudulent. Apparently, the tool has a success rate of over 80%. I came across as British startup, Human, that states on their website, “We use machine learning to better understand human's feelings, emotions, characteristics and personality, with minimum human bias” and honesty is included in the list of characteristics their algorithm examines. It does seem like we are heading for a world where it will be more difficult to lie.

What about healthcare? Could AI help spot when people are lying? How useful would it be to know if your patient (or your doctor) is not telling you the truth? In this 2014 survey in the USA, the patient deception report stated that 50% of respondents said they withhold information from their doctor during a visit, lying most frequently about drug, alcohol and tobacco use. Zocdoc’s 2015 survey found that 25% of patients lie to their doctor. There was an interesting report about why some patients are not adhering to what a doctor’s advice, and it’s because of financial strain, and that some low income patients are reluctant to discuss their situation with their doctor. The reasons why a patient might be lying are not black and white. How does an algorithm take that into account? In terms of doctors not telling patients the truth, is there ever a role for benevolent deception? Can a lie ever be considered therapeutic? From what I’ve read, lying appears to be a path some have to take when caring for those living with Dementia, to protect the patient.

shutterstock_570913984.jpg

Imagine you have a video call with your doctor and on the other side, the doctor has access to an AI system analysing your face and voice in real time and determining not just if you’re lying or not, but your emotional state too? That’s what is set to happen in Dubai with the rollout of a new app. How does that make you feel, either as a doctor or as a patient? If the AI thinks the patient is lying about their alcohol intake, would it include that determination against the patient’s medical record? What if the AI is wrong? Given the accuracy of these AI lie detectors is far from perfect, there are serious implications if they become part of the system. How might that work during an actual visit to the doctor’s office? In some countries, will we see CCTV in the doctor’s office with AI systems analysing every moment of the encounter to figure out which answers were truthful? What comes next? Smart glasses that a patient can wear when visiting the doctor and the glasses tell the patient how likely it is that the doctor is lying to them about their treatment options? Which institutions will turns to this new technology because it feels easier (and cheaper) than fostering a culture of trust, mutual respect and integrity?

What if we don’t want to tell the truth but the machines around us that are tracking everything reveal the truth for us? I share this satirical video below of Amazon Alexa fitted to a car, do watch it. Whilst it might be funny, there are potential challenges ahead in terms of our human rights and civil liberties in this new era. Is AI powered lie detection the path towards ensuring we have a society with enough transparency and integrity or are we heading down a dangerous path by trusting the machines? Is honesty really the best medicine?

[Disclosure: I have no commercial ties with any of the organisations mentioned in this post]

Enter your email address to get notified by email every time I publish a new post:

Delivered by FeedBurner

AI in healthcare: Involving the public in the conversation

As we begin the 21st century, we are in an era of unprecedented innovation, where computers are becoming smarter, being used to deliver products and services powered by Artificial Intelligence (AI). I was fascinated how AI is being used in advertising, when I saw a TV advert this week from Microsoft where a musician was talking about the benefits of AI. Organisations in every sector, including healthcare, are having to think how they can harness the power of AI. I wrote a lot about my own experiences in 2017 using AI products for health in my last blog, You can’t care for patients, you’re not human!

Now when we think of AI in healthcare potentially replacing some of the tasks done by doctors, we think of it as a relatively recent concept. We forget that doctors themselves have been experimenting with technology for a long time. In this video from 1974 (44 years ago!), computers were being tested in the UK with patients to help optimise the time spent by the doctor during the consultation. What I find really interesting is that in the video, it’s mentioned that the computer never gets tired and some patients prefer dealing with the machine than the human doctor.

Fast forward to 2018, where it feels like technology is opening up new possibilities every day, and often from organisations that are not traditionally part of the healthcare system. We think of tech giants like Google and Facebook helping us send emails or share photos with our friends, but researchers at Google are working with AI on being able to improve detection of breast cancer and Facebook has rolled out an AI powered tool to automatically detect if a user’s post shows signs of suicidal ideation.

What about going to the doctor? I remember growing up in the UK that my family doctor would even come and visit me at home when I was not well. Those are simply memories for me, as it feels increasingly difficult to get an appointment to see the doctor in their office, let alone getting a housecall. Given many of us are using modern technology to do our banking and shopping online, without having to travel to a store or a bank and deal with a human being, what if that were possible in healthcare? Can we automate part (or even all) of the tasks done by human doctors? You may think this is a silly question, but we have to step back a second and reflect upon the fact that we have 7.5 billion people on Earth today and that is set to rise to an expected 11 billion by the end of this century. If we have a global shortage of doctors today, and since it’s predicted to get worse, surely the right thing to do is to leverage emerging technology like AI, 4G and smartphones to deliver healthcare anywhere, anytime to anyone?

doctorvisit.jpg

We have the emergence of a new type of app known as Symptom Checkers, which provides anyone with the ability to enter symptoms on their phone and to be given a list of things that may be wrong with them. Note that at present, these apps cannot provide a medical diagnosis, they merely help you decide whether you should go to the hospital or whether you can self care.. However, the emergence of these apps and related services is proving controversial. It’s not just a question of accuracy, but there are huge questions about trust, accountability and power? In my opinion, the future isn’t about humans vs AI, which is the most frequent narrative being paraded in healthcare. The future is about how human healthcare professionals stay relevant to their patients.

It’s critical that in order to create the type of healthcare we want, we involve everyone in the discussion about AI, not just the privileged few. I’ve seen countless debates this past year about AI in healthcare, both in the UK and around the world, but it’s a tiny group of people at present who are contributing to (and steering) this conversation. I wonder how many of these new services are being designed with patients as partners? Many countries are releasing national AI strategies in a bid to signal to the world that they are at the forefront of innovation. I also wonder if the UK government is rushing into the implementation of AI in the NHS too quickly? Who stands to profit the most from this new world of AI powered healthcare? Is this wave of change really about putting the patient first? There are more questions than answers at this point of time, but those questions do need to be answered. Some may consider anyone asking difficult questions about AI in healthcare as standing in the way of progress, but I believe it’s healthy to have a dialogue where we can discuss our shared concerns in a scientific, rational and objective manner.

rational.jpg

That’s why I’m excited that BBC Horizon is airing a documentary this week in the UK, entitled “Diagnosis on Demand? The Computer Will See You Now” – they had behind the scenes access to one of the most well known firms developing AI for healthcare, UK based Babylon Health, whose products are pushing boundaries and triggering controversy. I’m excited because I really do want the general public to understand the benefits and the risks of AI in healthcare so that they can be part of the conversation. The choices we make today could impact how healthcare evolves not just in the UK, but globally. Hence, it’s critical that we have more science based journalism which can help members of the public navigate the jargon and understand the facts so that informed choices can be made. The documentary will be airing in the UK on BBC Two at 9pm on Thursday 1st November 2018. I hope that this program acts as a catalyst for greater public involvement in the conversation about how we can use AI in healthcare in a transparent, ethical and responsible manner.

For my international audience, my understanding is that you can’t watch the program on BBC iPlayer, because at present, BBC shows can only be viewed from the UK.

[Disclosure: I have no commercial ties with any of the organisations mentioned in this post]

Enter your email address to get notified by email every time I publish a new post:

Delivered by FeedBurner