Conversation will be our new User Interface. – Satya Nadella
My day is filled with goal-directed tasks. They range from simple (scheduling, search) to complex (negotiation, persuasion). The more complex the task, the more likely it is to involve a dialog, passing messages to exchange information, to signal agreement, to assign actions.
True conversational dialogues are held between people and are composed of exchanges involving language, non-verbal cues, emotional nuances, and storytelling asides. When I turn to the computer, it is usually for simple tasks (search, compose, compute, connect) that are command/respond interactions. I can’t imagine how those could be made easier by making them conversational.
Microsoft envisions creating conversational apps with three levels of capability to support three different kinds of user tasks. The first level mediates people-to-people conversations, offering assistance with checking spelling, making connections, and translating languages. A step up are Digital Assistants that know your context and are able to take on tasks like scheduling appointments, planning travel, discovering music, and finding nearby places to eat. At the top, bots add machine learning and natural language understanding, and so are able to inform, advise, and anticipate users’ needs: enhanced versions of Cortana or Siri. (Somehow, they are always depicted as a bit creepy… and too often as young female servants.)
Tay was intended to be one of these bots, but failed miserably when released into the wild to learn about people. Set upon by trolls, the algorithm was taught a set of wild untruths, which it happily repeated without any contextual understanding or moral discrimination.
At this year’s Build conference, Microsoft released the libraries and APIs that allow anyone to have a go at making a better version of ‘Tay’. There are basic tools for recognizing speech and faces, libraries that sense emotions, and others that identify faces and voices. Some claim to “Explore relationships among academic papers, journals, and authors” or “Contextually extend knowledge of people, locations, and events.”
If only it were true. I’ve been playing with the Linguistic Analysis tools, parsing structure from sentences. It’s pretty good on straightforward text.
But, maliciously, throw it some good colloquialisms and it’s a much bigger challenge:
Arguably, it’s assigned the tokens in some justifiable way, but I don’t know how you could make sense of the returned array.
I know I stagger through replying to ‘Don’t you want to go with me?’: Yes, I don’t…not…want to go with you…there, maybe.
And this is, I think, where the project, grandly named Microsoft Cognitive Services, goes wrong. There is nothing cognitive about deterministic logic, isolated from intuitive thought, consciousness, and free will. Computers win chess games through deep search; they translate languages by statistical matching; they win at Go by neural network learning.
These are marvelous technical advances, but they are pattern-matching exercises within closed problem spaces with well-defined rules. Release the same algorithms onto the road in an autonomous car, and it hits a bus when the situation exceeds its experience. It can learn, but it can’t intuit.