“Turing” the landscape of the Imitation Game

A chatbot designed to respond (in English) like a 13-year-old Ukrainian boy (with limited English skills) was recently reported to have passed the Turing Test. Many commentators were quick to demonstrate that the ‘bot emphatically did not – and cannot – pass any version of the Turing Test having any meaningful connection to intelligence.

In my view, the chatbot, “Eugene Goostman”, merely entered the wrong competition. If it were to enter a competition in which every entrant was introduced as a 13-year-old from Ukraine, and all entrants had been either programmed or coached to impersonate a 13-year-old from Ukraine, we could more validly compare its results to those of its competitors. In the Reading University competition, however, other entrants were (tacitly) assigned different roles – roles arguably more difficult to pull off, such as “full-grown, educated, native English-speaking person.”

When Alan Turing famously proposed that the Imitation Game be used as the “gold standard” for machine intelligence, he did so with a challenging version of the game in mind – the version that humans play. To be considered decently good at the Imitation Game, a player (say, a man or a computer program charged with imitating a woman) would have to prove indistinguishable from “the genuine article” (any of a variety of actual women players) a great deal of the time. The perfect “female impersonator” would be able to convince an impartial judge just as often, on average, as a woman can. (Bear in mind that even a genuine woman will, a certain percentage of the time, be mistaken for a non-woman pretending to be a woman – especially by a judge wary of being duped.)

If a computer program could convince a panel of skeptical judges that it is a person just as often as the average person could, that program would be a Master Im-person-ator – and indisputably intelligent.

That’s a big “if”. The best we can (generously) allow, at present, is that a computer program may now perform adequately at an extremely limited and “dumbed-down” version of the Turing Test. We might envision the space of all Turing-Related Tests as a plane, of which a few tiny slivers – representing such roles as “ignorant, sassy teenager” and “abusive, paranoid weirdo” – are coloured either yellow or a very faint shade of green. All other “tiles” in the plane – “full-grown, educated, native English-speaking person”, “woman”, “award-winning journalist”, etc. – are either red or uncoloured.

Intelligent systems: Dimensions

Via social media, I received a frank and pithy comment on my previous post (Machine intelligence: Scary? Necessary.):

Machine Intelligence? Dream on. Machines can only really appear intelligent to humans who are not.

In pondering how to respond, I realized I had been a bit intellectually lazy when I implied that there might be some “threshold of intelligence.” To the contrary, intelligence is, to my mind, a continuum. Any system (including biological systems) that exhibits any sort of variable behaviour in response to variable data (including stimuli) has non-zero intelligence. An earthworm, for example, is obviously very “stupid” in comparison with many creatures – but it is capable of cognition, as a function of its evolutionary “programming”. The algorithms which govern its behaviour are sophisticated and cannot be easily deconstructed – nor (yet) duplicated by human programmers of non-biological systems (“robotic worms”, if such things existed).

I also want to avoid depicting intelligence as a scalar quantity. To try to measure the conventional IQ of a computer system – or any non-human system – is folly. (The value of IQ has been challenged even as a measure of relative human intelligence – but that’s another discussion.) When we think of an intellect – that of a worm, a person, or a cybernetic system – as a “shape” having at least two dimensions – let’s call them breadth and depth – we arrive at a somewhat more vivid and meaningful basis for comparison. IBM Watson (the example from my previous post) exhibits an impressive breadth of intellect, given that it is designed to process text in the English language that might relate to any topic. In depth, however, it reveals its “stupidity”. For example, Watson is not designed to be original or creative at all; it is designed to play Jeopardy!, to which originality would be, if anything, a disadvantage. Upon probing, Watson’s “knowledge” of any particular subtopic will be revealed to be woefully inconsistent and brittle. It’s quite possible – even likely – that Watson will successfully answer several advanced questions on a given topic, but then come up clueless on what human experts would agree is a basic question. Of course, that’s because Watson is incapable of recognizing and assimilating the core body of knowledge on a topic, or of distinguishing the fundamental laws from the “esoterica”. It has no deep understanding of any topic, and grasps no themes or theories that might allow it to come up with an answer by extension or analogy. We could quite aptly characterize Watson’s intellect as “a mile wide and an inch deep.”

For an example of an information system having quite the opposite “intellectual dimensions” from IBM Watson – that is, an extremely limited breadth, but significant (and, I find, impressive) depth – see Copycat, designed by the Fluid Analogies Research Group, headed by Douglas R. Hofstadter, at the Center for Research on Concepts and Cognition, Indiana University, Bloomington. Copycat is a system designed to generate solutions to a certain kind of analogy-based problem, and to do it in a way that resembles the human process (as self-reported by human solvers). Here’s an example of Copycat being “smart”:

Human “says” (via coded terms): “I turn the string abc into abd. Copycat, you do the same with xyz.”

Copycat “answers”: “I turn xyz into wyz.” [Like most humans, Copycat “prefers” this answer to wxz, or any other.]
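To make the “smartness” of that answer concrete, here is a minimal Python sketch of the puzzle. It is emphatically not how Copycat works internally; the function names (literal_rule, mirrored_rule, solve) are hypothetical, and the only assumption carried over from Copycat’s letter world is that the alphabet does not wrap around, so z has no successor. The sketch shows why the literal rule (“replace the last letter with its successor”) breaks down on xyz, and how flipping the rule (“replace the first letter with its predecessor”) yields wyz.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed letter world: a..z, no wrap-around


def successor(ch):
    """Return the next letter, or None for 'z' (assumption: the alphabet is not circular)."""
    i = ALPHABET.index(ch)
    return ALPHABET[i + 1] if i + 1 < len(ALPHABET) else None


def predecessor(ch):
    """Return the previous letter, or None for 'a'."""
    i = ALPHABET.index(ch)
    return ALPHABET[i - 1] if i > 0 else None


def literal_rule(s):
    """Literal reading of abc -> abd: replace the rightmost letter with its successor."""
    nxt = successor(s[-1])
    return None if nxt is None else s[:-1] + nxt


def mirrored_rule(s):
    """Mirror-image reading: replace the leftmost letter with its predecessor."""
    prev = predecessor(s[0])
    return None if prev is None else prev + s[1:]


def solve(s):
    """Try the literal rule first; fall back to the mirrored rule if it hits a dead end."""
    return literal_rule(s) or mirrored_rule(s)


if __name__ == "__main__":
    for source in ("abc", "ijk", "xyz"):
        print(source, "->", solve(source))
```

The hard-coded fallback is exactly what makes this sketch “stupid” by comparison: here the programmer has anticipated the dead end and supplied the mirrored rule by hand, whereas the point of the exchange above is that Copycat arrives at wyz without being handed that reversal.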

As a final example (or class of examples), let’s consider “crowdsourced” knowledge bases. Arguably, these systems are highly intelligent (by the terms laid out herein): they marry the capabilities of data/knowledge management software, which stores and indexes a comprehensive library of facts (breadth), with the “wisdom of the crowd” inherent in human intelligence, intuition and social interaction (depth). One could convincingly argue that the lion’s share of such a system’s intellect is due to the contribution of the crowd of humans – but then, the “one” making that argument would presumably be a human, and subject to bias. Suffice it to say that the results (outputs) of such a system – assuming the non-human elements are cleverly designed – are likely to be far “smarter” than the results of a crowd of humans thrown together in a room and asked to draw conclusions on a given topic, absent any “machine augmentation” of their collective intellect.

To sum up my reply to my critic: It doesn’t matter so much whether a given system is relatively “smart” or “stupid”, as it matters that systems can possess intelligence – and that not all systems possessing intelligence are 100% biological.

Next topic: So-called “Business Intelligence” (in capitals, no less!). Stay tuned!