How to tutor AI from an ‘F’ to an ‘A’

In his book “Idea Man,” Paul Allen describes his dream of a Digital Aristotle, an “easy-to-use, all-encompassing knowledge storehouse...to advance the field of AI.” With the creation of the Allen Institute for Artificial Intelligence (AI2) in 2014, Project Aristo (connoting Aristotle as a child) was started as a flagship project towards this goal, initially aimed at grade school level science.

Now, after six years of research, Project Aristo has reached a remarkable milestone, scoring over 90 percent on the non-diagram, multiple-choice questions in the NY Regents Science Exam (Eighth Grade). Even back in 2016, the best AI systems flunked this test, scoring less than 60 percent. A combination of sophisticated research at AI2 and rapid advances in the field of natural language processing (NLP) as a whole made this success possible. We interviewed Dr. Peter Clark, the leader of the Aristo project, about this achievement.

How did you take Aristo from an ‘F’ to an ‘A’ in such a short time?

It's a combination of factors. Perhaps the most impactful thing has been the rapid progress in the field of natural language processing (NLP) as a whole, which we've both contributed to and been able to leverage at AI2. Even five years ago, computers had a lot of difficulty understanding what was written in text. Thanks to a rapid progression of advances, we now have AI systems that are much better able to understand language. In fact, a model called ELMo developed here at AI2 was an important catalyst for the rapid recent improvements in the field of NLP. My team has been able to take these techniques and find innovative ways of applying them to answer science questions.

What is inside Aristo?

Aristo contains several different modules that we call "solvers," which try to answer science questions in different ways. For example, one solver looks to see if an answer is written down somewhere in a large amount of text. Another tries to answer questions that require reasoning, by combining two pieces of information. For example, it can realize that "an iron nail conducts electricity" because it knows that "iron is a metal" and "metals conduct electricity." Another is a specialist solver that answers questions about comparisons; for example, "would a rougher surface have more or less friction than a smooth surface?" And so on. Finally, a special module combines all the different answers to decide on the overall best answer.
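To make the solver-plus-combiner idea concrete, here is a minimal Python sketch. It is not Aristo's code: the tiny corpus, the fact tables, the function names (lookup_solver, reasoning_solver, combine), and the weighted-sum combiner are all assumptions made for illustration, since the interview does not say how the real solvers or the combining module work.

```python
from typing import Callable, Dict, List

# Toy knowledge, hand-written for this illustration only (not Aristo's data).
CORPUS = [
    "metals conduct electricity",
    "iron is a metal",
    "rubber is an insulator",
]
IS_A = {"iron": "metal", "rubber": "insulator"}
CONDUCTS_ELECTRICITY = {"metal"}

# A solver returns one confidence score per answer option.
Solver = Callable[[str, List[str]], List[float]]


def lookup_solver(question: str, options: List[str]) -> List[float]:
    """Score 1.0 if one sentence shares a word with both the question and the option."""
    q_words = set(question.lower().rstrip("?").split())
    scores = []
    for option in options:
        o_words = set(option.lower().split())
        hit = any(
            set(s.split()) & q_words and set(s.split()) & o_words
            for s in CORPUS
        )
        scores.append(1.0 if hit else 0.0)
    return scores


def reasoning_solver(question: str, options: List[str]) -> List[float]:
    """Chain two facts: 'X is a Y' and 'Y conducts electricity'."""
    if "electricity" not in question.lower():
        return [0.0] * len(options)  # this toy solver only knows conduction
    return [
        1.0 if any(IS_A.get(w) in CONDUCTS_ELECTRICITY for w in option.lower().split())
        else 0.0
        for option in options
    ]


def combine(question: str, options: List[str],
            solvers: Dict[str, Solver], weights: Dict[str, float]) -> str:
    """Assumed combiner: weighted sum of solver scores, then pick the top option."""
    totals = [0.0] * len(options)
    for name, solver in solvers.items():
        for i, score in enumerate(solver(question, options)):
            totals[i] += weights[name] * score
    return options[max(range(len(options)), key=totals.__getitem__)]


question = "Which object conducts electricity?"
options = ["iron nail", "rubber band", "wooden spoon"]
solvers = {"lookup": lookup_solver, "reasoning": reasoning_solver}
weights = {"lookup": 1.0, "reasoning": 1.0}  # hypothetical, untrained weights
answer = combine(question, options, solvers, weights)
print(answer)
```

In this toy setup the lookup solver finds nothing, because no single sentence mentions both iron and electricity. The reasoning solver succeeds by joining the two facts, which is the kind of case the second solver described above is meant to cover.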

Does Aristo still struggle with certain types of questions?

Aristo isn't able to handle questions with diagrams very well except in a few special cases. For instance, Aristo can answer questions about food chains, but it can’t answer those that require reading a map or studying a bar chart. It also has difficulty dealing with hypothetical situations. For example, Aristo struggles with the following question: “If you pull the leaves off a plant, what would the result be?” A good answer would be that the plant is no longer able to make its own food. But Aristo struggles with this question because it lacks the ability to create an imaginary world and then predict what might happen in that world.

What are Aristo’s real-world applications?

Aristo's long-term goal is not just about passing science tests; it’s about creating a system that has a deeper understanding of science, with many potential applications. There are three areas in particular that seem promising. The first is personalized education, where Aristo could help a child understand science by providing custom tutoring appropriate to the child's age and learning style. The second is in helping scientists. I can imagine Aristo offering relevant, timely background information on scientific concepts and prior work to a scientist in a laboratory. Finally, longer term, Aristo might help in scientific discovery itself, finding new insights and connections where people haven’t been able to in the past, in deeply complex areas such as medicine or engineering. Aristo currently has a long way to go to reach these goals, of course, but performing so well on the Regents Science exam is a tremendous step forward.

Dr. Peter Clark, leader of Project Aristo.

What does passing an 8th grade science test mean for the AI field?

I think this is a particularly compelling and understandable example of what can be done with current technology, and how far we have progressed. Unlike many other benchmark tasks, science exams are natural, understandable, and challenging, and they allow people to better understand the state of AI.

In three years, where do you expect AI to be?

It's very difficult to predict. I’ve seen more changes in the last five years than in my nearly 35-year career. Advances in the new technology of deep learning, and new techniques for modeling language, have dramatically changed the field. The impact is a bit analogous to what the silicon chip did for the computer revolution. Today, computers are able to understand language better and perform simple reasoning, but they still struggle with more complex problems. I expect that to change in the next three years.

Do you fear the ‘rise of AI’?

In my view, those risks are very overblown; computers are not about to take over. Rather, many problems today arise from the lack of technology, for example when a disease goes undiagnosed, or a driver is not alerted to an imminent danger, or fraudulent activity goes undetected. The benefits of systems like Aristo and others far outweigh these future risks. And although the AI field is moving forward quickly, Aristo has given me a new appreciation of just how sophisticated human reasoning is and how far away computers are from matching the full range of skills that a person has. There are still a lot of advances to be made.