The Problem

07 Jul 2012

In my last post, I boldly declared that I want to make my computer intelligent — but that I’d start small, by solving a simple problem. The hard part is picking a problem that’s complex enough to be interesting, yet simple enough to solve.

I concluded by saying:

Past observations can be compressed by finding a general rule to describe them all. This rule can be used to extrapolate to deal with new situations and therefore predict present observations, or even the future. The rules are constantly being refined to ensure the predictions match present observations more accurately.

The AI system needs to spot patterns in its “observations” (its input) and extrapolate from those patterns to new inputs. It needs to learn the rule that describes the transformation from input to output. This relationship is more precisely described by the mathematical idea of a function.

So, here’s the problem I’m planning to try and solve: given a set of examples of a relationship such as singular -> plural:

phase -> phases
fish -> fish
kiss -> kisses
fairy -> fairies
valley -> valleys
potato -> potatoes

…the system would compare each singular/plural pair like “phase -> phases”, to determine that, in general, the single letter “s” is being appended to the end of the word.

We can then pass in a new word, like “car”, that the system hasn’t seen before, and have the system compute the plural as being “cars” — by extrapolating from the examples.
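To make the idea concrete, here is a minimal Python sketch of one way this could work. It is not a finished design. Each pair is reduced to a "strip this ending, append that ending" rule, and the most frequent rule wins. The function names (`extract_rule`, `learn_rule`, `apply_rule`) are made up for illustration. The tie-break is also an assumption: with only the six examples above, "append s" and "append es" each occur twice, so the sketch prefers the shorter rule.

```python
from collections import Counter
from os.path import commonprefix

EXAMPLES = [
    ("phase", "phases"),
    ("fish", "fish"),
    ("kiss", "kisses"),
    ("fairy", "fairies"),
    ("valley", "valleys"),
    ("potato", "potatoes"),
]


def extract_rule(singular, plural):
    """Reduce one pair to (strip, append): the ending removed and the ending added."""
    stem = commonprefix([singular, plural])  # longest shared leading substring
    return singular[len(stem):], plural[len(stem):]


def learn_rule(examples):
    """Pick the most frequent rule; on a tie, prefer the shorter rule (an assumption)."""
    counts = Counter(extract_rule(s, p) for s, p in examples)
    return max(counts, key=lambda rule: (counts[rule], -len(rule[0]) - len(rule[1])))


def apply_rule(rule, word):
    """Apply a (strip, append) rule to a word the system has not seen before."""
    strip, append = rule
    if not word.endswith(strip):
        raise ValueError(f"rule {rule!r} does not apply to {word!r}")
    return word[: len(word) - len(strip)] + append


rule = learn_rule(EXAMPLES)
print(rule)                     # ('', 's')
print(apply_rule(rule, "car"))  # cars
```

Counting whole rules like this is the crudest possible form of generalization, but it is enough to get from the examples to “cars”.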

I think this is called supervised learning. The example input/output pairs are known as the training data. The resulting inferred function should predict the correct output value for any valid input. The hard part is to “generalize from the training data to unseen situations in a ‘reasonable’ way”.

A system that can handle the general case (just add “s”) shouldn’t be too hard to build (I hope), but it is still a reasonable challenge and quite an interesting problem.

Simply adding “s” will often be wrong, however, so there is plenty of opportunity for the system to refine its predictions and learn new, more complex rules, and plenty of opportunity to improve the software so that it can make these kinds of deductions.

The idea of intelligence as compression fits in nicely, too: by learning the general rule, the computer doesn’t have to remember the plurals of every single word; only the exceptions.
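The "rule plus exceptions" idea can be sketched in a few lines of Python. This assumes the general rule has already been found and is simply "append s". The names `general_rule`, `find_exceptions` and `pluralize` are hypothetical and only there for illustration.

```python
EXAMPLES = {
    "phase": "phases",
    "fish": "fish",
    "kiss": "kisses",
    "fairy": "fairies",
    "valley": "valleys",
    "potato": "potatoes",
}


def general_rule(word):
    """The assumed general rule: append a single 's'."""
    return word + "s"


def find_exceptions(examples, rule):
    """Keep only the pairs that the general rule gets wrong."""
    return {s: p for s, p in examples.items() if rule(s) != p}


def pluralize(word, rule, exceptions):
    """Look the word up among the stored exceptions, else fall back to the rule."""
    return exceptions.get(word, rule(word))


EXCEPTIONS = find_exceptions(EXAMPLES, general_rule)
print(sorted(EXCEPTIONS))                           # ['fairy', 'fish', 'kiss', 'potato']
print(pluralize("car", general_rule, EXCEPTIONS))   # cars
print(pluralize("fish", general_rule, EXCEPTIONS))  # fish
```

With only six examples the saving is small: four of the six still have to be stored as exceptions. The gain should grow as better rules shrink the exception list.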

So this is the problem I plan to solve: to build a system that can “learn” relationships between things from a set of examples, with singular -> plural as the initial goal and test case, and can then use this “experience” to compute the relationship for new data, e.g. the plural for an unseen singular.

My long-term plan for designing a fully “intelligent” AI (which does currently suffer rather from having only vague ideas of what it’s trying to do) has this idea of extrapolation at its core. It’s rather crucial to the design — such that if it fails, my whole idea is rubbish — so this would be good to know…