I mentioned Tangle in a previous post. Tangle was a program by TimBL that would encode links between character substrings, as far as I can tell, in bits of prose. It'd've been nice to use it on a dictionary, since a dictionary's whole purpose is to define words in terms of other words. I wonder just how much of a language's vocabulary one needs to understand in order to bootstrap further terms of that language using a dictionary?
Definitions are often, though not always, restatements of the meaning of a word in simpler terms. So it must be possible to expand some single complicated words into a sequence of less complicated words. There'll be a loss of meaning and connotation, and although the principle of simplified English has been widely researched, doing it properly is really just too close to the general NLP problem.
For example, "John defenestrated Bob" could become "John threw out the window Bob". The grammar's off, and this is mainly hypothetical, but the principle's clear.
The quick thought that I'm just trying to scribble down here is that of a system for measuring the threshold for definitions' complexities. You could count how many times a word is used in other definitions, and give it a commonness index based upon that. Then, in a certain piece of prose, you could expand words that are below a certain index, and keep expanding their definitions until hopefully you were only left with words at or above that index. If a word already expanded appears in some level of expansion of its definition, then you've got a loop and you could break there.
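To make that a bit more concrete, here's a rough Python sketch of the counting and expanding steps. It's a toy under loud assumptions, not a real implementation: the seven-entry `TOY_DICTIONARY` is made up for illustration, the function names (`commonness_index`, `expand`, `simplify`) are my own, and there's no stemming, so it only recognises headwords in exactly the form they're listed. It also lowercases everything and makes no attempt to repair the grammar.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (headword -> definition), invented for this sketch.
TOY_DICTIONARY = {
    "defenestrate": "throw out of the window",
    "hurl": "throw with force",
    "toss": "throw lightly",
    "window": "opening in the wall",
    "pane": "glass in the window",
    "opening": "gap in the wall",
    "gap": "opening in the wall",
}

def tokenize(text):
    """Split text into lowercase words; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def commonness_index(dictionary):
    """Count how often each word is used in definitions other than its own."""
    counts = Counter()
    for headword, definition in dictionary.items():
        for word in tokenize(definition):
            if word != headword:
                counts[word] += 1
    return counts

def expand(word, dictionary, index, threshold, seen=frozenset()):
    """Replace a word below the threshold with its (recursively expanded) definition.

    A word is left alone if it is common enough, has no entry, or is already
    being expanded further up the chain (a loop, so we break there).
    """
    if index[word] >= threshold or word not in dictionary or word in seen:
        return [word]
    seen = seen | {word}
    result = []
    for part in tokenize(dictionary[word]):
        result.extend(expand(part, dictionary, index, threshold, seen))
    return result

def simplify(prose, dictionary, threshold):
    """Expand every word of the prose that falls below the threshold."""
    index = commonness_index(dictionary)
    words = []
    for word in tokenize(prose):
        words.extend(expand(word, dictionary, index, threshold))
    return " ".join(words)

print(simplify("John will defenestrate Bob", TOY_DICTIONARY, threshold=2))
# prints: john will throw out of the window bob
```

With a threshold of 2, "throw" (used in three other definitions) and "window" (used in two) are common enough to stay, so the expansion stops after one level. Words with no entry at all, like "john", are simply passed through. Raise the threshold to 3 and "gap" and "opening" start expanding into each other, which is where the `seen` set breaks the loop.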
Many bonus points to anyone crazy enough to implement or have already implemented this.