

Software Development

GRAFON-D, CHYP, TDTDT

The software I developed for my PhD (1985-1987) consisted of LISP / KRS code for Dutch word-level language technology: text-to-phonetics conversion (GRAFON-D), hyphenation and syllabification (CHYP), morphological analysis and synthesis, an inheritance-based object-oriented lexical knowledge base, and applications in spelling correction and verb inflection tutoring (TDTDT). The approach was largely frame-based (rules and objects, with heavy use of multiple inheritance). It runs on Symbolics Lisp Machines, if you can find one.

TiMBL Memory-Based Learning package

I started implementing a LISP version of memory-based language processing (eventually called WAMBL) after I arrived in Tilburg in 1989. It combined k-nearest-neighbour classification (k-NN) and the value difference metric (VDM), as in Stanfill and Waltz’s memory-based reasoning, with information gain weighting of features (information gain as used in ID3, Quinlan’s decision tree learning algorithm). Later, Antal van den Bosch and I developed the IGTree algorithm, an oblivious decision tree learner without pruning that approximates memory-based learning efficiently. Based on an early reimplementation in C by Peter Berck, Ko van der Sloot developed TiMBL in C++. TiMBL 1.0 was released in 1998. See the TiMBL reference guide for more history and credits.
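To make the combination of nearest-neighbour classification and information gain weighting concrete, here is a minimal Python sketch. It is not TiMBL or WAMBL code: the function names, the toy data, and the use of a plain overlap metric (instead of VDM) are my assumptions for illustration, and neighbours are counted per instance, which is a simplification.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(instances, labels, feature_index):
    """Reduction in class entropy obtained by knowing one feature's value."""
    by_value = defaultdict(list)
    for instance, label in zip(instances, labels):
        by_value[instance[feature_index]].append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in by_value.values()
    )
    return entropy(labels) - remainder


def classify(instances, labels, weights, query, k=1):
    """Weighted-overlap k-NN: a mismatching feature costs its weight."""
    scored = []
    for instance, label in zip(instances, labels):
        distance = sum(
            w for w, a, b in zip(weights, instance, query) if a != b
        )
        scored.append((distance, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 predicts the class, feature 1 is noise.
train = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
classes = ["N", "N", "V", "V"]
weights = [information_gain(train, classes, i) for i in range(2)]
prediction = classify(train, classes, weights, ("a", "z"))
```

In this toy set the informative feature receives all the weight and the noisy one none, so an unseen value in the noisy feature does not affect the classification.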

Mbt Memory-Based Tagging

Mbt is a wrapper around TiMBL for tagging sequences. It provides flexible facilities for defining left and right context features. It was originally designed for part-of-speech tagging.
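The idea of left and right context features can be sketched as a sliding window over the sequence: each token becomes one fixed-length instance for the classifier. The sketch below is an assumption-laden illustration in Python, not Mbt's actual feature specification; `window_features`, its parameters, and the padding symbol are hypothetical, and it uses only words as context.

```python
def window_features(tokens, left=2, right=1, pad="_"):
    """Turn a token sequence into one fixed-width instance per token.

    Each instance holds `left` tokens of left context, the focus token,
    and `right` tokens of right context; positions beyond the sequence
    boundaries are filled with `pad`.
    """
    padded = [pad] * left + list(tokens) + [pad] * right
    width = left + 1 + right
    return [tuple(padded[i:i + width]) for i in range(len(tokens))]


instances = window_features(["the", "cat", "sleeps"], left=2, right=1)
```

Each resulting tuple can be paired with the tag of its focus token and handed to a classifier as a training instance.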

Memory-Based Shallow Parsing

If you cut up the parsing process into different disambiguation and segmentation tasks, each of them amenable to supervised classification-based learning (e.g., using TiMBL and Mbt), you have the basic idea underlying memory-based shallow parsing. A memory-based shallow parser contains classifiers for (at least) tokenization, part-of-speech tagging, phrase chunking, and grammatical relation finding, but the idea can be and has been extended to include PP-attachment, named entity recognition, semantic role labeling, dependency parsing, etc. The original MBSP is described in Daelemans et al. 1999 and Buchholz et al. 1999. Since then many different versions have been built by different people.

A recent one for English is MBSP.

For Dutch, there is Frog (previously known as Tadpole).
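One common way to treat a segmentation task such as phrase chunking as classification is to give every token a tag saying whether it begins a chunk (B), continues one (I), or is outside any chunk (O). The Python sketch below decodes such per-token tags back into chunks. The IOB encoding and the function name `iob_to_chunks` are assumptions chosen for illustration; this is not code from MBSP or Frog.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, words) pairs from per-token IOB tags."""
    chunks = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        # An I tag of a different type than the open chunk starts a new one.
        starts_new = prefix == "B" or (
            prefix == "I" and chunk_type != current_type
        )
        if prefix == "O" or starts_new:
            if current_words:
                chunks.append((current_type, current_words))
            current_type, current_words = None, []
        if prefix in ("B", "I"):
            if not current_words:
                current_type = chunk_type
            current_words.append(token)
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


tokens = ["The", "cat", "sat", "on", "the", "mat"]
tags = ["B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
chunks = iob_to_chunks(tokens, tags)
```

With this encoding, a classifier only has to predict one tag per token, and the segmentation follows from the tag sequence.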

Packages that do Everything

I had some minor input on the design of Tom De Smedt's infamous Pattern package.


Last modified: Mon Jul 8 16:55:46 CEST 2013