This demo is an implementation of a sequence alignment script.

The input consists of two collections of sequences: tokens and tags. Each token sequence is paired with one tag sequence. The elements within a sequence may appear in any order; the algorithm does not use their order.

In this demo you enter lines of tokens and lines of tags. Paired tokens and tags should be on the same line, so the first line of tokens goes with the first line of tags, and so on. The demo will derive which tag belongs to which token.

In ambiguous problems, a token can refer to multiple tags, and different tokens can refer to the same tag. However, a token sequence must always contain the same number of elements as its associated tag sequence. Large or under-specified vocabularies will not produce valid solutions because of the limit on processing time.
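The pairing and length rule above can be sketched in Python as follows. This is only an illustration: the function name parse_pairs and the whitespace splitting are assumptions, not the demo's actual code.

```python
def parse_pairs(token_lines, tag_lines):
    """Pair each token line with the tag line at the same position.

    Hypothetical helper: elements are assumed to be separated by whitespace.
    """
    if len(token_lines) != len(tag_lines):
        raise ValueError("need the same number of token lines and tag lines")
    pairs = []
    for number, (token_line, tag_line) in enumerate(zip(token_lines, tag_lines), start=1):
        tokens, tags = token_line.split(), tag_line.split()
        # Every token sequence must be as long as its tag sequence.
        if len(tokens) != len(tags):
            raise ValueError(f"line {number}: {len(tokens)} tokens but {len(tags)} tags")
        pairs.append((tokens, tags))
    return pairs
```

Calling parse_pairs(["cat and yellow dog"], ["chatte et chien jaune"]) returns one pair of two four-element lists; a line with unequal counts raises ValueError.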

Given the example below, the task is to find that, e.g., "dog" is "chien".

Tokens:
cat and yellow dog
the cat loves a dog
the yellow dog sleeps well
the cat sleeps and eats
a yellow dog eats well

Tags:
chatte et chien jaune
la chatte aime un chien
le chien jaune dort bien
la chatte dort et mange
un chien jaune mange bien
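One way to see how such an example can be solved is to intersect, for each token, the tag sets of all lines in which it occurs, and then remove tags already claimed by another token. The Python sketch below does this under the assumption that each token has exactly one tag and each tag belongs to one token. It is not the demo's actual algorithm, and the names candidate_tags and eliminate are hypothetical.

```python
TOKEN_LINES = [
    "cat and yellow dog",
    "the cat loves a dog",
    "the yellow dog sleeps well",
    "the cat sleeps and eats",
    "a yellow dog eats well",
]
TAG_LINES = [
    "chatte et chien jaune",
    "la chatte aime un chien",
    "le chien jaune dort bien",
    "la chatte dort et mange",
    "un chien jaune mange bien",
]

def candidate_tags(pairs):
    """For each token, intersect the tag sets of every line it occurs in."""
    candidates = {}
    for tokens, tags in pairs:
        for token in set(tokens):
            if token in candidates:
                candidates[token] &= set(tags)
            else:
                candidates[token] = set(tags)
    return candidates

def eliminate(candidates):
    """Repeatedly drop tags already claimed by a token with one candidate.

    Assumes a one-to-one mapping between tokens and tags.
    """
    candidates = {token: set(tags) for token, tags in candidates.items()}
    changed = True
    while changed:
        changed = False
        settled = {next(iter(tags)) for tags in candidates.values() if len(tags) == 1}
        for tags in candidates.values():
            if len(tags) > 1 and tags & settled:
                tags -= settled  # in-place update of the stored set
                changed = True
    return candidates

pairs = [(t.split(), g.split()) for t, g in zip(TOKEN_LINES, TAG_LINES)]
candidates = eliminate(candidate_tags(pairs))
for token in ("dog", "cat", "yellow", "the", "loves"):
    print(token, sorted(candidates[token]))
```

On this example the sketch settles "dog" as "chien", "cat" as "chatte" and "yellow" as "jaune". The token "the" ends with no candidate, because it occurs with both "la" and "le", and "loves" is left undecided between "aime" and "la". A token that needs more than one tag is outside what this one-tag-per-token sketch can handle.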



If processing takes longer than 40 seconds, the computation is aborted, so no answer is returned for large inputs. Ambiguous problems also take longer to process.
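How the demo enforces its limit is not described. As a rough illustration, a time budget can be checked between solver steps, as in the Python sketch below. The names run_with_deadline and TimeLimitExceeded are hypothetical, and the solver is assumed to be an iterator that yields its current best result after each step.

```python
import time

class TimeLimitExceeded(Exception):
    """Raised when the solver runs past its time budget."""

def run_with_deadline(steps, limit_seconds=40.0):
    """Consume an iterator of solver steps, aborting once the limit has passed.

    Returns the last value yielded, or None if the iterator was empty.
    """
    deadline = time.monotonic() + limit_seconds
    result = None
    for result in steps:
        # The check runs between steps, so one long step can overshoot the limit.
        if time.monotonic() > deadline:
            raise TimeLimitExceeded(f"aborted after {limit_seconds} s")
    return result
```

With this design no partial answer is returned on abort, which matches the behaviour described above.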


 Simple problem: a token is associated with only one tag
 Single-ambiguous problem: a token is associated with only one tag, except for the NONE token, which can be linked to multiple tags
 Ambiguous problem: a token may be associated with multiple tags
 Bootstrap using guessing algorithm (faster, but may make errors)