Distributional bootstrapping with Memory-based learning

Title: Distributional bootstrapping with Memory-based learning
Publication Type: Talks
Authors: Cassani, G., Grimm, R., Gillis, S., & Daelemans, W.
Place Presented: 26th Meeting of Computational Linguistics in the Netherlands (CLIN26), Amsterdam, The Netherlands
Year of Publication: 2015
Date Presented: 18 December 2015
Abstract

In this work, we explore Distributional Bootstrapping using Memory-based learning. In language acquisition, distributional bootstrapping refers to the hypothesis that children start breaking into language by extracting distributional patterns of co-occurrence between words and lexically specific contexts. We started by identifying the pitfalls of past accounts, and investigated the usefulness of the different kinds of information encoded in distributional patterns, of different types of contexts, and of their interaction.

In greater detail, we analysed the impact of three pieces of information that several experimental studies have shown children are able to extract: i) token frequency, or how many times a distributional cue occurs in the input; ii) type frequency, or the number of different words a cue occurs with; and iii) the (average) conditional probability of a context given a word, or the ease with which the occurrence of a specific cue can be predicted from the occurrence of a target word, averaged over the words the cue occurs with. Moreover, we investigated the information conveyed when i) only bigrams, ii) only trigrams, or iii) both are considered.

Using several corpora of child-directed speech from typologically different languages (English, French, Hebrew), we show the impact of distributional information and contexts on cue selection, which is performed in an unsupervised way. The goodness of the selected set of cues is assessed using learning curves from a supervised Part-of-Speech tagging experiment performed with Memory-based learning. This way, we do not simply get a picture of the end state, but can also compare the learning trajectories that result from the different models and evaluate the specific contribution of each piece of information and type of context. We show that only certain conditions make learning possible, while others do not lead to any improvement.
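As an illustration of the three statistics named in the abstract, the sketch below computes token frequency, type frequency, and average conditional probability of context given word for simple left-bigram contexts. This is a minimal, hypothetical reconstruction for exposition only, not the authors' implementation: the function name, the choice of the preceding word as the context, and the `<s>` sentence-start marker are all assumptions.

```python
from collections import Counter, defaultdict

def context_statistics(sentences):
    """For each left-bigram context (the preceding word), compute:
    token frequency, type frequency, and the average conditional
    probability P(context | word) over the words the context occurs with."""
    context_tokens = Counter()            # token frequency of each context
    context_words = defaultdict(Counter)  # words observed after each context
    word_tokens = Counter()               # token frequency of each word

    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            word_tokens[word] += 1
            context = tokens[i - 1] if i > 0 else "<s>"  # assumed start marker
            context_tokens[context] += 1
            context_words[context][word] += 1

    stats = {}
    for context, words in context_words.items():
        # P(context | word) = count(context, word) / count(word),
        # averaged over the distinct words the context occurs with
        avg_cond_prob = sum(
            count / word_tokens[w] for w, count in words.items()
        ) / len(words)
        stats[context] = {
            "token_freq": context_tokens[context],
            "type_freq": len(words),  # number of distinct co-occurring words
            "avg_cond_prob": avg_cond_prob,
        }
    return stats

# Toy example (not child-directed speech data)
corpus = ["the dog barks", "the cat sleeps", "a dog sleeps"]
stats = context_statistics(corpus)
print(stats["the"])  # → {'token_freq': 2, 'type_freq': 2, 'avg_cond_prob': 0.75}
```

A cue-selection step would then rank contexts by some combination of these scores; extending the same counting to trigram contexts only requires using the two preceding tokens as the key.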

Keywords: Computational modelling, Cross-linguistic analyses, Distributional bootstrapping, Language acquisition, Memory-based learning
Type of Work: Oral presentation
Attachment: clin26_bootstrapping.pdf (PDF, 1.52 MB)