Abstract
To date, connectionist sentence processing models have largely been limited to simple sentences or small domains (St. John & McClelland, 1990; Miikkulainen & Dyer, 1991; Dell, Chang, & Griffin, 1999). Our goal is to develop an integrated connectionist model of sentence comprehension and production that is capable of handling complex sentences from a reasonably broad domain.
The language on which the model has been trained is a subset of English comprising 36 word stems and permitting active and passive constructions, multiple verb tenses, number agreement, several forms of relative clauses and prepositional phrases, articles, adjectives, adverbs, and dative objects. In these preliminary experiments, sentences were limited to 10 words in length. Comprehension is viewed as the process of mapping from a sequence of words to a static message, or representation of sentence meaning, and production as the reverse mapping. How messages are encoded is therefore of central importance.
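The comprehension mapping can be illustrated with a minimal sketch of a simple recurrent network that consumes one word at a time and reads a static message vector out of its final hidden state. This is an illustration only, not the trained model: the layer sizes, tanh nonlinearities, one-hot word coding, and weight names below are all assumptions, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, MESSAGE = 36, 100, 50   # 36 word stems as in the abstract; other sizes assumed

W_in  = rng.normal(0.0, 0.1, (HIDDEN, VOCAB))    # current word -> hidden
W_rec = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))   # recurrent (context) connections
W_out = rng.normal(0.0, 0.1, (MESSAGE, HIDDEN))  # hidden -> message readout

def comprehend(word_ids):
    """Run the recurrent net over a word sequence; return a static message vector."""
    h = np.zeros(HIDDEN)
    for w in word_ids:
        x = np.zeros(VOCAB)
        x[w] = 1.0                        # one-hot coding of the current word
        h = np.tanh(W_in @ x + W_rec @ h) # update context with each word
    return np.tanh(W_out @ h)             # message read out of the final state

message = comprehend([3, 17, 5, 22])      # placeholder word indices
```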
Although hand-coded messages have been used in the past, designing appropriate representations for complex sentences is no easy task, and hand-coded representations may hinder a network's ability to make use of shared information. Therefore, networks were trained to form message encodings given the propositional components of sentence meaning. Although RAAMs (Pollack, 1990) were investigated, better representations were obtained using a query output network (St. John & McClelland, 1990). The message decoder was able to answer 99.0% of queries correctly. A simple recurrent network was then trained on the comprehension task using as targets the messages constructed by the encoder. Given the messages derived from comprehension, the decoder correctly responded to 96.1% of the queries.
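In the spirit of the query approach, the decoder can be probed with a role and asked to recover the corresponding filler from the message vector; training the encoder and decoder jointly so that every proposition can be recovered is what shapes the message representation. The sketch below is a hedged illustration only: the sizes, one-hot role probe, argmax readout, and weight names are assumptions, not details of the model.

```python
import numpy as np

rng = np.random.default_rng(1)
MESSAGE, PROBE, HIDDEN, FILLERS = 50, 10, 80, 36   # sizes are assumptions

W_q = rng.normal(0.0, 0.1, (HIDDEN, MESSAGE + PROBE))  # (message, probe) -> hidden
W_a = rng.normal(0.0, 0.1, (FILLERS, HIDDEN))          # hidden -> filler scores

def answer_query(message, probe):
    """Return the filler the decoder scores highest for a given role probe."""
    h = np.tanh(W_q @ np.concatenate([message, probe]))
    return int(np.argmax(W_a @ h))

# Probe the message for, e.g., an AGENT role (one-hot probe is an assumption).
probe = np.zeros(PROBE)
probe[0] = 1.0
filler = answer_query(np.zeros(MESSAGE), probe)
```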
A key feature of the model is the suggestion that word prediction during comprehension may provide a principal training signal for production. The word-selection component of production may be viewed as predicting what another person would say to convey a particular meaning. While learning comprehension, the network was also trained to predict the next word in the input, relying on varying amounts of context. When given a complete message as context and selecting the most strongly predicted word at each step, the network correctly produced 86.5% of the sentences. This appears to be significantly better than the results obtained in previous work (Dell, Chang, & Griffin, 1999). We have begun to evaluate the model's pattern of errors, its preferences among various forms of embedding, and its sensitivity to syntactic priming.
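On this view, production reduces to greedy decoding from the prediction network: condition the recurrent state on the complete message, emit the most strongly predicted word, and feed it back as the next input. The following is a minimal sketch under stated assumptions; the weight names, the end-of-sentence token, and the use of the 10-word limit as a loop bound are ours, and training is again omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
VOCAB, HIDDEN, MESSAGE = 36, 100, 50   # sizes are assumptions
EOS = 0                                # hypothetical end-of-sentence token

W_in   = rng.normal(0.0, 0.1, (HIDDEN, VOCAB))    # previous word -> hidden
W_rec  = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))   # recurrent context
W_msg  = rng.normal(0.0, 0.1, (HIDDEN, MESSAGE))  # message conditions the state
W_pred = rng.normal(0.0, 0.1, (VOCAB, HIDDEN))    # hidden -> next-word scores

def produce(message, max_len=10):
    """Greedily emit the most strongly predicted word at each step."""
    h = np.tanh(W_msg @ message)             # complete message as initial context
    word, sentence = EOS, []
    for _ in range(max_len):                 # sentences capped at 10 words
        x = np.zeros(VOCAB)
        x[word] = 1.0                        # feed back the previous word
        h = np.tanh(W_in @ x + W_rec @ h + W_msg @ message)
        word = int(np.argmax(W_pred @ h))    # strongest prediction wins
        if word == EOS:
            break
        sentence.append(word)
    return sentence

words = produce(np.zeros(MESSAGE))           # untrained weights: placeholder run
```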
References
Dell, G. S., Chang, F., and Griffin, Z. M. (1999). Connectionist models of language production: Lexical access and grammatical encoding. Cognitive Science, 23(4), 517-542.
Miikkulainen, R., and Dyer, M. G. (1991). Natural language processing with modular PDP networks and distributed lexicon. Cognitive Science, 15, 343-399.
Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46, 77-105.
St. John, M. F., and McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46, 217-257.