Computational modeling of infant vocal development

We are building computational models to understand the mechanisms underlying typical and atypical early vocal motor development. In most cases, the models consist of neural networks that control the muscles of an articulatory synthesizer, which in turn models the mechanics of the human vocal tract. The networks learn through self-organized plasticity that, in some versions, is strengthened or weakened depending on whether the model has received reinforcement for the sounds it has produced. Through this learning, the models develop new vocal skills, such as producing more speech-like sounds and imitating sounds they hear. We have been investigating what happens when reinforcement comes from different sources, social (from another person) versus intrinsic (hearing one's own sounds), and we have been exploring different network architectures, including connectionist-style neural networks and spiking neural networks. In future work, we hope to build versions of these models that address how various disorders and impairments, such as autism and hearing impairment, affect vocal motor development. A small illustrative sketch of the reward-modulated learning scheme appears after the publication list below. Here are some of our computational modeling publications:

Warlaumont, A. S., & Finnegan, M. F. (2016). Learning to Produce Syllabic Speech Sounds via Reward-Modulated Neural PlasticityPLOS ONE11(1), e0145096.

Warlaumont, A. S., Westermann, G., Buder, E. H., & Oller, D. K. (2013). Prespeech motor learning in a neural network using reinforcement. Neural Networks, 38, 64-95.

Warlaumont, A. S., Westermann, G., & Oller, D. K. (April, 2011). Self-production facilitates and adult input interferes in a neural network model of infant vowel imitation. In D. Kazakov and G. Tsoulas (Eds.), Proceedings of the AISB 2011 Symposium on Computational Models of Cognitive Development. Society for the Study of Artificial Intelligence and the Simulation of Behaviour, 8-12.

Human vocal development in naturalistic settings

Of course, understanding early vocal development also requires studying how real humans behave. Lately, we have worked primarily with day-long audio recordings of children (collected using the LENA system). Day-long, longitudinal samples of children's vocalizations capture the full range of activities and contexts infants experience as well as the full range of sounds they produce. They also let us examine how small local effects might add up to create real differences in overall behavior. For example, one question we are asking is how social responses contingent on infant behavior influence children's speech development, and how this process may differ for children with autism and for children from different socioeconomic backgrounds. Working with large, naturalistic datasets poses many technical challenges, so a large part of our effort in this area is focused on identifying appropriate automated analysis methods.
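
As a rough illustration of the kind of automated analysis this involves, the sketch below estimates how often an adult vocalization begins shortly after an infant vocalization ends, given a simplified list of labeled segments. The Segment format is made up for this example (it is not LENA's actual output format or our analysis pipeline), and the speaker codes and two-second window are only meant to suggest how contingent adult responses could be tallied across a day-long recording.

    from dataclasses import dataclass

    @dataclass
    class Segment:
        speaker: str   # e.g., "CHN" (child), "FAN"/"MAN" (female/male adult), "TVN" (television)
        start: float   # onset in seconds from the start of the recording
        end: float     # offset in seconds

    def adult_response_rate(segments, window=2.0):
        """Proportion of child segments followed by an adult segment that starts
        within `window` seconds of the child segment's end."""
        segments = sorted(segments, key=lambda s: s.start)
        child_count, responded = 0, 0
        for i, seg in enumerate(segments):
            if seg.speaker != "CHN":
                continue
            child_count += 1
            for later in segments[i + 1:]:
                if later.start - seg.end > window:
                    break
                if later.speaker in ("FAN", "MAN"):
                    responded += 1
                    break
        return responded / child_count if child_count else float("nan")

    # Made-up example: the first child vocalization gets an adult response, the second does not.
    example = [Segment("CHN", 10.0, 10.8), Segment("FAN", 11.5, 13.0),
               Segment("CHN", 60.0, 60.5), Segment("TVN", 61.0, 62.0)]
    print(adult_response_rate(example))  # 0.5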

Here are some papers and presentations on our work with human children:

Warlaumont, A. S., Richards, J. A., Gilkerson, J., & Oller, D. K. (2014). A social feedback loop for speech development and its reduction in autism. Psychological Science. doi: 10.1177/0956797614531023

Oller, D. K., Buder, E. H., Ramsdell, H. L., Warlaumont, A. S., Chorna, L., & Bakeman, R. (2013). Functional flexibility of infant vocalization and the emergence of language. Proceedings of the National Academy of Sciences of the United States of America.

Warlaumont, A. S., Oller, D. K., Buder, E. H., Dale, R., & Kozma, R. (2010). Data-driven automated acoustic analysis of human infant vocalizations using neural network tools. Journal of the Acoustical Society of America, 127(4), 2563-2577.

Evolution of vocal signals

We are building a model of the evolution of reflexive vocal signals. The model contains both signalers and receivers: signalers use genetically encoded neural networks to set muscle values in an articulatory synthesizer, generating vocal signals, and receivers use genetically encoded neural networks to try to decode those signals. We hope this work will help us understand how physiological constraints shape signal evolution, and we expect that a computational model of reflexive signal production will eventually be useful for building more complete models of infant vocal learning. A toy sketch of the signaler/receiver setup appears after the reference below. Here's a paper describing our work thus far:

Warlaumont, A. S., & Olney, A. M. (2015). Evolution of reflexive signals using a realistic vocal tract modelAdaptive Behavior, 23(4), 183-205.

Our work on infant vocal development also informs our understanding of the evolution of vocal signals. By analyzing the forms and functions of early infant sounds and comparing them to the forms and functions of nonhuman sounds, we hope to gain an understanding of what makes human communication unique and which nonhuman primate sounds are most closely related to human speech. Here is a paper on this work. Collaborators: Kim Oller and his team at the University of Memphis.