Mining Linguistic Content from Vast Audio and Video Archives for Multimodal Poetry

By Vian Rasheed, 14 November 2019
Abstract (in English)

This 20-minute presentation highlights research conducted as a Fellow in the MIT Open Documentary Lab: the development of a methodology and software for parsing linguistic and semantic information from vast quantities of audio and video files for playback and synchronization across networked computers. The presentation focuses on the expressive potential of this methodology to create new forms of multimodal digital poems. The goal of the research is to extend recent advances in computational text analysis of written materials to audio and video media, for use in a variety of language-centered media production contexts. The methodology and software make it possible to parse vast quantities of audio and video files for topics, parts of speech, phonetic content, sentiment, passive/active voice, and language patterns, and then play back the matching video or audio content for consideration in an aesthetic context. Intriguing queries can be saved and sequenced for playback as a poetic remix of linguistic patterns on one or multiple monitors. For instance, an e-lit poet can build a database of hundreds of audio recordings of poems from the Poetry Foundation and parse the recordings for moments of alliteration; the results can then be played back as a generative remix of alliterations across decades of poems and poets. This new composition could be sequenced across multiple computers in a gallery setting and spatialized with speakers in different locations in a room: imagine the alliteration example above coming from a dozen locations in a gallery, the samples sometimes playing in sequence around the room, sometimes all at once with the same phrase, and at other times with pairs of speakers triggering simultaneously.
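The alliteration query described above can be sketched in a few lines, assuming each recording has already been transcribed into word-level (word, start_sec, end_sec) tuples by a speech recognizer or forced aligner. The function and data here are illustrative, not the author's actual software, and first-letter matching stands in for true phonetic onset comparison:

```python
def find_alliterations(words, min_run=3):
    """Return (start_sec, end_sec) spans where at least `min_run`
    consecutive words begin with the same letter — a rough proxy
    for alliteration over a timestamped transcript."""
    spans = []
    run = [words[0]] if words else []
    for prev, cur in zip(words, words[1:]):
        if cur[0][:1].lower() == prev[0][:1].lower():
            run.append(cur)          # extend the current alliterative run
        else:
            if len(run) >= min_run:  # close out a long-enough run
                spans.append((run[0][1], run[-1][2]))
            run = [cur]
    if len(run) >= min_run:          # handle a run ending at the transcript's end
        spans.append((run[0][1], run[-1][2]))
    return spans

# Hypothetical transcript fragment with word-level timings (seconds).
transcript = [
    ("peter", 0.0, 0.4), ("piper", 0.4, 0.8), ("picked", 0.8, 1.2),
    ("a", 1.2, 1.3), ("peck", 1.3, 1.7),
]
print(find_alliterations(transcript))  # → [(0.0, 1.2)]
```

Each returned span is a time range that can be cut from the source audio and queued for playback, which is what makes the saved query replayable as a remix.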
The poetic possibilities that open up between the choice of material for the database, the choice of linguistic and semantic parsing, and the choice of spatial configuration (how many playback devices, and where they are located) can foster intriguing new forms of e-literature. To make the concepts concrete, I will illustrate the research with video documentation of a recent digital poem I created that uses YouTube typography tutorials as the source material for a sixteen-computer composition. This humorous work demonstrates the multimodal aspect of the research: for example, when the YouTube tutorial database is parsed for a part of speech such as a superlative adjective, the word the author is constructing in their Adobe Illustrator interface is displayed on screen, creating an aural and visual combination with both sonic and graphic impact. The presentation will provide an overview of the process for making this form of digital poem as well as demonstrate creative applications of the research.
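The superlative-adjective query can likewise be sketched as a filter over a POS-tagged, timestamped transcript. This is a minimal illustration, assuming the tutorials have already been tagged with Penn Treebank labels (where "JJS" marks superlative adjectives); the data and function names are hypothetical, not the author's implementation:

```python
def clips_matching_tag(tagged_words, tag="JJS"):
    """Return (word, start_sec, end_sec) for every word carrying the
    given Penn Treebank POS tag, ready to cue as audio/video excerpts."""
    return [(w, s, e) for w, t, s, e in tagged_words if t == tag]

# Hypothetical tagged transcript fragment from a typography tutorial.
tagged = [
    ("this", "DT", 0.0, 0.2), ("is", "VBZ", 0.2, 0.4),
    ("the", "DT", 0.4, 0.5), ("boldest", "JJS", 0.5, 1.0),
    ("font", "NN", 1.0, 1.4),
]
print(clips_matching_tag(tagged))  # → [('boldest', 0.5, 1.0)]
```

Because the timestamps index into the source video as well as the audio, each matched clip carries the on-screen Illustrator footage along with the spoken word, which is what produces the combined sonic and graphic effect described above.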