speech-to-text

Description (in English)

Talking Cure is an installation that includes live video processing, speech recognition, and a dynamically composed sound environment. It is about seeing, writing, and speaking: about word pictures, the gaze, and cure. It works with the story of Anna O., the patient of Joseph Breuer who gave him and Freud the concept of the "talking cure" as well as the word pictures to substantiate it.

The reader enters a space with a projection surface at one end and a high-backed chair, facing it, at the other. In front of the chair are a video camera and microphone. The video camera's image of the person in the chair is displayed, as text, on the screen. This "word picture" display is formed by reducing the live image to three colors, and then using these colors to determine the mixture between three color-coded layers of text. One of these layers is from Joseph Breuer's case study of Anna O. Another layer consists of the words "to torment" repeated, one of the few direct quotations attributed to Anna in the case study. The third layer, which becomes visible only when a person is in the chair, reworks Anna's snake hallucinations through the story of the Gorgon Medusa, reconfiguring the analytic gaze.

Speaking into the microphone triggers a speech-to-text engine that replaces Anna's words with what it (mis)understands the participant to have said. What is said into the microphone is also recorded, and becomes part of a sound environment that includes recordings of Breuer's words, Anna's words, our words, and all that has been spoken over the length of the installation.

Others in the space observe the person in the chair through word pictures on the screen. Readers move their bodies at first to create visual effects, and then to achieve textual ones, creating new reading experiences for themselves and others in the room. Movements range from slowly moving an extended arm to recreate left-to-right reading, to head or hand rotation seeking evocative neologisms at the mobile textual borders within the image.

The video processing technique was created by Utterback, and has been exhibited separately as Written Forms. The sound environment was designed and implemented by Castiglia, and Nathan Wardrip-Fruin implemented the speech-to-text. Talking Cure was first presented at the 2002 Electronic Literature Organization symposium at UCLA. I have also presented it as a performance/reading, cycling verbally between the layers of text while my image is projected as a different textual mixture on a screen.

(Source: Author's website.)
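The word-picture rendering described above can be sketched in a few lines: each pixel is snapped to the nearest of three reference colors, and that choice selects which of three text layers supplies the character at that position. This is a minimal illustration of the idea, not the installation's actual code; the palette, layer texts, and function names are invented for the example.

```python
# Sketch: map each pixel to the nearest of three reference colors,
# then draw the character from the corresponding text layer.
# The palette and layer texts here are illustrative placeholders.

def nearest_layer(pixel, palette):
    """Return the index of the palette color closest to `pixel` (RGB)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(palette)), key=lambda i: dist(pixel, palette[i]))

def word_picture(frame, layers, palette):
    """Render a frame (2D grid of RGB tuples) as lines of text.

    For each cell, the nearest palette color decides which text
    layer contributes the character at that position."""
    rows = []
    for y, row in enumerate(frame):
        chars = []
        for x, pixel in enumerate(row):
            layer = layers[nearest_layer(pixel, palette)]
            line = layer[y % len(layer)]
            chars.append(line[x % len(line)])
        rows.append("".join(chars))
    return rows

# Illustrative three-layer setup: dark pixels pull from one text,
# mid tones from another, bright pixels from a third.
palette = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
layers = [
    ["case study text here"],
    ["to torment to torment"],
    ["medusa gorgon retelling"],
]
frame = [[(10, 10, 10), (250, 250, 250)],
         [(120, 130, 125), (0, 0, 0)]]
for line in word_picture(frame, layers, palette):
    print(line)
```

In a live version the frame would come from the camera and be refreshed continuously, so a body moving through the image shifts the boundaries between the three texts in real time.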

Contributors note

with: Camille Utterback, Clilly Castiglia, and Nathan Wardrip-Fruin.

Description (in English)

Simon Biggs, with Mark Shovman, developed this virtual interactive artwork in response to a commission from the Poetry Beyond Text project. Tower is inspired by the story of the Tower of Babel. Inter-subjective relations are central to the work, which evokes the idea of first-, second- and third-person perspectives.

Tower creates an immersive 3D textual environment combining visualisation, physical interaction, speech recognition and predictive text algorithms. Viewers (or inter-actors) occupy one of three roles: central inter-actor, wearing a VR head-mounted display; one of several inter-actors, wearing 3D spectacles; or spectator, standing outside the interactive zone. The central inter-actor is located at the vertiginous pinnacle of a virtual spiral word structure. When this inter-actor speaks, their spoken words appear to float from their mouth and join the spiralling history of previously spoken words. As each uttered word emerges, other words, predicted on the basis of statistical frequency within a textual corpus, spring from the spoken word. The second-person inter-actors see words appearing from the first-person inter-actor's mouth and the spiral gradually growing, with the first-person inter-actor at its pinnacle, while the third-person observers stand outside the interactive zone, observing the tableau. As it grows, the spiral comes to resemble a Tower of Babel composed of words, spoken and potential.

(Source: Poetry Beyond Text)
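The prediction step, words "predicted on the basis of statistical frequency within a textual corpus", can be sketched as a simple bigram frequency model: count which words follow which in a corpus, then propose the most frequent successors of each spoken word. This is an assumed, minimal reading of that description, not the project's actual algorithm; the toy corpus and names are placeholders.

```python
from collections import Counter, defaultdict

def build_bigrams(corpus_words):
    """Count, for each word in the corpus, which words follow it."""
    follows = defaultdict(Counter)
    for a, b in zip(corpus_words, corpus_words[1:]):
        follows[a][b] += 1
    return follows

def predict(follows, spoken_word, k=3):
    """Return up to k most frequent successors of the spoken word."""
    return [w for w, _ in follows[spoken_word].most_common(k)]

# Toy corpus standing in for whatever text the system was trained on.
corpus = "the tower of babel the tower of words the city of words".split()
follows = build_bigrams(corpus)
print(predict(follows, "of"))  # → ['words', 'babel']
```

Each recognized utterance would query a table like this, and the returned candidates are the "potential" words that spring from the spoken one.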

Technical notes

Tower requires a high-powered gaming-class PC with multi-screen capability, two instances of 3D stereo projection, matching 3D spectacles, a head-mounted display system, a spatial tracking system, a wireless microphone optimised for voice recognition, and Windows 7 or XP.

Description (in English)

"Utter is a new interactive performance work that employs computer speech recognition, motion sensing and digital memory to create an adaptive linguistic palimpsest. The system records speech and the location, movement and orientation of the speaker, using this data to create a dynamic display of texts that can interact with one another" (author-submitted abstract). Older utterances appear darker, smaller and further away whilst recent utterances appear larger, brighter and closer. The actions of the speaker determine the behaviour of the texts. Recorded utterances can recombine with one another, employing structural grammars to create new texts. Grammatical elements can migrate through the emergent 3D ecology of texts and thus through time. Utter engages the performative through the transformative power of language and suggests a system of Chinese-whispers constituted as textual recombinance and migration" (author-submitted abstract).

Technical notes

Requires an Intel-based Mac Pro computer, video projection, a live FireWire video camera, a wireless microphone optimised for speech recognition, and custom software by the artist.