speech-to-text

Description (in English)

Talking Cure is an installation that includes live video processing, speech recognition, and a dynamically composed sound environment. It is about seeing, writing, and speaking: about word pictures, the gaze, and cure. It works with the story of Anna O., the patient of Joseph Breuer who gave him and Freud the concept of the "talking cure" as well as the word pictures to substantiate it.

The reader enters a space with a projection surface at one end and a high-backed chair, facing it, at the other. In front of the chair are a video camera and microphone. The video camera's image of the person in the chair is displayed, as text, on the screen. This "word picture" display is formed by reducing the live image to three colors, and then using these colors to determine the mixture between three color-coded layers of text. One of these layers is from Joseph Breuer's case study of Anna O. Another layer consists of the words "to torment" repeated, one of the few direct quotations attributed to Anna in the case study. The third layer, which becomes visible only when a person is in the chair, reworks Anna's snake hallucinations through the story of the Gorgon Medusa, reconfiguring the analytic gaze.

Speaking into the microphone triggers a speech-to-text engine that replaces Anna's words with what it (mis)understands the participant to have said. What is said into the microphone is also recorded, and becomes part of a sound environment that includes recordings of Breuer's words, Anna's words, our words, and all that has been spoken over the length of the installation.

Others in the space observe the person in the chair through word pictures on the screen. Readers move their bodies at first to create visual effects, and then to achieve textual ones, creating new reading experiences for themselves and others in the room. Movements range from slowly moving an extended arm to recreate left-to-right reading, to head or hand rotation seeking evocative neologisms at the mobile textual borders within the image.

The video processing technique was created by Utterback, and has been exhibited separately as Written Forms. The sound environment was designed and implemented by Castiglia, and Nathan Wardrip-Fruin implemented the speech-to-text. Talking Cure was first presented at the 2002 Electronic Literature Organization symposium at UCLA. I have also presented it as a performance/reading, cycling verbally between the layers of text while my image is projected as a different textual mixture on a screen.

(Source: Author's website.)
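The word-picture rendering described above can be sketched in a few lines: each pixel is snapped to the nearest of three reference colors, and that choice selects which of three text layers supplies the character at that position. This is a minimal illustration of the idea, not the installation's actual code; the palette, layer texts, and function names are invented for the example.

```python
# Sketch: map each pixel to the nearest of three reference colors,
# then draw the character from the corresponding text layer.
# The palette and layer texts here are illustrative placeholders.

def nearest_layer(pixel, palette):
    """Return the index of the palette color closest to `pixel` (RGB)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(palette)), key=lambda i: dist(pixel, palette[i]))

def word_picture(frame, layers, palette):
    """Render a frame (2D grid of RGB tuples) as lines of text.

    For each cell, the nearest palette color decides which text
    layer contributes the character at that position."""
    rows = []
    for y, row in enumerate(frame):
        chars = []
        for x, pixel in enumerate(row):
            layer = layers[nearest_layer(pixel, palette)]
            line = layer[y % len(layer)]
            chars.append(line[x % len(line)])
        rows.append("".join(chars))
    return rows

# Illustrative three-layer setup: dark pixels pull from one text,
# mid tones from another, bright pixels from a third.
palette = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
layers = [
    ["case study text here"],
    ["to torment to torment"],
    ["medusa gorgon retelling"],
]
frame = [[(10, 10, 10), (250, 250, 250)],
         [(120, 130, 125), (0, 0, 0)]]
for line in word_picture(frame, layers, palette):
    print(line)
```

In a live version the frame would come from the camera and be refreshed continuously, so a body moving through the image shifts the boundaries between the three texts in real time.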

Contributors note

with: Camille Utterback, Clilly Castiglia, and Nathan Wardrip-Fruin.

Description (in English)

Simon Biggs, with Mark Shovman, developed this virtual interactive artwork in response to a commission from the Poetry Beyond Text project. Tower is inspired by the story of the Tower of Babel. Inter-subjective relations are central to the work, which evokes the idea of first-, second- and third-person perspectives.

Tower creates an immersive 3D textual environment combining visualisation, physical interaction, speech recognition and predictive text algorithms. Viewers (or inter-actors) occupy one of three roles: central inter-actor, wearing a VR head-mounted display; one of several inter-actors, wearing 3D spectacles; or spectator, standing outside the interactive zone. The central inter-actor is located at the vertiginous pinnacle of a virtual spiral word structure. When this inter-actor speaks, their spoken words appear to float from their mouth and join the spiralling history of previously spoken words. As each uttered word emerges, other words, predicted on the basis of statistical frequency within a textual corpus, spring from the spoken word. The second-person inter-actors see words appearing from the first-person inter-actor's mouth and the spiral gradually growing, with the first-person inter-actor at its pinnacle, while the third-person observers stand outside the interactive zone, observing the tableau. As it grows, the spiral comes to resemble a Tower of Babel composed of words, spoken and potential.

(Source: Poetry Beyond Text)
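The prediction step, words "predicted on the basis of statistical frequency within a textual corpus", can be sketched as a simple bigram frequency model: count which words follow which in a corpus, then propose the most frequent successors of each spoken word. This is an assumed, minimal reading of that description, not the project's actual algorithm; the toy corpus and names are placeholders.

```python
from collections import Counter, defaultdict

def build_bigrams(corpus_words):
    """Count, for each word in the corpus, which words follow it."""
    follows = defaultdict(Counter)
    for a, b in zip(corpus_words, corpus_words[1:]):
        follows[a][b] += 1
    return follows

def predict(follows, spoken_word, k=3):
    """Return up to k most frequent successors of the spoken word."""
    return [w for w, _ in follows[spoken_word].most_common(k)]

# Toy corpus standing in for whatever text the system was trained on.
corpus = "the tower of babel the tower of words the city of words".split()
follows = build_bigrams(corpus)
print(predict(follows, "of"))  # → ['words', 'babel']
```

Each recognized utterance would query a table like this, and the returned candidates are the "potential" words that spring from the spoken one.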

Technical notes

Tower requires a high-powered gaming-class PC with multi-screen capability, two instances of 3D stereo projection, matching 3D spectacles, a head-mounted display system, a spatial tracking system, a wireless microphone optimised for voice recognition, and Windows 7 or XP.

Description (in English)

"Utter is a new interactive performance work that employs computer speech recognition, motion sensing and digital memory to create an adaptive linguistic palimpsest. The system records speech and the location, movement and orientation of the speaker, using this data to create a dynamic display of texts that can interact with one another" (author-submitted abstract). Older utterances appear darker, smaller and further away whilst recent utterances appear larger, brighter and closer. The actions of the speaker determine the behaviour of the texts. Recorded utterances can recombine with one another, employing structural grammars to create new texts. Grammatical elements can migrate through the emergent 3D ecology of texts and thus through time. Utter engages the performative through the transformative power of language and suggests a system of Chinese-whispers constituted as textual recombinance and migration" (author-submitted abstract).

Technical notes

Requires an Intel-based Mac Pro computer, video projection, a live FireWire video camera, a wireless microphone optimised for speech recognition, and custom software by the artist.