The interdisciplinary Science Data Center for Literature (SDC4Lit) addresses the demands that net literature and born-digital archival material place on archiving, research and reading. Its main goal is to implement appropriate solutions for a sustainable data lifecycle, both for the archive and for research, including introductory use in university and school teaching. The focus is on establishing distributed long-term repositories for net literature and born-digital archival material and on developing a research platform. The project and its cooperation partners will regularly expand the repositories, which, once SDC4Lit is in regular operation, will serve as a hub for harvesting various forms of net literature. The research platform will support computer-assisted work with the archived material. Because a repository structure that integrates collecting, archiving and analysis can only be built through interdisciplinary collaboration, the project brings together partners with expertise in archives, supercomputing, natural language processing and digital humanities: the German Literature Archive (Deutsches Literaturarchiv, DLA), with a focus on archiving and preservation; the High Performance Computing Center Stuttgart (HLRS), with a focus on computing; and the Institute for Natural Language Processing and the Institute for Literary Studies at the University of Stuttgart, with a focus on NLP, cultural and literary history, and digital humanities.
An important task of the project is modeling net literature and born-digital literature. This will initially be done in an example-oriented way, working with an existing corpus of net literature and with examples from the large born-digital collection at the DLA. The modeling is underpinned by research on both the technical and the poetological challenges of digital, non-digital and post-digital literature, for example on questions of genre or on computational approaches to net literature and literary blogs as digital, networked objects.
In addition to digital objects and their metadata, the research data that accrue are also stored sustainably. These research data fall into two groups. First, there are data generated in the course of the project's own work, especially data used by regular services on the platform, such as a named entity recognition model trained on the archived material. Second, the repository should allow users of the research platform to store the research data they generate in a structured way and make them available for further research. The connection between archival repository, research platform and research data repository follows established research data management practice, in particular the FAIR principles (findable, accessible, interoperable, reusable). It aims to support a sustainable research data lifecycle for archivists and researchers working with electronic literature (on the web) and with born-digital literature archived at the DLA and, potentially, at future partner institutions.