  • * at the end of a keyword allows wildcard searches
  • " quotes can be used to search for phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) set precedence (grouping)
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop (how many positions the words may move)
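The operators above can be combined into compound queries. The following is a minimal sketch that assumes the search box accepts these operators exactly as listed. The helper names (`phrase`, `fuzzy`, `wildcard`, `any_of`, `all_of`, `exclude`) are hypothetical and are not part of any published API. The helpers only build query strings; they do not contact the service.

```python
from typing import Optional


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Quote a phrase; an optional ~N suffix sets the slop."""
    quoted = f'"{text}"'
    return f"{quoted}~{slop}" if slop is not None else quoted


def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a single word to allow an edit distance of N."""
    return f"{word}~{distance}"


def wildcard(prefix: str) -> str:
    """A trailing * matches any keyword starting with the prefix."""
    return f"{prefix}*"


def any_of(*terms: str) -> str:
    """OR the terms together and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """AND the terms together with an explicit + (AND is also the default)."""
    return " + ".join(terms)


def exclude(term: str) -> str:
    """Negate a term with the NOT operator."""
    return f"-{term}"


if __name__ == "__main__":
    # Audio-related repositories or speech corpora, excluding video.
    print(all_of(any_of(wildcard("audio"), phrase("speech corpus")), exclude("video")))
```

Running the script prints `(audio* | "speech corpus") + -video`. The same helpers give `fuzzy("phonetik", 1)` → `phonetik~1` and `phrase("language archive", 2)` → `"language archive"~2`.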
Found 39 result(s)
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.
Ecosounds is a repository of environmental audio recordings. This website facilitates the management, access, visualization, and analysis of environmental acoustic data. It uses the Acoustic Workbench software, which is open source and available on GitHub. The website is run by the QUT Ecoacoustics Research Group to support bioacoustics and ecoacoustics research.
NASA has officially launched a new resource to help the public search and download out-of-this-world images, videos and audio files by keyword and metadata searches from NASA.gov. The NASA Image and Video Library website consolidates imagery spread across more than 60 collections into one searchable location. NASA Image and Video Library allows users to search, discover and download a treasure trove of more than 140,000 NASA images, videos and audio files from across the agency’s many missions in aeronautics, astrophysics, Earth science, human spaceflight, and more. Users can browse the agency’s most recently uploaded files, as well as discover historic and the most popularly searched images, audio files and videos. Other features include:
  • Automatically scales the interface for mobile phones and tablets
  • Displays the EXIF/camera data, including exposure, lens used, and other information, when available from the original image
  • Allows easy public access to high-resolution files
  • All videos include a downloadable caption file
The NASA Image and Video Library’s Application Programming Interface (API) allows automation of imagery uploads for NASA, and gives members of the public the ability to embed content in their own sites and applications. This public site runs on NASA’s cloud-native “infrastructure-as-code” technology, enabling on-demand use in the cloud.
The Australian National Corpus collates and provides access to assorted examples of Australian English text, transcriptions, audio and audio-visual materials. Text analysis tools are embedded in the interface allowing analysis and downloads in *.CSV format.
aviDa is the Research Data Centre (RDC) for audio-visual data from empirical qualitative social research at the Department of General Sociology at the Technische Universität Berlin, developed jointly by the Technische Universität Berlin and the University of Bayreuth. Since 2018, aviDa has aimed to open up and share videographic research data.
The Henry A. Murray Research Archive is Harvard's endowed, permanent repository for quantitative and qualitative research data at the Institute for Quantitative Social Science, and provides physical storage for the entire IQSS Dataverse Network. Our collection comprises over 100 terabytes of data, audio, and video. We preserve in perpetuity all types of data of interest to the research community, including numerical, video, audio, interview notes, and other data. We accept data deposits through this web site, which is powered by our Dataverse Network software.
A place of living memory, the Phonotheque of the MMSH aims to bring together sound-heritage recordings that carry ethnological, linguistic, historical, musicological or literary information about the Mediterranean area. It documents fields little covered by conventional sources, or complements those sources with the point of view of actors or witnesses. The collection holds more than 8000 hours of audio archives, recorded since the late 1950s, spanning all disciplines of the humanities.
The Media Repository is a web-based digital asset management system to store, organize and share digital media files. Images and documents are supported directly, and so are audio and video content. The data can be re-used in other systems. The system manages a variety of file formats and metadata schemes. It stores and organizes media data and helps to manage workflows with them. Public web presentations are possible as well as collaborative work in restricted groups. The Media Repository helps both small teams and larger research projects in the management of media assets and their long-term storage.
The Language Archive at the Max Planck Institute in Nijmegen provides a unique record of how people around the world use language in everyday life. It focuses on collecting spoken and signed language materials in audio and video form along with transcriptions, analyses, annotations and other types of relevant material (e.g. photos, accompanying notes).
The KiezDeutsch-Korpus (KiDKo) was developed by project B6 (PI: Heike Wiese) of the collaborative research centre Information Structure (SFB 632) at the University of Potsdam from 2008 to 2015. KiDKo is a multi-modal digital corpus of spontaneous discourse data from informal, oral peer-group situations in multi- and monoethnic speech communities. KiDKo contains audio data from self-recordings, with aligned transcriptions (i.e., at every point in a transcript, one can access the corresponding area in the audio file). The corpus provides part-of-speech tags as well as an orthographically normalised layer (Rehbein & Schalowski 2013). Another annotation level provides information on syntactic chunks and topological fields. There are several complementary corpora. KiDKo/E (Einstellungen, "attitudes") captures spontaneous data from the public discussion on Kiezdeutsch: it assembles emails and readers' comments posted in reaction to media reports on Kiezdeutsch. KiDKo/E thus provides data on language attitudes, language perceptions, and language ideologies that became apparent in the debate on Kiezdeutsch but frequently related to broader domains such as multilingualism, standard language, language prestige, and social class. KiDKo/LL ("Linguistic Landscape") assembles photos of written language productions in public space from the context of Kiezdeutsch, for instance love notes on walls, park benches, and playgrounds, graffiti in house entrances, and scribbled messages on toilet walls. The corpus contains materials in the following languages: Spanish, Italian, Greek, Kurdish, Swedish, French, Croatian, Arabic, Turkish. The corpus is available online via the Hamburger Zentrum für Sprachkorpora (HZSK) at https://corpora.uni-hamburg.de/secure/annis-switch.php?instance=kidko
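Time-aligned transcription means each stretch of transcript is anchored to a time interval in the recording. This is what allows a transcript position to map to the matching area of the audio. The sketch below shows that mapping in both directions. It assumes a hypothetical in-memory representation in which each segment stores start and end times in seconds and segments do not overlap. This is not KiDKo's actual file format, and the class names `Segment` and `AlignedTranscript` are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    # Hypothetical segment layout: times in seconds from the start of the audio file.
    start: float
    end: float
    text: str


class AlignedTranscript:
    """Maps transcript segments to audio intervals (assumes non-overlapping segments)."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.start)
        self._starts = [s.start for s in self.segments]

    def audio_span(self, index: int) -> Tuple[float, float]:
        """Transcript -> audio: the (start, end) interval of the index-th segment."""
        seg = self.segments[index]
        return seg.start, seg.end

    def segment_at(self, t: float) -> Optional[Segment]:
        """Audio -> transcript: binary-search the segment covering time t, or None."""
        i = bisect_right(self._starts, t) - 1
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i]
        return None


if __name__ == "__main__":
    tr = AlignedTranscript([Segment(0.0, 1.2, "hallo"), Segment(1.2, 2.5, "wie geht's")])
    print(tr.audio_span(0), tr.segment_at(1.5))
```

Sorting by start time and using binary search keeps each audio-to-transcript lookup logarithmic in the number of segments. A time outside every segment, such as a pause after the last one, returns `None`.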
Welcome to the UCLA Phonetics Lab Archive. For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research, of value to scholars, speakers of the languages, and language learners alike. The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic transcriptions, plus scans of original field notes where relevant.
The EVIA Digital Archive Project is a repository of ethnographic video recordings and an infrastructure of tools and systems supporting scholars in the ethnographic disciplines. The project focuses on the fields of ethnomusicology, folklore, anthropology, and dance ethnology.
CUGIR is an active online repository in the National Spatial Data Clearinghouse program. CUGIR provides geospatial data and metadata for New York State, with special emphasis on those natural features relevant to agriculture, ecology, natural resources, and human-environment interactions. To provide the best possible access to geospatial data for New York State, CUGIR coordinates its activities with those of the New York State GIS Clearinghouse.
The Buckeye Corpus of conversational speech contains high-quality recordings from 40 speakers in Columbus OH conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is currently being written.
The "Open Science Resource Atlas 2.0" repository aims to increase the accessibility, improve the quality and extend the reusability of science resources. The repository focuses on the digital sharing of resources of great importance to science and the economy. These include publications, scripts, lectures, 3D models, audio and video recordings, photos, input and output files of various computer programs, databases collecting data from various fields, machines, systems, language corpora and many others. Apart from academics, students and doctoral students, the target group is everyone interested, including entrepreneurs and, importantly and uniquely, people with disabilities, including blind, visually impaired and deaf people.
This project is an open invitation to anyone and everyone to participate in a decentralized effort to explore the opportunities of open science in neuroimaging. We aim to document how much (scientific) value can be generated from a data release: from the publication of scientific findings derived from this dataset, algorithms and methods evaluated on this dataset, and/or extensions of this dataset by acquisition and incorporation of new data. The project involves the processing of acoustic stimuli. Participants were presented with an audio-described version of the classic film "Forrest Gump" while researchers used functional magnetic resonance imaging (fMRI) to capture their brain activity during the processing of language, music, emotions, memories and pictorial representations. In collaboration with various labs in Magdeburg, we acquired and published what is probably the most comprehensive sample of brain activation patterns of natural language processing. Volunteers listened to a two-hour audio movie version of the Hollywood feature film "Forrest Gump" in a 7T MRI scanner. High-resolution brain activation patterns and physiological measurements were recorded continuously. These data have been placed into the public domain and are freely available to the scientific community and the general public.
The Repository of the Faculty of Science is an institutional repository that gathers, permanently stores and provides access to the scientific and intellectual output of the Faculty of Science, University of Zagreb. The objects that can be stored in the repository are research data, scientific articles, conference papers, theses, dissertations, books, teaching materials, images, video and audio files, and presentations. To improve searchability, all materials are described with a predetermined set of metadata.
The Norwegian Polar Institute is a governmental institution for scientific research, mapping and environmental monitoring in the Arctic and the Antarctic. The institute’s Polar Data Centre (NPDC) manages and provides access to scientific data, environmental monitoring data, and topographic and geological map data from the polar regions. The scientific datasets range from human field observations, through in situ and moving sensor data, to remote sensing products. The institute's data holdings also include photographic images, audio and video records.
York Digital Library (YODL) is a University-wide Digital Library service for multimedia resources used in or created through teaching, research and study at the University of York. YODL complements the University's research publications, held in White Rose Research Online and PURE, and the digital teaching materials in the University's Yorkshare Virtual Learning Environment. YODL contains a range of collections, including images, past exam papers, master's dissertations and audio. Some of these are available only to members of the University of York, whilst other material is available to the public. YODL is expanding, with more content being added all the time.
figshare is the research data management (RDM) system at the University. It is a cloud-based data repository that supports multiple file formats. Research data in the form of datasets, code, audio, images and more can be disseminated via the University's figshare. Citations can be traced for datasets (not just the final research output/article), and analytics show who is looking at our research data around the world. figshare enables researchers to store research data securely. The system is user-friendly, easy to access, and shareable with colleagues and collaborators on research projects. Where appropriate, it enables researchers to make research data openly accessible.
Open Research Exeter (ORE) is the University of Exeter's repository for all types of research, including research papers, research data and theses. Research in ORE can be viewed and downloaded freely by anyone, anywhere: researchers, students, industry, business and the wider public. ORE's content includes journal articles, conference papers, working papers, reports, book chapters, videos, audio, images, multimedia research project outputs, raw data and analysed data. ORE's content is securely stored, managed and preserved to ensure free, permanent access.
The UC San Diego Library Digital Collections website gathers two categories of content managed by the Library: library collections (including digitized versions of selected collections covering topics such as art, film, music, history and anthropology) and research data collections (including research data generated by UC San Diego researchers).
The Macaulay Library is the world's largest and oldest scientific archive of biodiversity audio and video recordings. The library's mission is to collect and preserve recordings of each species' behavior and natural history, to help others collect and preserve such recordings, and to actively promote the use of these recordings for diverse purposes spanning scientific research, education, conservation, and the arts. The collection includes archived analog recordings going back to 1929.
The Text Laboratory provides assistance with databases, word lists, corpora and tailored solutions for language technology. We also work on research and development projects alone or in cooperation with others - locally, nationally and internationally. Services and tools: Word and frequency lists, Written corpora, Speech corpora, Multilingual corpora, Databases, Glossa Search Tool, The Oslo-Bergen Tagger, GREI grammar games, Audio files: dialects from Norway and America etc., Nordic Atlas of Language Structures (NALS) Journal, Norwegian in America, NEALT, Ethiopian Language Technology, Access to Corpora