Found 28 result(s)
The Competence Centre IULA-UPF-CC CLARIN manages, maintains and disseminates this catalogue, which provides access to reference information on the use of language technology in projects and studies across different disciplines, especially in the Humanities and Social Sciences. The catalogue organizes information into Areas (disciplines and research topics), Projects (research that uses or has used language technologies), Tasks (performed by the tools), Tools (language technologies), Documentation (articles about the tools and how they are used) and resources such as Corpora (collections of annotated texts) and Lexica (collections of words for different uses).
The Eurac Research CLARIN Centre (ERCC) is a dedicated repository for language data. It is hosted by the Institute for Applied Linguistics (IAL) at Eurac Research, a private research centre based in Bolzano, South Tyrol. The Centre is part of the Europe-wide CLARIN infrastructure, which means that it follows well-defined international standards for (meta)data and procedures and is well embedded in the wider European linguistics infrastructure. The repository hosts data collected at the IAL but is also open to data deposits from external collaborators.
Språkbanken was established in 1975 as a national center located in the Faculty of Arts, University of Gothenburg. The groundbreaking corpus-linguistic research of its founder, Sture Allén, resulted in one of the first large electronic text corpora in a language other than English: one million words of newspaper text. Språkbanken's task is to collect, develop and store (Swedish) text corpora, and to make linguistic data extracted from the corpora available to researchers and to the public.
The Lithuanian Data Archive for Social Sciences and Humanities (LiDA) is a virtual digital infrastructure for the acquisition, long-term preservation and dissemination of SSH data and research resources. All data and research resources are documented in both English and Lithuanian according to international standards. Access to the resources is provided via a Dataverse repository. LiDA curates different types of resources, which are published in catalogues by type: Survey Data, Aggregated Data (including Historical Statistics), Encoded Data (including News Media Studies) and Textual Data. LiDA also holds collections of social sciences and humanities data deposited by Lithuanian science and higher education institutions and by Lithuanian state institutions (Data of Other Institutions). LiDA is hosted by the Centre for Data Analysis and Archiving of Kaunas University of Technology (data.ktu.edu).
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and continually improved at leading institutions in a large and growing number of European countries, with the aim of improving Europe's multilingual competence. CLARIN provides several services, such as access to language data and to tools for analyzing them, the option to deposit research data, and direct access to knowledge on topics related to research on and with language resources. Its main tool is the Virtual Language Observatory, which provides metadata and access to the national CLARIN centers and their data.
The Universitat de Barcelona Digital Repository is an institutional resource containing open-access digital versions of publications related to the teaching, research and institutional activities of the UB's teaching staff and other members of the university community, including research data.
Lithuania became a full member of CLARIN ERIC in January 2015, and soon afterwards the CLARIN-LT consortium was founded by three partner universities: Vytautas Magnus University, Kaunas University of Technology and Vilnius University. The consortium's main goal is to become a CLARIN B centre able to serve language users in Lithuania and Europe by storing language resources and providing access to them.
The CLARIN-D Centre CEDIFOR provides a repository for the long-term storage of resources and metadata. The hosted resources stem from the research of CEDIFOR members and associated research projects, and include software and web services as well as text corpora, lexicons, images and other data.
GAMS is an OAIS-compliant asset management system for the management, publication and long-term archiving of digital resources from the Humanities.
ANPERSANA is the digital library of IKER (UMR 5478), a research centre specializing in Basque language and texts. The online library platform receives and disseminates primary data sources arising from research on Basque language and culture. Two corpora of documents have been published so far. The first is a collection of private letters written in an 18th-century variety of Basque, documented and transcribed into modern standard Basque. The discovery of this collection, named Le Dauphin, has raised new questions about the history and sociology of writing in minority languages, not only in France but across the whole Atlantic Arc. The second corpus is a selection of sound recordings about monodic chant in the Basque Country. The documents were collected as part of PhD research carried out between 2003 and 2012 and comprise a total of 50 hours of interviews with French-speaking and Basque-speaking cultural representatives, conducted either at the informants' workplaces or in public areas. ANPERSANA includes an advanced search engine, and the documents have been indexed and geolocated on an interactive map. The platform is committed to open access, and all resources are freely available under various Creative Commons (CC) licenses.
The University research data repository, BathSPAdata, enables staff to upload their research data to a secure space and to share these data publicly where appropriate, or where funders or publishers require it as a condition. Resources and toolkits for external use can be made available through this platform and used by Schools, policy makers, business and industry, and the cultural sector.
The Language Bank features text and speech corpora with different kinds of annotations in over 60 languages, as well as a selection of tools for working with them, from linguistic analyzers to programming environments. The corpora are also available via web interfaces, and users can be granted permission to download some of them. IP holders can monitor the use of their resources and view user statistics.
Codex Sinaiticus is one of the most important books in the world. Handwritten well over 1600 years ago, the manuscript contains the Christian Bible in Greek, including the oldest complete copy of the New Testament. The Codex Sinaiticus Project is an international collaboration to reunite the entire manuscript in digital form and make it accessible to a global audience for the first time. Drawing on the expertise of leading scholars, conservators and curators, the Project gives everyone the opportunity to connect directly with this famous manuscript.
The program "Humanist Virtual Libraries" distributes heritage documents and pursues research combining expertise in the humanities and computer science. It aggregates several types of digital documents: a selection of facsimiles of Renaissance works digitized in the Centre region and in partner institutions; the Epistemon Textual Database, which offers digital editions in XML-TEI; and transcriptions or analyses of notarial minutes and manuscripts.
clarin:el is the Greek national network of language resources, a nationwide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources (LRs), which aims at increasing access to and augmentation of such resources at a national scale and beyond. It is an open, integrated, secure and interoperable storage, sharing and processing infrastructure for LRs (datasets, tools and processing services) for all domains and disciplines where language plays a critical role. CLARIN EL is implemented in the framework of the CLARIN Attiki national project in support of ESFRI/2006 Research Infrastructures.
The ILC-CNR for CLARIN-IT repository is a library of linguistic data and tools. Its areas include Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; and Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve a range of professional skills and expertise spanning Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
Språkbanken is a collection of Norwegian language technology resources and a national infrastructure for language technology and research. Our mandate is to collect and develop language resources, and to make them available to researchers, students and the ICT industry working on the development of language-based ICT solutions. Språkbanken was established as a language policy initiative, designed to ensure that language technology solutions for Norwegian are developed, thereby preventing domain loss of Norwegian in technology-dependent areas, cf. Mål og meining (Report 35, 2007–2008). As of today, the collection contains resources in both Norwegian Bokmål and Nynorsk, as well as in Swedish, Danish and Norwegian Sign Language (NTS).
The vocabulary of forenames is a simple, multilingual vocabulary (i.e. without hierarchies etc.) documenting, with references or passages, the forenames of persons recorded by the project partners together with their historical and dialectal spelling variants. As a rule, each forename is linked to one or more persons bearing that name. A list of the most frequent forenames between 200 BC and AD 2016 is provided, along with a word-cloud visualisation and a timeline of occurrences.
HunCLARIN is a strategic research infrastructure of Hungary’s leading knowledge centres involved in R&D in speech and language processing. It contains linguistic resources and tools that form the basis of research. The infrastructure obtained the “SKI” (Strategic Research Infrastructure) qualification in 2010 and has been significantly expanded since. Currently comprising 36 members, it includes several general- and specific-purpose text corpora, various language processing tools and analysers, linguistic databases and ontologies. The Research Institute for Linguistics of the Hungarian Academy of Sciences (RIL HAS) was a co-founder of the European CLARIN project, which aims to support humanities and social sciences research with the help of language technology and by making digital linguistic resources more easily available. In line with these goals, HunCLARIN makes the research infrastructures developed by its member centres directly accessible to researchers through a common network entry point. A general goal of the infrastructure is to achieve interoperability among the collected research infrastructures, to enable comparison of the performance of the respective alternatives, and to coordinate different foci in R&D. The coordinator and contact person of the infrastructure is Tamás Váradi, RIL HAS.
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) research infrastructure. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources, along with their descriptive metadata. The integration of the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects in which they were produced. Among the resources currently available in the CLARIN.SI repository are the multilingual MULTEXT-East resources, the CC version of the Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, as well as many other resources for a variety of languages. Furthermore, several REST-based web services are provided for different corpus-linguistic and NLP tasks.
CLARIN-UK is a consortium of centres of expertise involved in research and resource creation involving digital language data and tools. The consortium includes the national library, and academic departments and university centres in linguistics, languages, literature and computer science.