Call for Datasets

How to contribute

This task invites the community to contribute high-quality Question Answering (QA) datasets to expand the TEXT2SPARQL benchmark suite. The goal is to enrich the ecosystem with diverse data sources, particularly focusing on low-resource languages, domain-specific knowledge graphs, and complex query structures.

We encourage submissions of both novel datasets and existing datasets that have not yet been integrated into standard benchmarks.

Participation Guidelines

Datasets must contain pairs of Natural Language (NL) questions and corresponding SPARQL queries that are executable on a specific Knowledge Graph (e.g., DBpedia, Wikidata, or a custom domain KG). A dataset of question/answer pairs may also be accompanied by a Knowledge Graph that has not yet been published.
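As a rough illustration of what "executable" can mean in practice, the sketch below sends a query to a SPARQL endpoint using the `query` parameter over HTTP GET, as defined in the SPARQL 1.1 Protocol. The helper names `build_query_url` and `run_query` are hypothetical, the endpoint URL is only an example, and the code assumes the endpoint URL carries no query string of its own and that the endpoint can return JSON results.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(endpoint: str, sparql: str) -> str:
    """Encode the query as the `query` parameter of a GET request (hypothetical helper)."""
    # Assumes `endpoint` has no existing query string.
    return f"{endpoint}?{urlencode({'query': sparql})}"


def run_query(endpoint: str, sparql: str, timeout: float = 30.0) -> dict:
    """Send the query and return the decoded JSON result document (hypothetical helper)."""
    request = Request(
        build_query_url(endpoint, sparql),
        # Ask for the SPARQL JSON results format; the endpoint may ignore this.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example endpoint only; replace with the endpoint of the target Knowledge Graph.
    result = run_query("https://dbpedia.org/sparql", "ASK { ?s ?p ?o }")
    print(result)
```

A query that raises an HTTP error or returns a malformed result here is a candidate for review before submission; a successful response only shows that the endpoint accepted the query, not that the answer is correct.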

Focus Areas: We are particularly interested in:

Multilingualism: Datasets covering languages other than English.
Complexity: Questions requiring advanced SPARQL features (aggregations, filters, nesting).
Domain Specificity: Datasets focused on specialized domains (e.g., biomedical, legal, finance).

Format: The question/answer pairs should be published in the same format as the TEXT2SPARQL benchmark dataset, as follows:

Dataset

dataset:
  id: https://text2sparql.aksw.org/2025/dbpedia/
  prefix: db25
  defaultNamespace: http://dbpedia.org/
questions:
  - id: 1
    question:
      en: How many unique authors have written science fiction novels?
      es: ¿Cuántos autores únicos han escrito novelas de ciencia ficción?
    query:
      sparql: |
        SELECT (COUNT(DISTINCT ?author) AS ?count) WHERE {
          ?x <http://dbpedia.org/ontology/literaryGenre> <http://dbpedia.org/resource/Science_fiction> .
          ?x <http://dbpedia.org/ontology/author> ?author .
        }
...
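Before submitting, it may help to run a structural check against the layout shown above. The following is a minimal sketch, not an official validator: the function name `validate_dataset` and the specific rules (required keys, unique ids, non-empty texts) are assumptions inferred from the example. It operates on an already-parsed mapping, so loading the YAML file with a parser of your choice is left out.

```python
from __future__ import annotations

from typing import Any


def validate_dataset(data: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the structure looks fine."""
    problems: list[str] = []

    # The top-level "dataset" block carries the three keys shown in the example.
    meta = data.get("dataset")
    if not isinstance(meta, dict):
        problems.append("missing 'dataset' mapping")
    else:
        for key in ("id", "prefix", "defaultNamespace"):
            if not meta.get(key):
                problems.append(f"dataset.{key} is missing or empty")

    questions = data.get("questions")
    if not isinstance(questions, list) or not questions:
        problems.append("'questions' must be a non-empty list")
        return problems

    seen_ids: set = set()
    for index, entry in enumerate(questions):
        label = f"questions[{index}]"
        if not isinstance(entry, dict):
            problems.append(f"{label}: entry must be a mapping")
            continue

        # Assumption: ids are integers or strings and unique within the file.
        qid = entry.get("id")
        if not isinstance(qid, (int, str)):
            problems.append(f"{label}: id must be an integer or string")
        elif qid in seen_ids:
            problems.append(f"{label}: duplicate id {qid!r}")
        else:
            seen_ids.add(qid)

        # "question" maps language codes (en, es, ...) to the question text.
        texts = entry.get("question")
        if not isinstance(texts, dict) or not texts:
            problems.append(f"{label}: 'question' must map language codes to text")
        else:
            for lang, text in texts.items():
                if not isinstance(text, str) or not text.strip():
                    problems.append(f"{label}: empty question text for {lang!r}")

        # "query.sparql" holds the query string.
        query = entry.get("query")
        sparql = query.get("sparql") if isinstance(query, dict) else None
        if not isinstance(sparql, str) or not sparql.strip():
            problems.append(f"{label}: missing query.sparql")

    return problems


# A parsed version of the example entry above.
example = {
    "dataset": {
        "id": "https://text2sparql.aksw.org/2025/dbpedia/",
        "prefix": "db25",
        "defaultNamespace": "http://dbpedia.org/",
    },
    "questions": [
        {
            "id": 1,
            "question": {
                "en": "How many unique authors have written science fiction novels?",
                "es": "¿Cuántos autores únicos han escrito novelas de ciencia ficción?",
            },
            "query": {"sparql": "SELECT (COUNT(DISTINCT ?author) AS ?count) WHERE { ?x ?p ?author . }"},
        }
    ],
}

if __name__ == "__main__":
    for problem in validate_dataset(example):
        print(problem)
```

The check only looks at structure; it does not parse the SPARQL or confirm that the query runs against the target Knowledge Graph.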