SemTech4STLD

Aims and Scope

The rapid growth of online available scientific, technical, and legal data such as patents, reports, articles, etc. has made the large-scale analysis and processing of such data a crucial task. Today, scientists, patent experts, inventors, and other information professionals (e.g., information scientists, lawyers, etc.) contribute to this data every day by publishing articles, writing technical reports, or patent applications.

It is a challenging task to process, analyze, and explore documents due to their length, the use of domain-specific vocabulary, and the complexity introduced by targeting various scientific fields and domains. Documents are semi-structured and cover unstructured textual parts as well as structured parts such as tables, mathematical formulas, diagrams, and domain-specific information such as chemical names, bio-sequences, etc.

Such kind of information brings complexity in processing such documents; however, data is the lifeblood of many applications, and its preservation, analysis, enrichment, and use are key for applications in several domains. In order to benefit from the scientific-technical knowledge present in such documents, e.g., for decision-making or for professional search and analytics, there is an urgent need for analyzing, enriching, and linking such data by employing state-of-the-art Semantic Web technologies and AI methods.

However, as they are heterogeneous and are written using domain-specific terminology applying the existing semantic technologies is not straightforward. To address the challenges mentioned above, Semantic Web Technologies, Natural Language Processing (NLP) techniques, Deep Neural Networks (DNN), and Large Language Models (LLMs) must be leveraged in order to provide efficient and effective solutions for creating easily accessible and machine-understandable knowledge.

Workshop Topics

The workshop accepts contributions in all topics related to semantic web technologies and deep learning focused (but not limited) to:

Data Collection

Leveraging Large Language Models (LLMs) for generating scientific, technical, and legal data.
New tools and systems for capturing scientific, technical, and legal data such as scientific articles, patent publications, etc.
Procedures and tools for storing, sharing, and preserving data.
Collecting and sharing data sets such as benchmarks, etc.
Pipelines and protocols to capture peculiarities from data.
Employing Semantic Web Technologies to represent and preserve sensitive data in terms of ethics, privacy, security, trust, etc.

Novel Semantic Technologies for scientific, technical and legal data

Ontologies and annotation schema to model such data.
Annotation, linking and disambiguation of the data.
Knowledge graph construction.
LLMs to generate metadata, vocabularies, ontologies, and semantic models for specific data.

Applications for patents, scientific, technical and legal data by exploiting semantic technologies

Exploiting knowledge graphs to drive document similarity, question answering, search etc.
Recommender systems.
Semantic content-based retrieval.
Natural language processing techniques for classification, summarization, etc.
Exploratory search using semantic technologies on scientific, technical, and legal data.
Key enabling tools (also based on LLMs) for semantic technologies on specific data and domains.
Applications based on Generative AI and LLMs.
Lessons learned or/and use cases both from academia and industry around semantic models and LLMs for data in specific domains.

Submission

The submissions must be in English and adhere to the CEUR-WS one-column template (see Session 2: The New CEURART Style). The papers should be submitted as PDF files to EasyChair. The review process will be single-blind. Please be aware that at least one author per paper must be registered and attend the workshop to present the work and that ESWC is a 100% in person conference.

We will consider three different submission types:

Full Research Papers (10-12 pages) should be clearly placed with respect to the state of the art and state the contribution of the proposal in the domain of application, even if presenting preliminary results. In particular, research papers should describe the methodology in detail, experiments should be repeatable, and a comparison with the existing approaches in the literature is encouraged.
Short Papers (5-9 pages) should describe significant novel work in progress. Compared to full papers, their contribution may be narrower in scope, be applied to a narrower set of application domains, or have weaker empirical support than that expected for a full paper. Submissions likely to generate discussions in new and emerging areas of legal data are encouraged.
Position or Industry Papers (2-5 pages) should introduce new points of view in the workshop topics or summarize the experience of a group in the field.

Submissions should not exceed the indicated number of pages, including any diagrams and references.

Each submission will be reviewed by three independent reviewers on the basis of relevance for the workshop, novelty/originality, significance, technical quality and correctness, quality and clarity of presentation, quality of references and reproducibility.

The accepted papers will be available on the Workshop website. The proceedings will be published in a CEUR-WS volume and consequently indexed on Google Scholar, DBLP, and Scopus.

Program

SemTech4STLD workshop will take place on June 1st, 2025.

Timing	Content
9:00 9:10	Opening & Welcome
9:10 10:00	Keynote and Q&A on Evaluation Challenges in Using Generative AI for Science & Technical Content Speaker: Prof. Dr. Paul Groth INDE Lab, University of Amsterdam Abstract: Foundation Models show impressive results in a wide-range of tasks on scientific and legal content from information extraction to question answering and even literature synthesis. However, standard evaluation approaches (e.g. comparing to ground truth) often do not seem to work. Qualitatively the results look great but quantitive scores do not align with these observations. In this talk, I discuss the challenges we have faced in our lab in evaluation. I then outline potential routes forward. Short Bio: Paul Groth is Professor of Algorithmic Data Science at the University of Amsterdam where he leads the Intelligent Data Engineering Lab (INDElab). He holds a Ph.D. in Computer Science from the University of Southampton (2007) and has done research at the University of Southern California, the Vrije Universiteit Amsterdam and Elsevier Labs. His research focuses on intelligent systems for dealing with large amounts of diverse contextualized knowledge with a particular focus on web and science applications. This includes research in data provenance, data integration and knowledge sharing. Paul is scientific director of the UvA’s Data Science Center. Additionally, he is co-scientific director of two Innovation Center for Artificial Intelligence (ICAI) labs: The AI for Retail (AIR) Lab - a collaboration between UvA and Ahold Delhaize; and the Discovery Lab - a collaboration between Elsevier, the University of Amsterdam and VU University Amsterdam. Previously, Paul led the design of a number of large scale data integration and knowledge graph construction efforts in the biomedical domain. Paul was co-chair of the W3C Provenance Working Group that created a standard for provenance interchange. He has also contributed to the emergence of community initiatives to build a better scholarly ecosystem including altmetrics and the FAIR data principles. Paul is co-author of “Provenance: an Introduction to PROV” and “The Semantic Web Primer: 3rd Edition” as well as numerous academic articles. You can find him on twitter: @pgroth .
10:00 10:15 Session I	Paper I: Enabling Natural Language Access to BIM Models with AI and Knowledge Graphs Andrea Ibba, Ruben Alonso and Diego Reforgiato Recupero, (12 min + 5 Q&A)
10:30 11:00	Coffee Break
11:00 12:25 Session II	Paper II: Biomedical Entity Linking with Triple-aware Pre-Training, Xi Yan, Cedric Möller and Ricardo Usbeck, (12 min + 5 Q&A) Paper III: Evaluating LLMs for Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning Davide Buscaldi, Danilo Dessì, Francesco Osborne, Davide Piras and Diego Reforgiato Recupero, (16 min + 5 Q&A) Paper IV: Benchmarking Large Language Models for Sustainable Development Goals Classification: Evaluating In-Context Learning and Fine-Tuning Strategies Andrea Cadeddu, Alessandro Chessa, Vincenzo De Leo, Gianni Fenu, Enrico Motta, Francesco Osborne, Diego Reforgiato Recupero, Angelo Salatino and Luca Secchi, (12 min + 5 Q&A) Paper V: Taming Hallucinations: A Semantic Matching Evaluation Framework for LLM-Generated Ontologies Nadeen Fathallah, Steffen Staab and Alsayed Algergawy, (16 min + 5 Q&A) Paper V & VI (Posters): Context-Aware Explanations: Leveraging Knowledge Graphs for Adaptive Explainability in Dynamic Environments Erick Mendez Guzman, Rima Dessi', Alya Alshaami, Amna Alowais, Hamda Alhammadi, Nada Alzarooni, Weam N.A Jarbou and Zarak Khan Leveraging Knowledge Graphs and Generative AI for Augmented Research Paper Retrieval Rima Dessi', Erick Mendez Guzman, Alya Alshaami, Amna Alowais, Hamda Alhammadi, Nada Alzarooni, Weam N.A Jarbou and Zarak Khan
12:25 12:30	Closing Remarks

Committees

Workshop Chairs

Rima Dessi' - Higher College of Technologies (United Arab Emirates)
Jeenu Joy - FIZ-Karlsruhe (Germany)
Danilo Dessi' - University of Sharjah (United Arab Emirates)
Francesco Osborne - Knowledge Media Institute - The Open University (United Kingdom)
Hidir Aras - FIZ-Karlsruhe (Germany)

Program Committee (To be confirmed)

Rubén Alonso - R2M Solution Srl (Italy)
Ahmad Alrifai - FIZ Karlsruhe, KIT-AIFB (Germany)
Pablo Calleja - UPM Ontology Engineering Group (Spain)
Mathieu D'Aquin - LORIA, University of Lorraine (France)
Susmita Gangopadhyay - GESIS Leibniz Institute for the Social Sciences (Germany)
Rene Hackl-Sommer - DeepL SE (Germany)
Inma Hernandez - University of Seville (Spain)
Fabian Hoppe - Vrije Universiteit Amsterdam (The Netherlands)
M. Taimoor Khan - GESIS Leibniz Institute for the Social Sciences (Germany)
Mirko Marras - University of Cagliari (Italy)
Giacomo Medda - University of Cagliari (Italy)
Angelo Antonio Salatino - The Open University (United Kingdom)
Lise Stork - Vrije Universiteit Amsterdam (The Netherlands)
Sabine Wehnert - Otto-von-Guericke-Universität Magdeburg (Germany)
Lei Zhang - FIZ Karlsruhe – Leibniz Institute for Information Infrastructure (Germany)

Third International Workshop on
Semantic Technologies and Deep Learning Models for Scientific,
Technical and Legal Data
SemTech4STLD
June 1st, 2025 / Portoroz, Slovenia

held at ESWC 2025

(Photo: Getty Images)

Aims and Scope

Abstract deadline

February 28th, 2025

Paper deadline

March 6th, 2025

March 16th, 2024

Notifications

April 3rd, 2025

April 11th, 2025

Camera-ready Paper

April 24th, 2025