Runtime Verification of Computer Vision Deep Neural Networks against Symbolic Constraints

For Degree:
Contact Person:
Status: Available

Abstract

Recent work has introduced simple techniques to evaluate the compliance of black-box deep neural networks (DNNs) with symbolic rules. This thesis should investigate and quantitatively compare how compliant different computer vision DNNs are with symbolic rules on real-world datasets, e.g., from the automated driving domain. The goal is to assess to what extent existing verification testing techniques for DNNs are suitable for uncovering remaining issues in a model's learned knowledge. Finally, the approach shall be evaluated as a runtime verification setup for DNNs that can be installed post hoc and raises an alarm in case of implausible outputs.

Problem Statement

Deep neural networks are broadly used in computer vision tasks, but are still too unreliable for use in safety-critical applications like perception for fully automated driving. The reason is that they may automatically learn unintuitive and wrong correlations from their training data, which can lead to failures in rare situations (e.g., under high occlusion). This makes it important to ensure that DNNs behave consistently with given intuition in the form of symbolic rules on the desired outputs, for example "If there is a head, there should usually be a person" (isHead(region) => isPerson(region)). Techniques from concept-based explainable artificial intelligence (C-XAI; Lee et al. 2025) allow one to find, post hoc, representations of symbols (concepts) of interest, e.g., "head", within the internal representations of a trained DNN. As proposed by Schwalbe et al. (2022), this can be used to attach classification or segmentation outputs for those concepts post hoc, even if the DNN has not been trained on them directly. Subsequently, one can test on a test set whether the DNN outputs fulfill the (potentially fuzzy) logical constraints.
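For illustration, such a rule can be checked directly on the DNN's (post-hoc attached) concept confidence scores. The sketch below is a minimal example, assuming per-region scores in [0, 1] and using the Reichenbach fuzzy implication I(a, b) = 1 - a + a*b, which is just one common choice; all names and scores are made up:

    import torch

    def fuzzy_implies(p_head: torch.Tensor, p_person: torch.Tensor) -> torch.Tensor:
        """Reichenbach fuzzy implication I(a, b) = 1 - a + a*b on scores in [0, 1]."""
        return 1.0 - p_head + p_head * p_person

    # Hypothetical per-region confidence scores for "head" and "person":
    p_head = torch.tensor([0.9, 0.8, 0.1])
    p_person = torch.tensor([0.95, 0.2, 0.05])

    compliance = fuzzy_implies(p_head, p_person)
    print(compliance)        # tensor([0.9550, 0.3600, 0.9050])
    print(compliance < 0.5)  # the second region violates the rule

Here, a high "head" score paired with a low "person" score yields low compliance, which is exactly the kind of implausible output a runtime monitor should flag.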

However, this has so far only been showcased in a very small setup. It therefore remains open how effective the approach is in uncovering errors for different up-to-date vision DNNs, datasets, and rule sets. In other words: how many DNN failures arise from inconsistency with known rules?

Goals

  • Define a knowledge base of diverse rules applicable to vision tasks, which serves as the setup for testing rule compliance.
  • Implement the verification testing setup for a selection of current vision DNN architectures, rules, and datasets.
  • Conduct and evaluate a comparative study:
    • assess which factors of DNN architecture, rule type, and dataset influence rule compliance
    • correlate rule compliance with quality as a runtime monitor, i.e., the ratio of false alarms to uncovered true errors (see the sketch after this list)
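The monitor-quality metrics from the last item could be computed as sketched below; the function name and the per-sample alarm/error encoding are illustrative assumptions, with rule violations treated as alarms and compared against ground-truth DNN errors:

    import numpy as np

    def monitor_quality(alarms: np.ndarray, true_errors: np.ndarray) -> dict:
        """alarms: True where a rule violation was flagged;
        true_errors: True where the DNN output is actually wrong."""
        false_alarms = int(np.sum(alarms & ~true_errors))  # alarm although output was fine
        covered = int(np.sum(alarms & true_errors))        # true errors uncovered by the monitor
        missed = int(np.sum(~alarms & true_errors))        # true errors the monitor missed
        return {
            "false_alarm_rate": false_alarms / max(int(np.sum(~true_errors)), 1),
            "error_coverage": covered / max(int(np.sum(true_errors)), 1),
            "missed_errors": missed,
        }

    # Toy example: one false alarm, one covered error, one missed error
    print(monitor_quality(np.array([True, True, False, False]),
                          np.array([False, True, True, False])))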

Approach

  • Dimensions for the comparison (to be addressed in subsequent steps):
    • How do different DNN architectures compare (Convolutional DNNs / Vision Transformers; small / big; ...)?
    • How do different datasets compare (general-purpose like MS COCO vs. automotive like A2D2, ...)?
    • How do different tasks compare (object detection, semantic segmentation, ...)?
  • Setup for the extraction of symbols and relations from the DNN:
    • Rule base: For defining an exemplary rule base, it is recommended to start with a semantically rich domain like automated driving, for which plenty of intuitive rules and full ontologies are available (Giunchiglia et al. 2022).
    • The base method to extract symbols should be the C-XAI method described in Schwalbe et al. (2022), for which a rich code base is available from the team.
    • As symbols for the rule bases, simple object classes (e.g., street light, car, person), object parts (e.g., head, arm, steering wheel), and object attributes (e.g., red) can be used as a starting point. To probe their representations, existing datasets like ImageNet, the German Traffic Sign Datasets, or the widely used BRODEN dataset can serve as a basis, potentially extended later by generated data.
    • As relations, simple hierarchical relations (isA) and 2D spatial relations (isPartOf) may serve as a starting point; these can be estimated directly from concept segmentations (see the sketches after this list). They may later be extended to 3D spatial relations using (separately) predicted or ground-truth depth information.
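To make the symbol-extraction step concrete: in linear-probing C-XAI methods such as the one used in Schwalbe et al. (2022), a concept output can be attached as a 1x1 convolution, i.e., a per-pixel logistic regression, on frozen intermediate activations. The following is a minimal sketch with made-up tensor shapes and class names, not the team's actual code base:

    import torch
    import torch.nn as nn

    class ConceptProbe(nn.Module):
        """Per-pixel logistic regression on frozen DNN activations that predicts
        a segmentation mask for a single concept (e.g., "head")."""
        def __init__(self, n_channels: int):
            super().__init__()
            # A 1x1 convolution applies the same logistic regression at every pixel.
            self.linear = nn.Conv2d(n_channels, 1, kernel_size=1)

        def forward(self, activations: torch.Tensor) -> torch.Tensor:
            return torch.sigmoid(self.linear(activations))  # concept scores in [0, 1]

    # Dummy stand-ins for (frozen backbone activations, down-scaled concept masks):
    activations = torch.randn(8, 512, 14, 14)
    masks = torch.randint(0, 2, (8, 1, 14, 14)).float()

    probe = ConceptProbe(n_channels=512)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss = nn.BCELoss()(probe(activations), masks)
    loss.backward()
    optimizer.step()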
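Likewise, a 2D isPartOf relation can be estimated directly from two such concept masks by measuring how much of the part lies inside the whole; function name, mask shapes, and the choice of the Gödel t-norm (element-wise minimum) for the fuzzy intersection are illustrative:

    import torch

    def is_part_of(part_mask: torch.Tensor, whole_mask: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
        """Fuzzy degree to which part_mask lies inside whole_mask;
        both are soft segmentation masks with values in [0, 1]."""
        overlap = torch.minimum(part_mask, whole_mask).sum()  # fuzzy intersection
        return overlap / (part_mask.sum() + eps)              # fraction of the part covered

    # Example: a "head" mask fully contained in a "person" mask yields ~1.0.
    head = torch.zeros(14, 14); head[2:4, 5:8] = 1.0
    person = torch.zeros(14, 14); person[1:10, 4:9] = 1.0
    print(is_part_of(head, person))  # tensor(1.0000) up to eps

The resulting containment degree can then feed into the fuzzy rule evaluation sketched in the problem statement.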

 

Requirements

  • Solid programming skills in Python and familiarity with the PyTorch deep learning framework
  • Familiarity with machine learning using DNNs and logistic regression models
  • Familiarity with formalization of knowledge as logical rules
  • Basic understanding of continuous fuzzy (=multi-valued) logics

 

Literature

  • Giunchiglia, Eleonora, Mihaela Stoian, Salman Khan, Fabio Cuzzolin, and Thomas Lukasiewicz. 2022. “ROAD-R: The Autonomous Driving Dataset with Logical Requirements.” In IJCLR 2022 Workshops. Vienna, Austria. https://arxiv.org/abs/2210.01597.
  • Lee, Jae Hee, Georgii Mikriukov, Gesina Schwalbe, Stefan Wermter, and Diedrich Wolter. 2025. “Concept-Based Explanations in Computer Vision: Where Are We and Where Could We Go?” In Computer Vision – ECCV 2024 Workshops, edited by Alessio Del Bue, Cristian Canton, Jordi Pont-Tuset, and Tatiana Tommasi, 266–87. Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-92648-8_17.
  • Schwalbe, Gesina, Christian Wirth, and Ute Schmid. 2022. “Enabling Verification of Deep Neural Networks in Perception Tasks Using Fuzzy Logic and Concept Embeddings.” arXiv. https://doi.org/10.48550/arXiv.2201.00572. (Preprint)