This file was created by the TYPO3 extension publications --- Timezone: CEST Creation date: 2026-05-15 Creation time: 23:53:50 --- Number of references 13

inproceedings Towards Automatically Inferring Constraints to Identify Implicit Assumptions in Data Analysis 2026 1 10.1145/3786582.3786806 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE-NIER ’26) Florian Sihler Lars Pfrenger Oliver Gerstl Matthias Tichy

inproceedings Statically Analyzing the Dataflow of R Programs The R programming language is primarily designed for statistical computing and mostly used by researchers without a background in computer science. R provides a wide range of dynamic features and peculiarities that are difficult to analyze statically, such as dynamic scoping and lazy evaluation with dynamic side effects. At the same time, the R ecosystem lacks sophisticated analysis tools that support researchers in understanding and improving their code. In this paper, we present a novel static dataflow analysis framework for the R programming language that is capable of handling the dynamic nature of R programs and produces the dataflow graph of given R programs. This graph can be essential in a range of analyses, including program slicing, which we implement as a proof of concept. The core analysis works as a stateful fold over a normalized version of the abstract syntax tree of the R program, which tracks (re-)definitions, values, function calls, side effects, external files, and a dynamic control flow to produce one dataflow graph per program. We evaluate the correctness of our analysis using output equivalence testing on a manually curated dataset of 779 sensible slicing points from executable real-world R scripts.
Additionally, we use a set of systematic test cases based on the capabilities of the R language and the implementation of the R interpreter and measure the runtime as well as the memory consumption on a set of 4,230 real-world R scripts and 20,815 packages available on R’s package manager CRAN. Furthermore, we evaluate the recall of our program slicer, its accuracy using shrinking, and its improvement over the state of the art. We correctly analyze almost all programs in our equivalence test suite, preserving the identical output for 99.7% of the manually curated slicing points. On average, we require 576ms to analyze the dataflow and around 213kB to store the graph of a research script. This shows that our analysis is capable of analyzing real-world sources quickly and correctly. Our slicer achieves an average reduction of 84.8% of tokens, indicating its potential to improve program comprehension. Conference paper 2025 10 10.1145/3763087 Proceedings of the ACM on Programming Languages, OOPSLA 2025 1034-1062 Florian Sihler Matthias Tichy

conference Explainability in Self-Adaptive Systems: A Systematic Literature Review 2025 9 9 1 10.1007/978-3-032-04200-2_19 Euromicro Conference on Software Engineering and Advanced Applications 2025 Raphael Straub Florian Sihler Ali Torbati Cong Wang Raffaela Groner Verena Klös Matthias Tichy

inproceedings On the Anatomy of Real-World R Code for Static Analysis (Extended Abstract) 2025 2 2944-7682 10.18420/se2025-27 Software Engineering 2025 Gesellschaft für Informatik, Bonn Florian Sihler Lukas Pietzschmann Raphael Straub Matthias Tichy Andor Diera Abdelhalim Dahou

inproceedings flowR: A Static Program Slicer for R Context: Many researchers rely on the R programming language to perform their statistical analyses and visualizations in the form of R scripts. However, recent research and experience show that many of these scripts contain problems.
These range from being hard to comprehend, because they combine several analyses and plots into a single source file, to being non-reproducible, compounded by a lack of analysis tools that support writing correct and maintainable code. Objective: In this work, we address the problem of comprehending and maintaining R scripts by proposing flowR, a program slicer and static dataflow analyzer for the R programming language, which can be integrated directly into Visual Studio Code. Given a set of variables of interest, such as those involved in generating a single figure in a script, flowR automatically reduces the program to the parts relevant for the output of interest, such as the value of a variable. Method: First, we use static program analysis to construct a detailed dataflow graph of the R script. The analysis supports loops, function calls, side effects, sourcing external files, and even redefinitions of R's primitive constructs. Subsequently, we calculate the program slice by solving a reachability problem on the graph, collecting all required parts and presenting them to the user. Results: Providing several interactive ways of slicing the program, we require an average of 16 ms to calculate the slice on a given dataflow graph, reducing the code by around 94% of tokens. The demonstration video is available at https://youtu.be/Zgq6rnbvvhk. For the full source code and extensive documentation, refer to https://github.com/Code-Inspect/flowr. 2024 10 27 1 10.1145/3691620.3695359 ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (Tool Demonstrations) program analysis R https://github.com/flowr-analysis/flowr Florian Sihler Matthias Tichy

inproceedings Improving the Comprehension of R Programs by Hybrid Dataflow Analysis Context: Comprehending code is crucial in all areas of software development, with many existing supporting tools and techniques for various languages.
However, for R, a widely used programming language, especially in the field of statistical computing, the support is limited. R offers a large number of packages as well as dynamic features, which make it challenging to analyze and understand. Objective: We aim to (i) gain a better understanding of how R is used in the real world, (ii) devise better analysis strategies for R, which are able to handle its dynamic nature, and (iii) improve the comprehension of R scripts by using these analyses, providing new methods and procedures applicable to program comprehension in general. Method: In eight contributions, we analyze feature usage in R scripts, develop a new static dataflow analysis intertwining control and dataflow, and more. We enable and propose new techniques for program comprehension using a combination of static and dynamic analysis. 2024 10 27 1 10.1145/3691620.3695603 ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (Doctoral Symposium) doctoral symposium program analysis https://dl.acm.org/doi/abs/10.1145/3691620.3695603 11.11.2024 Florian Sihler

inproceedings Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition 2024 6 979-8-3503-6646-4 10.1109/ICCQ60895.2024.10576984 4th
International Conference on Code Quality (ICCQ) https://ieeexplore.ieee.org/document/10576984 Denis Neumüller Florian Sihler Raphael Straub Matthias Tichy

inproceedings On the Anatomy of Real-World R Code for Static Analysis 1 2024 1 10.1145/3643991.3644911 21st International Conference on Mining Software Repositories (MSR '24) https://arxiv.org/abs/2401.16228 https://arxiv.org/pdf/2401.16228.pdf Florian Sihler Lukas Pietzschmann Raphael Straub Matthias Tichy Andor Diera Abdelhalim Dahou

inproceedings GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding 1 2023 10 10.48550/arXiv.2311.09707 GenBench 2023 Workshop Andor Diera Abdelhalim Dahou Lukas Galke Fabian Karl Florian Sihler Ansgar Scherp

thesis Constructing a Static Program Slicer Specifically for R Programs Master's thesis 2023 8 10.18725/OPARU-50107 University of Ulm, Germany Prof. Matthias Tichy Florian Sihler

article One-Way Model Transformations in the Context of the Technology-Roadmapping Tool IRIS 2023 7 10.5381/jot.2023.22.2.a2 Journal of Object Technology The 19th European Conference on Modelling Foundations and Applications (ECMFA 2023) Florian Sihler Jakob Pietron Matthias Tichy

article A domain-specific language for modeling and analyzing solution spaces for technology roadmapping 2022 2 10.1016/j.jss.2021.111094 Journal of Systems & Software (JSS) Alexander Breckel Jakob Pietron Katharina Juhnke Florian Sihler Matthias Tichy

thesis One-way Model Transformations Bachelor's thesis 2022 1 10.18725/OPARU-47275 Universität Ulm Florian Sihler