Sean Jentz, Karolina Haara Löfstedt:
Management of Training Data for Deep Learning Applications: Requirements and Solutions,
summary, report, April 2024.
Abstract:
Queries in vScope fetch data from a graph database. This data is then transformed and displayed in various formats. The queries are structured hierarchically and are implemented as interdependent XML files. The files are stored in Git repositories, and change requests are currently reviewed by developers as text-based Git diffs.
InfraSight Labs wants to simplify the query configuration process so that non-developers can take over the task. This thesis investigates the possibility of a tool that can detect changes to queries and visualize them at a higher level of abstraction than the diff function provided by Git.
We found that changes to queries and their dependencies can be detected and classified using reflection and persistent annotations. This is exemplified with a proof-of-concept diffing tool. The tool deserialized Java objects from the XML files and compared them using reflection, then used annotations on the fields to classify the impact of each change. The conclusion is that it is possible to represent changes made to query files at a higher level of abstraction than the text-based Git diff function. Integrating the tool into the current product is left for future work.
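The reflection-and-annotation approach described in the abstract can be illustrated with a minimal sketch. It is not the thesis tool: the `ChangeImpact` annotation, the `Impact` categories, and the `Query` class with its `title` and `filter` fields are all hypothetical names chosen for illustration, and the XML deserialization step is skipped by constructing the objects directly.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class QueryDiffSketch {

    /** Hypothetical impact levels; the abstract does not name its categories. */
    enum Impact { COSMETIC, BEHAVIORAL }

    /** Retained at runtime so that reflection can read it while diffing. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ChangeImpact {
        Impact value();
    }

    /** Hypothetical stand-in for an object deserialized from a query XML file. */
    static class Query {
        @ChangeImpact(Impact.COSMETIC)
        String title;

        @ChangeImpact(Impact.BEHAVIORAL)
        String filter;

        Query(String title, String filter) {
            this.title = title;
            this.filter = filter;
        }
    }

    /**
     * Compares the annotated fields of two objects of the same class and
     * returns one line per changed field, prefixed with its impact level.
     */
    static List<String> diff(Object before, Object after) {
        List<String> changes = new ArrayList<>();
        for (Field field : before.getClass().getDeclaredFields()) {
            ChangeImpact impact = field.getAnnotation(ChangeImpact.class);
            if (impact == null) {
                continue; // unannotated fields are ignored in this sketch
            }
            try {
                field.setAccessible(true);
                Object oldValue = field.get(before);
                Object newValue = field.get(after);
                if (!Objects.equals(oldValue, newValue)) {
                    changes.add(impact.value() + " " + field.getName()
                            + ": " + oldValue + " -> " + newValue);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Query before = new Query("Servers", "os = 'Linux'");
        Query after = new Query("Linux servers", "os = 'Linux' AND cpu > 4");
        // Prints one line per changed field, each tagged COSMETIC or BEHAVIORAL.
        diff(before, after).forEach(System.out::println);
    }
}
```

Putting the classification on the fields themselves keeps the diffing logic generic: under these assumptions, supporting a new query attribute would only require annotating its field, not changing `diff`.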