Jakob Voigt, Business Informatics Master's student at the University of Rostock, will defend his Master's thesis on "The potential of training data quality improvements to increase the performance of deep learning based geospatial data classifiers" on 27 April 2022 at 09:00.
The supervisors and reviewers are Dr Sebastian Bader and Felix Holz from the University of Rostock.
The defence will take place in a virtual Zoom room. Anyone interested is asked to register by e-mail with Sebastian Bader (sebastian.bader@uni-rostock.de) in order to receive the dial-in details.
Abstract:
Deep convolutional neural networks (CNNs) are considered the de facto standard for semantic segmentation in computer vision. To generalize geospatial insights from remote sensing (RS) imagery, semantic segmentation is commonly used to detect a wide variety of objects. However, the quality of predictions generated by CNNs relies heavily on two input factors: the parameters of the model's architecture, and the input data, which consists of images together with a set of labels containing the true classification, or ground truth (GT). To train a CNN adequately, GT labels need to be procured beforehand, which is often associated with considerable cost and time as well as bottlenecks in availability. To reduce the cost of generating geospatial insights and to overcome these bottlenecks in data availability, the most promising alternative to increasing data volume might be to shift the focus towards increasing the quality of the class labels. To examine the impact of the quality of geospatial labels on the quality of predictions generated by CNNs in the case of building detection on RS images, a U-ResNet-50 network is trained with various label datasets of different quality levels. It is shown that missing and noisy labels have a significant impact on the quality of the predictions generated by the network. Moreover, it is shown for a specific example that a GT dataset one fortieth of the size can yield predictions just as precise as those obtained with a 40 times larger dataset lacking 20 percent of its labels.
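
The missing-label setting mentioned in the abstract can be illustrated with a small toy sketch. This is not the code used in the thesis: the function names `drop_building_labels` and `iou` are hypothetical, and removing whole building instances at random is only an assumption about how "lacking 20 percent of the labels" might be simulated. The sketch degrades a ground-truth mask and measures how far it drifts from the complete one using intersection over union (IoU).

```python
import numpy as np


def drop_building_labels(instance_mask, missing_fraction, seed=0):
    """Return a binary building mask with a fraction of instances removed.

    instance_mask: 2-D integer array, 0 = background, k > 0 = building id k.
    missing_fraction: share of building instances to delete (assumed scheme).
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(instance_mask)
    ids = ids[ids != 0]  # ignore the background id
    n_drop = int(round(missing_fraction * len(ids)))
    dropped = rng.choice(ids, size=n_drop, replace=False)
    # Keep every building pixel whose instance id was not dropped.
    kept = (instance_mask != 0) & ~np.isin(instance_mask, dropped)
    return kept.astype(np.uint8)


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(a, b).sum() / union)


# Toy tile with five 2x2 "buildings" stacked along the left edge.
tile = np.zeros((10, 10), dtype=np.int64)
for k in range(1, 6):
    tile[2 * (k - 1):2 * k, 0:2] = k

full = (tile != 0).astype(np.uint8)
degraded = drop_building_labels(tile, missing_fraction=0.2)
print("building pixels (full, degraded):", full.sum(), degraded.sum())
print("IoU between degraded and full labels:", iou(degraded, full))
```

In an experiment of the kind the abstract describes, masks like `degraded` would stand in for the training labels, while the complete masks would be kept for evaluation.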