Critical analysis on the reproducibility of visual quality assessment using deep features
2022, Götz-Hahn, Franz, Hosu, Vlad, Saupe, Dietmar
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets. This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature. Recently, papers in several journals reported performance results well above the best in the field. However, our analysis shows that information from the test set was inappropriately used in the training process in different ways and that the claimed performance results cannot be achieved. When correcting for the data leakage, the performances of the approaches drop even below the state-of-the-art by a large margin. Additionally, we investigate end-to-end variations to the discussed approaches, which do not improve upon the original.
Large-scale crowdsourced subjective assessment of picturewise just noticeable difference
2022, Lin, Hanhe, Chen, Guangan, Jenadeleh, Mohsen, Hosu, Vlad, Reips, Ulf-Dietrich, Hamzaoui, Raouf, Saupe, Dietmar
The picturewise just noticeable difference (PJND) for a given image, compression scheme, and subject is the smallest distortion level that the subject can perceive when the image is compressed with this compression scheme. The PJND can be used to determine the compression level at which a given proportion of the population does not notice any distortion in the compressed image. To obtain accurate and diverse results, the PJND must be determined for a large number of subjects and images. This is particularly important when experimental PJND data are used to train deep learning models that can predict a probability distribution model of the PJND for a new image. To date, such subjective studies have been carried out in laboratory environments. However, the number of participants and images in all existing PJND studies is very small because of the challenges involved in setting up laboratory experiments. To address this limitation, we develop a framework to conduct PJND assessments via crowdsourcing. We use a new technique based on slider adjustment and a flicker test to determine the PJND. A pilot study demonstrated that our technique could decrease the study duration by 50% and double the perceptual sensitivity compared to the standard binary search approach that successively compares a test image side by side with its reference image. Our framework includes a robust and systematic scheme to ensure the reliability of the crowdsourced results. Using 1,008 source images and distorted versions obtained with JPEG and BPG compression, we apply our crowdsourcing framework to build the largest PJND dataset, KonJND-1k (Konstanz just noticeable difference 1k dataset). A total of 503 workers participated in the study, yielding 61,030 PJND samples that resulted in an average of 42 samples per source image. The KonJND-1k dataset is available at http://database.mmsp-kn.de/konjnd-1k-database.html.
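The use of PJND samples to pick a compression level that a given proportion of the population does not notice can be illustrated with a minimal sketch. It assumes, for illustration only, that distortion levels are non-negative integers where a higher value means stronger distortion, and that a subject notices a level exactly when it is at or above that subject's PJND. The function names and the sample values are hypothetical and are not taken from KonJND-1k.

```python
def fraction_not_noticing(pjnd_samples, level):
    """Fraction of subjects whose PJND lies strictly above `level`,
    i.e. who would not perceive distortion at that level."""
    unnoticed = sum(1 for pjnd in pjnd_samples if pjnd > level)
    return unnoticed / len(pjnd_samples)


def highest_unnoticed_level(pjnd_samples, proportion):
    """Largest integer distortion level that at least `proportion` of the
    subjects do not notice. Returns None if no level qualifies."""
    best = None
    for level in range(0, max(pjnd_samples) + 1):
        # The fraction can only shrink as the level grows, so the last
        # level that passes the check is the largest admissible one.
        if fraction_not_noticing(pjnd_samples, level) >= proportion:
            best = level
    return best


# Hypothetical PJND samples for one source image (one value per subject).
samples = [30, 35, 40, 40, 45, 50, 55, 60]
print(highest_unnoticed_level(samples, 0.75))  # 39
```

With these made-up samples, six of the eight subjects have a PJND above level 39, so 75% of them would not notice the distortion at that level, while at level 40 only half would not.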
KonVid-150k : A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild
2021, Götz-Hahn, Franz, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
Video quality assessment (VQA) methods focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods under-perform in-the-wild. Deep learning approaches have had limited success due to the small size and diversity of existing VQA datasets, either artificial or authentically distorted. We introduce a new in-the-wild VQA dataset that is substantially larger and more diverse: KonVid-150k. It consists of a coarsely annotated set of 153,841 videos having five quality ratings each, and 1,596 videos with a minimum of 89 ratings each. Additionally, we propose new efficient VQA approaches (MLSP-VQA) relying on multi-level spatially pooled deep-features (MLSP). They are exceptionally well suited for training at scale, compared to deep transfer learning approaches. Our best method, MLSP-VQA-FF, improves the Spearman rank-order correlation coefficient (SRCC) performance metric on the commonly used KoNViD-1k in-the-wild benchmark dataset to 0.82. It surpasses the best existing deep-learning model (0.80 SRCC) and hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise and dataset size, showing that MLSP-VQA-FF is the overall best method for videos in-the-wild. Finally, we show that the MLSP-VQA models trained on KonVid-150k set the new state of the art for cross-test performance on KoNViD-1k and LIVE-Qualcomm with an SRCC of 0.83 and 0.64, respectively. For KoNViD-1k this inter-dataset testing outperforms intra-dataset experiments, showing excellent generalization.
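For readers unfamiliar with the SRCC metric quoted above, the following is a minimal standard-library sketch of how it can be computed: both score lists are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The function names and example values are illustrative assumptions and are not the evaluation code used for KonVid-150k.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srcc(predicted, subjective):
    """Spearman rank-order correlation between predictions and ratings."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Hypothetical predicted scores and mean opinion scores for five videos.
print(srcc([0.2, 0.4, 0.5, 0.7, 0.9], [1.8, 2.5, 2.4, 3.9, 4.6]))
```

Because only the ranks enter the computation, SRCC rewards a model for ordering videos correctly even if its predicted scores are on a different scale than the subjective ratings.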
Blind Quality Assessment of Iris Images Acquired in Visible Light for Biometric Recognition
2020-03, Jenadeleh, Mohsen, Pedersen, Marius, Saupe, Dietmar
Image quality is a key issue affecting the performance of biometric systems. Ensuring the quality of iris images acquired in unconstrained imaging conditions in visible light poses many challenges to iris recognition systems. Poor-quality iris images increase the false rejection rate and decrease the performance of the systems by quality filtering. Methods that can accurately predict iris image quality can improve the efficiency of quality-control protocols in iris recognition systems. We propose a fast blind/no-reference metric for predicting iris image quality. The proposed metric is based on statistical features of the sign and the magnitude of local image intensities. The experiments, conducted with a reference iris recognition system and three datasets of iris images acquired in visible light, showed that the quality of iris images strongly affects the recognition performance and is highly correlated with the iris matching scores. Rejecting poor-quality iris images improved the performance of the iris recognition system. In addition, we analyzed the effect of iris image quality on the accuracy of the iris segmentation module in the iris recognition system.
Crowdsourced Quality Assessment of Enhanced Underwater Images : a Pilot Study
2022, Lin, Hanhe, Men, Hui, Yan, Yijun, Ren, Jinchang, Saupe, Dietmar
Underwater image enhancement (UIE) is essential for a high-quality underwater optical imaging system. While a number of UIE algorithms have been proposed in recent years, there is little study on image quality assessment (IQA) of enhanced underwater images. In this paper, we conduct the first crowdsourced subjective IQA study on enhanced underwater images. We chose ten state-of-the-art UIE algorithms and applied them to yield enhanced images from an underwater image benchmark. Their latent quality scales were reconstructed from pair comparison. We demonstrate that the existing IQA metrics are not suitable for assessing the perceived quality of enhanced underwater images. In addition, the overall performance of 10 UIE algorithms on the benchmark is ranked by the newly proposed simulated pair comparison of the methods.
Subjective annotation for a frame interpolation benchmark using artefact amplification
2020-12, Men, Hui, Hosu, Vlad, Lin, Hanhe, Bruhn, Andrés, Saupe, Dietmar
Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. It contains interpolated frames from 155 methods applied to each of 8 contents. For this purpose, we collected forced-choice paired comparisons between interpolated images and corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons, we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data (3720 comparisons of 20 votes each) we reconstructed absolute quality scale values according to Thurstone’s model. As a result, we obtained a re-ranking of the 155 participating algorithms w.r.t. the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks, but the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA, which weights the local differences between an interpolated image and its ground truth.
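The idea of reconstructing scale values from paired-comparison votes can be sketched with the classical least-squares solution for Thurstone's Case V model. This is an illustrative assumption: the abstract does not say which variant or fitting procedure was used, and the vote counts below are invented. Each win proportion is mapped through the inverse standard normal CDF, and an item's scale value is the mean of its resulting z-scores.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Least-squares Case V scale values from a matrix of vote counts.

    wins[i][j] is the number of votes preferring item i over item j.
    Assumes every pair of distinct items received at least one vote.
    Returned values are centred at zero; higher means better quality.
    """
    n = len(wins)
    standard_normal = NormalDist()
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            votes = wins[i][j] + wins[j][i]
            p = wins[i][j] / votes
            # Keep unanimous results away from 0 and 1, where the
            # inverse CDF is undefined.
            eps = 0.5 / votes
            p = min(max(p, eps), 1.0 - eps)
            total += standard_normal.inv_cdf(p)
        # Classical solution: average over all n items (own z-score is 0).
        scores.append(total / n)
    return scores


# Hypothetical study: three images, 20 votes per pair.
votes = [
    [0, 15, 18],
    [5, 0, 14],
    [2, 6, 0],
]
print(thurstone_case_v(votes))
```

In this made-up example the first image wins most of its comparisons and therefore receives the highest scale value; sorting items by these values yields a ranking of the kind described in the abstract.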
TranSalNet : Towards perceptually relevant visual saliency prediction
2022, Lou, Jianxun, Lin, Hanhe, Marshall, David, Saupe, Dietmar, Liu, Hantao
Convolutional neural networks (CNNs) have significantly advanced computational modelling for saliency prediction. However, accurately simulating the mechanisms of visual attention in the human cortex remains an academic challenge. It is critical to integrate properties of human vision into the design of CNN architectures, leading to perceptually more relevant saliency prediction. Due to the inherent inductive biases of CNN architectures, there is a lack of sufficient long-range contextual encoding capacity. This hinders CNN-based saliency models from capturing properties that emulate viewing behaviour of humans. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components to CNNs to capture the long-range contextual visual information. Experimental results show that the transformers provide added value to saliency prediction, enhancing its perceptual relevance in the performance. Our proposed saliency model using transformers has achieved superior results on public benchmarks and competitions for saliency prediction models.
KonIQ++ : Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects
2021, Su, Shaolin, Hosu, Vlad, Lin, Hanhe, Zhang, Yanning, Saupe, Dietmar
Although image quality assessment (IQA) in-the-wild has been researched in computer vision, it is still challenging to precisely estimate perceptual image quality in the presence of real-world complex and composite distortions. In order to improve machine learning solutions for IQA, we consider side information denoting the presence of distortions besides the basic quality ratings in IQA datasets. Specifically, we extend one of the largest in-the-wild IQA databases, KonIQ-10k, to KonIQ++, by collecting distortion annotations for each image, aiming to improve quality prediction together with distortion identification. We further explore the interactions between image quality and distortion by proposing a novel IQA model, which jointly predicts image quality and distortion by recurrently refining task-specific features in a multi-stage fusion framework. Our dataset KonIQ++, along with the model, boosts IQA performance and generalization ability, demonstrating its potential for solving the challenging authentic IQA task. The proposed model can also accurately predict distinct image defects, suggesting its application in image processing tasks such as image colorization and deblurring.
Subjective Assessment of Global Picture-Wise Just Noticeable Difference
2020-07, Lin, Hanhe, Jenadeleh, Mohsen, Chen, Guangan, Reips, Ulf-Dietrich, Hamzaoui, Raouf, Saupe, Dietmar
The picture-wise just noticeable difference (PJND) for a given image and a compression scheme is a statistical quantity giving the smallest distortion that a subject can perceive when the image is compressed with the compression scheme. The PJND is determined with subjective assessment tests for a sample of subjects. We introduce and apply two methods of adjustment where the subject interactively selects the distortion level at the PJND using either a slider or keystrokes. We compare the results and times required to those of the adaptive binary search type approach, in which image pairs with distortions that bracket the PJND are displayed and the difference in distortion levels is reduced until the PJND is identified. For the three methods, two images are compared using the flicker test in which the displayed images alternate at a frequency of 8 Hz. Unlike previous work, our goal is a global one, determining the PJND not only for the original pristine image but also for a sequence of compressed versions. Results for the MCL-JCI dataset show that the PJND measurements based on adjustment are comparable with those of the traditional approach using binary search, yet significantly faster. Moreover, we conducted a crowdsourcing study with side-by-side comparisons and forced choice, which suggests that the flicker test is more sensitive than a side-by-side comparison.
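The binary-search baseline described above can be sketched as follows. The sketch assumes an idealised subject whose response is deterministic and monotone (once a distortion level is perceived, every higher level is perceived too), which real subjects only approximate. The callback `can_perceive` and the simulated threshold are hypothetical stand-ins for showing a stimulus and recording a response.

```python
def binary_search_pjnd(can_perceive, lowest, highest):
    """Smallest distortion level in [lowest, highest] that is perceived.

    `can_perceive(level)` stands in for one comparison trial and must
    return True or False. Returns None if even `highest` is not perceived.
    """
    if not can_perceive(highest):
        return None
    lo, hi = lowest, highest
    while lo < hi:
        mid = (lo + hi) // 2
        if can_perceive(mid):
            hi = mid          # PJND is at mid or below
        else:
            lo = mid + 1      # PJND is above mid
    return lo


# Simulated subject with a PJND at level 37 on a 1..100 distortion scale.
trials = []

def simulated_subject(level):
    trials.append(level)
    return level >= 37

print(binary_search_pjnd(simulated_subject, 1, 100))  # 37
print(len(trials))  # number of comparisons that were needed
```

Each trial halves the interval that brackets the PJND, so the number of comparisons grows only logarithmically with the number of distortion levels. The adjustment methods in the abstract replace this sequence of discrete trials with a single interactive selection.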