Crowdsourced Estimation of Collective Just Noticeable Difference for Compressed Video with Flicker Test and QUEST+
2023-09-14, Jenadeleh, Mohsen, Hamzaoui, Raouf, Reips, Ulf-Dietrich, Saupe, Dietmar
The concept of video-wise just noticeable difference (JND) was recently proposed to determine the lowest bitrate at which a source video can be compressed without perceptible quality loss with a given probability. This bitrate is usually obtained from an estimate of the satisfied used ratio (SUR) at each bitrate, respectively encoding quality parameter. The SUR is the probability that the distortion corresponding to this bitrate is not noticeable. Commonly, the SUR is computed experimentally by estimating the subjective JND threshold of each subject using binary search, fitting a distribution model to the collected data, and creating the complementary cumulative distribution function of the distribution. The subjective tests consist of paired comparisons between the source video and compressed versions. However, we show that this approach typically over- or underestimates the SUR. To address this shortcoming, we directly estimate the SUR function by considering the entire population as a collective observer. Our method randomly chooses the subject for each paired comparison and uses a state-of-the-art Bayesian adaptive psychometric method (QUEST+) to select the compressed video in the paired comparison. Our simulations show that this collective method yields more accurate SUR results with fewer comparisons. We also provide a subjective experiment to assess the JND and SUR for compressed video. In the paired comparisons, we apply a flicker test that compares a video that interleaves the source video and its compressed version with the source video. Analysis of the subjective data revealed that the flicker test provides on average higher sensitivity and precision in the assessment of the JND threshold than the usual test that compares compressed versions with the source video. Using crowdsourcing and the proposed approach, we build a JND dataset for 45 source video sequences that are encoded with both advanced video coding (AVC) and versatile video coding (VVC) at all available quantization parameters. Our dataset is available at http://database.mmsp-kn.de/flickervidset-database.html.
CUDAS : Distortion-Aware Saliency Benchmark
2023, Zhao, Xin, Lou, Jianxun, Wu, Xinbo, Wu, Yingying, Lévêque, Lucie, Liu, Xiaochang, Guo, Pengfei, Qin, Yipeng, Lin, Hanhe, Saupe, Dietmar, Liu, Hantao
Visual saliency prediction remains an academic challenge due to the diversity and complexity of natural scenes as well as the scarcity of eye movement data on where people look in images. In many practical applications, digital images are inevitably subject to distortions, such as those caused by acquisition, editing, compression or transmission. A great deal of attention has been paid to predicting the saliency of distortion-free pristine images, but little attention has been given to understanding the impact of visual distortions on saliency prediction. In this paper, we first present the CUDAS database - a new distortion-aware saliency benchmark, where eye-tracking data was collected for 60 pristine images and their corresponding 540 distorted formats. We then conduct a statistical evaluation to reveal the behaviour of state-of-the-art saliency prediction models on distorted images and provide insights on building an effective model for distortion-aware saliency prediction. The new database is made publicly available to the research community.
Large-scale crowdsourced subjective assessment of picturewise just noticeable difference
2022, Lin, Hanhe, Chen, Guangan, Jenadeleh, Mohsen, Hosu, Vlad, Reips, Ulf-Dietrich, Hamzaoui, Raouf, Saupe, Dietmar
The picturewise just noticeable difference (PJND) for a given image, compression scheme, and subject is the smallest distortion level that the subject can perceive when the image is compressed with this compression scheme. The PJND can be used to determine the compression level at which a given proportion of the population does not notice any distortion in the compressed image. To obtain accurate and diverse results, the PJND must be determined for a large number of subjects and images. This is particularly important when experimental PJND data are used to train deep learning models that can predict a probability distribution model of the PJND for a new image. To date, such subjective studies have been carried out in laboratory environments. However, the number of participants and images in all existing PJND studies is very small because of the challenges involved in setting up laboratory experiments. To address this limitation, we develop a framework to conduct PJND assessments via crowdsourcing. We use a new technique based on slider adjustment and a flicker test to determine the PJND. A pilot study demonstrated that our technique could decrease the study duration by 50% and double the perceptual sensitivity compared to the standard binary search approach that successively compares a test image side by side with its reference image. Our framework includes a robust and systematic scheme to ensure the reliability of the crowdsourced results. Using 1,008 source images and distorted versions obtained with JPEG and BPG compression, we apply our crowdsourcing framework to build the largest PJND dataset, KonJND-1k (Konstanz just noticeable difference 1k dataset). A total of 503 workers participated in the study, yielding 61,030 PJND samples that resulted in an average of 42 samples per source image. The KonJND-1k dataset is available at http://database.mmsp-kn.de/konjnd-1k-database.html.
KonIQ++ : Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects
2021, Su, Shaolin, Hosu, Vlad, Lin, Hanhe, Zhang, Yanning, Saupe, Dietmar
Although image quality assessment (IQA) in-the-wild has been researched in computer vision, it is still challenging to precisely estimate perceptual image quality in the presence of real-world complex and composite distortions. In order to improve machine learning solutions for IQA, we consider side information denoting the presence of distortions besides the basic quality ratings in IQA datasets. Specifically, we extend one of the largest in-the-wild IQA databases, KonIQ-10k, to KonIQ++, by collecting distortion annotations for each image, aiming to improve quality prediction together with distortion identification. We further explore the interactions between image quality and distortion by proposing a novel IQA model, which jointly predicts image quality and distortion by recurrently refining task-specific features in a multi-stage fusion framework. Our dataset KonIQ++, along with the model, boosts IQA performance and generalization ability, demonstrating its potential for solving the challenging authentic IQA task. The proposed model can also accurately predict distinct image defects, suggesting its application in image processing tasks such as image colorization and deblurring.
JPEG AIC-3 Dataset : Towards Defining the High Quality to Nearly Visually Lossless Quality Range
2023-06-20, Testolina, Michela, Hosu, Vlad, Jenadeleh, Mohsen, Lazzarotto, Davi, Saupe, Dietmar, Ebrahimi, Touradj
Visual data play a crucial role in modern society, and the rate at which images and videos are acquired, stored, and exchanged every day is rapidly increasing. Image compression is the key technology that enables storing and sharing of visual content in an efficient and cost-effective manner, by removing redundant and irrelevant information. On the other hand, image compression often introduces undesirable artifacts that reduce the perceived quality of the media. Subjective image quality assessment experiments allow for the collection of information on the visual quality of the media as perceived by human observers, and therefore quantifying the impact of such distortions. Nevertheless, the most commonly used subjective image quality assessment methodologies were designed to evaluate compressed images with visible distortions, and therefore are not accurate and reliable when evaluating images having higher visual qualities. In this paper, we present a dataset of compressed images with quality levels that range from high to nearly visually lossless, with associated quality scores in JND units. The images were subjectively evaluated by expert human observers, and the results were used to define the range from high to nearly visually lossless quality. The dataset is made publicly available to researchers, providing a valuable resource for the development of novel subjective quality assessment methodologies or compression methods that are more effective in this quality range.
Critical analysis on the reproducibility of visual quality assessment using deep features
2022, Götz-Hahn, Franz, Hosu, Vlad, Saupe, Dietmar
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets. This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature. Recently, papers in several journals reported performance results well above the best in the field. However, our analysis shows that information from the test set was inappropriately used in the training process in different ways and that the claimed performance results cannot be achieved. When correcting for the data leakage, the performances of the approaches drop even below the state-of-the-art by a large margin. Additionally, we investigate end-to-end variations to the discussed approaches, which do not improve upon the original.
TranSalNet : Towards perceptually relevant visual saliency prediction
2022, Lou, Jianxun, Lin, Hanhe, Marshall, David, Saupe, Dietmar, Liu, Hantao
Convolutional neural networks (CNNs) have significantly advanced computational modelling for saliency prediction. However, accurately simulating the mechanisms of visual attention in the human cortex remains an academic challenge. It is critical to integrate properties of human vision into the design of CNN architectures, leading to perceptually more relevant saliency prediction. Due to the inherent inductive biases of CNN architectures, there is a lack of sufficient long-range contextual encoding capacity. This hinders CNN-based saliency models from capturing properties that emulate viewing behaviour of humans. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components to CNNs to capture the long-range contextual visual information. Experimental results show that the transformers provide added value to saliency prediction, enhancing its perceptual relevance in the performance. Our proposed saliency model using transformers has achieved superior results on public benchmarks and competitions for saliency prediction models.
Relaxed forced choice improves performance of visual quality assessment methods
2023-06, Jenadeleh, Mohsen, Zagermann, Johannes, Reiterer, Harald, Reips, Ulf-Dietrich, Hamzaoui, Raouf, Saupe, Dietmar
In image quality assessment, a collective visual quality score for an image or video is obtained from the individual ratings of many subjects. One commonly used format for these experiments is the two-alternative forced choice method. Two stimuli with the same content but differing visual quality are presented sequentially or side-by-side. Subjects are asked to select the one of better quality, and when uncertain, they are required to guess. The relaxed alternative forced choice format aims to reduce the cognitive load and the noise in the responses due to the guessing by providing a third response option, namely, "not sure". This work presents a large and comprehensive crowdsourcing experiment to compare these two response formats: the one with the ``not sure'' option and the one without it. To provide unambiguous ground truth for quality evaluation, subjects were shown pairs of images with differing numbers of dots and asked each time to choose the one with more dots. Our crowdsourcing study involved 254 participants and was conducted using a within-subject design. Each participant was asked to respond to 40 pair comparisons with and without the "not sure" response option and completed a questionnaire to evaluate their cognitive load for each testing condition. The experimental results show that the inclusion of the "not sure" response option in the forced choice method reduced mental load and led to models with better data fit and correspondence to ground truth. We also tested for the equivalence of the models and found that they were different. The dataset is available at http://database.mmsp-kn.de/cogvqa-database.html.
Crowdsourced Quality Assessment of Enhanced Underwater Images : a Pilot Study
2022, Lin, Hanhe, Men, Hui, Yan, Yijun, Ren, Jinchang, Saupe, Dietmar
Underwater image enhancement (UIE) is essential for a high-quality underwater optical imaging system. While a number of UIE algorithms have been proposed in recent years, there is little study on image quality assessment (IQA) of enhanced underwater images. In this paper, we conduct the first crowdsourced subjective IQA study on enhanced underwater images. We chose ten state-of-the-art UIE algorithms and applied them to yield enhanced images from an underwater image benchmark. Their latent quality scales were reconstructed from pair comparison. We demonstrate that the existing IQA metrics are not suitable for assessing the perceived quality of enhanced underwater images. In addition, the overall performance of 10 UIE algorithms on the benchmark is ranked by the newly proposed simulated pair comparison of the methods.
KonVid-150k : A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild
2021, Götz-Hahn, Franz, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
Video quality assessment (VQA) methods focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods under-perform in-the-wild. Deep learning approaches have had limited success due to the small size and diversity of existing VQA datasets, either artificial or authentically distorted. We introduce a new in-the-wild VQA dataset that is substantially larger and diverse: KonVid-150k. It consists of a coarsely annotated set of 153,841 videos having five quality ratings each, and 1,596 videos with a minimum of 89 ratings each. Additionally, we propose new efficient VQA approaches (MLSP-VQA) relying on multi-level spatially pooled deep-features (MLSP). They are exceptionally well suited for training at scale, compared to deep transfer learning approaches. Our best method, MLSP-VQA-FF, improves the Spearman rank-order correlation coefficient (SRCC) performance metric on the commonly used KoNViD-1k in-the-wild benchmark dataset to 0.82. It surpasses the best existing deep-learning model (0.80 SRCC) and hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise, and dataset size, showing that MLSP-VQA-FF is the overall best method for videos in-the-wild. Finally, we show that the MLSP-VQA models trained on KonVid-150k sets the new state-of-the-art for cross-test performance on KoNViD-1k and LIVE-Qualcomm with a 0.83 and 0.64 SRCC, respectively. For KoNViD-1k this inter-dataset testing outperforms intra-dataset experiments, showing excellent generalization.