Crowdsourced Estimation of Collective Just Noticeable Difference for Compressed Video with Flicker Test and QUEST+
2023-09-14, Jenadeleh, Mohsen, Hamzaoui, Raouf, Reips, Ulf-Dietrich, Saupe, Dietmar
The concept of video-wise just noticeable difference (JND) was recently proposed to determine the lowest bitrate at which a source video can be compressed without perceptible quality loss with a given probability. This bitrate is usually obtained from an estimate of the satisfied user ratio (SUR) at each bitrate or encoding quality parameter. The SUR is the probability that the distortion corresponding to this bitrate is not noticeable. Commonly, the SUR is computed experimentally by estimating the subjective JND threshold of each subject using binary search, fitting a distribution model to the collected data, and creating the complementary cumulative distribution function of the distribution. The subjective tests consist of paired comparisons between the source video and compressed versions. However, we show that this approach typically over- or underestimates the SUR. To address this shortcoming, we directly estimate the SUR function by considering the entire population as a collective observer. Our method randomly chooses the subject for each paired comparison and uses a state-of-the-art Bayesian adaptive psychometric method (QUEST+) to select the compressed video in the paired comparison. Our simulations show that this collective method yields more accurate SUR results with fewer comparisons. We also provide a subjective experiment to assess the JND and SUR for compressed video. In the paired comparisons, we apply a flicker test that compares a video that interleaves the source video and its compressed version with the source video. Analysis of the subjective data revealed that the flicker test provides on average higher sensitivity and precision in the assessment of the JND threshold than the usual test that compares compressed versions with the source video.
Using crowdsourcing and the proposed approach, we build a JND dataset for 45 source video sequences that are encoded with both advanced video coding (AVC) and versatile video coding (VVC) at all available quantization parameters. Our dataset is available at http://database.mmsp-kn.de/flickervidset-database.html.
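The conventional pipeline that the abstract describes (per-subject JND thresholds, a fitted distribution, then its complementary CDF) can be sketched as follows. This is a minimal illustration, not the collective QUEST+ method proposed in the paper. The normal distribution, the sample thresholds, and the function names are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev


def sur_curve(jnd_samples, distortion_levels):
    """SUR at each level, as the complementary CDF of a normal fit.

    Assumption: JND thresholds are modelled with a normal distribution.
    SUR(level) = P(JND > level), i.e. the share of subjects who would
    not notice the distortion at that level.
    """
    fitted = NormalDist(mean(jnd_samples), stdev(jnd_samples))
    return [1.0 - fitted.cdf(level) for level in distortion_levels]


def highest_level_meeting(sur_values, levels, target=0.75):
    """Largest distortion level whose SUR is still at least `target`."""
    satisfied = [lvl for lvl, s in zip(levels, sur_values) if s >= target]
    return max(satisfied) if satisfied else None


# Hypothetical per-subject JND thresholds, expressed as QP values.
jnd_qp = [30, 32, 33, 35, 36, 38, 40]
levels = list(range(26, 45, 2))

sur = sur_curve(jnd_qp, levels)
qp_75 = highest_level_meeting(sur, levels, target=0.75)
```

Here `qp_75` is the largest tested QP at which at least 75% of the modelled population would not notice the distortion.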
KonIQ-10k : An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment
2020-01-24, Hosu, Vlad, Lin, Hanhe, Sziranyi, Tamas, Saupe, Dietmar
Deep learning methods for image quality assessment (IQA) are limited due to the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models (512 × 384). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
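The SROCC figures quoted above are Spearman rank-order correlations between predicted and subjective scores. A minimal sketch of the metric in plain Python follows. The function names and the sample scores are illustrative assumptions, not taken from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srocc(predicted, subjective):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Hypothetical model predictions and mean opinion scores for five images.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
mos = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = srocc(predicted, mos)
```

Because only the ranks matter, any monotonic rescaling of the predictions leaves the SROCC unchanged.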
Visual Quality Assessment for Motion Compensated Frame Interpolation
2019, Men, Hui, Lin, Hanhe, Hosu, Vlad, Maurer, Daniel, Bruhn, Andres, Saupe, Dietmar
Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user's quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone's model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks.
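Reconstructing scale values from forced-choice paired comparisons can be sketched as below. The sketch assumes Thurstone's Case V model, whose classical least squares solution is the row mean of the z-scores of the preference proportions. The abstract does not state which case was used, and the win counts and the clamping constant `eps` are hypothetical.

```python
from statistics import NormalDist


def thurstone_case_v(wins, eps=0.01):
    """Least squares scale values under Thurstone Case V (an assumption).

    wins[i][j] is the number of times item i was preferred over item j.
    Proportions are clamped to [eps, 1 - eps] so that unanimous votes
    do not map to infinite z-scores.
    """
    n = len(wins)
    inverse_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            p = wins[i][j] / total if total else 0.5
            p = min(max(p, eps), 1.0 - eps)
            z_sum += inverse_cdf(p)
        # The diagonal contributes z = 0, so the row mean divides by n.
        scores.append(z_sum / n)
    return scores


# Hypothetical counts for three interpolation results, 10 votes per pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
quality = thurstone_case_v(wins)
```

The resulting scale is defined only up to a shift; this solution centres it so that the scores sum to zero.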
Disregarding the Big Picture : Towards Local Image Quality Assessment
2018, Wiedemann, Oliver, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
Image quality has been studied almost exclusively as a global image property. It is common practice for IQA databases and metrics to quantify this abstract concept with a single number per image. We propose an approach to blind IQA based on a convolutional neural network (patchnet) that was trained on a novel set of 32,000 individually annotated patches of 64×64 pixels. We use this model to generate spatially small local quality maps of images taken from KonIQ-10k, a large and diverse in-the-wild database of authentically distorted images. We show that our local quality indicator correlates well with global MOS, going beyond the predictive ability of quality-related attributes such as sharpness. Averaging of patchnet predictions already outperforms classical approaches to global MOS prediction that were trained to include global image features. We additionally experiment with a generic second-stage aggregation CNN to estimate mean opinion scores. Our latter model performs comparably to the state of the art with a PLCC of 0.81 on KonIQ-10k.
CUDAS : Distortion-Aware Saliency Benchmark
2023, Zhao, Xin, Lou, Jianxun, Wu, Xinbo, Wu, Yingying, Lévêque, Lucie, Liu, Xiaochang, Guo, Pengfei, Qin, Yipeng, Lin, Hanhe, Saupe, Dietmar, Liu, Hantao
Visual saliency prediction remains an academic challenge due to the diversity and complexity of natural scenes as well as the scarcity of eye movement data on where people look in images. In many practical applications, digital images are inevitably subject to distortions, such as those caused by acquisition, editing, compression or transmission. A great deal of attention has been paid to predicting the saliency of distortion-free pristine images, but little attention has been given to understanding the impact of visual distortions on saliency prediction. In this paper, we first present the CUDAS database - a new distortion-aware saliency benchmark, where eye-tracking data was collected for 60 pristine images and their corresponding 540 distorted formats. We then conduct a statistical evaluation to reveal the behaviour of state-of-the-art saliency prediction models on distorted images and provide insights on building an effective model for distortion-aware saliency prediction. The new database is made publicly available to the research community.
Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features
2019-06, Hosu, Vlad, Goldlücke, Bastian, Saupe, Dietmar
We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, the currently largest aesthetics database. While previous approaches miss some of the information in the original images, due to taking small crops, down-scaling or warping the originals during training, we propose the first method that efficiently supports full resolution images as an input, and can be trained on variable input sizes. This allows us to significantly improve upon the state of the art, increasing the Spearman rank-order correlation coefficient (SRCC) of ground-truth mean opinion scores (MOS) from the existing best reported of 0.612 to 0.756. To achieve this performance, we extract multi-level spatially pooled (MLSP) features from all convolutional blocks of a pre-trained InceptionResNet-v2 network, and train a custom shallow Convolutional Neural Network (CNN) architecture on these new features.
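The idea of multi-level spatial pooling can be illustrated with a toy example: each block's activation map is reduced to one value per channel, and the per-block vectors are concatenated into a fixed-length feature that does not depend on the input resolution. The sketch assumes global average pooling and uses nested lists in place of real network activations. It is not the authors' implementation.

```python
def global_average_pool(feature_map):
    """Reduce a C x H x W map (nested lists) to a list of C channel means."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def mlsp_vector(block_outputs):
    """Concatenate the pooled vectors of all blocks into one feature."""
    vector = []
    for feature_map in block_outputs:
        vector.extend(global_average_pool(feature_map))
    return vector


# Hypothetical activations from two blocks with different spatial sizes.
block_a = [[[1.0, 3.0], [5.0, 7.0]]]   # 1 channel, 2 x 2
block_b = [[[2.0]], [[4.0]]]           # 2 channels, 1 x 1
features = mlsp_vector([block_a, block_b])
```

The length of `features` equals the total channel count across blocks, whatever the spatial size of each map, which is what allows training on variable input sizes.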
Expertise screening in crowdsourcing image quality
2018, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
We propose a screening approach to find reliable and effectively expert crowd workers in image quality assessment (IQA). Our method measures the users' ability to identify image degradations by using test questions, together with several relaxed reliability checks. We conduct multiple experiments, obtaining reproducible results with a high agreement between the expertise-screened crowd and the freelance experts of 0.95 Spearman rank order correlation (SROCC), with one restriction on the image type. Our contributions include a reliability screening method for uninformative users, a new type of test questions that rely on our proposed database of pristine and artificially distorted images, a group agreement extrapolation method and an analysis of the crowdsourcing experiments.
KonIQ++ : Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects
2021, Su, Shaolin, Hosu, Vlad, Lin, Hanhe, Zhang, Yanning, Saupe, Dietmar
Although image quality assessment (IQA) in-the-wild has been researched in computer vision, it is still challenging to precisely estimate perceptual image quality in the presence of real-world complex and composite distortions. In order to improve machine learning solutions for IQA, we consider side information denoting the presence of distortions besides the basic quality ratings in IQA datasets. Specifically, we extend one of the largest in-the-wild IQA databases, KonIQ-10k, to KonIQ++, by collecting distortion annotations for each image, aiming to improve quality prediction together with distortion identification. We further explore the interactions between image quality and distortion by proposing a novel IQA model, which jointly predicts image quality and distortion by recurrently refining task-specific features in a multi-stage fusion framework. Our dataset KonIQ++, along with the model, boosts IQA performance and generalization ability, demonstrating its potential for solving the challenging authentic IQA task. The proposed model can also accurately predict distinct image defects, suggesting its application in image processing tasks such as image colorization and deblurring.
SUR-Net : Predicting the Satisfied User Ratio Curve for Image Compression with Deep Learning
2019, Fan, Chunling, Lin, Hanhe, Hosu, Vlad, Zhang, Yun, Jiang, Qingshan, Hamzaoui, Raouf, Saupe, Dietmar
The Satisfied User Ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the probability distribution of the Just Noticeable Difference (JND) level, the smallest distortion level that can be perceived by a subject. We propose the first deep learning approach to predict such SUR curves. Instead of the direct approach of regressing the SUR curve itself for a given reference image, our model is trained on pairs of images, original and compressed. Relying on a Siamese Convolutional Neural Network (CNN), feature pooling, a fully connected regression-head, and transfer learning, we achieved a good prediction performance. Experiments on the MCL-JCI dataset showed a mean Bhattacharyya distance between the predicted and the original JND distributions of only 0.072.
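For readers unfamiliar with the reported metric, the following sketch computes the Bhattacharyya distance between two discrete probability distributions. Whether the paper uses this discrete form or a closed form for fitted continuous distributions is not stated in the abstract, so the discrete form and the sample histograms are assumptions.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance -ln(sum_i sqrt(p_i * q_i)) between discrete distributions.

    Both inputs are assumed to be non-negative and to sum to 1.
    Distributions with no overlap get an infinite distance.
    """
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf
    return -math.log(coefficient)


# Hypothetical JND histograms over five distortion levels.
ground_truth = [0.10, 0.20, 0.40, 0.20, 0.10]
predicted = [0.05, 0.25, 0.40, 0.25, 0.05]
distance = bhattacharyya_distance(predicted, ground_truth)
```

The distance is zero for identical distributions and grows as their overlap shrinks.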
Deeprn : A Content Preserving Deep Architecture for Blind Image Quality Assessment
2018, Varga, Domonkos, Saupe, Dietmar, Sziranyi, Tamas
This paper presents a blind image quality assessment (BIQA) method based on deep learning with convolutional neural networks (CNN). Our method is trained on full and arbitrarily sized images rather than small image patches or resized input images, as is usually done in CNNs for image classification and quality assessment. The resolution independence is achieved by pyramid pooling. This work is the first that applies a fine-tuned residual deep learning network (ResNet-101) to BIQA. The training is carried out on a new and very large labeled dataset of 10,073 images (KonIQ-10k) that contains quality rating histograms besides the mean opinion scores (MOS). In contrast to previous methods, we do not train to approximate the MOS directly, but rather use the distributions of scores. Experiments were carried out on three benchmark image quality databases. The results showed clear improvements in the accuracy of the estimated MOS values, compared to current state-of-the-art algorithms. We also report on the quality of the estimation of the score distributions.