JPEG AIC-3 Dataset : Towards Defining the High Quality to Nearly Visually Lossless Quality Range
2023-06-20, Testolina, Michela, Hosu, Vlad, Jenadeleh, Mohsen, Lazzarotto, Davi, Saupe, Dietmar, Ebrahimi, Touradj
Visual data play a crucial role in modern society, and the rate at which images and videos are acquired, stored, and exchanged every day is rapidly increasing. Image compression is the key technology that enables storing and sharing of visual content in an efficient and cost-effective manner, by removing redundant and irrelevant information. On the other hand, image compression often introduces undesirable artifacts that reduce the perceived quality of the media. Subjective image quality assessment experiments allow for the collection of information on the visual quality of the media as perceived by human observers, and therefore for quantifying the impact of such distortions. Nevertheless, the most commonly used subjective image quality assessment methodologies were designed to evaluate compressed images with visible distortions, and therefore are not accurate and reliable when evaluating images of higher visual quality. In this paper, we present a dataset of compressed images with quality levels that range from high to nearly visually lossless, with associated quality scores in JND units. The images were subjectively evaluated by expert human observers, and the results were used to define the range from high to nearly visually lossless quality. The dataset is made publicly available to researchers, providing a valuable resource for the development of novel subjective quality assessment methodologies or compression methods that are more effective in this quality range.
KonVid-150k : A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild
2021, Götz-Hahn, Franz, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
Video quality assessment (VQA) methods focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods under-perform in-the-wild. Deep learning approaches have had limited success due to the small size and diversity of existing VQA datasets, either artificially or authentically distorted. We introduce a new in-the-wild VQA dataset that is substantially larger and more diverse: KonVid-150k. It consists of a coarsely annotated set of 153,841 videos having five quality ratings each, and 1,596 videos with a minimum of 89 ratings each. Additionally, we propose new efficient VQA approaches (MLSP-VQA) relying on multi-level spatially pooled deep-features (MLSP). They are exceptionally well suited for training at scale, compared to deep transfer learning approaches. Our best method, MLSP-VQA-FF, improves the Spearman rank-order correlation coefficient (SRCC) performance metric on the commonly used KoNViD-1k in-the-wild benchmark dataset to 0.82. It surpasses the best existing deep-learning model (0.80 SRCC) and hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise and dataset size, showing that MLSP-VQA-FF is the overall best method for videos in-the-wild. Finally, we show that the MLSP-VQA models trained on KonVid-150k set the new state-of-the-art for cross-test performance on KoNViD-1k and LIVE-Qualcomm with SRCCs of 0.83 and 0.64, respectively. For KoNViD-1k this inter-dataset testing outperforms intra-dataset experiments, showing excellent generalization.
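The Spearman rank-order correlation coefficient (SRCC) used as the performance metric above compares the ranking of predicted scores with the ranking of subjective scores. The following is a minimal sketch in plain Python, assuming tied values receive the average of their ranks; the function names are illustrative and are not taken from the authors' code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, subjective):
    """SRCC is the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(subjective))
```

Because only ranks enter the computation, SRCC is unaffected by any monotonic rescaling of the predicted scores, which is why it is convenient for comparing models whose outputs live on different scales.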
KonIQ-10k : An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment
2020-01-24, Hosu, Vlad, Lin, Hanhe, Sziranyi, Tamas, Saupe, Dietmar
Deep learning methods for image quality assessment (IQA) are limited due to the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel deep learning model (KonCept512) that shows excellent generalization beyond the test set (0.921 SROCC) to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models (512 × 384). Correlation analysis shows that KonCept512 performs similarly to having 9 subjective scores for each test image.
Visual Quality Assessment for Interpolated Slow-motion Videos based on a Novel Database
2020, Men, Hui, Hosu, Vlad, Lin, Hanhe, Bruhn, Andrés, Saupe, Dietmar
Professional video editing tools can generate slow-motion video by interpolating frames from video recorded at a standard frame rate. The perceptual quality of such interpolated slow-motion videos strongly depends on the underlying interpolation techniques. We built a novel benchmark database that is specifically tailored for interpolated slow-motion videos (KoSMo-1k). It consists of 1,350 interpolated video sequences, from 30 different content sources, along with their subjective quality ratings from up to ten subjective comparisons per video pair. Moreover, we evaluated the performance of twelve existing full-reference (FR) image/video quality assessment (I/VQA) methods on the benchmark. In this way, we are able to show that specifically tailored quality assessment methods for interpolated slow-motion videos are needed, since the evaluated methods, despite their good performance on real-time video databases, do not give satisfying results when it comes to frame interpolation.
Critical analysis on the reproducibility of visual quality assessment using deep features
2022, Götz-Hahn, Franz, Hosu, Vlad, Saupe, Dietmar
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets. This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature. Recently, papers in several journals reported performance results well above the best in the field. However, our analysis shows that information from the test set was inappropriately used in the training process in different ways and that the claimed performance results cannot be achieved. When correcting for the data leakage, the performances of the approaches drop even below the state-of-the-art by a large margin. Additionally, we investigate end-to-end variations to the discussed approaches, which do not improve upon the original.
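The requirement that training, validation, and test sets be independent can be checked mechanically at the level of item identifiers. The following is a minimal sketch, assuming every image or video has a unique ID; `split_ids` and `find_leaks` are hypothetical helpers, and the leakage cases analyzed in the paper may be subtler than the plain ID overlap tested here.

```python
import random


def split_ids(item_ids, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle unique IDs once and cut them into disjoint train/val/test lists."""
    ids = sorted(set(item_ids))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_test = int(len(ids) * test_fraction)
    n_val = int(len(ids) * val_fraction)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


def find_leaks(train, val, test):
    """Return the IDs shared between each pair of partitions (empty if clean)."""
    train_set, val_set, test_set = set(train), set(val), set(test)
    return {
        "train_val": train_set & val_set,
        "train_test": train_set & test_set,
        "val_test": val_set & test_set,
    }
```

Running `find_leaks` on the partitions actually fed to a training pipeline, rather than on the intended split, is what catches the case where test items slip into an earlier stage such as feature selection or fine-tuning.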
KonIQ++ : Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects
2021, Su, Shaolin, Hosu, Vlad, Lin, Hanhe, Zhang, Yanning, Saupe, Dietmar
Although image quality assessment (IQA) in-the-wild has been researched in computer vision, it is still challenging to precisely estimate perceptual image quality in the presence of real-world complex and composite distortions. In order to improve machine learning solutions for IQA, we consider side information denoting the presence of distortions besides the basic quality ratings in IQA datasets. Specifically, we extend one of the largest in-the-wild IQA databases, KonIQ-10k, to KonIQ++, by collecting distortion annotations for each image, aiming to improve quality prediction together with distortion identification. We further explore the interactions between image quality and distortion by proposing a novel IQA model, which jointly predicts image quality and distortion by recurrently refining task-specific features in a multi-stage fusion framework. Our dataset KonIQ++, along with the model, boosts IQA performance and generalization ability, demonstrating its potential for solving the challenging authentic IQA task. The proposed model can also accurately predict distinct image defects, suggesting its application in image processing tasks such as image colorization and deblurring.
Foveated Video Coding for Real-Time Streaming Applications
2020, Wiedemann, Oliver, Hosu, Vlad, Lin, Hanhe, Saupe, Dietmar
Video streaming under real-time constraints is an increasingly widespread application. Many recent video encoders are unsuitable for this scenario due to theoretical limitations or run time requirements. In this paper, we present a framework for the perceptual evaluation of foveated video coding schemes. Foveation describes the process of adapting a visual stimulus according to the acuity of the human eye. In contrast to traditional region-of-interest coding, where certain areas are statically encoded at a higher quality, we utilize feedback from an eye-tracker to spatially steer the bit allocation scheme in real-time. We evaluate the performance of an H.264 based foveated coding scheme in a lab environment by comparing the bitrates at the point of just noticeable distortion (JND). Furthermore, we identify perceptually optimal codec parameterizations. In our trials, we achieve average bitrate savings of 63.24% at the JND in comparison to the unfoveated baseline.
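A bitrate saving at the JND is the relative reduction of the foveated bitrate against the unfoveated baseline bitrate, both measured at the point of just noticeable distortion. The following is a minimal sketch of that arithmetic, assuming the average is taken over per-trial savings (the abstract does not state how the average was formed); the function names and the numbers in the usage note are made up for illustration.

```python
def bitrate_saving(baseline_kbps, foveated_kbps):
    """Relative saving of the foveated stream against the baseline (0.0 to 1.0)."""
    return 1.0 - foveated_kbps / baseline_kbps


def mean_saving_percent(pairs):
    """Average the per-trial savings of (baseline, foveated) bitrate pairs."""
    savings = [bitrate_saving(baseline, foveated) for baseline, foveated in pairs]
    return 100.0 * sum(savings) / len(savings)
```

For two hypothetical trials with baseline and foveated bitrates of (1000, 400) and (2000, 600) kbit/s, the per-trial savings are 60% and 70%, so `mean_saving_percent` returns about 65.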
Large-scale crowdsourced subjective assessment of picturewise just noticeable difference
2022, Lin, Hanhe, Chen, Guangan, Jenadeleh, Mohsen, Hosu, Vlad, Reips, Ulf-Dietrich, Hamzaoui, Raouf, Saupe, Dietmar
The picturewise just noticeable difference (PJND) for a given image, compression scheme, and subject is the smallest distortion level that the subject can perceive when the image is compressed with this compression scheme. The PJND can be used to determine the compression level at which a given proportion of the population does not notice any distortion in the compressed image. To obtain accurate and diverse results, the PJND must be determined for a large number of subjects and images. This is particularly important when experimental PJND data are used to train deep learning models that can predict a probability distribution model of the PJND for a new image. To date, such subjective studies have been carried out in laboratory environments. However, the number of participants and images in all existing PJND studies is very small because of the challenges involved in setting up laboratory experiments. To address this limitation, we develop a framework to conduct PJND assessments via crowdsourcing. We use a new technique based on slider adjustment and a flicker test to determine the PJND. A pilot study demonstrated that our technique could decrease the study duration by 50% and double the perceptual sensitivity compared to the standard binary search approach that successively compares a test image side by side with its reference image. Our framework includes a robust and systematic scheme to ensure the reliability of the crowdsourced results. Using 1,008 source images and distorted versions obtained with JPEG and BPG compression, we apply our crowdsourcing framework to build the largest PJND dataset, KonJND-1k (Konstanz just noticeable difference 1k dataset). A total of 503 workers participated in the study, yielding 61,030 PJND samples that resulted in an average of 42 samples per source image. The KonJND-1k dataset is available at http://database.mmsp-kn.de/konjnd-1k-database.html.
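The use of PJND samples to find the compression level at which a given proportion of the population notices no distortion can be sketched as follows. This is a minimal illustration, assuming distortion levels are numbers that grow with stronger compression and that a subject perceives distortion at every level at or above their PJND; the helper names are hypothetical and not part of the KonJND-1k tooling.

```python
def fraction_not_noticing(pjnd_samples, level):
    """Share of subjects whose PJND lies above `level`, so they see no distortion."""
    return sum(1 for pjnd in pjnd_samples if pjnd > level) / len(pjnd_samples)


def max_transparent_level(pjnd_samples, proportion, levels):
    """Largest level at which at least `proportion` of subjects notice nothing.

    Returns None if even the lowest level is noticed by too many subjects.
    """
    best = None
    for level in sorted(levels):
        # The fraction can only shrink as the level grows, so stop at the
        # first level that fails the requirement.
        if fraction_not_noticing(pjnd_samples, level) >= proportion:
            best = level
        else:
            break
    return best
```

With the hypothetical samples `[10, 12, 15, 20, 30, 30, 40, 45, 50, 60]` and levels 0 to 100, requiring 75% of subjects to notice nothing gives level 14: at level 15 only 7 of the 10 subjects still have a PJND above the level.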
Subjective annotation for a frame interpolation benchmark using artefact amplification
2020-12, Men, Hui, Hosu, Vlad, Lin, Hanhe, Bruhn, Andrés, Saupe, Dietmar
Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. It contains interpolated frames from 155 methods applied to each of 8 contents. For this purpose, we collected forced-choice paired comparisons between interpolated images and corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data (3720 comparisons of 20 votes each) we reconstructed absolute quality scale values according to Thurstone’s model. As a result, we obtained a re-ranking of the 155 participating algorithms w.r.t. the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks; the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA, which weights the local differences between an interpolated image and its ground truth.
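Reconstructing scale values from paired-comparison votes according to Thurstone's model can be illustrated with the classical Case V solution, in which each item's score is the mean of the probit-transformed preference proportions against all items. This is a minimal sketch under stated assumptions: it uses Case V with a complete comparison matrix and a half-vote smoothing term, which is not necessarily the reconstruction procedure used in the paper; `thurstone_case_v` is a hypothetical name.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Estimate Case V scale values from a square vote-count matrix.

    wins[i][j] is the number of votes preferring item i over item j.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf  # probit function of the standard normal
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            # Half-vote smoothing (an assumption of this sketch) keeps the
            # proportion strictly inside (0, 1), where the probit is finite.
            p = (wins[i][j] + 0.5) / (total + 1.0)
            z_sum += inv_cdf(p)
        # Dividing by n treats the self-comparison as a probit value of zero.
        scores.append(z_sum / n)
    return scores
```

The resulting scores sum to zero and are defined only up to a shift, so they express relative rather than absolute quality unless one item, such as the ground truth, is fixed as the anchor.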
ATQAM/MAST'20 : Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends
2020, Guha, Tanaya, Hosu, Vlad, Saupe, Dietmar, Goldlücke, Bastian, Kumar, Naveen, Lin, Weisi, Martinez, Victor, Somandepalli, Krishna, Narayanan, Shrikanth, Cheng, Wen-Huang, McLaughlin, Kree
The Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends (ATQAM/MAST) aims to bring together researchers and professionals working in fields ranging from computer vision, multimedia computing, and multimodal signal processing to psychology and social sciences. It is divided into two tracks: ATQAM and MAST. ATQAM track: Visual quality assessment techniques can be divided into image and video technical quality assessment (IQA and VQA, or broadly TQA) and aesthetics quality assessment (AQA). While TQA is a long-standing field, having its roots in media compression, AQA is relatively young. Both have received increased attention with developments in deep learning. The topics have mostly been studied separately, even though they deal with similar aspects of the underlying subjective experience of media. The aim is to bring together individuals in the two fields of TQA and AQA for the sharing of ideas and discussions on current trends, developments, issues, and future directions. MAST track: The research area of media content analytics has been traditionally used to refer to applications involving inference of higher-level semantics from multimedia content. However, multimedia is typically created for human consumption, and we believe it is necessary to adopt a human-centered approach to this analysis, which would not only enable a better understanding of how viewers engage with content but also how they impact each other in the process.