Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Xintao Wang, Liangbin Xie, Chao Dong
et al.
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images. In this work, we extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN), which is trained with pure synthetic data. Specifically, a high-order degradation modeling process is introduced to better simulate complex real-world degradations. We also consider the common ringing and overshoot artifacts in the synthesis process. In addition, we employ a U-Net discriminator with spectral normalization to increase discriminator capability and stabilize the training dynamics. Extensive comparisons show that its visual performance is superior to that of prior works on various real datasets. We also provide efficient implementations to synthesize training pairs on the fly.
1780 citations
en
Engineering, Computer Science
MetaFormer is Actually What You Need for Vision
Weihao Yu, Romy Mi Luo, Pan Zhou
et al.
Transformers have shown great potential in computer vision tasks. A common belief is their attention-based token mixer module contributes most to their competence. However, recent works show the attention-based module in transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only basic token mixing. Surprisingly, we observe that the derived model, termed PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing well-tuned vision transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 49%/61% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of “MetaFormer”, a general architecture abstracted from transformers without specifying the token mixer. Based on the extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design.
1296 citations
en
Computer Science
Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
Yaqin Zhou, Shangqing Liu, J. Siow
et al.
Vulnerability identification is crucial to protect software systems from attacks for cyber security. It is especially important to localize the vulnerable functions among the source code to facilitate the fix. However, it is a challenging and tedious process, and also requires specialized security expertise. Inspired by the work on manually-defined patterns of vulnerabilities from various code representation graphs and the recent advances in graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification through learning on a rich set of code semantic representations. It includes a novel Conv module to efficiently extract useful features in the learned rich node representations for graph-level classification. The model is trained over manually labeled datasets built on 4 diversified large-scale open-source C projects that incorporate high complexity and variety of real source code instead of the synthetic code used in previous works. The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with an average of 10.51% higher accuracy and 8.68% higher F1 score; the Conv module contributes an average increase of 4.66% in accuracy and 6.37% in F1.
1058 citations
en
Computer Science, Mathematics
PointPainting: Sequential Fusion for 3D Object Detection
Sourabh Vora, Alex H. Lang, Bassam Helou
et al.
Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information offering an opportunity for tight sensor-fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the-art methods, PointRCNN, VoxelNet and PointPillars, on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of Painting depend on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
1022 citations
en
Computer Science, Engineering
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Kundan Kumar, Rithesh Kumar, T. Boissiere
et al.
Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques. A subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive, fully convolutional, with significantly fewer parameters than competing models and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware specific optimization tricks.
1108 citations
en
Computer Science, Engineering
An Overview of Multi-Task Learning in Deep Neural Networks
Sebastian Ruder
Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.
3166 citations
en
Computer Science, Mathematics
Interpretable Explanations of Black Boxes by Meaningful Perturbation
Ruth C. Fong, A. Vedaldi
As machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks “look” in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.
1666 citations
en
Computer Science, Mathematics
Communication theory of secrecy systems
C. Shannon
9694 citations
en
Mathematics, Computer Science
Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches
D. Simon
3270 citations
en
Mathematics
Simultaneous localization and mapping: part I
H. Durrant-Whyte, Tim Bailey
4139 citations
en
Computer Science, Engineering
Alignment by Maximization of Mutual Information
Paul A. Viola, William M. Wells
4196 citations
en
Computer Science, Mathematics
Nonlinear Regression Analysis and Its Applications
D. Bates, D. Watts
3258 citations
en
Computer Science
Clustering of time series data - a survey
T. Liao
2625 citations
en
Computer Science
The Cambridge Grammar of the English Language
H. Hughes
Feature Detection with Automatic Scale Selection
Tony Lindeberg
3078 citations
en
Computer Science
Biological signals as handicaps.
A. Grafen
2450 citations
en
Biology, Medicine
Large Scale Multiple Kernel Learning
S. Sonnenburg, Gunnar Rätsch, C. Schäfer
et al.
1421 citations
en
Mathematics, Computer Science
Biometry: the principles and practice of statistics in biological research 2nd edition.
R. R. Sokal, F. J. Rohlf
4005 citations
en
Computer Science
A História mostra-nos que não temos de escolher o pior [History shows us that we do not have to choose the worst]
José Bragança de Miranda, Carlos Camponez, José Gomes Pinto
Holding a doctorate in Communication Sciences and the agregação in Theory of Culture from the Universidade Nova de Lisboa, José Bragança de Miranda, in an interview with Biblos, reflects on the theme of freedom, bringing together Communication, Philosophy, History, and Politics. Resistant to monistic logics of thought and of organizations, the current rector of the Universidade Lusófona considers that the end of History, if it ever existed, opened with the Revolutions that led to the control of absolute power. Acknowledging that the control of that power is not a definitive achievement, José Bragança de Miranda nevertheless argues that History shows that choosing the worst is not an inevitability for Humanity. For this reason, rather than debating concepts such as Freedom, he considers it important to make use of them, renewing in each tiny present moment the legacy of many heroes of the past.
History of scholarship and learning. The humanities
The Impact of Entrepreneurial Competence on Entrepreneurial Performance of Family Farms: A Comprehensive Research Framework of “Competence – Legitimacy – Performance”
Xiaofeng Su, Xiaoli Jiang, Anxin Xu
In China, the implementation of the rural revitalization strategy provides a broad stage for migrant workers to return home to start their own businesses. This study constructs a research framework of “competence – legitimacy – performance.” Through online and offline surveys, this study obtained 477 valid samples from new family farm entrepreneurs in Fujian province of China. Using a structural equation model, this study explores the relationship between entrepreneurial competence and family farm entrepreneurial performance. The empirical results show that all five dimensions of family farm entrepreneurs’ entrepreneurial competence, namely, opportunity recognition competence, network competence, resource acquisition competence, entrepreneurial learning competence, and improvisational competence, have positive impacts on family farm entrepreneurial performance. Organizational legitimacy also has a positive impact on family farm entrepreneurial performance. In addition, the data support a mediating effect of organizational legitimacy in the relationships between family farm entrepreneurial performance and each of opportunity recognition competence, network competence, resource acquisition competence, and entrepreneurial learning competence. However, organizational legitimacy does not play a significant mediating role in the relationship between improvisational competence and family farm entrepreneurial performance. The research findings offer insights for family farm entrepreneurs and policy-makers.
History of scholarship and learning. The humanities, Social Sciences