Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed those of other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a “four-headed beast”—it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the “genomical” challenges of the next decade.
This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.
There is no doubt that big data are now rapidly expanding in all science and engineering domains. While the potential of these massive data is undoubtedly significant, fully making sense of them requires new ways of thinking and novel learning techniques to address the various challenges. In this paper, we present a literature survey of the latest advances in research on machine learning for big data processing. First, we review the machine learning techniques and highlight some promising learning methods in recent studies, such as representation learning, deep learning, distributed and parallel learning, transfer learning, active learning, and kernel-based learning. Next, we focus on the analysis and discussion of the challenges and possible solutions of machine learning for big data. Following that, we investigate the close connections of machine learning with signal processing techniques for big data processing. Finally, we outline several open issues and research trends.
An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increased availability of high-powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The data obtained is also often much noisier than in the past and has more missing information (missing data). This is particularly so in the case of biological data, such as high-throughput data from microarrays or other sources. Our ability to analyze this data, both in terms of its quantity and its nature, is clearly not keeping pace with the rate at which it is being produced. In this paper, we will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data. Geometry and topology are very natural tools to apply in this direction, since geometry can be regarded as the study of distance functions, and what one often works with are distance functions on large finite sets of data. The mathematical formalism which has been developed for incorporating geometric and topological techniques deals with point clouds, i.e. finite sets of points equipped with a distance function. It then adapts tools from the various branches of geometry to the study of point clouds. The point clouds are intended to be thought of as finite samples taken from a geometric object, perhaps with noise. Here are some of the key points which come up when applying these geometric methods to data analysis.
• Qualitative information is needed: One important goal of data analysis is to allow the user to obtain knowledge about the data, i.e. to understand how it is organized on a large scale. For example, if we imagine that we are looking at a data set constructed somehow from diabetes patients, it would be important to develop the understanding that there are two types of the disease, namely the juvenile and adult onset forms. Once that is established, one of course wants to develop quantitative methods for distinguishing them, but the first insight about the distinct forms of the disease is key.
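To make the point-cloud framing concrete, the sketch below builds a Euclidean distance matrix for a small noisy sample of a circle and connects all pairs of points closer than a chosen scale, i.e. the 1-skeleton of a Vietoris-Rips complex; the data, the scale value, and the function names are illustrative and not taken from the paper.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) point cloud."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def rips_edges(dist, epsilon):
    """Edges of the Vietoris-Rips 1-skeleton at scale epsilon:
    every pair of points at distance <= epsilon is connected."""
    n = dist.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i, j] <= epsilon]

# Toy example: noisy samples from a circle, a shape whose single "loop"
# topological methods aim to recover at a suitable scale.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 60)
cloud = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0, 0.05, (60, 2))
D = pairwise_distances(cloud)
edges = rips_edges(D, epsilon=0.4)
print(f"{len(edges)} edges at scale 0.4")
```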
To meet collective obligations towards biodiversity conservation and monitoring, it is essential that the world's governments and non-governmental organisations as well as the research community tap all possible sources of data and information, including new, fast-growing sources such as citizen science (CS), in which volunteers participate in some or all aspects of environmental assessments. Through compilation of a database on CS and community-based monitoring (CBM, a subset of CS) programs, we assess where contributions from CS and CBM are significant and where opportunities for growth exist. We use the Essential Biodiversity Variable framework to describe the range of biodiversity data needed to track progress towards global biodiversity targets, and we assess strengths and gaps in geographical and taxonomic coverage. Our results show that existing CS and CBM data particularly provide large-scale data on species distribution and population abundance, species traits such as phenology, and ecosystem function variables such as primary and secondary productivity. Only birds, Lepidoptera and plants are monitored at scale. Most CS schemes are found in Europe, North America, South Africa, India, and Australia. We then explore what can be learned from successful CS/CBM programs that would facilitate the scaling up of current efforts, how existing strengths in data coverage can be better exploited, and the strategies that could maximise the synergies between CS/CBM and other approaches for monitoring biodiversity, in particular from remote sensing. More and better targeted funding will be needed, if CS/CBM programs are to contribute further to international biodiversity monitoring.
Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications including business, sciences and engineering. At the same time, Big Data presents challenges for digital earth to store, transport, process, mine and serve the data. Cloud computing provides fundamental support to address the challenges with shared computing resources including computing, storage, networking and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers – Big Data and cloud computing – and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status and research opportunities, the following observations are offered: (i) cloud computing and Big Data enable science discoveries and application developments; (ii) cloud computing provides major solutions for Big Data; (iii) Big Data, spatiotemporal thinking and various application domains drive the advancement of cloud computing and relevant technologies with new requirements; (iv) intrinsic spatiotemporal principles of Big Data and geospatial sciences provide the source for finding technical and theoretical solutions to optimize cloud computing and processing Big Data; (v) open availability of Big Data and processing capability pose social challenges of geospatial significance and (vi) a weave of innovations is transforming Big Data into geospatial research, engineering and business values. This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny,’ such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.
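As a rough illustration of the selection dynamic described above (and emphatically not the authors' model), the toy simulation below gives each lab an "effort" level that trades study volume against error control and copies labs in proportion to their positive results; all parameter values are invented for illustration.

```python
import random

# Toy sketch of selection for productivity: each lab has an "effort" in [0, 1];
# higher effort means fewer studies per cycle but better error control.
# Labs are copied in proportion to their positive results, so low-effort labs
# spread and false discoveries accumulate over generations.
N_LABS, GENERATIONS, BASE_RATE = 100, 300, 0.1

def simulate(seed=0):
    random.seed(seed)
    efforts = [random.uniform(0.5, 1.0) for _ in range(N_LABS)]
    for gen in range(GENERATIONS):
        scores, false_pos, positives = [], 0, 0
        for e in efforts:
            studies = int(round(10 * (1.5 - e)))            # low effort -> more studies
            pubs = 0
            for _ in range(studies):
                if random.random() < BASE_RATE:              # a true effect exists
                    found = random.random() < 0.5 + 0.4 * e  # power rises with effort
                else:                                        # no true effect
                    found = random.random() < 0.15 * (1.5 - e)
                    false_pos += found
                pubs += found
            positives += pubs
            scores.append(pubs)
        if gen % 100 == 0:
            fdr = false_pos / positives if positives else 0.0
            print(f"gen {gen}: mean effort {sum(efforts) / N_LABS:.2f}, FDR {fdr:.2f}")
        # labs "reproduce" in proportion to output, with small mutation of effort
        parents = random.choices(efforts, weights=[s + 1e-6 for s in scores], k=N_LABS)
        efforts = [min(1.0, max(0.0, p + random.gauss(0, 0.02))) for p in parents]

simulate()
```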
Big data has fundamentally changed the way organizations manage, analyze and leverage data in any industry. One of the most promising fields where big data can be applied to make a change is healthcare. Big healthcare data has considerable potential to improve patient outcomes, predict outbreaks of epidemics, gain valuable insights, avoid preventable diseases, reduce the cost of healthcare delivery and improve the quality of life in general. However, deciding on the allowable uses of data while preserving security and patient’s right to privacy is a difficult task. Big data, no matter how useful for the advancement of medical science and vital to the success of all healthcare organizations, can only be used if security and privacy issues are addressed. To ensure a secure and trustworthy big data environment, it is essential to identify the limitations of existing solutions and envision directions for future research. In this paper, we have surveyed the state-of-the-art security and privacy challenges in big data as applied to healthcare industry, assessed how security and privacy issues occur in case of big healthcare data and discussed ways in which they may be addressed. We mainly focused on the recently proposed methods based on anonymization and encryption, compared their strengths and limitations, and envisioned future research directions.
Data mining technology searches large volumes of data for potentially valuable knowledge; the process is generally divided into data preparation, data mining proper, and the expression and analysis of results. It is a mature information-processing technology that builds on database technology. Database technology is the software discipline that studies, manages, and applies databases: data in a database are processed and analyzed by drawing on the underlying theory and implementation methods of database structure, storage, design, management, and application. We introduce several databases and data mining techniques to help a wide range of clinical researchers better understand and apply database technology.
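As a minimal, self-contained illustration of the kind of database workflow described above, the sketch below stores a few hypothetical clinical records in an in-memory SQLite database and runs a simple aggregate query; the table and column names are invented for this example.

```python
import sqlite3

# Hypothetical clinical table: store a few records, then aggregate by diagnosis.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patients (
    id INTEGER PRIMARY KEY, age INTEGER, diagnosis TEXT, hba1c REAL)""")
conn.executemany(
    "INSERT INTO patients (age, diagnosis, hba1c) VALUES (?, ?, ?)",
    [(54, "T2DM", 7.8), (61, "T2DM", 8.4), (47, "hypertension", 5.6)],
)
# A simple "mining" step: summarise a laboratory value per diagnosis group.
for diagnosis, mean_hba1c, n in conn.execute(
        "SELECT diagnosis, AVG(hba1c), COUNT(*) FROM patients GROUP BY diagnosis"):
    print(diagnosis, round(mean_hba1c, 2), n)
conn.close()
```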
Timothy R. Hannigan, Richard F.J. Haans, Keyvan Vakili et al.
Increasingly, management researchers are using topic modeling, a new method borrowed from computer science, to reveal phenomenon-based constructs and grounded conceptual relationships in textual data.
Leopold Talirz, Snehal Kumbhar, Elsa Passaro et al.
Materials Cloud is a platform designed to enable open and seamless sharing of resources for computational science, driven by applications in materials modelling. It hosts (1) archival and dissemination services for raw and curated data, together with their provenance graph, (2) modelling services and virtual machines, (3) tools for data analytics, and pre-/post-processing, and (4) educational materials. Data is citable and archived persistently, providing a comprehensive embodiment of entire simulation pipelines (calculations performed, codes used, data generated) in the form of graphs that allow retracing and reproducing any computed result. When an AiiDA database is shared on Materials Cloud, peers can browse the interconnected record of simulations, download individual files or the full database, and start their research from the results of the original authors. The infrastructure is agnostic to the specific simulation codes used and can support diverse applications in computational science that transcend its initial materials domain.
For over a decade the term “Big data” has been used to describe the rapid increase in volume, variety and velocity of information available, not just in medical research but in almost every aspect of our lives. As scientists, we now have the capacity to rapidly generate, store and analyse data that, only a few years ago, would have taken many years to compile. However, “Big data” no longer means what it once did. The term has expanded and now refers not to just large data volume, but to our increasing ability to analyse and interpret those data. Tautologies such as “data analytics” and “data science” have emerged to describe approaches to the volume of available information as it grows ever larger. New methods dedicated to improving data collection, storage, cleaning, processing and interpretation continue to be developed, although not always by, or for, medical researchers. Exploiting new tools to extract meaning from large volume information has the potential to drive real change in clinical practice, from personalized therapy and intelligent drug design to population screening and electronic health record mining. As ever, where new technology promises “Big Advances,” significant challenges remain. Here we discuss both the opportunities and challenges posed to biomedical research by our increasing ability to tackle large datasets. Important challenges include the need for standardization of data content, format, and clinical definitions, a heightened need for collaborative networks with sharing of both data and expertise and, perhaps most importantly, a need to reconsider how and when analytic methodology is taught to medical researchers. We also set “Big data” analytics in context: recent advances may appear to promise a revolution, sweeping away conventional approaches to medical science. However, their real promise lies in their synergy with, not replacement of, classical hypothesis-driven methods. The generation of novel, data-driven hypotheses based on interpretable models will always require stringent validation and experimental testing. Thus, hypothesis-generating research founded on large datasets adds to, rather than replaces, traditional hypothesis driven science. Each can benefit from the other and it is through using both that we can improve clinical practice.
Data sharing, research ethics, and incentives must improve. The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once unfeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field.
To assess whether surgical treatment with predominantly volar locking plates has superior clinical and radiographic outcomes to conservative treatment with cast immobilization for distal radius fractures, we reviewed literature with a high level of evidence, systematically searching the PubMed, Web of Science, and Embase databases for clinical trials comparing surgical and conservative treatments for distal radius fractures. Data extraction was performed to assess parameters that included: wrist functional assessment, such as mobility and grip strength; subjective outcomes, such as DASH score, PRWE score, and quality-of-life score (EQ‐5D); radiographic assessment (palmar inclination, ulnar variance, articular subluxation, etc.); and complications, such as fracture nonunion, reoperation, infection, and neurologic symptoms; data organization and quantitative synthesis were then conducted. In total, 19 clinical trials with 2729 patients were included in this report, comprising 1378 patients in the conservative treatment group and 1351 in the surgical treatment group. In terms of recovery of wrist function, wrists treated with cast immobilization gained a greater range of extension (MD 1.5°, p = 0.02), whereas surgical treatment produced a significantly greater range of wrist rotation (MD 3°, p = 0.03) and significantly improved grip strength (MD 2 kg, p = 0.04). There were no significant differences between the two groups in wrist flexion and anterior rotation. In terms of patient subjective scores, patients in the plaster immobilization group showed significant improvement in PRWE scores compared with the surgical treatment group at 3 and 12 months after treatment (MD 3–7 points, p < 0.05). There was no significant difference between the two groups in correction of ulnar variance and dorsal tilt angle (p > 0.05). Regarding complications, the overall complication rate was significantly lower in patients who underwent surgical treatment than in the conservative treatment group (202/1046 [19.3%] vs. 361/1065 [33.9%], p = 0.001). We conclude that, when treating distal radius fractures, surgical treatment with predominantly volar plate fixation achieves superior anatomical fracture repositioning and a lower complication rate compared with cast immobilization, but demonstrates no significant advantage in recovery of wrist function or subjective patient perception. Overall, surgical treatment is an appropriate choice for patients with high functional demands who need to recover quickly, whereas more conservative treatment can be considered for patients with lower functional requirements.
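The abstract does not state the pooling method, but a common approach to this kind of quantitative synthesis of mean differences is fixed-effect inverse-variance weighting; the sketch below shows that calculation on hypothetical grip-strength numbers, which are not the review's data.

```python
import math

def pooled_mean_difference(studies):
    """Fixed-effect inverse-variance pooling of per-study mean differences.

    Each study is (md, se): the mean difference between treatment arms and
    its standard error. Returns the pooled MD and its 95% confidence interval.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * md for (md, _), w in zip(studies, weights)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# Hypothetical numbers for illustration only: three studies reporting
# grip-strength differences (kg) and their standard errors.
studies = [(2.1, 0.9), (1.4, 1.2), (2.6, 1.0)]
md, ci = pooled_mean_difference(studies)
print(f"pooled MD = {md:.2f} kg, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```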
Real-time dynamic capture of a single moving target is one of the most crucial and representative tasks in UAV capture problems. This paper proposes a multi-UAV real-time dynamic capture strategy based on a differential game model to address this challenge. In this paper, the dynamic capture problem is divided into two parts: pursuit and capture. First, in the pursuit–evasion problem based on differential games, the capture UAVs and the target UAV are treated as adversarial parties engaged in a game. The current pursuit–evasion state is modeled and analyzed according to varying environmental information, allowing the capture UAVs to quickly track the target UAV. The Nash equilibrium solution in the differential game is optimal for both parties in the pursuit–evasion process. Then, a collaborative multi-UAV closed circular pipeline control method is proposed to ensure an even distribution of capture UAVs around the target, preventing excessive clustering and thereby significantly improving capture efficiency. Finally, simulations and real-flight experiments are conducted on the RflySim platform in typical scenarios to analyze the computational process and verify the effectiveness of the proposed method. Results indicate that this approach effectively provides a solution for multi-UAV dynamic capture and achieves desirable capture outcomes.
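The sketch below is only a toy discrete-time illustration of the encirclement idea, not the paper's differential-game solution or Nash-equilibrium strategy: faster pursuers steer toward evenly spaced slots on a ring around a slower, fleeing target, loosely echoing the even distribution sought by the closed circular pipeline method; all dynamics and parameter values are invented.

```python
import numpy as np

# Toy kinematics: three pursuers head for fixed slots on a ring around the
# target; the slower target flees from the nearest pursuer at each step.
DT, V_PURSUER, V_TARGET, RADIUS, STEPS = 0.1, 1.0, 0.7, 1.0, 400

def unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else v

pursuers = np.array([[-5.0, -5.0], [5.0, -5.0], [0.0, 6.0]])
target = np.array([0.0, 0.0])
slots = np.array([[np.cos(a), np.sin(a)]
                  for a in np.linspace(0, 2 * np.pi, len(pursuers), endpoint=False)])

for step in range(1, STEPS + 1):
    # each pursuer moves toward its slot on the ring around the current target
    goals = target + RADIUS * slots
    pursuers += DT * V_PURSUER * np.array([unit(g - p) for g, p in zip(goals, pursuers)])
    # the target runs away from the closest pursuer
    nearest = pursuers[np.argmin(np.linalg.norm(pursuers - target, axis=1))]
    target += DT * V_TARGET * unit(target - nearest)
    if step % 100 == 0:
        farthest = np.linalg.norm(pursuers - target, axis=1).max()
        print(f"t = {step * DT:.0f} s: farthest pursuer {farthest:.2f} m from target")
```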