Un-Sok Pak, Dok-Song Che Gal, Song-Myong Jang
et al.
We investigated the lattice dynamics and transport properties of the lead-free quadruple halide perovskite Cs4CuSb2Cl12, demonstrating its dynamic and mechanical stability and an ultralow lattice thermal conductivity (κl).
Govardhan Pandurangappa, Shubham Ajaykumar Rajput, Raghuram Chetty
et al.
Photocatalytic CO2 reduction to value-added chemicals using perovskites is being explored as a way to mitigate CO2 emissions. This study examines the lead-free, antimony-based halide perovskite Cs3Sb2Cl9 for photocatalytic CO2 reduction in solid-liquid mode. The primary product was formic acid (HCOOH) in the liquid phase, along with carbon monoxide (CO) and methane (CH4) in the gaseous phase. Cs3Sb2Cl9 displayed 73% selectivity toward formic acid, with a yield of 54 µmol g⁻¹ h⁻¹. Loading Ir/IrOx onto Cs3Sb2Cl9 as a water oxidation catalyst raised the formic acid selectivity to 90%. The Ir/IrOx was loaded by thermal decomposition of iridium(III) acetylacetonate. On the composite catalyst Cs3Sb2Cl9-Ir/IrOx, the formic acid yield improved to 75 µmol g⁻¹ h⁻¹, and the formation of gaseous products was suppressed. This is the highest formic acid yield reported for this class of materials. Catalyst stability was evaluated using several material characterization techniques, and the stability of the formic acid yield was monitored over a 25-h reaction.
We studied the capability of automated machine translation in the online video education space by automatically translating Khan Academy videos with state-of-the-art translation models and applying text-to-speech synthesis and audio/video synchronization to build engaging videos in target languages. We also analyzed and established two reliable translation confidence estimators based on round-trip translations in order to efficiently manage translation quality and reduce human translation effort. Finally, we developed a deployable system to deliver translated videos to end users and collect user corrections for iterative improvement.
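The abstract does not name its two round-trip estimators, so the sketch below uses one plausible stand-in: a chrF-style character n-gram F-score between a source sentence and its round-trip translation (source to target and back). The functions `round_trip_score` and `needs_review`, and the 0.6 threshold, are hypothetical and only illustrate how such a score could route low-confidence segments to human translators.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams; spaces are dropped, similar to chrF's default."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def round_trip_score(source: str, round_trip: str, max_n: int = 4, beta: float = 2.0) -> float:
    """chrF-style F-score between a source sentence and its round-trip translation.

    Assumption: a higher score means the forward translation likely kept more
    of the source meaning. This is an illustrative estimator, not the paper's.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref, hyp = char_ngrams(source, n), char_ngrams(round_trip, n)
        if not ref or not hyp:
            continue  # sentence too short for this n-gram order
        overlap = sum((ref & hyp).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def needs_review(source: str, round_trip: str, threshold: float = 0.6) -> bool:
    """Hypothetical routing rule: send low-confidence segments to a human."""
    return round_trip_score(source, round_trip) < threshold
```

Because the score needs only the source and the back-translation, it can be computed without reference translations, which is what makes round-trip estimators useful for reducing human effort.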
ChatGPT has demonstrated exceptional proficiency in natural language conversation; for example, it can answer a wide range of questions that no previous large language model could. We therefore push its limits by exploring its ability to answer causal discovery questions, using a medical causal discovery benchmark (Tu et al. 2019).
We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between adversarial loss for continuous and discrete text representations by performing multi-step quantization in a quantization-compensation loop. Experiments show that our method significantly outperforms other approaches on various natural language processing (NLP) tasks.
Ciphers are a powerful tool for encrypting communication. There are many different cipher types, which makes it computationally expensive to solve a cipher using brute force. In this paper, we frame the decryption task as a classification problem. We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text. Then, we evaluate the performance of various tokenizer-model combinations on this task.
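A minimal sketch of how such a labeled dataset could be generated, assuming simple definitions for each class: columnar transposition over a fixed number of columns, a random monoalphabetic substitution, reversal of the whole string, reversal of letters inside each word, and a left rotation of word order as the "sentence shift". These definitions and all function names are assumptions for illustration; the paper's exact transformations may differ.

```python
import random
import string

LABELS = ["transposition", "substitution", "text_reversal",
          "word_reversal", "sentence_shift", "plaintext"]


def transposition(text: str, cols: int = 4) -> str:
    """Columnar transposition: write row by row into `cols` columns, read column by column."""
    return "".join(text[c::cols] for c in range(cols))


def substitution(text: str, rng: random.Random) -> str:
    """Monoalphabetic substitution with a random key over lowercase letters."""
    letters = string.ascii_lowercase
    shuffled = list(letters)
    rng.shuffle(shuffled)
    return text.lower().translate(str.maketrans(letters, "".join(shuffled)))


def text_reversal(text: str) -> str:
    return text[::-1]


def word_reversal(text: str) -> str:
    """Reverse the letters inside each word, keeping word order (an assumption)."""
    return " ".join(w[::-1] for w in text.split(" "))


def sentence_shift(text: str, k: int = 1) -> str:
    """Rotate the words of the sentence left by k positions (an assumption)."""
    words = text.split(" ")
    k %= len(words)
    return " ".join(words[k:] + words[:k])


def make_example(text: str, label: str, rng: random.Random) -> tuple[str, str]:
    transforms = {
        "transposition": transposition,
        "substitution": lambda t: substitution(t, rng),
        "text_reversal": text_reversal,
        "word_reversal": word_reversal,
        "sentence_shift": sentence_shift,
        "plaintext": lambda t: t,
    }
    return transforms[label](text), label


def build_dataset(sentences: list[str], seed: int = 0) -> list[tuple[str, str]]:
    """Pair each sentence with a randomly chosen transformation and its label."""
    rng = random.Random(seed)
    return [make_example(s, rng.choice(LABELS), rng) for s in sentences]
```

Each (ciphertext, label) pair can then be tokenized and fed to any classifier, which matches the framing of decryption-type identification as classification.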
Deep learning holds great promise for detecting discriminatory language in the public sphere. However, for the detection of illegal age discrimination in job advertisements, regex approaches are still strong performers. In this paper, we investigate job advertisements in the Netherlands. We present a qualitative analysis of the benefits of the 'old' approach based on regexes and investigate how neural embeddings could address its limitations.
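To make the regex baseline concrete, here is a minimal sketch with a few hypothetical Dutch patterns (an explicit age range, a maximum age, and a request for a "young" colleague or candidate). These patterns are illustrative assumptions, not the paper's rule set; they also show the limitation the abstract mentions, since any paraphrase outside the listed wording is missed.

```python
import re

# Hypothetical, illustrative patterns; not the rule set used in the paper.
AGE_PATTERNS = {
    # e.g. "25-35 jaar", "tussen 25 en 35 jaar"
    "age_range": re.compile(r"\b(?:tussen\s+)?\d{2}\s*(?:-|tot|en)\s*\d{2}\s*jaar\b", re.IGNORECASE),
    # e.g. "maximaal 30 jaar", "niet ouder dan 40 jaar"
    "max_age": re.compile(r"\b(?:max(?:imaal|\.)?|niet\s+ouder\s+dan)\s*\d{2}\s*jaar\b", re.IGNORECASE),
    # e.g. "jonge collega", "jonge kandidaat"
    "young_candidate": re.compile(r"\bjonge?\s+(?:collega|kandidaat|medewerker)\b", re.IGNORECASE),
}


def flag_age_terms(ad: str) -> list[tuple[str, str]]:
    """Return (pattern name, matched text) for every hit in a job ad."""
    hits = []
    for name, pattern in AGE_PATTERNS.items():
        for m in pattern.finditer(ad):
            hits.append((name, m.group(0)))
    return hits
```

One advantage of this style, which the qualitative analysis weighs, is transparency: every flag points to an exact matched phrase that a reviewer can inspect.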
This paper demonstrates how to fine-tune a BART model to construct a sentence from an arbitrary set of words, which has traditionally been a difficult NLP task. The model is trained to build sentences from four words, but it can also generate sentences when fewer or more words are provided. The output sentences are generally of high quality. The model has potential real-world applications, and the task can also serve as an evaluation mechanism for any language model.
Reproducibility is an important task in scientific research. It is crucial for researchers to compare newly developed systems with the state of the art to assess whether they have made a breakthrough. However, previous work may not be immediately reproducible, for example due to the lack of source code. In this work we reproduce DEXTER, a system that automatically extracts Gene-Disease Associations (GDAs) from biomedical abstracts. The goal is to provide a benchmark for future work on Relation Extraction (RE), enabling researchers to test and compare their results.
The paper describes the work submitted to the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022). The work addresses Subtask 1 of Shared Task 3, which aims to detect causality in a corpus of protest news. The authors used several large language models with customized cross-entropy loss functions that exploit annotation information. In the experiments, bert-base-uncased with a refined cross-entropy loss outperformed the other models, achieving an F1 score of 0.8501 on the Causal News Corpus dataset.
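The abstract does not specify how the loss exploits annotation information. One common way to do so, shown here purely as an assumed illustration, is a soft-label cross-entropy whose target is the distribution of annotator votes instead of a single hard label; the function name is hypothetical.

```python
import math


def soft_label_cross_entropy(logits: list[float], annotator_votes: list[int]) -> float:
    """Cross-entropy between softmax(logits) and the annotators' vote distribution.

    Assumption: annotator_votes[k] is the number of annotators who chose class k.
    With unanimous votes this reduces to ordinary cross-entropy.
    """
    # log-softmax with max-subtraction for numerical stability
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - m - log_total for z in logits]
    n = sum(annotator_votes)
    targets = [v / n for v in annotator_votes]
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

Under this formulation, an example on which annotators disagree contributes a softer gradient than a unanimously labeled one, so ambiguous sentences pull the model less strongly toward either class.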
Hateful meme detection is a recently introduced research area that requires both visual and linguistic understanding of the meme, as well as some background knowledge, to perform well. This technical report summarises the first-place solution to the Hateful Meme Detection Challenge 2020, which extends state-of-the-art visual-linguistic transformers to tackle this problem. At the end of the report, we also point out the shortcomings of the current methodology and possible directions for improving it.
A simple two-step method for growing ZnO nanorod arrays on the surface of BiOI nanosheets under mild conditions was developed. The hierarchical structure of the ZnO arrays@BiOI nanosheets was characterized by X-ray powder diffraction, scanning electron microscopy, transmission electron microscopy, and energy-dispersive X-ray spectroscopy. The optical absorption of the ZnO arrays@BiOI nanosheets composite was investigated by UV-Vis diffuse reflectance spectroscopy. Photocatalytic degradation of methyl orange under visible light shows that the ZnO arrays@BiOI nanosheets heterostructures exhibit enhanced photocatalytic activity compared with the sum of the BiOI nanosheets and ZnO nanorods alone. The mechanism of the photocatalytic process is discussed. Growing ZnO nanorod arrays on other nanosheets in the same way also offers a potential route to fabricating other complex structures.
Automating the assessment of learner summaries provides a useful tool for assessing learner reading comprehension. We present a summarization task for evaluating non-native reading comprehension and propose three novel approaches to automatically assess the learner summaries. We evaluate our models on two datasets we created and show that our models outperform traditional approaches that rely on exact word match on this task. Our best model produces quality assessments close to professional examiners.
For the open-domain Knowledge Base Question Answering task in CCKS2019, we propose a method combining information retrieval and semantic parsing. This multi-module system extracts the topic entity and the most related relation predicate from a question and transforms them into a SPARQL query. Our method obtained an F1 score of 70.45% on the test data.
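The final step, turning an extracted topic entity and relation predicate into a query, can be pictured as filling a one-hop SPARQL template. The sketch below assumes a single-triple pattern and uses made-up identifiers (`Mo_Yan`, `birthplace`); the actual system's templates and knowledge-base naming are not given in the abstract.

```python
def build_sparql(topic_entity: str, relation: str, entity_is_subject: bool = True) -> str:
    """Fill a one-hop SPARQL template with an extracted entity and relation.

    Assumption: the answer is the single unknown ?x in one triple; whether the
    topic entity is the subject or the object of that triple is passed in.
    """
    if entity_is_subject:
        triple = f"<{topic_entity}> <{relation}> ?x ."
    else:
        triple = f"?x <{relation}> <{topic_entity}> ."
    return f"SELECT ?x WHERE {{ {triple} }}"
```

In such a pipeline most of the difficulty lies upstream, in ranking candidate entities and predicates; once those are chosen, query construction is mechanical.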
We propose Text2Math, a model for semantically parsing text into math expressions. The model can be used to solve different math-related problems, including arithmetic word problems and equation parsing problems. Unlike previous approaches, we tackle the problem from an end-to-end structured prediction perspective: our algorithm predicts the complete math expression at once as a tree structure, with minimal manual effort involved in the process. Empirical results on benchmark datasets demonstrate the efficacy of our approach.
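To illustrate what "predicting the complete math expression as a tree" produces, here is a small expression-tree type with evaluation and printing. The `Node` class and the example trees are hypothetical; they show the output structure only, not how Text2Math predicts it.

```python
import operator
from dataclasses import dataclass
from typing import Union

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    """Internal node: a binary operator applied to two subtrees."""
    op: str
    left: "Expr"
    right: "Expr"


# A leaf is a number (quantity from the text) or a variable name.
Expr = Union[Node, int, float, str]


def evaluate(e: Expr, env: dict[str, float]) -> float:
    """Evaluate a tree bottom-up; variables are looked up in env."""
    if isinstance(e, Node):
        return OPS[e.op](evaluate(e.left, env), evaluate(e.right, env))
    if isinstance(e, str):
        return env[e]
    return e


def to_infix(e: Expr) -> str:
    """Render the tree as a fully parenthesized infix expression."""
    if isinstance(e, Node):
        return f"({to_infix(e.left)} {e.op} {to_infix(e.right)})"
    return str(e)
```

For instance, "Tom has 5 apples and buys 3 bags with 4 apples each" could map to `Node("+", 5, Node("*", 3, 4))`, which prints as `(5 + (3 * 4))` and evaluates to 17; predicting the whole tree at once avoids committing to a left-to-right token order.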
The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation. We compare recurrent, convolutional, and self-attention-based neural models on German-English, English-German, and Chinese-English. Our final system combines all neural models together with a phrase-based SMT system in an MBR-based scheme. We report small but consistent gains on top of strong Transformer ensembles.
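The abstract does not detail the MBR scheme used in the submission. As a simplified stand-in, the sketch below applies sentence-level minimum Bayes risk selection to an n-best list pooled from several systems, using normalized scores as posteriors and unigram F1 as the gain function; both choices, and the function names, are assumptions for illustration.

```python
import math
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Gain function: F1 of clipped unigram overlap between two sentences."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rc = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rc / (p + rc)


def mbr_select(candidates: list[tuple[str, float]]) -> str:
    """Pick the hypothesis with the highest expected gain under the pooled posterior.

    candidates: (translation, log-score) pairs, e.g. merged n-best lists.
    """
    m = max(s for _, s in candidates)
    weights = [math.exp(s - m) for _, s in candidates]  # stable softmax numerators
    z = sum(weights)
    posteriors = [w / z for w in weights]
    best, best_gain = None, -1.0
    for hyp, _ in candidates:
        gain = sum(p * unigram_f1(hyp, ref) for (ref, _), p in zip(candidates, posteriors))
        if gain > best_gain:
            best, best_gain = hyp, gain
    return best
```

With candidates `[("the cat sat", -1.0), ("the cat sat down", -1.2), ("a dog ran", -0.9)]`, MBR returns "the cat sat" even though "a dog ran" has the single highest score, because the first hypothesis agrees with more of the probability mass. This consensus effect is why MBR is a natural way to combine diverse systems.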
Leslie Mareike Schoop, Roland Eger, Jürgen Nuss
et al.
We report the first examples of quinary rare-earth thiophosphates with a fully ordered cation and anion distribution, Cs5Ln3X3(P2S6)2(PS4) (Ln = La, Ce; X = Br, Cl), as well as the quasi-quaternary Cs10Y4Cl10(P2S6)3. These four new compounds crystallize in three different, previously unknown structure types. The yellowish, transparent, brittle Cs5Ce3Br3(P2S6)2(PS4) crystallizes in a novel structure type in the orthorhombic space group Pnma (no. 62) with a = 13.276(3), b = 14.891(3), c = 19.593(4) Å, and V = 3873(1) Å³. Colorless crystals of Cs5La3Br3(P2S6)2(PS4) and Cs5La3Cl3(P2S6)2(PS4) are isotypic and crystallize in the monoclinic space group P2₁/m (no. 11) with a = 9.715(2), b = 14.310(3), c = 13.685(3) Å, β = 100.16(3)°, and V = 1873(1) Å³, and a = 9.513(2), b = 14.182(3), c = 13.699(3) Å, β = 99.39(3)°, and V = 1823(1) Å³, respectively. Both structures contain isolated hexathiohypodiphosphate(IV) [P2S6]4– and thiophosphate [PS4]3– units that are arranged alternately in layers. Cs10Y4Cl10(P2S6)3 crystallizes as colorless, transparent platelets in the orthorhombic space group Pnnm (no. 58) with a = 13.153(3), b = 28.964(6), c = 7.780(2) Å, and V = 2964(1) Å³. The structure is composed of isolated [P4/2S6]4– octahedra containing four half-occupied P positions surrounded octahedrally by sulfur. Raman scattering shows that this disordered thiophosphate anion has a spectrum distinct from those published for other known thiophosphate anions.
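As a quick consistency check on the reported cells, the unit-cell volume can be recomputed from the lattice parameters with the general triclinic formula, which reduces to V = abc for orthorhombic cells and V = abc sin β for monoclinic cells with unique axis b. The helper name `cell_volume` is hypothetical; rounding its results reproduces the reported volumes of 3873, 1873, 1823, and 2964 Å³.

```python
import math


def cell_volume(a: float, b: float, c: float,
                alpha: float = 90.0, beta: float = 90.0, gamma: float = 90.0) -> float:
    """Unit-cell volume in Å^3 from lattice lengths (Å) and angles (degrees).

    General triclinic formula:
    V = abc * sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ)
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Lattice parameters as reported above (standard uncertainties omitted).
cells = {
    "Cs5Ce3Br3(P2S6)2(PS4), Pnma": (13.276, 14.891, 19.593, 90.0, 90.0, 90.0),
    "Cs5La3Br3(P2S6)2(PS4), P2_1/m": (9.715, 14.310, 13.685, 90.0, 100.16, 90.0),
    "Cs5La3Cl3(P2S6)2(PS4), P2_1/m": (9.513, 14.182, 13.699, 90.0, 99.39, 90.0),
    "Cs10Y4Cl10(P2S6)3, Pnnm": (13.153, 28.964, 7.780, 90.0, 90.0, 90.0),
}

for name, params in cells.items():
    print(f"{name}: V = {cell_volume(*params):.0f} Å^3")
```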
The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.
Pre-trained word embeddings improve the performance of a neural model at the cost of increasing the model size. We propose to benefit from this resource without paying the cost by operating strictly at the sub-lexical level. Our approach is quite simple: before task-specific training, we first optimize sub-word parameters to reconstruct pre-trained word embeddings using various distance measures. We report interesting results on a variety of tasks: word similarity, word analogy, and part-of-speech tagging.
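A minimal sketch of the reconstruction step, assuming a fastText-style composition (a word vector is the sum of its character trigram vectors, with `<` and `>` as boundary markers) and squared Euclidean distance as the loss; the abstract says several distance measures were tried, so this is only one instance. The toy embeddings and function names are hypothetical.

```python
import random


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of the word with boundary markers, e.g. '<ca', 'cat', 'at>'."""
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]


def compose(word: str, table: dict[str, list[float]], dim: int) -> list[float]:
    """Word vector as the sum of its n-gram vectors; unseen n-grams contribute zeros."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        for k, x in enumerate(table.get(g, [0.0] * dim)):
            vec[k] += x
    return vec


def fit_subwords(pretrained: dict[str, list[float]], dim: int,
                 epochs: int = 500, lr: float = 0.05, seed: int = 0) -> dict[str, list[float]]:
    """Fit n-gram vectors so composed vectors reconstruct the pre-trained ones.

    Loss per word: 0.5 * ||compose(word) - target||^2, minimized by plain SGD.
    """
    rng = random.Random(seed)
    table: dict[str, list[float]] = {}
    for word in pretrained:
        for g in char_ngrams(word):
            table.setdefault(g, [rng.uniform(-0.1, 0.1) for _ in range(dim)])
    for _ in range(epochs):
        for word, target in pretrained.items():
            err = [p - t for p, t in zip(compose(word, table, dim), target)]
            # d(loss)/d(ngram vector) = err for every occurrence of the n-gram
            for g in char_ngrams(word):
                table[g] = [x - lr * e for x, e in zip(table[g], err)]
    return table
```

After fitting, only the n-gram table is kept, and a word such as "dogs" that was never in the pre-trained vocabulary still receives a vector from its known n-grams.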
Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan
et al.
This paper presents a new Bayesian non-parametric model that extends Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The model's inference algorithm groups words into a cluster if they share similar distributions over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.