Automatic Detection of Fake News
Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre
et al.
The proliferation of misleading information in everyday access media outlets such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insights into the reliability of online content. In this paper, we focus on the automatic identification of fake content in online news. Our contribution is twofold. First, we introduce two novel datasets for the task of fake news detection, covering seven different news domains. We describe the collection, annotation, and validation process in detail and present several exploratory analyses on the identification of linguistic differences in fake and legitimate news content. Second, we conduct a set of learning experiments to build accurate fake news detectors, and show that we can achieve accuracies of up to 76%. In addition, we provide comparative analyses of the automatic and manual identification of fake news.
898 citations
en
Computer Science
Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
Marianne Bertrand, S. Mullainathan
3437 citations
en
Materials Science, Economics
Data Management and Analysis Methods
G. Ryan, H. Bernard
Uses and Gratifications Theory in the 21st Century
Thomas E. Ruggiero
Framing European politics: a content analysis of press and television news
H. Semetko, P. Valkenburg
We investigated the prevalence of 5 news frames identified in earlier studies on framing and framing effects: attribution of responsibility, conflict, human interest, economic consequences, and morality. We content analyzed 2,601 newspaper stories and 1,522 television news stories in the period surrounding the Amsterdam meetings of European heads of state in 1997. Our results showed that, overall, the attribution of responsibility frame was most commonly used in the news, followed by the conflict, economic consequences, human interest, and morality frames, respectively. The use of news frames depended on both the type of outlet and the type of topic. Most significant differences were not between media (television vs. the press) but between sensationalist vs. serious types of news outlets. Sober and serious newspapers and television news programs more often used the responsibility and conflict frames in the presentation of news, whereas sensationalist outlets more often used the human interest frame.
2478 citations
en
Political Science
Private Benefits of Control: An International Comparison
Alexander Dyck, Luigi Zingales, John G. Matsusaka
et al.
2569 citations
en
Business, Economics
Mars in the Australian Press, 1875-1899. 1. Interpretation, Authority and Planetary Science
Richard de Grijs
[Abridged] In the late nineteenth century, Mars emerged as one of the most intensively reported astronomical objects in the popular press, driven by favourable oppositions, improved telescopic capabilities and growing speculation regarding planetary habitability. I examine how Mars was interpreted in Australian newspapers between the 1870s and 1899, focusing on the ways in which astronomical knowledge was framed, contextualised and debated within a colonial media environment. Drawing on a large collection of digitised newspaper articles, I analyse how observational authority, instrumental credibility and individual expertise were harnessed in press reporting. The paper situates Australian Mars coverage within a global network of scientific communication dominated by metropolitan centres in Europe and North America, while highlighting the distinctive role played by southern-hemisphere visibility. Australian observatories and observers were frequently positioned as contributors of confirmatory observation rather than interpretive leadership, reinforcing a pattern of locally grounded but internationally oriented scientific engagement. The analysis traces a shift from early emphasis on disciplined observation and measurement to later periods characterised by contested interpretations, particularly surrounding the so-called Martian "canals" and the speculative claims advanced by personalities such as Percival Lowell in the USA. By examining how newspapers mediated between observational astronomy, engineering analogies and popular imagination, this study contributes to a broader understanding of how planetary science entered public discourse beyond metropolitan centres. In doing so, it underscores the active role of colonial newspapers in shaping scientific meaning and situates Australian Mars reporting within the wider history of nineteenth-century astronomical culture.
en
physics.hist-ph, astro-ph.EP
A Survey of OCR Evaluation Methods and Metrics and the Invisibility of Historical Documents
Fitsum Sileshi Beyene, Christopher L. Dancy
Optical character recognition (OCR) and document understanding systems increasingly rely on large vision and vision-language models, yet evaluation remains centered on modern, Western, and institutional documents. This emphasis masks system behavior in historical and marginalized archives, where layout, typography, and material degradation shape interpretation. This study examines how OCR and document understanding systems are evaluated, with particular attention to Black historical newspapers. Using the PRISMA framework, we review OCR and document understanding papers, as well as benchmark datasets, published between 2006 and 2025. We examine how the studies report training data, benchmark design, and evaluation metrics for vision transformer and multimodal OCR systems. We find that Black newspapers and other community-produced historical documents rarely appear in reported training data or evaluation benchmarks. Most evaluations emphasize character accuracy and task success on modern layouts. They rarely capture structural failures common in historical newspapers, including column collapse, typographic errors, and hallucinated text. To put these findings into perspective, we use previous empirical studies and archival statistics from significant Black press collections to show how evaluation gaps lead to structural invisibility and representational harm. We propose that these gaps arise from organizational (meso) and institutional (macro) behaviors and structures, shaped by benchmark incentives and data governance decisions.
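Character accuracy, the metric this abstract says dominates OCR evaluation, is commonly derived from the edit distance between OCR output and a reference transcription. The sketch below assumes the usual definition of character error rate (edit distance divided by reference length). The function names and sample strings are illustrative and are not taken from the surveyed papers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical reference transcription and OCR output.
print(character_error_rate("The Richmond Planet", "The Richmond Plauet"))
```

Because a score of this kind is computed over flattened text, it does not by itself distinguish a layout failure from scattered character errors.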
Ilmalikud laulud infoallikate ja haridusvahenditena talurahvavalgustuse ajal/Secular songs as sources of information and educational tools during the peasant Enlightenment
Māra Grudule
The written and oral culture of the Baltic indigenous peoples underwent gradual changes in the late 18th and 19th centuries. According to Wolfgang Welsch, vision is linked with knowledge and science, while hearing relates to faith and religion (Welsch 1996: 248) – this distinction shaped the interaction between oral and written culture. Among Baltic peasants, oral culture remained dominant until the mid-19th century, with the German clergy continuing to control the information space despite ongoing social change. During the Enlightenment, secular Latvian literature began to emerge. Gotthard Friedrich Stender (1714–1796), a German pastor from Kurzeme, laid the foundation for Latvian secular prose, poetry, and popular science literature. However, his songs, the so-called ziņģes, proved more influential than his prose. The songs combine entertainment with moral instruction on drinking, social harmony, and education. Around the turn of the 19th century, major transformations occurred: the territory of present-day Latvia was incorporated into the Russian Empire, Napoleon’s campaigns threatened the region, serfdom was abolished, and a Latvian school network was created. The public demanded information, which was shared through church sermons and, from the 1820s onward, through Latvian newspapers. Supported by Baltic German pastors, the first generation of Latvian intellectuals emerged. By the 1830s, they actively sought to merge oral and written traditions, adapting elements of the Baltic Germans’ peasant Enlightenment project for the purposes of the Latvian national awakening. This paper examines how three key events of the early 19th century – Napoleon’s campaigns and Latvian recruitment into the Russian army, the abolition of serfdom, and the rise of Latvian schools – were reflected in Latvian songs. It analyzes songs published in Latvian newspapers, in books, and on flyers, and it explores the differing perspectives of Baltic Germans and Latvians.
Other Finnic languages and dialects
Southern Newswires: A Large-Scale Study of Mid-Century Wire Content Beyond the Front Page
Michael McRae
This paper describes the construction of a large-scale corpus of historical wire articles from U.S. Southern newspapers, spanning 1960-1975 and covering multiple wire services (e.g., Associated Press, United Press International, Newspaper Enterprise Association). Unlike prior work that focuses primarily on front-page content, the corpus captures wire-sourced articles across the entire newspaper, offering broader insight into mid-century Southern news coverage. The analysis incorporates both raw OCR text and a version processed through an LLM-based text correction pipeline designed to reduce OCR noise and improve suitability for quantitative text analysis. Multiple versions of the same wire dispatch are retained, allowing for the study of editorial differences in language and framing across newspapers. Articles are classified by wire service, enabling comparative analysis of editorial patterns across agencies. Together, these features provide a detailed perspective on how Southern newspapers transmitted national and international news during a transformative period in American history.
A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News
Mirza Raquib, Munazer Montasir Akash, Tawhid Ahmed
et al.
In our daily lives, newspapers are an essential information source that shapes how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portals can be challenging. Newspaper headlines with sentiment analysis tell us what the news is about (e.g., politics, sports) and how the news makes us feel (positive, negative, neutral). This helps us quickly understand the emotional tone of the news. This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis, applying Natural Language Processing (NLP) techniques, particularly the hybrid transfer learning model BERT-CNN-BiLSTM. We use the BAN-ABSA dataset of 9014 news headlines, which is used here for the first time for simultaneous headline and sentiment categorization of Bengali newspapers. On this imbalanced dataset, we applied two experimental strategies: technique-1, where undersampling and oversampling are applied before splitting, and technique-2, where undersampling and oversampling are applied after splitting. In technique-1, oversampling provided the strongest performance for both headline and sentiment classification, at 78.57% and 73.43% respectively, while technique-2 delivered its highest results when trained directly on the original imbalanced dataset, at 81.37% and 64.46% respectively. The proposed BERT-CNN-BiLSTM model significantly outperforms all baseline models in the classification tasks and achieves new state-of-the-art results for Bangla news headline classification and sentiment analysis. These results demonstrate the importance of leveraging both the headline and sentiment datasets, and provide a strong baseline for Bangla text classification in low-resource settings.
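The two orderings in this abstract differ in whether resampled copies can reach the evaluation data. The sketch below illustrates this with random oversampling on toy headlines. It is not the authors' pipeline: the data, the helper names, the 25% test fraction, and the assumption that resampling after the split touches only the training portion are all illustrative.

```python
import random
from collections import Counter


def oversample(rows, rng):
    """Randomly duplicate minority-class rows until every class
    is as large as the largest one."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def split(rows, test_fraction, rng):
    """Shuffle a copy of rows and cut it into (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def shared_texts(train, test):
    """Texts that occur in both partitions."""
    return {text for text, _ in train} & {text for text, _ in test}


rng = random.Random(0)
# Hypothetical imbalanced data: 40 unique "politics" and 8 unique "sports" headlines.
data = ([(f"politics headline {i}", "politics") for i in range(40)]
        + [(f"sports headline {i}", "sports") for i in range(8)])

# Ordering A: oversample first, then split. Copies of one headline may land on both sides.
train_a, test_a = split(oversample(data, rng), 0.25, rng)

# Ordering B: split first, then oversample the training portion only.
train_b, test_b = split(data, 0.25, rng)
train_b = oversample(train_b, rng)

print("ordering A, texts in both partitions:", len(shared_texts(train_a, test_a)))
print("ordering B, texts in both partitions:", len(shared_texts(train_b, test_b)))
print("ordering B, training class counts:", Counter(label for _, label in train_b))
```

In ordering B the test partition contains only original, unduplicated headlines, so no text can appear on both sides of the split.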
From Brazil to The Planet: politics of race in the american black press
Lívia Maria Tiéde
At the turn of the twentieth century, Brazilian and American Black intellectuals expressed their views on racial disputes through the Black press. This article evaluates views of Black politics in the U.S. and Brazil as an essential grounding of the struggle for civil rights, understanding the past of both countries as emerging from a collective experience of discrimination. The primary sources are the letters written by J. S. Moore, an African American intellectual who lived in Bahia, Brazil. Moore addressed his writings to newspapers such as the New York Amsterdam News, Chicago Defender, and particularly the Richmond Planet. His critical articles engaged directly with editors from 1917 to the 1930s. The dialogue between the American continents also sheds light on racial issues in Brazil. It gives us a more complex perspective on the global fight against racism and makes it possible to understand the potential choices for resistance among diasporic societies.
History (General), Social Sciences
Study of lexical creativity in the French newspaper Le Monde during the Covid-19 pandemic
Anaïs TAZAMOUCHT, Fatima Zohra SAKRANE & Karen FERREIRA-MEYERS
Abstract: Our research falls within applied linguistics and analyzes the dynamics of neologisms in «Le Monde», one of the most widely read newspapers in France. The corpus consists of the electronic versions of the articles the newspaper published during the Covid-19 pandemic. Our study takes a lexico-semantic approach to all the new words collected from the newspaper. The method combines two analyses, quantitative and qualitative. The first is a statistical analysis of the neologisms collected. The second clarifies the different formation processes of the new lexical items emerging in French society. The results show that the journalists of Le Monde integrate into their writing neologisms from distinct typologies and grammatical classes, created through diverse formation patterns.
Keywords: applied linguistics, neologisms, lexico-semantic approach, «Le Monde» newspaper, formation process
Media Manipulations in the Coverage of Events of the Ukrainian Revolution of Dignity: Historical, Linguistic, and Psychological Approaches
Ivan Khoma, Solomia Fedushko, Zoryana Kunch
This article examines the use of manipulation in mass media coverage of the Ukrainian Revolution of Dignity, specifically in the content of the online newspapers Ukrainian Truth (Ukrainska pravda), High Castle (Vysokyi Zamok), and ZIK during the public protests. Historical, linguistic, and psychological approaches are applied to the content of these online newspapers. Internet resources that cover news are analyzed, and the current and most popular ones are identified. The content of the online newspapers is analyzed, statistically processed, and classified by the level of significance of the data (very significant, significant, and insignificant). An algorithm for detecting media manipulation in coverage of the course of the Ukrainian revolutions is designed on the basis of historical, linguistic, and psychological approaches. Methods of counteracting information attacks in online newspapers are developed.
Newswire: A Large-Scale Structured Database of a Century of Historical News
Emily Silcock, Abhishek Arora, Luca D'Amico-Wong
et al.
In the U.S. historically, local newspapers drew their content largely from newswires like the Associated Press. Historians argue that newswires played a pivotal role in creating a national identity and shared understanding of the world, but there is no comprehensive archive of the content sent over newswires. We reconstruct such an archive by applying a customized deep learning pipeline to hundreds of terabytes of raw image scans from thousands of local newspapers. The resulting dataset contains 2.7 million unique public domain U.S. newswire articles, written between 1878 and 1977. Locations in these articles are georeferenced, topics are tagged using customized neural topic classification, named entities are recognized, and individuals are disambiguated to Wikipedia using a novel entity disambiguation model. To construct the Newswire dataset, we first recognize newspaper layouts and transcribe around 138 million structured article texts from raw image scans. We then use a customized neural bi-encoder model to de-duplicate reproduced articles, in the presence of considerable abridgement and noise, quantifying how widely each article was reproduced. A text classifier is used to ensure that we only include newswire articles, which historically are in the public domain. The structured data that accompany the texts provide rich information about the who (disambiguated individuals), what (topics), and where (georeferencing) of the news that millions of Americans read over the course of a century. We also include Library of Congress metadata about the newspapers that ran the articles on their front pages. The Newswire dataset is useful both for large language modeling - expanding training data beyond what is available from modern web texts - and for studying a diversity of questions in computational linguistics, social science, and the digital humanities.
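The de-duplication step described in this abstract can be pictured as embedding each article, linking pairs whose embeddings are similar enough, and treating each connected group as one dispatch. The sketch below is a rough illustration under loud assumptions: a bag-of-words count vector stands in for the neural bi-encoder, the 0.7 cosine threshold is arbitrary, and the all-pairs comparison would not scale to millions of articles. None of these choices come from the paper.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a neural bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.7):
    """Group texts whose pairwise similarity reaches the threshold,
    using union-find so that links are transitive."""
    vectors = [embed(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical dispatches: the second is an abridged reprint of the first.
articles = [
    "president signs the farm bill in washington today",
    "president signs farm bill in washington",
    "local team wins the county baseball title",
]
for group in cluster(articles):
    print(f"dispatch {group[0]} was reproduced {len(group)} time(s)")
```

The size of each group gives the reproduction count that the abstract describes as quantifying how widely an article ran.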
THE 1923 SEPTEMBER UPRISING IN BULGARIA AS REPORTED BY SOVIET PROVINCIAL NEWSPAPERS
Bryantsev M.V.
The current state of historiography on the events of September 1923 in Bulgaria motivates this study, which analyzes coverage of the September events in Soviet provincial newspapers. These newspapers reflected the official position of the authorities, who sought to shape a particular image of what was happening in Bulgaria. We analyzed newspaper materials, mostly from provincial Soviet newspapers from 1923. We considered not only news items and editorials but also the authors’ analytical articles and reprints from central newspapers, most often Pravda and Izvestiya. Tsankov’s assumption of power on 12 June 1923 and his repressive policy toward peasant revolts and communists placed news from Bulgaria among the most topical, along with reports from Poland and, of course, Germany. The newspapers were forced to admit that the Bulgarian Communists were not ready for an uprising, although as early as August 1923 it had been decided to prepare an armed uprising in the next 2-3 years. With some delay the Soviet newspapers began to write about the success of the uprising, defining these events as a civil war and even a revolution. The defeat that followed the uprising was explained by the Bulgarian and Soviet Communists in a completely biased way. The leaders of the uprising, Kolarov and Dimitrov, saw the cause in the betrayal of one of the members of the Sofia Revolutionary Committee. Others looked for the causes in insufficient organization and lack of weapons. All this is evidence not only of a superficial assessment of events, but also of wishful thinking. The optimism reflected in the publications about the future fate of the proletarian revolution in Bulgaria also testifies to this.
Archaeology, Law in general. Comparative and uniform law. Jurisprudence
Forecasting Future News Deserts
Edward Malthouse, Jaewon Choi, Zach Metzger
et al.
This article builds a model to forecast the number of newspapers that will exist in each US county in 2028, based on what is known about each county in 2023. The methodology is to use information known in 2018 to predict the number of newspapers in 2023. Having estimated the model parameters, we apply it to 2023 data. The model is based on market demographic characteristics and allows for different effects (slopes) for large, medium and small markets (population segments). While the main contribution is forecasting, we interpret the parameter estimates for validation. We find that the best predictor of the number of newspapers in five years is the current number of newspapers. Population size also has a positive association with newspapers. Average age and median income have positive slopes, but not in all population segments. The proportions of Blacks, and separately Hispanics, in a county have negative associations with the number of newspapers, but not in all population segments. The report provides maps showing which counties that are currently news deserts could be revived, which counties that currently have one newspaper are more at risk of losing it, and which counties with two or more newspapers are at risk. We also study the model residuals showing which counties are under- or over-performing relative to the market conditions.
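The train-then-shift procedure in this abstract (estimate on 2018 to 2023, then apply the fitted parameters to 2023 data to forecast 2028) can be sketched with a single predictor. The example assumes ordinary least squares on the current newspaper count only, chosen for brevity. The abstract does not name the estimator, and the actual model also uses demographic variables with segment-specific slopes. The county counts are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical newspaper counts for six counties.
counts_2018 = [0, 1, 2, 3, 5, 8]
counts_2023 = [0, 1, 1, 2, 4, 6]

# Step 1: learn the five-year relationship from the 2018 -> 2023 pairs.
intercept, slope = fit_line(counts_2018, counts_2023)

# Step 2: apply the same parameters to 2023 counts to forecast 2028.
forecast_2028 = [intercept + slope * count for count in counts_2023]

for current, predicted in zip(counts_2023, forecast_2028):
    print(f"2023: {current} newspaper(s) -> 2028 forecast: {predicted:.2f}")
```

The residuals from step 1 (observed 2023 count minus fitted value) correspond to the under- or over-performance relative to market conditions that the abstract mentions.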
Tender Notice Extraction from E-papers Using Neural Network
Ashmin Bhattarai, Anuj Sedhai, Devraj Neupane
et al.
Most companies regularly look for tender notices as a means of obtaining contracts for various projects. These notices contain all the required information, such as the description of the work, the period of construction, and the estimated project amount. In Nepal, tender notices are usually published in national as well as local newspapers, and interested bidders must search the newspapers for all related notices. However, it is very tedious for these companies to manually search for tender notices in every newspaper and work out which bid suits them best. This project is built to remove the need for that manual search. First, the newspapers are downloaded in PDF format using the Selenium library for Python. The e-papers are then scanned and tender notices are automatically extracted using a neural network. For extraction, different CNN architectures, namely ResNet, GoogLeNet and Xception, are compared, and the model with the highest performance is implemented. Finally, the extracted notices are published on the website, where they are accessible to users. This project is helpful for construction companies as well as contractors, assuring quality and efficiency. It has great application in the field of competitive bidding and in managing bids in a systematic manner.
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
Md. Istiak Hossain Shihab, Md. Rakibul Hasan, Mahfuzur Rahman Emon
et al.
While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking the performance of existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset in training deep learning based Bengali document digitization models.
«Illuminare il senso delle ricerche di oggi». Marisa Volpi Orlandini, critica militante negli anni sessanta/'Illuminating the meaning of today's research'. Marisa Volpi Orlandini, militant critic in the 1960s
Sonia Chianchiano
Il saggio vuole indagare il ruolo di critica militante ricoperto da Marisa Volpi Orlandini nel corso degli anni sessanta e il suo contributo nell’aggiornare il pubblico italiano sulle ricerche artistiche contemporanee, affermando il proprio sguardo in un panorama artistico-culturale in costante mutamento. Lo studio vuole mettere in luce gli anni di formazione, l'incontro con alcune figure chiave come Roberto Longhi, Carla Lonzi e Giulio Carlo Argan, e il suo percorso di critica militante costruito con coerenza e impostato sulla tradizione e sulla rielaborazione individuale di un metodo assimilato nel tempo.
Una ricerca sviluppata grazie ai documenti conservati negli anni dalla studiosa e alle numerose pubblicazioni che l’hanno vista protagonista firmando presentazioni in catalogo, saggi e contributi sulle principali riviste e i giornali dell’epoca.
The essay investigates the role of militant critic played by Marisa Volpi Orlandini during the 1960s and her contribution to updating the Italian public on contemporary artistic research, affirming her point of view in a constantly changing artistic-cultural scene. The study highlights her formative years, her encounters with key figures such as Roberto Longhi, Carla Lonzi and Giulio Carlo Argan, and her path as a militant critic, built with consistency and grounded in tradition and in the individual reworking of a method absorbed over time.
The research draws on the documents the scholar preserved over the years and on her numerous publications, including catalog presentations, essays and contributions to the main magazines and newspapers of the period.
Arts in general, Auxiliary sciences of history