Anand Umashankar, Karam Tomotaki-Dawoud, Nicolai Schneider
Remote sensing archives are inherently distributed: Earth observation missions such as Sentinel-1, Sentinel-2, and Sentinel-3 have collectively accumulated more than 5 petabytes of imagery, stored and processed across many geographically dispersed platforms. Training machine learning models on such data in a centralized fashion is impractical due to data volume, sovereignty constraints, and geographic distribution. Federated learning (FL) addresses this by keeping data local and exchanging only model updates. A central challenge for remote sensing is the non-IID nature of Earth observation data: label distributions vary strongly by geographic region, degrading the convergence of standard FL algorithms. In this paper, we conduct a systematic empirical study of three FL strategies -- FedAvg, FedProx, and bulk synchronous parallel (BSP) -- applied to multi-label remote sensing image classification under controlled non-IID label-skew conditions. We evaluate three convolutional neural network (CNN) architectures of increasing depth (LeNet, AlexNet, and ResNet-34) and analyze the joint effect of algorithm choice, model capacity, client fraction, client count, batch size, and communication cost. Experiments on the UC Merced multi-label dataset show that FedProx outperforms FedAvg for deeper architectures under data heterogeneity, that BSP approaches centralized accuracy at the cost of high sequential communication, and that LeNet provides the best accuracy-communication trade-off for the dataset scale considered.
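To make the difference between the strategies concrete, the following is a minimal sketch of FedAvg aggregation and of the FedProx proximal gradient, not the authors' implementation. It assumes model parameters are flattened into plain lists of floats; the function names `fedavg` and `fedprox_gradient` and the parameter `mu` are illustrative choices, not taken from the paper.

```python
from typing import List


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_gradient(
    local_grad: List[float],
    local_w: List[float],
    global_w: List[float],
    mu: float,
) -> List[float]:
    """Gradient of the FedProx local objective f_k(w) + (mu / 2) * ||w - w_global||^2.

    The extra term mu * (w - w_global) pulls local updates back toward the
    global model, which is what limits client drift under non-IID data.
    """
    return [g + mu * (w - wg) for g, w, wg in zip(local_grad, local_w, global_w)]


# Two hypothetical clients holding 1 and 3 samples respectively.
new_global = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

With `mu = 0` the FedProx gradient reduces to the plain local gradient, so FedAvg can be read as the special case without the proximal term.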
Research analyzing the social and knowledge representation of the state of Guanabara (1960-1975) in photographs, based on fonds and collections held by the public archives located in Rio de Janeiro (the Arquivo Nacional, the Arquivo Geral da Cidade do Rio de Janeiro, and the Arquivo Público do Estado do Rio de Janeiro), comparing the photographs, their descriptions, and the access points assigned to the images.
Keywords: social representation; knowledge representation; information science; photographs; archives; state of Guanabara; Rio de Janeiro (1960-1975).
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
Aish Albladi, Md Kaosar Uddin, Minarul Islam
et al.
Sentiment analysis is a crucial task in natural language processing (NLP) that enables the extraction of meaningful insights from textual data, particularly from dynamic platforms like Twitter and IMDB. This study explores a hybrid framework combining transformer-based models, specifically BERT, GPT-2, RoBERTa, XLNet, and DistilBERT, to improve sentiment classification accuracy and robustness. The framework addresses challenges such as noisy data, contextual ambiguity, and generalization across diverse datasets by leveraging the unique strengths of these models. BERT captures bidirectional context, GPT-2 enhances generative capabilities, RoBERTa optimizes contextual understanding with larger corpora and dynamic masking, XLNet models dependency through permutation-based learning, and DistilBERT offers efficiency with reduced computational overhead while maintaining high accuracy. Text cleaning, tokenization, and feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW) ensure high-quality input data for the models. The hybrid approach was evaluated on the benchmark datasets Sentiment140 and IMDB, achieving accuracy rates of 94% and 95%, respectively, and outperforming standalone models. The results validate the effectiveness of combining multiple transformer models in ensemble-like setups to address the limitations of individual architectures. This research highlights the framework's applicability to real-world tasks such as social media monitoring, customer sentiment analysis, and public opinion tracking, and offers a pathway for future advancements in hybrid NLP frameworks.
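The BoW and TF-IDF features mentioned above can be illustrated with a small, self-contained sketch. It is not the paper's pipeline: the whitespace tokenizer, the function names, and the specific weighting (term frequency normalized by document length, idf = log(N / df) with no smoothing) are assumptions, and other TF-IDF variants are in common use.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs: List[str]) -> List[Counter]:
    """Return one raw term-count vector per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Compute tf * idf per term, with tf = count / doc length, idf = log(N / df)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per term
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (k / total) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        )
    return vectors


docs = ["Good movie", "bad movie"]
vectors = tf_idf(docs)
```

In this toy corpus the term "movie" appears in every document and therefore receives a weight of zero, while "good" and "bad" carry all of the discriminative weight.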
When digitizing historical archives, it is necessary to search for the faces of celebrities and ordinary people, especially in newspapers, link them to the surrounding text, and make them searchable. Existing face detectors on datasets of scanned historical documents fail remarkably -- current detection tools only achieve around 24% mAP at 50:90% IoU. This work compensates for this failure by introducing a new manually annotated domain-specific dataset in the style of the popular Wider Face dataset, containing 2.2k new images from digitized historical newspapers from the 19th to 20th century, with 11k new bounding-box annotations and associated facial landmarks. This dataset allows existing detectors to be retrained to bring their results closer to the standard in the field of face detection in the wild. We report several experimental results comparing different families of fine-tuned detectors against publicly available pre-trained face detectors and ablation studies of multiple detector sizes with comprehensive detection and landmark prediction performance results.
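The detection figures above are reported against intersection-over-union (IoU) thresholds. The following minimal sketch shows how IoU between a predicted and a ground-truth box can be computed and checked against several thresholds. The `(x1, y1, x2, y2)` box layout, the function names, and the example threshold values are assumptions for illustration; the abstract does not specify them.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_matched(
    pred: Sequence[float], truth: Sequence[float], thresholds: List[float]
) -> List[float]:
    """Return the IoU thresholds at which the prediction counts as a true positive."""
    score = iou(pred, truth)
    return [t for t in thresholds if score >= t]


# A hypothetical prediction that slightly overshoots the annotated face box.
matched = thresholds_matched((0, 0, 10, 10), (0, 0, 10, 8), [0.5, 0.75, 0.9])
```

Averaging precision over a range of thresholds, rather than a single one, rewards detectors whose boxes are tightly localized and not merely overlapping.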
From the beginning of the nineteenth century, the Ottoman and Iranian states were at odds over issues such as borders, trade, and tribes, and from the end of 1820 they sought to settle these disputes through armed conflict. The attacks launched on Ottoman territory by Abbas Mirza, Crown Prince of Iran, across the Kars and Bayezid frontier pushed the two countries into an unavoidable war. The war, which lasted about three years, was fought on the Eastern and Baghdad fronts. The two states had largely been unable to agree on whose subjects the nomadic Kurdish tribes living along the Ottoman-Iranian border were; the developments that led them to a full peace began in 1823. Envoys from Iran worked to persuade the Ottoman State to resolve the problems through a peace treaty and to end the state of war. In the Ottoman State, which the envoys succeeded in bringing to the table, Rauf Pasha, Governor of Erzurum, was appointed to conduct the treaty negotiations. Iran, for its part, appointed Muhammed Ali Aştiyani for the negotiations. The negotiations on the text that would form the articles of the 1823 First Treaty of Erzurum, which ended the war between the two states and established peace, took place on 22 July 1823 in the governor's palace in Erzurum, and it was decided that the points agreed upon in the talks would be drawn up as articles. This study first presents the process that led the Ottoman and Iranian states to war, the war itself, the diplomatic efforts toward peace between the two countries, and the disputes that arose during the negotiations, and then provides a transcription of the minutes of the negotiations (mükâleme mazbatası).
This article works with what is announced (objectively declared meaning) and with what is merely enunciated (dependent on a broader context of intertwined meanings) in the narratives of two collections of personal documents held at the Arquivo Público de Mato Grosso. Taking the organization (arrangement) of the two collections as its reference, it shows how these self-narratives sought to construct, within archival spaces, “public proofs” and a “place in history,” while at the same time leaving clues to the counter-history of personal negotiations and social contradictions.
Keywords: personal collections; self-narratives; archival processing; Mato Grosso.
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
This paper explores the value of archival theory as a means of grappling with bias in algorithmic design. Rather than seek to mitigate biases perpetuated by datasets and algorithmic systems, archival theory offers a reframing of bias itself. Drawing on a range of archival theory from the fields of history, literary and cultural studies, Black studies, and feminist STS, we propose absence (as power, presence, and productive) as a concept that might more securely anchor investigations into the causes of algorithmic bias, and that can prompt more capacious, creative, and joyful future work. This essay, in turn, can intervene in the technical as well as the social, historical, and political structures that serve as sources of bias.
Ludovic Courtès, Timothy Sample, Simon Tournier
et al.
The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure that the source code of the software packages Guix deploys remains available. We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: first, we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.
In this article, we aim to think about the archive in terms of networks of power and knowledge, in what we call, following Foucault, the archival dispositif. We seek to place the archive within a large network of relations formed by the interplay of forces of knowledge and power, and to consider the archivist as an essential element in the construction of these narratives about the past and in the manipulation of historical time.
Keywords: archival dispositif; past; Foucault; archive.
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
The article looks back at the notebooks of Marc Ferrez, produced in the first decades of the twentieth century, analyzing them as autobiographical writing and as an effort to construct memory. The notebooks reveal an endeavor to inventory his own work, to record the technological and scientific knowledge that marked photography at the threshold of modern photographic practice, and to produce a “writing of the self” that would present the author as an artist, photographer, modern man, and accomplished expert in the processes and technologies of the photographic image.
Keywords: collection, memory, autobiography, writing of the self, personal documents, Marc Ferrez.
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
Secure messaging applications often offer privacy to users by protecting their messages from would-be observers through end-to-end encryption techniques. However, the metadata of who communicates with whom cannot be concealed by encryption alone. Signal's Sealed Sender mechanism attempts to enhance its protection of this data by obfuscating the sender of any message sent with the protocol. However, Martiny et al. showed that, due to the message delivery protocols in Signal, the record of who receives messages can be enough to recover this metadata. In this work we extend that attack from deanonymizing communicating pairs to deanonymizing entire group conversations.
We introduce a computational dating method for an archive from ancient Mesopotamia. We use the name index Nuzi Personal Names (NPN), published in 1943. We produced an electronic version of NPN and added the kinships of the two powerful families to it, to reflect Nuzi studies since 1943. Nuzi was a town in Arrapha, documented for a period of some five generations in the 15th-14th centuries B.C.E. The cuneiform tablets listed in NPN record contracts on land transactions, marriage, loans, slavery, etc. For each person, NPN lists the kinships and the cuneiform tablets (contracts, documents, texts) involved. We reconstruct family trees from the augmented NPN and formulate a least squares problem with constraints: a person's father is at least 22.5 years older than the person, contractors were living at the time of the contract, etc. Our results agree with the Assyriological results of M. P. Maidman on the seniority among siblings of a powerful family. Our method could be applied to other clay tablet archives once a name index in the format of NPN is available.
Drawing on the reports of the presidents of the province of Goiás between 1835 and 1850, the article seeks to give visibility to an indigenous history found between the lines of the narrative of Brazilian history. It aims to understand the dynamics of indigenist policy in the region, which reflects, at the regional level, the tension between extermination and leniency characteristic of the indigenist discourses and practices of the period, as well as the multiplicity of indigenous resistance strategies.
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
Yasith Jayawardana, Alexander C. Nwala, Gavindya Jayawardena
et al.
The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA), and estimate when their actual updates occurred. Next, we apply maximum likelihood to estimate their mean update frequency ($λ$) values. Our evaluation shows that $λ$ values derived from a short history of archived data provide a good estimate for the true update frequency in the short term, and that our method provides better estimations of updates at a fraction of the resources required by the baseline models. Based on this, we demonstrate the utility of archived data to optimize the crawling strategy of web crawlers, and uncover important challenges that inspire future research directions.
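As an illustration of the maximum likelihood step, the sketch below estimates an update rate from a list of estimated update times. It assumes that updates follow a homogeneous Poisson process, so that the gaps between updates are exponentially distributed; the abstract does not state the model used, and the function names and the day-based time unit are hypothetical.

```python
import math
from typing import Sequence


def estimate_update_rate(update_times: Sequence[float]) -> float:
    """Maximum likelihood estimate of lambda for exponentially distributed gaps.

    For n observed gaps with total duration T, the estimate is n / T.
    Times can be in any consistent unit (days are assumed in the example).
    """
    if len(update_times) < 2:
        raise ValueError("need at least two update times to observe a gap")
    times = sorted(update_times)
    elapsed = times[-1] - times[0]
    if elapsed <= 0:
        raise ValueError("update times must span a positive duration")
    return (len(times) - 1) / elapsed


def prob_changed_within(rate: float, horizon: float) -> float:
    """Probability of at least one update within `horizon` under the Poisson model."""
    return 1.0 - math.exp(-rate * horizon)


# Hypothetical page with estimated updates on days 0, 2, 4 and 10.
rate = estimate_update_rate([0, 2, 4, 10])
revisit_probability = prob_changed_within(rate, 1.0)
```

A crawler could use such a probability to prioritize its frontier, revisiting pages with a higher estimated rate more often.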
Noting that, historically, the Black population has been present in greater proportions in segregated and stigmatized territories of many Brazilian cities, this article aims to promote a discussion of urban segregation processes with race as an analytical category. To address this problem, we undertook a critical analysis of studies that inform the debate through different theoretical and methodological frameworks.
Keywords: urban segregation; racial segregation; Black population; racism.
Diplomatics. Archives. Seals, Bibliography. Library science. Information resources
A. A. Shlyapnikov, M. A. Gorbunov, M. A. Gorbachev
et al.
The work described in this article continues previously initiated research on archival spectral observations carried out in Crimea. It covers a time interval of about 90 years and contains information about spectroscopy performed with various facilities, from wide-angle astrographs with an objective prism to the main CrAO telescope, the ZTSh. A brief history of the telescopes and their equipment is presented. The article illustrates the possibilities of network access to catalogues of observations taken with various instruments through the interactive Aladin Sky Atlas, with redirection to the original spectrograms. For this purpose, the linear coordinates of the scanned negatives were converted into a wavelength scale. The possibility of accounting for the spectral sensitivity of the recorded images using the absolute energy distribution is shown. A distinctive feature of this work is the linking of the digitized original observations and the results of their independent processing with the data published for the objects in the journal "Izvestiya of the Crimean Astrophysical Observatory".
Anthony Bagnall, Hoang Anh Dau, Jason Lines
et al.
In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it still only contains univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series are associated with each class label, is in a position where univariate TSC research was a decade ago. Algorithms are evaluated using very few datasets and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data and provide train/test splits.