arXiv Open Access 2024

3DLNews: A Three-decade Dataset of US Local News Articles

Gangani Ariyarathne, Alexander C. Nwala

Abstract

We present 3DLNews, a novel dataset with local news articles from the United States spanning the period from 1996 to 2024. It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV, and radio stations across all 50 states, and provides a broad snapshot of the US local news landscape. The dataset was collected by scraping Google and Twitter search results. We employed a multi-step filtering process to remove non-news article links and enriched the dataset with metadata such as the names and geo-coordinates of the source news media organizations, article publication dates, etc. Furthermore, we demonstrated the utility of 3DLNews by outlining four applications.

Citation

Ariyarathne, G., Nwala, A.C. (2024). 3DLNews: A Three-decade Dataset of US Local News Articles. https://arxiv.org/abs/2408.04716

Journal Information
Publication Year: 2024
Language: en
Database Source: arXiv
Access: Open Access