Semantic Scholar Open Access 2019 13 citations

The impact of preprocessing in natural language for open source intelligence and criminal investigation

Jan William Johnsen K. Franke

Abstract

Underground forums serve as gathering places for like-minded cyber criminals and are a continued threat to law and order. Law enforcement agencies can use Open-Source Intelligence (OSINT) to gather valuable information to proactively counter existing and new threats, for example by shifting the focus of criminal investigations onto cyber criminals with a large impact in underground forums and related criminal business models. This paper presents our study on text preprocessing requirements and document construction for the topic model algorithm Latent Dirichlet Allocation (LDA). We identify a set of preprocessing requirements based on a literature review and demonstrate them on a real-world forum similar to those used by cyber criminals. Our results show that topic modelling processes need to follow a very strict procedure to provide significant results that can be useful in OSINT. Additionally, more reliable results are produced by tuning the hyper-parameters and the number of topics for LDA. We demonstrate improved results through iterative preprocessing that continuously improves the model and provides more coherent and focused topics.
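
As a rough illustration of the kind of text preprocessing the abstract refers to, the following is a minimal sketch and not the authors' actual procedure. The specific steps (lowercasing, URL stripping, stop-word removal, a minimum token length, and a minimum document frequency), the tiny stop-word list, and the example posts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
              "for", "on", "at", "this", "it"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(post):
    """Lowercase a post, strip URLs, and keep alphabetic tokens
    longer than two characters that are not stop words."""
    text = URL_RE.sub(" ", post.lower())
    return [t for t in TOKEN_RE.findall(text)
            if len(t) > 2 and t not in STOP_WORDS]


def preprocess(posts, min_doc_freq=2):
    """Tokenize every post, then drop tokens that occur in fewer
    than min_doc_freq posts (an assumed rarity threshold)."""
    docs = [tokenize(p) for p in posts]
    doc_freq = Counter(t for d in docs for t in set(d))
    return [[t for t in d if doc_freq[t] >= min_doc_freq] for d in docs]


# Made-up forum posts, used only to exercise the functions.
posts = [
    "Selling fresh accounts, see http://example.com for the price list",
    "Anyone selling accounts at a good price?",
    "Check this tutorial on account security",
]
docs = preprocess(posts)
```

The resulting token lists are the sort of input a topic model such as LDA would consume. Posts that end up empty after filtering, like the third one here, would normally be dropped or merged into larger documents before modelling.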

Authors (2)

Jan William Johnsen

K. Franke

Citation Format

Johnsen, J.W., Franke, K. (2019). The impact of preprocessing in natural language for open source intelligence and criminal investigation. https://doi.org/10.1109/BigData47090.2019.9006006

Journal Information
Publication Year
2019
Language
en
Total Citations
13
Source Database
Semantic Scholar
DOI
10.1109/BigData47090.2019.9006006
Access
Open Access