DOAJ Open Access 2021

A Graph Database Representation of Portuguese Criminal-Related Documents

Gonçalo Carnaz, Vitor Beires Nogueira, Mário Antunes

Abstract

Organizations are challenged by the need to process an increasing amount of structured and unstructured data retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, crime-related news articles, occurrence and evidence reports, and other unstructured documents. Automatically extracting and representing the data and knowledge in such documents is essential to reduce the burden of manual analysis and to automate the discovery of names and entity relationships that may exist in a case. This paper presents SEMCrime, a framework that extracts and classifies named entities and relations in Portuguese criminal reports and documents, and represents the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. A prototype developed to evaluate the framework obtained promising results, namely named-entity recognition with an F-measure of 0.73 and 5W1H information extraction with an F-measure of 0.65.
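
To illustrate how a 5W1H extraction result might map onto a graph structure, here is a minimal sketch. It is not SEMCrime's implementation: the `FiveW1H` and `Graph` classes, the edge labels, and the sample event are all hypothetical, and a plain in-memory structure stands in for a real graph database.

```python
from dataclasses import dataclass, field


@dataclass
class FiveW1H:
    """Hypothetical container for one extracted event (not SEMCrime's schema)."""
    who: str
    what: str
    where: str
    when: str
    why: str
    how: str


@dataclass
class Graph:
    """Tiny in-memory stand-in for a graph database."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (source, label, target) triples

    def add_edge(self, source, label, target):
        self.nodes.update((source, target))
        self.edges.append((source, label, target))

    def neighbors(self, source, label):
        return [t for s, l, t in self.edges if s == source and l == label]


def to_graph(event_id, record, graph):
    """Link one event node to each non-empty 5W1H slot."""
    for slot in ("who", "what", "where", "when", "why", "how"):
        value = getattr(record, slot)
        if value:  # empty slots (nothing extracted) produce no edge
            graph.add_edge(event_id, slot.upper(), value)
    return graph


# Invented sample data, for illustration only.
record = FiveW1H(
    who="suspect A",
    what="burglary",
    where="Lisbon",
    when="2020-01-15",
    why="",
    how="forced entry",
)
graph = to_graph("event-1", record, Graph())
```

Modelling each event as a hub node with one labelled edge per slot keeps queries such as "who was involved in event-1" down to a single edge lookup, for example `graph.neighbors("event-1", "WHO")`.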

Citation Format

Carnaz, G., Nogueira, V.B., Antunes, M. (2021). A Graph Database Representation of Portuguese Criminal-Related Documents. https://doi.org/10.3390/informatics8020037

Journal Information
Publication Year
2021
Source Database
DOAJ
DOI
10.3390/informatics8020037
Access
Open Access ✓