arXiv Open Access 2025

Probing BERT for German Compound Semantics

Filip Miletić, Aaron Schmid, Sabine Schulte im Walde

Abstract

This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
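The evaluation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how compositionality is predicted, so this sketch assumes one common setup: the cosine similarity between a compound vector and a constituent vector is taken as the predicted compositionality, and predictions are compared with gold ratings via Spearman's rank correlation. The toy vectors stand in for BERT layer representations, and all function names, vectors, and ratings here are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: (compound vector, constituent vector, gold rating).
# In a real probe these vectors would come from a chosen BERT layer and
# target-token combination; here they are made up for illustration only.
toy = [
    ([1.0, 0.0], [1.0, 0.1], 5.0),
    ([1.0, 0.0], [0.5, 0.5], 3.0),
    ([1.0, 0.0], [0.0, 1.0], 1.0),
]

predicted = [cosine(compound, constituent) for compound, constituent, _ in toy]
gold = [rating for _, _, rating in toy]
rho = spearman(predicted, gold)
print(f"Spearman's rho on toy data: {rho:.3f}")
```

Repeating this computation for each layer, target-token choice, and cased or uncased model would give one correlation score per configuration, which is the kind of comparison the abstract describes.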

Authors (3)

Filip Miletić
Aaron Schmid
Sabine Schulte im Walde

Citation Format

Miletić, F., Schmid, A., Schulte im Walde, S. (2025). Probing BERT for German Compound Semantics. https://arxiv.org/abs/2505.14130

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access