arXiv Open Access 2026

Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq

Abstract

Content safety teams need metrics that reflect what users actually experience, not only what is reported. We study prevalence: the fraction of user views (impressions) that went to content violating a given policy on a given day. Accurate prevalence measurement is challenging because violations are often rare and human labeling is costly, making frequent, platform-representative studies slow. We present a design-based measurement system that (i) draws daily probability samples from the impression stream using ML-assisted weights to concentrate label budget on high-exposure and high-risk content while preserving unbiasedness, (ii) labels sampled items with a multimodal LLM governed by policy prompts and gold-set validation, and (iii) produces design-consistent prevalence estimates with confidence intervals and dashboard drilldowns. A key design goal is one global sample with many pivots: the same daily sample supports prevalence by surface, viewer geography, content age, and other segments through post-stratified estimation. We describe the statistical estimators, variance and confidence interval construction, label-quality monitoring, and an engineering workflow that makes the system configurable across policies.
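
The abstract does not give the estimators themselves, so the following is only a minimal sketch of how an unequal-probability impression sample can still yield an unbiased prevalence estimate. It assumes impressions are drawn with replacement, each with a known per-draw probability (for example, proportional to an ML risk score), and uses the textbook Hansen-Hurwitz estimator with a normal-approximation confidence interval. The segment pivot is shown as a simple inverse-probability-weighted ratio estimate, a stand-in for the post-stratified estimation the authors mention. The names `Draw`, `hansen_hurwitz_prevalence`, and `segment_prevalence` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Draw:
    label: int      # 1 if the labeled item violates the policy, else 0
    prob: float     # assumed known per-draw selection probability
    segment: str    # hypothetical pivot, e.g. a surface name


def hansen_hurwitz_prevalence(draws, population_size, z=1.96):
    """Prevalence estimate and unclamped normal-approximation CI for a
    with-replacement, unequal-probability sample of impressions."""
    n = len(draws)
    if n < 2:
        raise ValueError("need at least two draws to estimate variance")
    # Each draw contributes y / (N * p); the mean of these contributions
    # is unbiased for the population prevalence sum(y) / N.
    contribs = [d.label / (population_size * d.prob) for d in draws]
    estimate = sum(contribs) / n
    # Variance of the mean, from the sample variance of the contributions.
    variance = sum((c - estimate) ** 2 for c in contribs) / (n * (n - 1))
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


def segment_prevalence(draws, segment):
    """Inverse-probability-weighted ratio estimate within one segment."""
    in_segment = [d for d in draws if d.segment == segment]
    weighted_violations = sum(d.label / d.prob for d in in_segment)
    weighted_views = sum(1.0 / d.prob for d in in_segment)
    if weighted_views == 0:
        return float("nan")
    return weighted_violations / weighted_views


if __name__ == "__main__":
    # Toy sample: high-risk items were over-sampled (larger prob), so
    # their labels are down-weighted in the estimate.
    sample = [
        Draw(label=1, prob=0.004, segment="feed"),
        Draw(label=0, prob=0.001, segment="feed"),
        Draw(label=0, prob=0.001, segment="search"),
        Draw(label=1, prob=0.004, segment="search"),
    ]
    estimate, interval = hansen_hurwitz_prevalence(sample, population_size=1000)
    print("overall:", estimate, interval)
    print("feed:", segment_prevalence(sample, "feed"))
```

Dividing each label by its selection probability is what lets the sampler concentrate label budget on high-risk content without biasing the result: items that were more likely to be picked count for proportionally less. The same set of draws is reused for every segment, which mirrors the "one global sample with many pivots" goal, although small segments will have wide intervals.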

Citation Format

Dobi, A., Manickavasagam, A., Thompson, B., Yang, X., Farooq, F. (2026). Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling. https://arxiv.org/abs/2602.18518

Journal Information
Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access