Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk?
Abstract
We assess whether AI systems can credibly evaluate investment risk appetite, a task that must be thoroughly validated before automation. We evaluate proprietary systems (GPT, Claude, Gemini) and open-weight models (LLaMA, DeepSeek, Mistral) on carefully curated user profiles that reflect real users with varying attributes such as country and gender. The models exhibit significant variance in score distributions when user attributes that should not influence risk computation, such as country or gender, are changed. For example, GPT-4o assigns higher risk scores to Nigerian and Indonesian profiles. While some models align closely with expected scores in the Low- and Mid-risk ranges, none maintains consistent scores across regions and demographics, thereby violating AI and finance regulations.
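The consistency check described in the abstract (risk scores should not shift when only an attribute such as country or gender changes) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `group_score_gap`, the record layout, and the toy scores are all assumptions made for illustration, and the gap between group means is only one of several possible ways to summarize variation.

```python
from collections import defaultdict
from statistics import mean


def group_score_gap(records, attribute):
    """Group risk scores by one attribute and return
    (mean score per group, largest gap between group means)."""
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[attribute]].append(rec["score"])
    means = {group: mean(scores) for group, scores in by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap


# Hypothetical toy scores (NOT data from the paper): profiles that are
# identical except for a placeholder "country" attribute.
records = [
    {"country": "A", "score": 40},
    {"country": "A", "score": 44},
    {"country": "B", "score": 52},
    {"country": "B", "score": 56},
]

means, gap = group_score_gap(records, "country")
print(means, gap)
```

Under this sketch, a gap near zero would suggest the attribute is not affecting the scores, while a large gap flags the kind of inconsistency the abstract reports.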
Topics & Keywords
Authors (11)
Divij Chawla
Ashita Bhutada
Do Duc Anh
Abhinav Raghunathan
Vinod SP
Cathy Guo
Dar Win Liew
Prannaya Gupta
Rishabh Bhardwaj
Rajat Bhardwaj
Soujanya Poria
Quick Access
- Publication Year
- 2025
- Language
- en
- Source Database
- arXiv
- Access
- Open Access ✓