Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
|
Authors
|
Saleh, Mohammad; Tabatabaei, Azadeh
|
|
Source
|
International Journal of Web Research, 2025, Volume 8, Issue 2, Pages 11-24
|
|
Abstract
|
Objective: This review explores the trustworthiness of multimodal artificial intelligence (AI) systems, focusing specifically on vision-language tasks. It addresses critical challenges related to fairness, transparency, and ethical implications in these systems, and provides a comparative analysis of key tasks such as visual question answering (VQA), image captioning, and visual dialogue. Background: Multimodal models, particularly vision-language models, extend AI capabilities by integrating visual and textual data, mimicking human learning processes. Despite significant advances, the trustworthiness of these models remains a crucial concern as AI systems increasingly confront issues of fairness, transparency, and ethics. Methods: This review examines research conducted from 2017 to 2024 on the aforementioned core vision-language tasks. It takes a comparative approach, analyzing these tasks through the lens of trustworthiness with an emphasis on fairness, explainability, and ethics. The study synthesizes findings from recent literature to identify trends, challenges, and state-of-the-art solutions. Results: Several key findings were highlighted. Transparency: The explainability of vision-language tasks is important for user trust. Techniques such as attention maps and gradient-based methods have successfully addressed this issue. Fairness: Bias mitigation in VQA and visual dialogue systems is essential for ensuring unbiased outcomes across diverse demographic groups. Ethical implications: Addressing biases in multilingual models and ensuring ethical data handling are critical for the responsible deployment of vision-language systems. Conclusion: This study underscores the importance of integrating fairness, transparency, and ethical considerations within a unified framework when developing vision-language models.
|
|
Keywords
|
VQA, ethical implications, trustworthiness, debiasing, explainability, image captioning, visual dialogue
|
|
Affiliation
|
University of Science and Culture, Department of Computer Engineering, Iran; University of Science and Culture, Department of Computer Engineering, Iran
|
|
Email
|
aztabatabaee@gmail.com; a.tabatabaei@usc.ac.ir