Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images
|
Authors
|
Zolghadriha Erfan, Fouladi-Ghaleh Kazim, Ardehkhani Pouya
|
Source
|
AUT Journal of Electrical Engineering - 2024 - Volume: 56 - Issue: 2 - Pages: 191-202
|
Abstract
|
In specialized fields, accurate visual question answering is crucial for practical applications. This study improves a visual question answering (VQA) model for artistic images using a dataset that contains both visual and knowledge-based questions. A pre-trained BERT model determines the nature of each query; visual queries are handled by the iQAN model with the MLB and MUTAN mechanisms, and knowledge-based queries by an XLNet-based model. The results show 78.92% accuracy on visual questions, 47.71% on knowledge-based questions, and 55.88% overall when the two branches are combined. The study also examines how parameters such as the number of glances and the choice of activation function affect the model's performance.
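The two-branch design described in the abstract can be illustrated with a minimal routing sketch. It is not the authors' implementation: `route_question`, `toy_classifier`, and both branch functions are hypothetical names, and the keyword rule only stands in for the pre-trained BERT query classifier so that the example runs without model weights.

```python
from typing import Callable, Dict, Tuple


def route_question(
    question: str,
    classify: Callable[[str], str],
    branches: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    """Send a question to the branch chosen by the query-type classifier."""
    kind = classify(question)
    if kind not in branches:
        raise ValueError(f"no branch registered for query type {kind!r}")
    return kind, branches[kind](question)


def toy_classifier(question: str) -> str:
    # ASSUMPTION: a keyword rule standing in for the BERT query classifier.
    q = question.lower()
    return "knowledge" if q.startswith(("who", "when", "where")) else "visual"


def visual_branch(question: str) -> str:
    # Placeholder for the image-based branch (iQAN with MLB/MUTAN in the paper).
    return "answer from image features"


def knowledge_branch(question: str) -> str:
    # Placeholder for the text-based branch (XLNet-based model in the paper).
    return "answer from external knowledge"


branches = {"visual": visual_branch, "knowledge": knowledge_branch}
print(route_question("Who painted this work?", toy_classifier, branches))
print(route_question("What color is the dress?", toy_classifier, branches))
```

Because each question passes through the classifier before reaching a branch, a routing mistake sends it to the wrong branch, so the quality of the query classifier affects the combined accuracy as well as the per-branch figures.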
|
Keywords
|
Art pictures, visual question answering (VQA), natural language processing (NLP), computer vision, attention
|
Address
|
University of Tehran, Faculty of Engineering, College of Farabi, Deep Learning Research Lab, Department of Computer Engineering, Iran; University of Tehran, Faculty of Engineering, College of Farabi, Department of Computer Engineering, Iran; University of Tehran, Faculty of Engineering, College of Farabi, Deep Learning Research Lab, Department of Computer Engineering, Iran
|
Email
|
pouya.ardehkhani@ut.ac.ir
|