Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images
   
Authors: Erfan Zolghadriha, Kazim Fouladi-Ghaleh, Pouya Ardehkhani
Source: AUT Journal of Electrical Engineering, 2024, Vol. 56, No. 2, pp. 191-202
Abstract: In specialized fields, accurately answering visual questions is crucial for practical applications. This study focuses on improving a visual question answering (VQA) model for artistic images, using a dataset that contains both visual and knowledge-based questions. The approach employs a pre-trained BERT model to identify the type of each question, the IQAN model with MLB and MUTAN fusion mechanisms to answer visual questions, and an XLNet-based model to answer knowledge-based questions. The results show an accuracy of 78.92% on visual questions, 47.71% on knowledge-based questions, and 55.88% overall when the two branches are combined. The study also examines how parameters such as the number of glances and the choice of activation function affect the model's performance.
Keywords: art pictures, visual question answering (VQA), natural language processing (NLP), computer vision, attention
Affiliations: University of Tehran, Faculty of Engineering, College of Farabi, Deep Learning Research Lab, Department of Computer Engineering, Iran; University of Tehran, Faculty of Engineering, College of Farabi, Department of Computer Engineering, Iran; University of Tehran, Faculty of Engineering, College of Farabi, Deep Learning Research Lab, Department of Computer Engineering, Iran
Email: pouya.ardehkhani@ut.ac.ir
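The two-branch design described in the abstract (a question-type classifier that sends each question either to a visual branch with MLB-style fusion or to a knowledge branch) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_visual`, `visual_branch`, and `knowledge_branch` are hypothetical stand-ins for the BERT classifier, the IQAN model, and the XLNet-based model, and `mlb_fuse` shows only the general low-rank bilinear form (tanh projections, element-wise product, output projection) without attention glances, biases, dropout, or the MUTAN variant.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; each row of m must have len(x) entries."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def mlb_fuse(q: Vector, v: Vector, w_q: Matrix, w_v: Matrix, p: Matrix) -> Vector:
    """MLB-style fusion (assumed general form): p @ (tanh(w_q @ q) * tanh(w_v @ v))."""
    hq = [math.tanh(a) for a in matvec(w_q, q)]  # project question features
    hv = [math.tanh(a) for a in matvec(w_v, v)]  # project image features
    joint = [a * b for a, b in zip(hq, hv)]      # Hadamard (element-wise) product
    return matvec(p, joint)                      # map joint features to output space


def answer(question: str,
           is_visual: Callable[[str], bool],
           visual_branch: Callable[[str], str],
           knowledge_branch: Callable[[str], str]) -> str:
    """Route a question to one branch based on the (hypothetical) type classifier."""
    branch = visual_branch if is_visual(question) else knowledge_branch
    return branch(question)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(mlb_fuse([1.0, 0.0], [1.0, 1.0], identity, identity, identity))
    # Toy keyword rule standing in for a trained question-type classifier.
    print(answer("What color is the sky?",
                 lambda q: "color" in q.lower(),
                 lambda q: "visual branch",
                 lambda q: "knowledge branch"))
```

Keeping the router separate from the branches mirrors the abstract's point that the overall accuracy comes from combining two independently evaluated branches: a misrouted question is answered by the wrong model regardless of how accurate that model is on its own question type.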