Text Detection and Recognition for Robot Localization
Authors

Raisi Z., Zelek J.

Source

Journal of Electrical and Computer Engineering Innovations, 2024, Volume 12, Issue 1, Pages 163-174

Abstract

Background and Objectives: Signage is everywhere, and a robot should be able to use signs to help it localize (including visual place recognition (VPR)) and map its surroundings. Robust text detection and recognition in the wild is challenging because of pose, irregular text instances, illumination variations, viewpoint changes, and occlusion. Method: This paper proposes an end-to-end scene text spotting model that simultaneously outputs text strings and their bounding boxes. The proposed model combines a pre-trained vision transformer (ViT) backbone with a multi-task, transformer-based text detector that is better suited to the VPR task. Our central contribution is an end-to-end scene text spotting framework that adequately captures irregular and occluded text regions in diverse, challenging places. To address occlusion, we first equip the ViT backbone with a masked autoencoder (MAE) so that it can capture partially occluded characters. We then use a multi-task prediction head to handle arbitrarily shaped text instances with polygon bounding boxes. Results: The proposed architecture was evaluated for VPR through several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset, using the well-known precision-recall metric. On this benchmark, the final model achieved a recall of 0.93 and a precision of 0.8. Conclusion: Initial experimental results show that the proposed model outperforms state-of-the-art (SOTA) methods on the SCTP dataset, which confirms the robustness of the proposed end-to-end scene text detection and recognition model.
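
The abstract reports precision and recall separately. The following Python sketch, which is not taken from the paper, shows the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and the F1 score they imply. The `MatchCounts` type and the function names are hypothetical, and the rule for counting a retrieved place as a true positive in the SCTP evaluation is not given in the abstract, so it is left as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchCounts:
    """Hypothetical counts from comparing predicted place matches to ground truth.

    The criterion for a correct match (e.g. a localization tolerance) is an
    assumption; the abstract does not specify it.
    """
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: MatchCounts) -> float:
    # Fraction of predicted matches that are correct: TP / (TP + FP).
    predicted = c.true_positives + c.false_positives
    return c.true_positives / predicted if predicted else 0.0


def recall(c: MatchCounts) -> float:
    # Fraction of ground-truth matches that were retrieved: TP / (TP + FN).
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f1_score(p: float, r: float) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the paper.
    counts = MatchCounts(true_positives=8, false_positives=2, false_negatives=2)
    print(precision(counts), recall(counts))
    # F1 implied by the reported precision (0.8) and recall (0.93).
    print(round(f1_score(0.8, 0.93), 2))  # 0.86
```

With the reported precision of 0.8 and recall of 0.93, the implied F1 score is about 0.86; this value is derived here and is not stated in the abstract.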

Keywords

Text detection, text recognition, robotics localization, deep learning, visual place recognition

Address

University of Waterloo, Canada; Chabahar Maritime University, Iran; University of Waterloo, Systems Design Engineering Department, Canada

Email

j.zelek@uwaterloo.ca