Evaluating the capability of the DeepLabv3+ encoder-decoder network with modified atrous convolutions (case study: semantic segmentation of buildings)

Authors	Omati Mohammad Erfan, Tabib Mahmoudi Fatemeh
Source	Geospatial Information Technology Engineering, 1402 (Solar Hijri), Vol. 11, No. 3, pp. 43-57
Abstract	Building segmentation is a difficult task because it requires rich semantic features. Differences in the shape, color and size of buildings, and their proximity to other features such as parking lots and streets, make detecting them in high-resolution images challenging. In this research, to extract buildings from high-resolution images, a deep convolutional neural network with an encoder-decoder architecture based on a modified DeepLabv3+ model was used. In the atrous module of this modified model, convolution layers are applied with lower rates than in the original module, and dilated convolution is used instead of standard convolution, in order to achieve stronger semantic segmentation of both small and large building objects. The performance of the proposed model was evaluated on two datasets, WHU and Inria, and the results showed that lowering the atrous rates to 4, 8 and 12 significantly improved segmentation performance on both datasets. On the WHU dataset, the proposed modified model improved recall, IoU and F-score over other state-of-the-art models by 0.33, 0.39 and 0.53, respectively. In addition, on the Inria dataset, the modified method improved the same indices over these models by 1.22, 0.35 and 0.35, respectively. By reducing the atrous rates to 4, 8 and 12 and modifying the ResNet-50 layers, the proposed model achieved an IoU of 89.51 on the WHU dataset and 76.64 on the Inria dataset for extracting building features. By contrast, the original DeepLabv3+ model, with atrous rates of 6, 12 and 18 and the original ResNet-50, achieved an IoU of 88.87 on the WHU dataset and 75.82 on the Inria dataset for building segmentation.
Keywords	semantic segmentation, deep convolutional neural network, encoder, decoder, atrous convolution
Address	Shahid Rajaee Teacher Training University, Faculty of Civil Engineering, Department of Surveying Engineering, Iran (both authors)
Email	fmahmoudi@sru.ac.ir
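The abstract's central change is lowering the atrous (dilation) rates of the DeepLabv3+ atrous module from 6, 12 and 18 to 4, 8 and 12. The following is a minimal pure-Python illustration of what a dilation rate does, not the authors' implementation. It assumes the modified branches keep the 3×3 kernels of the standard DeepLabv3+ atrous spatial pyramid pooling branches, uses single-channel "valid" convolution with no padding (real DeepLabv3+ layers pad so the output size is preserved), and the helper names `effective_kernel_size` and `dilated_conv2d` are hypothetical. A k-tap kernel with rate r spans k + (k - 1)(r - 1) feature-map pixels, so the modified rates cover 9, 17 and 25 pixels instead of 13, 25 and 37.

```python
from typing import List

Grid = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Pixels spanned by one side of a k-tap kernel whose taps are `rate` apart."""
    if k < 1 or rate < 1:
        raise ValueError("k and rate must be positive")
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(image: Grid, kernel: Grid, rate: int) -> Grid:
    """Single-channel 'valid' atrous convolution, computed as cross-correlation
    like most deep-learning libraries. rate=1 reduces to a standard convolution."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = h - effective_kernel_size(kh, rate) + 1
    out_w = w - effective_kernel_size(kw, rate) + 1
    if out_h < 1 or out_w < 1:
        raise ValueError("image is smaller than the dilated kernel span")
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Taps sit `rate` pixels apart; the pixels in between are skipped.
                    acc += kernel[u][v] * image[i + u * rate][j + v * rate]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    # Span of 3x3 branches under the original and the modified rates.
    for name, rates in (("original", (6, 12, 18)), ("modified", (4, 8, 12))):
        spans = [effective_kernel_size(3, r) for r in rates]
        print(name, rates, spans)
    # original (6, 12, 18) [13, 25, 37]
    # modified (4, 8, 12) [9, 17, 25]
```

The spans are in feature-map pixels; each feature-map pixel covers several input pixels depending on the encoder's output stride, which the abstract does not state. One plausible reading of the reported gains is that tighter spans keep more taps on a small building's footprint, but the abstract does not give this explanation itself.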

Evaluating the capabilities of the DeepLabv3+ encoder-decoder network with modified atrous convolutions (case study: deep semantic building segmentation)

Authors	Omati Mohammad Erfan, Tabib Mahmoudi Fatemeh
Abstract	Building segmentation is a difficult task because it requires rich semantic features. Differences in the shape, color and size of buildings, and their proximity to other features such as parking lots and streets, make recognizing them in high-resolution images challenging. In this research, with the aim of extracting buildings from high-resolution images, a deep convolutional neural network with an encoder-decoder architecture based on a modified DeepLabv3+ model was used. In the atrous module of this modified model, convolution layers are applied with lower rates than in the original module, in order to achieve stronger semantic segmentation of both small and large building objects. The performance of the proposed model was evaluated on two datasets, WHU and Inria, and the results showed that lowering the atrous rates to 4, 8 and 12 significantly improved segmentation performance on both datasets. On the WHU dataset, the proposed modified model improved the IoU and F-score over other state-of-the-art models by 0.39 and 0.53, respectively. In addition, on the Inria dataset, the modified method improved both indices by 0.35. By reducing the atrous rates to 4, 8 and 12 and modifying the ResNet-50 layers, the proposed model achieved an IoU of 89.51 on the WHU dataset and 76.64 on the Inria dataset for building extraction.
Keywords	semantic segmentation, deep convolutional neural network, encoder-decoder, atrous convolution
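The abstract reports recall, IoU and F-score for the building class without defining them. A minimal sketch of the usual pixel-wise definitions for a binary building mask follows. It assumes 1 marks building pixels and that counts are pooled over the whole mask; the function name `binary_scores` is hypothetical, and the paper's exact aggregation (per image or over the whole test set) is not stated in the abstract. Figures such as an IoU of 89.51 appear to be these ratios multiplied by 100.

```python
from typing import Dict, Sequence


def binary_scores(pred: Sequence[Sequence[int]],
                  truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise scores for the building class (1 = building, 0 = background)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # building pixel predicted as building
            elif p:
                fp += 1  # background pixel predicted as building
            elif t:
                fn += 1  # building pixel missed
    # Empty denominators (no building anywhere) are scored 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "iou": iou, "f_score": f_score}


if __name__ == "__main__":
    truth = [[1, 1, 0],
             [1, 0, 0]]
    pred = [[1, 0, 0],
            [1, 1, 0]]
    # TP = 2, FP = 1, FN = 1, so IoU = 2 / 4 = 0.5 and recall = 2 / 3.
    print(binary_scores(pred, truth))
```

IoU penalizes false positives and false negatives in one ratio, while the F-score (equivalent to the Dice coefficient for binary masks) weights true positives twice, which is why the two indices can move by different amounts for the same model change.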