Using 2D Vision-Language Models to Resolve Unbalanced Training of 3D Deep Learning of Street MMS Data

Authors

Keywords:

Urban digital twin, Street, BIM, Large Language Model (LLM), Deep Learning, Graph

Abstract

A geometry-level digital twin of the urban built environment, such as roads, is important for applications ranging from infrastructure planning to maintenance. Existing literature mainly focuses on reconstructing individual roads, such as highways or tunnels, using the Scan-to-BIM workflow. A gap remains in detecting individual instances of road elements and their detailed types (e.g., traffic signs, utility boxes). In addition, network-level graph representations that integrate road segments, sidewalks, and other elements for road network generation and geometry reconstruction remain underexplored. This study therefore proposes a three-step method to automate the reconstruction of street geometry digital twins. First, deep learning-based semantic and instance segmentation is applied to mobile laser scanning data. Second, the segmented instances are aligned with street view imagery, and their semantic information is enriched using Large Language Models. Third, a graph-based representation is designed as an intermediary to facilitate the reconstruction of Building Information Models (BIM) and road networks. Streets in Central, Hong Kong, serve as a preliminary case to validate the proposed method. The study contributes an automated workflow for geometry-level digital twins of urban streets, uses the knowledge embedded in large pretrained models to extend the semantics of road elements, and provides novel perspectives for fields such as urban planning, engineering, built environment renovation, and transportation management. Future research directions include the development of automated digital twin applications for street infrastructure maintenance and management.
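
To make the third step more concrete, the following is a minimal sketch of what a graph-based intermediate representation of street elements might look like. It is an illustration only, not the authors' implementation: the class names `StreetElement` and `StreetGraph`, the category labels, the relation names, and the sample coordinates are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class StreetElement:
    """One segmented instance (hypothetical schema, not from the paper)."""
    element_id: str
    category: str                      # coarse class, e.g. "road_segment"
    detail: str = ""                   # finer label from an image-based enrichment step
    centroid: tuple = (0.0, 0.0, 0.0)  # illustrative x, y, z position


@dataclass
class StreetGraph:
    """Undirected graph: nodes are street elements, edges carry a relation name."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_element(self, element):
        self.nodes[element.element_id] = element
        self.edges.setdefault(element.element_id, set())

    def connect(self, a, b, relation):
        # Both endpoints must already exist as nodes.
        for key in (a, b):
            if key not in self.nodes:
                raise KeyError(f"unknown element: {key}")
        self.edges[a].add((b, relation))
        self.edges[b].add((a, relation))

    def neighbors(self, element_id, relation=None):
        # Sorted ids of adjacent elements, optionally filtered by relation.
        return sorted(
            other for other, rel in self.edges[element_id]
            if relation is None or rel == relation
        )


# Illustrative data only; ids, labels, and coordinates are made up.
graph = StreetGraph()
graph.add_element(StreetElement("road_1", "road_segment"))
graph.add_element(StreetElement("road_2", "road_segment", centroid=(50.0, 0.0, 0.0)))
graph.add_element(StreetElement("walk_1", "sidewalk", centroid=(0.0, 4.0, 0.1)))
graph.add_element(StreetElement("sign_1", "traffic_sign", detail="no parking",
                                centroid=(2.0, 4.5, 2.2)))

graph.connect("road_1", "road_2", "connects_to")   # road network topology
graph.connect("road_1", "walk_1", "adjacent_to")   # sidewalk beside the road
graph.connect("walk_1", "sign_1", "supports")      # sign standing on the sidewalk

road_links = graph.neighbors("road_1", relation="connects_to")
```

In a structure of this kind, edges with a connectivity relation could be traversed to derive a road network, while node attributes could feed the creation of individual BIM objects; how the study actually encodes these relations is not specified in the abstract.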

Published

2025-12-25

Conference Proceedings Volume

Section

Open Access Proceeding Proceedings of Smart and Sustainable Built Environment Conference Series

How to Cite

Using 2D Vision-Language Models to Resolve Unbalanced Training of 3D Deep Learning of Street MMS Data. (2025). Proceedings of Smart and Sustainable Built Environment Conference Series, 227-234. https://isasbec.abc2.net/index.php/sasbe/article/view/2641