Ontology-Guided Extraction of Infrastructure Damage Information during Floods Using Large Language Models
Abstract
Urban floods are becoming increasingly frequent owing to intensified extreme rainfall and ongoing urbanisation, posing significant threats to human life and the built environment in cities worldwide. Timely, high-resolution data on infrastructure damage during such events is essential for effective emergency response and long-term adaptation planning. However, conventional data collection methods, which rely on manual inspection and official reporting, often fail to capture impacts that are large in scale or specific to a single event. Social media platforms, in contrast, offer a publicly available source of real-time, user-generated observations, although their informal and unstructured nature makes systematic information extraction challenging. This study proposes a novel approach for automatically extracting structured infrastructure damage information from flood-related social media posts. A prompt engineering strategy for Large Language Models (LLMs), guided by a domain-specific ontology of infrastructure assets and damage typologies, is developed to ensure consistent classification and reliable extraction. A case study of the 2021 Zhengzhou flood in China illustrates the effectiveness of the proposed method. Preliminary results suggest that, with ontology-guided prompting, general-purpose LLMs can serve as high-accuracy information extractors, providing a scalable foundation for downstream analyses of infrastructure vulnerability and disaster impacts. The approach also supports the development of flood-related infrastructure damage databases, which are critically needed to advance research on infrastructure resilience.
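To make the idea of ontology-guided extraction concrete, the following is a minimal sketch, not the study's implementation. It assumes a hypothetical, heavily simplified ontology (the asset classes and damage types in ONTOLOGY are illustrative only), embeds that controlled vocabulary in the prompt, and then filters the model's JSON reply so that only records using ontology labels are kept. The names build_prompt and validate are hypothetical, and no LLM is called; a hard-coded string stands in for the model's reply.

```python
import json

# Hypothetical, simplified ontology: each asset class maps to the damage
# types permitted for it. The study's actual ontology is not reproduced here.
ONTOLOGY = {
    "road": ["inundation", "collapse", "blockage"],
    "subway": ["inundation", "service_disruption"],
    "power_facility": ["outage", "inundation"],
}


def build_prompt(post: str) -> str:
    """Embed the allowed vocabulary in the prompt so the LLM labels against it."""
    vocab = [f"- {asset}: {', '.join(damages)}" for asset, damages in ONTOLOGY.items()]
    return (
        "Extract infrastructure damage from the social media post below.\n"
        "Use ONLY these asset classes and damage types:\n"
        + "\n".join(vocab)
        + '\nReturn a JSON list of objects with keys "asset", "damage", "location".\n'
        + f"Post: {post}"
    )


def validate(raw_response: str) -> list[dict]:
    """Parse the model reply and keep only records whose labels exist in the ontology."""
    try:
        records = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # unparseable reply: nothing extracted
    if not isinstance(records, list):
        return []
    valid = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        asset = rec.get("asset")
        # Reject assets outside the ontology and damage types not allowed for that asset.
        if asset in ONTOLOGY and rec.get("damage") in ONTOLOGY[asset]:
            valid.append(rec)
    return valid


if __name__ == "__main__":
    prompt = build_prompt("Line 5 station flooded, water up to the carriage windows.")
    print(prompt)
    # Stand-in for an LLM reply; no model is called in this sketch.
    reply = (
        '[{"asset": "subway", "damage": "inundation", "location": "Line 5"},'
        ' {"asset": "bridge", "damage": "crack", "location": "unknown"}]'
    )
    print(validate(reply))
```

In this example the "bridge" record is discarded because that asset class is absent from the illustrative ontology, which shows how a fixed vocabulary can enforce classification consistency after generation as well as guiding it in the prompt.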