A Data-driven Framework for Automated Pavement Maintenance Strategy Generation: A Case Study of Florida
Keywords:
Pavement Maintenance, Large Language Model (LLM), Retrieval-Augmented Generation (RAG), LTPP, Data-driven Decision Making
Abstract
Road infrastructure in climate-sensitive regions such as the Southeastern United States faces increasing risks under climate change. Intense rainfall, hurricanes, storm surges, and freeze–thaw cycles accelerate pavement deterioration and disrupt roadway functionality, so adaptive and transparent maintenance strategies are essential for sustaining performance and enhancing resilience. Current pavement management systems rely on static knowledge bases and empirical rules, which limits their flexibility and interpretability. This study proposes a data-driven framework that combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to generate context-aware and interpretable pavement maintenance strategies. A knowledge base was constructed from Long-Term Pavement Performance (LTPP) data, Florida Department of Transportation manuals, regional climate information, and historical maintenance records. Structured indicators such as Annual Average Daily Truck Traffic, precipitation, and freeze index guided the retrieval process. Given the pavement distress, climate, and traffic conditions of a section, the RAG module retrieved relevant cases and technical standards, enabling the LLM to produce evidence-based recommendations. Validation on 30 pavement sections in Florida achieved an exact prediction accuracy of 76.7% (23 of 30 sections correctly classified). Predictions were dominated by Mill and Overlay (15 cases, 50%) and Surface Treatment (9 cases, 30%), followed by Patch Repair (3 cases, 10%), Rigid Pavement Repair (2 cases, 6.7%), and Thin Overlay (1 case, 3.3%). Crack Sealing and Recycled Treatment were never predicted. The framework performed well for structural and surface renewal actions, whereas preventive strategies remained underrepresented. Generated reports included condition summaries, historical references, recommendations, and assumptions, which improved interpretability and partially reduced the risk of hallucination.
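To make the retrieval step concrete, the following is a minimal Python sketch of how structured indicators could guide case retrieval before the LLM is prompted. It is an illustration under stated assumptions, not the study's implementation: the `Case` fields, the min-max normalized Euclidean distance, the precipitation unit, the function names `retrieve` and `build_prompt`, and all numeric values are hypothetical. Only the indicator names and treatment labels come from the text above.

```python
from dataclasses import dataclass
from math import sqrt

# Indicators named in the abstract; units and field names are assumptions.
FIELDS = ("aadtt", "precipitation_mm", "freeze_index")

@dataclass
class Case:
    section_id: str          # hypothetical identifier
    aadtt: float             # Annual Average Daily Truck Traffic
    precipitation_mm: float  # annual precipitation (unit assumed)
    freeze_index: float
    treatment: str = ""      # historical treatment; empty for a query

def _span(values):
    """Range of the values, or 1.0 when all values are equal."""
    span = max(values) - min(values)
    return span if span > 0 else 1.0

def retrieve(query, cases, k=3):
    """Return the k cases closest to the query in normalized indicator space."""
    spans = {f: _span([getattr(c, f) for c in cases] + [getattr(query, f)])
             for f in FIELDS}

    def distance(case):
        return sqrt(sum(((getattr(case, f) - getattr(query, f)) / spans[f]) ** 2
                        for f in FIELDS))

    return sorted(cases, key=distance)[:k]

def build_prompt(query, retrieved):
    """Assemble the retrieved cases into a plain-text context for an LLM."""
    lines = [f"Target section: AADTT={query.aadtt}, "
             f"precipitation={query.precipitation_mm}, "
             f"freeze index={query.freeze_index}",
             "Similar historical cases:"]
    lines += [f"- {c.section_id}: {c.treatment}" for c in retrieved]
    lines.append("Recommend a maintenance strategy and state your assumptions.")
    return "\n".join(lines)

# Made-up demonstration data; these are not values from the study.
cases = [
    Case("demo-A", 1200, 1400, 0, "Mill and Overlay"),
    Case("demo-B", 300, 1300, 0, "Surface Treatment"),
    Case("demo-C", 1100, 1500, 5, "Mill and Overlay"),
    Case("demo-D", 200, 900, 40, "Patch Repair"),
]
query = Case("demo-query", 1150, 1450, 2)
top = retrieve(query, cases, k=2)
prompt = build_prompt(query, top)
```

Normalizing each indicator by its range keeps a large-magnitude quantity such as truck traffic from dominating the distance. A production system would more likely combine such structured filtering with text-embedding search over manuals and maintenance records.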