Enhancing Hybrid NLP Approaches with the Strategic Shift Toward Retrieval-Augmented Generation for Design Phase Communication Documents Processing

Authors

Keywords:

Hybrid NLP, BERT, Roof Design, Information Extraction

Abstract

Hybrid Natural Language Processing (NLP) approaches have demonstrated superior performance in construction document processing: previous research showed that combining rule-based methods with traditional Named Entity Recognition (NER) achieves 98% accuracy on information extraction tasks in roof design documentation. However, the integration of modern pre-trained transformer models into these hybrid architectures remains underexplored. This paper investigates whether an established hybrid NLP framework can be enhanced by replacing its conventional machine learning component with a fine-tuned pre-trained transformer, with the aim of improving information extraction from design phase communication documents in roof design practice. Building upon a proven hybrid methodology that combines regular expression (Regex) pattern matching with a machine learning component, we replace the traditional NER model with a fine-tuned BERT-base model. The enhanced hybrid approach keeps the efficiency of rule-based preprocessing while leveraging BERT's contextual understanding. We evaluate this BERT-enhanced hybrid system on information extraction tasks using roof design communication documents and project specifications, benchmarked against the established Regex+NER baseline (98% accuracy). Experimental results show that the Regex+BERT hybrid approach achieves 99.2% accuracy, a 1.2 percentage point improvement over the original hybrid baseline. Detailed analysis reveals that the BERT-enhanced hybrid excels at handling contextual ambiguities and formatting variations in roof design communications while retaining the computational efficiency and interpretability advantages of the original hybrid architecture. The research provides a practical framework for practitioners seeking to enhance existing hybrid NLP systems with BERT without a complete architectural redesign.
Results indicate that selective integration of BERT significantly improves accuracy while preserving the operational advantages that make hybrid approaches attractive for roof design practice applications.
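
The component swap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule patterns, entity labels, the `hybrid_extract` and `regex_stage` functions, the span-masking step, and the `stub_ner` stand-in are all hypothetical and chosen only for illustration. The sketch assumes a rule stage that runs first and a model stage that is passed in as a plain callable, so that a traditional NER model or a fine-tuned BERT token classifier could be exchanged without touching the rules.

```python
import re
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]  # (label, matched text)
NerBackend = Callable[[str], List[Entity]]

# Hypothetical rule patterns; the paper does not list its actual expressions.
RULES: Dict[str, "re.Pattern[str]"] = {
    "ROOF_PITCH": re.compile(r"\b\d{1,2}(?:\.\d+)?\s?(?:degrees?|deg)\b"),
    "DIMENSION": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mm|m)\b"),
}


def regex_stage(text: str) -> Tuple[List[Entity], str]:
    """Extract rule-based entities and return the text with those spans masked."""
    entities: List[Entity] = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group()))
    # Assumption: matched spans are blanked out so the model stage
    # only sees the text that the rules did not already cover.
    remainder = text
    for pattern in RULES.values():
        remainder = pattern.sub(" ", remainder)
    return entities, remainder


def hybrid_extract(text: str, ner_backend: NerBackend) -> List[Entity]:
    """Run the rule stage first, then the pluggable model stage on the remainder."""
    rule_entities, remainder = regex_stage(text)
    return rule_entities + ner_backend(remainder)


def stub_ner(text: str) -> List[Entity]:
    """Stand-in for a trained model (keyword lookup), used only to keep the sketch runnable."""
    materials = ("slate", "zinc", "membrane")
    lowered = text.lower()
    return [("MATERIAL", word) for word in materials if word in lowered]


if __name__ == "__main__":
    message = "Client asked for a 25 degree pitch with zinc cladding, rafters at 600 mm centres."
    for label, value in hybrid_extract(message, stub_ner):
        print(label, value)
```

Because the model stage is only a function from text to labelled entities, replacing the baseline NER component with a BERT-based one would amount to passing a different callable to `hybrid_extract`, which is one way to read the abstract's claim that no complete architectural redesign is needed.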

Published

2025-12-25

Conference Proceedings Volume

Section

Open Access Proceeding Proceedings of Smart and Sustainable Built Environment Conference Series

How to Cite

Enhancing Hybrid NLP Approaches with the Strategic Shift Toward Retrieval-Augmented Generation for Design Phase Communication Documents Processing. (2025). Proceedings of Smart and Sustainable Built Environment Conference Series, 271-281. https://isasbec.abc2.net/index.php/sasbe/article/view/2665