Household Movement Detection in Mixed-format Occupancy Data using LLM-Based Entity Resolution

Authors

Sasirekha Oguri, John R. Talburt, and Mert Can Cakmak, University of Arkansas, USA

Abstract

Entity resolution (ER) typically relies on pairwise similarity comparisons between records, which limits its ability to capture indirect relationships present in demographic occupancy data. An important indirect pattern arises from household movement, where multiple individuals relocate together across addresses, but detecting such patterns is difficult due to mixed-format records, noise, duplication, and the absence of stable identifiers. This paper proposes an AI-enhanced framework for detecting indirect entity links associated with household movement in unstandardized name–address data. The approach integrates prompt-based large language model (LLM) named entity recognition for extracting personal names and addresses without extensive preprocessing, semantic text embeddings for robust similarity computation, and graph-based reasoning to infer group-level movement patterns. Experimental evaluation on SPX benchmark datasets (S8–S12) generated using the Synthetic Occupancy Generator demonstrates that incorporating indirect household movement evidence improves recall by 8–15% while maintaining high precision, yielding F1-score gains of 6–8% over a strong pairwise baseline.
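To make the notion of indirect household-movement evidence concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that name and address extraction and pairwise resolution have already produced `(person_id, address_id)` occupancy pairs; the function name `household_moves`, the `min_group` parameter, and the toy identifiers are hypothetical and chosen only for illustration. It flags address pairs that are shared by at least two people, which is one simple way a group-level co-movement signal could be derived from an occupancy graph.

```python
from collections import defaultdict
from itertools import combinations


def household_moves(occupancies, min_group=2):
    """Find address pairs shared by at least `min_group` people.

    occupancies: iterable of (person_id, address_id) pairs, assumed to be
    already resolved (hypothetical input format, not taken from the paper).
    Returns a dict mapping a sorted (address_a, address_b) tuple to the set
    of person ids observed at both addresses.
    """
    # Collect the set of addresses each person has been observed at.
    addresses_by_person = defaultdict(set)
    for person, address in occupancies:
        addresses_by_person[person].add(address)

    # For every pair of addresses a person has occupied, record that person.
    people_by_address_pair = defaultdict(set)
    for person, addresses in addresses_by_person.items():
        for pair in combinations(sorted(addresses), 2):
            people_by_address_pair[pair].add(person)

    # Keep only pairs shared by a group, i.e. candidate household moves.
    return {
        pair: people
        for pair, people in people_by_address_pair.items()
        if len(people) >= min_group
    }


# Toy data: p1 and p2 both appear at addresses A and B; p3 appears alone at A and C.
records = [
    ("p1", "A"), ("p2", "A"), ("p3", "A"),
    ("p1", "B"), ("p2", "B"),
    ("p3", "C"),
]
moves = household_moves(records)
```

In this toy example only the address pair `("A", "B")` is retained, because `p1` and `p2` appear at both addresses, while `p3`'s solo change of address is filtered out. A fuller pipeline along the lines described in the abstract would derive the person and address identifiers from LLM-based extraction and embedding similarity rather than taking them as given.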

Keywords

Entity Resolution, Household Movement Detection, Indirect Linkage, Named Entity Recognition, Large Language Models, Semantic Text Embeddings, Graph-Based Clustering, Occupancy Data, Synthetic Data, Data Integration

Full Text  Volume 16, Number 10