Scrabble: Transferrable semi-automated semantic metadata normalization using intermediate representation
Jason Koh
Bharathan Balaji
Dhiman Sengupta
Julian McAuley
Rajesh Gupta
Yuvraj Agarwal
BuildSys 2018 - Proceedings of the 5th Conference on Systems for Built Environments

Abstract

© 2018 Association for Computing Machinery. Interoperability in the Internet of Things relies on a common data model that captures the semantics needed for vendor-independent application development and data exchange. However, traditional systems such as those in building management are vertically integrated and do not use a standard schema. A typical building can contain thousands of data points. Third-party vendors who seek to deploy applications such as fault diagnosis must manually map the building information into a common schema. This mapping process requires deep domain expertise and a detailed understanding of the intricacies of each building's system. Our framework, Scrabble, reduces the mapping effort significantly by using a multi-stage active learning mechanism that exploits the structure present in a standard schema and learns from buildings that have already been mapped to the schema. Scrabble uses conditional random fields with transfer learning to represent unstructured building information in a reusable intermediate representation. This reusable representation is then mapped to the schema using a multilayer perceptron. Our novel active learning mechanism, based on a semantic model, requires only minimal input from domain experts to interpret esoteric, idiosyncratic data points. We evaluated Scrabble on five buildings with thousands of different entities. In one building, with 10 expert-provided examples for each method, Scrabble achieves 59% higher Accuracy and 162% higher Macro-averaged F1 than prior work. Scrabble reaches 99% Accuracy with 100-160 examples for buildings with thousands of points, while the other baselines cannot.
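
The two-stage flow described in the abstract (raw point name, then intermediate representation, then schema class, with an expert consulted on the hardest names) can be illustrated with a toy sketch. This is not the authors' implementation: a lookup table stands in for the conditional random field, a subset rule stands in for the multilayer perceptron, and the expert-query heuristic is a simple unknown-token fraction rather than the semantic-model-based mechanism of the paper. All names, tags, and classes below (`TOKEN_TAGS`, `CLASS_RULES`, `to_intermediate`, `to_schema_class`, `pick_for_expert`) are hypothetical.

```python
import re

# Hypothetical token-to-tag lookup; a stand-in for the CRF tagging stage.
TOKEN_TAGS = {
    "ZN": "zone",
    "T": "temperature",
    "SP": "setpoint",
}

# Hypothetical class definitions; a stand-in for the multilayer perceptron stage.
CLASS_RULES = {
    "zone_temperature_sensor": {"zone", "temperature"},
    "zone_temperature_setpoint": {"zone", "temperature", "setpoint"},
}


def to_intermediate(point_name):
    """Stage 1: split a raw point name and tag each token (None if unknown)."""
    tokens = [t for t in re.split(r"[._\-\s]+", point_name) if t]
    return [(tok, TOKEN_TAGS.get(tok.upper())) for tok in tokens]


def to_schema_class(tagged):
    """Stage 2: choose the most specific class whose required tags are all present."""
    tags = {tag for _, tag in tagged if tag is not None}
    matches = [(len(req), name) for name, req in CLASS_RULES.items() if req <= tags]
    return max(matches)[1] if matches else None


def unknown_fraction(tagged):
    """Share of tokens the tagger could not interpret (1.0 for an empty name)."""
    if not tagged:
        return 1.0
    return sum(1 for _, tag in tagged if tag is None) / len(tagged)


def pick_for_expert(point_names):
    """Toy active-learning query: the name with the largest share of unknown tokens."""
    return max(point_names, key=lambda n: unknown_fraction(to_intermediate(n)))


if __name__ == "__main__":
    for name in ["RM101.ZN_T", "AHU1.ZN-T-SP"]:
        print(name, "->", to_schema_class(to_intermediate(name)))
    print("ask expert about:", pick_for_expert(["ZN_T", "XJ7.QQ"]))
```

The point of the intermediate step in this sketch is that the token tags are reusable across buildings, while only the final mapping depends on the target schema.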

Bibtex

@inproceedings{Koh2018a,
    author = "Koh, Jason and Balaji, Bharathan and Sengupta, Dhiman and McAuley, Julian and Gupta, Rajesh and Agarwal, Yuvraj",
    booktitle = "BuildSys 2018 - Proceedings of the 5th Conference on Systems for Built Environments",
    pages = "11--20",
    title = "Scrabble: Transferrable semi-automated semantic metadata normalization using intermediate representation",
    year = "2018",
    doi = "10.1145/3276774.3276795"
}

Plain Text

Jason Koh, Bharathan Balaji, Dhiman Sengupta, Julian McAuley, Rajesh Gupta, and Yuvraj Agarwal. Scrabble: transferrable semi-automated semantic metadata normalization using intermediate representation. In BuildSys 2018 - Proceedings of the 5th Conference on Systems for Built Environments, 11–20. 2018. doi:10.1145/3276774.3276795.