Large-scale deployment of sensors is essential to practical applications of cyber-physical systems. For instance, instrumenting a commercial building for smart energy management requires deploying and operating thousands of measurement and metering sensors, along with the actuators that direct operation of the HVAC system. Each of these sensors needs to be named consistently and constantly calibrated. Doing this manually is not only time-consuming but also error-prone, given the scale, heterogeneity, and complexity of buildings as well as the lack of uniform naming schemas.

To address this challenge, we started the Zodiac project to automatically classify, name, and manage sensors based on data-driven analysis of available sensor metadata. Using a combination of hierarchical clustering and random forest classifiers, we showed that it is possible to learn the sensor type with high accuracy, given a set of ground truth labels provided by a domain expert.
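The combination of hierarchical clustering and a random forest classifier can be illustrated with a minimal sketch. This is not the Zodiac implementation: the use of scikit-learn, the character n-gram features, the point names, the type labels, and the helper `cluster_and_classify` are all assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical point names; the names in the released dataset will differ.
point_names = [
    "AHU1.ZoneTemp", "AHU2.ZoneTemp", "AHU3.ZoneTemp",
    "AHU1.SupplyFlow", "AHU2.SupplyFlow", "AHU3.SupplyFlow",
    "AHU1.DamperCmd", "AHU2.DamperCmd", "AHU3.DamperCmd",
]


def cluster_and_classify(names, expert_labels, n_clusters=3):
    """Group similar point names, then learn sensor types from a few labels.

    expert_labels maps an index into `names` to a (hypothetical) type label.
    """
    # Assumed feature choice: counts of character 2- and 3-grams per name.
    vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
    features = vectorizer.fit_transform(names).toarray()

    # Hierarchical (agglomerative) clustering groups similar names, which
    # can help decide which points are worth showing to the expert.
    clusters = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)

    # Train a random forest on the expert-labeled points only,
    # then predict a type for every point.
    labeled_idx = sorted(expert_labels)
    classifier = RandomForestClassifier(n_estimators=50, random_state=0)
    classifier.fit(features[labeled_idx], [expert_labels[i] for i in labeled_idx])
    predicted = classifier.predict(features)

    return clusters.tolist(), predicted.tolist()


if __name__ == "__main__":
    # One hypothetical expert label per group of similar names.
    labels = {0: "zone_temp", 3: "supply_flow", 6: "damper_cmd"}
    clusters, predicted = cluster_and_classify(point_names, labels)
    for name, cluster, sensor_type in zip(point_names, clusters, predicted):
        print(name, cluster, sensor_type)
```

In this sketch the clustering step needs no labels at all, so it can run over every point first; the expert then labels only a handful of points, and the classifier generalizes those labels to the rest.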

We would like the research community to take up the challenge of standardizing sensor ontology in buildings, and to that end we are releasing the metadata for sensors across all the networked buildings on our campus. The dataset encompasses more than 55 buildings and over 180,000 points. Many challenges remain in sensor ontology mapping -- identifying the sensor location, the equipment the sensor belongs to, and the relationships between sensors within a piece of equipment and across equipment. This dataset allows the research community to explore solutions to these problems.

Our full paper can be found here.

Buildings with Manually Labeled Sensor Type

The metadata provided was obtained by discovering points on the BACnet network on our campus. We labeled the ground truth based on six years of working experience with this Building Management System. The labels follow the naming convention imposed by our university naming schema.

The file format is comma-separated values (CSV).


We have open-sourced the Zodiac metadata mapping algorithm in this GitHub repository. We explain the different parts of the metadata and the Zodiac code in depth in an IPython Notebook.

We have collected time-series data for the Computer Science building for over three years, and for the rest of the 55 buildings we have access to for a few months. We are making this data available to the community for research. However, we are concerned about side-channel information about occupants that may be embedded in the dataset. We therefore ask you to sign an agreement with us that grants you access to our data for three years and protects the privacy of the building occupants.

Link to the Dataset Agreement Document

Please contact us at if you have any questions.