
Contents lists available at ScienceDirect

Energy and Buildings journal homepage: www.elsevier.com/locate/enbuild

Baseline building energy modeling of cluster inverse model by using daily energy consumption in office buildings

Jong-Hwan Ko, Dong-Seok Kong, Jung-Ho Huh ∗

Department of Architectural Engineering, University of Seoul, Seoul, South Korea

Article info

Article history: Received 19 September 2016; received in revised form 9 January 2017; accepted 26 January 2017; available online 7 February 2017.

Keywords: M&V; Inverse model; Baseline model; Cluster; Daily energy consumption

Abstract

Many retrofit projects are carried out in existing buildings to reduce energy consumption. Evaluating such a project requires the energy consumption both before and after the retrofit. The consumption after the retrofit can be determined through measurement, but the consumption before the retrofit cannot be known and must be estimated. This study aims to make that estimation easier. Dynamic simulation or a regression model is generally used to estimate the energy consumption of a building before retrofit. However, existing regression models offer no way to calibrate the model when it is inaccurate. In this paper, a clustering technique is used to improve the accuracy of the regression model. The estimate of the energy consumption before retrofit is referred to as the "baseline model", and an inverse model is used to create it. An inverse model can be built from monthly data, daily data, or other similar data; in this study it was built from daily data, and the baseline model was derived from it. The conventional change-point model and the cluster inverse model presented in this paper were compared and evaluated against M&V (Measurement and Verification) criteria. The results suggest that the cluster inverse model, which reflects the characteristics of the data, is more appropriate when the baseline model is derived from daily data.

© 2017 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (J.-H. Huh).
http://dx.doi.org/10.1016/j.enbuild.2017.01.086
0378-7788/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Building energy consumption in South Korea has been increasing continuously since the 1990s [1]. Between 2007 and 2010, the energy consumption of office buildings increased by 12.8% in terms of total energy, and their electricity consumption increased by 22%. To slow the growth of energy consumption in buildings, the Building Energy Efficiency Certification System for office buildings and public office buildings has been enforced as national policy since 2010. For new buildings, various technologies are being introduced from the design stage to reduce building energy. Existing buildings, however, are limited in that only simple retrofit projects can be performed [2]. Furthermore, a critical issue for existing buildings is evaluating how much the energy consumed by the building has actually been reduced. This process is referred to as M&V (Measurement and Verification) and is illustrated in Fig. 1. To carry out M&V as shown in Fig. 1, the energy consumptions of the building before and after the retrofit are required. However, the pre-retrofit energy consumption of the building can no longer be measured once the retrofit is complete. Therefore, M&V must be conducted by estimating the building energy consumption before the retrofit [8,14,15]. The estimate of the energy consumption before installing energy conservation measures (ECMs) is called the baseline model. To create this baseline model, dynamic simulation or an inverse model is generally used.

Kong [3] proposed a method of creating a baseline model from the hourly data of a BEMS and dynamic simulation with EnergyPlus. Dynamic simulation requires calibration against the previous energy consumption. For input values, the current conditions and the design drawings of the target building were used, and incorrect input values were calibrated through a sensitivity analysis. The occupancy schedule of the target building was extracted from the electricity consumption data to represent the characteristics of each period. When compared with the measured energy consumption, the baseline model created through calibrated simulation satisfied the tolerance for statistical error of the model presented by ASHRAE or FEMP. However, creating a baseline model through dynamic energy simulation requires a high level of expertise from the simulation user, who must know a large amount of input data. Therefore, it is not easy for practitioners to actively use such a baseline model. On the other hand, the inverse model can derive the baseline model through measured


J.-H. Ko et al. / Energy and Buildings 140 (2017) 317–323

Nomenclature

u             Centroid of cluster, average of pattern
x             An element of a specific cluster, a specific pattern
ω             A group element of a specific cluster, a set of patterns
C_i^initial   Centroid of ith cluster
d_j           Distance between jth cluster and data
avg_big       Point of the largest average distance
C             Distance between the centroids of clusters
c_avg         Average distance between the centroids of clusters from c_1 to c_k
newDistAvg    Average distance between the centroid of a new cluster and data
oldDistAvg    Average distance between the centroid of an existing cluster and data
K             Number of the centroids of clusters
RSS_k         Residual sum of squares of the kth cluster
RSS           Residual sum of squares of all clusters
T_OA          Ambient temperature [°C]
B1            Operating time of boiler no. 1 [min]
B2            Operating time of boiler no. 2 [min]
B3            Operating time of hot-water boiler [min]

data that have already been collected, and it is likely to be actively used at the working level because it does not require a large amount of input data. Therefore, the baseline model was created through the inverse model in this study. In dynamic simulation, errors can be calibrated through the input values; the inverse model likewise requires a calibration method.

Zhang et al. [4,16] compared daily and hourly energy data using the change-point (CP), Gaussian process regression (GPR), Gaussian mixture regression (GMR), and artificial neural network (ANN) models, which are representative inverse models. The models based on daily energy data satisfied the specified criteria for MBE and CV(RMSE) but showed a very low R2. The models based on hourly energy data, however, met all the specified criteria. That study therefore concluded that it was more rational to use hourly data to predict the energy consumption of a building. However, acquiring hourly data for these models raises practical problems such as cost and labor. In this study, therefore, the inverse model was developed to derive a baseline model that satisfies the tolerance for statistical error using daily data. Moreover, the models presented in that paper are generated purely by the algorithm, and additional indicators are needed to calibrate the inverse model.

Srivastav et al. [5] used the Gaussian mixture model to create a baseline model. In the Gaussian mixture model, the range of the dependent variable on the Y axis is determined as a normal distribution according to the independent variable on the X axis. In that paper, ambient temperature was used as the independent variable and electricity consumption as the dependent variable. However, using a single independent variable in the Gaussian mixture model results in too broad a range for the dependent variable. Therefore, the range was narrowed by adding relative humidity and solar radiation as independent variables. The accuracy of the baseline developed with the Gaussian mixture model was then compared with that of a baseline created through multiple regression analysis, and the former was more accurate. However, the data used to create this model were hourly data measured at 5 min intervals, and only the coefficient of determination was used for model evaluation. That paper determined that a relatively accurate model could not be created with ambient temperature alone, which is why relative humidity and solar radiation were added. Nevertheless, solar radiation data are not easy to collect. In the present study, therefore, when the baseline model did not satisfy the tolerance, the model was calibrated using the operating time recorded in the operation log of the heat source equipment. External environmental factors affecting building energy include relative humidity and solar radiation, but the most important factor was the outside air temperature, so it is important to generate the model based on the outside air temperature.

As mentioned in most of the literature [2,6], the data typically used to evaluate energy reductions are monthly and hourly data, and the criteria for evaluating the baseline in ASHRAE and FEMP cover only monthly and hourly data. However, monthly data provide only 12 data points per year and have large errors [7,8]. Therefore, they cannot properly evaluate retrofit projects with small energy reductions. In the case of hourly data, 105,120 data points per year are obtained if measured at 5 min intervals (365 days × 24 h × 12 readings per hour). However, a large number of sensors are required to collect hourly data, and operators are required to manage them, which makes the data difficult to collect. Daily data are recorded manually by building operators in the operation log. One year yields 365 daily data points, and because of the number of data points, a baseline model created with daily data has smaller errors than one created with monthly data and greater errors than one created with hourly data. However, such a model is difficult to evaluate because ASHRAE and FEMP do not present any criteria for daily data.

In this study, the baseline model was created from daily data, which are more accurate than monthly data and less accurate than hourly data but impose a lower cost burden. The characteristics of the daily data were classified by a clustering algorithm, and a regression analysis was performed on the classification results to create the baseline model [9,11]. Because there were no criteria for evaluating the appropriateness of the model, as mentioned above, the model was evaluated using criteria adopted in the existing literature. If the model failed to satisfy the presented criteria, the baseline model was derived by analyzing the data with a multiple regression model containing an additional independent variable [13,14]. Therefore, the purpose of this study was to create the baseline model from daily data, to evaluate it, and to determine whether it is a proper model.

Fig. 1. Illustration of M&V for retrofit.

2. Technical approach

Table 1
Required value for baseline model.

            Monthly   Daily   Hourly
CV-RMSE     15%       22%     30%
MBE         5%        7%      10%
R2          0.75

2.1. K-means algorithm

When the daily electricity and gas consumptions are drawn as scatter plots against the ambient temperature, the consumptions vary even when the ambient temperature is identical. These data need to be classified into groups. A data clustering technique was used to classify the daily data, and the K-means algorithm was adopted among the data clustering techniques. The K-means algorithm minimizes the average Euclidean distance between the data and the centroids of their clusters [10,11]. The centroid $\vec{u}$ of a cluster $\omega$ is the average of the data $\vec{x}$ that belong to the cluster, and its equation is as follows:

$$\vec{u}(\omega) = \frac{1}{|\omega|} \sum_{\vec{x} \in \omega} \vec{x} \qquad (1)$$
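As a small illustration of Eq. (1), the sketch below computes the centroid of one cluster as the mean of its members. The three daily records are hypothetical values chosen only for the example, and the function name `centroid` is not taken from the paper.

```python
import numpy as np

def centroid(cluster_points):
    """Eq. (1): sum of the points x in a cluster omega divided by |omega|."""
    pts = np.asarray(cluster_points, dtype=float)
    return pts.sum(axis=0) / len(pts)

# Hypothetical daily records: (ambient temperature [deg C], electricity [kWh])
omega = [(24.0, 6100.0), (26.0, 6500.0), (28.0, 6900.0)]
u = centroid(omega)
print(u)
```

Each coordinate is averaged independently, so the centroid of these records sits at the mean temperature and the mean consumption of the group.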

The K-means algorithm randomly selects the initial centroids. Each data pattern is assigned to the cluster whose centroid lies at the shortest Euclidean distance from it, and the centroid positions that minimize the Euclidean distance between the centroids and the data are searched for by repeated attempts. When the initial centroid is selected, the local weighting, global weighting and normalization factor in 2-dimensional space are used. This is expressed by the following Eq. (2):

$$C_i^{initial} = avg_{big}\left(\sum_{j=1}^{n} d_j\right) \qquad (2)$$

Fig. 2. Cluster inverse model process.

Groups are formed through multiple centroids for data classification. The centroids of clusters are set in such a way that they are distributed evenly among the data and the distance between the initial centroids of clusters is the maximum. This is expressed by the following Eq. (3):

$$C = \max \sum_{i=1}^{K} \left(C_{avg} - C_i\right) \qquad (3)$$

The optimum cluster is searched for by maximizing the distance between the initial centroids of clusters and then adjusting the distance between the centroid of each cluster and the data to an appropriate distance. The distance between the centroid of a cluster and the data is calculated by the following equations, where NewDistAvg is evaluated with the centroids of the new clusters and OldDistAvg with the centroids of the existing clusters:

$$NewDistAvg = \frac{1}{K} \sum_{k=1}^{K} d_j \qquad (4)$$

$$OldDistAvg = \frac{1}{K} \sum_{k=1}^{K} d_j \qquad (5)$$

Fig. 3. Scatter plots of change-point model about electricity.

The previous average distance is compared with the new average distance. If the new average distance is shorter, the new cluster is selected and the previous average distance is discarded. If the previous average distance is shorter than the new one, it is determined that the optimum cluster has been formed, and the group is formed with this cluster. The K-means algorithm treats each cluster as a sphere whose center of gravity is the centroid. How well the centroid of a cluster represents the data that belong to the cluster is measured by the residual sum of squares (RSS). RSS is the sum of the squared distances between each data point and the centroid, and a group is formed through this. The equations for RSS are as follows:

$$RSS_k = \sum_{\vec{x} \in \omega_k} \left| \vec{x} - \vec{u}(\omega_k) \right|^2 \qquad (6)$$

$$RSS = \sum_{k=1}^{K} RSS_k \qquad (7)$$
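The assignment step, the centroid update of Eq. (1), and the RSS of Eqs. (6) and (7) can be sketched in a few lines. This is a plain Lloyd-style iteration written with NumPy, not the Matlab routine used in the study; the initial centroids are passed in explicitly instead of being chosen by Eqs. (2) and (3), and the four data points are hypothetical.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def rss_per_cluster(points, labels, centroids):
    """Eq. (6): squared distances to the own centroid, summed per cluster."""
    return np.array([
        np.sum((points[labels == k] - centroids[k]) ** 2)
        for k in range(len(centroids))
    ])

def kmeans(points, centroids, max_iter=100):
    """Reassign points and recompute centroids until the centroids stop moving."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(points, centroids)
    return labels, centroids, rss_per_cluster(points, labels, centroids)

# Hypothetical 2-D data with two obvious groups and hand-picked initial centroids
data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels, cents, rss_k = kmeans(data, [[0.0, 0.0], [10.0, 0.0]])
rss_total = rss_k.sum()  # Eq. (7)
print(labels, cents, rss_k, rss_total)
```

The features are used unscaled here. With temperature in °C and consumption in kWh, the consumption axis would dominate the Euclidean distance, and the source does not state whether the data were normalized before clustering.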

The daily data were analyzed using this K-means algorithm. The data were grouped using the K-means algorithm built into the Matlab program.

2.2. Selection of the number of cluster centroids

The position of the centroid of each cluster is selected automatically by the algorithm, but the number of cluster centroids must be determined by the user. Methods of selecting the number of clusters include the rule of thumb, the elbow method, and the information criterion approach [3,11]. However, these methods only address how well the data are classified in line with the data characteristics. In this study, the data are classified in order to estimate the energy consumption from daily data [8]. Therefore, the goal is to make the patterns of the data stand out rather than to classify the data into groups well. Consequently, the number of cluster centroids was chosen from the operation log of the building instead of by a conventional statistical method. The operation of the heat source equipment in an office building generally changes among three periods: summer, winter, and the intermediate season. The heat source equipment was also operated differently on holidays and weekends than on weekdays. In this study, therefore, weekdays and weekends were separated, and the weekday data were clustered into three groups based on the operation of the heat source equipment.

2.3. Model verification criteria

Criteria for monthly and hourly data appear in ASHRAE Guideline 14 and FEMP's M&V Guidelines 4.0. However, there are no criteria for daily data. Criteria for daily data were required in this study because the purpose was to validate the appropriateness of a baseline model created from daily data [2,4,6]. Therefore, the verification criteria in Table 1, obtained by interpolating between the criteria for hourly and monthly data, were used. Furthermore, the R2 index was added to analyze the accuracy of the predicted values relative to the measured values.

Fig. 4. Compared baseline-model prediction of change-point model with measured data about electricity.

Table 2
Building status.

Completion year   1985
Location          Yeongdeungpo-gu, Seoul, Republic of Korea
Building type     Office
Floor             3 stories below and 12 above the ground
Floor area        990 m2
Gross area        11,880 m2 (except basement)
Height            3.8 m
People            2600

Fig. 5. Scatter plots of cluster inverse model about electricity.

3. Results and discussion

In this study, the existing change-point model and the proposed clustering inverse model were compared using the energy consumption data of the target building. The appropriateness of the baseline model was analyzed based on the total electricity and gas consumptions of the target building.

The clustering inverse model is created according to the flowchart in Fig. 2. As shown in Fig. 2, the required input data are the meteorological data provided by the Meteorological Agency, the weekends and holidays marked on the calendar, and the energy consumption. Because the target building is an office building, its energy use differed clearly between weekdays and weekends. Therefore, the basic weekend and holiday schedule provided by the calendar was used as data. For meteorological data, the Meteorological Agency provided the daily temperature for 2012 and 2013. The measured energy use data cover 2012 and 2013; the 2012 data were used as training data, and the simulated data were compared with the measured data of 2013. With the outside air temperature on the X axis and the energy usage (electricity, gas) on the Y axis, the weekday and weekend data were organized in Excel and grouped. Clustering was then performed with the K-means algorithm built into the Matlab program, and a linear regression model was created in Excel from the classified data. To evaluate the appropriateness of the model, the 2013 weather data were input into the created inverse model to obtain the 2013 simulated data. The simulated data were compared with the measured data of 2013, and the appropriateness of the model was judged based on the presented criteria.

The target building is an office building located in Yeongdeungpo-gu, Seoul, with 12 stories above ground and 3 basement stories. Completed in 1985, it has a building area of 990 m2, a total floor area of 11,880 m2, and a story height of 3.8 m. The number of users of this building is about 2600. Each model estimated the data for 2013 from the data for 2012, and the estimated data were compared with the measured data for 2013 to analyze the appropriateness of the baseline model. Table 2 shows the overall information on the target building. Tables 3–5 provide the capacities and general information of the chiller, cooling tower, and boiler.

Fig. 6. Compared baseline-model prediction of cluster inverse model with measured data about electricity.

Table 3
Summary of chiller.

Num                         2
Type                        Turbo
Capacity [USRT]             250
Compressor [kW]             235
Evaporation, Flow [LPM]     2520
Evaporation, IT/OT [°C]     10/5
Condenser, Flow [LPM]       3250
Condenser, IT/OT [°C]       32/37

Table 4
Summary of cooling tower.

Num                         2
Type                        FRP
Capacity [USRT]             250
Blower, Type                Axial
Blower, Flow [CMM]          1800
Blower, Motor [HP]          7.5
Coolant, Flow [LPM]         3250
Coolant, IT/OT [°C]         37/32
Coolant, OWT [°C]           27

Table 5
Summary of steam boiler.

Num                         2
Type                        Flue
Capacity [kg/h]             2500
Pressure [kg/cm2]           4
Consumption [L/h]           200
Blower Motor [HP]           2
Volume [CMM]                60
Static [mmAq]               300
Motor [HP]                  7.5
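The workflow described in this section (group the training-year data, fit a linear regression per group, then predict the following year from its weather data) can be sketched as follows. The study used Matlab and Excel; this Python version is only an illustration. The cluster labels are assumed to come from a prior K-means step, the data are hypothetical, and the rule for choosing a cluster at prediction time (nearest training temperature range) is an assumption of this sketch, since the text does not spell out that step.

```python
import numpy as np

def fit_cluster_lines(temp, energy, labels):
    """Least-squares line energy = slope * temp + intercept for each cluster."""
    models = {}
    for k in np.unique(labels):
        mask = labels == k
        slope, intercept = np.polyfit(temp[mask], energy[mask], deg=1)
        models[int(k)] = {
            "slope": slope,
            "intercept": intercept,
            "t_min": temp[mask].min(),
            "t_max": temp[mask].max(),
        }
    return models

def predict(models, t):
    """Assumed rule: use the cluster whose training temperature range is nearest to t."""
    def gap(m):
        return max(m["t_min"] - t, 0.0, t - m["t_max"])
    m = min(models.values(), key=gap)
    return m["slope"] * t + m["intercept"]

# Hypothetical training-year data: daily mean temperature and daily consumption
temp_2012 = np.array([0.0, 5.0, 10.0, 20.0, 25.0, 30.0])
energy_2012 = np.array([100.0, 90.0, 80.0, 50.0, 60.0, 70.0])
labels_2012 = np.array([0, 0, 0, 1, 1, 1])  # assumed output of the clustering step

models = fit_cluster_lines(temp_2012, energy_2012, labels_2012)
cold_day = predict(models, 8.0)
warm_day = predict(models, 27.0)
print(cold_day, warm_day)
```

When clusters overlap in temperature, a temperature-only rule cannot separate them, so a real application would need an additional selector such as day type or equipment operation.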

3.1. Electricity

As the target is an office building, most electricity is consumed by business equipment, the server room, and similar loads. Furthermore, the cooling is centralized, and chilled water is supplied by a turbo chiller. The turbo chiller is outlined in Table 3 and the cooling tower is described in Table 4. Fig. 3 shows the change-point model created with the daily electricity consumptions. This model was trained on the data for 2012 (training data), and the electricity consumption for 2013 was estimated (simulated data). The change-point model regards the consumption below a certain temperature as the base load, and above the temperature at which cooling starts, the electricity consumption increases with the temperature. Fig. 4 compares the values predicted by the change-point model with the electricity used in 2013. The MBE and CV(RMSE), which represent the bias and size of the error, seemed appropriate, but the similarity of the profiles was low, and R2, which indicates the accuracy of the predicted values relative to the measured values, was 0.1, which is very low. The reason is that the model cannot capture the data around 5,000 kWh that appear in all temperature sections.
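The three indices discussed above can be computed as in the sketch below. The formulas are common textbook forms and are an assumption here: the paper does not state its exact definitions, guidelines may apply a degrees-of-freedom correction in the denominators, and the sign convention of MBE varies between sources. The thresholds are the daily values of Table 1, with the R2 value of 0.75 assumed to apply to daily data. The four data points are hypothetical.

```python
import numpy as np

# Daily criteria taken from Table 1; applying R2 >= 0.75 to daily data is an assumption
DAILY_CRITERIA = {"mbe": 7.0, "cv_rmse": 22.0, "r2": 0.75}

def mbe_percent(measured, predicted):
    """Mean bias error as a percentage of the total measured consumption."""
    m, p = np.asarray(measured, float), np.asarray(predicted, float)
    return 100.0 * np.sum(m - p) / np.sum(m)

def cv_rmse_percent(measured, predicted):
    """Root-mean-square error as a percentage of the mean measured value."""
    m, p = np.asarray(measured, float), np.asarray(predicted, float)
    return 100.0 * np.sqrt(np.mean((m - p) ** 2)) / np.mean(m)

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m, p = np.asarray(measured, float), np.asarray(predicted, float)
    ss_res = np.sum((m - p) ** 2)
    ss_tot = np.sum((m - np.mean(m)) ** 2)
    return 1.0 - ss_res / ss_tot

def meets_daily_criteria(measured, predicted):
    return (abs(mbe_percent(measured, predicted)) <= DAILY_CRITERIA["mbe"]
            and cv_rmse_percent(measured, predicted) <= DAILY_CRITERIA["cv_rmse"]
            and r_squared(measured, predicted) >= DAILY_CRITERIA["r2"])

# Hypothetical case: a flat prediction that ignores day-to-day variation
measured = [100.0, 110.0, 90.0, 100.0]
predicted = [100.0, 100.0, 100.0, 100.0]
mbe = mbe_percent(measured, predicted)
cv = cv_rmse_percent(measured, predicted)
r2 = r_squared(measured, predicted)
print(mbe, cv, r2, meets_daily_criteria(measured, predicted))
```

In this toy case the bias is zero and CV(RMSE) is about 7%, yet R2 is zero because the prediction does not follow the measured profile. This is the same pattern reported for the change-point model: acceptable MBE and CV(RMSE) together with a very low R2.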

Fig. 7. Scatter plots of change-point model about gas.

Fig. 5 shows the classification of the data by the clustering inverse model. Unlike the change-point model, groups were formed near most of the data, and the inverse model was created by performing a regression analysis on these groups. As a result, as shown in Fig. 6, the difference between the predicted and measured values was small and satisfied the specified criteria. Therefore, the model was regarded as appropriate for use as the baseline model. In the case of electricity, cooling is supplied through the central air conditioning, and the consumption varies with the office equipment used by individuals. The baseline model was therefore created relatively easily, because the data groups formed at relatively similar conditions and ambient temperatures and were classified appropriately by the algorithm.

3.2. Gas

Gas is consumed in the target building for heating, cooking, and hot water. As shown in Fig. 7, the base load consists mainly of cooking and hot water, and the increase in load as the temperature decreases is the heating load. The solid line in Fig. 7 indicates the change-point model. The target building is heated by an overfeed fire tube boiler. Details about the boiler are listed in Table 5. The 2013 values predicted by the change-point model were compared with the measured values. Fig. 8 shows that the criteria for CV(RMSE) and R2 were not met. The causes of this error were found to be the large differences in gas consumption at the same temperature and the fact that the sections with no gas consumption were not classified separately.

Gas was analyzed with the clustering inverse model in the same way as electricity. The data were classified as shown in Fig. 9, and the inverse model was created by a regression analysis of the classified groups. The results are shown in Fig. 10. The predicted and measured values form relatively similar profiles. However, the CV(RMSE) was 43.37%, which is higher than the criterion. This problem arose because the differences in gas consumption within the classified groups were too large even at the same conditions and the same temperature. These differences occurred because the heating was operated not according to the ambient temperature but according to the operation standards of the boiler. Therefore, to reduce the CV(RMSE), another analysis was performed with the operating time of the boilers added as a variable. The inverse model created through multiple regression analysis is shown in Table 6, and the results are shown in Fig. 11. Just by adding the boiler operating time, R2 rose to a very high value of 0.97, and the CV(RMSE) also met the specified criteria. Therefore, the model in Table 6 was judged to be appropriate as the baseline model for gas.

Fig. 8. Compared baseline-model prediction of change-point model with measured data about gas.

Fig. 9. Scatter plots of cluster inverse model about gas.

Fig. 10. Compared baseline-model prediction of cluster inverse model with measured data about gas.

Fig. 11. Compared baseline-model prediction of cluster inverse model (MVR) with measured data about gas.

Table 6
Multiple regression model of cluster inverse model about gas.

Group      Temp range [°C]        Multiple regression equation
Week1      0.8 ≤ TOA < 15         0.99 · TOA + 1.96 · B1 + 1.89 · B2 − 0.01 · B3 + 21.65
Week2      −15 ≤ TOA < 0.8        1.27 · TOA + 0.69 · B1 + 1.04 · B2 + 0.15 · B3 + 69.6
Week3      TOA ≥ 15               −0.87 · TOA + 1.62 · B1 + 1.6 · B2 + 0.02 · B3 + 101.93
Weekend1   −15 ≤ TOA < 9.35       0.001 · TOA + 0 · B1 + 0 · B2 + 0.02 · B3 − 0.05
Weekend2   TOA ≥ 9.35             1.02 · TOA + 1.9 · B1 + 1.78 · B2 + 0.02 · B3 − 0.17
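As a usage sketch, the piecewise model of Table 6 can be evaluated by selecting the row that matches the day type and the ambient temperature. The coefficients are copied from Table 6; the function name `predict_gas`, the day-type keys, and the example inputs are hypothetical, and the unit of the predicted gas consumption is not stated in the table.

```python
# Rows of Table 6: (lower TOA, upper TOA, coef TOA, coef B1, coef B2, coef B3, constant)
TABLE6 = {
    "weekday": [
        (-15.0, 0.8, 1.27, 0.69, 1.04, 0.15, 69.6),              # Week2
        (0.8, 15.0, 0.99, 1.96, 1.89, -0.01, 21.65),             # Week1
        (15.0, float("inf"), -0.87, 1.62, 1.6, 0.02, 101.93),    # Week3
    ],
    "weekend": [
        (-15.0, 9.35, 0.001, 0.0, 0.0, 0.02, -0.05),             # Weekend1
        (9.35, float("inf"), 1.02, 1.9, 1.78, 0.02, -0.17),      # Weekend2
    ],
}

def predict_gas(day_type, t_oa, b1, b2, b3):
    """Daily gas consumption from ambient temperature [deg C] and boiler run times [min]."""
    for lower, upper, c_t, c_b1, c_b2, c_b3, const in TABLE6[day_type]:
        if lower <= t_oa < upper:
            return c_t * t_oa + c_b1 * b1 + c_b2 * b2 + c_b3 * b3 + const
    raise ValueError(f"TOA={t_oa} is outside the ranges listed in Table 6")

# Hypothetical weekday at 5 deg C with boiler no. 1 running for 100 min
weekday_example = predict_gas("weekday", 5.0, 100.0, 0.0, 0.0)
# Hypothetical weekend at 0 deg C with no boiler operation
weekend_example = predict_gas("weekend", 0.0, 0.0, 0.0, 0.0)
print(weekday_example, weekend_example)
```

Temperatures below −15 °C raise an error rather than extrapolate, because Table 6 gives no equation for that range.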

4. Conclusion

In this study, the inverse model was created from the daily electricity and gas consumptions and the operating time of the heat source equipment, which building operators record manually to track the operation of an office building. The baseline model, which consists of the values predicted by the inverse model, was then compared with the measured values to analyze the appropriateness of the model. As a result, the conventional change-point model used to analyze monthly energy consumption was found to be inappropriate for analyzing daily energy consumption, whereas the cluster inverse model proposed in this paper was found to be relatively appropriate.

In the case of electricity, the consumptions at the same temperature were grouped relatively well through clustering. The inverse model was created by regression analysis of the groups after clustering, and the baseline model derived from it was judged to be appropriate based on comparison with the measured values.

In the case of gas, grouping was not performed well because of the large differences in consumption at the same conditions and the same temperature. Furthermore, the baseline model derived from the inverse model created by simple linear regression had large errors. Boiler operating time was added as a variable to correct the errors. The baseline model was then derived from the inverse model created by multiple regression analysis, and it was judged to be an appropriate model.

The conventional change-point model was appropriate only for monthly data and was inappropriate for creating the baseline model from daily data. The cluster inverse model proposed in this paper is appropriate for creating the baseline model from daily data because the data are grouped and analyzed according to their characteristics.

References

[1] Korea Energy Management Corporation, Korea Energy Handbook, KEMC, Yongin, Gyeonggi-do, 2015.
[2] FEMP, M&V Guidelines: Measurement and Verification for Federal Energy Projects, Version 4.0, Energy Efficiency and Renewable Energy, 2015.
[3] D. Kong, Development of Simulation-Based Baseline Modelling Methodology with Hourly BEMS Data, Doctoral Dissertation, University of Seoul, Seoul, 2016.
[4] Y. Zhang, Z. O'Neil, T. Wagner, G. Augenbroe, Comparisons of inverse modeling approaches for predicting building energy performance, Build. Environ. 86 (2015) 177–190.
[5] A. Srivastav, A. Tewari, B. Dong, Baseline building energy modeling and localized uncertainty quantification using Gaussian Mixture Models, Energy Build. 65 (2013) 438–447.
[6] American Society of Heating, Refrigerating and Air-Conditioning Engineers, ASHRAE Guideline 14-2014, Measurement of Energy and Demand Savings, ASHRAE, Atlanta, GA, 2014.
[7] K. Kim, J. Haberl, Development of methodology for calibrated simulation in single-family residential buildings using three-parameter change-point regression model, Energy Build. 99 (2015) 140–152.
[8] J. Kissock, Development of a toolkit for calculating linear, change-point linear and multiple-linear inverse building energy analysis models, in: ASHRAE Research Project 1050-RP, 2002.
[9] F. Tang, A. Kusiak, X. Wei, Modeling and short-term prediction of HVAC system with a clustering algorithm, Energy Build. 82 (2014) 310–321.
[10] S. Lee, Comparison of initial seeds methods for K-Means clustering, J. Internet Comput. Serv. 13 (6) (2012) 1–8.
[11] R.M. Neal, G.E. Hinton, A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants, Learning in Graphical Models, Springer, Netherlands, 1998, pp. 355–368.
[13] M. Royapoor, T. Roskilly, Building model calibration using energy and environmental data, Energy Build. 94 (2015) 109–120.
[14] L.C. Harmer, G.P. Henze, Using calibrated energy models for building commissioning and load prediction, Energy Build. 92 (2015) 204–215.
[15] J. Granderson, P.N. Price, D. Jump, N. Addy, M.D. Sohn, Automated measurement and verification: performance of public domain whole-building electric baseline models, Appl. Energy 144 (2015) 106–113.
[16] Y. Zhang, Z. O'Neil, T. Wagner, G. Augenbroe, An inverse model with uncertainty quantification to estimate the energy performance of an office building, Proceedings of 13th International Building Performance Simulation Association Conference (2013) 25–28.