ERK'2022, Portorož, 498-501
Features and Models for Short-term Household
Energy Consumption Forecast
Martin Makovec
Jožef Stefan Institute
Jamova 39, Ljubljana, Slovenia
mmakovec22@gmail.com
Abstract—This paper shows how machine learning can be applied to load
forecasting, which could lead to more effective energy generation.
Researchers are still working to improve prediction algorithms to the
point where they can be deployed in industry. We explore three widely
applied techniques in short-term load forecasting (up to 3 h): random
forest regression, XGBoost regression and recurrent neural networks.
A short explanation of each of these techniques, along with the
necessary equations, is provided. The UK-DALE and HUE datasets are used
for a direct comparison of these techniques. The paper also discusses
the nature of the load and the different factors influencing its
behaviour.
Index Terms—short-term load forecasting, long short term
memory, recurrent neural network, deep learning, random forest,
gradient boosted random forest, classical machine learning.
I. INTRODUCTION
The demand for electric energy is increasing each year and
with that, researchers are looking into smarter, more efﬁcient
and environmentally friendly ways of distributing electric
energy. Smart grid deployments would be able to better control
and balance energy supply and demand through near real time,
continuous feedback about energy generation and consumption
patterns. The widespread deployment of smart meters that
provide frequent readings allows insight into continuous traces
of usage patterns, that can be obtained through data analysis
using methods such as classical machine learning and deep
learning algorithms. This in turn enables better designs and
triggers of demand response actions and pricing strategies, and
provides input to the planning for growth and changes in the
distribution network. Besides, customers may also gain better
awareness of their own consumption patterns.
Load forecasting plays an important role in power planning
and production. It is therefore the subject of intense research,
including a recent competition motivated by the effects of the COVID-19
crisis on the load and on load forecasting [1]. The load forecasting
models reported in the literature can be split into two main groups.
The first consists of time series (univariate) models, which use
observed values from the past to form a function that models the load.
The second consists of so-called causal models, which model the load
using exogenous factors such as weather and social variables. Models of
the first group include multiplicative autoregressive models, dynamic
linear or nonlinear models, threshold autoregressive models and methods
based on Kalman filtering [2].
An early study on short-term load forecasting (STLF) uses a
multiplicative decomposition model and the seasonal autoregressive
integrated moving average (ARIMA) model on Singapore's electricity
data [3]. Although both time series models can accurately predict the
short-term Singapore demand, the comparison shows that the
multiplicative decomposition model slightly outperforms the seasonal
ARIMA model. Even though there are many choices regarding the use of a
causal model, such as autoregressive moving average, Box and Jenkins
transfer functions, structural models, optimization techniques,
nonparametric regression and curve-fitting procedures, the most popular
causal models are still the linear regression ones [4]. In [4] the
authors use a linear
regression model on electricity consumption data, provided
by the local Estonian company Alexela AS, to do STLF. The
model selection was motivated by speciﬁc design requirements
imposed by Nord Pool and local transmission system operator
data processing partner. Therefore, the focus of the paper lies
on exploring the importance of the feature engineering process
and discovering important factors when it comes to energy
consumption proﬁles.
In more recent studies, artiﬁcial intelligence-based models
are also explored. These models can be split into classical
machine learning models [5] and deep learning models [6].
Some of the most popular classical machine learning models,
when it comes to forecasting, include decision tree algorithms
and support vector machines. For instance, [5] investigated
so-called kernel methods, starting from simple weighted kernel
regression, all the way up to support vector machines. The
prediction was carried out at different levels of aggregation
(individual meters, feeder sections, distribution substations and
at the system level) and done on power consumption data
monitored by tens of thousands of smart meters in a medium-
sized U.S. city. Additional weather data was supplied by the
National Climatic Data Center and the National Oceanic and
Atmospheric Administration.
The most promising deep learning approaches to tackle
STLF are recurrent neural networks (RNNs). The authors of
[6] developed a long short-term memory (LSTM) RNN for
STLF with weather features as an input. The Smart-Grid
Smart-City dataset, which includes smart meter data for about
10,000 different customers in New South Wales, was used. The
prediction was done on residential power load, which has a
greater correlation with exogenous factors such as weather or
lifestyle of residents and is more irregular than regional load,
which shows more seasonality.
In this paper, we report a study on developing a short-term
forecasting system for households from the UK and Canada.
We analyze the effect of various feature combinations on the
forecasting accuracy for 1, 2 and 3 hours ahead. We also
perform a relative performance comparison of three machine
learning methods, namely Random Forests (RF), Gradient
Boosted Random Forests (XGB) and Recurrent Neural Networks
(RNN), and show that our results of 24% average MAPE
per household are comparable with the existing state of the art,
where per-household MAPE is between 10% and 35%.
The contributions of this paper are as follows:
• We explore a variety of different exogenous factors as
  well as past load data information and show their effects
  on model performance. Additionally, we show further
  improvement of results upon including engineered features
  that capture statistical traits regarding temperature and
  load.
• We show that our best recurrent neural network model
  outperforms our best classical models, while also relying
  less on exogenous factors, at the cost of training time.
The paper is structured as follows. Section II provides the
problem statement, Section III analyzes the results provided in the
tables, where different features as well as model optimization
are compared. Concluding remarks are drawn in Section IV.
II. PROBLEM STATEMENT
We define our STLF regression problem as follows. Given
input data consisting of a time series $T$ representing energy
consumption measurements from households, weather information
and other exogenous factors which affect the load, we
formulate a regressor $\Phi$, a function that predicts the future
energy consumption $E$ of households:

$$\Phi(T) = E$$

The regressor $\Phi$ is realized using classical machine learning
as well as deep learning techniques on the UK-DALE [7] and
HUE [8] datasets.
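As an illustration of the formulation above, the following minimal sketch turns an hourly load series into input/target pairs for a chosen forecasting horizon. The function name `make_supervised`, the parameter `n_lags` and the sample readings are assumptions made for this example; they are not taken from the paper's code.

```python
import numpy as np

def make_supervised(load, n_lags=1, horizon=1):
    """Build (X, y) pairs from a 1-D load series.

    Each row of X holds `n_lags` consecutive hourly readings and the
    matching entry of y is the reading `horizon` hours after the last
    reading in that row.
    """
    load = np.asarray(load, dtype=float)
    n_samples = len(load) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([load[i:i + n_lags] for i in range(n_samples)])
    y = load[n_lags + horizon - 1:]
    return X, y

# Hypothetical hourly readings in Wh, used only to show the shapes.
hourly_load = [100, 120, 90, 150, 200, 180]
X, y = make_supervised(hourly_load, n_lags=2, horizon=3)
print(X)  # rows [100, 120] and [120, 90]
print(y)  # targets 200 and 180, three hours after each window ends
```

With `horizon` set to 1, 2 or 3 the same routine would yield the targets for the three forecasting horizons studied here.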
A. Dataset summary
The UK-DALE (UK Domestic Appliance-Level Electricity) dataset
contains appliance-by-appliance power demand of 5 UK
homes, recorded approximately once every 6 s. House 1 has
almost 5 years of recordings, while the others have around half a
year. To develop 1-3 hour ahead forecasting, we downsampled
the data to hourly resolution by averaging. To complement the
UK-DALE dataset we also used the MIDAS Open: UK daily temperature
dataset¹, which contains the maximum and minimum temperature for
each half-day period.

¹ https://data.ceda.ac.uk/badc/ukmo-midas-open/data/uk-daily-temperature-obs/dataset-version-202107/

The HUE dataset contains aggregated power consumption
data of 28 houses in British Columbia, Canada. Energy
consumption is recorded with an hourly frequency, with most
houses having three years of consumption history. Weather
data from the nearest weather station and metadata regarding
the houses are also included. Some of the missing temperature
samples were linearly interpolated to preserve as much data
as possible.
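The two preprocessing steps described above, hourly averaging and linear interpolation of missing temperatures, could be sketched with pandas as follows. The synthetic readings, timestamps and variable names are assumptions for illustration only and do not come from the datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical 6-second power readings covering exactly two hours:
# 200 W during the first hour, 400 W during the second.
index = pd.date_range("2014-01-01 00:00:00", periods=1200,
                      freq=pd.Timedelta(seconds=6))
power = pd.Series(np.where(index.hour == 0, 200.0, 400.0), index=index)

# Downsample to one value per hour by averaging the readings within it.
hourly = power.resample(pd.Timedelta(hours=1)).mean()

# Hypothetical hourly temperature series with two missing samples,
# filled by linear interpolation between the neighbouring values.
temp_index = pd.date_range("2014-01-01", periods=5,
                           freq=pd.Timedelta(hours=1))
temperature = pd.Series([2.0, np.nan, 4.0, np.nan, 8.0], index=temp_index)
temperature_filled = temperature.interpolate(method="linear")

print(hourly.tolist())
print(temperature_filled.tolist())
```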
B. Feature Engineering
Starting from the available data, we noticed that in addition
to energy and weather measurements, also meta-data such as
type of household and orientation was available, therefore we
engineered several feature sets to study their inﬂuence on the
quality of the forecasting.
Due to space constraints, we report the results for a selection
of three feature sets as follows:
Set1 HUE - Consists of hourly samples of the current energy
consumption of the house. It is used as a baseline for
comparison with the other feature sets.
Set1 UK-DALE - Also consists of hourly energy consumption
samples.
Set2 HUE - Consists of energy, temperature, part_of_day,
part_of_year, weekend, facing, EV, RU, type.
Set2 UK-DALE - UK-DALE data lacks temperature measurements
for every sample, as well as metadata such as the
presence of an electric vehicle (EV), geographical orientation
(facing) and residential unit information (RU). This set
therefore consists only of: energy, part_of_day, part_of_year,
weekend, type.
Set3 HUE - Additionally, we engineered statistical features
regarding energy and temperature. This set consists of: energy,
energy_day_mean, energy_day_max, energy_day_min,
house_energy_mean, energy_diff, energy_day_diff,
temperature, temperature_day_mean, temperature_day_max,
temperature_day_min, temperature_diff, part_of_day,
part_of_year, weekend, facing, EV, RU, type.
Set3 UK-DALE - Similarly, we engineered features
on UK-DALE consisting of: energy, energy_day_mean,
energy_day_max, energy_day_min, house_energy_mean,
energy_diff, temperature_day_max, temperature_day_min,
part_of_day, part_of_year, weekend, type.
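A few of the engineered features listed above could be derived from an hourly series as in the following sketch. The paper does not define the features precisely, so the definitions used here are assumptions: daily statistics are taken over the calendar day, `energy_diff` is the change from the previous hour, `part_of_day` is a 6-hour bucket and `part_of_year` is a meteorological season. The function name `add_features` is also hypothetical.

```python
import numpy as np
import pandas as pd

def add_features(df):
    """Add engineered columns to an hourly DataFrame.

    `df` needs a DatetimeIndex and an 'energy' column. All feature
    definitions below are assumptions made for illustration.
    """
    out = df.copy()
    day = out.index.normalize()              # calendar day of each sample
    per_day = out.groupby(day)["energy"]
    out["energy_day_mean"] = per_day.transform("mean")
    out["energy_day_max"] = per_day.transform("max")
    out["energy_day_min"] = per_day.transform("min")
    out["house_energy_mean"] = out["energy"].mean()
    out["energy_diff"] = out["energy"].diff().fillna(0.0)
    # 0: 00-05 h, 1: 06-11 h, 2: 12-17 h, 3: 18-23 h
    out["part_of_day"] = np.asarray(out.index.hour) // 6
    # 0: Dec-Feb, 1: Mar-May, 2: Jun-Aug, 3: Sep-Nov
    out["part_of_year"] = (np.asarray(out.index.month) % 12) // 3
    out["weekend"] = (np.asarray(out.index.dayofweek) >= 5).astype(int)
    return out

# Hypothetical two days of hourly data: a Friday followed by a Saturday.
index = pd.date_range("2018-01-05", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"energy": np.arange(48, dtype=float)}, index=index)
features = add_features(df)
print(features.iloc[[0, 13, 47]])
```

Note that statistics over the current calendar day include hours that lie after the sample being described. A forecasting pipeline would have to restrict them to past data, for example by using the previous day's values; the paper does not state which variant was used.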
C. Selected Techniques
We consider the following set of techniques: Random Forest
Regressor (RF), XGBoost Regressor (XGB) and Recurrent Neural
Network (RNN). RF combines many different decision trees, each
trained on a random subset of the data. The final result is the
average of the results given by the individual decision trees. XGB
also builds an ensemble of decision trees, but does so sequentially
using gradient boosting: each new tree is fitted to correct the
errors of the trees built before it. The RNN algorithm is widely
used in forecasting because it can take into account many data
samples from the past to form a more solid prediction that is less
sensitive to random disturbances. The main problem of plain
recurrent neural networks is their short memory, which means they
are influenced the most by the last samples of the sequence they
process. We use a so-called long short-term memory recurrent neural
network (LSTM RNN), which mitigates this problem. We present
results for the following set-ups selected after hyperparameter
optimization. For the classical models the optimization was
done with the help of GridSearchCV, while the deep learning
model was optimized through trial and error. For the deep
learning model we used the ReLU activation function with the
Adam optimizer.
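To make the long short-term memory mechanism mentioned above more concrete, the following sketch implements one forward step of a textbook LSTM cell in NumPy. It is an illustration only and not the paper's model: it uses the standard tanh nonlinearity rather than the ReLU activation reported here, the weights are random rather than trained, and the names `lstm_step`, `W` and `b` as well as the input values are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell.

    x: input vector (n_in,); h_prev, c_prev: previous hidden and cell
    state (n_hidden,); W: weights (4 * n_hidden, n_in + n_hidden);
    b: biases (4 * n_hidden,). Gate order in W and b: input, forget,
    output, candidate.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:n])            # input gate: how much new information to store
    f = sigmoid(z[n:2 * n])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much memory to expose
    g = np.tanh(z[3 * n:])        # candidate update of the cell state
    c = f * c_prev + i * g        # cell state carries the long-term memory
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 32
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)

# Feed five hypothetical (scaled) load samples through the cell in order.
for value in [0.2, 0.4, 0.1, 0.5, 0.3]:
    h, c = lstm_step(np.array([value]), h, c, W, b)
print(h.shape)
```

Because the cell state `c` is updated additively and scaled by the forget gate, information from early samples can survive many steps, which is what lets the LSTM retain more than the most recent sample.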
RF default - Random forest model with the default configuration,
i.e. 100 estimators, a minimum of 2 samples required to split an
internal node and a minimum of 1 sample required at a leaf node.
RF optimized - We kept the number of estimators at 100 and
increased the minimum number of samples required to split a node
to 20 and the minimum number of samples required at a leaf node
to 40.
XGB default - XGBoost model with the default configuration,
i.e. 100 estimators, a minimum child weight of 1 and a maximum
depth of 3.
XGB optimized - We kept the number of estimators at 100
and increased the minimum child weight to 50 and the maximum
depth to 21.
RNN default - The recurrent neural network at first consisted
of an LSTM layer with 16 neurons, followed by a dense layer
with 32 neurons and a dense output layer with a single neuron.
We ran the model for 10 epochs while considering only the
previous sample, to get a baseline result.
RNN optimized - The final model consisted of only an
LSTM layer with 32 neurons and a dense output layer with
one neuron. Compared to more complex models with multiple
layers, this structure worked well without overfitting, and
having only an LSTM layer keeps the emphasis on previous sample
information. We increased the number of epochs to 20, the point
at which the loss stopped improving. We also increased the
number of previous samples considered to 5, which proved to be
optimal: it gave the model a good boost in performance while not
being too computationally heavy.
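The classical set-ups above map directly onto scikit-learn keyword arguments, as in the following sketch. The random forest values are those reported in the list. The search grid, the scoring choice, the number of folds and the synthetic data are assumptions made so that the example runs on its own; the paper does not state the grid that was searched. The XGBoost values are shown only as parameter dictionaries, which could be passed to `xgboost.XGBRegressor` if that package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Random forest set-ups as reported in the text.
rf_default = RandomForestRegressor(n_estimators=100, min_samples_split=2,
                                   min_samples_leaf=1, random_state=0)
rf_optimized = RandomForestRegressor(n_estimators=100, min_samples_split=20,
                                     min_samples_leaf=40, random_state=0)

# XGBoost set-ups as reported in the text, kept as plain dictionaries.
xgb_default_params = {"n_estimators": 100, "min_child_weight": 1, "max_depth": 3}
xgb_optimized_params = {"n_estimators": 100, "min_child_weight": 50, "max_depth": 21}

# Hypothetical grid for illustration; not the grid used in the paper.
param_grid = {"min_samples_split": [2, 20], "min_samples_leaf": [1, 40]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # assumed scoring choice
    cv=3,                               # assumed number of folds
)

# Synthetic regression data standing in for the feature matrix and load.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))
y = 500 * X[:, 0] + rng.normal(scale=10, size=300)

search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

Which combination wins on the synthetic data says nothing about the real datasets; the sketch only shows how the reported hyperparameters and a grid search fit together.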
D. Evaluation Metrics
To ensure credible results, k-fold cross-validation was used,
and the source code is publicly available². For evaluating
the performance of the predictor we use the mean absolute error
(MAE) and the mean absolute percentage error (MAPE):

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \quad (1)$$

$$\mathrm{MAPE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \quad (2)$$

where $\hat{y}_i$ is the predicted and $y_i$ the real value.
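The two metrics in (1) and (2) can be computed with NumPy as in the following minimal sketch. The sample values are hypothetical. MAPE is returned as a fraction here and multiplied by 100 to obtain a percentage, as used in the result tables; it is undefined whenever a real value is zero.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the inputs (e.g. Wh)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (x100 for percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical real and predicted hourly consumption in Wh.
y_true = [200.0, 400.0, 500.0]
y_pred = [250.0, 300.0, 500.0]

mae_value = mae(y_true, y_pred)            # (50 + 100 + 0) / 3 = 50 Wh
mape_value = 100 * mape(y_true, y_pred)    # (25% + 25% + 0%) / 3
print(mae_value, mape_value)
```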
III. RESULTS
In this section we will ﬁrst analyze the effect of feature
selection, followed by the effect of model selection on the
performance of STLF 1-3 hours ahead. With increasing time
horizons predictions are generally worse.
² https://github.com/MMakovec/IJS_models
A. Effect of feature selection
It can be seen from Tables I and II that by enriching
the training features the average MAPE across both
datasets drops from 46.17% for Set1 to 36.19% for Set3.
Although the additional features brought a significant
improvement in performance on UK-DALE data, we are still missing
some key features for hourly predictions, such as temperature data
for every hour. This explains the poor performance of 46.52%
average MAPE of the classical models when predicting on Set3 of
UK-DALE data. The RNN model achieves an average MAPE over all
prediction horizons of 28.13% on the same data and 37.23% even
without any additional features, so the lack of features does not
affect it as much. That is because past energy consumption already
correlates strongly with future energy consumption, and the RNN
model is able to take into account multiple samples from the past
to form a better prediction than the classical models, which in our
set-up do not.
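The averages quoted above can be reproduced from the MAPE rows of Tables I and II, as the following sketch shows. The values are copied from the tables in the column order RF 1h, 2h, 3h, XGB 1h, 2h, 3h, RNN 1h, 2h, 3h; the variable names are chosen for this example.

```python
# MAPE [%] rows of Tables I and II (Set1 and Set3 only).
MAPE = {
    ("UK-DALE", "Set1"): [45.016, 60.286, 67.385, 45.915, 60.588, 65.588,
                          30.955, 38.665, 42.087],
    ("UK-DALE", "Set3"): [36.481, 49.349, 55.409, 36.512, 47.446, 53.973,
                          24.023, 28.685, 31.672],
    ("HUE", "Set1"): [36.881, 41.568, 44.275, 36.123, 42.159, 44.313,
                      34.322, 44.383, 50.640],
    ("HUE", "Set3"): [30.108, 33.722, 33.517, 30.128, 33.730, 33.544,
                      29.780, 30.709, 32.682],
}

def mean(values):
    return sum(values) / len(values)

# Averages over all models, horizons and both datasets.
set1_avg = mean(MAPE[("UK-DALE", "Set1")] + MAPE[("HUE", "Set1")])
set3_avg = mean(MAPE[("UK-DALE", "Set3")] + MAPE[("HUE", "Set3")])
feature_gain = set1_avg - set3_avg

# UK-DALE only: classical models (RF and XGB columns) versus the RNN.
classical_ukdale_set3 = mean(MAPE[("UK-DALE", "Set3")][:6])
rnn_ukdale_set3 = mean(MAPE[("UK-DALE", "Set3")][6:])
rnn_ukdale_set1 = mean(MAPE[("UK-DALE", "Set1")][6:])

print(f"Set1 {set1_avg:.3f}, Set3 {set3_avg:.3f}, gain {feature_gain:.3f}")
print(f"UK-DALE Set3: classical {classical_ukdale_set3:.3f}, RNN {rnn_ukdale_set3:.3f}")
print(f"UK-DALE Set1: RNN {rnn_ukdale_set1:.3f}")
```

The computed values agree with the figures in the text to within rounding in the second decimal place.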
B. Effect of model selection
Turning our attention to Tables III and IV, we can see that the
average performance gain from model optimization is 6.46
percentage points of MAPE, which is not as large as the 9.98
percentage points gained by adding features, but still noticeable.
Although the classical models perform badly on UK-DALE because of
missing features, the RNN model showed even better performance on
UK-DALE than it did on HUE data, with an average MAPE of 28.13%
for the optimized model on UK-DALE compared to 31.06% on HUE. A
likely reason is the lower variety within the UK-DALE dataset, as
it only includes five different houses compared to the twenty-eight
houses of HUE. Each individual house in the UK-DALE dataset also
has a larger number of samples, leading to better training of
the RNN model. Overall, the RF and XGB models performed very
similarly, and the best performing model was the RNN. It
can produce good predictions without needing a lot of extra
features, as long as the sample size is large enough.
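The optimization gain discussed above follows from the MAPE rows of Tables III and IV. The sketch below recomputes it overall and per model; the values are copied from the tables in the column order RF 1h, 2h, 3h, XGB 1h, 2h, 3h, RNN 1h, 2h, 3h, and the variable names are chosen for this example.

```python
# MAPE [%] rows of Tables III (UK-DALE) and IV (HUE).
DEFAULT = {
    "UK-DALE": [44.285, 55.434, 62.571, 42.574, 55.004, 62.522,
                30.197, 39.360, 41.681],
    "HUE": [36.283, 40.251, 40.404, 35.119, 39.416, 39.785,
            30.694, 35.525, 36.647],
}
OPTIMIZED = {
    "UK-DALE": [36.481, 49.349, 55.409, 36.512, 47.446, 53.973,
                24.023, 28.685, 31.672],
    "HUE": [30.108, 33.722, 33.517, 30.128, 33.730, 33.544,
            29.780, 30.709, 32.682],
}
COLUMNS = {"RF": slice(0, 3), "XGB": slice(3, 6), "RNN": slice(6, 9)}

def mean(values):
    return sum(values) / len(values)

# Overall gain: difference of the averages over both datasets.
overall_gain = (mean(DEFAULT["UK-DALE"] + DEFAULT["HUE"])
                - mean(OPTIMIZED["UK-DALE"] + OPTIMIZED["HUE"]))

# Gain per model, averaged over horizons and datasets.
model_gain = {
    name: mean(DEFAULT["UK-DALE"][cols] + DEFAULT["HUE"][cols])
          - mean(OPTIMIZED["UK-DALE"][cols] + OPTIMIZED["HUE"][cols])
    for name, cols in COLUMNS.items()
}

# Optimized RNN average per dataset.
rnn_optimized = {name: mean(rows[COLUMNS["RNN"]])
                 for name, rows in OPTIMIZED.items()}

print(f"overall gain: {overall_gain:.3f} percentage points")
print({name: round(gain, 3) for name, gain in model_gain.items()})
print({name: round(value, 3) for name, value in rnn_optimized.items()})
```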
IV. CONCLUSION
To summarize, this paper has presented the performance
of state-of-the-art machine learning models for forecasting
load with prediction horizons of 1 to 3 hours. It shows the
difference between classical machine learning and deep learning
and their separate use cases. Classical machine learning offers a
quicker way to train with minimal data available, while deep
learning combined with a large dataset offers better performance.
Although the RNN needs many more samples to train properly, the
paper showed it to be less feature-dependent than RF and XGB. The
paper also compared standard features used in load forecasting and
their relation to the energy consumption. The energy consumption
feature proved to be the most important for the RNN model, while
other features such as temperature and time features became much
more important when predicting with the RF and XGB models.
TABLE I
EFFECT OF FEATURE SELECTION ON STLF ON THE UK-DALE DATASET.

| Feature set | Metric | RF 1h | RF 2h | RF 3h | XGB 1h | XGB 2h | XGB 3h | RNN 1h | RNN 2h | RNN 3h |
|---|---|---|---|---|---|---|---|---|---|---|
| Set1 | MAE [Wh] | 289.34 | 350.70 | 376.40 | 295.15 | 355.07 | 380.46 | 252.71 | 310.02 | 328.30 |
| Set1 | MAPE [%] | 45.016 | 60.286 | 67.385 | 45.915 | 60.588 | 65.588 | 30.955 | 38.665 | 42.087 |
| Set2 | MAE [Wh] | 280.70 | 331.99 | 360.45 | 297.00 | 346.82 | 374.15 | 244.34 | 265.20 | 273.93 |
| Set2 | MAPE [%] | 43.142 | 55.614 | 63.867 | 43.588 | 55.199 | 63.200 | 30.029 | 34.159 | 34.896 |
| Set3 | MAE [Wh] | 270.46 | 313.46 | 338.57 | 272.89 | 312.85 | 336.03 | 233.42 | 257.67 | 268.12 |
| Set3 | MAPE [%] | 36.481 | 49.349 | 55.409 | 36.512 | 47.446 | 53.973 | 24.023 | 28.685 | 31.672 |
TABLE II
EFFECT OF FEATURE SELECTION ON STLF ON THE HUE DATASET.

| Feature set | Metric | RF 1h | RF 2h | RF 3h | XGB 1h | XGB 2h | XGB 3h | RNN 1h | RNN 2h | RNN 3h |
|---|---|---|---|---|---|---|---|---|---|---|
| Set1 | MAE [Wh] | 317.41 | 460.59 | 491.94 | 315.57 | 458.70 | 490.07 | 282.18 | 358.31 | 394.85 |
| Set1 | MAPE [%] | 36.881 | 41.568 | 44.275 | 36.123 | 42.159 | 44.313 | 34.322 | 44.383 | 50.640 |
| Set2 | MAE [Wh] | 319.07 | 432.90 | 454.76 | 294.20 | 400.04 | 419.83 | 271.39 | 324.88 | 348.28 |
| Set2 | MAPE [%] | 35.807 | 40.968 | 41.162 | 35.927 | 41.524 | 43.013 | 34.595 | 34.816 | 46.157 |
| Set3 | MAE [Wh] | 275.37 | 334.83 | 365.44 | 275.24 | 349.76 | 358.14 | 248.07 | 284.79 | 298.07 |
| Set3 | MAPE [%] | 30.108 | 33.722 | 33.517 | 30.128 | 33.730 | 33.544 | 29.780 | 30.709 | 32.682 |
TABLE III
EFFECT OF MODEL SELECTION ON STLF ON THE UK-DALE DATASET.

| Model | Metric | RF 1h | RF 2h | RF 3h | XGB 1h | XGB 2h | XGB 3h | RNN 1h | RNN 2h | RNN 3h |
|---|---|---|---|---|---|---|---|---|---|---|
| Default | MAE [Wh] | 295.65 | 337.15 | 363.27 | 274.41 | 319.87 | 345.25 | 244.51 | 261.93 | 268.14 |
| Default | MAPE [%] | 44.285 | 55.434 | 62.571 | 42.574 | 55.004 | 62.522 | 30.197 | 39.360 | 41.681 |
| Optimized | MAE [Wh] | 270.46 | 313.46 | 338.57 | 272.89 | 312.85 | 336.03 | 233.42 | 257.67 | 268.12 |
| Optimized | MAPE [%] | 36.481 | 49.349 | 55.409 | 36.512 | 47.446 | 53.973 | 24.023 | 28.685 | 31.672 |
TABLE IV
EFFECT OF MODEL SELECTION ON STLF ON THE HUE DATASET.

| Model | Metric | RF 1h | RF 2h | RF 3h | XGB 1h | XGB 2h | XGB 3h | RNN 1h | RNN 2h | RNN 3h |
|---|---|---|---|---|---|---|---|---|---|---|
| Default | MAE [Wh] | 300.31 | 378.41 | 383.88 | 290.65 | 396.35 | 410.95 | 264.99 | 324.73 | 350.34 |
| Default | MAPE [%] | 36.283 | 40.251 | 40.404 | 35.119 | 39.416 | 39.785 | 30.694 | 35.525 | 36.647 |
| Optimized | MAE [Wh] | 275.37 | 334.83 | 365.44 | 275.24 | 349.76 | 358.14 | 248.07 | 284.79 | 298.07 |
| Optimized | MAPE [%] | 30.108 | 33.722 | 33.517 | 30.128 | 33.730 | 33.544 | 29.780 | 30.709 | 32.682 |
ACKNOWLEDGMENT
This work was funded by the Slovenian Research Agency
under the Grant P2-0016. I would like to thank my mentors
Carolina Fortuna³, Gregor Cerar⁴ and Blaž Bertalanič⁵ for
their support with this work.
REFERENCES
[1] M. Farrokhabadi, J. Browell, Y. Wang, S. Makonin, W. Su, and
H. Zareipour, “Day-ahead electricity demand forecasting competition:
Post-COVID paradigm,” IEEE Open Access Journal of Power and Energy,
vol. 9, pp. 185–191, 2022.
[2] C. Deb, F. Zhang, J. Yang, S. E. Lee, and K. W. Shah, “A review
on time series forecasting techniques for building energy consumption,”
Renewable and Sustainable Energy Reviews, vol. 74, pp. 902–924, 2017.
[3] J. Deng and P. Jirutitijaroen, “Short-term load forecasting using time
series analysis: A case study for Singapore,” in 2010 IEEE Conference
on Cybernetics and Intelligent Systems, 2010, pp. 231–236.
³ https://e6.ijs.si/people/carolina-fortuna
⁴ https://e6.ijs.si/people/gregor-cerar
⁵ https://e6.ijs.si/people/blaz-bertalanic
[4] M. Spichakova, J. Belikov, K. Nõu, and E. Petlenkov, “Feature
engineering for short-term forecast of energy consumption,” in 2019 IEEE
PES Innovative Smart Grid Technologies Europe (ISGT-Europe). IEEE, 2019,
pp. 1–5.
[5] P. Mirowski, S. Chen, T. K. Ho, and C.-N. Yu, “Demand forecasting in
smart grids,” Bell Labs technical journal, vol. 18, no. 4, pp. 135–158,
2014.
[6] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang,
“Short-term residential load forecasting based on LSTM recurrent neural
network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841–851, 2017.
[7] J. Kelly and W. Knottenbelt, “The UK-DALE dataset, domestic
appliance-level electricity demand and whole-house demand from five
UK homes,” Scientific Data, vol. 2, 2015. [Online]. Available:
https://www.nature.com/articles/sdata20157
[8] S. Makonin, “HUE: The Hourly Usage of Energy Dataset
for Buildings in British Columbia,” 2018. [Online]. Available:
https://doi.org/10.7910/DVN/N3HGRN