COMPARISON OF SPATIAL INTERPOLATION METHODS AND MULTI-LAYER NEURAL NETWORKS FOR DIFFERENT POINT DISTRIBUTIONS ON A DIGITAL
ELEVATION MODEL
PRIMERJAVA METOD PROSTORSKE INTERPOLACIJE IN VEČSLOJNIH NEVRONSKIH MREŽ ZA RAZLIČNE GEOMETRIJSKE RAZPOREDITVE TOČK NA DIGITALNEM
MODELU VIŠIN
Kutalmis Gumus, Alper Sen
UDK: 004.032.26:528.4:711 ABSTRACT
Interpolation of a spatially continuous variable from point samples is an important field in spatial analysis and surface models for geosciences. In this study, spatial interpolation methods which are Inverse Distance Weighted (IDW), Ordinary Kriging (OK), Modified Shepard's (MS), Multiquadric Radial Basis Function (MRBF) and Triangulation with Linear (TWL), and Multi-Layer Perceptron (MLP) which is an Artificial Neural Networks (ANN) method were compared in order to predict height for different point distributions such as curvature, grid, random and uniform on a Digital Elevation Model which is an USGS National Elevation Dataset (NED). This study also aims to quantify the effects of topographic variability and sampling density. Errors of different interpolations and ANN prediction were evaluated for different point distributions and three different cross-sections on the characteristic parts of the surface were selected and analyzed. Generally, OK, MS, MRBF and TWL gave promising results and were more effective in terms of characteristics of surface than MLP and IDW. Although MLP simplified the contours obtained from predicted heights, it was a satisfactory predictor for curvature, grid, random and uniform distributions.
Klasifikacija prispevka po COBISS-u: 1.01
IZVLEČEK
Interpolacija prostorsko zvezne spremenljivke iz točkovnih primerov je v geoznanosti pomembno področje prostorske analize in modelov površja. V opisani študiji je bila izvedena primerjava interpolacijskih metod v trirazsežnem prostoru, in sicer so to metoda z inverzno uteženo razdaljo (IDW), navadni kriging (OK), modificirana Shepardova metoda (MS), multikvadrična radialna funkcija (MRBF) in triangulacija z linearno interpolacijo (TWL) ter večslojni perceptron (MLP), ki je predstavnik umetnih nevronskih mrež (ANN). Cilj je bil napovedati višino za različne geometrijske razporeditve točk, kot so ukrivljenost, mreža, naključna in enotna porazdelitev na digitalnem modelu višin, ki je podatkovni niz digitalnega modela višin ameriške geološke službe USGS. Namen študije je količinsko opredeliti učinek topografske variabilnosti in gostote vzorčenja. Napake različnih interpolacij in napovedi z umetnimi nevronskimi mrežami so bile ovrednotene glede na različne geometrijske porazdelitve točk, izbrani in analizirani so bili tri različni prerezi značilnih delov površja. Na splošno se je izkazalo, da metode navadni kriging (OK), modificirana Shepardova metoda (MS), multikvadrična radialna funkcija (MRBF) in triangulacija z linearno interpolacijo (TWL) dajejo boljše rezultate ter so bolj učinkovite glede značilnosti površja kot večslojni perceptron (MLP) in metoda z uteženo inverzno razdaljo (IDW). Čeprav je večslojni perceptron (MLP) poenostavil obrise, pridobljene iz napovedanih višin, se je izkazal kot zadovoljiv pri napovedovanju ukrivljenosti ter določitvi celične mreže za naključne in znane geometrijske porazdelitve točk.
KEY WORDS
KLJUČNE BESEDE
spatial interpolation, neural networks, point distribution
prostorska, interpolacija, nevronske mreže, geometrijska razporeditev točk
vi
s
j5 SÜ
o
■
to
1 INTRODUCTION
Spatial interpolation enables the representation of a surface and predicts values of other unknown areas in an attempt to create a continuous surface (Lam, 1983). Determination of the accuracy of interpolation methods is necessary in geosciences. With developments in computer science and technology, many different spatial interpolation algorithms have been adopted in the software of surface modeling and GIS.
The accuracy of interpolation methods has been discussed in several studies. There have been several comparisons of different spatial interpolators and Artificial Neural Networks (ANN) estimation performed for different fields such as rainfall, air temperature, air pollution, electromagnetic power and so on. Myers (1994) reviewed a variety of methods for spatial interpolation. These techniques range from inverse distance weighted to spline, kriging and radial basis functions. In the study by Snell et. al (2000), the ANN method for the spatial interpolation of daily maximum surface air temperature was presented to generate temperature estimations at different locations. Erxleben et al. (2002) used different interpolation methods to estimate snow depth in datasets with different physiographic and vegetative characteristics. | Chaplot et al. (2006) evaluated the effects of landform types, the density of original data and interpolation techniques for the accuracy of Digital Elevation Model (DEM) generation. Attorre § et al. (2007) analyzed climatic variables and bioclimatic indexes by interpolation methods. Sen et
i
8
i	al. (2008) performed interpolation of indoor electromagnetic field measurements and proposed
is
|	three dimensional prediction by multi-layer perceptron compared with ordinary kriging which
i
|	yielded more accurate results. De et. al. (2009) and Shrivastava (2012) studied the possibility of
I	predicting rainfall through ANN models. Azpurua and Ramos (2010) reviewed the methods of
|	interpolation with the objective of average electromagnetic field magnitude prediction. Guo et
^	al. (2010) quantified the effects of topographic variability and density on DEM accuracy derived
|	from several interpolation methods at different spatial resolutions. Zhang (2010) described
|	the spatial interpolation of meteorological observations using a feed-forward back-propagation
I
I	neural network based on factors affecting the environment, and the goodness of the model was
¡5
|	very high and efficient. Deligiorgi and Philippopoulos (2011) presented the statistical spatial interpolation methods which were commonly employed in the field of air pollution modeling.
§	The comparison of methods for interpolation in these papers is given in Table 1. The stars
|	indicate which method is superior, if the prediction error (i.e. Mean Error, Root Mean Square
§	Error (RMSE) and Mean Absolute Error) is the consideration.
8
S?	is	DEM is a numerical representation of topography, usually made up of equal-sized grid cells, each
co	|	with a value of elevation (Chaplot et al., 2006). DEM can be produced by using photogrammetry
io	§	(aerial and satellite images), SAR interferometry, radargrammetry, airborne laser scanning
'S	j	(LIDAR), cartographic digitization and surveying techniques. LIDAR represents a very accurate
^	I	(e.g. The RMSE of 14 cm for USGS lidar elevation data) and rapid technique for data acquisition,
1	but it is very expensive. Therefore, it is necessary to improve the data quality of DEMs derived
¿s	§
■	Sg	by means of cheaper methods. There are several methods for the assessment of the quality
to	§	of DEMs in terms of Digital Terrain Models and GIS applications. The accuracy of derived
DEMs depends on various factors such as topographic variability (i.e. slope, curvature, aspect, gradient, skeleton, drainage network, catchment boundaries, etc.), sampling density, interpolation methods and spatial resolution (Hu et al., 2009; Guo et al., 2010). Besides, an initial source of errors can be attributed to the data collection. The quality of the estimation of height for each data point depends on the technology applied. Although there have been many studies on the accuracy of interpolation techniques for the generation of DEMs in relation to landform types and data quantity or density, there is still a need to evaluate the performance of these techniques on natural landscapes (Chaplot et al., 2006). What is the accuracy of interpolation methods and ANN in predicting the height of different point distributions on a DEM? It is an important question in the field of modeling surface.
Papers	IDW	NN	S	RBF	K	TIN	C	ANN
Zimmerman et al. (1999)	used	-	-	-	used*	-	-	-
Snell et. al (2000)	used	used	-	-	-	-	-	used*
Erxleben et. al (2002)	used	-	-	-	used	-	used*	-
Chaplot et al. (2006)	used	-	used	used	used	-	-	-
Attore et al. (2007)	used	-	-	-	used*	-	-	used
Sen et al. (2008)	-	-	-	-	used*	-	-	used
Azpurua and Ramos (2010)	used*	-	used	-	used	-	-	-
Guo et. al (2010)	used	used	used	-	used*	used	-	-
Deligiorgi and Philippopoulos (2011)	used	used	used	used	used	used	-	used*
(IDW: Inverse Distance Weighted; NN: Nearest Neighbor; S: Spline; RBF: Radial Basis Function; KKriging; TIN: Triangulated Irregular Network; C: Combined method using binary regression trees)
Table 1. Superiority of methods in the papers is shown by stars.
The purpose of this study is evaluating the performance of spatial interpolation methods which are Inverse Distance Weighted (IDW), Ordinary Kriging (OK), Modified Shepard's (MS), Multiquadric Radial Basis Function (MRBF) and Triangulation with Linear (TWL), and prediction of the Multi-Layer Perceptron (MLP) which is an ANN method consisting of neurons which are mathematical analogs of human cognition in terms of height accuracy on a DEM. This study also aims to quantify the effects of topographic variability and sampling density by comparing different point distributions such as curvature, grid, random and uniform on the DEM. RMSE values of the different interpolations and ANN prediction were evaluated from the original DEM for different point distributions. In addition to this, three different cross-sections on the characteristic parts of the surface were selected and analyzed.
MLP or feed-forward back-propagation neural networks learn with supervision (with known expected outputs). Feed forward neural networks propagate information from the input to the output without recurrence and the error flow is propagated backwards from the output to the input modifying the ANN parameters according to captured dependencies from data (Ratle et al., 2008).
In this study, prediction errors of the methods were statistically compared by several tables and graphics. Generally, OK, MS, MRBF and TWL gave promising results based on RMSE and were
jg S*
o
■
to

more effective in terms of reflecting the characteristics of surface than MLP and IDW. In MLP, the contours were simplified compared to the other methods. However, it was a satisfactory predictor for curvature, grid, random and uniform distributions. Besides, analysis of different cross-sections on the characteristic parts of the surface was performed in order to determine the effect of the methods.
2 METHODS
2.1 MULTI-LAYER PERCEPTRON (MLP)
An artificial neuron is capable of propagating the data and modifying itself accordingly whilst training.
n = Z wx+ b	(1)
i
where w. are the weights of the connections coming to the neuron, xt is the input and b is the
^	bias. These hyper parameters are determined through the training procedure by minimizing
§
|	the error between the target variable data (y) and the perceptron prediction output (yO using
|	conventional optimization algorithms (e.g. gradient descent, etc.). The perceptron weights w i
|	are updated through the learning procedure when the input data (x) are presented to the input
1	and the corresponding output (y)is compared with the expected output (y):
is
|	w t = w + a(y - y') (2)
| where a corresponds to the learning rate. MLP is an extension of the single layer perceptron based on the addition of a hidden layer of neurons between the input and the output neurons.
I MLP with a nonlinear element (sigmoid or of another type) in each hidden neuron is a universal
^
predictor capable of estimating every continuous function with just one hidden layer. MLP
s
| estimation with a single hidden layer with m neurons is calculated as a weighted sum:
I	m
fm (X W, V) = Z WAXVi ) + Wo	(3)
i
I
I where w t are the weights corresponding to each neuron connection and wg is an additive bias | corresponding to the entire hidden layer (Figure 1); s is a sigmoid activation function which
| represents a nonlinear element in MLP: v. is a activation function steepness parameter (Ratle § '
§ et al., 2008). The activation functions are log-sigmoid and tan-sigmoid are respectively given in


^ £ s
equation 4. In this study, tan-sigmoid activation function was used.
^ § s =1/(1+ exp((-«)) s = (1-exp((-2n))/(1 + exp((-2n))	(4)
Error minimization between the expected output and the MLP output can be performed using g I a wide selection of known optimization algorithms. Gradient optimization algorithms are based ^ ^ on calculating the gradient of the minimized function and are very good in finding local minima. ■I I Gradient optimization algorithms vary in performance efficiency and speed. Multiple results of 1 the gradient optimization can be considered in order to reach the global minimum (or the lower
local minimum) (Ratle et al., 2008). In this study, six different training functions, namely trainlm (Levenberg-Marquardt) (More, 1978), traincgf (Fletcher-Powell Conjugate Gradient) (Scales, 1985), traingd (Gradient Descent), traingdx (Gradient descent with momentum and adaptive learning rule backpropagation) (Beale, 1972), trainrp (Resilient Backpropagation) (Riedmiller and Braun, 1993) and trainscg (Scaled Conjugate Gradient) (Moller, 1993) were used.
Figure 1. MLP structure (3-6-1) (Ratle et al., 2008).
An advantage of this method is that the guiding variables are not necessarily assumed to be linearly related with the data being interpolated, and combinative effects are taken into account (Rigol et al., 2001). The disadvantage is that this method is non-deterministic. It is difficult to adapt and set the learning parameters for different models.
2.2 ORDINARY KRIGING (OK)
Kriging is a geostatistical method that takes into account both the distance and the degree of variation between known data points (Chaplot et al., 2006). To fulfill the unbiasedness, an additional constraint is imposed over the weights:
N(x„)
Z w (Xo) = 1	(5)
i=1
Minimization of the estimation error provides the system of equations called an ordinary kriging system. In a more general way it can be written in terms of the variogram:
NX )	N(xo )
Z WJ (Xo )rff +M = r,o Z Wj (x0) = 1 yi = 1. . . .N(Xo)	(6)
j=1	j=i
where ^ is a Lagrangian multiplier, introduced because of the variance minimization with a constraint and y is a semi-variogram. Note that yj can be calculated for each data pair Z(x) and Z(x) (i,j=l..., N(x) the number of data in the neighborhood of (x0)) whereas ym is approximated by the fitted theoretical variogram model 1(h) with the separating vector h=x-x0 as an argument. An ordinary kriging variance (Ratle et al., 2008):
N(Xo )
<?ok (xo) =	wi(xo )ri0	(7)
i=1
LJU
0 Q*
IC/5
z p
C/5 0
z
<
z
N N
The advantage of this method is the statistical formulation of the best linear unbiased estimate. The disadvantage is that the weights must be computed for each node of the grid, that is why this method is used for small samples, and the method produces undesirable "pits" and "circular" contours (Dressier, 2009).
2.3 INVERSE DISTANCE WEIGHTED (IDW)
The simplest form of IDW interpolation is called the Shepard's method (Shepard, 1968).
n
F (x y) = Z wifi (8) i= 1
where n is the number of distributed points in the data set, f are the function values at the distributed points and w. are the weight functions assigned to each point. The influence of a known data point is inversely related to the distance from the unknown location that is being estimated. Shepard (1968) used weight function:
| Wi = d-p /(9)
I	i=°
| wherep is an arbitrary positive real number called the power parameter (typically p = 2), d. are
as	1
| the distances from the dispersion points to the interpolation point:
dno)
| where (x, y) are coordinates of the interpolation point and (x, y ) are the coordinates of each | dispersion point. The weight functions are normalized as a sum of the weights of the unit I (Azpurua and Ramos, 2010).
I
| Although IDW is simply implemented, high computer time is needed if the number of points is | large due to the computation of distances, and has the tendency to generate "bull's-eye" patterns of concentric contours (Dressler, 2009).
§
I 2.4 MODIFIED SHEPARD'S METHOD (MS)
I
I
The Modified Shepard's Method uses an inverse distance weighted least squares method. The | Modified Shepard's Method can be either an exact or a smoothing interpolator. Franke and § Nielson (1980) developed a modification that eliminates the insufficiency of the Shepard's method, and does not tend to generate "bull's eye" patterns.

S
As detailed in Basso et al. (1999), the first step in this method is the definition of the radii of influence, R and R . While R denotes the radius of influence of the data points in the nodal
q	w	q	r
^ I functions, R denotes the radius of influence of the nodal functions in the interpolant.
^ § ^ s
ss
Rq= (D/2)gNq I N Rw = (D 12) JNw I N	(11)
where D = max d (x,y), N is the total number of data points. The values of Nq and Nw can ■ I be interpreted as representing the number of data points estimated to lie within the circles
o
■
of radii R and R respectively. After calculating the radii R and R , the nodal functions are
q	w r	J	°	q	w'
computed using the least squares method. These functions are calculated based only on data points sampled inside radius Rq. This slight modification gives a local behavior to the method. The distance function p:
1/ p (xy) = (R - d)/Rq d
(12)
(Rq - d) denotes the positive part of the quantity. The next step is to solve (for each nodal point (x„yk), k=1, ... , N) the least squares problem. The Modified Shepard formula:
N	N
D[f ]( X, y) = (2 Qk ( x, y) / Vk2 ( x, y)) /(Z1 / Vk2 ( x, y))
(13)
k=\ k=1 where Qk(x,y) is the nodal functions.
Each grid point is calculated based on the nodal functions computed for the sampled points inside a circle of radius R . This is defined by the Equation (13).
1/Vk(x, y) = (R - dk)/Rw dk
(14)
2.5 TRIANGULATION WITH LINEAR INTERPOLATION METHOD (TWL)
In the Triangulated Irregular Network (TIN) model, a finite set of points is stored together with their elevation. The points need not lie in any particular pattern, and the density may vary. On these points a planar triangulation is given based on a Delaunay's triangulation. Any point in the domain will lie on a vertex, an edge or in a triangle of the triangulation. If the point does not lie on a vertex, then its elevation is obtained by linear interpolation (Kreveld, 2000). Suppose T is a DEM point in triangle abc. Under the assumption that the triangle vertices are error-free, TIN interpolation can be written as (Hu et al., 2009)
H = w H + w.H. + wH ,w + w. + w = 1, w , w., w > 0
T a a b b c c a b c	abc
(15)
where H is the height, wa, wb, wc are the areal proportions of the sub-triangles constructed using T. If s is the total area of triangle abc, sa, sb, sc are the areas of the sub-triangles, then
sb/s, wc = Sc/s	(16)
w = s /s, w,
a a	b
The TWL is very fast, but it is limited to the convex envelope of the points, resulting surface is not smooth and the division into triangles may be ambiguous, and requires a medium-large number of data point to generate acceptable results (Dressler, 2009).
2.6 MULTIQUADRIC RADIAL BASIS FUNCTION (MRBF)
Hardy (1968) introduced the multiquadric method for the construction of approximate two-dimensional surfaces of field data. The method of radial basis functions uses the interpolation function in the given form:
f ( x, y) = p( x, y) + £ w x, y ) - ( X t , Y )|)
(17)
i=1
Ci
s jg
Co *
o
■
to
wherep(x, y) is a polynomial, wteR are real weights, is the Euclidean distance between points (x, y) and (X, Y), is a radial basis function. A commonly used multiquadric radial basis function is given in Table 2. (r, is the anisotropically rescaled, relative distance from the point to the node; c, is the smoothing parameter):
Radial Basis Function (fi(r) )	Equation
Multiquadric	Vr2+c2
Inverse Multiquadric	l/Vr2+c2
Multilog	log(r2 +c2)
Natural Cubic Spline	(r 2 + c 2 )3/2
Thin Plate Spline	(r 2 + c 2)log(r2 +c2)
Table 2. Types of radial basis function.
The interpolation process starts with polynomial regression using the polynomial p(x, y). Then the following system of n linear equations is solved for unknown weights w,, (i=l...,n) determined by imposing the interpolation conditions, which lead to a symmetric system of linear equations. The Lagrangian form is an easy evaluation of linear equations.
Zj -P(X.Y) = ±w4x,,Y)-(x,Y^
1=i
j = K-,n
(18)
As soon as the weights (wi) are determined, the z-value of the surface can be directly computed from Equation (17) at any point.
The advantages of this method are that the system of linear equations has to be solved only once in contrast to the Kriging method, where to be solved for each grid node, and uses a range of kernel functions. The disadvantages are that if the number of points n is large, the number of linear equations is also large, moreover, the matrix of the system is not sparse, which leads to a long computational time and possibly to the propagation of rounding errors (Dressler, 2009).
3 STUDY AREA
r-C-
s
o
USGS National Elevation Dataset (NED) 1/3 arc-second (approximately 10 meters grid spacing) DEM was used in this study. The dataset has been tested by comparing it with a reference data which are the geodetic control points that the National Geodetic Survey uses for gravity and geoid modeling. The NED value at each of the control point locations was derived through bilinear interpolation, and error statistics were calculated. Bilinear interpolation is the calculation of the new pixel value performed by the weight of the four surrounding pixels (Hu et al., 2009). The production method is an improved contour-to-grid interpolation known as "LineTrace+" (LT4X) (Osborn et al., 2001). The overall absolute vertical accuracy expressed as RMSE for LT4X production is 2.17 m (Gesch, 2007).
The study area is Lower Beaver in Colorado, USA. The difference between maximum and minimum elevations (the relief) is 82 m in the study area. The horizontal datum is the North
American Datum of 1983 and UTM zone is 12N. The lower left corner coordinates of DEM are (9: 38° 44' 16.8''; X: -112° 38' 45.6'') and the upper right corner coordinates are (9: 38° 45' 25.2''; X: -112° 37' 22.8''). The study area is 2-by-2 square kilometers. The 3D model of the study area is shown in Figure 2.
Figure 2. 3D model of the study area.
4 DATA PREPROCESSING
Depressions of a DEM may be natural reflections of the terrain, or the artificial depressions as a result of interpolation methods, input data errors, or restrictions in DEM resolution. Sinks are often errors due to the resolution of the data or rounding of elevations to the nearest integer value. In this study, hydrology tool of ArcGIS 10 software was utilized. The depressions were removed by rising to the lowest elevation value on the rim of the depression (Jenson and Domingue, 1988) for improving DEM. Filling sinks modifies the elevation value to eliminate the problems about cells surrounded with higher elevation and resumes the water flow. Sinks were filled based on appropriate the z-limit (in this study, 16 cm). The z-limit specifies the maximum depth of a sink that will be filled. A procedure using geoprocessing tools to find z-limit for the "Fill" as follows (ArcGIS 10 Help).
•	Use "Sink" to create a raster of sinks coded with depth.
•	Use "Watershed" to create a raster of the contributing area for each sink.
•	Use "Zonal Statistics" with the minimum statistic to create a raster of the minimum elevation in the watershed of each sink.
•	Use "Zonal Fill" to create a raster of the maximum elevation in the watershed of each sink.
•	Use "Minus" to subtract the minimum value from the maximum value to find the depth.
For the neural network implementation, all of the input data were normalized based on minimummaximum normalization to avoid the numerical conflicting. Minimum-maximum normalization (Han et al., 2012):
jg
co *
o
■
to
X — Xi xmin
(19)
where x . and x are the minimum and maximum values of X coordinate in the dataset,
mm	max
respectively. X is normalized based on the range of the data (likewise, Y coordinate).
s
??
^ §
■ 1 SK <3
1 n

5 POINT DISTRIBUTION
In predicting height, different point distributions were compared such as curvature, grid, random and uniform on the DEM. Grid distribution decreases the number of points in an irregular point space, not considering curvature and original density of pattern. However, curvature distribution decreases the number of points on surfaces of which height is closely similar but maintains details in high curved areas. Uniform distribution decreases the number of points uniformly on surfaces of which height is closely similar, but reduces the number of points on curved surfaces to a specified density. Random distribution removes a percentage of points randomly from an irregular point space.
Figure 3. Study area with contours and four different point distributions in the red rectangular.
In this study, totally 47,089 points obtained from DEM were used. As a result of using grid, random, uniform and curvature distributions, the number of points was reduced by 50%, e.g.
x — x
max min
more points were selected on the characteristic parts of surface, while fewer points were used on the flat parts in the curvature distribution. Different point distributions in the interpolation methods and ANN prediction were used for comparing the effect of different sampling densities (in this case, 100% and 50%) and the topographic variability. Besides, the characteristic lines (ridges and valleys) could contribute to the point distribution. In Figure 3, four different point distributions (only shown in the red rectangular as an example) are represented with contours of the study area obtained from the DEM. It is realized that the curvature distribution maintains more points among dense contours (highly curved) than flat areas.
6 EXPERIMENTATION
In this study, MLP was applied by Matlab R2009b neural network toolbox and interpolation methods were performed by Surfer 9.0. When training multilayer networks, the general practice is to first divide the data into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The error in the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error in the validation set typically begins to rise. The test set error is not used during training but it is used to compare different models. Test error is not minimized and thus not propagated backwards from the MLP outputs to the input. The size of the test can range between 1-25% of the total amount of data available (Ratle et al., 2008). In this study, ratios for training, testing and validation were 0.7, 0.15 and 0.15, respectively.
Before training MLP, the weights and biases are initialized. The process of training a neural network involves tuning the values of the weights and biases of the network to optimize network performance, as defined by the network performance function which is the Root Mean Square Error (RMSE) as defined in Equation (18).
1 N	2	S
RMSE=hi?i(Z'"Zi)2 (20) |
8


where Z ' is the network output height, Z is the target output height.	5
' ' |
In neural network toolbox, the magnitude of the gradient and the number of validation checks | are used to terminate the training. The gradient will become very small as the training reaches | a minimum of the performance. The number of validation checks represents the number of successive iterations that the validation performance fails to decrease.
1 ^
In this study, the X-Y coordinates were the input, while the Z coordinates were the output.
I -v
parameters of the multi-layer neural network were experimentally determined by approximately | 15 70 variations. Six different training functions, namely trainlm, traincgf, traingd, traingdx, trainrp, I |
and trainscg, and different hidden layer and neuron numbers and epochs were examined. The 1
§	'S
best variations are given in Table 3. The best performance was performed by the Levenberg- |	■
Marquardt (trainlm) function based on RMSE within the range of 12 cm and 16 cm that changed I	I
in the distribution of points. The fastest training was performed by the trainrp function (Intel Core 2 Duo CPU 2.40 GHz). The best performance of MLP is the trainlm that used 3 hidden layers with 30 neurons (2-30-30-30-1). There are also some experiments failed because of the local minima. The trained network can be used for predicting height (output) while position is input. The parameters used in the interpolation methods are given in Table 4.
£
O
Distribution	%	Train	Layer	Neuron	Epoch	Time	Grad.	RMSE (m)
All	100	trainlm	3	30	480	5 1 m5 5 s	0.00006	0.156
All	100	trainlm	1	10	787	06m14s	0.00076	1.732
All	100	trainscg	5	70	1731	3h7m5	0.00018	0.335
All	100	trainrp	3	30	539	13m34s	0.00041	0.561
All	100	traincgf	3	30	774	21m15s	0.00032	0.649
Curvature	50%	trainlm	3	30	429	40m55s	0.00001	0.136
Curvature	50%	trainlm	1	10	989	03m41s	0.00267	1.795
Curvature	50%	trainscg	5	80	948	43m56s	0.00115	0.424
Curvature	50%	trainrp	3	30	1243	09m50s	0.00009	0.511
Grid	50%	trainlm	3	30	663	2h21m	0.00006	0.127
Grid	50%	trainlm	1	10	134	43s	0.00001	2.509
Grid	50%	trainscg	5	80	522	27m26s	0.00067	0.516
Grid	50%	trainrp	3	30	2000	20m51s	0.00006	0.497
Random	50%	trainlm	3	30	795	01h30	0.00002	0.123
Random	50%	trainlm	1	10	275	01m01s	0.00010	2.676
Random	50%	trainrp	5	80	411	12m04s	0.00013	0.339
Random	50%	trainscg	5	80	678	31m34s	0.00038	0.394
Uniform	50%	trainlm	3	30	747	2h31m	0.00000	0.132
Uniform	50%	trainlm	1	10	376	2m 10s	0.00016	1.769
Uniform	50%	trainrp	3	30	1493	15m54s	0.00007	0.476
Uniform	50%	trainscg	5	80	539	28m15s	0.00041	0.541
Table 3. Variations for the architecture of MLP (errors in meters).	
Methods	Used Parameters
OK	Variogram Model = Linear, Slope = 1, Aniso = 1, Kriging Type = Point, Searchs Elips (Radius 1 = 70, Radius 2 = 70, Angle = 0)
IDW	Power (p)= 2, Smoothing = 0, Anistropy* (Ratio =1, Angle = 0) Searchs Elips (Radius 1 = 70, Radius 2 = 70, Angle = 0)
MS	Smoothing factor: 0, Quadratic Neighbors: 13, Weighting Neighbors: 19, Search (R: 70, R,: 70, Angle: 0)
MRBF	Basic Function: Multiquadric, R2 Parameter: 6.8, Anistropy* (Ratio =1, Angle = 0), Searchs Elips (Radius 1 = 70, Radius 2 = 70, Angle = 0)
TWL	Anistropy* (Ratio =1, Angle = 0)
*It is used for finding ranges in three principal, orthogonal directions and transforming a three dimensional lag vector.
Table 4. The parameters used in different interpolation methods.
The graphic of mean square error per each epoch of the training, validation and test set for grid distributed points trained by trainlm function with 3 hidden layers and 30 neurons are compared in Figure 4. The error graphic shows the distribution of the network errors.
Figure 4. The comparison of training, validation and test error.
7 RESULTS
Height interpolation was performed by five different interpolation methods and an ANN method on the basis of point distributions. The methods were statistically compared by several tables and graphics. The accuracy of the methods was calculated by RMSE comparing to the original DEM. The statistics (minimum, maximum, mean, median, standard error, variance and average deviation) of errors are given in Table 5 and the comparison of methods via point distributions based on RMSE is indicated in Figure 5. According to the RMSE values in Table 5 and Figure 5, OK, MS, MRBF and TWL gave the best results. MRBF showed weaker performance considering uniform distribution. The performance of IDW for curvature distribution was prominently worse than OK, MS, MRBF and TWL. However, its performance increased when it was performed for uniform distribution, such that it was the best. Since using all data (100%) for the training of MLP decreased the effect of specific data (representative) representing the characteristics of the dataset, its prediction was weaker than the other methods. MLP performed better for the distributions which decrease the number of points. However, OK, MS, MRBF and TWL gave the best performance in the case of using all data. The best interpolation performances are MS in the case of using all points, grid and random distributions; MRBF for curvature distribution, and IDW for uniform distribution.
Ci
s
s
o su to
50%	Ordinary Kriging				Modified Shepard's				Inverse Distance Weighted			
Dist.	Cur.	Grid	Rand.	Uni.	Cur.	Grid	Rand.	Uni.	Cur.	Grid	Rand.	Uni.
Min.	-1.268	-1.417	-1.615	-1.671	-1.458	-1.660	-1.172	-1.937	-2.104	-2.059	-1.472	-1.617
Max.	2.139	1.828	1.931	2.225	2.758	1.574	2.263	1.934	2.522	1.894	3.802	1.581
Mean	0.001	0.001	0.001	0.001	0.000	0.000	0.000	0.000	0.002	0.000	0.001	0.000
Median	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Standard error	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.001	0.000	0.000	0.000
Variance	0.005	0.008	0.006	0.007	0.007	0.005	0.005	0.008	0.020	0.007	0.009	0.005
Average deviation	0.034	0.037	0.031	0.038	0.036	0.030	0.032	0.038	0.062	0.035	0.038	0.028
RMSE	0.074	0.087	0.075	0.086	0.084	0.073	0.072	0.092	0.142	0.082	0.097	0.069
50%	Multiq. Radial Basis Function				Triangulation with Linear				Multi-Layer Perceptron			
Dist.	Cur.	Grid	Rand.	Uni.	Cur.	Grid	Rand.	Uni.	Cur.	Grid	Rand.	Uni.
Min.	-1.171	-1.458	-1.933	-1.504	-1.671	-2.012	-1.309	-1.952	-1.970	-1.851	-2.736	-2.384
Max.	2.261	2.758	1.973	2.428	2.225	2.015	1.872	1.887	2.260	1.275	1.490	1.692
Mean	0.000	0.000	0.000	0.002	0.001	0.000	0.001	0.001	0.000	-0.002	0.000	-0.000
Median	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-0.000	-0.001	0.000	0.000
Standard error	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.001	0.001	0.001	0.001
Variance	0.005	0.007	0.006	0.010	0.007	0.007	0.007	0.008	0.018	0.016	0.015	0.017
Average deviation	0.032	0.036	0.033	0.041	0.038	0.036	0.034	0.036	0.090	0.085	0.082	0.086
RMSE	0.071	0.084	0.078	0.102	0.086	0.085	0.082	0.087	0.136	0.127	0.123	0.132
100%	Ordinary Kriging				Modified Shepard's				Inverse Distance Weighted			
Min.	-0.737				-1.838				-1.495			
Max.	1.253				1.554				2.797			
Mean	0.000				0.000				0.001			
Median	0.000				0.000				0.000			
Standard error	0.000				0.000				0.000			
Variance	0.001				0.001				0.006			
Average deviation	0.011				0.008				0.022			
RMSE	0.038				0.034				0.080			
100%	Multiq. Radial Basis Function				Triangulation with Linear				Multi-Layer Perceptron			
Min.	-0.756				-0.917				-2.915			
Max.	1.226				1.649				4.147			
Mean	0.000				0.001				-0.003			
Median	0.000				0.000				0.000			
Standard error	0.000				0.000				0.000			
Variance	0.001				0.002				0.024			
Average deviation	0.009				0.013				0.001			
RMSE	0.035				0.046				0.156			
Table 5. The statistical results of error (in meters) via point distributions for height interpolation.
0.1800 0.1600 0.1400 0.1200 0.1000 0.0800 0.0600 0.0400 0.0200 0.0000
liiirtli
OK MS IDW MRBF TWL MLP
1100% curvature grid random uniform
Figure 5. Graphic of performances (RMSE) for each method using different point distributions.
The contours were obtained for visualizing the interpolated surface by different methods. Using all data (100%) for MLP training yielded weak prediction results. Therefore, MLP simplified the contours. For example, the eliminated details of the contour lines, as an MLP result compared with other interpolation results in the case of using all data, are shown by red circle in Figure 6.
Figure 6. The contours obtained by interpolation methods and MLP for each point distribution and eliminated details of contours in red circles.
In addition to these results, three different cross-sections on the characteristic parts of the surface were selected and analyzed in order to determine the effect of methods using all data (100%). A-B, C-D and E-F cross- sections are shown on the 3D model and the contours in Figure 7. The A-B cross-section is beveled and E-F is more rugged than the C-D cross-section. The methods were examined based on RMSE for 588, 1500 and 522 points on the A-B, C-D and E-F cross-sections, respectively. The RMSE results of the interpolations are given in Table 6. A graphic of performances of each method via point distributions based on RMSE for each cross-section is given in Figure 8.
<~N
2
■¡2 O
3
Figure 7. A-B, C-D and E-F cross- sections on 3D model and contours.
r-C-
S
o to
Cross	Method	All	Cur.	Grid	Rand.	Uni.
Section		100%	50°%	50%	50%	50%
	OK	0.032	0.06	0.118	0.128	0.058
	MS	0.051	0.08	0.089	0.067	0.06
A-B	IDW	0.227	0.202	0.398	0.406	0.216
	MRBF	0.042	0.073	0.098	0.108	0.057
	TWL	0.034	0.044	0.126	0.104	0.094
	MLP	0.421	0.157	0.216	0.149	0.141
	OK	0.018	0.029	0.026	0.032	0.025
	MS	0.02	0.034	0.028	0.029	0.02
C-D	IDW	0.032	0.059	0.04	0.048	0.048
	MRBF	0.019	0.029	0.025	0.031	0.024
	TWL	0.019	0.027	0.025	0.029	0.022
	MLP	0.143	0.064	0.064	0.074	0.072
	OK	0.043	0.121	0.153	0.18	0.094
	MS	0.054	0.138	0.151	0.226	0.1
E-F	IDW	0.208	0.284	0.362	0.335	0.28
	MRBF	0.048	0.134	0.144	0.179	0.111
	TWL	0.049	0.122	0.171	0.157	0.116
	MLP	0.61	0.332	0.234	0.211	0.206
Table 6. The best method shown with gray highlight for each point distribution on cross- sections based on RMSE (in meters).
In the case of using all data, OK performed better on all cross sections. TWL performed better on the A-B and C-D cross-sections, while OK performed better on the E-F cross-section for curvature distribution. MS performed better on the A-B cross-section, while MRBF performed better on the C-D and E-F cross-sections, and TWL accompanied MRBF on the C-D cross-section for grid distribution. MS performed better on the A-B and C-D cross-sections, while TWL performed better on the C-D and E-F cross-sections for random distribution. Finally, MRBF was better on
the A-B cross-section; MS was better on the C-D cross-sections and OK was better on the E-F cross-sections for uniform distribution. As a result of this, OK and TWL were superior for three different cross-sections. IDW results were poor especially on the A-B cross-sections and MLP results decreased performance again in the case of using all data.
Figure 8. Graphics of performances for A-B, C-D and E-F cross-sections respectively. The cross-sections are presented for all data (100%) processing in Figures 9, 10 and 11. Some specific parts are viewed by close-ups. The results of OK, TWL, MS and MRBF were almost the same on the cross-sections. However, IDW sometimes moved away from the other curves on slightly changed slopes. Since using all data (100%) for the training of MLP decreased the effect of specific data (representative) that represents the characteristics of the dataset; its prediction was weaker than the other methods and it deleted surface details and provided a smooth curve. The TWL resulting surface was not smooth and the curve consisted of line segments.
is
o
■
to
Figure 9. The views of the specific parts of A-B cross- section obtained different interpolation methods (TWL, IDW, OK, MS, MRBF, respectively) and ANN (MLP).
r-C-
s
o su to
Figure 10. The views of the specific parts of C-D cross- section obtained different interpolation methods (TWL, IDW, OK, MS, MRBF, respectively) and ANN (MLP).
Figure 11. The views of the specific parts of E-F cross- section obtained different interpolation methods (TWL, IDW, OK, MS, MRBF, respectively) and ANN (MLP).
8 CONCLUSIONS
In this study, the authors determined the accuracy of interpolation methods and an artificial neural network in predicting the height of different point distributions on a Digital Elevation Model (DEM). This study also aims to quantify the effects of topographic variability and sampling density. Errors of different interpolations and ANN prediction were evaluated for different point distributions and three different cross-sections on the characteristic parts of the surface were selected and analyzed. Generally, OK, MS, MRBF and TWL gave promising results based on RMSE and were more effective in terms of reflecting the characteristics of the surface. Although MLP simplified the contours, it was a satisfactory predictor for curvature, grid, random and uniform distributions. An advantage of ANN is that the guiding variables are not necessarily assumed to be linearly related with the data being interpolated, and combinative effects are taken into account. However, ANN is a non-deterministic method and it requires so many experiments. The best interpolation performances are MS in the case of using all points, grid and random distributions; MRBF for curvature distribution, and IDW for uniform distribution. Since the RMSE is a statistical value, it could not give enough information about the characteristic parts of topography (e.g. ridges, valleys, peaks, pits etc.). Therefore, the cross- | sections on the characteristic parts were visualized and evaluated. However, it would be useful | to prove the tested interpolation methods by means of geovisual methods. In the analysis of §
S3
different cross-sections on the characteristic parts of the surface, OK and TWL were superior | for three different cross-sections. In the future, DEMs for topographically different terrains (e.g. | flat, hilly and mountainous) will be evaluated with several training functions of MLP.	|
ACKNOWLEDGEMENTS	I
I
I
The authors would like to thank the referees for their careful review and Assoc. Prof. Dr. Turkay | Gokgoz, Yildiz Technical University, Geomatic Engineering Dep., for the valuable comments. §
I
Literature and reference:	|
ArcGIS 10 Help: http://help.arcgis.eom/en/arcgisdesktop/10.0/help.	^
Attorre, F., Alfo, M., Sanctis, M. D., Francesconi, F., Bruno, F. (2007). Comparison of interpolation methods for	|
mapping climatic and bioclimatic variables at regional scale. International Journal of Climatology, 27 (13), 1825-1843, (ISSN)1097-0088doi:10.1002/joc.	|
Azpurua, M,. Ramos, K.D. (2010). A comparison of spatial interpolation methods for estimation of average	^
electromagnetic field magnitude. Progress In Electromagnetics Research M, Vol. 14, 135-145.	§
I
Basso, K., Zingano, P. R. A., Freitas, C. M. D. S. (1999). Interpolation of scattered data: investigating alternatives for	|
the modified shepard method. In: Proceedings Of Sibgrapi 99. - Xii Brazilian Symposium On Computer Graphics And g; Image Processing, 39-47.	|
Beale, E.M.L., (1972). A derivation of conjugate gradients. in F.A. Lootsma, Ed. Numerical methods for nonlinear	|
optimization. London:Academic Press.	| riS
Chaplot, V., Darboux, F., Bourennane, H., Leguedois, S., Silvera, N., Phachomphon, K. (2006). Accuracy of interpolation techniques for the derivation of digital elevation models in relation to landform types and data density. Geomorphology, 77(1-2), 126-141. doi:10.1016/j.geomorph.2005.12.010.
>1 at
De, S.S., Chattopadhyay, G., Debnath, A. (2009). Artificial Neural Network - a Tool for Prediction of Monsoon Rainfall | ® over North and South Assam in India. Bulg. J. Phys. 36, 55-67. § O



Deligiorgi, D., Philippopoulos K. (2011). Spatial Interpolation Methodologies in Urban Air Pollution Modeling: Application for the Greater Area of Metropolitan Athens, Greece. Advanced Air Pollution, Farhad Nejadkoorki (Ed.), ISBN: 978-953-307-511-2, InTech, DOI: 10.5772/17734.
Dressler, M. (2009). Art of Surface Interpolation, e-book, http://m.dressler.sweb.cz/AOSIM.pdf
Erxleben,J., Elder, K, Davis, R. (2002). Comparison of spatial interpolation methods for estimating snow distribution in the Colorado Rocky Mountains. Hydrol. Processes, 3649(August), 3627-3649. doi:10.1002/hyp.1239.
Franke, R., Nielson, G. (1980). Smooth Interpolation of Large Sets of Scattered Data.. International Journal for Numerical Methods in Engineering, v. 15, 1980, 1691-1704.
Gesch, D.B.(2007). The National Elevation Dataset. Digital Elevation Model Technologies and Applications: The DEM Users Manual, Maune, D. (ed.), 2nd Edition, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland, 99-118.
Guo, Q., Li, W., Yu, H., Alvarez, O. (2010). Effects of Topographic Variability and Lidar Sampling Density on Several DEM Interpolation Methods. Photogrammetric Engineering & Remote Sensing, Vol. 76, No. 6, 1-12.
Hardy, R.L. (1990). Theory and Applications of the Multiquadric-biharmonic Method. Computers Math. With Applic, v 19, no. 8/9, 163-208.
Hu, P., Liu, X., Hu, H. (2009). Accuracy Assessment of Digital Elevation Models based on Approximation Theory. Photogrammetric Engineering and Remote Sensing, Vol. 75, No.1, 49-56.
^ Jenson, S. K. and Domingue, J. O. (1988). Extracting topographic structure from digital elevation data, for geographic S information system analysis, Photogramm. Eng. Rem. S., 54, 1593-1600.
I
| Han, J., Kamber, M., Pei, J. (2012). Data mining Concepts and Techniques, Elsevier Inc. Third Edition..
ij Kreveld, M. (2000). Digital Elevation Models and TIN Algorithms. Chap. 3 of Algorithmic Foundations of Geographic | Information Systems, Kreveld, M., Nievergelt, J., Roos, T., Widmayer, P. (eds.), Springer, 40-41.
| Lam, N.S.N. (1983). Spatial Interpolation Methods: A Review. The American Cartographer, Vol. 10, No. 2, 129-149.
| Moller, M.F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, Vol. 6, | 525-533.
I
5 More, J.J. (1978), The Levenberg-Marquardt algorithm: Implementation and theory. Numerical Analysis, Lecture notes | in Math., No. 630, Springer, Berlin 1978.
I
^	Myers, D. E., (1994). Spatial interpolation: An overview. Geoderma, 62, 17-28.
%
Osborn, K., List, J., Gesch, D.B., Crowe, J., Merrill, G., Constance, E., Mauck, J., Lund, C., Caruso, V., Kosovich, J.
|	(2001). National Digital Elevation Program (NDEP). Chap. 4 of Digital elevation model technologies and applications—
|	the DEM users manual, Maune, D. (ed.), American Society for Photogrammetry and Remote Sensing, Bethesda,
|	Maryland, 83-120. (Also available online at http://topotools.cr.usgs.gov/pdfs/osb_eros-pub-bkc_asprs.2001.pdf)
I
Ratle, F., Pozdnoukhov, A., Demyanov, V., Timonin, V., Savelieva E. (2008). Spatial Data Analysis and Mapping Using j Machine Learning Algorithms. Advanced Mapping of Enviromental Data Geostatistics, Machine Learning and Bayesian
I Maximum Entropy, Mikhail Kanevski (ed.), ISTE,Wiley.
§
Riedmiller, M., Braun, H. (1993). A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP | Algorithm.Proceedings ofthe lEEEInternational Conference on Neural Networks (ICNN),San Francisco.586 - 591.
I Rigol, J.P., Jarvis C.H., Stuart, N. (2001). Artificial neural networks as a tool for spatial interpolation. Int. Journal of
äs
Geographical Information Science, Volume 15, Issue 4, 323-343.
^ §	Scales, L.E. (1985). Introduction to Non-Linear Optimization, New York, Springer-Verlag.
t-n I
|	Sen,A.,Gumusay, M.U.,KavasA.,Bulucu U. (2008).ProgramminganArtificialNeuralNetwork ToolforSpatial
|	InterpolationinGIS-ACaseStudyforlndoorRadioWavePropagationofWLAN. Sensors(8),5996-6014.
jS I	doi:10.3390/s8095996.
is V
cu ¡s	Shepard, D. (1968). A Two Dimensional Interpolation Function for Irregularly Spaced Data. Proc. 23rd Nat. Conf.
^ 1	ACM, 517-523.
S t
|	Shrivastava, G., Karmakar, S., Kowar, M.K., Guhathakurta, P. (2012). Application of Artificial Neural Networks in
§ |	Weather Forecasting:A Comprehensive Literature Review. International Journal of Computer Applications, Vol. 51, No.
I §	18, Foundation of Computer Science, New York, NY, USA, 17-29.
Snell, S.E., Gopal, S., Kaufmann R.K. (2000). Spatial Interpolation of Surface Air Temperatures Using Artificial Neural Networks: Evaluating Their Use for Downscaling GCMs. Journal Of Climate, 13(5), 886-895.
Zhang. Y. (2010). Spatial Interpolation of meteorology monitoring data, for western China, using back-propagation artificial neural networks. Communications and Networking in China (CHINACOM), 5th International ICST Conference, Beijing/ China..
Zimmerman, D., Pavlik, C., Ruggles, A., Armstrong, M. (1999). An experimental comparison of ordinary and universal kriging and inverse distance weighting. Mathematical Geology, 31, 375-390. doi:10.1023/A:1007586507433.
Received for publication: 28 January 2012 Received by: 23 April2013
Kutalmis Gumus, Research Assistant
Yildiz Technical University
Civil Engineering Faculty, Geomatic Engineering Depatment, Surveying Division
Davutpasa Street, Esenler, Istanbul - Turkey E-mail: kgumus@yildiz.edu.tr
Alper Sen, Research Assistant
Yildiz Technical University
Civil Engineering Faculty, Geomatic Engineering Department, Cartography Division
Davutpasa Street, Esenler, Istanbul - Turkey E-mail: alpersen@yildiz.edu.tr
S jg
co *
o
■
to