Informática 35 (2011) 323-329 323
Distributed Multi-ant Algorithm for Capacity Vehicle Route Problem
Jie Li, Yi Chai and Cao Yuan Chongqing University
E-mail: leighbyl6@gmail.com, cqchaiyi@gmail.com, 122269587@qq.com
Keywords: capacity vehicle route problem, distributed multi-ant algorithm, cellular ants, performance potential Received: November 2, 2010
This paper proposes a Distributed Multi-ant Algorithm for capacity vehicle route problem (CVRP) where cooperation is helpful for accelerating prior solution by executing a decomposition-based separation methodology for the unsteady capacity constraints. It decreases the complex coupling network with others to solve small instances with less correlation in parallel processing. The main goal of this work is to play well on large scale CVRP with interaction between subsystems and certain state vectors. The results show that Distributed Multi-ant Algorithm plays better performance on average solution and the importance of potential action is analyzed.
Povzetek: Za problem CVRP je razvit izboljšan postopek, temelječ na algoritmih z mravljami.
1 Introduction
With decomposition, every subsystem could be described as a Markov Decision Process (MDP) model: Let M = {S,A,T,R,fv} be a five tuple model, where S = {s} is a set of states, A = {a} is a set of actions, T = {i>(-|s, a), s G S, a G A} is the next-state transition probability distribution, with />(■ | s, a) describing the probability of action a in state stos'. R(s, a, s>) is the reward function, fv is the additional reward function, ir(s) is a policy function in the state space. The discussion about applicability and feasibility is based on discount value MDP and Q-learning.
Algorithms works on CVRP are researched for years. Edge assembly (EAX) crossover with well-known local searches is employed to CVRP (Yuichi Nagata et al, 2009). Deoxyribonucleic acid (DNA) computing model and a modified Adleman-Lipton model accelerate the search on large nodes CVRP (Yeh Chung Wei, 2009). Ellipse rule approach reduces the average distance to the lower bound by about 44% (Santos Luis et al, 2009). A multi-objective evolutionary algorithm is used for CVRP (Borgulya Istvan, 2008). Particle swarm optimization (PSO) can also apply for CVRP, in two models (Ai The Jin et al, 2009). Cellular GAs has solved vehicle routing problem, minimized transportation cost and recombined a new problem (Carlos Bermudez et al, 2010) and genetic algorithm has worked at it fewer than 100 nodes (Wang Chungho et al, 2010). Normally, it is solved based on decentralized model. However, coupling among subsystems (called SCVRP in the paper) is not considered well and being trap into a local solution or no solution easily, deviating from what we expected. Multi-ant algorithm is extension of ant algorithm who plays a better performance in the best solution and used to VRP already (Yuvraj Gajpal et al, 2009). The ant isn't punished
if the strategy misleads it to suboptimal policies. And if there isn't new knowledge during state s, the reward function is still working accumulation. In this paper, decomposition of CVRP based on a distributed model is presented with iteration and cooperation between subsystems searching for their own optimization by distributed multi-ant algorithm. As cooperation, a cellular ant contacts with other ants both in its own subsystem and others through reward strategies obtained whenever the related strategy is optimal by traditional relative reinforcement learning. Reward shaping undergoes through structuring additional reward function for distributed multi-ant algorithm, making it more efficient.
As the large scale of CVRP is divided into smaller SCVRPs in a distributed system where bunches of cellular ants set to seek an optimal solution of corresponding subgraph for SCVRP. On the one hand, cellular ants in the same subsystem collaborate with each other and refresh their own knowledge. On the other hand, cellular ants in different SCVRP regenerate strategies as they meet in the crossed arcs over several disparate SCVRPs. The paper is organized as follows: Section 2 will introduce some basic knowledge of tree cutest and decomposition of CVRP for distributed system using tree cut-set. Section 3 describes how those SCVRPs figuring out the optimal solution respectively based on distributed multi-ant algorithm with reward shaping. The procedure of cooperation between subsystems is gotten. Then, the experimental results in several algorithms are contrasted using ArcView, matlab R2009a and GAMS softwares in Section 4. Finally, the discussion and conclusions is in Section 5, as well as directions for future work.
326 Informatica 35 (2011) 323-329
J. Li et al.
2 Decomposition algorithms for CVRP
For the limited capacity constraint, a large scale of CVRP is decomposed into SCVRPs with unsteady capacity constraints in the distributed system. The mathematical model of CVRP is transformed through Tree Description (Chen Yulin et al, 2002). An algorithm named Tree Cut-Set (TCS) is proposed for decomposition. The number of subsystems is determined by the carrying capability of vehicles, demands of customers and connection between customers.
A large scale of undirected graph can be transformed into complete trees and split into subgraphs where leaves of subgraphs are the compound boundaries. Then, the semantic representation of CVRP could be replaced by SCVRPs. Based on these definitions in (Y Xiang, 1996), calculation must comply for some principle as follows:
(1)	a * b = ab
(2)	(a * b)' = ab' + a'b = a + b
(3)	aAb = a'b + ab' = a + b
(4)	(aAb)' = (ab)' = 1
In ripping, two subgraphs are brought out and the summation of their semantic representation is equat to the original graph. We can call the original graph the father graph of the two subgraphs. It requires the knowledge of the interface between SCVRPs only, not the knowledge of the internal structure of them. TCS is applied to closed the structural details to separate the model. In the result, every SCVRP could obtain its own solution through an independent set of ants and the link between them is loosely coupled by employing TCS. Whatever the structure CVRP is, TCS is suitable for it, whether customers may join and leave or not.
We will explain the theory of t — sepset couple based on d - sepset (Li Yan et al, 2004).
Definition 1 In a tree, for tree-node i, there is only two parents and one child node j. For tree-node j, there is only two different children. Then, (i,j) is called as a t-sepset couple nodes.
Searching the tree created from Definition 1 from top to button by Greedy policy, t-sepset couple nodes for customers with uncertain demands are ripped. As what Bayesian Theory (Andrew Y Ng et al, 1999) says, every
separate demand is weighted by t, a random variable that
2
must satisfy some limitation: w > 0 and E m,: I.
i=l
For the limited carrying capability of vehicles 6, the total demands /„ in each subsystem must be lower than b. The purpose is that the parameter e = ta - b is gradually close to zero. If every SCVRP meets this condition, the problem would be solved perfectly. The procedure would stop till e is lower than a by constant p. Otherwise, TCS will continue.
3 Distributed Multi-ant Algorithm
3.1	Reward shaping
(Andrew Y Ng et al, 1999) presents a method that if a potential function <J>(s) exists so that R'(s7a7s') — R(s, a, s') + <J>(s') - <J>(s) for any policy tt(s), V*(s) = 7r(s) - 3>(s). Reward shaping causes the optimal policies in M would be the optimal policies in M , to exchange the original reward function R of MDP M with new reward function R of new MDP M\ In this paper, we definite the R' in M' :
R'(s, a, s') = R(s, a, s') + fv(s, a, s') (1)
Where fv(s, a, s') is a function, carrying ant colony information. The proof of this equation is in next chapter.
3.2	Cellular ant
Cellular automata (CA) is a discrete grid dynamics model both in time, space and state vector. It is utilized to imitate complex and abundant macroscopic phenomenon in single regulations of parallel evolutionary. The distributed cellular in grid net of SCVRP has finite discrete state, following the same action regulations to update its state by local rules synchronously.
As equations in reference (Moere Andrew Vande et al, 2005), the dynamic evolutionary of CAs is: F(S\ + 1) = f(s\-r\ ... ,s\,..., s\+r). S\ describes the state of cellular ant in position i at time / and the local updating rule is / : Sf+1 —>• St.
The structure of CAs contain four basic parts: cellular ants space, grid dynamics net, local rules and transfer function, discrete time set. For the speciality of SCVRP, cellular space has two dimensions with uncertainty states of cellular ant. The renewed knowledge function is composed of its information at time / and its neighbors' using the extended Moore neighbor model:
/ : s\+1 = f(s\, s?)	(2)
The key of CAs is to gain the strategies of certain neighbors' by the extended Moore neighbor model, then it could compare decision strategies referred to the last state in MDP of cellular ant with its neighbors. Reward function of distributed multi-ant algorithm is gotten as follows by
N
f = f(sl(a), E si(«)) and fv(s,a,s') = F(sl+1) :
3 = 0
2 r	N
R'(s, a, s-) = R(s, a, S') + ]T/K+», s{(a)) (3)
r=0	j=0
Proof of Equation 3: From Equation 1, we can see the additional reward function should be potential function which is proven (Moere Andrew Vande et al, 1999). Equation 2 gives the description of / function that includes local rules and transfer principles. The action of / function is renewing information among neighbors for best states of each
DISTRIBUTED MULTI-ANT ALGORITHM FOR...
Informatica 35 (2011) 323-329 327
ant at the next moment. For this system, it is a discrete simple space model. Every state is a dot in the time vector, function / could be considered as the max value in each state dot. We could say / is the gradient function in the space/time vector space. Then, F is a vector field composed by a set of gradient field, in other words, F is deemed as potential field which is also called potential function.
3.3 Distributed Multi-ant Algorithm
Dynamic learning promotes the efficiency in Back Propagation (Yu Xiaohu et al, 1995). It is expected to accelerate the learning rate so that running time could cut down and final solution could be gained sooner. The value y>t(«) describes the learning rate of ant i at time / being gradually decreased to zero in the limit of search procedure. Let fiii) > 0 be a series of constants for every ant at time /
T
and satisfy the equation: lim J2 ft(i) = oo.
T—^ooi+1
There are some destabilization in the circumstance in SCVRP, for example, the weather will delay the arriving time of transportation, the difference performance of the vehicles may also influent the efficiency, and so on. The influence of disturbance in the system can be measured by Performance Potential (Cao Xien et al, 1997). As the definition, it could set performance potential of a random state being benchmark for any other states. We choose the laststep state as the basis of performance potential for current-step state where it helps to make decision more accurately. Let X = {X t,t = 1,2...} picture the decision progress of MDP. Considering the principles of reward function, the description of performance potential in state s' (s is the last-step state) is as following:
N-1
ga.= lim {E[pN Y/R'(s,a,s')\X0 = s]
1 -Von	' *
s=0
-(N-l M
Standard ant colony algorithm is integrated with reward shaping function. New value function and strategy function are gained based on Q-learning (Bagnell I Andrew et al, 2006; Dietterich Thomas G, 2000):
cellular ants which chose city j for the next city is rk where
Mi) = i -
Proof of Equation 4 Convergence : An equation is proposed and proven in (liang Lingyan et al, 2007)
as Qt+1(s,a) <— (1 - a)Qt(s,a) + a[rk + 7 * maxQt(sa')], according to this equation, it is easy to say 7 * maxQt(s', a') is bounded. Because Rk is greater than rk, Ar-jj is bounded in each updating, in other words, the amount of trail every iteration is bounded. By reason of //,;, is finite constant, then P(i,j) is bounded. To sum up,
N
si(a)j 2 st (a)) should be bounded and that results
3=0
in bounded n(i,j). |5| and ,1 are both finite sets, then
N
f = /(st(a)> 2 si(a)) is ^e state transition in MDP
3=0
that must be finite. Therefore, -k(i,j) is bounded and finite, for every decision, there is a prior strategy ir(i,j) to leading cellular ants towards global optimal solution.
3.4 Algorithm process
For the simulation, one formula should be presented firstly. Formula 1: Transition probability /'/,■ of each cellular ant facing the near city is defined as following:

pk
ij
^tsWvUt)
j e 3
otherwise
^k(hj) = argminmax{^ 7ra(i,j) * Q*} (4)
aeA
Q* = maxa^kQ(i,j, k) - Q(i,j, s)
Q{i,3,k) = (1 - jf-) ■ Q ~ S ■ gs, + ^ - V* ilk ilk
V* = R<{i,k,j) + 1V*{s<)
Where a is the discounted factor. As the amount of cellular ants which arrived in city i is Rk, and the amount of
Where t,:i expresses the trail of arc («, j), raj describes heuristic degree and j* is the set of allowed cities for cellular ant j. The pseudo-codes are shown as following:
1: Put m cellular ants on n cities of a decomposed SCVRP stochastically.
2: Choose the next city j for each ant by probability /■ when it stands in city i;
3: Calculate values of target function (F function) for each cellular ant and list the best one;
4: Search for the best performance by evolving in neighbors' as the definition from equation 2;
5: Update R' function. If latest strategy is helpful for the program, reward value is positive and it becomes advanced step, vice versa;
6: Replace gs for g's by rewards from the latest iteration to weaken influence from disturbance variables;
7: Renew Q(i,j, k) function for ant k through their knowledge and others';
8: Make the prior strategy ix following equation 4 and choose the next city according to strategy 7r;
9: End till no optimal solution figuring out.
326 Informatica 35 (2011) 323-329
J. Li et al.
4 Computational experiments
Computational experiments have been conducted to analyze the performance of the proposed algorithm and present the results along with comparative analyse. All these algorithms have been compared with result quality. In the experiments, corresponding parameters could be: The amount of cellular ants is 31; Parameter a is trail evaporation coefficient, if it is over certain limit, the probability of revisiting the same city could be increased. If it is lower than certain limitation, it could influence the convergence; Parameter ¡3 is the heuristic information. From (Jiang Lingyan et al, 2007), the best parameters regions are 0.1 < a < 0.3 3 < /3 < 6; Combining the Q-learning, the parameters are set as following: 7 = 0.8, a = 0.2, /3 = 4, Q = 2,p= 0.7.
4.1 Case 1: computational analysis on benchmark problems
For decomposition, we establish the extended codes based on GrThoery toolbox (http : / / luimu .mathworks .coni / niatlabcentral / f ile— exchange/4266) in software Matlab R2009a. Some existing functions are utilized for simulations, such as grMinCutSet function, grMaxFlows function, grDecOrd function and grValidation function. With further verification of our algorithm, standard CVRP (http://www.branchandcut.org/) is also solved by those three algorithms. For each algorithm variant, ten independent simulations are taken per benchmark. With different iteration value, the average distances are illustrated in Table 1. The convergence of distributed multi-ant algorithm (DMA) is around 100 to 150 shown in Figure 1 as the benchmark problem E-nl01-kl4 presented by (Christofides Nicos et al, 1979). Therefore, it is quick to find out best solutions with our algorithm. To test the determinacy, we make experiments under iteration 1000. In most benchmark problems, best solutions are almost the same as the one that under around iteration 150. To limit disturbance, iteration value of our algorithm is set as 200.
Figure 1: Computational results ofE-nl01-kl4
ues of the best found feasible solutions (column Average). Compared to adaptive ant colony (AAC) and distributed ant cooperation without decomposition (DAC), DMA obtains better solutions to those 10 problems. Moreover, DMA runs less time (The unit of CPU time is second.). With incremental scale of problem, executed time increases slowly.
Table 3 illustrates the comparative results of best performances on the benchmark problem proposed by (Christofides Nicos et al,1979) according to the literature (Richard Eglese, 2009). For the experimental results of branch-and-cut algorithm and the algorithm presented by Richard Eglese, performing larger problems by exponential growing costs too much time. But the performance on DMA is stable. The fluctuation of time consume is steady growth and the quality of solution is outperformed at the large scale CVRP. By reason of decomposition, DMA plays well even on large scale problems, inducing the disadvantage of DMA that behaviors on smaller problems also take high time consume.
4.2 Case 2: computational analysis with an Arc View graph data
Based on the customers address and request, we try to set forth the location of cities by utilizing an GIS software, 'ArcView", creating geographic data and transformed to network data containing latitude vector and longitude vector. The data set of original map is loading automatically in ArcView software shown as Figure 2. We store the digital map data of autologous city in ArcView. Its information is shown through digital road map utilized in several areas and edited layer structured data of spatial objects with latitude and longitude, buildings and so on. It could be found in Figure 3 amplifying Figure 2 for details. The building with red square is the distributed center and the buildings with red triangle are part of the distributed destinations. For the readability, we scale down the latitude and longitude in the digital road map by one hundred times.
In Table 2, the figures stand for best obtained fitness values (column Best solution) and average objective val-
Figure 2: Digital road map of city
DISTRIBUTED MULTI-ANT ALGORITHM FOR...	Informatica 35 (2011) 323-329 327
Table 1: Results comparison under different iterations
Benchmark problem	Iteration: 200		Iteration: 50	
	Best solution	Time	Best solution	Time
E-n22-k4	252.610	56.3131	305.696	12.8750
E-n30-k3	393.427	79.1482	463.636	18.9688
E-n33-k4	511.208	84.1334	640.079	19.8906
E-n51-k5	416.059	114.7873	525.942	34.3906
E-n76-k7	528.711	187.1651	677.091	53.2969
E-n76-k8	536.684	189.2290	668.377	57.0469
E-n76-kl0	559.024	242.3379	672.265	59.3671
E-n76-kl4	610.793	250.3661	784.509	61.2823
E-nl01-k8	635.468	307.6808	854.114	89.3750
E-nl01-kl4	677.007	386.9579	757.438	87.7813
Table 2: Solutions comparison with the Christofides et al. instances AAC	DAC	DMA
u^iiuimaiiv puuinii	Best solution	Average	Best solution	Average	Best solution	Average
E-n22-k4	310.524	317.191	305.696	309.958	252.610	257.295
E-n30-k3	466.714	472.816	458.745	465.064	393.427	400.841
E-n33-k4	651.878	659.097	640.079	651.852	511.208	527.384
E-n51-k5	520.126	523.726	514.174	518.546	416.059	419.572
E-n76-k7	677.091	680.483	672.844	673.267	528.711	535.547
E-n76-k8	686.901	688.339	668.377	676.189	536.684	539.960
E-n76-kl0	679.881	683.925	672.265	675.061	559.024	560.275
E-n76-kl4	689.764	690.598	672.265	675.696	610.793	613.670
E-nl0Lk8	816.362	819.362	732.735	751.482	635.468	637.117
E-nl01-kl4	927.577	934.523	883.894	898.789	677.007	678.563

—-—--—=T»-crari
v *
*
- ,-r • v-.	■:■■■
Figure 3: Details of digital road map
4.2.1 Results comparison
The number of places in the city is 500. In other words, the scale of CVRP is 500. Performance potential parameter S is 1. The simulations have undergone under three algorithms: AAC, DAC and DMA. Time consume time ordered by AAC,DAC and DMA is 1.7514e+003 s | 202.8906 s | 179.5156 s. Shortest distance by the same order is 1.4263e+004 | 1.4164e+003 | 1.3541e+003. From these data, DMA gets prior performance on best solution and costs less running time. With the amount of places increasing, distributed multi-ant cooperation with decomposition will play more excellent. These graphs also display that DAC is easy to trap into local solution, even though DMA takes more iteration.
4.2.2 Performance potential analysis
With the scale of problem increasing, decentralized algorithm becomes weaker than distributed one according to its interrelate variables restrained with each others. Interaction and cooperation are critical characters of distributed model where each ant communicates through reinforcement learning and renovates its next strategy even under additional places. The experiments run from two points. In Table 4, performance potential parameter <5 is 1. The scales of CVRP are 50,100,200,300,500 and 1000. Three algorithms are executed: adaptive ant colony (AAC), distributed ant cooperation without decomposition (DAC) and DMA. The results are in table 4. Best solutions in DAC and AAC are trapping into local ones. While the scale is small, DMA costs more time than DAC. Because complex structure process in DMA needs additional operation. However, DMA shows disadvantages in a large scale one. It takes less running time and gains better solutions than DAC and AAC.
As the scale of problem is 1000, simulation runs with different S value. Noise misleads to trivial solutions. The efficiency of per for mance potential who is utilized to reduce noisy impact on the system is determined by parameter S value. If S is too small to zero, the influence from performance potential can be ignored. From Table 5, because of noisy disturbance, smaller 6 costs much iteration and more time for best solutions. The difference between best solution and average solution is in inverse proportion to S value. Consequently, performance potential
326 Informatica 35 (2011) 323-329	J. Li et al.
Table 3: Best solution and time consume comparison with the Christofides et al. instances
Benchmark problem	Branch-and-cut algorithm		Richard Eglese's	algorithm	DMA	
	Best solution	Time	Best solution	Time	Best solution	Time
E-n22-k4	375.28	0.2	252.614	0.08	252.610	56.3131
E-n30-k3	535.797	2	393.512	1	393.427	79.1482
E-n33-k4	837.672	2	511.263	0.6	511.208	84.1334
E-n51-k5	524.611	11	416.063	16	416.059	114.7873
E-n76-k7	682.563	3600	530.022	1103	528.711	187.1651
E-n76-k8	733.686	3600	537.239	636	536.684	189.2290
E-n76-kl0	828.655	3600	559.233	3600	559.024	242.3379
E-n76-kl4	989.257	3600	614.442	3600	610.793	250.3661
E-nl01-k8	820.552	3600	639.744	2379	635.468	307.6808
E-nl01-kl4	1049.534	3600	699.985	3600	677.007	386.9579
Table 4: Experimental results comparison of DAC and DMA(<5=1)
amount of places	Adaptive ant colony		Decentralized algorithm		Distributed multi-ant algorithm	
	Best solution	Cost time	Best solution	Cost time	Best solution	Cost time
Amount=50	58.76	3.4875	58.69	3.8653	56.86	4.1250
Amount=100	157.4581	25.1541	111.86	20.6354	109.22	24.7343
Amount=200	451.5714	74.1572	296.26	57.1351	235.72	50.5688
Amount=300	2874.1674	558.1547	882.29	198.7326	719.18	103.2497
Amount=500	1.4263e+004	1.7514e+004	2967.53	289.5187	1354.11	179.5156
Amount=1000	4.8417e+005	7.1541e+005	9233.76	3.5763e+003	4617.43	815.9327
plays a crucial role in distributed multi-ant algorithm.
5 Conclusion
Decomposition is usually used in decentralized model to scale down the problem into subsystems we can handle with. However, the relationship between subsystems is ignored easily, leading to local solution or non-solution. In this analysis, decomposition is undergoing through hybrid algorithms for large scale of CVRP in a distributed model. Cooperation and interaction are considered and solved by distributed multi-ant algorithm. Disturbance from circumstance is conquered by Potential function whose efficiency is verified from simulations. From the experiments, the algorithm has solved the large scale CVRP better and efficiently. Furthermore, the next work is further control of fluctuation on solutions.
References
[1]	Yuichi Nagata, Olli Braysy (2009) Edge Assembly based Memetic Algorithm for the Capacitated Vehicle Routing Problem, Networks, 54(4) pp. 205-215.
[2]	Yeh Chung Wei (2009) Solving Capacitated Vehicle Routing Problem using DNA-based Computation,
Proceedings International Conference On Information Management and Engineering, pp. 170-174.
[3]	Santos Luis, Coutinho Rodrigues Joao, Current John R (2009) An Improved Heuristic for the Capacitated Arc Routing Problem, Computers and Operations Research, 36(9) pp. 2632-2637.
[4]	Borgulya Istvan (2008) An Algorithm for the Capacitated Vehicle Routing Problem with Route Balancing,
Central European Journal of Operations Research, 16(4) pp. 331-343.
[5]	Ai The Jin, Kachitvichyanukul Voratas (2009) Particle Swarm Optimization and Two Solution Representations for Solving the Capacitated Vehicle Routing Problem, Computers and Industrial Engineering, 56(1) pp. 380-387.
[6]	Carlos Bermudez, Patricia Graglia, Natalia Stark, Carolina Salto, Hugo Alfonso (2010) Comparison of Recombination Operators in Panmictic and Cellular GAs to Solve a Vehicle Routing Problem, Inteligencia Artificial, 14(46) pp. 34-44.
[7]	Wang Chungho, Lu Jiuzhang (2010) An Effective Evolutionary Algorithm for the Practical Capacitated Vehicle Routing Problems, J Intell Manuf, 21(4) pp. 363-375.
[8]	Gajpal Yuvraj, Abad PL (2009) Multi-ant Colony System(MACS) for a Vehicle Routing Problem with Backhauls, European Journal of Operational Research, 196(1) pp. 102-117.
[9]	Chen Yulin, Liu Jiancheng (2002) An Tree Generation Algorithm of Undirected Graphs, The Application of Computer Engineer, 38(20) pp. 115-117.
[10] Y Xiang (1996) Distributed Structure Verification in Multiply Sectioned Bayesian Networks, Florida Artificial Intelligence Research Symposium, pp. 295-299.
DISTRIBUTED MULTI-ANT ALGORITHM FOR...	Informatica 35 (2011) 323-329 327
Table 5: Experimental results under different S value (places=1000)
Parameter value		Distributed ant algorithm		
	Best solution	Average solution	Cost time	Iteration
¿=0	4822.31	6728.34	3.2644e+004	545
¿=0.1	4792.93	6577.19	2.9883e+004	357
¿=0.2	4881.27	6221.69	2.3458e+004	288
¿=0.5	4632.36	5994.62	1.2336e+004	214
¿=0.8	4723.56	5938.43	899.3654	194
¿=1	4617.43	5769.27	815.9327	106
[11]	Li Yan, Yin Zongmou (2004) Techniques by Compound Branch and Network Ripping to Find out All Spanning Trees of an Undirected Graph, Journal of Naval University of Engineering, 16(5) pp. 74-77.
[12]	Andrew Y Ng, Daishi Harada, Stuart Russell (1999) Policy Invariance under Reward Transformations: Theory and Application to Reward Shaping, ICML 1999.
[13]	Moere Andrew Vande, Clayden Justin James (2005) Cellular Ants: Combining Ant-based Clustering with Cellular Automata, International Conference on Tools with Artificial Intelligence (ICTAI), pp. 177184.
[14]	Yu Xiaohu, Chen Guoan, Cheng Shixin (1995) Dynamic Learning Rate Optimization of the Backpropa-gation Algorithm, IEEE Transactions on Neural Networks, 6(3) pp. 669-677.
[15]	Cao Xien, Chen Hanfu (1997) Perturbation Realization, Potentials and Sensitivity Analysis of Markov Processes, IEEE Transactions of Automatic Control, 42(10) pp. 1382-1393.
[16]	Bagnell J Andrew, Ng Andrew (2006) On Local Rewards and Scaling Distributed Reinforcement Learning, Neural Information Processing Systems.
[17]	Dietterich Thomas G (2000) Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, JAIR.
[18]	Jiang Lingyan, Zhang Jun, Zhong Shuhong (2007) Analysis of Parameters in Ant Colony System, Computer Engineering and Applications, Beijing, China, 40(20) pp. 31-36.
[19]	Richard Eglese (2009) The Open Vehicle Routing Problem and the Disrupted Vehicle Routing Problem: a Tale of two Problems, http:// www-eio. upc. es/seminar/09/r_eglese.pdf.
[20]	Christofides Nicos, Mingozzi Aristide, Toth Paolo (1979) The Vehicle Routing Problem-Combinatorial Optimization, Wiley, Chichester, pp. 315-338.