Iterative learning control for impulsive multi-agent systems with varying trial lengths

In this paper, we introduce iterative learning control (ILC) schemes with varying trial lengths (VTL) to control impulsive multi-agent systems (I-MAS). A domain alignment operator is used to characterize each tracking error so that the error can fully update the control function during each iteration. We then analyze the system's uniform convergence to the target leader. Further, two local average operators are used to optimize the control function so that it makes full use of the iteration errors. Finally, numerical examples are provided to verify the theoretical results.


Introduction
With the development of swarm intelligence algorithms, multi-agent systems (MASs) are widely used in communication networks, wireless sensor networks, and unmanned vehicles. The consensus problem is a basic problem for MASs because it has a wide range of applications in formation control, distributed estimation, and congestion control; in essence, the agents reach consensus on a given target trajectory through the network. A multi-agent system is an abstraction of populations in the biological world, and a biological population may change state suddenly at certain moments, e.g., due to predation, disease, or bird migration. For such situations, MASs with impulsive effects describe well the inevitable disturbances that occur during actual system operation. The consensus tracking problem for impulsive MASs studies whether an agent can return to a predetermined trajectory through information exchange after being subject to external disturbance. In this regard, Cui [6] has carried out relevant research. Zhang et al. [18,30,32] considered the consensus problem of impulsive MASs in the traditional consensus framework. In addition, the impulsive control approach has the advantages of simplicity and flexibility for such systems because standard continuous state information is not required. As a consequence, this approach has been used to study adaptive consensus and synchronization problems [22,23] and the consensus problem [5,26] for MASs.
Iterative learning control (ILC) is suitable for robots performing trajectory tracking tasks over a finite time interval. ILC uses the error information measured in one or several previous trials to correct the next control input, which improves tracking accuracy along the iteration axis. ILC was first proposed in [2] for a robot, whereas Ahn and Chen [1] applied ILC to consensus trajectory tracking of a MAS. Recently, ILC laws have been extensively studied for various types of MASs [19]. Note that MASs with impulses can generate discontinuous inputs, so it remains challenging to determine whether ILC can be successfully applied to collect the sampled error data from each agent and track continuous or discontinuous trajectories, i.e., to achieve leader-following consensus for nonlinear MAS dynamics with impulses [4]. In addition, [7,8] used Lyapunov stability theory to analyze the coordination performance of MASs.
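The trial-to-trial correction described above can be illustrated with a minimal numerical sketch. This is a toy scalar plant with a P-type update, not the paper's impulsive multi-agent model; all parameters (plant coefficients `a`, `b`, learning gain `gamma`, trial length `T`) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the basic ILC idea (a toy scalar plant, NOT the paper's
# impulsive multi-agent model): the same finite-horizon tracking task is
# repeated, and a P-type update u_{k+1} = u_k + gamma * e_k uses the previous
# trial's error to correct the next input. All parameters are illustrative.
a, b, gamma = 0.3, 1.0, 0.5        # plant coefficients and learning gain (assumed)
T = 10                             # trial length (number of samples)
y_d = np.sin(np.linspace(0.1, np.pi / 2, T))   # desired trajectory (assumed)

def run_trial(u):
    """Simulate one trial of y[t+1] = a*y[t] + b*u[t] from y(0) = 0."""
    y, out = 0.0, []
    for t in range(T):
        y = a * y + b * u[t]
        out.append(y)
    return np.array(out)

u = np.zeros(T)
errors = []                        # sup-norm tracking error per iteration
for k in range(30):
    e = y_d - run_trial(u)         # error measured on trial k
    errors.append(np.abs(e).max()) # record the trial's worst-case error
    u = u + gamma * e              # P-type ILC update
```

With these assumed gains the lifted error map is a contraction, so the recorded sup-norm error shrinks along the iteration axis; this is the learning-along-iterations effect the paper exploits.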
Under normal circumstances, ILC requires the same trial length for each iteration [12,24]. However, in some practical applications, due to inherent properties of the system or the needs of the operator, an operation may be terminated early; that is, the trial length of an iteration may be shorter than the complete trial length. This motivated the study of ILC with varying trial lengths (VTL). Li et al. [10,29] considered continuous-time nonlinear systems and discrete-time linear systems and designed an averaging operator to construct an ILC scheme. Subsequently, Li [9] proposed two improved schemes to control discrete linear systems, and in [11] the ILC problem for nonlinear dynamic systems was considered. Shen et al. [13,25,31] studied ILC with VTL by using a composite energy function. Liu et al. [14] used two-dimensional Kalman filtering to study ILC with VTL.
Fractional-order calculus was first proposed in the correspondence of Leibniz and l'Hôpital and has a history of more than 300 years. In recent years, the viscoelasticity and memory effects of fractional calculus have attracted wide attention in engineering applications, and fractional calculus has become an important tool in numerical computation. In general, fractional ILC has the following advantages:
• The fractional iterative learning law covers the PID learning law.
• The fractional iterative learning law has a weight function (singular kernel) and an additional parameter to adjust the learning procedure.
• The fractional iterative learning law has a memory function and fully retains global information, which can be used to improve the learning effect.
Recently, Luo et al. [20,21] studied the fractional ILC problem for fractional multi-agent systems. Liu et al. [15-17] studied ILC with VTL for fractional impulsive systems. However, there are very few works on ILC of impulsive multi-agent systems with varying trial lengths, even for classical or fractional learning laws. Note that impulsive effects often appear in the control of multi-agent systems, and memory communication always occurs in each agent. In order to achieve consensus of multi-agent systems with impulses and past communication, one can adopt the fractional ILC approach; here, fractional ILC is used to handle the past communication in each agent.
Based on the above discussion, this paper introduces new error processing methods and designs several learning laws for consensus tracking of a target trajectory by an impulsive multi-agent system. The specific contributions are as follows:
• For impulsive multi-agent systems with VTL, we first use zeros to replace nonexistent errors and then consider the system's consensus tracking of the target trajectory under the D^αD-type learning law.
• The domain alignment operator is introduced to process the errors, and the consensus of the system under the I^βD-type learning law is considered.
• Based on the above, the local average operator is used to improve the control function, and the convergence of the D^αD- and I^βD-type learning laws for impulsive multi-agent systems is considered, respectively.
Compared with previous work, this paper uses the memory effect of fractional-order calculus to adjust the input of the system and combines it with the domain alignment operator to design appropriate learning laws for controlling the multi-agent system in the VTL case. Combining the two methods yields higher iteration accuracy and speed. A fractional-order learning law is more complex than an integer-order one, and the uncertainty caused by varying trial lengths must also be considered, which makes constructing the learning law and analyzing the convergence of the system more difficult. The rest of the paper is organized as follows: Section 2 provides the problem formulation and preliminaries. Section 3 provides the main results of this paper. An illustrative example is presented in Section 4.

Preliminaries and problem formulation
We consider a weighted directed graph Q = (V, E, Z) with vertex set V = {1, 2, 3, . . . , N}, where N is the number of agents in the system, edge set E ⊆ V × V, and adjacency matrix Z. V represents the set of agents. The edge set E consists of ordered pairs (i, j), where (i, j) means that agent i can pass information to agent j; that is, i is called the parent node of j, and j is called the child node of i. The set of all agents adjacent to agent i is called the neighbor set of agent i, denoted M_i = {j ∈ V | (j, i) ∈ E}. Z = (z_{i,j})_{N×N} is the weighted adjacency matrix of Q, composed of nonnegative elements z_{i,j}. In particular, z_{i,i} = 0; if (j, i) ∈ E, then z_{i,j} = 1, which means that agent i can receive information from agent j; if (j, i) ∉ E, then z_{i,j} = 0, which means that agent i cannot receive information from agent j. The Laplacian matrix L = (l_{i,j})_{N×N} of Q is defined by l_{i,i} = Σ_{j∈M_i} z_{i,j} and l_{i,j} = −z_{i,j} for i ≠ j. In order to describe the communication relationship between the virtual leader and the followers, let d_i = 1 denote that agent i can receive the leader's information directly; otherwise, let d_i = 0, and set D = diag(d_1, d_2, . . . , d_N).
In this paper, ||a|| denotes the 2-norm of a vector a, and ||A|| denotes a matrix norm compatible with it. The λ-norm of a function v is defined as ||v||_λ = sup_{τ∈[0,G]} e^{−λτ} ||v(τ)||. The symbol ⊗ denotes the Kronecker product.
Consider a system of N agents, each with T impulsive points, whose interaction topology is Q = (V, E, Z). The ith agent is governed by the following nonlinear impulsive system:

Ẋ_i(τ) = f(X_i(τ), τ) + B u_i(τ), τ ≠ τ_t,
∆X_i(τ_t) = X_i(τ_t^+) − X_i(τ_t^−) = M_t(X_i(τ_t)), t = 1, 2, . . . , T,    (1)
y_i(τ) = C(τ) X_i(τ),

for all i ∈ V and τ ∈ [0, G], where X_i ∈ R^n is the state vector of the ith agent, u_i ∈ R^p is the control function of the ith agent, B is an R^{n×p} matrix, y_i ∈ R^m is the output vector of the ith agent, f(·, ·) : R^n × [0, G] → R^n and M_t : R^n → R^n are continuous, and C(τ) is a continuous R^{m×n} matrix function. The impulsive time sequence is 0 < τ_1 < τ_2 < · · · < τ_T < G. Here X(τ_t^+) = lim_{h→0^+} X(τ_t + h) and X(τ_t^−) = X(τ_t) represent the right and left limits of X(τ) at τ = τ_t, so solutions are continuous from the left at the impulsive points.
Under assumptions (H1) and (H2), following [28, Remark 4.1], system (1) with X(0) = X_0 has a unique solution in a space of piecewise continuous functions; this solution is not necessarily continuous on the whole time interval [0, G]. We regard the desired trajectory y_d(τ) as the virtual leader in the communication topology and mark it with vertex 0. Then the information exchange among agents can be represented by an extended communication topology graph, where E* represents the edge set and A* represents the weighted adjacency matrix. The control objective is to design appropriate iterative learning laws such that the outputs of all agents asymptotically converge to the desired trajectory y_d(τ).
In order to describe the phenomenon of varying trial lengths, we introduce a random variable K_i, which represents the end time of the ith iteration. Here p(K_i) denotes the probability density function of the random variable K_i, ν ∈ (0, 1), and G is the maximum running time of the system. In particular, when K_i = 0, all the error data of that iteration are lost, so the current trial can be regarded as not having run. For the VTL system, the errors can be corrected in the following way (see the work of Li et al. [10]):

ψ*_i(τ) = ψ_i(τ) for τ ∈ [0, K_i],   ψ*_i(τ) = 0 for τ ∈ (K_i, G],

where ψ_i(τ) represents the tracking error of the ith iteration.
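The zero-replacement rule for missing error data can be sketched as follows; the sample counts and grid are illustrative assumptions.

```python
import numpy as np

# Sketch of the error-correction rule for varying trial lengths (VTL): when
# trial i ends at K_i < G, the tracking error does not exist on (K_i, G], so
# the missing part is replaced by zeros before the learning update (cf. Li
# et al.). The sample counts below are illustrative assumptions.
G = 10                                 # full trial length, in samples

def corrected_error(psi, K_i, G):
    """Zero-fill a length-K_i error signal psi up to the full length G."""
    psi_star = np.zeros(G)
    psi_star[:K_i] = psi[:K_i]
    return psi_star

psi = np.ones(7)                       # error measured on a short trial, K_i = 7
psi_star = corrected_error(psi, 7, G)  # corrected error defined on the whole horizon
```

This way the learning update always operates on a signal of full length G, even when the trial stopped early.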
The domain-aligned error ζψ_i is defined on the whole interval [0, G], so that the error can update the control function over the full trial length. In this paper, we use C_0D^α_τ ψ(τ) to denote the Caputo fractional left derivative of the function ψ(τ), RL_τD^α_a C(τ) to denote the Riemann-Liouville fractional right derivative of C(τ), and RL_τI^{1−α}_a C(τ) to denote the Riemann-Liouville fractional right integral of C(τ). Throughout this paper, 0 < α < 1. We will also use the fractional-order integration-by-parts formulas.
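For reference, the standard definitions of these operators for 0 < α < 1, together with the usual integration-by-parts formula linking the Caputo left derivative to the Riemann-Liouville right operators, read as follows (standard forms from the fractional-calculus literature; regularity assumptions omitted):

```latex
% Caputo left derivative of \psi on [0, \tau]
{}^{C}_{0}\mathrm{D}^{\alpha}_{\tau}\,\psi(\tau)
  = \frac{1}{\Gamma(1-\alpha)}\int_{0}^{\tau}(\tau-s)^{-\alpha}\,\psi'(s)\,\mathrm{d}s,
\qquad
% Riemann--Liouville right derivative of C on [\tau, a]
{}^{RL}_{\;\tau}\mathrm{D}^{\alpha}_{a}\,C(\tau)
  = -\frac{1}{\Gamma(1-\alpha)}\,\frac{\mathrm{d}}{\mathrm{d}\tau}
    \int_{\tau}^{a}(s-\tau)^{-\alpha}\,C(s)\,\mathrm{d}s,
\qquad
% Riemann--Liouville right integral of C on [\tau, a]
{}^{RL}_{\;\tau}\mathrm{I}^{1-\alpha}_{a}\,C(\tau)
  = \frac{1}{\Gamma(1-\alpha)}\int_{\tau}^{a}(s-\tau)^{-\alpha}\,C(s)\,\mathrm{d}s,
\qquad
% integration by parts (Caputo left vs. Riemann--Liouville right)
\int_{0}^{a} C(\tau)\,{}^{C}_{0}\mathrm{D}^{\alpha}_{\tau}\,\psi(\tau)\,\mathrm{d}\tau
  = \Bigl[\psi(\tau)\,{}^{RL}_{\;\tau}\mathrm{I}^{1-\alpha}_{a}\,C(\tau)\Bigr]_{\tau=0}^{\tau=a}
  + \int_{0}^{a}\psi(\tau)\,{}^{RL}_{\;\tau}\mathrm{D}^{\alpha}_{a}\,C(\tau)\,\mathrm{d}\tau.
```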
Lemma (Gronwall-type inequality for impulsive systems). Let v be piecewise continuous on [0, G] and satisfy

v(τ) ≤ a(τ) + θ_f ∫_0^τ v(s) ds + Σ_{0<τ_t<τ} θ_t v(τ_t), τ ∈ [0, G],

where a is nondecreasing and θ_f, θ_t > 0. Then, for τ ∈ [0, G], the following inequality is valid:

v(τ) ≤ a(τ) ∏_{0<τ_t<τ} (1 + θ_t) e^{θ_f τ}.

Main results
We use the symbol σ_{i,j}(τ) to represent all the information received by the jth agent in the ith iteration. It can be expressed as the sum of the information transmitted from the other agents to the jth agent and the possible information transmitted from the leader to the jth agent:

σ_{i,j}(τ) = Σ_{l∈M_j} z_{j,l} (y_{i,l}(τ) − y_{i,j}(τ)) + d_j (y_d(τ) − y_{i,j}(τ)).

The jth agent can get information directly from the desired trajectory; that is, if (0, j) ∈ E*, then d_j = 1; otherwise, d_j = 0. Here the first subscript of σ and y indicates the iteration number, and the second subscript indicates the index of the agent. The subscripts of z and d are explained in Section 2. At the impulsive points, the derivative of σ_{i,j}(τ) is taken from the left:

σ̇_{i,j}(τ_0) = lim_{∆τ→0^−} (σ_{i,j}(τ_0 + ∆τ) − σ_{i,j}(τ_0)) / ∆τ, τ_0 = τ_t,

where τ_t is defined in formula (1). In order to make the agents track the target trajectory, the following D^αD-type learning laws are employed, where P(τ), C(τ) are R^{p×p} matrix functions that are differentiable on the interval [0, G], together with the initial state learning rule. Let ψ_{i,j}(τ) denote the tracking error of the jth agent, ψ_{i,j}(τ) = y_d(τ) − y_{i,j}(τ); then the learning law (8) can be rewritten in terms of ψ. We collect the corresponding quantities of all agents in an arbitrary iteration into vector form, where (·)^T is the transpose of (·); then (9) and (10) can be written in compact form. To study the multi-agent consensus problem with impulsive points, (H1), (H2), and the following assumptions are needed in this paper.

Assumption 1. The desired trajectory y_d is trackable; that is, there exists a desired input u_d such that y_d = C X_d.
Assumption 2. The first trial has the complete trial length G.
In order to make the proofs more concise, we introduce shorthand for the norms of some frequently occurring quantities; in particular, let θ_0 = max_t(θ_t), and abbreviate norms such as ||BP(s)||.

D^αD-type learning law
Considering the multi-agent system (1) under the condition of varying trial lengths, we analyze the convergence of the corrected error (6). Similarly to (8)-(11), we give the following D^αD-type learning law:

Theorem 1. For the multi-agent system (1) with Assumption 1, let (H1), (H2) hold, and let the D^αD-type learning law (12) be applied under varying trial lengths. Then, as the iteration number approaches infinity, the corrected tracking error ψ*_i(τ) converges to zero, i.e., lim_{i→∞} y_{i,j}(τ) = y_d(τ), provided the contraction coefficient of the iterative process is less than one; here θ_t is the Lipschitz constant in (3).
Proof. The proof is divided into the following three cases.
The tracking error of the jth agent in the (i + 1)th iteration follows from (4) and (12), where F(X_i, s) = (f(X_{i,1}, s)^T, f(X_{i,2}, s)^T, . . . , f(X_{i,N}, s)^T)^T.

I^βD-type learning law
Considering the multi-agent system (1) and the corrected error (7), similarly to (8)-(11), we give the following I^βD-type learning law:

Proof. Since the I^βD-type learning law uses the domain alignment operator ζ(·) from formula (7) to correct ψ_i(τ), Assumption 2 is needed.

D^αD-type learning law with local average operator 1
Li et al. [11, Eq. (11)] introduced the local average operator (LAO). This operator effectively utilizes the information from the most recent m* trials, where m* is any known positive integer.
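The idea can be sketched as follows. The uniform average over the most recent trials is an assumption about the operator's form made for illustration; the paper cites Li et al. [11, Eq. (11)] for the original operator.

```python
import numpy as np

# Sketch of a local average operator (LAO): rather than updating the input
# with only the last trial's corrected error, average the errors of the most
# recent m_star trials. The uniform average is an illustrative assumption.
def local_average(errors, m_star):
    """Average the last m_star error signals (fewer if not yet available)."""
    window = errors[-m_star:]
    return sum(window) / len(window)

# three trials' corrected errors, as constant signals for illustration
errs = [np.full(5, 4.0), np.full(5, 2.0), np.full(5, 0.0)]
avg = local_average(errs, m_star=2)    # averages the last two trials
```

Feeding such an average into the update smooths the iteration-varying disturbance caused by randomly truncated trials.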
Considering system (1) and the corrected error (6), similarly to (8)-(11), we give the following D^αD-type learning law with local average operator. Due to the lack of iterative information, using only the tracking error of the last iteration to adjust the input slows down convergence. The local average operator makes full use of the tracking errors of multiple iterations to adjust the input, so that convergence is faster.
Theorem 3. For system (1) with Assumption 1, let (H1) and (H2) hold, and let the D^αD-type learning law (30) with local average operator be applied under varying trial lengths. Then, as the iteration number approaches infinity, the corrected tracking error ψ*_i(τ) converges to zero, i.e., lim_{i→∞} y_{i,j}(τ) = y_d(τ).

Proof. When i ≤ m*, the proof of the theorem is similar to Theorem 1.
When i > m*, we only need to analyze a few key steps; the rest of the proof is similar to Theorem 1.

I^βD-type learning law with local average operator 2
To characterize local average operator 2, we let Λ(i) denote the set of serial numbers of all trials with full trial length before the ith iteration, and num(Λ(i)) the number of elements of Λ(i).
Considering system (1) and the corrected error (6), similarly to (8)-(11), we give the following I^βD-type learning law with local average operator, where m* is any known positive integer. The design idea of this learning law is similar to that of the learning law (30), but the set Λ(·) is introduced here, which makes the learning law use only the complete iteration errors and discard the incomplete ones. The advantage of this method is a further acceleration of the convergence speed, but it requires several complete iterations as a basis.

Proof. When num(Λ(i)) ≤ m*, the proof of the theorem is similar to Theorem 2. When num(Λ(i)) > m*, we only need to analyze a few key steps; the rest of the proof is similar to Theorem 2.
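The second averaging scheme, which keeps only full-length trials, can be sketched as follows; the data shapes and the uniform average are illustrative assumptions (scalars stand in for whole error signals).

```python
# Sketch of the second averaging scheme described above: keep only the trials
# that ran to the full length G, and average the m_star most recent of those,
# discarding incomplete trials. Shapes and the uniform average are
# illustrative assumptions (scalars stand in for whole error signals).
def full_length_average(errors, lengths, G, m_star):
    """Average the last m_star errors among trials whose length equals G."""
    full = [e for e, K in zip(errors, lengths) if K == G]  # full-length trials only
    if not full:
        return None               # no complete trial available yet
    window = full[-m_star:]
    return sum(window) / len(window)

G = 10
errors = [3.0, 7.0, 5.0, 9.0]     # per-trial error magnitudes (illustrative)
lengths = [10, 6, 10, 10]         # the second trial stopped early (K = 6 < G)
avg = full_length_average(errors, lengths, G, m_star=2)
```

Discarding the truncated second trial is exactly what distinguishes this operator from the first one, at the cost of needing some complete iterations before the average is available.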
The remaining parameters of the learning laws are P = C = 2, which satisfy the conditions of Theorems 1-4. Therefore, the multi-agent system can uniformly track the target trajectory under the given learning control. Figures 2 and 3 show that the error between the output and the target trajectory gradually converges to 0. Figures 4-7 show the iterative learning process of the second state output trajectory under the D^αD-type and I^βD-type learning laws with LAO.
When the iteration number reaches 60, the consensus errors of the four learning laws are as shown in Table 1. Note that the error correction method of I^βD without LAO differs from that of I^βD with LAO (see (6) and (7)).

Conclusion
We introduced four ILC schemes for I-MAS with VTL, using the domain alignment operator to correct the tracking error. In particular, local average operators were applied to optimize the control function. Convergence results for I-MAS were established, and a numerical example was presented. In the future, on the one hand, we will consider impulses at non-fixed times and non-instantaneous impulses, since a real model is often subject to uncertain impulsive disturbances, including uncertainty in the disturbance time points and in the disturbance duration; on the other hand, we will consider different trial lengths for different agents, for which an appropriate topology must be designed to ensure the integrity of the iterative process.