A hybrid of Bayesian-based global search with Hooke-Jeeves local refinement for multi-objective optimization problems


Introduction
The Bayesian approach is one of the most active directions in the development of methods for non-convex black-box optimization [1,4,27]. Despite active research, some challenges remain unresolved; see, e.g., the review in [27]. A serious obstacle to widening the application area of Bayesian algorithms is their inner complexity. One way to reduce the complexity is a partition-based implementation, which was successful in Lipschitz optimization [12,17,19-21,32]. Indeed, the inner complexity of partition-based Bayesian algorithms for single- and multi-objective optimization was proved to be lower than that of the standardly implemented algorithms [25,29,31]. Nevertheless, the complexity of these algorithms still limits the number of evaluations of the objective functions, which would be needed for solving problems of higher dimensionality and problems with a larger number of objectives. In the present paper, a multi-objective optimization algorithm is presented that heuristically implements the ideas of the Bayesian approach without using a formal model of uncertainty about the objective functions. A version of the bi-objective selection of a site for the current computation of the objective functions is applied, following the idea proposed in [24]. Two heuristic criteria are used for the selection of a computation point: an estimate of its distance to the Pareto front and the uncertainty of that estimate. These criteria mimic the stochastic model-based assessments used in [24], which measure the exploitation and exploration character of the search strategy. The balance between exploration and exploitation can be varied rather broadly by accepting an appropriate compromise between the expectation and uncertainty criteria. However, the local refinement of approximate solutions is frequently more efficient with a local search method than with a locally biased global search method [18,22,26].
The hybridization of single-objective Bayesian global search with local optimization methods is shown to be efficient in [25,30]. In the present paper, a local search algorithm is applied for the refinement of the Pareto front approximation found by the global search algorithm: a version of the Hooke-Jeeves algorithm is adapted to multi-objective optimization. The proposed hybrid algorithm is tested under the conditions previously used to test other Bayesian algorithms [28,31] so that performance can be compared. Further experiments assess the efficiency of the proposed algorithm under conditions where the previous versions of Bayesian algorithms were not appropriate. The results are compared with those of the popular evolutionary optimization algorithms NSGA2 and NSGA3 [3,7,8] using test problems with up to 15 objective functions and up to 30 variables.

The proposed hybrid algorithm
Black-box multi-objective minimization problems are considered assuming that the feasible region (decision space) is the unit hypercube A = [0, 1]^d, to which any hyperrectangular region can be rescaled. The ranges of the objective functions are equal, and rescaling is provided by the algorithm. The algorithm consists of two alternating counterparts carried out multiple times: random global search and local refinement of the Pareto front approximation found by the global search. The Bayesian global optimization strategy would be preferable because of its rational balancing of exploration and exploitation. However, the number of iterations (computations of the objective functions) of Bayesian algorithms is considerably limited by their inner computational complexity: several hundred iterations for the standardly implemented Bayesian algorithms and several thousand for the Bayesian partition-based algorithms [27]. The idea is to mimic the search strategy of the Bayesian algorithm without using a stochastic model of the objective functions, thus avoiding the main computational burden. A randomized algorithm is proposed whose criteria for selecting a point for computing the objective functions are similar to, but much simpler than, the Bayesian approach-based criteria [24]. The reduced computational burden allows a considerably larger number of iterations; see the section on testing results. The global search phase interchanges with the local refinement phase. The approximation of the Pareto front is refined by a version of the Hooke-Jeeves algorithm adapted to multi-objective optimization. The alternation of global and local search phases is terminated by a user-defined termination condition.
The considered algorithm is initialized by the predefined number of computations of the objective functions at the random points uniformly distributed in the feasible region A.

Global search
The proposed algorithm consists of two alternating counterparts carried out multiple times: global search and local refinement. Global search runs after the initialization or after the previous global-local stage and thereafter alternates with local search. Consider the current start of global search: the set of points where the objective functions have already been computed is denoted by U_A, and the corresponding set of function values by U; the non-dominated points of U constitute the current set of Pareto optimal solutions in the objective space, P ⊆ U, and the corresponding set of Pareto optimal solutions in the decision space is P_A ⊆ U_A. Global search with a uniform distribution of function evaluation points is a worst-case optimal algorithm when the function evaluation budget is predefined and the optimized multi-objective functions are Lipschitz functions [23]. The main disadvantage of this method is that the required number of function evaluations grows exponentially with the dimensionality of the decision space. However, the worst case is very specific: the Pareto optimal set of the optimized multi-objective function consists of a single point, and the solutions are represented by a single point [23]. Most optimization problems in engineering and science are not worst-case, so strategies improving the method's performance by imitating Bayesian search can be applied [2,14,29]. Bayesian approach-based optimization algorithms select the location in the decision space for the next function evaluation based on a trade-off between the statistically assessed mean value and variance of the function [16,28]. A location with a near-optimal mean value has a chance to yield a new optimal point even in the case of a small improvement of the function value. Conversely, a location with a high variance has a chance to yield a new optimal point in the case of a large improvement of the function value.
However, the computational burden of selecting the location in the decision space for the next function evaluation limits the application area of Bayesian multi-objective optimization algorithms. To cope with this burden, an implementation based on a rectangular partition of the feasible region was proposed in [29,31]. The vertex count of a hyperrectangle doubles with every new dimension. The algorithm iteratively subdivides hyperrectangles by bisection: a hyperrectangle is bisected by a hyperplane orthogonal to its longest edges, and the functions are evaluated at the intersection points, whose number is only half the hyperrectangle vertex count. Even so, the rectangular partition-based Bayesian algorithm is not competitive for optimization problems with a high-dimensional decision space.
Strategies to mimic Bayesian search should combine simplicity with numerical substantiation. The exploration-exploitation strategy below combines both properties. Let new random points U_A^new = {x_i ∈ A | i = 1, ..., q·N} (q is a parameter value) be uniformly generated in the feasible region A, and for every point let two bi-objective selection functions be calculated: θ_1(x_i) is the Euclidean distance to the closest known point x_min ∈ U_A, θ_1(x_i) = min_{x ∈ U_A} ||x_i − x|| = ||x_i − x_min||, (1) and θ_2(x_i) is the Euclidean distance from the closest known point's function evaluation vector F(x_min) ∈ U to the closest current Pareto optimal solution F_min ∈ P. When the closest known point's function evaluation vector F(x_min) ∈ U is itself a non-dominated Pareto optimal solution, F(x_min) ∈ P ⊆ U, the function θ_2(x_i) has zero value. Note that the θ_2(x_i) value is calculated using min-max normalized function values, where an ideal point F* = (f*_1, ..., f*_m)^T and a nadir point F^nad = (f^nad_1, ..., f^nad_m)^T are taken as the minimum and maximum function values [3]. The normalized function values can be expressed by the following formula: f̄_i(x) = (f_i(x) − f*_i) / (f^nad_i − f*_i), i = 1, ..., m. (2) The value θ_1(x_i) can be viewed as the radius of a hypersphere with center point x_i ∈ U_A^new containing no function evaluations. When θ_1(x_i) is large, a large unexplored hypersphere has been found, and a new function evaluation at its center (point x_i) is reasonable; this corresponds to the exploration strategy. On the other hand, the value θ_2(x_i) can be viewed as the minimal distance of the nearest-neighbor interpolation of the optimized function F(x) at point x_i to the current Pareto optimal solutions. When θ_2(x_i) is small, the closest known point x_min ∈ U_A of the generated point x_i ∈ U_A^new has a near-optimal function value F(x_min) ∈ U, so a new function evaluation at point x_i is reasonable; this corresponds to the exploitation strategy.
The exploration-exploitation strategy is achieved by making new function evaluations at points x_i ∈ U_A^new having minimal trade-off Pareto optimal values of the selection functions (−θ_1(x_i), θ_2(x_i))^T. The function θ_1(x_i) has a negative sign because it is maximized. After the new function evaluations are made, the sets U, U_A, P, P_A are updated and reused in the next phases or iterations of the algorithm. Reuse of search information has been shown to be efficient for multi-objective optimization problems [11].
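The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function and variable names (select_candidates, theta1, theta2) are illustrative, and the normalization follows Eq. (2).

```python
# Sketch of the global-search candidate selection: generate q*N random points,
# score each with (theta1, theta2), keep the Pareto-optimal trade-offs of
# (-theta1, theta2). Names are illustrative, not from the paper.
import math
import random

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def normalize(F, ideal, nadir):
    # Min-max normalization with ideal/nadir points, as in Eq. (2).
    return [(f - lo) / (hi - lo) for f, lo, hi in zip(F, ideal, nadir)]

def select_candidates(U_A, U, P, ideal, nadir, d, q_N):
    """Return candidate points with Pareto-optimal (-theta1, theta2) values."""
    cands = [[random.random() for _ in range(d)] for _ in range(q_N)]
    scored = []
    for x in cands:
        j = min(range(len(U_A)), key=lambda k: dist(x, U_A[k]))  # closest known point
        theta1 = dist(x, U_A[j])                                 # exploration criterion
        Fj = normalize(U[j], ideal, nadir)
        theta2 = min(dist(Fj, normalize(Fp, ideal, nadir)) for Fp in P)  # exploitation
        scored.append((x, -theta1, theta2))
    # keep points whose (-theta1, theta2) vector is non-dominated (both minimized)
    selected = [s for s in scored
                if not any(o[1] <= s[1] and o[2] <= s[2] and o != s for o in scored)]
    return [x for x, _, _ in selected]
```

Note that theta2 becomes zero exactly when the nearest evaluated point is itself Pareto optimal, matching the description in the text.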
To induce clustering of function evaluations near Pareto optimal solutions, the global search has two random point generation modes: global uniform generation in the entire feasible region A and local uniform generation near the current Pareto optimal solutions. For every Pareto optimal solution x_P ∈ P_A, a series of small hypercubes A_1, ..., A_n with center point x_P is created, and the above-described procedure is applied to a particular subregion of the feasible region, A_n ⊂ ··· ⊂ A_1 ⊂ A. The hypercube A_i has a double-sized edge compared to the next hypercube A_{i+1}, and the shrinking of hypercubes stops when there are no known points in A_{i+1} except the center point x_P ∈ P_A.
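The nested-hypercube construction can be sketched as below. The helper names and the initial edge size of 0.4 are illustrative assumptions (the pseudo-code in Algorithm 1 grows the initial cube in increments of 0.2); the halving and the stopping rule follow the description above.

```python
# Sketch of the shrinking-hypercube local generation around a Pareto optimal
# point x_P: halve the cube edge until no other known point remains inside.
import random

def in_cube(x, center, edge):
    return all(abs(xi - ci) <= edge / 2 for xi, ci in zip(x, center))

def local_cubes(x_P, U_A, edge0=0.4):
    """Return the nested cube edges A_1 ⊃ A_2 ⊃ ... centered at x_P."""
    edges = [edge0]
    while True:
        nxt = edges[-1] / 2.0
        others = [x for x in U_A if x != x_P and in_cube(x, x_P, nxt)]
        edges.append(nxt)
        if not others:          # A_{i+1} holds no known point except x_P: stop
            return edges

def sample_in_cube(x_P, edge, n):
    # Uniform samples in the cube centered at x_P, clipped to the unit hypercube.
    return [[min(1.0, max(0.0, ci + (random.random() - 0.5) * edge)) for ci in x_P]
            for _ in range(n)]
```

The bi-objective selection procedure of the global search is then applied inside each subregion rather than in the whole of A.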

Local refinement
The proposed exploration-exploitation global search strategy lacks effective local refinement since randomly generated points do not follow an improvement direction. By contrast, the Hooke-Jeeves single-objective optimization algorithm searches for an improvement direction by taking a step in every decision variable [13]. A multi-objective optimization problem can be reduced to a single-objective one by scalarization methods such as Chebyshev scalarization [15]. Chebyshev scalarization requires a weight vector to obtain a single optimal trade-off solution; to approximate the Pareto optimal front, multiple uniformly generated weight vectors should be used. Although the weight vectors are generated uniformly, the resulting Pareto front approximation may be non-uniformly distributed, and human expert intervention may be needed to generate additional weight vectors for a uniformly distributed Pareto front approximation [2,14]. To avoid complicated weight vector selection, a weight-free approach to multi-objective optimization using a modified version of Hooke-Jeeves direct search was presented in [5]. In this paper, a novel conversion of a multi-objective optimization problem to a single-objective one without weight vectors is suggested.
The multi-objective function is converted to a single-objective surrogate function f_s(x), x ∈ A, which is associated with a current solution: a decision vector x_cur and the multi-objective function evaluation vector F(x_cur). Initially, the surrogate function has a predefined value. For every current Pareto optimal solution x_P ∈ P_A, a separate surrogate function with the Pareto optimal solution as its current solution, x_cur = x_P, is defined. Every defined surrogate function is optimized using the Hooke-Jeeves optimization algorithm with x_P as the start point. If the surrogate function value at the minimal point is negative, f_s(x_min) < 0, x_min = arg min_{x∈A} f_s(x), then the value of the original multi-objective function at the newly found solution, F(x_min), dominates the value at the Hooke-Jeeves start point, F(x_P); i.e., the newly found value F(x_min) is located closer to the true Pareto front than the initial value F(x_P). The function evaluation F(x_min) at the solution found by surrogate minimization lies in the area bounded by the true Pareto front and the hyperrectangle defined by the points F(x_P) and the ideal point of minimal function values F* = (f*_1, ..., f*_m)^T. An example of the bi-objective case is shown in Fig. 1.
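The local refinement step can be sketched as follows. The exact surrogate formula is not reproduced in this excerpt, so the choice below, f_s(x) = max_i (f_i(x) − f_i(x_cur)), is an assumption: it is one weight-free surrogate with the stated property that f_s(x) < 0 exactly when F(x) dominates F(x_cur) componentwise. The pattern-search routine is the classical Hooke-Jeeves method with a fixed step-size schedule.

```python
# Sketch of the weight-free local refinement. The surrogate form below is an
# assumption; the paper's exact surrogate may differ.
def surrogate(F, x_cur):
    F_cur = F(x_cur)
    # f_s(x) < 0 iff every objective strictly improves on F(x_cur)
    return lambda x: max(fi - fci for fi, fci in zip(F(x), F_cur))

def hooke_jeeves(f, x0, steps, lo=0.0, hi=1.0):
    """Classical Hooke-Jeeves pattern search on [lo, hi]^d."""
    def explore(base, fb, h):
        x, fx = list(base), fb
        for i in range(len(x)):
            for delta in (h, -h):
                t = list(x)
                t[i] = min(hi, max(lo, t[i] + delta))
                ft = f(t)
                if ft < fx:
                    x, fx = t, ft
                    break
        return x, fx

    x, fx = list(x0), f(x0)
    for h in steps:                      # e.g. 0.8*2^-h_0, ..., 0.8*2^-h_n
        while True:
            y, fy = explore(x, fx, h)
            if fy >= fx:
                break                    # no improvement at this step size
            # pattern move: extrapolate along the improving direction
            z = [min(hi, max(lo, 2 * yi - xi)) for yi, xi in zip(y, x)]
            fz = f(z)
            x, fx = (z, fz) if fz < fy else (y, fy)
    return x, fx
```

Minimizing the surrogate from the start point x_P then yields the dominance test described above: a negative optimum signals a solution closer to the true Pareto front.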

Implementation and pseudo-code
The basic concepts of the proposed optimization algorithm are presented in the previous subsections, but a more detailed explanation of the whole picture is still needed. Therefore, the pseudo-code of the proposed optimization algorithm is given in Algorithm 1. List of the notations used:
6. P_A - points in the feasible region A = [0, 1]^d of the current non-dominated Pareto optimal solutions.
7. P - function evaluations of the current non-dominated Pareto optimal solutions.
8. N_max - maximum allowed number of function F(·) evaluations.
9. I_max - maximum number of iterations of global search with local refinement.
10. N - number of initial random solutions.
11. q·N - number of randomly generated candidate points in the decision space for new function evaluations.
12. p - the part p ∈ [0, 1] of function evaluations obtained by local generation near the current Pareto optimal solutions, compared to function evaluations obtained by global generation in the entire feasible region A.
13. h_0 and h_n - step size parameters of the Hooke-Jeeves optimization algorithm.
The full set of step sizes from the largest to the smallest is calculated using the expression 0.8·2^{-h}, h = h_0, h_0 + 1, ..., h_n, where the largest step size is 0.8·2^{-h_0} and the smallest is 0.8·2^{-h_n}.
14. Update - a Boolean parameter; if it is set to true, the step size parameters h_0 and h_n are updated at the second and following iterations of global search with local refinement. For every Pareto optimal solution x_P ∈ P_A, the value h_0 is updated so that the largest step size is approximately equal to the minimal Euclidean distance to another Pareto optimal solution. The value h_n is also updated so that the smallest step size is smaller than the largest: h_n = max{h_0 + 2, h_n}.
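The step-size schedule and the h_0 update can be sketched as below. The exact integer rounding rule for h_0 is an assumption (here, the nearest integer solving 0.8·2^{-h_0} ≈ d_min); the schedule itself follows the expression above.

```python
# Step-size schedule and h_0 update for the Hooke-Jeeves phase.
import math

def step_sizes(h0, hn):
    """Step sizes 0.8*2^-h0 (largest) down to 0.8*2^-hn (smallest)."""
    return [0.8 * 2.0 ** (-h) for h in range(h0, hn + 1)]

def update_h0(x_P, pareto_points, h0_default=2):
    """Choose h0 so the largest step approximates the distance to the nearest
    other Pareto optimal solution; the rounding rule is an assumption."""
    others = [p for p in pareto_points if p != x_P]
    if not others:
        return h0_default
    dmin = min(math.dist(x_P, p) for p in others)
    # solve 0.8 * 2^-h0 ≈ dmin for an integer h0 >= 0
    return max(0, round(math.log2(0.8 / dmin)))
```

The update h_n = max{h_0 + 2, h_n} then guarantees at least three step sizes below the largest one.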
Some remarks explaining the pseudo-code of the proposed optimization algorithm follow. At the first iteration of the algorithm (I = 1), the local refinement phase optimizes not only the surrogate function but also every single objective function f_i(·) ∈ F(·) of the optimized multi-objective function, to obtain the limit solutions of the Pareto front approximation. Solutions that have already been found by surrogate function or single-criterion optimization are not used to define and optimize new surrogate functions, because the surrogate function value at such a point is already near-optimal. Instead, the non-dominated Pareto optimal solutions found by the global search phase are used to define and optimize new surrogate functions, thereby inducing search in unexplored areas to find new Pareto optimal solutions. The step size parameters h_0 and h_n are updated (the parameter Update must be set to true) so that the largest step size of the Hooke-Jeeves optimization algorithm is approximately equal to the minimal Euclidean distance to another Pareto optimal solution. In the case of a small distance to another Pareto optimal solution, only small adjustments are needed, so small step sizes should be used. In the case of a large distance, a new distantly located solution has been found, so a more extensive local search is needed during surrogate function optimization, and initially large step sizes should be used. Note that the global search phase reduces to a simple uniform distribution of function evaluation points when the following parameters are selected: q = 1/N, p = 0.
Algorithm 1. The pseudo-code of the proposed optimization algorithm.
1: procedure OPTIMIZE(F)
2:   U ← function evaluations F(x) of randomly generated points x ∈ U_A,
4:   P_A ← points in the feasible region A = [0, 1]^d of current non-dominated Pareto optimal solutions,
5:   P ← function evaluations of current non-dominated Pareto optimal solutions,
6:   HJ ← set of solutions found by the Hooke-Jeeves algorithm, I ← 0 iteration count,
7:   while |U_A| ≤ N_max and I ≤ I_max do
9:     if p > 0 then
10:      P_old ← P_A,
11:      for x_P ∈ P_old do
12:        A_1 ← x_P-centered hypercube with edge size 0.2,
13:        while A_1 has inside only one point x_P do
14:          Increase the edge size of hypercube A_1 by 0.2,
15:        end while
16:        while A_i has inside more than one point x_P and edge size ≥ 2^{-h_n} do
18:          Update the sets U, U_A, P, P_A with new function evaluations,
21:        end while
23:      end for
24:    end if
25:    while the part (1 − p) of function evaluations obtained by generation in the entire feasible region A is not reached do
26:      Update the sets U, U_A, P, P_A with new function evaluations,
29:    end while
30:    P_old ← P_A \ HJ,
31:    for x_P ∈ P_old do
32:      Define the surrogate function f_s(x) with current solution x_cur ← x_P,
33:      if I > 1 and Update ≡ true then
34:        Update the step size parameters h_0 and h_n of the Hooke-Jeeves algorithm,
35:      end if
36:      Optimize x_min = arg min_{x∈A} f_s(x) using Hooke-Jeeves with x_P as the start point,
37:      Complement the set of solutions with the new solution HJ ← HJ ∪ {x_min},
38:      Update the sets U, U_A, P, P_A with new function evaluations,
39:    end for
40:    if I ≡ 1 then
41:      Find the solution with the minimal value of the current function x_P = arg min_{x∈P_old} f_i(x),
44:      Optimize
The global search phase interchanges with the local refinement phase multiple times. The algorithm runs until the maximum allowed number of function F(·) evaluations or the maximum number of iterations of global search with local refinement is reached. The algorithm returns the set of all function evaluations and the set of non-dominated Pareto optimal solutions.

Performance analysis
This section first presents a theoretical convergence analysis of the proposed algorithm; then the performance of the algorithm is evaluated by numerical experiments. Many optimization problems in engineering and science have a very limited function evaluation budget, so the proposed algorithm was tested with an extremely low function evaluation budget, and the results were compared with those of Bayesian-rooted optimization algorithms [31]. On the other hand, good performance for a high-dimensional feasible region is a desirable feature, so the algorithm was tested on the ZDT test suite with many decision variables (up to 30) [33]. Finally, the DTLZ test suite was used for many-objective (up to 15) optimization problems [9]. The results of the last two cases were compared with those of the popular evolutionary optimization algorithms NSGA2 and NSGA3 [3,7,8].

Convergence analysis
To prove that the function evaluation points of the proposed algorithm are everywhere dense in the decision space, let the part of function evaluations obtained by generation in the entire feasible region A have a non-zero value (1 − p > 0), and let the number of random sample points generated inside the unit hypercube A satisfy q·N ≥ 2. Let the maximum allowed number of function evaluations and the maximum number of iterations of global search with local refinement approach infinity (N_max → ∞, I_max → ∞); then the set of function evaluation points U_A is everywhere dense in the decision space A.
Suppose the opposite: there remains some hypersphere S containing no points from the set U_A of function evaluation points. Let the radius of the hypersphere S be ε > 0, with center at a point x_S ∈ A in the feasible region. Define a hypersphere S_0 with radius ε/3 and center at the same point x_S; this newly defined hypersphere lies inside S, S_0 ⊂ S. Define another hypersphere S_1 with radius ε/3 and center at a point x_S1 ∈ U_A, i.e., a function evaluation point in the decision space. During every iteration of the proposed algorithm, q·N random sample points are uniformly generated inside the hypercube A. Consider the event in which one of the random sample points x_rand hits inside the hypersphere S_0, so the Euclidean distance between x_S and x_rand is less than the radius of S_0, i.e., ||x_S − x_rand|| < ε/3, and the remaining generated random points {x_i | i = 1, ..., q·N − 1} hit inside the hypersphere S_1, so ||x_S1 − x_i|| < ε/3, i = 1, ..., q·N − 1. Such an event has a non-zero probability, which can be calculated as follows. Denote the volumes of the parts of the hyperspheres S_0 and S_1 inside the feasible region A by V_S0 and V_S1; these volumes are non-zero, V_S0 > 0, V_S1 > 0, as is the volume V_A = 1 of the unit hypercube of the feasible region A. Then the probability of the defined event has a non-zero constant value C > 0. This means the defined event will almost surely occur as the maximum number of iterations of global search with local refinement approaches infinity, I_max → ∞, so that the number of independent trials approaches infinity.
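The constant C can be written out explicitly; the following is a reconstruction under the stated assumptions (V_A = 1, and S_0, S_1 disjoint since x_S1 lies outside S), where the factor q·N counts which of the generated points lands in S_0:

```latex
C \;=\; qN \,\frac{V_{S_0}}{V_A}\left(\frac{V_{S_1}}{V_A}\right)^{qN-1}
  \;=\; qN\, V_{S_0}\, V_{S_1}^{\,qN-1} \;>\; 0,
\qquad
\Pr\bigl[\text{event never occurs in } I \text{ iterations}\bigr]
  \;\le\; (1-C)^{I} \;\xrightarrow[I \to \infty]{} 0.
```

Since the iterations are independent trials with a fixed success probability at least C, the event occurs almost surely as I_max → ∞.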
Suppose the defined event occurred at iteration I_1. The selection function values (−θ_1(x_i), θ_2(x_i))^T, i = 1, ..., q·N − 1, and (−θ_1(x_rand), θ_2(x_rand))^T are calculated, and the points with minimal trade-off Pareto optimal values are taken as evaluation points of the optimized function F(·). The selection function θ_1(x) is the generated point's Euclidean distance to the closest known point x_min ∈ U_A (as defined by Eq. (1)). Therefore the following inequalities hold: θ_1(x_rand) > 2ε/3 and θ_1(x_i) < ε/3, i = 1, ..., q·N − 1, so the selection function θ_1(x_rand) at the point x_rand has the largest value. Consequently, the vector of selection function values (−θ_1(x_rand), θ_2(x_rand))^T at the point x_rand belongs to the minimal trade-off Pareto optimal values of the selection functions, so the optimized function F(·) is evaluated at x_rand, and a new point is added to the set of function evaluation points, U_A ← U_A ∪ {x_rand}. Thus a function evaluation point x_rand ∈ U_A is added to the hypersphere S_0, and hence to S since S_0 ⊂ S, which contradicts the statement that there remains some hypersphere S containing no points from U_A. Therefore the points of U_A are everywhere dense in the decision space A.
To prove the convergence of the proposed algorithm, let the optimized multi-objective function F(x) = (f_1(x), ..., f_m(x))^T, x ∈ A, consist of Lipschitz-continuous functions f_1(x), ..., f_m(x), where A is the unit hypercube of the feasible region. Let the part of function evaluations obtained by generation in the entire feasible region A have a non-zero value (1 − p > 0), let the number of random sample points generated inside the unit hypercube A satisfy q·N ≥ 2, and let the maximum allowed number of function evaluations and the maximum number of iterations of global search with local refinement approach infinity, N_max → ∞, I_max → ∞. Then for any point on the true Pareto front, x* ∈ A, F(x*), the proposed algorithm will find a solution x_P ∈ U_A, F(x_P), with any small error ε > 0, measured as the Euclidean distance between the found solution and the point on the true Pareto front, ||x* − x_P|| < ε.
Suppose the opposite: there is a point on the true Pareto front, x* ∈ A, F(x*), for which the proposed algorithm finds no solution within the error ε. This means there is a hypersphere S with radius ε and center at the point x* containing no points from the set U_A of function evaluation points. But we have already proved that the points of U_A are everywhere dense in the decision space A. Hence there is a solution x_P ∈ U_A, F(x_P), inside the hypersphere S, and consequently the inequality ||x* − x_P|| < ε holds, which contradicts the supposition. Therefore, for any point on the true Pareto front, x* ∈ A, F(x*), the proposed algorithm will find a solution x_P ∈ U_A, F(x_P), with any small error ε > 0.
The investigation of the rate of convergence is more complex. Instead, the performance of the proposed algorithm is evaluated by the numerical experiments presented in the next subsections.

Numerical experiments with a low function evaluation budget
For experiments with a low function evaluation budget, the results of the proposed algorithm were compared with the results of Bayesian-rooted optimization algorithms: the standard and partition-based implementations of the P-algorithm [31]. Several test problems were solved in [31] to illustrate the performance of Bayesian-rooted optimization algorithms; the same test problems are used here.
Frequently used multi-objective non-convex test problems were proposed in [10]. The objective functions of this problem (3) are defined as in [10], where d = 2, and the feasible region is A: −4 ≤ x_1, x_2 ≤ 4. The next bi-objective test problem (4) consists of two Shekel functions frequently used to evaluate global optimization algorithms [16], where the feasible region is A: 0 ≤ x_1, x_2 ≤ 1.
For comparing the performance of the proposed algorithm with a low function evaluation budget, the metrics applied are those used in a recent publication on Bayesian-rooted optimization algorithms [31]. One of the simplest and most popular performance metrics is the number of non-dominated solutions found by an optimization algorithm (NN). An estimate of the distance between the found approximation and the true Pareto front is the so-called generational distance (GD) [6]; here GD is computed as the maximum of the distances between the found non-dominated solutions and their closest neighbors from the Pareto front. A metric integrating measures of approximation precision and spread is the so-called epsilon indicator (EI) [34]; EI is computed as the maximum of the distances between the points of the true Pareto front and their closest neighbors among the found non-dominated solutions. The metrics GD and EI can be expressed by the following equations: GD = max_{F ∈ P} min_{F* ∈ P*} ||F − F*||, EI = max_{F* ∈ P*} min_{F ∈ P} ||F − F*||, where P is the set of non-dominated solutions found by the considered algorithm, and P* is a set of solutions well representing the Pareto front, i.e., solutions sufficiently densely and uniformly distributed over the Pareto front.
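The two metrics above, as max-min distances between finite point sets in the objective space, can be sketched as:

```python
# GD and EI as described above: maxima of nearest-neighbor distances between
# the found non-dominated set P and a reference representation P* of the front.
import math

def _nearest(F, S):
    return min(math.dist(F, G) for G in S)

def gd_max(P, P_star):
    """Generational distance: max over found solutions of the distance to P*."""
    return max(_nearest(F, P_star) for F in P)

def ei_max(P, P_star):
    """Epsilon indicator: max over P* of the distance to the found solutions."""
    return max(_nearest(G, P) for G in P_star)
```

Note the asymmetry: GD penalizes found points far from the front, while EI additionally penalizes parts of the front left uncovered by the found set.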
The proposed algorithm was tested with a budget of 100 function F(x) evaluations and was run with the following parameters. The parameter N_max is selected 10% below the function evaluation budget, since the stopping condition is checked once per iteration, after the global search and local refinement phases are finished; on average, the budget of 100 evaluations is not exceeded, although some runs make more and some fewer function evaluations. The same N_max selection strategy is used throughout this paper. The number of iterations I_max is selected as a maximum number that cannot be reached, so it can be denoted as infinite (I_max = ∞). Further parameters are N and q, where N = 20 is the number of initial random function evaluations, and q = 10000 is the coefficient in the expression q·N giving the number of randomly generated candidate points in the decision space for new function evaluations. The number of randomly generated candidate points should be high in the case of a low function evaluation budget, hence the high value of q. The value p = 0.8 is the part of function evaluations obtained by local generation near the current Pareto optimal solutions, compared to function evaluations obtained by global generation in the entire feasible region A. The step size parameters of the Hooke-Jeeves optimization algorithm have the values h_0 = 2 and h_n = 4. The parameter Update is set to true, so the step size parameters h_0 and h_n are updated at the second and following iterations of global search with local refinement. The same parameters were used for both test problems, except that h_0 = 4 for problem (4).
Since the proposed algorithm and the P-algorithm are stochastic, the test problems were solved 100 times [16,28]. The mean values and standard deviations of the considered metrics are presented in two columns of Table 1. By contrast, the hyperrectangle partition-based P-algorithm is deterministic, so its results occupy a single column for each test problem [31]. The proposed algorithm shows decent performance. For problem (3), the proposed algorithm gives better NN and EI values than the P-algorithm. For problem (4), the proposed algorithm gives the best NN value and better GD and EI values than the hyperrectangle partition-based P-algorithm.
For visualization of the proposed algorithm's operation, the solutions of problems (3) and (4) having the best EI metric value are presented in Fig. 2. (Table 1 reports the mean values and standard deviations of the performance criteria NN, GD, and EI of the P-algorithm, the partition-based implementation of the P-algorithm, and the proposed algorithm for test problems (3) and (4).) The Pareto front is well approximated by the Pareto optimal solutions found by the proposed algorithm in both test problems (Figs. 2(a) and 2(c)). Function evaluation points made by the global search tend to explore areas with no function evaluations (triangle points) and to cluster new function evaluation points near the current Pareto optimal solutions (circle points) (Figs. 2(b) and 2(d)). Function evaluation points made by local refinement (square points) are well clustered near the line of Pareto optimal solutions in the decision space (Figs. 2(b) and 2(d)).

Numerical experiments on test problems with many decision variables
For a high-dimensional feasible region, the results of the proposed algorithm were compared with those of the evolutionary optimization algorithm NSGA2 [3,8]. The performance was tested with the bi-objective test suite ZDT, which has many decision variables: the bi-objective test problems ZDT1, ZDT2, ZDT3 have 30 decision variables, and ZDT4, ZDT6 have 10 decision variables [33].
For comparing the performance of the proposed algorithm in the case of a high-dimensional feasible region, the metrics applied are those used in publications on the evolutionary optimization algorithm NSGA2 [3,8]. An estimate of the average distance between the found approximation and the true Pareto front is the average generational distance (GD_avg) [8], computed as the average of the distances between the found non-dominated solutions and their closest neighbors from the Pareto front. A metric integrating measures of approximation precision and spread is the so-called inverted generational distance (IGD_avg) [3], computed as the average of the distances between the points of the true Pareto front and their closest neighbors among the found non-dominated solutions. The metrics GD_avg and IGD_avg can be expressed by the following equations: GD_avg = (1/|P|) Σ_{F ∈ P} min_{F* ∈ P*} ||F − F*||, IGD_avg = (1/|P*|) Σ_{F* ∈ P*} min_{F ∈ P} ||F − F*||, where P is the set of non-dominated solutions found by the considered algorithm, and P* is a set of solutions well representing the Pareto front, i.e., sufficiently densely and uniformly distributed over it; here |P*| = 500 uniformly distributed points were used. The proposed algorithm was tested with a fixed function F(x) evaluation budget. The parameter N_max is selected 10% below the function evaluation budget, so on average the budget is not exceeded. The proposed algorithm was run with the following parameters. The number of iterations I_max = ∞ is selected as a maximum number that cannot be reached. The number of initial random function evaluations is N = 100, and the coefficient of the number q·N of randomly generated candidate points in the decision space for new function evaluations is q = 1.
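The averaged metrics differ from GD and EI of the previous subsection only in replacing maxima with means; a sketch:

```python
# Averaged variants used in this subsection: means of nearest-neighbor
# distances instead of maxima.
import math

def gd_avg(P, P_star):
    """Average distance from each found solution to its closest point in P*."""
    return sum(min(math.dist(F, G) for G in P_star) for F in P) / len(P)

def igd_avg(P, P_star):
    """Average distance from each point of P* to its closest found solution."""
    return sum(min(math.dist(G, F) for F in P) for G in P_star) / len(P_star)
```

For IGD_avg the objective values would first be min-max normalized with the ideal and nadir points, as noted below; the normalization step is omitted here for brevity.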
The number of randomly generated candidate points should not be large when the function evaluation budget is large, as this would put a heavy computational burden on the proposed algorithm; therefore a low value of the coefficient q is selected. The value p = 0.8 is the fraction of function evaluations obtained by local generation of candidate points near the current Pareto-optimal solutions, as opposed to generation over the entire feasible region. The step size parameters of the Hooke-Jeeves optimization algorithm have the values h_0 = 2 and h_n = 8. The step size parameters h_0 and h_n are updated at the second and subsequent iterations of global search with local refinement, as the parameter Update is set to true. The same parameters were used for all ZDT test problems. Since both the proposed algorithm and the evolutionary optimization algorithm NSGA2 are stochastic, the test problems were solved multiple times [3,8]. The mean values and variances or standard deviations of the considered metrics are presented in Tables 2 and 3. Note that in this subsection the function values are normalized using the ideal and nadir points when the IGD_avg metric is calculated; see Eq. (2). Table 2 shows the GD_avg metric values of the proposed algorithm compared with those of the evolutionary optimization algorithm NSGA2 after 10 experiments with a budget of 25000 function evaluations [8]. Table 3 shows the IGD_avg metric values of the proposed algorithm compared with those of NSGA2 after 51 experiments [3].
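The mixed local/global generation of candidate points controlled by the parameters N, q and p can be sketched as follows. The function name, the Gaussian perturbation, and the `radius` parameter are illustrative assumptions, since the paper does not specify the local sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(pareto_x, lb, ub, N=100, q=1.0, p=0.8, radius=0.05):
    """Generate q*N candidate points in the decision space: a fraction p
    by local perturbation of current Pareto-optimal decision vectors,
    the rest uniformly over the feasible box [lb, ub]."""
    pareto_x = np.asarray(pareto_x, float)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n_total = int(q * N)
    n_local = int(p * n_total)
    # local candidates: Gaussian perturbations of randomly chosen
    # Pareto-optimal solutions, clipped back into the feasible box
    base = pareto_x[rng.integers(len(pareto_x), size=n_local)]
    local = base + rng.normal(scale=radius * (ub - lb), size=base.shape)
    local = np.clip(local, lb, ub)
    # global candidates: uniform sampling over the whole feasible region
    glob = rng.uniform(lb, ub, size=(n_total - n_local, len(lb)))
    return np.vstack([local, glob])
```

With the paper's settings (N = 100, q = 1, p = 0.8), each batch consists of 80 locally generated and 20 globally generated candidate points.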
The proposed algorithm shows good performance. According to the GD_avg metric (Table 2), the proposed algorithm outperforms the evolutionary optimization algorithm NSGA2: it has a lower GD_avg value on all test problems except ZDT2, where the binary-coded NSGA2 performs better. According to the IGD_avg metric (Table 3), the proposed algorithm performs better than NSGA2 on the ZDT1, ZDT3 and ZDT6 test problems, and worse on the ZDT2 and ZDT4 test problems.

Numerical experiments using many-objective test problems
Many-objective optimization problems with up to 15 objective functions (M ≤ 15) are considered. Representing the Pareto-optimal surface of a many-objective optimization problem is usually difficult, since the number of representation points grows exponentially with each additional objective function. One option is to specify multiple predefined reference points and to find the Pareto-optimal trade-off solutions corresponding to each reference point, as is done in the genetic algorithm NSGA3 [7]. However, for many-objective optimization problems an adequate approximation of the trade-off Pareto-optimal surface may not be found using a few hundred reference points [7], and the selection of reference points may be tricky when trade-off solutions with specific properties are sought; therefore, the newly developed optimization algorithm does not require any predefined reference points. The newly developed optimization algorithm was tested on the many-objective DTLZ test suite with up to 15 objective functions (M ≤ 15) [9]. The number of variables is M + k − 1, where k = 5 for DTLZ1 and k = 10 for the DTLZ2, DTLZ3 and DTLZ4 test problems. The results were compared with those of the popular evolutionary optimization algorithm NSGA3 [7].
The proposed algorithm was tuned to find raw solutions, since with a moderate function evaluation budget an adequate representation of the Pareto-optimal surface of a many-objective optimization problem is usually impossible. The parameter N_max is selected to be very large (effectively N_max = ∞), so the number of iterations I_max is specified as the stopping condition in Table 4. The proposed algorithm was run with the following parameters: (N, q, p, h_0, h_n, Update)^T = (100, 1, 0.8, 2, 6, false)^T.
The number of initial random function evaluations is N = 100, and the coefficient of the number q · N of candidate points randomly generated in the decision space for new function evaluations is q = 1. The value p = 0.8 is the fraction of function evaluations obtained by local generation of candidate points near the current Pareto-optimal solutions, as opposed to generation over the entire feasible region. The step size parameters of the Hooke-Jeeves optimization algorithm have the values h_0 = 2 and h_n = 6. Since only raw solutions are sought, and updating the step size parameters h_0 and h_n could drastically increase the number of function evaluations, the parameters are not updated at the second and subsequent iterations: the parameter Update is set to false.
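The Hooke-Jeeves pattern search used for local refinement alternates exploratory moves along the coordinate directions with pattern moves along the improving direction, shrinking the step size when no move improves. The following is a minimal sketch of the classical scheme, not the paper's exact variant; the halving factor `shrink` and the termination threshold `h_min` are illustrative assumptions:

```python
import numpy as np

def hooke_jeeves(f, x0, h0=2.0, h_min=1e-6, shrink=0.5, max_iter=1000):
    """Minimize a scalar function f by a simplified Hooke-Jeeves
    pattern search starting from x0 with initial step size h0."""
    x = np.asarray(x0, float)
    fx = f(x)
    h = h0
    for _ in range(max_iter):
        if h < h_min:
            break
        # exploratory move: probe +h and -h along each coordinate,
        # keeping the first improving step per coordinate
        xt, ft = x.copy(), fx
        for i in range(len(x)):
            for step in (h, -h):
                trial = xt.copy()
                trial[i] += step
                f_trial = f(trial)
                if f_trial < ft:
                    xt, ft = trial, f_trial
                    break
        if ft < fx:
            # pattern move: extrapolate along the improving direction
            x_new = xt + (xt - x)
            f_new = f(x_new)
            if f_new < ft:
                x, fx = x_new, f_new
            else:
                x, fx = xt, ft
        else:
            h *= shrink  # no improvement: reduce the step size
    return x, fx
```

In the paper's setting f is the surrogate scalarization of the multi-objective problem, and the start point is a raw solution from the global search phase.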
After the raw trade-off solutions {P, P_A} are found, random solutions {P', P'_A} are selected to be tuned by surrogate function optimization using the Hooke-Jeeves algorithm. Table 5 shows how many raw solutions are taken for tuning by surrogate function optimization. For every random solution x_P ∈ P'_A, a separate surrogate function with the current solution x_cur = x_P is defined. Each such surrogate function is optimized by the Hooke-Jeeves optimization algorithm taking x_P as the starting point, with step size parameters h_0 = 2 and h_n = 12.

[Table 4. Mean values of the function evaluation count and the best, median and worst IGD_ref values obtained by the proposed algorithm and NSGA3 on the M-objective DTLZ test problems; the number of iterations I_max is specified as the stopping condition of the proposed algorithm. Paired columns: NSGA3 / Proposed algorithm.]

All test problems (Table 4) were optimized with the same parameters, except that for DTLZ2 and DTLZ4 the step size parameters of the proposed algorithm were set to h_0 = 2 and h_n = 4, and the step size parameters of the raw-solution tuning phase were set to h_0 = 2 and h_n = 6; these larger step sizes keep the function evaluation budget of a similar size to that of NSGA3. Also, for the M-objective DTLZ4 test problems with M = 3 and M = 5 objective functions, the number of initial random function evaluations was set to N = 3000 and the coefficient of the number q · N of randomly generated candidate points was set to q = 0.03. In this way enough raw solutions were found; the required numbers of raw solutions are given in Table 5.
To compare the performance of the proposed algorithm on many-objective optimization problems, the metric used in publications on the evolutionary optimization algorithm NSGA3 was applied [7]. It is similar to the inverted generational distance IGD_avg defined by Eq. (5), but instead of averaging the distances between the points representing the true Pareto front and their closest neighbors among the found non-dominated solutions, the metric IGD_ref averages the distances between reference points on the true Pareto front and their closest neighbors among the found non-dominated solutions:

\[
\mathrm{IGD}_{\mathrm{ref}} = \frac{1}{|P^*|} \sum_{\mathbf{x}^* \in P^*} \min_{\mathbf{x} \in P_{\mathrm{tuned}}} \big\| F(\mathbf{x}^*) - F(\mathbf{x}) \big\|,
\]

where P_tuned is the set of non-dominated solutions tuned to the reference points, and P* is the set of reference points on the true Pareto front. For NSGA3, multiple predefined Pareto-optimal reference points P* are specified, and the Pareto-optimal trade-off solutions corresponding to each reference point are found [7]. The newly developed optimization algorithm does not need any predefined reference points: after the final iterations of the proposed algorithm, raw solutions {P', P'_A} are selected randomly, and the closest points on the true Pareto front are taken as the set of reference points P*. The selected non-dominated raw solutions {P', P'_A} are tuned by surrogate function optimization using the Hooke-Jeeves algorithm to obtain the non-dominated solution set P_tuned. Since the raw solutions {P', P'_A} are selected randomly, this is equivalent to the selection of raw solutions by a human expert seeking solutions with specific properties. The sizes of the sets of raw solutions {P', P'_A} and reference points P* are given in Table 5.
Since both the proposed algorithm and the evolutionary optimization algorithm NSGA3 are stochastic, the test problems were solved 20 times [7]. The best, median and worst values of the considered metric IGD_ref are presented in Table 4, where the results of the proposed algorithm are compared with those of the evolutionary optimization algorithm NSGA3 [7].

Conclusions
A hybrid multi-objective optimization algorithm is proposed, combining random global search with local refinement of the found approximation of the Pareto front. The global search mimics a Bayesian algorithm, and the Hooke-Jeeves algorithm is used for local refinement. In the local optimization phase, the multi-objective optimization problem is converted into a single-objective problem by introducing a surrogate function, without the use of weight vectors. The developed algorithm was tested with an extremely low function evaluation budget and showed decent performance compared with Bayesian-rooted optimization algorithms. The proposed algorithm was also tested on a test suite with many decision variables and on a many-objective test suite, and the numerical experiments showed good performance compared with the popular evolutionary optimization algorithms NSGA2 and NSGA3. In the case of many-objective optimization, the proposed algorithm needs no predefined reference points, so the selection of raw solutions to be tuned into final solutions with the desired properties can be done by a human expert. Future plans include the development of a parallel version of the proposed algorithm for optimizing functions with an extreme computational burden. Another research theme of interest is integrating the execution of the proposed algorithm with human expert decisions to obtain solutions with the desired properties.