On the Numerical Solution of Chaotic Dynamical Systems Using Extend Precision Floating Point Arithmetic and Very High Order Numerical Methods

Multiple results in the literature exist that indicate that all computed solutions to chaotic dynamical systems are time-step dependent. That is, solutions with small but different time steps will decouple from each other after a certain (small) finite amount of simulation time. When using double precision floating point arithmetic time step independent solutions have been impossible to compute, no matter how accurate the numerical method. Taking the well-known Lorenz equations as an example, we examine the numerical solution of chaotic dynamical systems using very high order methods as well as extended precision floating point number systems. Time step independent solutions are obtained over a finite period of time. However even with a sixteenth order numerical method and with quad-double floating point numbers, there is a limit to this approach.


Introduction
The numerical solution of nonlinear systems of differential equations that exhibit chaotic behavior, such as the well-known Lorenz equations [1] x = σ(y − x), x(0) = x 0 , y = rx − y − xz, y(0) = y 0 , (1) z = −bz + xy, z(0) = z 0 , has proven very challenging.Using existing numerical methods and current day computer technology, researchers have not been able to produce time-step independent numerical solutions of chaotic systems.Solutions with small but different time steps will decouple from each other after a certain (small) finite amount of simulation time.One of the defining traits of a chaotic system is its sensitive dependence on initial conditions which is characterized by nearby solutions separating exponentially fast due to the fact that the c Vilnius University, 2011 system has a positive Liapunov exponent [2].Thus, approximate solutions starting from exactly the same initial conditions but using a different time step size will drift away from each other if truncation and rounding errors prevent the approximate solution from being accurate to a sufficient number of decimal places.
The sensitivity of the system (1) was originally noted by Lorenz in 1963 [1].While trying to repeat previous numerical work with a second order Runge-Kutta (RK) method, Lorenz wrote down the output of the method and stopped the long calculation.Later the method was restarted with the recorded intermediate values as initial conditions.The final result was a vastly different solution than he was trying to recreate.The disparity was caused by the recorded intermediate values not exactly matching the floating point numbers in memory.
Recent accounts of the difficulty in calculating accurate long time, time-step independent solutions of the Lorenz equations (and other chaotic equations) include the following: • Reference [3] considers the numerical solution of chaotic differential equations in general with an application to the one-dimensional Kuramoto-Sivashinsky equation.The author concludes that "No computed chaotic solution, which is independent of the integration time-step employed, exists.".• In [4], the following was concluded about the Lorenz equations: "Similar behavior was noted with Adams-Bashforth methods up to the fifth order; an implicit Crank-Nicholson method; second-order and forth-order Runge-Kutta methods; adaptive methods; and compact time-difference schemes.In no case was a convergent solution obtained for t 20.".• Reference [5] discusses the inability to produce time-step independent solutions of the Lorenz equations with second and fourth order RK methods.Comments supporting the conclusions of [5] were made in the note [6].
• Reference [7] uses the arbitrary precision capabilities of Mathematica with up to 800 digits of decimal precision and high order Taylor series methods of up to order 400 to calculate accurate solutions of the Lorenz equations.Extended arbitrary precision floating point arithmetic is much more computationally expensive than the extended fixed precision that we consider in this work.• Higher order methods (order 4 to 8) with error control and adaptive step-size have been used on the Lorenz equations.The results are summarized in the book [8, p. 245] where a numerical example is explained: "The solution is, for large t, extremely sensitive to the errors of the first integration steps.For example, at t = 50 the numerical solution becomes totally wrong, even if the computations are preformed in quadruple precision with tol = 10 −20 .Hence the numerical results of all methods would be equally useless and no comparison makes any sense.Therefore we chose t end = 16." The goal of this work is to calculate solutions of the Lorenz equations that are timestep independent up to t = 100 in the sense that the maximum error between two computed solutions with small but different time steps is less than a tolerance of tol = 5 e − 14.In addition to obtaining time step independent solutions, we use two vastly different types of numerical methods to obtain the solutions.This is because, as pointed out in [9], that a practical way of judging the validity of the numerical results from a non-linear dynamical system is to use two or more different methods to solve the same problem.If the two solutions agree, then we can have some confidence in the computed solutions.However with chaotic systems, we can not claim that the agreeing solutions are the true solutions of the system.The two numerical methods that are used are a 11 stage, order 8, explicit Runge-Kutta method and a 8 stage, order 16, implicit Runge-Kutta method.Constant step sizes are used and the methods are implemented in floating point number systems using double, double-double, and quad-double data types.
In this work we numerically explore chaotic solutions of the Lorenz system.It should be noted that much is known theoretically about the existence of chaos in the Lorenz system as well.The first mathematical proof of such chaotic behavior in the Lorenz system was given in reference [10] in 1995.Subsequently, more efficient ways of proving chaos have been proposed [11].

Floating point numbers systems
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the most widely-used standard for floating-point computation.It is followed by virtually every computer used in scientific computing.The current version, IEEE 754-2008, was published in August 2008 and is a revision of the original version that was published in 1985.Properties of three of the binary formats specified by the standard are listed in Table 1.The column labeled dps lists the number of decimal digits to which a base ten number is correctly represented to in the floating point system.In a given floating point number system, every real number is approximated and every floating point operation is performed with a relative error of at most machine epsilon, ε m .IEEE double floating point arithmetic is sufficiently accurate for most scientific applications.However, for a rapidly growing body of important scientific computing applications, a higher level of numeric precision is required.A sampling of such applications are surveyed in [12].An additional example is given by the first author in radial basis functions approximation methods [13].The addition of the quadruple type to the 2008 IEEE standard is in response to the need for more precise floating point arithmetic.To date, a hardware implementation of the quadruple type does not exist for mainstream desktop computers.When a hardware implementation of quadruple precision is available it will likely be at least twice slower than double precision [14].The books [15] and [16] can be consulted for more information on floating point arithmetic.
An alternative to the quadruple type is the double-double type, in which the unevaluated sum of two IEEE double numbers is used to represent a number with twice the precision.A simple example is that 8.765 × 10 5 + 4.32 × 10 1 can be used to represent the number 8.765432 × 10 5 .The idea can be extended to a quad-double number that is an unevaluated sum of four IEEE doubles with four times the precision of a double.A freely available, open source C++ library is available that implements algorithms for basic arithmetic operations for the double-double and quad-double types, as well as some algebraic and transcendental functions [17].Modifying existing C or C++ software to implement extended precision is in most cases trivial and only requires an include statement and a define statement such as the following:

h> #define double dd_real
There is a performance penalty for precisions beyond double.Typical execution time of the formats are listed in Tables 1 and 2 where the double time has been normalized to be one.There is a range of execution time penalty factors for each extended type due to several reasons.When implemented in software, the quadruple and double-double types can be expected to run 5 to 10 times slower than the double format, with the exact factor depending on the particular computer platform, the compiler, and the compiler flags that are used.If implemented in hardware, the quadruple format should executive in twice the time that a double does [14].However, the algorithms for extended, but fixed, precision floating point systems are many times more efficient than the algorithms for arbitrary precision floating point calculations of comparable length.In our benchmarks, the C++ arbitrary precision package available from [17] with the decimal precision set to 62 (the decimal precision of a quaddouble) runs 5 times slower than does the fixed quad-double precision code.

Runge-Kutta methods
For an autonomous system of ODEs, Runge-Kutta (RK) methods are of the form where the internal stages are determined by and where s denotes the number of stages.If a ij = 0 whenever i j the method is explicit and the equations (3) provide a recursion for computing each Y i in terms of the proceeding stages.Otherwise, the method is implicit and the Y i need to be calculated by solving a system of nonlinear algebraic equations.If the system of ODEs is stiff, the nonlinear equations are typically solved by Newton's method, or a variant of Newton's method, as described in [18, p. 118] in order to maintain the good stability properties of the method.Otherwise, if stiffness is not an issue, the equations can be solved by a simple fixed point iteration The equations are iterated until the increment of two successive approximations satisfies either or ] which indicates that the increments of the iteration start to oscillate due to roundoff error [19].The value of δ depends on the floating point system and is taken to be a number slightly large than machine epsilon in the system.In double precision we take δ = 1 e − 14, in double-double δ = 1 e − 28, and in quad-double δ = 1 e − 56.In all cases, the iteration typically converges in 2 to 6 iterations.Two particular RK methods are used in the numerical experiments.The first is an eighth order explicit method with eleven stages.The a and b coefficients of the RK8 method are given in [20].
The second method, referred to as Gauss16, is an eight stage, sixteenth order, implicit Gauss RK method.The Gauss RK methods (also known as Butcher-Kuntzmann RK methods) are of order 2s and have the highest possible order of any RK method relative to the number of stages [21].The coefficients of Gauss RK methods of order 2, 4, 6, 8, and 10 are listed in [22].We are unaware of any Gauss RK method that is higher than order 10 being described in the literature.This is most likely because using such a high order method would be overkill when employed using double precision.With such a high order method, reducing a moderately small step size any further would not further reduce the overall error due to roundoff errors.However, we intend to use the method with extended precision and have derived a sixteenth order method.The coefficients of the method are listed with 65 decimal places of accuracy in the appendix.

Numerical results
The parameters in the system (1) have been set to σ = 10, b = 8/3, and r = 28 as these have been considered by many other authors, for example [1,4,5].The initial conditions x 0 = 1, y 0 = −1, and z 0 = 10 have been used.In all numerical runs, the solution has been approximated from t = 0 to t = 100 and the numerical solutions have been saved and plotted at intervals of 0.1, i.e. at times t = 0, 0.1, 0.2, . . ., 99.9, 100.Time step size independent solutions are sought in the sense that solutions calculated with small but different size time steps that have a maximum difference of less than a tolerance of tol = 5 e−14.The RK8 and Gauss16 methods are used with three different floating point types: double, double-double, and quad-double.We only discuss the x variable from the system as the results for y and z are similar.With each numerical method and floating point system, we seek a time step independent solution by refining the time-step in the sequence ∆t = 1 e − 3, 1 e − 4, 1 e − 5, 1 e − 6, and 1 e − 7.
The double precision results are shown in Fig. 1.In the upper left image of the figure, the ∆t = 1 e − 6 and ∆t = 1 e − 7 RK8 solutions visually agree to about t = 42.At t = 0.1 when their difference is first measured, the two solutions differ by 9.8 e−14 which exceeds the set tolerance.The double precision Gauss16 ∆t = 1 e − 6 and ∆t = 1 e − 7 solutions are compared in the lower images of Fig. 1.As was the case for the RK8 double solutions, the two Gauss16 solutions visually agree to about t = 42.The two solutions differ by more than the tolerance at t = 0.3.At first it may seem odd that the much more accurate Gauss16 method does not produce time step independent solutions over a longer period of time than does the RK8 methods.However, for such high order numerical methods, the local truncation errors of both methods may be much smaller than machine epsilon when even moderately small time steps are used.At this point, it is the lack of accuracy and rounding errors in the floating point system that is preventing a longer time period of agreement.
The double-double precision results with ∆t = 1 e − 6 and ∆t = 1 e − 7 are shown in Fig. 2. In the upper images of the figure, the RK8 solutions are visually indistinguishable up to about t = 78.The two RK8 numerical solutions agree to within the tolerance through time t = 46.6.In the lower images, the Gauss16 solutions are also visually indistinguishable up to about t = 78.The two Gauss16 numerical solutions agree to within the tolerance through time t = 46.5.Right: logarithmic plot of the difference of the two solutions.
The quad-double precision RK8 results with ∆t = 1 e − 6 and ∆t = 1 e − 7 have a maximum difference of 2.83 e−6.The solutions are visually indistinguishable to t = 100, but only agree to within the tolerance through time t = 72.7.The time step size is halved and another run is computed with ∆t = 5 e − 8 and compared to the ∆t = 1 e − 7 solution.The two solutions agree to within the tolerance to t = 100 which is the last time their difference was measured.The two solutions agree to within the double precision machine epsilon through time t = 96.6.The quad-double solution with ∆t = 1 e − 7 is time step size independent up to t = 100.
The Gauss16 quad-double solutions with ∆t = 1 e − 3∆t = 1 e − 4 agree to within the tolerance to t = 75.4.Reducing the size of the time step once more we find that the Gauss16 quad-double solutions with ∆t = 1 e − 4∆t = 1 e − 5 not only agree within the tolerance to t = 100 but agree to the 16 decimal places.Additionally, the solutions also agree with the RK8 quad-double solutions to 16 decimal places.
The quad-double RK8 ∆t = 1 e − 7 and ∆t = 5 e − 8 and Gauss16 ∆t = 1 e − 4 and ∆t = 1 e − 5 all agree to within the tolerance up to t = 100.Additionally, the quaddouble RK8 ∆t = 5 e − 8 and Gauss16 ∆t = 1 e − 4 and ∆t = 1 e − 5 are in agreement up to 16 decimals places to t = 100.This solution is taken as our time step independent solution.The solution is plotted in Fig. 3 and is tabulated at intervals of t = 10 in Table 3.In Table 4, the agreement with the time step independent solution and RK8 and Gauss16 in double-double and double precision with ∆t = 1 e − 7 is listed.Another number that was recorded in all numerical runs was the maximum absolute value of x(t) over the time interval [0, 100].To fifteen decimal places, all RK8 doubledouble, and quad-double runs with ∆t = 1 e − 5 to 5 e − 8 agree that max t∈[0,100] All Gauss16 runs with double-double and ∆t = 1 e − 4 to 1 e − 7 and quad-double runs with ∆t = 1 e−2 to 1 e−5 agree with the above number to fifteen decimal places as well.

Conclusions
Using quad-double floating point arithmetic, we have calculated time step independent solutions up to t = 100 with two different numerical methods, an eighth order explicit Runge-Kutta method and a sixteenth order implicit Runge-Kutta method that agree to 16 decimal places.The solutions may serve as benchmarks in other numerical studies.
Calculating numerical solutions of chaotic dynamical systems that are time step independent over long time periods remains a challenging problem when using computer technology that is currently available.Standard double precision floating point number systems are not adequate for this purpose.A quadruple precision type has been added to the IEEE 754 standard that was revised in 2008.However, to date, the quadruple type has not been implemented in hardware or software.Software packages are available that implement double-double and quad-double floating point arithmetic that offer two and four times more decimal precision than the double type.A performance penalty is incurred in using software floating point systems, however the fixed extended types are more efficient than the arbitrary precision extended types where the decimal precision is user specified.
Although not yet available, it is possible to extend the quad-double algorithms [23] to develop floating point systems that have 8 or 16 times the decimal precision of an IEEE double.These more precise floating point systems will be necessary for obtaining time-step independent numerical solutions over longer time periods.
Despite employing very precise floating point number systems and very high order numerical methods, we were unable to compute time step size independent solutions significantly beyond t = 100, which cannot be considered a large time in computational chaos.If even more precise floating point numbers systems than quad-double were used with even higher order numerical methods, this finite interval of time step independent solutions could be extended.However, extending the finite interval to an infinite interval is impossible due to the chaotic nature of the true solution.
As the numerical experiments that have been presented in this work were being completed, the authors became aware of emerging technology that potentially make computing with the quad-double software hundreds of times faster.In [24] the authors describe a port of the quad-double package that runs on general purpose graphics processor units (gpgpus).The software is freely available from http://code.google.com/p/gpuprec/.In benchmarks, the gpgpu based quad-double package ran up to 27 times faster than the cpu quad-double code.The gpgpu benchmarks were made using the Nvidia 280.Modern gpgpus such as the Nvidia Tesla M2070 claim to offer float point operations that are up to 250 times faster than their cpu based counterparts.The new gpgpu technology and the gpgpu based quad-double package should allow numerical experiments, such as we have conducted in this work, to be performed much more efficiently than can be done with a computer with only cpus.

Appendix: Guass16 coefficients
The coefficients of an implicit 8 stage Gauss Runge-Kutta that is sixteenth order.The coefficients are accurate to 65 decimal places.

Table 2 .
Information on floating point types derived from the IEEE double and implemented in software.

Table 3 .
Time step independent solutions computed with quad-double precision and Gauss16 with ∆t = 1 e − 4 and RK8 with ∆t = 5 e − 8.

Table 4 .
The agreement with the time step independent solutions of RK8 and Gauss16 in double-double and double precision with ∆t = 1 e − 7.