2 % This file is part of the GROMACS molecular simulation package.
4 % Copyright (c) 2013,2014,2015, by the GROMACS development team, led by
5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
6 % and including many others, as listed in the AUTHORS file in the
7 % top-level source directory and at http://www.gromacs.org.
9 % GROMACS is free software; you can redistribute it and/or
10 % modify it under the terms of the GNU Lesser General Public License
11 % as published by the Free Software Foundation; either version 2.1
12 % of the License, or (at your option) any later version.
14 % GROMACS is distributed in the hope that it will be useful,
15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 % Lesser General Public License for more details.
19 % You should have received a copy of the GNU Lesser General Public
20 % License along with GROMACS; if not, see
21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
24 % If you want to redistribute modifications to GROMACS, please
25 % consider that scientific software is very special. Version
26 % control is crucial - bugs must be traceable. We will be happy to
27 % consider code for inclusion in the official distribution, but
28 % derived work must not be called official GROMACS. Details are found
29 % in the README & COPYING files - if they are missing, get the
30 % official version at http://www.gromacs.org.
32 % To help us fund GROMACS development, we humbly ask that you cite
33 % the research papers on the package. Check out http://www.gromacs.org.
35 \chapter{Some implementation details
}
36 In this chapter we will present some implementation details. This is
37 far from complete, but we deemed it necessary to clarify some things
38 that would otherwise be hard to understand.
40 \section{Single Sum Virial in
{\gromacs}}
42 The
\normindex{virial
} $
\Xi$ can be written in full tensor form as:
44 \Xi~=~-
\half~
\sum_{i < j
}^N~
\rvij\otimes\Fvij
46 where $
\otimes$ denotes the
{\em direct product
} of two vectors.
\footnote
47 {$(
{\bf u
}\otimes{\bf v
})^
{\ab}~=~
{\bf u
}_
{\al}{\bf v
}_
{\be}$
} When this is
48 computed in the inner loop of an MD program
9 multiplications and
9
49 additions are needed.
\footnote{The calculation of
50 Lennard-Jones and Coulomb forces is about
50 floating point operations.
}
52 Here it is shown how it is possible to extract the virial calculation
53 from the inner loop~
\cite{Bekker93b
}.
56 In a system with periodic boundary conditions
\index{periodic boundary
58 periodicity must be taken into account for the virial:
60 \Xi~=~-
\half~
\sum_{i < j
}^
{N
}~
\rnij\otimes\Fvij
62 where $
\rnij$ denotes the distance vector of the
63 {\em nearest image
} of atom $i$ from atom $j$. In this definition we add
64 a
{\em shift vector
} $
\delta_i$ to the position vector $
\rvi$
65 of atom $i$. The difference vector $
\rnij$ is thus equal to:
67 \rnij~=~
\rvi+
\delta_i-
\rvj
73 In a triclinic system, there are
27 possible images of $i$; when a truncated
74 octahedron is used, there are
15 possible images.
76 \subsection{Virial from non-bonded forces
}
77 Here the derivation for the single sum virial in the
{\em non-bonded force
}
78 routine is given. $i
\neq j$ in all formulae below.
79 \newcommand{\di}{\delta_{i
}}
80 \newcommand{\qrt}{\frac{1}{4}}
83 &~=~&-
\half~
\sum_{i < j
}^
{N
}~
\rnij\otimes\Fvij \\
84 &~=~&-
\qrt\sum_{i=
1}^N~
\sum_{j=
1}^N ~(
\rvi+
\di-
\rvj)
\otimes\Fvij \\
85 &~=~&-
\qrt\sum_{i=
1}^N~
\sum_{j=
1}^N ~(
\rvi+
\di)
\otimes\Fvij-
\rvj\otimes\Fvij \\
86 &~=~&-
\qrt\left(
\sum_{i=
1}^N~
\sum_{j=
1}^N ~(
\rvi+
\di)
\otimes\Fvij~-~
\sum_{i=
1}^N~
\sum_{j=
1}^N ~
\rvj\otimes\Fvij\right) \\
87 &~=~&-
\qrt\left(
\sum_{i=
1}^N~(
\rvi+
\di)
\otimes\sum_{j=
1}^N~
\Fvij~-~
\sum_{j=
1}^N ~
\rvj\otimes\sum_{i=
1}^N~
\Fvij\right) \\
88 &~=~&-
\qrt\left(
\sum_{i=
1}^N~(
\rvi+
\di)
\otimes\Fvi~+~
\sum_{j=
1}^N ~
\rvj\otimes\Fvj\right) \\
89 &~=~&-
\qrt\left(
2~
\sum_{i=
1}^N~
\rvi\otimes\Fvi+
\sum_{i=
1}^N~
\di\otimes\Fvi\right)
91 In these formulae we introduced:
93 \Fvi&~=~&
\sum_{j=
1}^N~
\Fvij \\
94 \Fvj&~=~&
\sum_{i=
1}^N~
\Fvji
96 which is the total force on $i$ with respect to $j$. Because we use Newton's Third Law:
100 we must, in the implementation, double the term containing the shift $
\delta_i$.
102 \subsection{The intra-molecular shift (mol-shift)
}
103 For the bonded forces and SHAKE it is possible to make a
{\em mol-shift
}
104 list, in which the periodicity is stored. We simple have an array
{\tt mshift
}
105 in which for each atom an index in the
{\tt shiftvec
} array is stored.
107 The algorithm to generate such a list can be derived from graph theory,
108 considering each particle in a molecule as a bead in a graph, the bonds
111 \item[1] Represent the bonds and atoms as bidirectional graph
112 \item[2] Make all atoms white
113 \item[3] Make one of the white atoms black (atom $i$) and put it in the
115 \item[4] Make all of the neighbors of $i$ that are currently
117 \item[5] Pick one of the gray atoms (atom $j$), give it the
118 correct periodicity with respect to any of
121 \item[6] Make all of the neighbors of $j$ that are currently
123 \item[7] If any gray atom remains, go to
[5]
124 \item[8] If any white atom remains, go to
[3]
126 Using this algorithm we can
128 \item optimize the bonded force calculation as well as SHAKE
129 \item calculate the virial from the bonded forces
130 in the single sum method again
133 Find a representation of the bonds as a bidirectional graph.
135 \subsection{Virial from Covalent Bonds
}
136 Since the covalent bond force gives a contribution to the virial, we have:
139 V_b &~=~&
\half k_b(b-b_0)^
2 \\
140 \Fvi &~=~& -
\nabla V_b \\
141 &~=~& k_b(b-b_0)
\frac{\rnij}{b
} \\
144 The virial contribution from the bonds then is:
146 \Xi_b &~=~& -
\half(
\rni\otimes\Fvi~+~
\rvj\otimes\Fvj) \\
147 &~=~& -
\half\rnij\otimes\Fvi
150 \subsection{Virial from SHAKE
}
151 An important contribution to the virial comes from shake. Satisfying
152 the constraints a force
{\bf G
} that is exerted on the particles ``shaken.'' If this
153 force does not come out of the algorithm (as in standard SHAKE) it can be
154 calculated afterward (when using
{\em leap-frog
}) by:
156 \Delta\rvi&~=~&
\rvi(t+
\Dt)-
157 [\rvi(t)+
{\bf v
}_i(t-
\frac{\Dt}{2})
\Dt+
\frac{\Fvi}{m_i
}\Dt^
2] \\
158 {\bf G
}_i&~=~&
\frac{m_i
\Delta\rvi}{\Dt^
2}
160 This does not help us in the general case. Only when no periodicity
161 is needed (like in rigid water) this can be used, otherwise
162 we must add the virial calculation in the inner loop of SHAKE.
164 When it
{\em is
} applicable the virial can be calculated in the single sum way:
166 \Xi~=~-
\half\sum_i^
{N_c
}~
\rvi\otimes\Fvi
168 where $N_c$ is the number of constrained atoms.
170 %Another method is the Non-Iterative shake as proposed (and implemented)
171 %by Yoneya. In this algorithm the Lagrangian multipliers are solved in a
172 %matrix equation, and given these multipliers it is easy to get the periodicity
173 %in the virial afterwards.
178 \section{Optimizations
}
179 Here we describe some of the algorithmic optimizations used
180 in
{\gromacs}, apart from parallelism.
181 One of these, the implementation of the
182 1.0/sqrt(x) function is treated separately in
\secref{sqrt
}.
183 The most important other optimizations are described below.
185 \subsection{Inner Loops for Water
}
186 \label{sec:waterloops
}
187 {\gromacs} uses special inner loops to calculate non-bonded
188 interactions for water molecules with other atoms, and yet
189 another set of loops for interactions between pairs of
190 water molecules. There highly optimized loops for two types of water models.
191 For three site models similar to
192 SPC~
\cite{Berendsen81
},
{\ie}:
194 \item There are three atoms in the molecule.
195 \item The whole molecule is a single charge group.
196 \item The first atom has Lennard-Jones (
\secref{lj
}) and
197 Coulomb (
\secref{coul
}) interactions.
198 \item Atoms two and three have only Coulomb interactions,
201 These loops also works for the SPC/E~
\cite{Berendsen87
} and
202 TIP3P~
\cite{Jorgensen83
} water models.
203 And for four site water models similar to TIP4P~
\cite{Jorgensen83
}:
205 \item There are four atoms in the molecule.
206 \item The whole molecule is a single charge group.
207 \item The first atom has only Lennard-Jones (
\secref{lj
}) interactions.
208 \item Atoms two and three have only Coulomb (
\secref{coul
}) interactions,
210 \item Atom four has only Coulomb interactions.
213 The benefit of these implementations is that there are more floating-point
214 operations in a single loop, which implies that some compilers
215 can schedule the code better. However, it turns out that even
216 some of the most advanced compilers have problems with scheduling,
217 implying that manual tweaking is necessary to get optimum
218 \normindex{performance
}.
219 This may include common-sub-expression elimination, or moving
222 \subsection{Fortran Code
}
223 Unfortunately, on a few platforms
\normindex{Fortran
} compilers are
224 still better than C-compilers. For some machines (
{\eg} SGI
225 Power Challenge) the difference may be up to a factor of
3, in the
226 case of vector computers this may be even larger. Therefore, some of
227 the routines that take up a lot of computer time have been translated
228 into Fortran and even assembly code for Intel and AMD x86 processors.
229 In most cases, the Fortran or assembly loops should be selected
230 automatically by the
{\tt configure
} script when appropriate, but you can
231 also tweak this by setting options to the
{\tt configure
} script.
233 \section{Computation of the
1.0/sqrt function
}
235 \subsection{Introduction
}
236 The
{\gromacs} project started with the development of a $
1/
\sqrt{x
}$
237 processor that calculates:
239 Y(x) =
\frac{1}{ \sqrt{x
} }
241 As the project continued, the
{\intel} processor was used to implement
242 {\gromacs}, which now turned into almost a full software project. The
243 $
1/
\sqrt{x
}$ processor was implemented using a Newton-Raphson
244 iteration scheme for one step. For this it needed look-up tables to
245 provide the initial approximation. The $
1/
\sqrt{x
}$ function makes it
246 possible to use two almost independent tables for the exponent seed
247 and the fraction seed with the IEEE floating-point representation.
250 According to~
\cite{Bekker87
} the $
1/
\sqrt{x
}$ function can be evaluated using
251 the Newton-Raphson iteration scheme. The inverse function is:
253 X(y) =
\frac{1}{y^
{2}}
255 So instead of calculating:
264 can now be solved using Newton-Raphson. An iteration is performed by
267 y_
{n+
1} = y_
{n
} -
\frac{f(y_
{n
})
}{f'(y_
{n
})
}
270 The absolute error $
\varepsilon$, in this approximation is defined by:
272 \varepsilon \equiv y_
{n
} - q
274 Using Taylor series expansion to estimate the error results in:
276 \varepsilon _
{n+
1} = -
\frac{\varepsilon _
{n
}^
{2}}{2}
277 \frac{ f''(y_
{n
})
}{f'(y_
{n
})
}
280 according to~
\cite{Bekker87
} equation (
3.2). This is an estimation of the
283 \subsection{Applied to floating-point numbers
}
284 Floating-point numbers in IEEE
32 bit single-precision format have a nearly
285 constant relative error of $
\Delta x / x =
2^
{-
24}$. As seen earlier in the
286 Taylor series expansion equation (
\eqnref{taylor
}), the error in every
287 iteration step is absolute and in general dependent of $y$. If the error is
288 expressed as a relative error $
\varepsilon_{r
}$ the following holds:
290 \varepsilon _
{{r
}_
{n+
1}} \equiv \frac{\varepsilon_{n+
1}}{y
}
294 \varepsilon _
{{r
}_
{n+
1}} =
295 - (
\frac{\varepsilon _
{n
}}{y
} )^
{2} y
\frac{ f''
}{2f'
}
297 For the function $f(y) = y^
{-
2}$ the term $y f''/
2f'$ is constant (equal
298 to $-
3/
2$) so the relative error $
\varepsilon _
{r_
{n
}}$ is independent of $y$.
300 \varepsilon _
{{r
}_
{n+
1}} =
301 \frac{3}{2} (
\varepsilon_{r_
{n
}})^
{2}
305 The conclusion of this is that the function $
1/
\sqrt{x
}$ can be
306 calculated with a specified accuracy.
310 \newcommand{\twltt}{\tt}
311 \setlength{\unitlength}{0.0080in
}
312 \begin{picture
}(
489,
176)(
40,
390)
314 \put(
180,
505)
{$
\underbrace{\hspace{2.68in
}}$
}
315 \put(
60,
505)
{$
\underbrace{\hspace{0.88in
}}$
}
316 \put(
40,
510)
{\framebox(
480,
30)
{}}
317 \put(
45,
505)
{\vector(
0,-
1)
{ 15}}
318 \put(
40,
540)
{\dashbox{4}(
0,
0)
{}}
319 \multiput(
250,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
320 \multiput(
220,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
321 \multiput(
190,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
322 \multiput(
280,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
323 \multiput(
310,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
324 \multiput(
340,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
325 \multiput(
370,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
326 \multiput(
400,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
327 \multiput(
430,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
328 \multiput(
460,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
329 \multiput(
490,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
330 \multiput(
205,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
331 \multiput(
235,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
332 \multiput(
265,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
333 \multiput(
295,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
334 \multiput(
325,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
335 \multiput(
355,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
336 \multiput(
385,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
337 \multiput(
415,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
338 \multiput(
445,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
339 \multiput(
475,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
340 \multiput(
505,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
341 \put(
40,
510)
{\framebox(
480,
30)
{}}
342 \put(
40,
540)
{\dashbox{4}(
0,
0)
{}}
343 \multiput(
250,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
344 \multiput(
220,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
345 \multiput(
190,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
346 \multiput(
160,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
347 \multiput(
130,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
348 \multiput(
100,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
349 \multiput(
280,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
350 \multiput(
310,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
351 \multiput(
340,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
352 \multiput(
370,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
353 \multiput(
400,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
354 \multiput(
430,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
355 \multiput(
460,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
356 \multiput(
490,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
357 \multiput(
70,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
358 \put(
55,
540)
{\line(
0,-
1)
{ 30}}
359 \multiput(
85,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
360 \multiput(
115,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
361 \multiput(
145,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
362 \multiput(
205,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
363 \multiput(
235,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
364 \multiput(
265,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
365 \multiput(
295,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
366 \multiput(
325,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
367 \multiput(
355,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
368 \multiput(
385,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
369 \multiput(
415,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
370 \multiput(
445,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
371 \multiput(
475,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
372 \multiput(
505,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
373 \put(
175,
540)
{\line(
0,-
1)
{ 30}}
374 \put(
175,
540)
{\line(
0,-
1)
{ 30}}
375 \multiput(
145,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
376 \multiput(
115,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
377 \multiput(
85,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
378 \put(
55,
540)
{\line(
0,-
1)
{ 30}}
379 \multiput(
70,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
380 \multiput(
100,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
381 \multiput(
130,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
382 \multiput(
160,
540)(
0.00000,-
8.57143)
{4}{\line(
0,-
1)
{ 4.286}}
383 \put(
345,
470)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $F$
}}}
384 \put(
110,
470)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $E$
}}}
385 \put(
40,
470)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $S$
}}}
386 \put(
505,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
0$
}}}
387 \put(
160,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
23$
}}}
388 \put(
40,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
31$
}}}
389 \put(
140,
400)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $Value=(-
1)^
{S
}(
2^
{E-
127})(
1.F)$
}}}
390 \put(
505,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
0$
}}}
391 \put(
160,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
23$
}}}
392 \put(
40,
545)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $
31$
}}}
393 \put(
140,
400)
{\makebox(
0,
0)
[lb
]{\raisebox{0pt
}[0pt
][0pt
]{\twltt $Value=(-
1)^
{S
}(
2^
{E-
127})(
1.F)$
}}}
396 \caption{IEEE single-precision floating-point format
}
400 \subsection{Specification of the look-up table
}
401 To calculate the function $
1/
\sqrt{x
}$ using the previously mentioned
402 iteration scheme, it is clear that the first estimation of the solution must
403 be accurate enough to get precise results. The requirements for the
406 \item Maximum possible accuracy with the used IEEE format
407 \item Use only one iteration step for maximum speed
410 The first requirement states that the result of $
1/
\sqrt{x
}$ may have a
411 relative error $
\varepsilon_{r
}$ equal to the $
\varepsilon_{r
}$ of a IEEE
32
412 bit single-precision floating-point number. From this, the $
1/
\sqrt{x
}$
413 of the initial approximation can be derived, rewriting the definition of
414 the relative error for succeeding steps (
\eqnref{epsr
}):
416 \frac{\varepsilon_{n
}}{y
} =
417 \sqrt{\varepsilon_ {r_
{n+
1}} \frac{2f'
}{yf''
}}
419 So for the look-up table the needed accuracy is:
421 \frac{\Delta Y
}{Y
} =
\sqrt{\frac{2}{3} 2^
{-
24}}
424 which defines the width of the table that must be $
\geq 13$ bit.
426 At this point the relative error, $
\varepsilon_{r_
{n
}}$, of the look-up table
427 is known. From this the maximum relative error in the argument can be
428 calculated as follows. The absolute error $
\Delta x$ is defined as:
430 \Delta x
\equiv \frac{\Delta Y
}{Y'
}
434 \frac{\Delta x
}{Y
} =
\frac{\Delta Y
}{Y
} (Y')^
{-
1}
438 \Delta x = constant
\frac{Y
}{Y'
}
440 For the $
1/
\sqrt{x
}$ function, $Y / Y'
\sim x$ holds, so
441 $
\Delta x / x = constant$. This is a property of the used floating-point
442 representation as earlier mentioned. The needed accuracy of the argument of the
443 look-up table follows from:
445 \frac{\Delta x
}{x
} = -
2 \frac{\Delta Y
}{Y
}
447 So, using the floating-point accuracy (
\eqnref{accy
}):
449 \frac{\Delta x
}{x
} = -
2 \sqrt{\frac{2}{3} 2^
{-
24}}
451 This defines the length of the look-up table which should be $
\geq 12$ bit.
453 \subsection{Separate exponent and fraction computation
}
454 The used IEEE
32 bit single-precision floating-point format specifies
455 that a number is represented by a exponent and a fraction. The previous
456 section specifies for every possible floating-point number the look-up table
457 length and width. Only the size of the fraction of a floating-point number
458 defines the accuracy. The conclusion from this can be that the size of the
459 look-up table is length of look-up table, earlier specified, times the size of
460 the exponent ($
2^
{12}2^
{8},
1Mb$). The $
1/
\sqrt{x
}$ function has the
461 property that the exponent is independent of the fraction. This becomes clear
462 if the floating-point representation is used. Define:
464 x
\equiv (-
1)^
{S
}(
2^
{E-
127})(
1.F)
467 See
\figref{ieee
}, where $
0 \leq S
\leq 1$, $
0 \leq E
\leq 255$,
468 $
1 \leq 1.F <
2$ and $S$, $E$, $F$ integer (normalization conditions).
469 The sign bit ($S$) can be omitted because $
1/
\sqrt{x
}$ is only defined
470 for $x >
0$. The $
1/
\sqrt{x
}$ function applied to $x$ results in:
472 y(x) =
\frac{1}{\sqrt{x
}}
476 y(x) =
\frac{1}{\sqrt{(
2^
{E-
127})(
1.F)
}}
478 This can be rewritten as:
480 y(x) = (
2^
{E-
127})^
{-
1/
2}(
1.F)^
{-
1/
2}
485 (
2^
{E'-
127})
\equiv (
2^
{E-
127})^
{-
1/
2}
488 1.F'
\equiv (
1.F)^
{-
1/
2}
490 Then $
\frac{1}{\sqrt{2}} <
1.F'
\leq 1$ holds, so the condition
491 $
1 \leq 1.F' <
2$, which is essential for normalized real representation, is
492 not valid anymore. By introducing an extra term, this can be corrected.
493 Rewrite the $
1/
\sqrt{x
}$ function applied to floating-point numbers (
\eqnref{yx
}) as:
495 y(x) = (
2^
{\frac{127-E
}{2}-
1}) (
2(
1.F)^
{-
1/
2})
499 (
2^
{E'-
127})
\equiv (
2^
{\frac{127-E
}{2}-
1})
503 1.F'
\equiv 2(
1.F)^
{-
1/
2}
506 Then $
\sqrt{2} <
1.F
\leq 2$ holds. This is not the exact valid range as
507 defined for normalized floating-point numbers in
\eqnref{fpdef
}.
508 The value $
2$ causes the problem. By mapping this value on the nearest
509 representation $<
2$, this can be solved. The small error that is introduced
510 by this approximation is within the allowable range.
512 The integer representation of the exponent is the next problem. Calculating
513 $(
2^
{\frac{127-E
}{2}-
1})$ introduces a fractional result if $(
127-E) = odd$.
514 This is again easily accounted for by splitting up the calculation into an
515 odd and an even part. For $(
127-E) = even$ $E'$ in equation (
\eqnref{exp
})
516 can be exactly calculated in integer arithmetic as a function of $E$.
518 E' =
\frac{127-E
}{2} +
126
521 For $(
127-E) = odd$ equation (
\eqnref{yx
}) can be rewritten as:
523 y(x) = (
2^
{\frac{127-E-
1}{2}})(
\frac{1.F
}{2})^
{-
1/
2}
527 E' =
\frac{126-E
}{2} +
127
529 which also can be calculated exactly in integer arithmetic.
530 {\bf Note
} that the fraction is automatically corrected for its range earlier
531 mentioned, so the exponent does not need an extra correction.
533 The conclusions from this are:
535 \item The fraction and exponent look-up table are independent. The fraction
536 look-up table exists of two tables (odd and even exponent) so the odd/even
537 information of the exponent (lsb bit) has to be used to select the right
539 \item The exponent table is an
256 x
8 bit table, initialized for $odd$
540 and $even$.
% as specified before
543 \subsection{Implementation
}
544 The look-up tables can be generated by a small C program, which uses
545 floating-point numbers and operations with IEEE
32 bit single-precision format.
546 Note that because of the $odd$/$even$ information that is needed, the
547 fraction table is twice the size earlier specified (
13 bit i.s.o.
12 bit).
548 %\figref{expgen} in the appendix shows how to fill the exponent table,
549 %\figref{fractgen} shows how to fill the fraction table.
551 The function according to~
\eqnref{nr
} has to be implemented.
552 Applied to the $
1/
\sqrt{x
}$ function, equation~
\eqnref{inversef
} leads to:
554 f = a -
\frac{1}{y^
{2}}
562 y_
{n+
1} = y_
{n
} -
\frac{ a -
\frac{1}{y^
{2}_
{n
}} }{ \frac{2}{y^
{3}_
{n
}} }
566 y_
{n+
1} =
\frac{y_
{n
}}{2} (
3 - a y^
{2}_
{n
})
568 Where $y_
{0}$ can be found in the look-up tables, and $y_
{1}$ gives the result
569 to the maximum accuracy.
570 %In \figref{as_implementation}) the assembler implementation is given.
571 It is clear that only one iteration extra (in double
572 precision) is needed for a double-precision result.
574 % LocalWords: Virial virial triclinic intra mol mshift shiftvec sqrt SPC lj yf
575 % LocalWords: coul Fortran SGI AMD Raphson IEEE taylor epsr accy ieee yx fpdef
576 % LocalWords: lsb nr inversef src formulae GROMACS