These document fixes for issues that have been fixed for the 2016
release, but which have not been back-ported to other branches.

Fixed two problems related to restarts for velocity-Verlet
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The first problem is more serious; in addition to causing problems
with restarts in most cases for velocity-Verlet integrators plus either
Berendsen or v-rescale temperature-coupling algorithms, the
temperature coupling code was called twice. This made the distribution of
kinetic energies too broad (but with the correct average).
Other algorithm combinations were unaffected.

In the second problem, the initial step after restarts with velocity-Verlet
integrators and either Berendsen or v-rescale temperature-coupling algorithms
had too high a pressure because they used an empty virial matrix that
was only filled with MTTK pressure control. The effects of this bug were
very small; it only affected the volume integration for one step on restarts.

Fixed Verlet buffer calculation with nstlist=1
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Under rare circumstances the Verlet buffer calculation code was
called with nstlist=1, which caused a division by zero. The division
by zero is now avoided.
Furthermore, grompp now also determines and prints the Verlet buffer
sizes with nstlist=1, which provides the user information and adds

Fixed large file issue on 32-bit platforms
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
At some point gcc started to issue a warning instead of a fatal error
for the checking code; fixed to really generate an error now.

Avoided using abort() for fatal errors
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This avoids situations that produce useless core dumps.

Fixed possible division by zero in polarization code
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Avoided numerical overflow with overlapping atoms in Verlet scheme
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The Verlet-scheme kernels did not allow overlapping atoms, even if
they were not interacting (in contrast to the group kernels). Fixed by
clamping the interaction distance so it cannot become smaller than
~6e-4 in single and ~1e-18 in double, and when this number is later
multiplied by zero parameters it will not influence forces. The
clamping should never affect normal interactions; mdrun would
previously crash for distances that were this small. On Haswell, RF
and PME kernels get 3% and 1% slower, respectively. On CUDA, RF and
PME kernels get 1% and 2% faster, respectively.
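
The effect of the clamping can be illustrated with a minimal sketch. This is
not the kernel code: the function name, the Lennard-Jones form and the
parameter names are assumptions chosen for illustration, and only the
approximate double-precision floor is taken from the text above.

```python
# Illustrative sketch only; not the GROMACS kernel code.
MIN_DISTANCE = 1e-18  # approximate double-precision floor quoted above


def lj_force_over_r(r, c6, c12, min_distance=MIN_DISTANCE):
    """Return the scalar Lennard-Jones force divided by r.

    The distance is clamped from below, so 1/r**2 stays finite even for
    overlapping atoms (r == 0).
    """
    r_clamped = max(r, min_distance)
    rinv2 = 1.0 / (r_clamped * r_clamped)
    rinv6 = rinv2 * rinv2 * rinv2
    # F/r = (12*c12/r^12 - 6*c6/r^6) / r^2
    return (12.0 * c12 * rinv6 * rinv6 - 6.0 * c6 * rinv6) * rinv2
```

With ``r = 0`` and zero parameters the intermediate inverse powers are large
but finite, so the product with the zero parameters is exactly zero. Without
the clamp the same call would divide by zero.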

Relaxed pull-code check for COM distances with directional pulling
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The check in the pull code for COM distances close to half the box
was too strict for directional pulling. Now dimensions orthogonal
to the pull vector are no longer checked. (The check was actually
not strict enough for directional pulling along x or y in triclinic
unit cells, but that is a corner case.)
Furthermore, the direction-periodic hint is now only printed with
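
A minimal sketch of a check that skips dimensions orthogonal to the pull
vector is given here. It assumes a rectangular box described by its diagonal;
the function name and the ``max_fraction`` threshold are hypothetical and are
not taken from the pull code.

```python
def com_distance_too_large(dr, box_diag, pull_vec, max_fraction=0.49):
    """Return True if the COM distance is too close to half the box.

    Only dimensions with a non-zero pull-vector component are checked;
    dimensions orthogonal to the pull vector are skipped.
    Assumes a rectangular box (hypothetical simplification).
    """
    for d, box_len, v in zip(dr, box_diag, pull_vec):
        if v != 0.0 and abs(d) > max_fraction * box_len:
            return True
    return False
```

For a pull vector along x, a large COM separation along y no longer triggers
the check, while a large separation along x still does.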

Add detection for ARMv7 cycle counter support
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
ARMv7 requires special kernel settings to allow cycle
counters to be read. This change adds a cmake setting
to enable/disable counters. On all architectures but ARMv7
it is enabled by default, and on ARMv7 we run a small test
program to see if it can be executed successfully. When
cross-compiling to ARMv7, counters will be disabled, but
either choice can be overridden by setting a value for
GMX_CYCLECOUNTERS in cmake.

Introduced fatal error for too few frames in ``gmx dos``
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
To prevent ``gmx dos`` from crashing with an incomprehensible error
message when there are too few frames, test for this.

Properly reset CUDA application clocks
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
We now store the application clock values we read when starting mdrun
and reset to these values, but only when clocks have not been changed
(by another process) in the meantime.
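
The store-and-conditionally-reset logic can be sketched as follows. The
classes and the clock values are hypothetical stand-ins; no real GPU
management API is used here.

```python
class FakeDevice:
    """Hypothetical stand-in for a GPU with readable, settable clocks."""

    def __init__(self, clocks):
        self.clocks = clocks


class ClockGuard:
    """Remember the clocks found at startup and restore them on request."""

    def __init__(self, device, new_clocks):
        self.device = device
        self.original = device.clocks  # values read when starting
        self.applied = new_clocks      # values we set ourselves
        device.clocks = new_clocks

    def restore(self):
        """Reset to the stored values unless the clocks changed meanwhile."""
        if self.device.clocks == self.applied:
            self.device.clocks = self.original
            return True
        # Someone else changed the clocks; leave their setting alone.
        return False
```

Comparing the current values against the ones we applied is one simple way to
detect that another process has intervened; the actual detection used by
mdrun may differ.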

Fixed replica-exchange debug output to all go to the debug file
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
When ``mdrun -debug`` was selected with replica exchange, some of the
order description was printed to mdrun's log file, but it looks like the
actual numbers were being printed to the debug log. This change puts
both in the debug log.

Fixed ``gmx mdrun -membed`` to always run on a single rank
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This used to give a fatal error if default thread-MPI mdrun had chosen
more than one rank, but it will now correctly choose to use a single rank.

Fixed issues with using int for number of simulation steps
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Mostly we use a 64-bit integer, but we messed up a few

During mdrun -rerun, edr writing complained about the negative step
number, implied it might be working around it, and threatened to
crash, which it can't do. Silenced the complaint during writing,
and reduced the scope of the message when reading.

Fixed TNG wrapper routines to pass a 64-bit integer like they should.

Made various infrastructure use gmx_int64_t for consistency, and noted
where in a few places the practical range of the value stored in such
a type is likely to be smaller. We can't extend the definition of XTC
or TRR, so there is no proper solution available. TNG is already good,
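
To illustrate why a 32-bit ``int`` is too small for step counts, the sketch
below reinterprets a step number as a 32-bit and as a 64-bit signed integer
using Python's ``ctypes``. The helper names are made up for this example.

```python
import ctypes


def as_int32(step):
    """Value seen when the step is stored in a 32-bit signed integer."""
    return ctypes.c_int32(step).value


def as_int64(step):
    """Value seen when the step is stored in a 64-bit signed integer."""
    return ctypes.c_int64(step).value
```

A step number of three billion, which exceeds 2^31 - 1, comes back negative
from ``as_int32`` but is preserved by ``as_int64``.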

Fixed trr magic-number reading
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The trr header-reading routine returned an "OK" value even if the
magic number was wrong, which might lead to chaotic results
everywhere. This led to problems if other code (e.g. cpptraj)
mistakenly wrote a wrong-endian trr file, which was then used with
GROMACS. (This should never be a thing for XDR files, which are
defined to be big endian, but such code has existed.)
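
A header check that rejects a wrong magic number, and that can tell a
byte-swapped file from plain garbage, might look like the sketch below. The
magic value and the function name are assumptions for illustration and this
is not the GROMACS reading routine.

```python
import struct

TRR_MAGIC = 1993  # assumed magic number, for illustration only


def check_magic(header_bytes, expected=TRR_MAGIC):
    """Classify the first four bytes of a header.

    Returns "ok", "wrong-endian" or "bad". XDR data is big endian, so
    only the big-endian interpretation counts as valid.
    """
    if len(header_bytes) < 4:
        return "bad"
    (big,) = struct.unpack(">i", header_bytes[:4])
    (little,) = struct.unpack("<i", header_bytes[:4])
    if big == expected:
        return "ok"
    if little == expected:
        return "wrong-endian"
    return "bad"
```

The important point is that anything other than "ok" is reported to the
caller instead of being silently accepted.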

Changed to use only legal characters in OpenCL cache filename
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The option to cache JIT-compiled OpenCL short-ranged kernels needed to
be hardened, so that mdrun would write files whose names would usually
be specific to the device, but would also only contain characters that
work everywhere, i.e. only alphanumeric characters from the current
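
One way to restrict a device-derived name to safe characters is sketched
below. It keeps ASCII letters and digits only, which is an assumption made
for this example; the function name and the sample device string are
hypothetical.

```python
import string

# Assumption for this sketch: ASCII letters and digits are always safe.
SAFE_CHARS = set(string.ascii_letters + string.digits)


def sanitize_for_filename(device_name):
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(c for c in device_name if c in SAFE_CHARS)
```

Dropping characters can make two different device names collide, which is
consistent with the text saying the names are only "usually" device specific.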

Fixes for bugs introduced during development
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These document fixes for issues that were identified as having been
introduced into the release-2016 branch since it diverged from
release-5-1. These will not appear in the final release notes, because
no formal release is thought to have had the problem. Of course, the
Redmine issues remain available should further discussion arise.

Fixed bug in v-rescale thermostat & replica exchange
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Commit 2d0247f6 made random numbers for the v-rescale thermostat that
did not vary over MD steps, and similarly the replica-exchange random
number generator was being reset in the wrong place.
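
The class of bug described here can be reproduced with a small sketch in
which a generator is re-created on every call. Both functions are
hypothetical and use Python's ``random`` module rather than the generator
used by mdrun; the point is only that the step must enter the generator's
state.

```python
import random


def thermostat_random(seed, step):
    """Reproducible Gaussian number that depends on both seed and step."""
    rng = random.Random(f"{seed}:{step}")
    return rng.gauss(0.0, 1.0)


def buggy_thermostat_random(seed, step):
    """Ignores the step, so every MD step sees the same number."""
    rng = random.Random(f"{seed}")
    return rng.gauss(0.0, 1.0)
```

The first function gives the same value when called twice for the same step,
but different values for different steps. The second returns one fixed value
for all steps, which is the symptom described above.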

Fixed vsite bug with MPI+OpenMP
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The recent commit b7e4f30d caused non-local virtual sites not to be
treated when using OpenMP. This meant their coordinates lagged one
step behind and their forces were not spread to the atoms, leading
to small errors in the forces. Note that non-local virtual sites are
only used when local virtual sites use them as a constructing atom;
the most common case is a C/N in a CH3/NH3 group with vsite H's.
Also added a check on the vsite count for debug builds.

Fixed some thread affinity cases
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Fixed one deadlock in newly refactored thread-affinity code, which
happened with automatic pinning, if only part of the nodes were full.

There is one deadlock still theoretically possible: if thread-MPI
reports that setting the affinity is not possible only on a subset of
ranks, the code deadlocks. This has always been there and might never
happen, so it is not fixed here.

Removed OpenMP overhead at high parallelization
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Commit 6d98622d introduced OpenMP parallelization for the for loops
clearing rvecs or increasing rvecs. For small numbers of atoms per
MPI rank this can increase the cost of the loop by up to a factor of 10.
This change disables OpenMP parallelization at low atom count.
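
The idea of falling back to a single thread for small loops can be sketched
as below. The threshold value and the function name are hypothetical; the
actual cut-off used in the code is not stated here.

```python
MIN_ATOMS_FOR_THREADING = 1000  # hypothetical threshold, not from the source


def threads_for_loop(num_atoms, max_threads,
                     min_atoms=MIN_ATOMS_FOR_THREADING):
    """Use all threads only when the loop is large enough to benefit."""
    if num_atoms < min_atoms:
        return 1  # threading overhead would dominate a loop this small
    return max_threads
```

A caller would pass the returned value as the thread count for the loop, so
small systems at high parallelization avoid the threading overhead.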

Removed std::thread::hardware_concurrency()
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
We should not use std::thread::hardware_concurrency() for determining
the logical processor count, since it only provides a hint.
Note that we still have 3 different sources for this count left.

Added support for linking against external TinyXML-2
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This permits convenient packaging of GROMACS by distributions, but
it got lost from gerrit while rebasing.

Fixed data race in hwinfo with thread-MPI
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Fixes for Power7 big-endian
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Now compiles and passes all tests in both double and single precision
with gcc 4.9.3, 5.4.0 and 6.1.0 for big-endian VSX.

The change for the code in incrStoreU and decrStoreU addresses an
apparent regression in 6.1.0, where the compiler thinks the type
returned by vec_extract is a pointer-to-float, but my attempts at a
reduced test case haven't reproduced the issue.

Added some test cases that might hit more endianness cases in future.

We have not been able to test this on little-endian Power8; there is
a risk the gcc-specific permutations could be endian-sensitive. We'll
test this when we have hardware access, or if somebody runs the tests

Reduce hwloc & cpuid test requirements
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
On some non-x86 Linux platforms hwloc does not report
caches, which means it will fail our strict test
requirements of full topology support. There is no
problem whatsoever with this, so we reduce the
test to only require basic support from hwloc; this
is still better than anything we can get ourselves.
Similarly for CPUID, it is not an error for an
architecture to not provide any of the specific flags
we have defined, so avoid marking it as such.

Work around compilation issue with random test on 32-bit machines
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
gcc 4.8.4 running on 32-bit Linux fails a few
tests for random distributions. This seems
to be caused by the compiler doing something
strange (that can lead to differences in the lsb)
when we do not use the result as floating-point
values, but rather do exact binary comparisons.
This is valid C++, and bad behaviour of the
compiler (IMHO), but technically it is not required
to produce bitwise identical results at high
optimization. However, by using floating-point
tests with zero ULP tolerance the problem
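
For readers unfamiliar with ULP-based comparison, the sketch below measures
the distance between two single-precision values in units in the last place
and compares it with a tolerance. The helper names are made up for this
example, and the distance is only meaningful for finite values of the same
sign; this is not the GROMACS test framework.

```python
import struct


def float32_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance(a, b):
    """ULP distance between two finite single-precision values.

    Assumes a and b have the same sign (simplification for this sketch).
    """
    return abs(float32_bits(a) - float32_bits(b))


def equal_within_ulps(a, b, max_ulps=0):
    """Floating-point comparison with a tolerance given in ULPs."""
    return ulp_distance(a, b) <= max_ulps
```

With ``max_ulps=0`` the comparison still requires identical values, but it is
expressed as a floating-point test rather than a comparison of raw memory.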

Updated ``gmx wham`` for the new pull setup
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This brings ``gmx wham`` up to date with the new pull setup where the pull
type and geometry can now be set per coordinate and the pull
coordinate has changed and is more configurable.

Fix membed with partial revert of 29943f
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The membrane embedding algorithm must be initialized before
we call init_forcerec(), so it cannot trivially be moved into
do_md(). This has to be cleaned up anyway for release-2017
since we will remove the group scheme by then, but for now
this fix will allow us to have the method working in release-2016.