* Initial implementation of C++ API
* Add kernel interface and information for API generation
* API changes for updating electrode parameters
* Add serialization proxy for ConstantPotentialForce
* Update file headers
* Add CG error tolerance and fix units on getCharges() return value
* Initial implementation of matrix solver
* Fixes and conjugate gradient solver
* Try to fix Linux and Windows builds
* Make sure charge constraint target is on total charge
* Restore handling of exceptions like NonbondedForce since they won't involve electrode atoms
* Ameliorate numerical instability in constrained conjugate gradient
* Fix uninitialized pointers, memory leak, and style
* Set CG tolerance units in Python API
* Test ConstantPotentialForce serialization
* Read/write ExceptionsUsePeriodicBoundaryConditions as bool
* Improve constrained conjugate gradient robustness to roundoff error accumulation
* Recompute matrix if electrode atoms move due to setPositions()
* Tolerance is now in gradient (potential) units again
* Add neutralizing background correction
* Add Python API tests
* Fixes for CG and nonbonded exceptions
* Add initial tests checking against existing NonbondedForce behavior
* Expand test suite and fix some implementation issues
* Add additional tests using larger reference system
* Add Gaussian test
* Finish test against reference computation
* CPU platform implementation
* Fixes for compilation on some platforms
* Fixes for constant potential with AVX/AVX2
* Test linking CPU PME library to constant potential test directly
* Older SWIG versions don't support Python set to C++ set conversion
* Add user guide entry
* Increase speed of reference test
* Conditionally building the constant potential CPU test is unreliable
* Debugging
* Miscellaneous fixes and improvements for CI
* Cache charges so solver will not run if system and coordinates have not changed
* Preconditioner flag, stability, and automatic detection improvements
* Add GPU platform-specific constant potential kernel classes
* PME and device-host I/O changes to support constant potential
* Initial common constant potential implementation
* Constant potential fixes:
* Fix preconditioner PME position/charge save/restore logic
* Fix reduction synchronization in constant potential solver kernels
* Add double-float accumulation for conjugate gradient solver when
  double precision is unsupported by hardware
* Improve conditioning of a test system, and make sure particles are in or
  out of cutoff for consistency and ease of comparing between platforms
* Reorder guess charges for CG when atom reordering changes positions
* Remove PME queue for now
* Trying to debug optimized direct space derivative kernel
* Remove extraneous debugging lines
* Style updates; just make CPU preconditioner double precision
* Debugging updated optimized direct derivatives kernel for all but OpenCL CPU
* OpenCL CPU implementation of direct space derivatives, and cleanup
* Try to make test even shorter so it does not time out on CI
* Temporary - Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Remove debugging code and fix reduction synchronization
* Fix other reductions
* Debugging - are tests hanging or just slow on CI?
* Debugging
* Debugging
* Fix macro for case when double precision is available on hardware
* Remove debugging changes again
* Try to improve matrix solver cache locality by uploading transpose
* Fixes for atom ordering and periodic images
* Can't rely on reorder listener for cell offset updates
* Test reducing number of contexts and timing for CI
* Debugging
* Remove timing code and revert debugging changes
* Matrix solver and plasma term optimizations
* Reduce CG solver kernel calls and downloads
* Don't read back convergence flag from global memory
* Update PME due to refactoring in master branch
* Faster matrix solver (1st step)
* Faster matrix solver for CUDA
* Faster matrix solver compatibility with non-CUDA platforms
* Matrix solver fixes
* Use warp shuffle reductions when possible
* Attempt to work around intermittent compiler crash in Intel CPU OpenCL
* Optimize CG solver kernel 1
* Rework CG solver so some kernels can use more than 1 block
* Don't run out of shared memory
* Asynchronously download convergence flag while clearing buffers
---------
Co-authored-by: Evan Pretti <pretti@sh03-17n15.int>