* Make reference/CPU minimizer into a kernel
* Add per-platform support for GPU minimization
* Initial implementation of GPU minimization
* Fixes
* Increase robustness when initial gradient is huge
* Handle overflow leading to non-finite values gracefully
* Handle large forces in single precision more robustly
* Optimize kernels
* Fix kernel launch size
* Update banner years
* Don't create MinimizeKernel until first minimization requested
* Make some compile-time constants into kernel arguments
* Consolidate scale calculation kernel
* Condense alpha/beta reduction kernels using atomics
* Condense line search dot kernels with reductions
* Remove a download, and download grad norm separately
* Asynchronously check lbfgs convergence condition
* Restructure line search to avoid download waiting
* Start line search preemptively in case CPU evaluation is not needed
* In rare cases, constraint error might not decrease after one optimization round
* Better handling of unsupported 64-bit atomics, use FLT_MAX
* Pick gradient mode based on GPU vs. CPU evaluation
* Rework getDiff/getScale reduction, remove reduceBuffer
* Older CUDA might not like float hex literals
* Fix error in a comment
* API for querying devices
* CUDA and HIP implementations of getDevices()
* Fix test failures
* Fix test failures
* CUDA returns correct devices even if no context has been created
* Return a single device for Reference and CPU
* Fix CI failure
* Basic LCPO support
* Add basic test for LCPO from a prmtop file
* API for LCPOForce
* Started LCPO reference implementation
* Finished reference forces & test cases
* Use other test for finite difference since grid might have discontinuous forces
* Reference platform formatting
* Initial implementation of CPU platform
* Bugfixes
* More vectorization and improve neighbor list query speed
* Parallelize part of neighbor search
* Check box size for LCPO with periodic boundary conditions
* Fixes for updating parameters in context
* GBSAOBCForce doesn't use first & last indices for updates, so no need for this optimization here
* Changes to neighbor checking and optimization
* Fixes and minor changes
* Add global surface tension parameter
* Only process half of the pairs in the neighbor list
* Remove unnecessary checks
* Initial version of common platform implementation
* Asynchronously download neighbor list size
* Debugging
* Do pair precomputation in copyPairsToNeighborList
* Recompute interactions instead of scanning neighbor list in inner loop
* Condense position array before computations
* Also make neighbor count download asynchronous on device
* Fixes for kernel launching
* Topology-based LCPO parameter assignment
* Fixes, and use test system for LCPO with nucleic acids
* Always raise instead of warn when LCPO parameters can't be assigned
* Use Amber convention for phosphates
* Initial implementation of PythonForce
* Continuing implementation of PythonForce
* Tests for PythonForce
* Fix memory leaks
* Documentation for PythonForce
* Fixed incorrect return type
* Fix compilation error on Python older than 3.12
* Handle all dtypes
* Optimizations to PythonForce
* Optimized getPositions()
* Test all platforms
* Fix test failures
* Initial implementation of C++ API
* Add kernel interface and information for API generation
* API updates for updating electrode parameters
* Add serialization proxy for ConstantPotentialForce
* Update file headers
* Add CG error tolerance and fix units on getCharges() return value
* Initial implementation of matrix solver
* Fixes and conjugate gradient solver
* Try to fix Linux and Windows builds
* Make sure charge constraint target is on total charge
* Restore handling of exceptions like NonbondedForce since they won't involve electrode atoms
* Ameliorate numerical instability in constrained conjugate gradient
* Fix uninitialized pointers, memory leak, and style
* Set CG tolerance units in Python API
* Test ConstantPotentialForce serialization
* Read/write ExceptionsUsePeriodicBoundaryConditions as bool
* Improve constrained conjugate gradient robustness to roundoff error accumulation
* Recompute matrix if electrode atoms move due to setPositions()
* Tolerance is now in gradient (potential) units again
* Add neutralizing background correction
* Add Python API tests
* Fixes for CG and nonbonded exceptions
* Add initial tests checking against existing NonbondedForce behavior
* Expand test suite and fix some implementation issues
* Add additional tests using larger reference system
* Add Gaussian test
* Finish test against reference computation
* CPU platform implementation
* Fixes for compilation on some platforms
* Fixes for constant potential with AVX/AVX2
* Test linking CPU PME library to constant potential test directly
* Older SWIG versions don't support Python set to C++ set conversion
* Add user guide entry
* Increase speed of reference test
* Conditionally building the constant potential CPU test is unreliable
* Debugging
* Miscellaneous fixes and improvements for CI
* Cache charges so solver will not run if system and coordinates have not changed
* Preconditioner flag, stability, and automatic detection improvements
* Add GPU platform-specific constant potential kernel classes
* PME and device-host I/O changes to support constant potential
* Initial common constant potential implementation
* Constant potential fixes:
  * Fix preconditioner PME position/charge save/restore logic
  * Fix reduction synchronization in constant potential solver kernels
  * Add double-float accumulation for conjugate gradient solver when
    double unsupported by hardware
  * Improve conditioning of a test system, and make sure particles are in or
    out of cutoff for consistency and ease of comparing between platforms
* Reorder guess charges for CG when atom reordering changes positions
* Remove PME queue for now
* Trying to debug optimized direct space derivative kernel
* Remove extraneous debugging lines
* Style updates; just make CPU preconditioner double precision
* Debugging updated optimized direct derivatives kernel for all but OpenCL CPU
* OpenCL CPU implementation of direct space derivatives, and cleanup
* Try to make test even shorter to not time out on CI
* Temporary - Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Remove debugging code and fix reduction synchronization
* Fix other reductions
* Debugging - are tests hanging or just slow on CI?
* Debugging
* Debugging
* Fix macro for case when double precision is available on hardware
* Remove changes for debugging again
* Try to improve matrix solver cache locality by uploading transpose
* Fixes for atom ordering and periodic images
* Can't rely on reorder listener for cell offset updates
* Test reducing number of contexts and timing for CI
* Debugging
* Remove timing code and revert debugging changes
* Matrix solver and plasma term optimizations
* Reduce CG solver kernel calls and downloads
* Don't read back convergence flag from global memory
* Update PME due to refactoring in master branch
* Faster matrix solver (1st step)
* Faster matrix solver for CUDA
* Faster matrix solver compatibility with non-CUDA platforms
* Matrix solver fixes
* Use warp shuffle reductions when possible
* Attempt to work around intermittent compiler crash in Intel CPU OpenCL
* Optimize CG solver kernel 1
* Rework CG solver so some kernels can use more than 1 block
* Don't run out of shared memory
* Asynchronously download convergence flag while clearing buffers
---------
Co-authored-by: Evan Pretti <pretti@sh03-17n15.int>
* Replace C++ code examples in Python API docs
* Use newer Python for building docs
* Remove sphinx version pin
* Add Python versions of code examples
* Minor edit
* Add Python version of one more code example
* Began implementing QTBIntegrator
* Adaptation and deconvolution
* Continuing reference implementation
* Continuing to implement QTBIntegrator
* Use common thread pool
* More tests, documentation, and threading
* Fix segfault
* Serialize adapted friction when creating a State
* Beginning of GPU implementation
* Added missing files
* Bug fixes
* Fixed inverse FFT
* Continuing GPU implementation
* Checkpointing
* Bug fixes
* Test cases run faster
* Changes needed for latest main branch
* Minor optimizations
* Documentation
* Fixed atom reordering
* Added parahydrogen test case
* Workaround for bug in Microsoft's compiler
* Added a Python test
* Normalize kernel in deconvolution
* Minor documentation improvements
* Added computeCurrentPressure() to MonteCarloBarostat
* Use instantaneous temperature to compute pressure
* Added computeCurrentPressure() to MonteCarloAnisotropicBarostat
* Added computeCurrentPressure() to MonteCarloMembraneBarostat
* Fixed compilation error
* Fixed error in typemap
* Added documentation on computing pressure
* Fixed CUDA compilation errors
* Made test case more robust
* Made a test case more robust
* Added computeCurrentPressure() to MonteCarloFlexibleBarostat
* Fixed compilation error
* More documentation on computing pressure
* Created DPDIntegrator class
* Reference implementation of DPDIntegrator
* Build neighbor list for DPDIntegrator
* Minor fixes
* Documentation for DPDIntegrator
* Python API for DPDIntegrator
* Preliminary OpenCL implementation of DPDIntegrator
* Enable USE_PERIODIC
* Use updated positions in DPD thermostat
* Working on neighbor list for OpenCL DPDIntegrator
* ReorderListener for particle types
* Serialization for DPDIntegrator
* CUDA implementation of DPDIntegrator
* HIP implementation of DPDIntegrator
* Fixed compile error in Python wrapper
* Fixed compile error in wrappers
* Fixed uninitialized memory in reference neighbor list
* Added DPDIntegrator to C++ API docs
* Fixed incorrect launch size
* Fixed nan in DPD random number generator
* Minor optimizations
* Improved load balancing
* Fixed an indexing error
* Neighbor list uses the maximum cutoff of any force
* Fixed HIP compilation error
* Fixed access to invalid memory
* Added test case for diffusion coefficient
* Try to debug segfaults on CI
* Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Possible fix
* Debugging
* Debugging
* Debugging
* Use correct block size on CPU OpenCL
* Workaround for bug in Intel's OpenCL for CPUs
* Removed an unnecessary define
* Removed debugging code
* Include Dart
* More Intel workarounds
* Workaround for error in NVIDIA OpenCL
* Optimize CustomNonbondedForce.updateParametersInContext()
* Optimized uploading changed values to GPU
* Optimized updateParametersInContext() for lots of bonded forces
* Optimized updateParametersInContext() for CustomExternalForce
* Optimized updateParametersInContext() for NonbondedForce
* Code changes for HIP platform
* Draft integration of the Alchemical Transfer Method (ATM) plugin
* Attempt to store and retrieve forces--does not compile
* Implement addForce()/getForce() methods
* Throw exception when specifying properties without a Platform (#4130)
* Fixed DOF calculation for NoseHooverIntegrator (#4128)
* Fix variance in documentation of VerletIntegrator (#4138)
* Python API for ATMForce
* Fixed compilation error
* Minor cleanup of formatting and documentation
* Files for ATMForce test cases
* More cleanup
* Removed variable groups
* Test ATMForce with two particles
* More tests for ATMForce plus fixes
* Added missing header
* Rework interface to pass displacements as vector of parameters
* Revert "Rework interface to pass displacements as vector of parameters"
  This reverts commit 5e092031f3.
* Test with nonbonded force
* Allow energy expression to be customized
* Optional displacements at the initial state
* Fixed compilation error when building C wrapper
* Address edge case of default energy expression
* Consistent naming of the variables of the displacement states
* Test of soft core function of the default energy expression
* Mark addForce() as taking ownership
* Initial Python test for ATMForce
* Test custom expressions
* Expanded C++ API documentation for ATMForce
* Energy parameter derivatives
* Serialization for ATMForce
* Documentation, cleanup, and fixes
* Fixed typos
* getPerturbationEnergy() computes energy
* Another test case
* Minor edits
---------
Co-authored-by: Peter Eastman <peastman@stanford.edu>
Co-authored-by: Michael Plainer <plainer@ymail.com>
* DrudeLangevinIntegrator has getSystemTemperature()
* DrudeNoseHooverIntegrator has getSystemTemperature()
* StateDataReporter reports system temperature for Drude systems
* Fixed incorrect return type
* Created MonteCarloFlexibleBarostat
* Improved test cases
* Documentation for MonteCarloFlexibleBarostat
* Added a missing include
* Serialization for MonteCarloFlexibleBarostat
* Added MonteCarloFlexibleBarostat to C++ API docs index
* Added citation for MonteCarloFlexibleBarostat
* Proof of concept matching current behaviour with Breathe
* Reorganise C++ API docs to work without autosummary
* Revert to Sphinx-native search
* Move remaining pip deps to conda
* Remove unnecessary lunrsearch templates
* Remove lunrsearch from cmake
* Tidy up layout of API docs
* Get code blocks to work with Breathe
  Breathe doesn't seem to support the Doxygen <preformatted> tag.
  It does support the <code> tag, but better yet it supports using
  <verbatim> tags to embed rst into docstrings. This commit changes
  all <pre> tags to use RST verbatim, and updates the Python
  documentation machinery to support it too.
* Clarified some language
* Have doxygen exclude undocumented classes and have sphinx fail on warnings for C++ API docs
* List custom forces and integrators last
* Add breathe to documentation CI
* Typo
* Fix link to custom forces in extras.rst
* Have Breathe process only public classes
* Strip OpenMM:: prefix from rst files to avoid colons in links
* Remove existing private classes from EXCLUDE_SYMBOLS
* Add comment to C++ cmake describing why we promote warnings to errors
* Revise documentation build instructions for new dependencies
* Convert Nose-Hoover into LF middle scheme by copying NH kernels
* Rebrand VelocityVerletIntegrator as NoseHooverIntegrator
* Consolidate NH tests
* NoseHooverChainKernel begone
* Make Windows builds happy
* Add missing header for Windows build
* Fix mistake in CommonKernels header
* Add 6th Yoshida-Suzuki and make it the default