.. role:: code
.. raw:: html

    <style> .code {font-family:monospace;} </style>
    <style> .caption {text-align:center;} </style>

.. highlight:: c++

Introduction
############

This guide describes the internal architecture of the OpenMM library. It is
targeted at developers who want to add features to OpenMM, either by modifying
the core library directly or by writing plugins. If you just want to write
applications that use OpenMM, you do not need to read this guide; the Users
Manual tells you everything you need to know. This guide is *only* for
people who want to contribute to OpenMM itself.

It is organized as follows:

* Chapter :ref:`the-core-library` describes the architecture of the core OpenMM library. It
  discusses how the high-level and low-level APIs relate to each other, and the
  flow of execution between them.
* Chapter :ref:`writing-plugins` describes in detail how to write a plugin. It focuses on the two
  most common types of plugins: those which define new Forces, and those which
  implement new Platforms.
* Chapter :ref:`the-reference-platform` discusses the architecture of the reference Platform, providing
  information relevant to writing reference implementations of new features.
* Chapter :ref:`the-cpu-platform` discusses the architecture of the CPU Platform, providing
  information relevant to writing CPU implementations of new features.
* Chapter :ref:`the-opencl-platform` discusses the architecture of the OpenCL Platform, providing
  information relevant to writing OpenCL implementations of new features.
* Chapter :ref:`the-cuda-platform` discusses the architecture of the CUDA Platform, providing
  information relevant to writing CUDA implementations of new features.

This guide assumes you are already familiar with the public API and how to use
OpenMM in applications. If that is not the case, you should first read the
Users Manual and work through some of the example programs. Pay especially
close attention to the “Introduction to the OpenMM Library” chapter, since it
introduces concepts that are important in understanding this guide.

.. _the-core-library:

The Core Library
################

OpenMM is based on a layered architecture, as shown in the following diagram:

.. figure:: ../images/ArchitectureLayers.jpg
   :align: center
   :width: 100%

   :autonumber:`Figure,Architecture Layers`\ : OpenMM architecture

The public API layer consists of the classes you access when using OpenMM in an
application: System; Force and its subclasses; Integrator and its subclasses;
and Context. These classes define a public interface but do no computation.

The next layer down consists of “implementation” classes that mirror the public
API classes: ContextImpl, ForceImpl, and a subclass of ForceImpl for each
subclass of Force (HarmonicBondForceImpl, NonbondedForceImpl, etc.). These
objects are created automatically when you create a Context. They store
information related to a particular simulation, and define methods for
performing calculations.

Note that, whereas a Force is logically “part of” a System, a ForceImpl is
logically “part of” a Context. (See :numref:`Figure,API Relationships`\ .) If you create many Contexts
for simulating the same System, there is still only one System and only one copy
of each Force in it. But there will be separate ForceImpls for each Context,
and those ForceImpls store information related to their particular Contexts.

.. figure:: ../images/SystemContextRelationships.jpg
   :align: center

   :autonumber:`Figure,API Relationships`\ : Relationships between public API and implementation layer objects

Also note that there is no “IntegratorImpl” class, because it is not needed.
Integrator is already specific to one Context. Many Contexts can all simulate
the same System, but each of them must have its own Integrator, so information
specific to one simulation can be stored directly in the Integrator.

The next layer down is the OpenMM Low Level API (OLLA). The important classes
in this layer are: Platform; Kernel; KernelImpl and its subclasses; and
KernelFactory. A Kernel is just a reference-counted pointer to a KernelImpl;
the real work is done by KernelImpl objects (or more precisely, by instances of
its subclasses). A KernelFactory creates KernelImpl objects, and a Platform
ties together a set of KernelFactories, as well as defining information that
applies generally to performing computations with that Platform.

All of these classes (except Kernel) are abstract. A particular Platform
provides concrete subclasses of all of them. For example, the reference
platform defines a Platform subclass called ReferencePlatform, a KernelFactory
subclass called ReferenceKernelFactory, and a concrete subclass of each abstract
KernelImpl type: ReferenceCalcNonbondedForceKernel extends
CalcNonbondedForceKernel (which in turn extends KernelImpl),
ReferenceIntegrateVerletStepKernel extends IntegrateVerletStepKernel, and so on.

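The following standalone model shows how these classes relate to each other. It is a sketch, not OpenMM's actual code. The classes live in a hypothetical :code:`sketch` namespace, and :code:`Kernel` is modeled with :code:`std::shared_ptr` rather than OpenMM's own reference-counted class. The kernel name string :code:`"CalcKineticEnergy"` and the :code:`execute()` signature are made up for illustration. The model shows a Platform looking up the KernelFactory registered for a kernel name, then wrapping the KernelImpl that factory creates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for the OLLA classes; NOT the real OpenMM declarations.
namespace sketch {

// Base class of every kernel implementation.
class KernelImpl {
public:
    virtual ~KernelImpl() = default;
    virtual std::string name() const = 0;
};

// Modeled here as a reference-counted pointer to a KernelImpl.
using Kernel = std::shared_ptr<KernelImpl>;

// Abstract kernel type: defines an API but no implementation.
// Hypothetical signature: 1D kinetic energy from masses and velocities.
class CalcKineticEnergyKernel : public KernelImpl {
public:
    virtual double execute(const std::vector<double>& masses,
                           const std::vector<double>& velocities) = 0;
};

// Creates KernelImpls for the kernel names it is registered for.
class KernelFactory {
public:
    virtual ~KernelFactory() = default;
    virtual KernelImpl* createKernelImpl(const std::string& name) const = 0;
};

// Maps kernel names to factories. The caller keeps the factories alive.
class Platform {
public:
    void registerKernelFactory(const std::string& name, const KernelFactory* factory) {
        factories[name] = factory;
    }
    Kernel createKernel(const std::string& name) const {
        auto it = factories.find(name);
        if (it == factories.end())
            throw std::runtime_error("No KernelFactory registered for " + name);
        return Kernel(it->second->createKernelImpl(name));
    }
private:
    std::map<std::string, const KernelFactory*> factories;
};

// Concrete "reference" implementation of the abstract kernel type.
class ReferenceCalcKineticEnergyKernel : public CalcKineticEnergyKernel {
public:
    std::string name() const override { return "ReferenceCalcKineticEnergyKernel"; }
    double execute(const std::vector<double>& masses,
                   const std::vector<double>& velocities) override {
        double energy = 0.0;
        for (std::size_t i = 0; i < masses.size(); i++)
            energy += 0.5 * masses[i] * velocities[i] * velocities[i];
        return energy;
    }
};

class ReferenceKernelFactory : public KernelFactory {
public:
    KernelImpl* createKernelImpl(const std::string& name) const override {
        if (name == "CalcKineticEnergy")
            return new ReferenceCalcKineticEnergyKernel();
        throw std::runtime_error("ReferenceKernelFactory cannot create " + name);
    }
};

} // namespace sketch
```

Code that receives the :code:`Kernel` sees only :code:`KernelImpl`, or the abstract :code:`CalcKineticEnergyKernel` after a cast. It never needs to name the reference class.
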
We can understand this better by walking through the entire sequence of events
that takes place when you create a Context. As an example, suppose you create a
System; add a NonbondedForce to it; create a VerletIntegrator; and then create a
Context for them using the reference Platform. Here is what happens.

#. The Context constructor creates a ContextImpl.
#. The ContextImpl calls :code:`createImpl()` on each Force in the System,
   which creates an instance of the appropriate ForceImpl subclass.
#. The ContextImpl calls :code:`contextCreated()` on the Platform, which
   in turn calls :code:`setPlatformData()` on the ContextImpl. This allows
   Platform-specific information to be stored in a ContextImpl. Every Platform has
   its own mechanism for storing particle masses, constraint definitions, particle
   positions, and so on. ContextImpl therefore allows the Platform to create an
   arbitrary block of data and store it where it can be accessed by that Platform’s
   kernels.
#. The ContextImpl calls :code:`createKernel()` on the Platform several
   times to get instances of various kernels that it needs:
   CalcKineticEnergyKernel, ApplyConstraintsKernel, etc.

   #. For each kernel, the Platform looks up which KernelFactory has been
      registered for that particular kernel. In this case, it will be a
      ReferenceKernelFactory.
   #. The Platform calls :code:`createKernelImpl()` on the KernelFactory, which
      creates and returns an instance of an appropriate KernelImpl subclass:
      ReferenceCalcKineticEnergyKernel, ReferenceApplyConstraintsKernel, etc.

#. The ContextImpl loops over all of its ForceImpls and calls
   :code:`initialize()` on each one.
#. Each ForceImpl asks the Platform to create whatever kernels it needs. In
   this example, NonbondedForceImpl will request a CalcNonbondedForceKernel, and
   get back a ReferenceCalcNonbondedForceKernel.
#. The ContextImpl calls :code:`initialize()` on the Integrator which, like
   the other objects, requests kernels from the Platform. In this example,
   VerletIntegrator requests an IntegrateVerletStepKernel and gets back a
   ReferenceIntegrateVerletStepKernel.

At this point, the Context is fully initialized and ready for doing computation.
Reference implementations of various KernelImpls have been created, but they are
always referenced through abstract superclasses. Similarly, data structures
specific to the reference Platform have been created and stored in the
ContextImpl, but the format and content of these structures are opaque to the
ContextImpl. Whenever it needs to access them (for example, to get or set
particle positions), it does so through a kernel (UpdateStateDataKernel in this
case).

Now suppose that you call :code:`step()` on the VerletIntegrator. Here is
what happens to execute each time step.

#. The VerletIntegrator calls :code:`updateContextState()` on the
   ContextImpl. This gives each Force an opportunity to modify the state of the
   Context at the start of each time step.
#. The ContextImpl loops over its ForceImpls and calls
   :code:`updateContextState()` on each one. In this case, our only ForceImpl is
   a NonbondedForceImpl, which returns without doing anything. On the other hand,
   if we had an AndersenThermostat in our System, its ForceImpl would invoke a
   kernel to modify particle velocities.
#. The VerletIntegrator calls :code:`calcForcesAndEnergy()` on the
   ContextImpl to request that the forces be computed.
#. The ContextImpl calls :code:`beginComputation()` on its
   CalcForcesAndEnergyKernel. This initializes all the forces to zero and does any
   other initialization the Platform requires before forces can be computed. For
   example, some Platforms construct their nonbonded neighbor lists at this point.
#. The ContextImpl loops over its ForceImpls and calls
   :code:`calcForcesAndEnergy()` on each one. In this case, we have a
   NonbondedForceImpl which invokes its CalcNonbondedForceKernel to compute forces.
#. Next, the ContextImpl calls :code:`finishComputation()` on its
   CalcForcesAndEnergyKernel. This does any additional work needed to determine
   the final forces, such as summing the values from intermediate buffers.
#. Finally, the VerletIntegrator invokes its IntegrateVerletStepKernel. This
   takes the forces, positions, and velocities that are stored in a
   Platform-specific format in the ContextImpl, uses them to compute new positions
   and velocities, and stores them in the ContextImpl.

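The order of these calls can be summarized with a toy model. Everything below is hypothetical: a :code:`sketch` namespace, simplified signatures, and a :code:`CallLog` that records each call in place of real computation. It reproduces only the control flow of one time step for a System with a single NonbondedForce. It does not show how ContextImpl or VerletIntegrator are actually written.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Records the order of calls; stands in for the real computation.
using CallLog = std::vector<std::string>;

class ForceImpl {
public:
    virtual ~ForceImpl() = default;
    // Default: only record the call. A thermostat's ForceImpl would change velocities here.
    virtual void updateContextState(CallLog& log) { log.push_back("updateContextState"); }
    virtual void calcForcesAndEnergy(CallLog& log) = 0;
};

class NonbondedForceImpl : public ForceImpl {
public:
    void calcForcesAndEnergy(CallLog& log) override {
        log.push_back("CalcNonbondedForceKernel");
    }
};

class ContextImpl {
public:
    std::vector<std::unique_ptr<ForceImpl>> forceImpls;
    CallLog log;

    void updateContextState() {
        for (auto& force : forceImpls)
            force->updateContextState(log);
    }
    void calcForcesAndEnergy() {
        log.push_back("beginComputation");   // zero forces, build neighbor lists, ...
        for (auto& force : forceImpls)
            force->calcForcesAndEnergy(log);
        log.push_back("finishComputation");  // e.g. sum intermediate buffers
    }
};

// One step of a toy VerletIntegrator.
inline void step(ContextImpl& context) {
    context.updateContextState();
    context.calcForcesAndEnergy();
    context.log.push_back("IntegrateVerletStepKernel");  // new positions and velocities
}

} // namespace sketch
```
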
.. _writing-plugins:

Writing Plugins
###############

A plugin is a dynamic library that adds new features to OpenMM. It is typically
stored in the :code:`lib/plugins` directory inside your OpenMM installation,
and gets loaded along with all other plugins when the user calls

::

    Platform::loadPluginsFromDirectory(Platform::getDefaultPluginsDirectory());

It is also possible to load plugins from a different directory, or to load them
individually by calling :code:`Platform::loadPluginLibrary()`\ .

Every plugin must implement two functions that are declared in the
PluginInitializer.h header file:

::

    extern "C" void registerPlatforms();
    extern "C" void registerKernelFactories();

When a plugin is loaded, these two functions are invoked to register any
Platforms and KernelFactories defined by the plugin. When many plugins are
loaded at once by calling :code:`Platform::loadPluginsFromDirectory()`\ ,
:code:`registerPlatforms()` is first called on all of them, then
:code:`registerKernelFactories()` is called on all of them. This allows one
plugin to define a Platform, and a different plugin to add KernelFactories to
it; the Platform is guaranteed to be registered by the first plugin before the
second plugin tries to add its KernelFactories, regardless of what order the
plugins happen to be loaded in.

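A small standalone model shows why the two-pass order works. :code:`PluginEntryPoints` and :code:`platformRegistry()` are hypothetical stand-ins: the first represents a loaded library's two entry points, the second OpenMM's set of registered Platforms. This is not how :code:`Platform::loadPluginsFromDirectory()` is implemented. It only demonstrates the ordering guarantee.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical registry: Platform name -> names of kernel factories added to it.
inline std::map<std::string, std::vector<std::string>>& platformRegistry() {
    static std::map<std::string, std::vector<std::string>> registry;
    return registry;
}

// The two entry points every plugin exports, modeled as callbacks.
struct PluginEntryPoints {
    std::function<void()> registerPlatforms;
    std::function<void()> registerKernelFactories;
};

// First pass registers every Platform; second pass lets every plugin add
// KernelFactories, so the order of the plugins in the list does not matter.
inline void loadPlugins(const std::vector<PluginEntryPoints>& plugins) {
    for (const PluginEntryPoints& plugin : plugins)
        plugin.registerPlatforms();
    for (const PluginEntryPoints& plugin : plugins)
        plugin.registerKernelFactories();
}

} // namespace sketch
```

Suppose the loop instead called both functions on each plugin in turn. A plugin that extends a Platform could then run before the plugin that defines it, and its lookup of that Platform would fail.
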
Creating New Platforms
**********************

One common type of plugin defines a new Platform. There are three such plugins
that come with OpenMM: one for the CPU Platform, one for the CUDA Platform, and
one for the OpenCL Platform.

To define a new Platform, you must create subclasses of the various abstract
classes in the OpenMM Low Level API: a subclass of Platform, one or more
subclasses of KernelFactory, and a subclass of each KernelImpl. That is easy to
say, but a huge amount of work to actually do. There are many different
algorithms involved in computing forces, enforcing constraints, performing
integration, and so on, all of which together make up a Platform. Of course,
there is no requirement that every Platform implement every possible
feature. If you do not provide an implementation of a particular kernel, it
simply means your Platform cannot be used for any simulation that requires that
kernel; if a user tries to do so, an exception will be thrown.

Your plugin’s :code:`registerPlatforms()` function should create an instance
of your Platform subclass, then register it by calling
:code:`Platform::registerPlatform()`\ . You must also register the
KernelFactory for each kernel your Platform supports. This can be done in the
:code:`registerKernelFactories()` function, or more simply, directly in the
Platform’s constructor. You can use as many different KernelFactories as you
want for different kernels, but usually it is simplest to use a single
KernelFactory for all of them. The support for multiple KernelFactories exists
primarily to let plugins add new features to existing Platforms, as described in
the next section.

Creating New Forces
*******************

Another common type of plugin defines new Forces and provides implementations of
them for existing Platforms. (Defining new Integrators is not specifically
discussed here, but the process is very similar.) There are two such plugins
that come with OpenMM. They implement the AMOEBA force field and Drude
oscillators, respectively.

As an example, suppose you want to create a new Force subclass called
StringForce that uses the equations of String Theory to compute the interactions
between particles. You want to provide implementations of it for all four
standard platforms: Reference, CPU, CUDA, and OpenCL.

The first thing to realize is that this *cannot* be done with only a plugin
library. Plugins are loaded dynamically at runtime, and they relate to the
low-level API; but you must also provide a public API. Users of your class need to
create StringForce objects and call methods on them. That means providing a
header file with the class declaration, and a (non-plugin) library with the
class definition to link their code against. The implementations for particular
Platforms can be in plugins, but the public API class itself cannot. Or to put
it differently, the full “plugin” (from the user’s perspective) consists of
three parts: the library OpenMM loads at runtime (which is what OpenMM considers
to be the “plugin”), a second library for users to link their code against, and
a header file for them to include in their source code.

To define the API, you will need to create the following classes:

#. StringForce. This is the public API for your force, and users will directly
   link against the library containing it.
#. StringForceImpl. This is the ForceImpl subclass corresponding to
   StringForce. It should be defined in the same library as StringForce, and
   StringForce’s :code:`createImpl()` method should create an instance of it.
#. CalcStringForceKernel. This is an abstract class that extends KernelImpl,
   and defines the API by which StringForceImpl invokes its kernel. You only need
   to provide a header file for it, not an implementation; those will be provided
   by Platforms.

Now suppose you are writing the OpenCL implementation of StringForce. Here are
the classes you need to write:

#. OpenCLCalcStringForceKernel. This extends CalcStringForceKernel and provides
   implementations of its virtual methods. The code for this class will probably
   be very complicated (and if it actually works, worth a Nobel Prize). It may
   execute many different GPU kernels and create its own internal data structures.
   But those details are entirely internal to your own code. As long as this class
   implements the virtual methods of CalcStringForceKernel, you can do anything you
   want inside it.
#. OpenCLStringForceKernelFactory. This is a KernelFactory subclass that knows
   how to create instances of OpenCLCalcStringForceKernel.

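The split between the public library and the plugin can be sketched in standalone C++. This is a hypothetical model, and several details are assumptions made to keep it self-contained:

* the :code:`sketch` namespace;
* the :code:`execute()` signature and its placeholder energy value;
* passing a kernel-creating callback to :code:`initialize()` instead of asking a real Platform;
* reducing the factory to a single :code:`create()` method instead of subclassing KernelFactory.

The first four classes would live in the library users link against. The last two would live in the OpenCL plugin.

```cpp
#include <cassert>
#include <functional>
#include <memory>

namespace sketch {

// ---- Public, non-plugin library ----

// Abstract kernel API (header only). Hypothetical signature: returns the energy.
class CalcStringForceKernel {
public:
    virtual ~CalcStringForceKernel() = default;
    virtual double execute() = 0;
};

// Stands in for asking the Platform to create a kernel.
using KernelCreator = std::function<std::unique_ptr<CalcStringForceKernel>()>;

class ForceImpl {
public:
    virtual ~ForceImpl() = default;
    virtual void initialize(const KernelCreator& createKernel) = 0;
    virtual double calcForcesAndEnergy() = 0;
};

// Obtains its kernel once, then delegates all computation to it.
class StringForceImpl : public ForceImpl {
public:
    void initialize(const KernelCreator& createKernel) override { kernel = createKernel(); }
    double calcForcesAndEnergy() override { return kernel->execute(); }
private:
    std::unique_ptr<CalcStringForceKernel> kernel;
};

// Public API class; createImpl() returns the matching ForceImpl.
class StringForce {
public:
    std::unique_ptr<ForceImpl> createImpl() const {
        return std::unique_ptr<ForceImpl>(new StringForceImpl());
    }
};

// ---- OpenCL plugin library ----

class OpenCLCalcStringForceKernel : public CalcStringForceKernel {
public:
    // Placeholder value; the real GPU work would be private to the plugin.
    double execute() override { return 42.0; }
};

class OpenCLStringForceKernelFactory {
public:
    std::unique_ptr<CalcStringForceKernel> create() const {
        return std::unique_ptr<CalcStringForceKernel>(new OpenCLCalcStringForceKernel());
    }
};

} // namespace sketch
```

Nothing in the public half mentions OpenCL, so the same StringForce library works with any Platform that supplies a kernel.
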
Both of these classes should be packaged into a dynamic library (.so on Linux,
.dylib on Mac, .dll on Windows) that can be loaded as a plugin. This library
must also implement the two functions from PluginInitializer.h.
:code:`registerPlatforms()` will do nothing, since this plugin does not
implement any new Platforms. :code:`registerKernelFactories()` should call
:code:`Platform::getPlatformByName("OpenCL")` to get the OpenCL Platform,
then create a new OpenCLStringForceKernelFactory and call
:code:`registerKernelFactory()` on the Platform to register it. If the OpenCL
Platform is not available, you should catch the exception and return without
doing anything. Most likely this means there is no OpenCL runtime on the
computer your code is running on.

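The error handling in :code:`registerKernelFactories()` can be sketched as follows. The :code:`sketch::Platform` struct and the :code:`platforms()` map are hypothetical stand-ins. :code:`getPlatformByName()` here only mimics the behavior described above, throwing when no such Platform exists. A real plugin would call the OpenMM functions named in the text.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Stand-in for a Platform: just records which kernel factories were added.
struct Platform {
    std::vector<std::string> kernelFactories;
    void registerKernelFactory(const std::string& kernelName) {
        kernelFactories.push_back(kernelName);
    }
};

// Stand-in for the set of registered Platforms.
inline std::map<std::string, Platform>& platforms() {
    static std::map<std::string, Platform> registered;
    return registered;
}

// Mimics the described behavior: throws if the Platform is absent.
inline Platform& getPlatformByName(const std::string& name) {
    auto it = platforms().find(name);
    if (it == platforms().end())
        throw std::runtime_error("No Platform named " + name);
    return it->second;
}

// Shape of the entry point for a plugin that only supports OpenCL.
inline void registerKernelFactories() {
    try {
        Platform& platform = getPlatformByName("OpenCL");
        platform.registerKernelFactory("CalcStringForce");
    }
    catch (const std::exception&) {
        // No OpenCL Platform (most likely no OpenCL runtime): do nothing.
    }
}

} // namespace sketch
```
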
.. _the-reference-platform:

The Reference Platform
######################

The reference Platform is written with simplicity and clarity in mind, not
performance. (It is still not always as simple or clear as one might hope, but
that is the goal.) When implementing a new feature, it is recommended to create
the reference implementation first, then use that as a model for the versions in
other Platforms.

The reference Platform represents all floating point numbers with the type
RealOpenMM, which is defined in SimTKOpenMMRealType.h. This allows the entire
platform to be compiled in either single or double precision. By default it is
double precision, but it can be changed by modifying one flag at the top of that
file. The same file also defines many numerical constants and mathematical
functions, so that code always uses the versions matching the selected precision.
Vector quantities (positions, velocities, etc.) are represented by RealVec objects.
This class is identical to Vec3, except that its components are of type
RealOpenMM instead of double.

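The following sketch illustrates the single-flag precision switch. The macro name :code:`SKETCH_USE_SINGLE_PRECISION` and the helper :code:`realSqrt()` are hypothetical; the actual flag in SimTKOpenMMRealType.h may be named and defined differently. :code:`RealVec` is reduced to the one member needed here and is not OpenMM's class.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

namespace sketch {

// Hypothetical compile-time switch; define it to build everything in float.
#ifdef SKETCH_USE_SINGLE_PRECISION
typedef float RealOpenMM;
#else
typedef double RealOpenMM;
#endif

// Precision-matched helper, so calling code never hard-codes float or double.
inline RealOpenMM realSqrt(RealOpenMM x) { return std::sqrt(x); }

// Minimal Vec3-like type whose components use the configured precision.
struct RealVec {
    RealOpenMM x, y, z;
    RealOpenMM dot(const RealVec& other) const {
        return x * other.x + y * other.y + z * other.z;
    }
};

} // namespace sketch
```
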
When using the reference Platform, the “platform-specific data” stored in
ContextImpl is of type ReferencePlatform::PlatformData, which is declared in
ReferencePlatform.h. Several of the fields in this class are declared as :code:`void*`
to avoid having to include SimTKOpenMMRealType.h in ReferencePlatform.h. If you
look in ReferenceKernels.cpp, you will find code that casts these fields back to
their actual types. For example:

::

    static vector<RealVec>& extractPositions(ContextImpl& context) {
        ReferencePlatform::PlatformData* data =
                reinterpret_cast<ReferencePlatform::PlatformData*>(context.getPlatformData());
        // positions is declared as void*; cast it back to its actual type.
        return *((vector<RealVec>*) data->positions);
    }

The PlatformData’s vector of forces contains one element for each particle. At
the start of each force evaluation, all elements of it are set to zero. Each
Force adds its own contributions to the vector, so that at the end, it contains
the total force acting on each particle.

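Here is a minimal sketch of this accumulation pattern. It uses a hypothetical :code:`Vec` type and treats each Force as a callback that adds its contribution. Real reference kernels differ in every detail except the pattern itself.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace sketch {

struct Vec { double x, y, z; };

// Each "Force" is modeled as a callback that adds its contribution.
using ForceTerm = std::function<void(std::vector<Vec>& forces)>;

// Zero the per-particle forces, then let every Force add to them.
inline void computeForces(std::vector<Vec>& forces, const std::vector<ForceTerm>& terms) {
    for (Vec& f : forces)
        f = Vec{0.0, 0.0, 0.0};
    for (const ForceTerm& addContribution : terms)
        addContribution(forces);
}

} // namespace sketch
```
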
There are a few additional classes that contain useful static methods.
SimTKOpenMMUtilities has various utility functions, of which the most important
is a random number generator. ReferenceForce provides methods for calculating
the displacement between two positions, optionally taking periodic boundary
conditions into account.

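For a rectangular periodic box, the usual way to take periodic boundary conditions into account is the minimum image convention. Each component of the displacement is shifted by a whole number of box lengths so that it lies within half a box length of zero. The function below is a hypothetical standalone illustration of that technique. It does not reproduce the signatures or implementation of the ReferenceForce methods, and it handles only rectangular boxes.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

struct Vec { double x, y, z; };

// Displacement from a to b. If boxSize is non-null, it holds the three edge
// lengths of a rectangular periodic box and the minimum image is returned.
inline Vec displacement(const Vec& a, const Vec& b, const double* boxSize) {
    Vec d{b.x - a.x, b.y - a.y, b.z - a.z};
    if (boxSize != nullptr) {
        d.x -= boxSize[0] * std::round(d.x / boxSize[0]);
        d.y -= boxSize[1] * std::round(d.y / boxSize[1]);
        d.z -= boxSize[2] * std::round(d.z / boxSize[2]);
    }
    return d;
}

} // namespace sketch
```

For example, two particles at x = 0.1 and x = 1.9 in a box of width 2 are 1.8 apart directly, but only 0.2 apart through the periodic boundary.
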
.. _the-cpu-platform:

The CPU Platform
################

CpuPlatform is a subclass of ReferencePlatform. It provides optimized versions
of a small number of kernels, while using the reference implementations for all
the others. Any kernel implementation written for the reference Platform will
work equally well with the CPU platform. Of course, if that kernel happens to
be a performance bottleneck, you will probably want to write an optimized
version of it. But many kernels have negligible effect on performance, and for
these you can just use the same implementation for both platforms.

If you choose to do that, you can easily support both platforms with a single
plugin library. Just implement :code:`registerKernelFactories()` like this:

::

    #include "openmm/Platform.h"
    #include "ReferencePlatform.h"

    using namespace OpenMM;

    extern "C" void registerKernelFactories() {
        for (int i = 0; i < Platform::getNumPlatforms(); i++) {
            Platform& platform = Platform::getPlatform(i);
            if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
                // Create and register your KernelFactory.
            }
        }
    }

The loop identifies every ReferencePlatform, either an instance of the base
class or of a subclass, and registers a KernelFactory for each one.

.. _the-opencl-platform:

The OpenCL Platform
###################

The OpenCL Platform is much more complicated than the reference Platform. It
also provides many more tools to simplify your work, but those tools themselves
can be complicated to use correctly. This chapter will attempt to explain how
to use some of the most important ones. It will *not* teach you how to
program with OpenCL. There are many tutorials on that subject available
elsewhere, and this guide assumes you already understand it.

Overview
********

When using the OpenCL Platform, the “platform-specific data” stored in
ContextImpl is of type OpenCLPlatform::PlatformData, which is declared in
OpenCLPlatform.h. The most important field of this class is :code:`contexts`,
which is a vector of OpenCLContexts. (There is one OpenCLContext for each
device you are using. The most common case is that you are running everything
on a single device, in which case there will be only one OpenCLContext.
Parallelizing computations across multiple devices is not discussed here.) The
OpenCLContext stores most of the important information about a simulation:
positions, velocities, forces, an OpenCL CommandQueue used for executing
kernels, workspace buffers of various sorts, etc. It provides many useful
methods for compiling and executing kernels, clearing and reducing buffers, and
so on. It also provides access to three other important objects: the
OpenCLIntegrationUtilities, OpenCLNonbondedUtilities, and OpenCLBondedUtilities.
These are discussed below.

Allocation of device memory is generally done through the OpenCLArray class. It
takes care of much of the work of memory management, and provides a simple
interface for transferring data between host and device memory.

Every kernel is specific to a particular OpenCLContext, which in turn is
specific to a particular OpenMM::Context. This means that kernel source code
can be customized for a particular simulation. For example, values such as the
number of particles can be turned into compile-time constants, and specific
versions of kernels can be selected based on the device being used or on
particular aspects of the system being simulated.
:code:`OpenCLContext::createProgram()` makes it easy to specify a list of
preprocessor definitions to use when compiling a kernel.

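The hypothetical helper below illustrates the idea, not how :code:`createProgram()` is implemented. It turns a map of definitions into :code:`#define` lines prepended to kernel source. Values such as the number of particles then become compile-time constants for that one Context.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

namespace sketch {

// Prepend one "#define NAME VALUE" line per entry to the kernel source, so
// simulation-specific values become compile-time constants.
inline std::string withDefines(const std::string& source,
                               const std::map<std::string, std::string>& defines) {
    std::stringstream result;
    for (const auto& define : defines)
        result << "#define " << define.first << " " << define.second << "\n";
    result << source;
    return result.str();
}

} // namespace sketch
```
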
The normal way to execute a kernel is by calling :code:`executeKernel()` on
the OpenCLContext. It allows you to specify the total number of work-items to
execute, and optionally the size of each work-group. (If you do not specify a
work-group size, it uses 64 as a default.) The number of work-groups to launch
is selected automatically based on the work-group size, the total number of
work-items, and the number of compute units in the device it will execute on.

Numerical Precision
*******************

The OpenCL platform supports three precision modes:

#. **Single**\ : All values are stored in single precision, and nearly all
   calculations are done in single precision. The arrays of positions, velocities,
   forces, and energies (returned by the OpenCLContext’s :code:`getPosq()`\ ,
   :code:`getVelm()`\ , :code:`getForce()`\ , :code:`getForceBuffers()`\ , and
   :code:`getEnergyBuffer()` methods) are all of type :code:`float4` (or
   :code:`float` in the case of :code:`getEnergyBuffer()`\ ).
#. **Mixed**\ : Forces are computed and stored in single precision, but
   integration is done in double precision. The velocities have type
   :code:`double4`\ . The positions are still stored in single precision to avoid
   adding overhead to the force calculations, but a second array of type
   :code:`float4` is created to store “corrections” to the positions (returned by
   the OpenCLContext’s :code:`getPosqCorrection()` method). Adding the position and the
   correction together gives the full double precision position.
#. **Double**\ : Positions, velocities, forces, and energies are all stored in
   double precision, and nearly all calculations are done in double precision.

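You can check on the host that a single precision position plus a single precision “correction” recovers far more than a float alone. The sketch below is a standalone illustration with hypothetical names. It splits one coordinate rather than a whole :code:`float4`.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// A coordinate stored as a float plus a float correction.
struct SplitCoordinate {
    float value;
    float correction;
};

inline SplitCoordinate split(double x) {
    float value = static_cast<float>(x);
    // The part the float could not represent, itself rounded to float.
    float correction = static_cast<float>(x - static_cast<double>(value));
    return SplitCoordinate{value, correction};
}

// Adding the two parts in double precision recovers the original almost exactly.
inline double combine(const SplitCoordinate& c) {
    return static_cast<double>(c.value) + static_cast<double>(c.correction);
}

} // namespace sketch
```

For x = 1 + 10\ :sup:`-9`, the float alone rounds to exactly 1.0, while value plus correction reproduces x to within about 10\ :sup:`-16`.
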
You can call :code:`getUseMixedPrecision()` and
:code:`getUseDoublePrecision()` on the OpenCLContext to determine which mode
is being used. In addition, when you compile kernel source by calling
:code:`createProgram()`\ , it automatically defines two types for you to make it
easier to write kernels that work in any mode:

#. :code:`real` is defined as :code:`float` in single or mixed precision
   mode, :code:`double` in double precision mode.
#. :code:`mixed` is defined as :code:`float` in single precision mode,
   :code:`double` in mixed or double precision mode.

It also defines vector versions of these types (\ :code:`real2`\ ,
:code:`real4`\ , etc.).

.. _computing-forces:

Computing Forces
****************

When forces are computed, they are stored in multiple buffers. This is done to
enable multiple work-items or work-groups to compute forces on the same particle
at the same time; as long as each one writes to a different buffer, there is no
danger of race conditions. At the start of a force calculation, all forces in
all buffers are set to zero. Each Force is then free to add its contributions
to any or all of the buffers. Finally, the buffers are summed to produce the
total force on each particle.

The size of each buffer is equal to the number of particles, rounded up to the
next multiple of 32. Call :code:`getPaddedNumAtoms()` on the OpenCLContext
to get that number. The actual force buffers are obtained by calling
:code:`getForceBuffers()`\ . The first *n* entries (where *n* is the
padded number of atoms) represent the first force buffer, the next *n*
represent the second force buffer, and so on. More generally, the *i*\ ’th
force buffer’s contribution to the force on particle *j* is stored in
element :code:`i*context.getPaddedNumAtoms()+j`\ .

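The padding and indexing rules above can be written out as plain C++. The function names are hypothetical. The summation below runs serially on the host over a single scalar component, whereas the real summation happens inside the OpenCL Platform's own code. The sketch only illustrates the memory layout.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Number of particles rounded up to a multiple of 32.
inline int paddedNumAtoms(int numAtoms) {
    return 32 * ((numAtoms + 31) / 32);
}

// Element holding force buffer i's contribution for particle j.
inline int forceBufferIndex(int i, int j, int padded) {
    return i * padded + j;
}

// Serial host-side reduction of one scalar force component across all buffers.
inline std::vector<float> sumForceBuffers(const std::vector<float>& buffers,
                                          int numBuffers, int padded) {
    assert(static_cast<int>(buffers.size()) == numBuffers * padded);
    std::vector<float> total(padded, 0.0f);
    for (int i = 0; i < numBuffers; i++)
        for (int j = 0; j < padded; j++)
            total[j] += buffers[forceBufferIndex(i, j, padded)];
    return total;
}

} // namespace sketch
```
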
Depending on the device, a buffer may also be created that stores contributions
to the forces in 64-bit fixed point format. On devices that support atomic
operations on 64-bit integers in global memory, this can be a more efficient way
of accumulating forces than using a large number of force buffers. To convert a
value from floating point to fixed point, multiply it by 0x100000000 (2\ :sup:`32`\ ),
then cast it to a :code:`long`\ . The fixed point buffer is
ordered differently from the others. For atom *i*\ , the x component of its
force is stored in element :code:`i`\ , the y component in element
:code:`i+context.getPaddedNumAtoms()`\ , and the z component in element
:code:`i+2*context.getPaddedNumAtoms()`\ .

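The conversion and layout rules for the fixed point buffer can be sketched with hypothetical host-side helpers. OpenCL's 64-bit :code:`long` is represented by :code:`long long`, and contributions are added sequentially rather than with atomic operations. Integer addition is associative, so the accumulated value does not depend on the order in which contributions arrive.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

const double FIXED_POINT_SCALE = 4294967296.0;  // 0x100000000 = 2^32

inline long long toFixed(double value) {
    return static_cast<long long>(value * FIXED_POINT_SCALE);
}

inline double fromFixed(long long value) {
    return static_cast<double>(value) / FIXED_POINT_SCALE;
}

// Component c (0 = x, 1 = y, 2 = z) of atom i's force.
inline int fixedPointIndex(int i, int c, int padded) {
    return i + c * padded;
}

// Sequential stand-in for the atomic 64-bit additions a GPU would perform.
inline void addForce(std::vector<long long>& buffer, int i, int padded,
                     double fx, double fy, double fz) {
    buffer[fixedPointIndex(i, 0, padded)] += toFixed(fx);
    buffer[fixedPointIndex(i, 1, padded)] += toFixed(fy);
    buffer[fixedPointIndex(i, 2, padded)] += toFixed(fz);
}

} // namespace sketch
```
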
The potential energy is also accumulated in a set of buffers, but this one is
simply a list of floating point values. All of them are set to zero at the
start of a computation, and they are summed at the end of the computation to
yield the total energy.

The OpenCL implementation of each Force object should define a subclass of
OpenCLForce, and register an instance of it by calling :code:`addForce()` on
the OpenCLContext. This serves two purposes:

#. It reports how many force buffers are required when calculating this
   particular Force. The OpenCLContext sets the size of its force buffer array
   based on the largest number of buffers required by any Force.
#. It implements methods for determining whether particular particles or groups
   of particles are identical. This is important when reordering particles, and is
   discussed below.

Nonbonded Forces
****************

Computing nonbonded interactions efficiently is a complicated business in the
best of cases. It is even more complicated on a GPU. Furthermore, the
algorithms must vary based on the type of processor being used, whether there is
a distance cutoff, and whether periodic boundary conditions are being applied.

The OpenCLNonbondedUtilities class tries to simplify all of this. To use it, you
need only provide a piece of code to compute the interaction between two
particles. It then takes responsibility for generating a neighbor list, looping
over interacting particles, loading particle parameters from global memory, and
writing the forces and energies to the appropriate buffers. All of these things
are done using an algorithm appropriate to the processor you are running on and
high-level aspects of the interaction, such as whether it uses a cutoff and
whether particular particle pairs need to be excluded.

Of course, this system relies on certain assumptions, the most important of
which is that the Force can be represented as a sum of independent pairwise
interactions. If that is not the case, things become much more complicated.
You may still be able to use features of OpenCLNonbondedUtilities, but you
cannot use the simple mechanism outlined above. That is beyond the scope of
this guide.

To define a nonbonded interaction, call :code:`addInteraction()` on the
OpenCLNonbondedUtilities, providing a block of OpenCL source code for computing
the interaction. This block of source code will be inserted into the middle of
an appropriate kernel. At the point where it is inserted, various variables
will have been defined describing the interaction to compute:

#. :code:`atom1` and :code:`atom2` are the indices of the two
   interacting particles.
#. :code:`r`\ , :code:`r2`\ , and :code:`invR` are the distance *r*
   between the two particles, *r*\ :sup:`2`\ , and 1/\ *r* respectively.
#. :code:`isExcluded` is a :code:`bool` specifying whether this pair of
   particles is marked as an excluded interaction. (Excluded pairs are not skipped
   automatically, because in some cases they still need to be processed, just
   differently from other pairs.)
#. :code:`posq1` and :code:`posq2` are :code:`real4`\ s containing the
   positions (in the xyz fields) and charges (in the w fields) of the two
   particles.
#. Other per-particle parameters may be specified, as described below.

The following preprocessor macros will also have been defined:

#. :code:`NUM_ATOMS` is the total number of particles in the system.
#. :code:`PADDED_NUM_ATOMS` is the padded number of particles in the system.
#. :code:`USE_CUTOFF` is defined if and only if a cutoff is being used.
#. :code:`USE_PERIODIC` is defined if and only if periodic boundary
   conditions are being used.
#. :code:`CUTOFF` and :code:`CUTOFF_SQUARED` are the cutoff distance and
   its square respectively (defined only if a cutoff is being used).

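As a rough illustration of how these macros relate to one another, the
following sketch assembles them into a block of :code:`#define` lines of the
kind that could be prepended to kernel source. The helper
:code:`buildNonbondedDefines()` and its parameters are hypothetical;
OpenCLNonbondedUtilities generates these definitions itself, and this is not
its code:

::

    #include <cassert>
    #include <sstream>
    #include <string>

    // Hypothetical helper (not part of OpenMM): emits the preprocessor macros
    // described above as a block of #define lines.
    std::string buildNonbondedDefines(int numAtoms, int paddedNumAtoms,
                                      bool useCutoff, bool usePeriodic,
                                      double cutoff) {
        assert(paddedNumAtoms >= numAtoms);
        std::ostringstream out;
        out << "#define NUM_ATOMS " << numAtoms << "\n";
        out << "#define PADDED_NUM_ATOMS " << paddedNumAtoms << "\n";
        if (useCutoff) {
            // CUTOFF and CUTOFF_SQUARED exist only when a cutoff is used.
            out << "#define USE_CUTOFF\n";
            out << "#define CUTOFF " << cutoff << "\n";
            out << "#define CUTOFF_SQUARED " << cutoff*cutoff << "\n";
        }
        if (usePeriodic)
            out << "#define USE_PERIODIC\n";
        return out.str();
    }

For example, :code:`buildNonbondedDefines(100, 128, true, false, 1.5)`
defines :code:`NUM_ATOMS` as 100, :code:`PADDED_NUM_ATOMS` as 128,
:code:`USE_CUTOFF`, and :code:`CUTOFF_SQUARED` as 2.25, but not
:code:`USE_PERIODIC`.
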
Two output variables will also have been defined:

#. You should add the energy of the interaction to :code:`tempEnergy`\ .
#. You should add the derivative of the energy with respect to the inter-particle
   distance to :code:`dEdR`\ .

You can also define arbitrary per-particle parameters by calling
:code:`addParameter()` on the OpenCLNonbondedUtilities. You provide an array
in device memory containing the set of values, and the values for the two
interacting particles will be loaded and stored into variables called
:code:`<name>1` and :code:`<name>2`\ , where :code:`<name>` is the name you
specify for the parameter. Note that nonbonded interactions are not computed
until after :code:`calcForcesAndEnergy()` has been called on every ForceImpl,
so you can make the parameter values change with time by modifying them
inside :code:`calcForcesAndEnergy()`\ . Also note that the length of the
array containing the parameter values must equal the *padded* number of
particles in the system.

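The padding requirement can be sketched on the host as follows. This assumes,
as described in the section on reordering particles, that the padded count is
the number of particles rounded up to a multiple of 32; in a plugin you would
take that value from the context rather than compute it yourself.
:code:`paddedNumAtoms()` and :code:`makePaddedParameterArray()` are
hypothetical helpers, and the resulting vector would still need to be uploaded
to the device buffer you pass to :code:`addParameter()`:

::

    #include <cassert>
    #include <vector>

    // Hypothetical helper: round numAtoms up to the next multiple of blockSize.
    int paddedNumAtoms(int numAtoms, int blockSize = 32) {
        assert(numAtoms >= 0 && blockSize > 0);
        return ((numAtoms + blockSize - 1) / blockSize) * blockSize;
    }

    // Hypothetical helper: builds a per-particle parameter array whose length
    // is the *padded* number of particles. Padding entries get padValue; they
    // never contribute as long as the interaction code skips excluded pairs,
    // but the array must still be long enough to be indexed for every slot.
    std::vector<float> makePaddedParameterArray(const std::vector<float>& values,
                                                float padValue = 0.0f) {
        std::vector<float> padded(paddedNumAtoms((int) values.size()), padValue);
        for (size_t i = 0; i < values.size(); i++)
            padded[i] = values[i];
        return padded;
    }

With three particles, for instance, the array has 32 entries: the three real
values followed by 29 copies of :code:`padValue`.
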
Finally, you can specify arbitrary other memory objects that should be passed as
arguments to the interaction kernel by calling :code:`addArgument()`\ . The
rest of the kernel ignores these arguments, but you can make use of them in your
interaction code.

Consider a simple example. Suppose we want to implement a nonbonded interaction
of the form *E*\ =\ *k*\ :sub:`1`\ *k*\ :sub:`2`\ *r*\ :sup:`2`\ ,
where *k* is a per-particle parameter. First we create a parameter as
follows:

::

    nb.addParameter(OpenCLNonbondedUtilities::ParameterInfo("kparam", "float", 1,
            sizeof(cl_float), kparam->getDeviceBuffer()));

where :code:`nb` is the OpenCLNonbondedUtilities for the context. Now we
call :code:`addInteraction()` to define an interaction with the following
source code:

::

    #ifdef USE_CUTOFF
    if (!isExcluded && r2 < CUTOFF_SQUARED) {
    #else
    if (!isExcluded) {
    #endif
        tempEnergy += kparam1*kparam2*r2;
        dEdR += 2*kparam1*kparam2*r;
    }

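To check the arithmetic of this example, the following sketch evaluates the
same expressions for a single pair in plain host-side C++ and compares
:code:`dEdR` with a central finite-difference estimate of :code:`dE/dr`,
following the definition of :code:`dEdR` given earlier in this section.
:code:`PairResult`, :code:`evaluatePair()`, and :code:`numericalDEdR()` are
hypothetical stand-ins for the variables the kernel provides; this is not how
OpenCLNonbondedUtilities actually runs the code:

::

    #include <cassert>
    #include <cmath>

    // Hypothetical host-side mirror of the interaction block above for one
    // pair. The fields correspond to the kernel's tempEnergy and dEdR.
    struct PairResult {
        double tempEnergy = 0.0;
        double dEdR = 0.0;
    };

    PairResult evaluatePair(double kparam1, double kparam2, double r,
                            bool isExcluded, bool useCutoff, double cutoff) {
        PairResult out;
        double r2 = r*r;
        // Same condition as the two branches of #ifdef USE_CUTOFF above.
        bool include = !isExcluded && (!useCutoff || r2 < cutoff*cutoff);
        if (include) {
            out.tempEnergy += kparam1*kparam2*r2;   // E = k1*k2*r^2
            out.dEdR += 2*kparam1*kparam2*r;        // dE/dr = 2*k1*k2*r
        }
        return out;
    }

    // Central finite-difference estimate of dE/dr, used only as a check.
    double numericalDEdR(double k1, double k2, double r, double h = 1e-6) {
        double ePlus = evaluatePair(k1, k2, r+h, false, false, 0.0).tempEnergy;
        double eMinus = evaluatePair(k1, k2, r-h, false, false, 0.0).tempEnergy;
        return (ePlus-eMinus)/(2*h);
    }

For *k*\ :sub:`1`\ =2, *k*\ :sub:`2`\ =3, and *r*\ =0.5, this gives an energy
of 1.5 and :code:`dEdR` of 6, matching the finite-difference estimate.
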
An important point is that this code is executed for every pair of particles in
the *padded* list of atoms. This means that some of these pairs involve
padding atoms and should not actually be included. You might think, then, that
the above code is incorrect and we need another check to filter out the extra
interactions:

::

    if (atom1 < NUM_ATOMS && atom2 < NUM_ATOMS)

This is not necessary in our case, because the :code:`isExcluded` flag is
always set for interactions that involve a padding atom. If our force did not
use excluded interactions (and so did not check :code:`isExcluded`\ ), then we
would need to add this extra check. Self-interactions are a similar case: we do
not check for :code:`(atom1 == atom2)` because the exclusion flag prevents
them from being processed, but for some forces that check is necessary.

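The bookkeeping described in the last two paragraphs can be mimicked on the
host. The sketch below loops over every pair in the padded atom list and
treats a pair as excluded if it involves a padding atom, is a
self-interaction, or appears in an explicit exclusion list, then applies the
same energy expression to the remaining pairs. The function name, the 1D
positions, and the representation of exclusions as a set of index pairs are
assumptions made for illustration; the platform stores exclusions
differently:

::

    #include <cassert>
    #include <cmath>
    #include <set>
    #include <utility>
    #include <vector>

    // Hypothetical host-side mirror of the kernel loop. Positions are 1D for
    // brevity; both arrays have paddedNum entries, of which the first
    // numAtoms are real particles.
    double totalEnergy(const std::vector<double>& x, const std::vector<double>& k,
                       int numAtoms, int paddedNum,
                       const std::set<std::pair<int, int>>& exclusions) {
        assert((int) x.size() == paddedNum && (int) k.size() == paddedNum);
        double energy = 0.0;
        for (int atom1 = 0; atom1 < paddedNum; atom1++)
            for (int atom2 = atom1; atom2 < paddedNum; atom2++) {
                // Padding atoms and self pairs are flagged as excluded, so the
                // interaction code's !isExcluded test filters them out.
                bool isExcluded = atom1 >= numAtoms || atom2 >= numAtoms ||
                                  atom1 == atom2 ||
                                  exclusions.count({atom1, atom2}) > 0;
                double r = std::fabs(x[atom2]-x[atom1]);
                if (!isExcluded)
                    energy += k[atom1]*k[atom2]*r*r;
            }
        return energy;
    }

Whatever position and parameter the padding slot holds, it never affects the
result, which is exactly why the code in the example above does not need an
explicit :code:`NUM_ATOMS` check.
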
Bonded Forces
*************

Just as OpenCLNonbondedUtilities simplifies the task of creating nonbonded
interactions, OpenCLBondedUtilities simplifies the process for many types of
bonded interactions. A “bonded interaction” means one that is applied to small,
fixed groups of particles. This includes bonds, angles, torsions, etc. The
important point is that the list of particles forming a “bond” is known in
advance and does not change with time.

Using OpenCLBondedUtilities is very similar to the process described above. You
provide a block of OpenCL code for evaluating a single interaction. This block
of code will be inserted into the middle of a kernel that loops over all
interactions and evaluates each one. At the point where it is inserted, the
following variables will have been defined describing the interaction to
compute:

#. :code:`index` is the index of the interaction being evaluated.
#. :code:`atom1`\ , :code:`atom2`\ , ... are the indices of the interacting
   particles.
#. :code:`pos1`\ , :code:`pos2`\ , ... are :code:`real4`\ s containing the
   positions (in the xyz fields) of the interacting particles.

A variable called :code:`energy` will have been defined for accumulating the
total energy of all interactions. Your code should add the energy of the
interaction to it. You should also define :code:`real4` variables called
:code:`force1`\ , :code:`force2`\ , ... and store the force on each atom into
them.

As a simple example, the following source code implements a pairwise interaction
of the form *E*\ =\ *r*\ :sup:`2`\ :

::

    real4 delta = pos2-pos1;
    energy += delta.x*delta.x + delta.y*delta.y + delta.z*delta.z;
    real4 force1 = 2.0f*delta;
    real4 force2 = -2.0f*delta;

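The signs in this example follow from the force being minus the gradient of
the energy. One way to confirm them is the host-side sketch below, which
recomputes the energy and forces for *E*\ =\ *r*\ :sup:`2` using a small
:code:`Vec3` type (standing in for the xyz part of :code:`real4`) and compares
:code:`force1` with a central finite-difference estimate of the negative
gradient with respect to :code:`pos1`. The function names are hypothetical;
this is plain C++ written for illustration, not OpenMM code:

::

    #include <array>
    #include <cassert>
    #include <cmath>

    using Vec3 = std::array<double, 3>;

    // Energy of the example interaction, E = |pos2 - pos1|^2.
    double bondEnergy(const Vec3& pos1, const Vec3& pos2) {
        double e = 0.0;
        for (int i = 0; i < 3; i++) {
            double d = pos2[i]-pos1[i];
            e += d*d;
        }
        return e;
    }

    // Forces as written in the kernel code: force1 = 2*delta, force2 = -2*delta.
    void bondForces(const Vec3& pos1, const Vec3& pos2, Vec3& force1, Vec3& force2) {
        for (int i = 0; i < 3; i++) {
            double delta = pos2[i]-pos1[i];
            force1[i] = 2.0*delta;
            force2[i] = -2.0*delta;
        }
    }

    // Central-difference estimate of -dE/dpos1 along axis i.
    double numericalForce1(const Vec3& pos1, const Vec3& pos2, int i, double h = 1e-6) {
        Vec3 plus = pos1, minus = pos1;
        plus[i] += h;
        minus[i] -= h;
        return -(bondEnergy(plus, pos2)-bondEnergy(minus, pos2))/(2*h);
    }

Note also that :code:`force1` and :code:`force2` are equal and opposite, as
expected for an interaction that depends only on the relative position of the
two particles.
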
To use it, call :code:`addInteraction()` on the OpenCLContext’s
OpenCLBondedUtilities object, providing this source code together with a list
of the particles involved in each bonded interaction.

Exactly as with nonbonded interactions, you can call :code:`addArgument()`
to specify arbitrary memory objects that should be passed as arguments to the
interaction kernel. These might contain per-bond parameters (use
:code:`index` to look up the appropriate element) or any other information you
want.

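To illustrate the per-bond parameter pattern, the following host-side sketch
mimics the structure of the generated kernel: one loop over interactions, with
:code:`index` used to read that bond's parameters from a separate array, as
you would from an array passed with :code:`addArgument()`. A harmonic bond with
energy :code:`E = k*(r - r0)*(r - r0)` is used only as an example;
:code:`BondParams` and :code:`harmonicBondEnergy()` are hypothetical names, and
positions are 1D for brevity:

::

    #include <cassert>
    #include <cmath>
    #include <utility>
    #include <vector>

    // Hypothetical per-bond parameters, one entry per bonded interaction.
    struct BondParams {
        double k;    // force constant
        double r0;   // equilibrium length
    };

    // Mimics the kernel's loop over interactions: bonds[index] lists the two
    // particles, and params[index] holds that bond's parameters.
    double harmonicBondEnergy(const std::vector<double>& pos,
                              const std::vector<std::pair<int, int>>& bonds,
                              const std::vector<BondParams>& params) {
        assert(bonds.size() == params.size());
        double energy = 0.0;
        for (size_t index = 0; index < bonds.size(); index++) {
            int atom1 = bonds[index].first, atom2 = bonds[index].second;
            const BondParams& p = params[index];   // look up by index
            double r = std::fabs(pos[atom2]-pos[atom1]);
            energy += p.k*(r-p.r0)*(r-p.r0);
        }
        return energy;
    }
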
Reordering of Particles
***********************

Nonbonded calculations are done a bit differently in the OpenCL Platform than in
most CPU-based codes. In particular, interactions are computed on blocks of 32
particles at a time (which is why the number of particles needs to be padded to
bring it up to a multiple of 32), and the neighbor list actually lists pairs of
*blocks*, not pairs of individual particles, that are close enough to
interact with each other.

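The idea of a neighbor list over blocks can be sketched as follows: split the
(already ordered) particles into blocks of 32, compute an axis-aligned bounding
box for each block, and keep a pair of blocks only if the gap between their
boxes is smaller than the cutoff. This is a simplified, non-periodic
illustration with hypothetical names; the OpenCL Platform's actual neighbor
list construction differs in detail. It also shows why spatially compact blocks
matter: a block whose particles are scattered gets a large box that overlaps
many others.

::

    #include <algorithm>
    #include <array>
    #include <cassert>
    #include <utility>
    #include <vector>

    using Vec3 = std::array<double, 3>;

    struct Box {
        Vec3 lo, hi;
    };

    // Bounding box of particles [begin, end).
    Box blockBounds(const std::vector<Vec3>& pos, size_t begin, size_t end) {
        Box b{pos[begin], pos[begin]};
        for (size_t i = begin+1; i < end; i++)
            for (int d = 0; d < 3; d++) {
                b.lo[d] = std::min(b.lo[d], pos[i][d]);
                b.hi[d] = std::max(b.hi[d], pos[i][d]);
            }
        return b;
    }

    // Returns pairs of block indices (i <= j) whose bounding boxes come within
    // `cutoff` of each other. Non-periodic, O(nblocks^2), illustration only.
    std::vector<std::pair<int, int>> findInteractingBlocks(
            const std::vector<Vec3>& pos, double cutoff, size_t blockSize = 32) {
        assert(!pos.empty() && blockSize > 0);
        std::vector<Box> boxes;
        for (size_t begin = 0; begin < pos.size(); begin += blockSize)
            boxes.push_back(blockBounds(pos, begin, std::min(begin+blockSize, pos.size())));
        std::vector<std::pair<int, int>> pairs;
        for (size_t i = 0; i < boxes.size(); i++)
            for (size_t j = i; j < boxes.size(); j++) {
                // Squared distance between the boxes (zero if they overlap).
                double dist2 = 0.0;
                for (int d = 0; d < 3; d++) {
                    double gap = std::max({0.0, boxes[j].lo[d]-boxes[i].hi[d],
                                           boxes[i].lo[d]-boxes[j].hi[d]});
                    dist2 += gap*gap;
                }
                if (dist2 < cutoff*cutoff)
                    pairs.push_back({(int) i, (int) j});
            }
        return pairs;
    }

With two tight clusters of 32 particles whose boxes are 9 units apart, a cutoff
of 2 keeps only the two self pairs of blocks, while a cutoff of 20 also keeps
the cross pair.
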
This only works well if sequential particles tend to be close together so that
blocks are spatially compact. This is generally true of particles in a
macromolecule, but it is not true for solvent molecules. Each water molecule,
for example, can move independently of other water molecules, so particles that
happen to be adjacent in the order the molecules were defined need not be
spatially close together.

The OpenCL Platform addresses this by periodically reordering particles so that
sequential particles are close together. This means that what the OpenCL
Platform calls particle *i* need not be the same as what the System calls
particle *i*\ .

This reordering is done frequently, so it must be very fast. If all the data
structures describing the System and the Forces acting on it needed to be
updated, that would make it prohibitively slow. The OpenCL Platform therefore
only reorders particles in ways that do not alter any part of the System
definition. In practice, this means exchanging entire molecules; as long as two
molecules are truly identical, their positions and velocities can be exchanged
without affecting the System in any way.

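A minimal sketch of such an exchange follows. It assumes each molecule
occupies a contiguous range of particle indices and that the two molecules have
already been found to be identical (how that is decided is described next).
:code:`Molecule`, :code:`swapMolecules()`, and the :code:`systemIndex` mapping
are hypothetical; the point is only that positions and velocities move while
nothing in the System definition does:

::

    #include <array>
    #include <cassert>
    #include <utility>
    #include <vector>

    using Vec3 = std::array<double, 3>;

    // Hypothetical description of a molecule as a contiguous index range.
    struct Molecule {
        int first;   // index of the molecule's first particle
        int size;    // number of particles
    };

    // Exchanges the positions and velocities of two identical molecules.
    // systemIndex[i] records which System particle currently occupies slot i,
    // so results can still be reported in the System's original order.
    void swapMolecules(const Molecule& a, const Molecule& b,
                       std::vector<Vec3>& pos, std::vector<Vec3>& vel,
                       std::vector<int>& systemIndex) {
        assert(a.size == b.size);
        for (int i = 0; i < a.size; i++) {
            std::swap(pos[a.first+i], pos[b.first+i]);
            std::swap(vel[a.first+i], vel[b.first+i]);
            std::swap(systemIndex[a.first+i], systemIndex[b.first+i]);
        }
    }
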
Every Force can contribute to defining the boundaries of molecules, and to
determining whether two molecules are identical. This is done through the
OpenCLForceInfo it adds to the OpenCLContext. It can specify two types of
information:

#. Given a pair of particles, it can say whether those two particles are
   identical (as far as that Force is concerned). For example, a Force object
   implementing a Coulomb force would check whether the two particles had equal
   charges.
#. It can define *particle groups*\ . The OpenCL Platform will ensure that
   all the particles in a group are part of the same molecule. It also can specify
   whether two groups are identical to each other. For example, in a Force
   implementing harmonic bonds, each group would consist of the two particles
   connected by a bond, and two groups would be identical if they had the same
   spring constants and equilibrium lengths.

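The harmonic-bond case from the second item might look roughly like the sketch
below. To stay self-contained it defines its own abstract base class,
:code:`ForceInfoSketch`. Its method names are intended to mirror those of
OpenCLForceInfo (:code:`areParticlesIdentical()`,
:code:`getNumParticleGroups()`, :code:`getParticlesInGroup()`,
:code:`areGroupsIdentical()`), but the exact signatures and defaults are
assumptions that should be checked against the actual header:

::

    #include <cassert>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for the platform's force-info interface.
    class ForceInfoSketch {
    public:
        virtual ~ForceInfoSketch() {}
        virtual bool areParticlesIdentical(int p1, int p2) { return true; }
        virtual int getNumParticleGroups() { return 0; }
        virtual void getParticlesInGroup(int index, std::vector<int>& particles) {}
        virtual bool areGroupsIdentical(int group1, int group2) { return true; }
    };

    struct Bond {
        int particle1, particle2;
        double length, k;
    };

    // Harmonic bonds: each bond is a group of two particles, and two groups
    // are identical when they have the same equilibrium length and spring
    // constant.
    class HarmonicBondInfoSketch : public ForceInfoSketch {
    public:
        explicit HarmonicBondInfoSketch(std::vector<Bond> bondList)
            : bonds(std::move(bondList)) {}
        int getNumParticleGroups() override { return (int) bonds.size(); }
        void getParticlesInGroup(int index, std::vector<int>& particles) override {
            particles = {bonds[index].particle1, bonds[index].particle2};
        }
        bool areGroupsIdentical(int group1, int group2) override {
            return bonds[group1].length == bonds[group2].length &&
                   bonds[group1].k == bonds[group2].k;
        }
    private:
        std::vector<Bond> bonds;
    };

With bonds of lengths 0.1, 0.1, and 0.15 and equal spring constants, groups 0
and 1 are reported identical and group 2 is not, so as far as this Force is
concerned only the first two could be exchanged.
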
Integration Utilities
*********************

The OpenCLContext’s OpenCLIntegrationUtilities provides features that are used
by many integrators. The two most important are random number generation and
constraint enforcement.

If you plan to use random numbers, you should call
:code:`initRandomNumberGenerator()` during initialization, specifying the
random number seed to use. Be aware that there is only one random number
generator, even if multiple classes make use of it. If two classes each call
:code:`initRandomNumberGenerator()` and request different seeds, an exception
will be thrown. If they each request the same seed, the second call will simply
be ignored.

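The rule for repeated calls can be summarized in a few lines. The class below
is a hypothetical host-side model of that behavior only (a single shared seed,
a matching repeat ignored, a conflicting one rejected); it is not
OpenCLIntegrationUtilities, and the use of :code:`std::runtime_error` as the
exception type is an assumption:

::

    #include <cassert>
    #include <stdexcept>

    // Hypothetical model of the "one generator, one seed" rule.
    class SharedRandomSeed {
    public:
        void initRandomNumberGenerator(unsigned int seed) {
            if (initialized) {
                if (seed != this->seed)
                    throw std::runtime_error("Requested two different random number seeds");
                return;   // same seed requested again: ignore the call
            }
            this->seed = seed;
            initialized = true;
            // A real implementation would set up the device-side generator here.
        }
        bool isInitialized() const { return initialized; }
    private:
        bool initialized = false;
        unsigned int seed = 0;
    };
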
For efficiency, random numbers are generated in bulk and stored in an array in
device memory, which you can access by calling :code:`getRandom()`\ . Each
time you need to use a block of random numbers, call
:code:`prepareRandomNumbers()`\ , specifying how many values you need. It will
register that many values as having been used, and return the index in the array
at which you should start reading values. If not enough unused values remain in
the array, it will generate a new batch of random values before returning.

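The bookkeeping behind :code:`prepareRandomNumbers()` can be modeled on the
host as below. The class name, the pool size, and the use of
:code:`std::mt19937` with a standard normal distribution are assumptions made
for illustration; the platform generates its values on the device with its own
generator:

::

    #include <cassert>
    #include <random>
    #include <vector>

    // Hypothetical host-side model of a bulk random-number pool.
    class RandomPool {
    public:
        RandomPool(size_t poolSize, unsigned int seed) : values(poolSize), engine(seed) {
            refill();
        }
        // Reserves `count` values and returns the index of the first one,
        // regenerating the whole pool if not enough unused values remain.
        size_t prepareRandomNumbers(size_t count) {
            assert(count <= values.size());
            if (nextIndex + count > values.size())
                refill();
            size_t start = nextIndex;
            nextIndex += count;
            return start;
        }
        const std::vector<float>& getRandom() const { return values; }
    private:
        void refill() {
            std::normal_distribution<float> gaussian(0.0f, 1.0f);
            for (float& v : values)
                v = gaussian(engine);
            nextIndex = 0;
        }
        std::vector<float> values;
        std::mt19937 engine;
        size_t nextIndex = 0;
    };

With a pool of 100 values, two requests for 40 return start indices 0 and 40;
a third request for 40 finds only 20 unused values, so the pool is regenerated
and the request starts again at 0.
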
To apply constraints, simply call :code:`applyConstraints()`\ . For numerical
accuracy, the constraint algorithms do not work on particle positions directly,
but rather on the *displacements* taken by the most recent integration step.
These displacements must be stored in an array which you can get by calling
:code:`getPosDelta()`\ . That is, the constraint algorithms assume the actual
(unconstrained) position of each particle equals the position stored in the
OpenCLContext plus the delta stored in the OpenCLIntegrationUtilities. They then
modify the deltas so that all distance constraints are satisfied. The
integrator must then finish the time step by adding the deltas to the positions
and storing them into the main position array.

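The division of labor between the integrator and the constraint code can be
illustrated with a single distance constraint. In the sketch below the
integrator has stored an unconstrained displacement for each of two particles,
:code:`constrainDeltas()` adjusts those deltas with a SHAKE-style iteration
until the new separation matches the constrained length, and
:code:`applyDeltas()` then adds them to the positions. SHAKE is used here only
because it is short; it is not necessarily the algorithm OpenMM applies. All
names are hypothetical, and both particles are assumed to have unit mass:

::

    #include <array>
    #include <cassert>
    #include <cmath>

    using Vec3 = std::array<double, 3>;

    double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

    // SHAKE-style correction of the displacements d1 and d2 so that the new
    // positions x1+d1 and x2+d2 are `length` apart (unit masses assumed).
    void constrainDeltas(const Vec3& x1, const Vec3& x2, Vec3& d1, Vec3& d2,
                         double length, double tol = 1e-10, int maxIter = 100) {
        Vec3 ref;   // separation at the start of the step
        for (int i = 0; i < 3; i++) ref[i] = x1[i]-x2[i];
        for (int iter = 0; iter < maxIter; iter++) {
            Vec3 s;   // separation after the current step
            for (int i = 0; i < 3; i++) s[i] = (x1[i]+d1[i])-(x2[i]+d2[i]);
            double diff = dot(s, s)-length*length;
            if (std::fabs(diff) < tol)
                return;
            // Move both particles along the reference direction; the factor
            // 4 = 2*(1/m1 + 1/m2) for unit masses. Assumes dot(s, ref) != 0.
            double g = diff/(4.0*dot(s, ref));
            for (int i = 0; i < 3; i++) {
                d1[i] -= g*ref[i];
                d2[i] += g*ref[i];
            }
        }
    }

    // The integrator finishes the step by adding the deltas to the positions.
    void applyDeltas(Vec3& x, const Vec3& d) {
        for (int i = 0; i < 3; i++) x[i] += d[i];
    }
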
.. _the-cuda-platform:

The CUDA Platform
#################

The CUDA platform is very similar to the OpenCL platform, and most of the
previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
class names. There are a few differences worth noting.

Compiling Kernels
*****************

Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
Unlike OpenCL, CUDA does not have built-in support for runtime compilation.
OpenMM therefore needs to implement this itself by writing the source code out
to disk, invoking the nvcc compiler as a separate process, and then loading the
compiled kernel from disk.

For the most part, you can ignore all of this. Just call
:code:`createModule()` on the CudaContext, passing it the CUDA source code.
It takes care of the details of compilation and loading, returning a CUmodule
object when it is done. You can then call :code:`getKernel()` to look up
individual kernels in the module (represented as CUfunction objects) and
:code:`executeKernel()` to execute them.

The CUDA platform does need two things to make this work: a directory on disk
where it can write out temporary files, and the path to the nvcc compiler.
These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
when you create a new Context. It can often determine suitable values for them
on its own, but sometimes it needs help. See the “Platform-Specific Properties”
chapter of the Users Manual for details.

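The compile-on-disk approach described above can be sketched in standard
C++17: write the source to a file in the temporary directory, then build the
command line for the compiler. The property names are the ones given above,
but the helper :code:`prepareCompilation()`, the file names, the fallback
defaults, and the nvcc options shown (:code:`--ptx`, :code:`-o`) are
illustrative assumptions rather than what the CUDA platform actually does. The
sketch only builds the command and does not run it:

::

    #include <cassert>
    #include <filesystem>
    #include <fstream>
    #include <map>
    #include <string>

    namespace fs = std::filesystem;

    // Hypothetical helper: writes `source` to disk and returns the compiler
    // command that would produce a PTX file next to it. It does not run nvcc.
    std::string prepareCompilation(const std::string& source,
                                   const std::map<std::string, std::string>& properties,
                                   const std::string& baseName) {
        // Fall back to simple defaults when a property was not supplied.
        auto tempIt = properties.find("CudaTempDirectory");
        fs::path tempDir = tempIt != properties.end() ? fs::path(tempIt->second)
                                                      : fs::temp_directory_path();
        auto compilerIt = properties.find("CudaCompiler");
        std::string compiler = compilerIt != properties.end() ? compilerIt->second : "nvcc";

        fs::path inputFile = tempDir / (baseName + ".cu");
        fs::path outputFile = tempDir / (baseName + ".ptx");
        std::ofstream out(inputFile);
        assert(out && "could not write to the temporary directory");
        out << source;
        out.close();
        return "\"" + compiler + "\" --ptx -o \"" + outputFile.string() +
               "\" \"" + inputFile.string() + "\"";
    }

Actually running such a command and loading its output are the steps the text
above says :code:`createModule()` handles for you.
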
Accumulating Forces
*******************

The OpenCL platform, as described in Section :ref:`computing-forces`\ , uses
two types of buffers for accumulating forces: a set of floating-point buffers,
and a single fixed-point buffer. In contrast, the CUDA platform uses *only* the
fixed-point buffer (represented by the CUDA type :code:`long long`\ ). This
means the CUDA platform only works on devices that support 64-bit atomic
operations (compute capability 1.2 or higher).

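One reason a fixed-point buffer suits atomic accumulation is that integer
addition is associative, so the total does not depend on the order in which
threads add their contributions, unlike floating-point addition. The host-side
sketch below converts each force component to a 64-bit integer using a scale
factor of 2\ :sup:`32` before summing. The scale factor is an assumption for
illustration (check the platform's kernels for the value actually used), and a
plain loop stands in for the device's 64-bit :code:`atomicAdd()`:

::

    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Assumed scale factor: 2^32 fixed-point units per unit of force.
    const double FIXED_POINT_SCALE = 4294967296.0;

    // Convert a force component to fixed point. A signed 64-bit integer keeps
    // the example simple; on a GPU the same bits can be added as unsigned
    // 64-bit values, since two's-complement addition gives identical bits.
    int64_t toFixed(double f) {
        return (int64_t) std::llround(f*FIXED_POINT_SCALE);
    }

    double fromFixed(int64_t v) {
        return v/FIXED_POINT_SCALE;
    }

    // Sums contributions in the given order, as a series of atomic adds would.
    int64_t accumulate(const std::vector<double>& contributions) {
        int64_t total = 0;
        for (double f : contributions)
            total += toFixed(f);
        return total;
    }

Summing the same contributions in two different orders produces bit-identical
totals, which is what makes results independent of thread scheduling.
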