Software

PeriodicFMM

PeriodicFMM is a software package that implements the flexible periodization method for the kernel-independent FMM described in these two papers:

The package provides a Stokes single-layer (Stokeslet) kernel FMM for the following cases:

Geometry | Boundary Conditions
3D | Non-, Singly-, Doubly-, and Triply-Periodic
3D above a no-slip wall | Non-, Singly-, and Doubly-Periodic

This package is based on the ‘new_BC’ branch of PVFMM, the massively parallel KIFMM package originally developed by Dr. Dhairya Malhotra. It provides native C++ interfaces with full OpenMP + MPI support and scales beyond thousands of cores. Kernels are hand-written with AVX/AVX2 SIMD instructions for speed. C, Fortran, and Python wrappers are also provided; they were implemented with the help of Dr. Florencio Balboa Usabiaga.

STKFMM

STKFMM is a software package that provides Laplace and Stokes kernels for boundary integral methods. Single-layer and double-layer kernels, together with their gradients at the target points, can be computed at the same time. It has several features that support modern large-scale solvers for boundary value PDE problems:

1. Convenient. All PVFMM data structures are wrapped in a single class, with scaling functions to fit source/target points into the unit cubic box required by PVFMM.
2. Flexible. Multiple kernels can be activated simultaneously.
3. Efficient. Single Layer and Double Layer potentials are simultaneously calculated through a single octree. M2M, M2L, L2L operations are combined into single layer operations only.
4. Optimized. All kernels are hand-written with AVX intrinsic instructions.
5. (To be implemented). Single, double, and triple periodicity in a unified interface.
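The scaling step mentioned in item 1 can be pictured with a minimal sketch. The names `UnitBoxMap` and `fitToUnitBox` are hypothetical and are not part of the STKFMM or PVFMM interfaces; the sketch only assumes that points must end up inside the unit cube and that the same shift and scale factor are applied to sources and targets. Because the Stokes and Laplace kernels are homogeneous in distance, results computed in the scaled box would still have to be rescaled by the caller.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the STKFMM API: records how points were mapped.
struct UnitBoxMap {
    std::array<double, 3> shift; // lower corner subtracted from each point
    double scale;                // uniform factor applied after the shift
};

// coords is a flat array x0,y0,z0,x1,y1,z1,... and is modified in place
// so that every coordinate lies in [0, 1).
inline UnitBoxMap fitToUnitBox(std::vector<double> &coords) {
    assert(!coords.empty() && coords.size() % 3 == 0);

    // Axis-aligned bounding box of all points.
    std::array<double, 3> lo{coords[0], coords[1], coords[2]};
    std::array<double, 3> hi = lo;
    for (std::size_t i = 0; i < coords.size(); i += 3) {
        for (int d = 0; d < 3; ++d) {
            lo[d] = std::min(lo[d], coords[i + d]);
            hi[d] = std::max(hi[d], coords[i + d]);
        }
    }

    // Use the widest extent for all axes so the mapping stays isotropic.
    double width = 0.0;
    for (int d = 0; d < 3; ++d) width = std::max(width, hi[d] - lo[d]);

    // The 1% padding keeps the largest coordinate strictly below 1.
    const double scale = width > 0.0 ? 1.0 / (1.01 * width) : 1.0;

    for (std::size_t i = 0; i < coords.size(); i += 3)
        for (int d = 0; d < 3; ++d)
            coords[i + d] = (coords[i + d] - lo[d]) * scale;

    return UnitBoxMap{lo, scale};
}
```

A caller would keep the returned `shift` and `scale` so that target points can be mapped consistently and computed quantities can be converted back to physical units.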

It supports the following kernels:

Kernel | Single Layer Source (dim) | Double Layer Source (dim) | Target (dim)
Stokes PVel | force+TrD (4) | double layer (9) | pressure, velocity (1+3)
Stokes PVelLaplacian | force+TrD (4) | double layer (9) | pressure, velocity, Laplacian of velocity (1+3+3)
Stokes Traction | force+TrD (4) | double layer (9) | traction (9)

This package is based on the ‘new_BC’ branch of PVFMM, the massively parallel KIFMM package originally developed by Dr. Dhairya Malhotra. It provides native C++ interfaces with full OpenMP + MPI support and scales beyond thousands of cores. Kernels are hand-written with AVX/AVX2 SIMD instructions for speed.

SimToolbox

SimToolbox is a set of loosely coupled tools (a ‘toolbox’) that simplifies the development and maintenance of parallel particle-tracking simulations on HPC systems.

At the lowest level, the toolbox relies on FDPS and Trilinos. FDPS, the Framework for Developing Particle Simulator, provides the infrastructure needed by parallel particle-tracking simulations, such as domain decomposition and near-neighbor detection. Trilinos is a large C++ HPC project for distributed linear algebra and related utilities.

At the application level, the toolbox implements a stable and efficient collision-resolution algorithm for general smooth-shape particles, based on geometric constrained optimization. The method has been shown to track the collision stress of the system accurately and is described in detail in the following paper:

The toolbox also includes code for supporting tasks in simulations, including:

1. Output to binary XML VTK data files.
2. Parallel random number generation with full OpenMP and MPI support, using the TRNG library.
3. An MPI data directory based on the Zoltan Distributed Directory Utility.

SafeFFT

SafeFFT is a thread-safe C++ wrapper for FFTW and MKL. In FFTW3 (or MKL) the only thread-safe functions are fftw_execute_...(). This complicates the structure of multithreaded code in which each thread may need to perform FFTs with a different fftw_plan. This simple wrapper around FFTW3 makes things easier by maintaining a global hash table of fftw_plan objects, into which each thread may insert new plans and from which it may read already allocated plans. The hash table is locked so that multiple readers can access it concurrently but only one thread at a time can insert new entries. This allows multiple threads to reuse an already allocated plan simultaneously, without allocating a new plan every time. I believe this approach has some performance and design advantages, because allocating a new plan every time for every thread requires a mutex lock so that only one thread creates a plan at a time. The wrapper fits the case where a large number of FFTs must be processed but the total number of distinct FFT plans is not large, and where it may be hard to preallocate all possible plans before running any FFTs.
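The locking scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not SafeFFT's actual code: `DummyPlan` is a stand-in for an fftw_plan handle, `PlanCache` is a hypothetical name, the key is simply the transform length, and the reader-writer lock is assumed to be a C++17 `std::shared_mutex`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Stand-in for fftw_plan (an assumption for this sketch); a real wrapper
// would create and hold the FFTW or MKL plan handle here.
struct DummyPlan {
    std::size_t n; // transform length the plan was built for
};

class PlanCache {
  public:
    // Returns the plan for length n, creating it on first request.
    std::shared_ptr<const DummyPlan> get(std::size_t n) {
        assert(n > 0);
        {
            // Fast path: any number of threads may look up plans at once.
            std::shared_lock<std::shared_mutex> readLock(mutex_);
            auto it = plans_.find(n);
            if (it != plans_.end()) return it->second;
        }
        // Slow path: exclusive access while a new plan is inserted.
        std::unique_lock<std::shared_mutex> writeLock(mutex_);
        // Re-check, since another thread may have inserted the plan
        // between releasing the read lock and acquiring the write lock.
        auto it = plans_.find(n);
        if (it == plans_.end()) {
            auto plan = std::make_shared<const DummyPlan>(DummyPlan{n});
            it = plans_.emplace(n, std::move(plan)).first;
        }
        return it->second;
    }

    // Number of distinct plans currently cached.
    std::size_t size() const {
        std::shared_lock<std::shared_mutex> readLock(mutex_);
        return plans_.size();
    }

  private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::size_t, std::shared_ptr<const DummyPlan>> plans_;
};
```

Because plan creation happens only under the exclusive lock, the non-thread-safe planner call would be serialized automatically, while threads that only execute existing plans never block each other.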

CSDmp

This is a pedagogical codebase in Fortran 95 that demonstrates the conventional Stokesian Dynamics algorithm:

1. Durlofsky, L., Brady, J. F. & Bossis, G. Dynamic Simulation of Hydrodynamically Interacting Particles. Journal of Fluid Mechanics 180, 21–49 (1987).

This package only handles a small number of monodisperse spherical particles in an unbounded Stokes fluid (non-periodic boundary conditions). The program relies on standard BLAS and LAPACK routines. Modify the makefile according to your software environment. Mixed precision means that the velocity is calculated in single precision while the position is stored in double precision.

DemoSIMD

This is a collection of short code examples that help beginners understand SIMD and caches. It includes demos of memcpy, gemm, the fast inverse square root, and cache blocking.
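As an illustration of one of these topics, the following is a minimal scalar sketch of the well-known fast inverse square root trick. It is not taken from DemoSIMD; the function name `fastInvSqrt` is hypothetical, and the sketch assumes 32-bit IEEE 754 floats and a strictly positive input.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for x > 0 using the classic bit-level initial
// guess followed by one Newton-Raphson refinement step.
inline float fastInvSqrt(float x) {
    assert(x > 0.0f);
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes 32-bit floats");

    // Reinterpret the float's bits as an integer (memcpy avoids
    // undefined behaviour from pointer casts).
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);

    // Halving and negating the exponent via integer arithmetic gives a
    // rough first estimate of x^(-1/2).
    bits = 0x5f3759dfu - (bits >> 1);

    float y;
    std::memcpy(&y, &bits, sizeof y);

    // One Newton-Raphson step on f(y) = 1/y^2 - x.
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

With a single refinement step the result is typically accurate to a fraction of a percent, which is why the trick is usually shown next to the hardware reciprocal square root instructions that SIMD instruction sets provide.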