**This package has retired and have been replaced by STKFMM**

PeriodicFMM is a software package that implements the flexibly periodization method for kernel independent FMM described in these two papers:

- Wen Yan, Michael Shelley (2018). Flexibly Imposing Periodicity in Kernel Independent FMM: A Multipole-to-Local Operator Approach. Journal of Computational Physics
- Wen Yan, Michael Shelley (2018). Universal Image Systems for Non-Periodic and Periodic Stokes Flows above a No-Slip Wall. Journal of Computational Physics

The package provides Stokes single layer (Stokelet) kernel FMM for the following cases:

Geometry | Boundary Conditions |
---|---|

3D | Non-, Singly-, Doubly-, and Triply-Periodic |

3D above a no-slip wall | Non-, Singly-, and Doubly-Periodic |

This package is based on the ‘new_BC’ branch of the massively parallel KIFMM package
PVFMM originally developed by
Dr. Dhairya Malhotra.
This package provides native interfaces in C++ with full `OpenMP`

+ `MPI`

support and scales beyond thousands of cores. Kernels are hand-written in `AVX/AVX2`

SIMD instructions to improve speed. Native C++ interfaces are provided, with C/Fortran/Python wrappers. The wrappers are implemented with the help of
Dr. Florencio Balboa Usabiaga.

It computes the classic kernel sum problem: for a given set of single layer sources $s^j$ at points $y_s^j$, double layer sources $d^j$ points $y_d^j$, target points $x_t$, and single layer potential $K_s$, double layer potential $K_d$: $$p(x_t^i)=\sum_j K_s(x_t^i,y_s^j) s^j +\sum_j K_d (x_t^i,y_d^j)d^j $$

**Note** For some problems the kernels $K_s$ and $K_d$ may not be linear operators applied onto $s^j, d^j$.

This package computes Laplace kernel $L=\dfrac{1}{4\pi r}$ and Stokeslet kernel $G_{ij}=\dfrac{1}{8\pi}\left(\dfrac{1}{r}\delta_{ij}+\dfrac{1}{r^3}r_ir_j\right)$ and their derivatives.

Here is a detailed table, in which the summation $\sum$ is dropped for clarity and the subscript indices $i,j,k,l$ denote the tensor indices. Einstein summation and comma notation are used to simplify the expressions, for example, $G_{ij,i}f_j=\nabla\cdot (\mathbf{G}\cdot\mathbf{f})$.

In the table:

**NA**means input ignored- $Q_{ij}$, $D_{ij}$ are 3x3 tensors written as 9-dimension vectors in row-major format
- $\nabla\nabla p$ is symmetric so it is written as $p_{,xx},p_{,xy},p_{,xz},p_{,yy},p_{,yz},p_{,zz}$.
- For
`RPY`

,`StokesRegVel`

and`StokesRegVelOmega`

kernels, the parameter $b$ and $\epsilon$ can be different for each source point, and the summations are nonlinear functions of $b$ and $\epsilon$. Also $b$ and $\epsilon$ must be much smaller than the lower level leaf box of the adaptive octree, otherwise the convergence property of KIFMM is invalidated. - For all kernels, the electrostatic conductivity and fluid viscosity are ignored (set to 1).
- The regularized Stokeslet is $G_{ij}^\epsilon = \dfrac{1}{8\pi}\dfrac{r^{2}+2 \epsilon^{2}}{\left(r^{2}+\epsilon^{2}\right)^{3 / 2}} \delta_{i j} f_j+\dfrac{1}{\left(r^{2}+\epsilon^{2}\right)^{3 / 2}} r_ir_jf_j$.
- For Stokes
`PVel`

,`PVelGrad`

,`PVelLaplacian`

, and`Traction`

kernels, the pressure and velocity fields are: $$ p=\frac{1}{4 \pi} \frac{r_{j}}{r^{3}} f_{j} + \frac{1}{4 \pi}\left(-3 \frac{r_{j} r_{k}}{r^{5}}+\frac{\delta_{j k}}{r^{3}}\right) D_{j k}, \quad u_{i}=G_{ij}f_j + \frac{1}{8 \pi \mu}\left(-\frac{r_{i}}{r^{3}} trD\right) + \frac{1}{8 \pi \mu}\left[-\frac{3 r_{i} r_{j} r_{k}}{r^{5}}\right] D_{j k} $$

Kernel | Single Layer Source (dim) | Double Layer Source (dim) | Summation | Target Value (dim) |
---|---|---|---|---|

`LapPGrad` |
$q$ (1) | $d$ (3) | $p=Lq-L_{,j}d_j$ | $p,\nabla p$ (1+3) |

`LapPGradGrad` |
$q$ (1) | $d$ (3) | $p=Lq-L_{,j}d_j$ | $p,\nabla p, \nabla\nabla p$ (1+3+6). |

`LapQPGradGrad` |
$Q_{ij}$ (9) | NA | $p=L_{,ij}Q_{ij}$ | $p,\nabla p, \nabla\nabla p$ (1+3+6). |

`Stokes` |
$f_j$ (3) | NA | $u_i = G_{ij} f_j$ | $u_i$ (3) |

`RPY` |
$f_j,b$ (3+1) | NA | $u_i = (1+\frac{1}{6}b^2\nabla^2) G_{ij} f_j$ | $u_i,\nabla^2 u_i$ (3+3) |

`StokesRegVel` |
$f_j,\epsilon$ (3+1) | NA | $u_i = G_{ij}^\epsilon f_j$ | $u_i$ |

`StokesRegVelOmega` |
$f_k,n_l,\epsilon$ (3+3+1) | NA | See Appendix A of doi 10.1016/j.jcp.2012.12.026 | $u_i,w_j$ (3+3) |

`PVel` |
$f_j,trD$ (3+1) | $D_{jk}$ (9) | see above | $p,u_i$ (1+3) |

`PVelGrad` |
$f_j,trD$ (3+1) | $D_{jk}$ (9) | see above | $p,u_i,p_{,i},u_{i,j}$ (1+3+3+9) |

`PVelLapLacian` |
$f_j,trD$ (3+1) | $D_{jk}$ (9) | see above | $p,u_i,u_{i,jj}$ (1+3+3) |

`Traction` |
$f_j,trD$ (3+1) | $D_{jk}$ (9) | see above | $\sigma_{ij}=-p \delta_{i j}+\mu\left(u_{i, j}+u_{j, i}\right)$ (9) |

- All kernels are hand-written with optimized SIMD intrinsic instructions.
- Singly, doubly and triply periodicity in a unified interface.
- Support no-slip boundary condition imposed on a flat wall through image method.
- Single Layer and Double Layer potentials are simultaneously calculated through a single octree.
- M2M, M2L, L2L operations are combined into single layer operations only.
- All PVFMM data structures are wrapped in a single class.
- Multiple kernels can be activated simultaneously.
- Complete MPI and OpenMP support.

This package is based on the ‘new_BC’ branch of the massively parallel KIFMM package PVFMM originally developed by Dr. Dhairya Malhotra.

SimToolbox is a set of loosely coupled handy tools (a ‘toolbox’) to simplify the development and maintainance of parallel particle-tracking simulations on HPC

At the lowest level the toolbox relies on `FDPS`

and `Trilinos`

.
`FDPS`

refers to the [Framework for Developing Particle Simulator] (https://github.com/FDPS/FDPS), which provides necessary infrastructures of parallel particle-tracking simulations such as domain decomposition and near neighbor detection.
`Trilinos`

is the massive C++ HPC project for distributed linear algebra and some other handy tool.

At the application level the toolbox implements a stable and efficient collision-resolution algorithm for general smooth-shape particles based on geometric constrained optimization. The method is demonstrated to track the system collision stress accurately and is described in detail in the following paper:

The toolbox also includes some useful code for supportive tasks in simulations, including:

- Output to binary XML VTK data files
- Parallel random number generation with full OpenMP and MPI support, with the help of
`TRNG`

library. - MPI data directory based on Zoltan Distributed Directory Utility

SafeFFT is a thread-safe c++ wrapper for FFTW and MKL. In FFTW3 (or MKL) the only thread safe functions are `fftw_execute_...()`

. This sometimes poses problems on the structure of multithreading code, where each thread may need to perform FFTs with different `fftw_plan`

. This simple wrapper around FFTW3 aims at making things easier by maintain a global hash table of `fftw_plan`

, where each thread may insert new plans and read already allocated plans. The hash table is locked such that multiple readers can access it but only one thread can insert new entries. This allows multiple threads to reuse already allocated plan simultaneously, without allocating a new plan everytime. I believe this approach has some performance and design advantage because allocating a new plan everytime for every thread requires a mutex lock to allow only one thread to create a plan. This simple wrapper fits a case where a large number of FFTs must be processed, but the total number of different FFT plans are not that large, and it may also be hard to preallocate all possible plans before running any FFTs.

This is a pedagogical codebase in Fortran 95 to demonstrate the Conventional Stokesian Dynamics algorithm:

- Durlofsky, L., Brady, J. F. & Bossis, G. Dynamic Simulation of Hydrodynamically Interacting Particles. Journal of Fluid Mechanics 180, 21–49 (1987).

This package only computes a small number of monodisperse spherical particles in unbounded (Non-periodic boundary condition) Stokes fluid. The program relies on standard BLAS and LAPACK routines. Modify the makefile according to your software environment. Mixed precision means the velocity is calculated in single precision and the position is in double precision.

This is a collection of short code for beginners to understand SIMD and cache. It includes demos about `memcpy`

, `gemm`

, fast-inverse-squareroot, and cache blocking.