We present a highly scalable solver for volume potentials that uses piecewise polynomials and an adaptive octree to represent the source distribution and the output potential. Our method uses the kernel-independent fast multipole method (FMM), which supports a wide range of kernels, such as the Laplace, Stokes, and Helmholtz kernels. We present convergence results for these kernels with free-space and periodic boundary conditions. We discuss algorithmic details for distributed-memory parallelism using MPI and for shared-memory parallelism on multicore CPUs and manycore accelerators on the Stampede and Titan supercomputers. BLAS, FFTW, and SSE optimizations are used to achieve high FLOP rates.