thejoker-samples.hdf5: This file contains the output from The Joker samplings. The keys in the file are the APOGEE_ID values. Under each key are datasets that contain the sample values, e.g., P for period, e for eccentricity, etc., along with the log-likelihood and log-prior values of each sample.

thejoker-metadata.fits: This file contains summary information and metadata about The Joker samplings in thejoker-samples.hdf5. For example, this contains the maximum a posteriori (MAP) sample values and some computed statistics (like phase coverage).

emcee-metadata.fits: This file contains summary information from running emcee to generate standard MCMC samplings for sources that return unimodal samplings from The Joker.

constant-fit.fits: This file contains the maximum likelihood values from fitting a constant velocity model to each source's visit RV data.

from os import path
import os
import astropy.coordinates as coord
from astropy.constants import G
from astropy.table import Table, QTable, join
import astropy.units as u
import h5py
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from tqdm import tqdm
from hq.config import config_to_jokerparams
from hq.plot import plot_two_panel, plot_phase_fold
from hq.data import get_rvdata
from thejoker.plot import plot_rv_curves
from thejoker import JokerSamples, JokerParams, TheJoker
from twobody.transforms import get_m2_min
Modify this to point to the path where you downloaded the VAC data products:
data_path = path.abspath('../cache/dr16-beta-demo/')
poly_trend = 1  # a config setting from the run - keep this set to 1
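As a quick sanity check on the file layout described above, we can peek at the datasets stored for one source in thejoker-samples.hdf5. This is just an illustrative sketch that grabs whichever APOGEE_ID key happens to come first:
# Inspect the structure of thejoker-samples.hdf5: keys are APOGEE_IDs,
# and each group holds datasets with the sample values (P, e, etc.).
with h5py.File(path.join(data_path, 'thejoker-samples.hdf5'), 'r') as f:
    first_id = list(f.keys())[0]
    print(first_id, list(f[first_id].keys()))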
Load the allStar and allVisit files for DR16 beta - again, you might need to modify this to point to the location where you downloaded the DR16 beta files:
allstar = Table.read('../data/allStar-r12-l33beta.fits')
allvisit = Table.read('../data/allVisit-r12-l33beta.fits')
Load the metadata tables and join to the results from fitting the constant velocity model to the visit RV data:
meta = QTable.read(path.join(data_path, 'thejoker-metadata.fits'))
const = Table.read(path.join(data_path, 'constant-fit.fits'))
meta = join(meta, const, keys='APOGEE_ID')
# Retrieve only unique APOGEE_IDs from the metadata file:
_, idx = np.unique(meta['APOGEE_ID'], return_index=True)
meta = meta[idx]
len(meta)
The metadata table now contains a number of columns that might be useful for filtering out interesting systems.
The first column in the meta table is the APOGEE_ID of each source.
The next several columns are the maximum a posteriori (MAP) values of the Keplerian orbital parameters for each source. For very unconstrained orbital parameters, these values aren't too useful, but for sources with many visits they provide a good initial guess for the true orbital parameter values.
Mixed in with the above is the log-likelihood value computed for the MAP sample, MAP_ln_likelihood.
Next is a boolean column, joker_completed, that specifies whether the sampling with The Joker returned the number of samples we requested (256): if The Joker returned 256 samples, a source is considered "completed" by The Joker and has a True value in this column. More on this later.
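For example, counting how many sources have completed samplings is just a sum over this boolean column:
# Number of sources for which The Joker returned the full 256 requested samples:
meta['joker_completed'].sum()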
The next few columns are statistics computed from the MAP sample value:

max_phase_gap: folds the visit RV curve on the MAP period and computes the largest gap in phase coverage.

periods_spanned: the number of MAP periods spanned (end to end) by the visits.

phase_coverage: computed by binning the phase values of the data folded on the MAP period, i.e., the number of bins (bin width = 0.1 in phase) with >= 1 data point; a rough sketch of such a statistic follows below.

n_visits: the number of visits for a given source that passed the quality cuts used for this catalog.

unimodal: indicates whether the sampling returned by The Joker is unimodal in period. If this column is True, standard MCMC (with emcee) was then run to improve the orbital parameter samplings, and you should use the emcee metadata file instead - more on that soon.
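To make the phase-coverage statistic a bit more concrete, here is a rough sketch of how a statistic like it could be computed for one source from the visit times and a trial period. This is only an illustration, not the exact code used to produce the catalog:
# A rough illustration of a phase-coverage-like statistic (not the catalog code):
# fold the visit times on a period, bin the phases with bin width 0.1, and count
# how many bins contain at least one visit.
def phase_coverage_sketch(t, P, bin_width=0.1):
    phase = (t / P) % 1.
    nbins = int(round(1. / bin_width))
    counts, _ = np.histogram(phase, bins=np.linspace(0, 1, nbins + 1))
    return (counts > 0).sum()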
The last few columns -- constant_ln_likelihood, robust_constant_ln_likelihood -- contain the maximum likelihood values from fitting a constant RV model, and a robust (i.e., with an outlier model) constant RV model, to the data. These columns can be used with MAP_ln_likelihood to select probable binaries, i.e., by doing a likelihood ratio cut. More on that below.
Other columns can be ignored.
meta
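As a simple illustration of using the summary columns described above to pre-select well-observed sources (the thresholds here are arbitrary and chosen only for demonstration):
# Arbitrary, illustrative thresholds -- not a recommended selection:
well_observed = meta[(meta['n_visits'] >= 10) &
                     (meta['phase_coverage'] > 0.8)]
len(well_observed)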
We can now join the metadata table with the allStar table to have access to the APOGEE data alongside The Joker sampling information:
# Join the metadata table with the allStar table:
meta = join(meta, allstar, keys='APOGEE_ID')
To select out probable binaries or multiple systems, we can make a cut on the log-likelihood ratio comparing The Joker samplings to the robust constant model fit to the data. First, we compute the log-likelihood ratio (llr):
llr = meta['max_unmarginalized_ln_likelihood'] - meta['robust_constant_ln_likelihood']
Let's look at a histogram of the llr
values:
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
bins = np.linspace(-150, 150, 128)
ax.hist(llr, bins=bins)
ax.set_yscale('log')
fig.tight_layout()
We'll now make a conservative cut on the llr to select probable binary systems (this cut is arbitrary! I found that going down to ~8 or so still looks pretty good, but for now let's be conservative and require a really large likelihood ratio):
binary_mask = llr > 16
binary_mask.sum()
That still returns ~8000 systems!
Let's look at the RV curves and samples for a few of these systems:
with h5py.File(path.join(data_path, 'thejoker-samples.hdf5'), 'r') as results_f:
    for i in np.random.choice(np.where(binary_mask)[0], size=4, replace=False):
        star = meta[i]
        samples = JokerSamples.from_hdf5(results_f[star['APOGEE_ID']],
                                         poly_trend=poly_trend)

        _visits = allvisit[allvisit['APOGEE_ID'] == star['APOGEE_ID']]
        data = get_rvdata(_visits)

        fig = plot_two_panel(data, samples[:128],
                             title='{}'.format(star['APOGEE_ID']))
        fig.tight_layout()
This file is similar to the summary file thejoker-metadata.fits, but contains summary information from running emcee to complete the posterior samplings for all sources that end up with unimodal period distributions from running The Joker. The MAP orbital parameter estimates from the emcee runs are included in the table with names that end with _emcee, and each parameter also has an estimate of the uncertainty on the MAP parameter with column names that end in _err. This file also contains an estimate of the median Gelman-Rubin statistic, R_med, which provides a way to select out converged MCMC samplings (R_med < 1.1). We also compute the phase coverage statistics from the MAP emcee sample (e.g., phase_coverage_emcee, periods_spanned_emcee).
meta_emcee = QTable.read(path.join(data_path, 'emcee-metadata.fits'))
meta_emcee = join(meta_emcee, meta, keys='APOGEE_ID', table_names=['emcee', 'joker'])
_, idx = np.unique(meta_emcee['APOGEE_ID'], return_index=True)
meta_emcee = meta_emcee[idx]
llr_emcee = meta_emcee['max_unmarginalized_ln_likelihood_emcee'] - meta_emcee['robust_constant_ln_likelihood']
meta_emcee.colnames
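Before making cuts, it can be useful to see which columns hold the emcee uncertainties and to restrict to converged MCMC samplings using the Gelman-Rubin statistic described above (the 1.1 threshold comes from that description):
# Columns ending in _err hold the uncertainties on the emcee MAP parameters:
print([c for c in meta_emcee.colnames if c.endswith('_err')])

# Keep only sources whose MCMC samplings look converged (R_med < 1.1):
converged_mask = meta_emcee['R_med'] < 1.1
converged_mask.sum()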
We can compute the same log-likelihood ratio from the MCMC runs as before to select out confident binary systems that have unimodal / well-determined orbital properties. Again, making a very conservative cut:
(llr_emcee > 16).sum()
Let's now pick a subset with good phase coverage:
subset = meta_emcee[(llr_emcee > 16) &
                    (meta_emcee['phase_coverage_per_period'] >= 4) &
                    (meta_emcee['phase_coverage_emcee'] > 0.8) &
                    (meta_emcee['max_unmarginalized_ln_likelihood_emcee'] > -10)]
len(subset)
And plot a few of these systems:
with h5py.File(path.join(data_path, 'thejoker-samples.hdf5'), 'r') as results_f:
    for star in subset[:4]:
        samples = JokerSamples.from_hdf5(results_f[star['APOGEE_ID']],
                                         poly_trend=poly_trend)

        _visits = allvisit[allvisit['APOGEE_ID'] == star['APOGEE_ID']]
        data = get_rvdata(_visits)

        fig, ax = plt.subplots(1, 1, figsize=(8, 5))
        fig = plot_phase_fold(data, samples[0], ax=ax)
        fig.tight_layout()
Please let Adrian (adrianmpw@gmail.com) know if you encounter any surprises or issues, or have requests / suggestions.