Summary of VAC files

  • thejoker-samples.hdf5: This file contains the output rom The Joker samplings. The keys in the file are the APOGEE_ID values. Under each key are datasets that contain the sample values, e.g., P for period, e for eccentricity, etc., along with the log-likelihood and log-prior values of each sample.
  • thejoker-metadata.fits: This file contains summary information and metadata about The Joker samplings in thejoker-samples.hdf5. For example, this contains the maximum a posteriori (MAP) sample values, and some computed statistics (like phase-coverage).
  • emcee-metadata.fits: This file contains summary information from running emcee to generate standard MCMC samplings for sources that return unimodal samplings from The Joker.
  • constant-fit.fits: This file contains the maximum likelihood values from fitting a constant velocity model to each source's visit RV data.
In [4]:
from os import path
import os

import astropy.coordinates as coord
from astropy.constants import G
from astropy.table import Table, QTable, join
import astropy.units as u
import h5py
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from tqdm import tqdm

from hq.config import config_to_jokerparams
from hq.plot import plot_two_panel, plot_phase_fold
from hq.data import get_rvdata

from thejoker.plot import plot_rv_curves
from thejoker import JokerSamples, JokerParams, TheJoker

from twobody.transforms import get_m2_min

Modify this to point to the path where you downloaded the VAC data products:

In [5]:
data_path = path.abspath('../cache/dr16-beta-demo/')
poly_trend = 1 #a config setting from the run - keep this set to 1

Load the allStar and allVisit files for DR16 beta - again, you might need to modify this to point to the location where you downloaded the DR16 beta files:

In [6]:
allstar = Table.read('../data/allStar-r12-l33beta.fits')
allvisit = Table.read('../data/allVisit-r12-l33beta.fits')
WARNING: hdu= was not specified but multiple tables are present, reading in first available table (hdu=1) [astropy.io.fits.connect]

Load the metadata tables and join to the results from fitting the constant velocity model to the visit RV data:

In [7]:
meta = QTable.read(path.join(data_path, 'thejoker-metadata.fits'))

const = Table.read(path.join(data_path, 'constant-fit.fits'))
meta = join(meta, const, keys='APOGEE_ID')

# Retrieve only unique APOGEE_IDs from the metadata file:
_, idx = np.unique(meta['APOGEE_ID'], return_index=True)
meta = meta[idx]
len(meta)
Out[7]:
224403

The metadata table now contains a number of columns that might be useful to filtering out interesting systems.

Data model: thejoker-metadata.fits

The first column in the meta table is the APOGEE_ID of each source.

The next several columns are the maximum a posteriori (MAP) values of the Keplerian orbital parameters for each source. For very unconstrained orbital parameters, these values aren't too useful, but for sources with many visits this is a good initial guess for the true orbital parameter values.

Mixed in the above is the log-likelihood, ln_likelihood, value compued for the MAP sample, MAP_ln_likelihood.

Next is a boolean column, joker_completed, that specifies whether the sampling with The Joker returned the number of samples we requested (256): If The Joker returned 256 samples, then a source is considered "completed" by The Joker and it has a True value in this column. More on this later.

The next few columns are statistics computed from the MAP sample value:

  • max_phase_gap folds the visit RV curve on the MAP period and computes the largest gap in phase coverage
  • periods_spanned is the number of MAP periods spanned (end to end) for the visits
  • phase_coverage is computed by binning the phase values of the data, folded on the MAP period, i.e., the number of bins with >= 1 data point (bin width = 0.1 in phase).

n_visits is the number of visits for a given source that passed the quality cuts used for this catalog.

unimodal indicates whether the sampling returned by The Joker is unimodal in period or not. If this column is True, standard MCMC (with emcee) was then run to improve the orbital parameter samplings and you should use the emcee metadata file instead - more on that soon.

The last few columns -- constant_ln_likelihood, robust_constant_ln_likelihood -- contain the maximum likelihood values from fitting a constant RV model, and a robust (i.e., with an outlier model) constant RV model to the data. This column can be used with MAP_ln_likelihood as a way to select out probable binaries, i.e., by doing a likelihood ratio cut. More on that below.

Other columns can be ignored.

In [8]:
meta
Out[8]:
QTable length=224403
APOGEE_IDMAP_KMAP_M0MAP_PMAP_eMAP_jitterMAP_ln_likelihoodMAP_ln_priorMAP_omegaMAP_v0joker_completedmax_phase_gapmax_unmarginalized_ln_likelihoodn_visitsperiods_spannedphase_coveraget0_bmjdunimodalconstant_ln_likelihoodrobust_constant_ln_likelihoodrobust_success
km / sraddkm / sradkm / s
bytes18float64float64float64float64float64float64float64float64float64boolfloat64float64int64float64float64float64boolfloat64float64bool
2M00000002+74170740.039730239035976371.44117534160614012.21884274482727050.00188846560195088390.04.688860783383183-5.0644102096557624.681254061060496-51.666218359503745True0.22601122688810872.5779581716253954317.1260449535575320.356933.26435137996False1.98175576428315141.898155896408913False
2M00000068+57102330.144234724920887850.093034096062183382.08089160919189450.0240795724093914030.05.127602820151967-5.3844280242919926.142990589141846-12.516720153512324True0.27750287393251942.54425120994335343373.277502873922060.355874.17793338723False2.5224162274111882.5215361358975104False
2M00000222+56253599.7630423722066851.10861170291900633.4429910182952880.0521097183227539060.04.87367908293782-6.0498051643371580.891664830846242-45.26735715474349True0.38979312558198312.3699612870281536.680238267565230.355821.30330256523False-2783.4409677519084-12.419623021705993True
2M00000233+14523240.055435496160604064.0495400428771973.83756327629089360.006267982535064220.05.2783814868960395-5.7807474136352544.008700370788574-38.93029167571497True0.51455066453399862.543865713542618637.5568787707518980.256584.25006596872False1.71648975984598141.6201881658357888True
2M00000446+58543290.357705421405349673.42933249473571782.8343601226806640.004187259357422590.04.689171899326431-5.4198269844055182.987565279006958-48.00106207954139True0.411253295253429132.280419334690635439.5259597434201490.356257.11355090288False2.1635730728428492.0201084765534505True
2M00000535+15043430.199220632389626233.70522689819335942.4482088088989260.00727979745715856550.04.871473149506181-5.3532376289367682.854415742558888518.58011005106259True0.63384759005144662.478474596017725311.845395028646440.356584.25006596872False2.4608585866575672.4103195279477108False
2M00000546+61521070.105461742224297874.9870729446411132.12604808807373050.0234558489173650740.05.202186559625857-5.4011092185974123.071702241897583-33.265922501188435True0.54048884572546112.5660917303291066325.2816486811523050.356938.23962145692False2.51345117299467852.5133132704660284False
2M00000797+64361190.495226283244992854.5275659561157233.99696612358093260.0144889419898390770.05.1717136734721585-5.9497542381286621.0177360773086548-120.28936171074078True0.25398506762358952.780714479021564411.6963715201392780.456590.227496061925False-11.027793259556057-3.6973422829048084True
2M00000799+79394420.174316726519806681.44117534160614012.21884274482727050.00188846560195088390.04.678999624798296-5.0644102096557621.5396614074707031-40.41800276133423True0.282179768877585562.3446387918598908426.0270814766003370.256932.25494136392False1.69704278723615461.5917189651594295True
...............................................................
AP18062278-27454940.104331861125577663.3437175750732422.00018930435180660.005133195314556360.04.822321488959943-5.1002712249755863.177019119262695339.592153815945146True0.49995268598288482.195106764566095840.499952685841017040.258290.3126655923False2.18490520018934742.0219690342199215True
AP18062279-27454950.096867369994851321.44117534160614012.21884274482727050.00188846560195088390.05.343153730916147-5.0644102096557624.68125406106049639.70388474601769True0.52056179978359162.7035630818440306413.9712471929736260.358290.17736559051False2.62347821107365542.4331153757765605True
AP18062500-26384578.3298548542609551.31189382076263432.37717270851135250.041456732898950580.0-8.674400486235669-5.6262722015380864.592532904940196105.53066719601182True0.35352081897067933-5.74553637471677453131.669021669624020.357949.18169029526False-8.642975019605604-8.642981146979798False
AP18085390-301107010.4251559513569810.093034096062183382.08089160919189450.0240795724093914030.0-7.008786111679056-5.3844280242919923.001397935552052686.85224995428194True0.4280608232369971-4.72238636832358253151.377420865288120.357949.23489029604False-7.613449636091083-7.615152612054645True
AP18093061-300103714.3241510939947291.44117534160614012.21884274482727050.00188846560195088390.0-7.78099382562387-5.0644102096557624.681254061060496-95.24719527156793True0.4506853933277593-5.265631618564788336.8435309089900540.357948.299430282False-43.17510012305701-11.37345320209609True
AP18095510-302005123.2068649339893944.4171643257141112.23139429092407230.0132908606901764870.0-3.803185767397882-5.3528995513916023.328897120552607465.88434634421156True0.4481502974406196-3.20811222533626954141.615494043289970.357948.299430282False-452.6206744698403-17.999557049539533True
AP18100107-302143919.4247118615770346.0781393051147462.1748614311218260.108431912958621980.0-1.6334792950159882-5.8122401237487792.8603522777557373-103.96766200187109True0.2966154040824165-1.50942816881609553145.29661540394860.357948.299430282False-881.0778891765927-11.800058381566636True
AP18100627-292622414.9951355265127940.35733082890510562.28887128829956050.00195372127927839760.0-2.740109875457801-5.1001338958740233.015223979949951149.64741738368448True0.4368965700678782-2.5171411627947733335.7162946073587550.357948.299430282False-361.14145799297665-11.992751966638311True
AP18111989-29592758.2431375222437532.00786066055297852.09584522247314450.016897726804018020.0-6.670905634604261-5.32959794998168952.7500641345977783129.3244004276896True0.4713909355281181-4.600552546105707339.005743541865670.257948.299430282False-21.149923200798547-9.556582376383673True
AP18112690-29231901.10178527902658741.31189382076263432.37717270851135250.041456732898950580.0-12.541591430244521-5.6262722015380861.4509402513504028-13.669390334746796True0.38959271187653854-7.766518271912578334.389592711766350.357948.299430282False-8.446917668529084-8.446917758875854False

We can now join the metadata table with the allstar table to have access to the APOGEE data along side The Joker sampling information:

In [9]:
# allStar table and 
meta = join(meta, allstar, keys='APOGEE_ID')

To select out probably binaries or multiple systems, we can make a cut on the log-likelihood ratio of comparing The Joker samplings to the robust constant model fit to the data. First, we compute the log-likelihood ratio (llr):

In [10]:
llr = meta['max_unmarginalized_ln_likelihood'] - meta['robust_constant_ln_likelihood']

Let's look at a histogram of the llr values:

In [11]:
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
bins = np.linspace(-150, 150, 128)
ax.hist(llr, bins=bins)
ax.set_yscale('log')
fig.tight_layout()

We'll now make a conservative cut on the llr to select probably binary systems (this cut is arbitrary! I found that going down to ~8 or so still looks pretty good, but for now let's be conservative and require a really large likelihood ratio):

In [12]:
binary_mask = llr > 16
binary_mask.sum()
Out[12]:
8140

That still returns ~8000 systems!

Let's look at the RV curves and samples for a few of these systems:

In [14]:
with h5py.File(path.join(data_path, 'thejoker-samples.hdf5'), 'r') as results_f:
    for i in np.random.choice(np.where(binary_mask)[0], size=4, replace=False):
        star = meta[i]
        samples = JokerSamples.from_hdf5(results_f[star['APOGEE_ID']], 
                                         poly_trend=poly_trend)
    
        _visits = allvisit[allvisit['APOGEE_ID'] == star['APOGEE_ID']]
        data = get_rvdata(_visits)
        fig = plot_two_panel(data, samples[:128], title='{}'.format(star['APOGEE_ID']))
        fig.tight_layout()

Data model: emcee-metadata.fits

This file is similar to the summary file thejoker-metadata.fits, but contains summary information from running emcee to complete the posterior samplings for all sources that end up with unimodal period distributions from running The Joker. The MAP orbital parameter estimates from the emcee runs are included in the table with names that end with _emcee, and each parameter also has an estimate of the uncertainty on the MAP parameter with column names that end in _err. This file also contains an estimate of the median Gelman-Rubin statistic, R_med, which provides a way to select out converged MCMC samplings (R_med < 1.1). We also compute the phase coverage statistics from the MAP emcee sample (e.g., phase_coverage_emcee, periods_spanned_emcee).

In [18]:
meta_emcee = QTable.read(path.join(data_path, 'emcee-metadata.fits'))
meta_emcee = join(meta_emcee, meta, keys='APOGEE_ID', table_names=['emcee', 'joker'])
_, idx = np.unique(meta_emcee['APOGEE_ID'], return_index=True)
meta_emcee = meta_emcee[idx]
llr_emcee = meta_emcee['max_unmarginalized_ln_likelihood_emcee'] - meta_emcee['robust_constant_ln_likelihood']
In [19]:
meta_emcee.colnames
Out[19]:
['APOGEE_ID',
 'MAP_K_emcee',
 'MAP_K_err',
 'MAP_M0_emcee',
 'MAP_M0_err',
 'MAP_P_emcee',
 'MAP_P_err',
 'MAP_e_emcee',
 'MAP_e_err',
 'MAP_jitter_emcee',
 'MAP_jitter_err',
 'MAP_omega_emcee',
 'MAP_omega_err',
 'MAP_v0_emcee',
 'MAP_v0_err',
 'R_max',
 'R_med',
 'max_phase_gap_emcee',
 'max_unmarginalized_ln_likelihood_emcee',
 'n_visits_emcee',
 'periods_spanned_emcee',
 'phase_coverage_emcee',
 'phase_coverage_per_period',
 't0_bmjd_emcee',
 'MAP_K_joker',
 'MAP_M0_joker',
 'MAP_P_joker',
 'MAP_e_joker',
 'MAP_jitter_joker',
 'MAP_ln_likelihood',
 'MAP_ln_prior',
 'MAP_omega_joker',
 'MAP_v0_joker',
 'joker_completed',
 'max_phase_gap_joker',
 'max_unmarginalized_ln_likelihood_joker',
 'n_visits_joker',
 'periods_spanned_joker',
 'phase_coverage_joker',
 't0_bmjd_joker',
 'unimodal',
 'constant_ln_likelihood',
 'robust_constant_ln_likelihood',
 'robust_success',
 'APSTAR_ID',
 'TARGET_ID',
 'ASPCAP_ID',
 'FILE',
 'TELESCOPE',
 'LOCATION_ID',
 'FIELD',
 'J',
 'J_ERR',
 'H',
 'H_ERR',
 'K',
 'K_ERR',
 'RA',
 'DEC',
 'GLON',
 'GLAT',
 'APOGEE_TARGET1',
 'APOGEE_TARGET2',
 'APOGEE_TARGET3',
 'APOGEE2_TARGET1',
 'APOGEE2_TARGET2',
 'APOGEE2_TARGET3',
 'TARGFLAGS',
 'SURVEY',
 'PROGRAMNAME',
 'NINST',
 'NVISITS',
 'COMBTYPE',
 'COMMISS',
 'SNR',
 'STARFLAG',
 'STARFLAGS',
 'ANDFLAG',
 'ANDFLAGS',
 'VHELIO_AVG',
 'VSCATTER',
 'VERR',
 'VERR_MED',
 'OBSVHELIO_AVG',
 'OBSVSCATTER',
 'OBSVERR',
 'OBSVERR_MED',
 'SYNTHVHELIO_AVG',
 'SYNTHVSCATTER',
 'SYNTHVERR',
 'SYNTHVERR_MED',
 'RV_TEFF',
 'RV_LOGG',
 'RV_FEH',
 'RV_ALPHA',
 'RV_CARB',
 'RV_CCFWHM',
 'RV_AUTOFWHM',
 'SYNTHSCATTER',
 'STABLERV_CHI2',
 'STABLERV_RCHI2',
 'CHI2_THRESHOLD',
 'STABLERV_CHI2_PROB',
 'MEANFIB',
 'SIGFIB',
 'SNREV',
 'APSTAR_VERSION',
 'ASPCAP_VERSION',
 'RESULTS_VERSION',
 'EXTRATARG',
 'MIN_H',
 'MAX_H',
 'MIN_JK',
 'MAX_JK',
 'PARAM',
 'FPARAM',
 'PARAM_COV',
 'FPARAM_COV',
 'TEFF',
 'TEFF_ERR',
 'LOGG',
 'LOGG_ERR',
 'VMICRO',
 'VMACRO',
 'VSINI',
 'M_H',
 'M_H_ERR',
 'ALPHA_M',
 'ALPHA_M_ERR',
 'ASPCAP_CHI2',
 'ASPCAP_CLASS',
 'ASPCAPFLAG',
 'ASPCAPFLAGS',
 'PARAMFLAG',
 'FELEM',
 'FELEM_ERR',
 'X_H',
 'X_H_ERR',
 'X_M',
 'X_M_ERR',
 'C_FE',
 'CI_FE',
 'N_FE',
 'O_FE',
 'NA_FE',
 'MG_FE',
 'AL_FE',
 'SI_FE',
 'P_FE',
 'S_FE',
 'K_FE',
 'CA_FE',
 'TI_FE',
 'TIII_FE',
 'V_FE',
 'CR_FE',
 'MN_FE',
 'FE_H',
 'CO_FE',
 'NI_FE',
 'CU_FE',
 'GE_FE',
 'RB_FE',
 'CE_FE',
 'ND_FE',
 'YB_FE',
 'C_FE_ERR',
 'CI_FE_ERR',
 'N_FE_ERR',
 'O_FE_ERR',
 'NA_FE_ERR',
 'MG_FE_ERR',
 'AL_FE_ERR',
 'SI_FE_ERR',
 'P_FE_ERR',
 'S_FE_ERR',
 'K_FE_ERR',
 'CA_FE_ERR',
 'TI_FE_ERR',
 'TIII_FE_ERR',
 'V_FE_ERR',
 'CR_FE_ERR',
 'MN_FE_ERR',
 'FE_H_ERR',
 'CO_FE_ERR',
 'NI_FE_ERR',
 'CU_FE_ERR',
 'GE_FE_ERR',
 'RB_FE_ERR',
 'CE_FE_ERR',
 'ND_FE_ERR',
 'YB_FE_ERR',
 'C_FE_FLAG',
 'CI_FE_FLAG',
 'N_FE_FLAG',
 'O_FE_FLAG',
 'NA_FE_FLAG',
 'MG_FE_FLAG',
 'AL_FE_FLAG',
 'SI_FE_FLAG',
 'P_FE_FLAG',
 'S_FE_FLAG',
 'K_FE_FLAG',
 'CA_FE_FLAG',
 'TI_FE_FLAG',
 'TIII_FE_FLAG',
 'V_FE_FLAG',
 'CR_FE_FLAG',
 'MN_FE_FLAG',
 'FE_H_FLAG',
 'CO_FE_FLAG',
 'NI_FE_FLAG',
 'CU_FE_FLAG',
 'GE_FE_FLAG',
 'RB_FE_FLAG',
 'CE_FE_FLAG',
 'ND_FE_FLAG',
 'YB_FE_FLAG',
 'ELEM_CHI2',
 'ELEMFLAG',
 'REDUCTION_ID',
 'SRC_H',
 'WASH_M',
 'WASH_M_ERR',
 'WASH_T2',
 'WASH_T2_ERR',
 'DDO51',
 'DDO51_ERR',
 'IRAC_3_6',
 'IRAC_3_6_ERR',
 'IRAC_4_5',
 'IRAC_4_5_ERR',
 'IRAC_5_8',
 'IRAC_5_8_ERR',
 'IRAC_8_0',
 'IRAC_8_0_ERR',
 'WISE_4_5',
 'WISE_4_5_ERR',
 'TARG_4_5',
 'TARG_4_5_ERR',
 'AK_TARG',
 'AK_TARG_METHOD',
 'AK_WISE',
 'SFD_EBV',
 'WASH_DDO51_GIANT_FLAG',
 'WASH_DDO51_STAR_FLAG',
 'ALL_VISITS',
 'VISITS',
 'ALL_VISIT_PK',
 'VISIT_PK',
 'FPARAM_CLASS',
 'CHI2_CLASS',
 'GAIA_SOURCE_ID',
 'GAIA_PARALLAX',
 'GAIA_PARALLAX_ERROR',
 'GAIA_PMRA',
 'GAIA_PMRA_ERROR',
 'GAIA_PMDEC',
 'GAIA_PMDEC_ERROR',
 'GAIA_PHOT_G_MEAN_MAG',
 'GAIA_PHOT_BP_MEAN_MAG',
 'GAIA_PHOT_RP_MEAN_MAG',
 'TEFF_SPEC',
 'LOGG_SPEC']

We can compute the same log-likelihood ratio from the MCMC runs as before to select out confident binary systems that have unimodal / well-determined orbital properties. Again, making a very conservative cut:

In [20]:
(llr_emcee > 16).sum()
Out[20]:
591

Let's now pick a subset with good phase coverage:

In [21]:
subset = meta_emcee[(llr_emcee > 16) & 
                    (meta_emcee['phase_coverage_per_period'] >= 4) &
                    (meta_emcee['phase_coverage_emcee'] > 0.8) & 
                    (meta_emcee['max_unmarginalized_ln_likelihood_emcee'] > -10)]
len(subset)
Out[21]:
29

And plot a few of these systems:

In [53]:
with h5py.File(path.join(data_path, 'thejoker-samples.hdf5'), 'r') as results_f:
    for star in subset[:4]:
        samples = JokerSamples.from_hdf5(results_f[star['APOGEE_ID']], 
                                         poly_trend=poly_trend)
    
        _visits = allvisit[allvisit['APOGEE_ID'] == star['APOGEE_ID']]
        data = get_rvdata(_visits)
        
        fig, ax = plt.subplots(1, 1, figsize=(8, 5))
        fig = plot_phase_fold(data, samples[0], ax=ax)
        fig.tight_layout()
INFO: Filtering 1 NaN/Inf data points [thejoker.data]
INFO: Filtering 1 NaN/Inf data points [thejoker.data]

Please let Adrian (adrianmpw@gmail.com) know if you encounter any surprises or issues, or have requests / suggestions.