35. Appendix: Full data reconstruction
We will now go over how to fully reconstruct all CAST data from the Run-2 and Run-3 data taking campaigns all the way up to the computation of the observed limit. This assumes the following:
- all CAST data files are located in a directory accessible via a DATA environment variable. Inside should be the directory structure containing the directories 2017, 2018, 2018_2 and CDL_2019 as downloaded from Zenodo (Schmidt 2024).
- all binaries used below are compiled and their location is in your PATH variable. There is a bin directory in the repository root, which contains symbolic links to the binaries if they are placed next to their source files. I recommend adding this directory to your PATH.
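For example, assuming the Zenodo data was unpacked to ~/CastData/data and the repository lives at ~/src/TimepixAnalysis (both paths are just placeholders, adjust to your system), the setup could look like:

export DATA=~/CastData/data          # contains 2017, 2018, 2018_2, CDL_2019
export TPXDIR=~/src/TimepixAnalysis  # root of the TimepixAnalysis clone
export PATH=$TPXDIR/bin:$PATH        # expose the symlinked binaries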
Note: There is also the Analysis/ingrid/runAnalysisChain program, which automates the majority of these calls. See the section below on how it replaces the data parsing and reconstruction. Here I wish to illustrate the actual steps and not hide any of them.
Furthermore, the commands below produce the correct results assuming the Analysis/ingrid/config.toml file is used as committed in the phd tag of the TimepixAnalysis repository.
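If in doubt, the repository state can be pinned to that tag with plain git (assuming TPXDIR from the setup sketch above points to your clone):

git -C $TPXDIR fetch --tags   # make sure the phd tag is known locally
git -C $TPXDIR checkout phd   # check out the exact state used for the thesis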
35.1. Raw data parsing and reconstruction
If this is all in place, first perform the raw data parsing.
Run-2, background and calibration:
raw_data_manipulation -p $DATA/2017/DataRuns \
--runType rtBackground \
--out $DATA/DataRuns2017_Raw.h5
raw_data_manipulation -p $DATA/2017/CalibrationRuns \
--runType rtCalibration \
    --out $DATA/CalibrationRuns2017_Raw.h5
Run-3:
raw_data_manipulation -p $DATA/2018_2/DataRuns \
--runType rtBackground \
--out $DATA/DataRuns2018_Raw.h5
raw_data_manipulation -p $DATA/2018_2/CalibrationRuns \
--runType rtCalibration \
    --out $DATA/CalibrationRuns2018_Raw.h5
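For a quick sanity check that the parsing worked, the group structure of the resulting files can be inspected with the standard HDF5 command line tools (assuming h5ls is installed):

h5ls $DATA/DataRuns2017_Raw.h5      # list the top level groups
h5ls -r $DATA/DataRuns2017_Raw.h5   # recursive listing of all groups and datasets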
Next, the initial data reconstruction (geometric properties):
Run-2:
reconstruction -i $DATA/DataRuns2017_Raw.h5 \
-o $DATA/DataRuns2017_Reco.h5
reconstruction -i $DATA/CalibrationRuns2017_Raw.h5 \
-o $DATA/CalibrationRuns2017_Reco.h5
Run-3:
reconstruction -i $DATA/DataRuns2018_Raw.h5 \
-o $DATA/DataRuns2018_Reco.h5
reconstruction -i $DATA/CalibrationRuns2018_Raw.h5 \
-o $DATA/CalibrationRuns2018_Reco.h5
Now, the next steps of the reconstruction (charge calibration, gas gain and FADC reconstruction):
DATA=~/CastData/data  # adjust to your data location if it differs
for typ in Data Calibration; do
for year in 2017 2018; do
file="${DATA}/${typ}Runs${year}_Reco.h5"
reconstruction -i $file --only_charge
reconstruction -i $file --only_fadc
reconstruction -i $file --only_gas_gain
done
done
where we simply loop over the Data and Calibration prefixes and
years.
With this done, we can perform the ⁵⁵Fe calibration fits:
reconstruction -i $DATA/CalibrationRuns2017_Reco.h5 --only_fe_spec
reconstruction -i $DATA/CalibrationRuns2018_Reco.h5 --only_fe_spec
and then finally the fit of the energy calibration factors (determined from each ⁵⁵Fe fit) against the respective gas gain:
reconstruction -i $DATA/CalibrationRuns2017_Reco.h5 --only_gain_fit
reconstruction -i $DATA/CalibrationRuns2018_Reco.h5 --only_gain_fit
This then allows us to calibrate the energy for each file:
reconstruction -i $DATA/DataRuns2017_Reco.h5 --only_energy_from_e
reconstruction -i $DATA/DataRuns2018_Reco.h5 --only_energy_from_e
reconstruction -i $DATA/CalibrationRuns2017_Reco.h5 --only_energy_from_e
reconstruction -i $DATA/CalibrationRuns2018_Reco.h5 --only_energy_from_e
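Equivalently, these four calls can be written using the same loop pattern as above:

for typ in Data Calibration; do
  for year in 2017 2018; do
    reconstruction -i "${DATA}/${typ}Runs${year}_Reco.h5" --only_energy_from_e
  done
done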
35.2. Parse and reconstruct the CDL data
In order to use the likelihood cut method we also need to parse and
reconstruct the CDL data. Note that this can be done before even
reconstructing any of the CAST data files (with the exception of
--only_energy_from_e, which is optional anyway).
raw_data_manipulation -p $DATA/CDL_2019/ -r Xray -o $DATA/CDL_2019/CDL_2019_Raw.h5
reconstruction -i $DATA/CDL_2019/CDL_2019_Raw.h5 -o $DATA/CDL_2019/CDL_2019_Reco.h5
reconstruction -i $DATA/CDL_2019/CDL_2019_Reco.h5 --only_charge
reconstruction -i $DATA/CDL_2019/CDL_2019_Reco.h5 --only_fadc
reconstruction -i $DATA/CDL_2019/CDL_2019_Reco.h5 --only_gas_gain
reconstruction -i $DATA/CDL_2019/CDL_2019_Reco.h5 --only_energy_from_e
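Because the four calls after the initial reconstruction only differ in the flag they pass, they can also be condensed into a loop:

for opt in --only_charge --only_fadc --only_gas_gain --only_energy_from_e; do
  reconstruction -i $DATA/CDL_2019/CDL_2019_Reco.h5 $opt
done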
With this file we can then run cdl_spectrum_creation in order to
produce the calibration-cdl-2018.h5 file:
cdl_spectrum_creation $DATA/CDL_2019/CDL_2019_Reco.h5 \
--genCdlFile --year=2018
35.3. Add tracking information to background files
To add the tracking information, the CAST log files are needed. The
default path (which may of course be a symbolic link) is
resources/LogFiles/tracking-logs relative to the TimepixAnalysis root.
for year in 2017 2018; do
cast_log_reader tracking \
-p $TPXDIR/resources/LogFiles/tracking-logs \
--startTime "2017/01/01" \
--endTime "2018/12/31" \
--h5out "${DATA}/DataRuns${year}_Reco.h5"
done
Here we assume TPXDIR is an environment variable pointing to the root of
the TimepixAnalysis repository. Feel free to adjust if your path differs.
35.4. Using runAnalysisChain
All of the commands above can also be performed in one go by using runAnalysisChain:
./runAnalysisChain \
-i $DATA \
--outpath $DATA \
--years 2017 --years 2018 \
--calib --back --cdl \
--raw --reco \
--logL \
--tracking
which tells the runAnalysisChain helper to parse all data files for
Run-2 and Run-3, do the same for the CDL data files, reconstruct with
all steps, compute the likelihood values and add the tracking
information to the background data files. It makes the same
assumptions about the location of the data files as outlined above.
In particular, if the tracking log files are in a different location,
add the --trackingLogs argument with the correct path, as shown below.
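For example, with a placeholder log file path:

./runAnalysisChain \
    -i $DATA \
    --outpath $DATA \
    --years 2017 --years 2018 \
    --calib --back --cdl \
    --raw --reco \
    --logL \
    --tracking \
    --trackingLogs /path/to/tracking-logs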
35.5. Applying a classifier
To apply a classifier we use the likelihood program. We will show
how to apply it to a single file here. Generally, calling
likelihood manually is not needed; more on that below.
Let's apply the likelihood cut method using all vetoes for the Run-2 data:
likelihood \
-f $DATA/DataRuns2017_Reco.h5 \
--h5out $DATA/classifier/run2_lnL_80_all_vetoes.h5 \
--region=crAll \
--cdlYear=2018 \
--lnL --signalEfficiency=0.8 \
--scintiveto --fadcveto --septemveto --lineveto \
--vetoPercentile=0.99 \
--cdlFile=$DATA/CDL_2019/calibration-cdl-2018.h5 \
--calibFile=$DATA/CalibrationRuns2017_Reco.h5
which should be mostly self-explanatory. The --signalEfficiency
argument is the software efficiency of the likelihood
cut. --vetoPercentile refers to the FADC veto cut. The method is
applied to the entire center chip (--region=crAll). A --tracking
flag could be added to apply the classifier only to the tracking data
(as written above, it is applied only to the non-tracking data).
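The Run-3 file is processed analogously. A sketch with the same veto setup (the output file name is only an example):

likelihood \
    -f $DATA/DataRuns2018_Reco.h5 \
    --h5out $DATA/classifier/run3_lnL_80_all_vetoes.h5 \
    --region=crAll \
    --cdlYear=2018 \
    --lnL --signalEfficiency=0.8 \
    --scintiveto --fadcveto --septemveto --lineveto \
    --vetoPercentile=0.99 \
    --cdlFile=$DATA/CDL_2019/calibration-cdl-2018.h5 \
    --calibFile=$DATA/CalibrationRuns2018_Reco.h5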
As explained in sec. 12.6, many
different veto setups were considered. For this reason a helper tool
createAllLikelihoodCombinations (excuse the verbose name, heh)
exists to simplify the process of calling likelihood. It is located
in the Analysis directory.
For example:
./createAllLikelihoodCombinations \
--f2017 $DATA/DataRuns2017_Reco.h5 \
--f2018 $DATA/DataRuns2018_Reco.h5 \
--c2017 $DATA/CalibrationRuns2017_Reco.h5 \
--c2018 $DATA/CalibrationRuns2018_Reco.h5 \
--regions crAll \
--vetoSets "{fkMLP, fkFadc, fkScinti, fkSeptem, fkLineVeto, fkExclusiveLineVeto}" \
--mlpPath $MLP/<mlp_of_choice>.pt \
--fadcVetoPercentile 0.99 \
--signalEfficiency 0.85 --signalEfficiency 0.90 \
--signalEfficiency 0.95 --signalEfficiency 0.98 \
--out $DATA/classifier/mlp \
--cdlFile $DATA/CDL_2019/calibration-cdl-2018.h5 \
--multiprocessing \
--jobs 6 \
--dryRun
would apply the MLP classifier (here with a dummy name) with
successive additions of all vetoes in vetoSets at 4 different signal
efficiencies and store the HDF5 output files in
$DATA/classifier/mlp. Multiple vetoSets arguments can be given in
one call. Also, each fk entry can be prefixed with a + to indicate
that this veto should not be run on its own, which allows for
flexibility; see the example below. As indicated by --multiprocessing
and --jobs 6, this would run 6 processes in parallel. Note that each
process can peak at up to 10 GB of RAM (depending on the classifier
and vetoes used). Finally, note the --dryRun flag: as the name
suggests, it should only print what would be run; remove it to
actually process the files.
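As an illustration of the + prefix, a hypothetical veto set in which the FADC and scintillator vetoes are marked to never run on their own, but only in combination, would look like:

--vetoSets "{fkMLP, +fkFadc, +fkScinti}"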
35.6. Computing limits
To compute limits we now feed each of the produced HDF5 output files
of the likelihood program (in pairs including Run-2 and Run-3 output
files) into mcmc_limit_calculation. As the number of combinations
can be quite large, this can again be automated using the
Analysis/runLimits program.
Let's continue with the example from above and pretend we wish to run
1000 toy candidate sets for each input pair, but only for those pairs
that include the line veto. This can be accomplished using the
--prefix argument to runLimits, which understands standard glob patterns.
./runLimits \
--path $DATA/classifier/mlp \
--prefix "lhood_c18_R2_crAll_*line*" \
--exclude "_septem_" \
--outpath $DATA/limits/mlp/ \
--energyMin 0.2 --energyMax 12.0 \
--axionModel <path_to_differential_axion_flux>.csv \
--axionImage <path_to_axion_image>.csv \
--combinedEfficiencyFile <path_to_detection_efficiency>.csv \
--switchAxes \
--nmc 1000 \
--dryRun
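To preview which Run-2 files the --prefix and --exclude patterns would select, they can be mimicked directly in the shell (a sketch, assuming the output files from the previous step are in place):

ls $DATA/classifier/mlp/lhood_c18_R2_crAll_*line* | grep -v "_septem_"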
Note that the CSV files mentioned here are found in the resources
directory of this thesis' repository. See the extended thesis for how
to produce them.
To produce the observed limit, we run:
mcmc_limit_calculation \
limit \
-f <path_to_background_run2>.h5 \
-f <path_to_background_run3>.h5 \
--tracking <path_to_tracking_run2>.h5 \
--tracking <path_to_tracking_run3>.h5 \
--axionModel <path_to_differential_axion_flux>.csv \
--axionImage <path_to_axion_image>.csv \
--combinedEfficiencyFile <path_to_detection_efficiency>.csv \
--switchAxes \
--path "" \
--years 2017 --years 2018 \
--σ_p 0.05 \
--energyMin 0.2 --energyMax 12.0 \
--limitKind lkMCMC \
--outpath ~/phd/Figs/trackingCandidates/ \
--suffix ""
where the <path_to_*_run> placeholders refer to the output HDF5 files
of the likelihood program for the best setup as described in
sec. 13.13. --σ_p refers to the position uncertainty to use. The
signal and background uncertainties already default to their real
values.
With this done, you have successfully reproduced all final results of the thesis! As mentioned in chapter 3, see the extended thesis for commands, code snippets and more information about how exactly each figure and table is produced.
35.6.1. Produce the axion model, axion image and detection efficiency file [extended]
For the axion model and axion image, see sec. 37.4.3. For the detection efficiency see sec. 13.10.4.4.1.