Data reprocessing
Note
Different InfluxDB users have different privileges on different databases. Set the correct InfluxDB username and password as environment variables before executing the steps described below.
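The names of the environment variables are not stated here, so the following is only a sketch of a guard that can be run before any of the commands below. The function require_env and the variable names INFLUX_USERNAME and INFLUX_PASSWORD are hypothetical; replace them with the names your installation actually reads.

```shell
# Hypothetical helper and variable names, used only for illustration.
require_env() {
    # Report every unset or empty variable among the arguments;
    # return non-zero if at least one is missing.
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example guard (assumed variable names):
# require_env INFLUX_USERNAME INFLUX_PASSWORD || exit 1
```

Failing early in this way avoids running a command with the credentials of the wrong user, which may lack write access to the target database.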
Reprocessing of entire datasets (e.g. for preparation of data re-recos or further studies) can be handled by the system similarly to the prompt processing. The main difference is that the runs to be processed are injected all at once rather than being fetched automatically by the system.
The reprocessing infrastructure is also an excellent tool to test new workflows and perform development of the automation system.
Different reprocessing campaigns can be stored inside the same influx database. Actions on a specific campaign are performed by specifying the campaign name through the -c/--campaign option of the ecalrunctrl.py and ecalautomation.py scripts. In general the campaign name is used as a tag in the influxdb for all measurements (run, job, ...).
Creating a new campaign
A new reprocessing campaign can be created using the ecalrunctrl.py script.
Here db_name, passed with --db, is the name of the influx database in use (usually one per subdetector, with versioning and separation between prompt and rereco), and test_rereco, passed with --campaign, is the name of the new campaign. Further options are described in the script help menu: ecalrunctrl.py create --help.
Adding workflows to the campaign
By default the campaign is created with a single run type, all, which filters runs by rejecting non-global runs. The run types can be customized using the rtype-create, rtype-update and rtype-list subcommands of the ecalrunctrl.py script.
Workflows are added to the newly created campaign with:
ecalrunctrl.py --db 'db_name' rtype-update --campaign 'test_rereco' --add workflow1 workflow2 workflow3 --type=all
The --db and --campaign options match those specified at creation time. The default all run type is updated by adding three workflows (space-separated arguments to the --add option).
All runs injected in the future into the test_rereco campaign will be processed by the three workflows.
Note
The workflow name should match the one specified with the task argument of the Handler class.
Both the git branch name and the Jenkins item name associated with the workflow are irrelevant.
For example, to add the pi0-mon workflow the above command line should include --add pi0-mon.
Injecting runs in the new campaign
Once the new campaign has been created and workflows have been added, data can be injected for reprocessing using the command below.
The --db and --campaign options match those specified at creation time. Data can be injected with the granularity of a single run. Options to inject a list of runs, a run range or a whole era are available:
--runs RUNS      Comma separated list of run(s) to inject in the new campaign.
--range RRANGE   Define a range of runs to be injected (min,max).
--era ERA        Inject all runs belonging to a given CMS data acquisition era.
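Since --runs expects a comma separated list and --range a min,max pair, it can help to build and validate these arguments before injecting. The sketch below is not part of ecalrunctrl.py: the helper names runs_csv and run_range are hypothetical, and the run numbers in the usage comment are placeholders.

```shell
runs_csv() {
    # Join the run numbers given as arguments with commas.
    # Anything that is not a non-negative integer is rejected.
    csv=""
    for run in "$@"; do
        case "$run" in
            ''|*[!0-9]*)
                echo "not a run number: $run" >&2
                return 1
                ;;
        esac
        csv="${csv:+$csv,}$run"
    done
    printf '%s\n' "$csv"
}

run_range() {
    # Format a min,max pair, checking that min does not exceed max.
    if [ "$1" -gt "$2" ]; then
        echo "empty range: $1 > $2" >&2
        return 1
    fi
    printf '%s,%s\n' "$1" "$2"
}

# Usage with placeholder run numbers:
#   --runs "$(runs_csv 355100 355101 355205)"
#   --range "$(run_range 355100 355205)"
```

Validating the list up front means a typo in a run number is caught before anything is written to the campaign.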
Deactivating a campaign
If a campaign is no longer actively used, it can be excluded from Jenkins processing by removing all of its active tasks:
# list all tasks of the campaign
ecalrunctrl.py --db 'db_name' rtype-list --campaign 'campaign_name'
# remove active tasks of run type 'all'
ecalrunctrl.py --db 'db_name' rtype-update --campaign 'campaign_name' --remove task1,task2,task3 --type all
# remove active tasks of run type 'stable_beams'
ecalrunctrl.py --db 'db_name' rtype-update --campaign 'campaign_name' --remove task4,task5 --type stable_beams
# repeat for all run types with active tasks
# check that there are no active tasks left
ecalrunctrl.py --db 'db_name' rtype-list --campaign 'campaign_name'
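When a campaign has several run types, the removal commands above can be generated rather than typed one by one. The following is a minimal dry-run sketch: deactivate_cmds is a hypothetical helper that only prints the rtype-update commands, using the same options shown above, so they can be reviewed before being executed. The run types and task names must still be read from the rtype-list output by hand.

```shell
deactivate_cmds() {
    # Arguments: database name, campaign name, then one or more
    # specifications of the form run_type=task1,task2,...
    db="$1"
    campaign="$2"
    shift 2
    for spec in "$@"; do
        rtype="${spec%%=*}"   # text before the first '='
        tasks="${spec#*=}"    # text after the first '='
        printf "ecalrunctrl.py --db '%s' rtype-update --campaign '%s' --remove %s --type %s\n" \
            "$db" "$campaign" "$tasks" "$rtype"
    done
}

# Prints one command per run type; nothing is executed:
# deactivate_cmds db_name campaign_name all=task1,task2,task3 stable_beams=task4,task5
```

Printing instead of executing keeps the step reversible: the output can be inspected, saved alongside the campaign notes, and run only once it matches the tasks reported by rtype-list.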
To reactivate the campaign processing, add active tasks back to the campaign with the rtype-update subcommand and the --add option.