
Use the Xcache servers

Both BNL and SLAC have set up Xcache servers to help cache locally files from the grid or CERN EOS. Currently the BNL Xcache server has 60 TB of cache space, and the SLAC Xcache server has 20 TB.

The Xcache servers

  • provide the rucioN2N feature, enabling users to access any file on the grid without knowing its exact site location or file path.
  • cache locally the content of remote files actually read on the first access, thus improving read performance for subsequent accesses. If only part of a file is read, only that part is cached.

You can run the predefined command Xcache_ls.py to generate a clist file (containing a list of physical file paths) for given datasets, then use the clist in your jobs.

Run Xcache_ls.py -h to get the full usage

% Xcache_ls.py -h
Usage:
     Xcache_ls.py [options] dsetNamePattern[,dsetNamePattern2[,more patterns]]
  or
     Xcache_ls.py [options] --eos eosPath/
  or
     Xcache_ls.py [options] --eos eosPath/filenamePattern
  or
     Xcache_ls.py [options] dsetListFile

  This script generates a list (clist) of
  Xcache gLFN (global logical filename) access path
  for given datasets on Atlas grid sites.
  Wildcard is supported in the dataset name pattern.

Options:
  -h, --help            show this help message and exit
  -v                    Verbose
  -V, --version         print my version
  -X XCACHESITE, --XcacheSite=XCACHESITE
                        Specify a Xcache server site of BNL or SLAC
                        (default=BNL)
  -o OUTCLISTFILE, --outClistFile=OUTCLISTFILE
                        write the list into a file instead of the screen
  --eos=EOS_PATH, --cerneos=EOS_PATH
                        List files (*.root and *.root.[0-9] on default) on
                        CERN EOS
  -d OUTCLISTDIR, --dirForClist=OUTCLISTDIR
                        write the list into a directory with a file per
                        dataset

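For example, using the options shown above, one could generate a clist on the BNL Xcache for the dataset used later on this page and write it to a file (the output filename mydataset.clist is just an illustration, and the exact dataset-name form the script expects may differ):

% Xcache_ls.py -X BNL -o mydataset.clist data16_13TeV.00311481.physics_Main.merge.DAOD_SUSY15.f758_m1616_r8669_p3185_tid11525262_00
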
However, for large grid-resident inputs, it is recommended that you plan ahead and pre-stage them to BNL using an R2D2 request or the rucio command.
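
As a sketch of the rucio approach (assuming you have a rucio client configured and quota on the target RSE, and using the BNL LOCALGROUPDISK RSE that appears in the replica listing below), pre-staging could look like:

# request one replica of the dataset at BNL
rucio add-rule data16_13TeV:data16_13TeV.00311481.physics_Main.merge.DAOD_SUSY15.f758_m1616_r8669_p3185_tid11525262_00 1 BNL-OSG2_LOCALGROUPDISK
# check the status of your rules later
rucio list-rules --account $RUCIO_ACCOUNT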

Using Xcache at BNL

Xcache enables remote data access and also caches the data locally for faster access in the future.

The Xcache server at BNL is root://xrootd03.usatlas.bnl.gov:1094/.

Let us take the input file used in the SLAC example. At SLAC, the inputFile name for outside access (check the file dset-outside.txt at SLAC) is

inputFile=root://griddev03.slac.stanford.edu:2094//xrootd/atlas/atlaslocalgroupdisk/rucio/data16_13TeV/f9/bd/DAOD_SUSY15.11525262._000003.pool.root.1

For Xcache, we need to prepend the Xcache server prefix, followed by two slash characters, that is,

inputFile=root://xrootd03.usatlas.bnl.gov:1094//root://griddev03.slac.stanford.edu:2094//xrootd/atlas/atlaslocalgroupdisk/rucio/data16_13TeV/f9/bd/DAOD_SUSY15.11525262._000003.pool.root.1
cd T3-Example-BNL/Interactive-Job
../bin/Exam_JetsPlot $inputFile > myjob.log 2>&1
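
As a quick sanity check that the Xcache path is readable before launching a full job (not part of the original example; note that this copies, and therefore caches, the whole file), one could run:

xrdcp -f $inputFile /dev/null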

Using Xcache (gLFN) at BNL

Xcache at BNL also supports gLFN (global Logical File Name) access, without the need to know the exact path of a given file.
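
The gLFN access path has the general form below (inferred from the concrete example at the end of this section; the scope and filename are those reported by rucio):

root://xrootd03.usatlas.bnl.gov:1094//atlas/rucio/<scope>:<filename>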

Let us take the same dataset used in the SLAC example.

$ rucio list-dataset-replicas data16_13TeV:data16_13TeV.00311481.physics_Main.merge.DAOD_SUSY15.f758_m1616_r8669_p3185_tid11525262_00
+-------------------------------+---------+---------+
| RSE                           |   FOUND |   TOTAL |
|-------------------------------+---------+---------|
| MWT2_UC_LOCALGROUPDISK        |      39 |      39 |
| OU_OSCER_ATLAS_LOCALGROUPDISK |      39 |      39 |
| AGLT2_LOCALGROUPDISK          |      39 |      39 |
| NERSC_LOCALGROUPDISK          |      39 |      39 |
| BNL-OSG2_LOCALGROUPDISK       |      39 |      39 |
| CERN-PROD_DATADISK            |      39 |      39 |
| NET2_LOCALGROUPDISK           |      39 |      39 |
| SLACXRD_LOCALGROUPDISK        |      39 |      39 |
| SWT2_CPB_LOCALGROUPDISK       |      39 |      39 |
| NET2_DATADISK                 |      39 |      39 |
+-------------------------------+---------+---------+

Let us list the filenames in the dataset

$ dset=data16_13TeV:data16_13TeV.00311481.physics_Main.merge.DAOD_SUSY15.f758_m1616_r8669_p3185_tid11525262_00
$ rucio list-content $dset
+-------------------------------------------------------+--------------+
| SCOPE:NAME                                            | [DID TYPE]   |
|-------------------------------------------------------+--------------|
| data16_13TeV:DAOD_SUSY15.11525262._000003.pool.root.1 | FILE         |
| data16_13TeV:DAOD_SUSY15.11525262._000006.pool.root.1 | FILE         |
...
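
For comparison, without gLFN one would need to look up a physical replica path for each file (for example with rucio list-file-replicas); with Xcache gLFN access, the scope and filename above are enough:

$ rucio list-file-replicas data16_13TeV:DAOD_SUSY15.11525262._000003.pool.root.1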

Let us take the first file in the list.

inputFile=root://xrootd03.usatlas.bnl.gov:1094//atlas/rucio/data16_13TeV:DAOD_SUSY15.11525262._000003.pool.root.1
../bin/Exam_JetsPlot $inputFile > myjob.log 2>&1
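
For batch running, a minimal HTCondor submit description might look like the following sketch (the file names are assumptions; the executable and the gLFN inputFile come from the interactive example above):

# exam_jetsplot.sub -- hypothetical submit description
universe   = vanilla
executable = ../bin/Exam_JetsPlot
arguments  = root://xrootd03.usatlas.bnl.gov:1094//atlas/rucio/data16_13TeV:DAOD_SUSY15.11525262._000003.pool.root.1
output     = myjob.log
error      = myjob.err
log        = myjob.clog
queue

$ condor_submit exam_jetsplot.sub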

Below is the condor_q output showing the running job.

$ condor_q
-- Schedd: spar0103.usatlas.bnl.gov : <130.199.48.19:9618?... @ 08/02/19 13:12:36
 ID       OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
35106.0   yesw2000        8/2  13:12   0+00:00:01 R  0    0.3 Exam_JetsPlot

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for yesw2000: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended