Kirk, Benjamin (JSC-EG311)
2018-09-25 21:01:50 UTC
Hi all,
We're using jobstats under SLURM and have pulled together a tool to integrate SLURM job info and Lustre OST/MDT jobstats. The idea is to correlate filesystem use cases with particular applications as targets for refactoring.
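For context, the sketch below is roughly the kind of thing the tool does, simplified and illustrative only: the fsname "scratch" is a placeholder, and it assumes jobid_var=SLURM_JOB_ID so that job_stats entries are keyed by SLURM job IDs, which can then be joined with sacct output.

#!/usr/bin/env python3
# Simplified, illustrative sketch -- not the actual tool.  Assumes
# jobid_var=SLURM_JOB_ID (job_stats entries keyed by SLURM job ID) and a
# placeholder fsname of "scratch".
import subprocess
import yaml  # PyYAML


def job_stats(target):
    """Return {job_id: {counter: value}} for one target, e.g.
    'mdt.scratch-MDT0000' or 'obdfilter.scratch-OST0000'."""
    out = subprocess.run(
        ["lctl", "get_param", "-n", target + ".job_stats"],
        capture_output=True, text=True, check=True).stdout
    stats = {}
    for entry in (yaml.safe_load(out) or {}).get("job_stats") or []:
        counters = {}
        for op, val in entry.items():
            if isinstance(val, dict):  # skip job_id / snapshot_time scalars
                # read/write carry a byte 'sum'; metadata ops just 'samples'
                counters[op] = val.get("sum", val.get("samples", 0))
        stats[str(entry["job_id"])] = counters
    return stats


def slurm_info(jobid):
    """Pull a few SLURM fields for the job via sacct (parsable output)."""
    fmt = "JobID,JobName,Account,AllocTRES,Elapsed,ExitCode"
    out = subprocess.run(
        ["sacct", "-X", "-n", "-P", "-j", jobid, "--format=" + fmt],
        capture_output=True, text=True, check=True).stdout
    return dict(zip(fmt.split(","), out.strip().split("|")))


if __name__ == "__main__":
    # Rank whatever is currently in the MDT job_stats by getxattr count and
    # attach the SLURM job name for context.
    mdt = job_stats("mdt.scratch-MDT0000")
    ranked = sorted(mdt.items(),
                    key=lambda kv: kv[1].get("getxattr", 0), reverse=True)
    for jobid, c in ranked[:10]:
        info = slurm_info(jobid)
        print(f"{jobid:>10} {info.get('JobName', '?'):>20}"
              f" getxattr={c.get('getxattr', 0)} open={c.get('open', 0)}")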
In doing so, I'm seeing some applications really trigger getxattr on the MDT, and others do not. A particularly egregious example is below: 360 cores, ~10s of GB output, ~6500 files, but 16,608,476 calls to getxattr during a 4-hour runtime. And this is a nominally compute-bound problem, so the I/O pattern is likely compressed into small windows of time.
The system is CentOS 7.5 / Lustre 2.10.5 / zfs-0.7.9, with a single MDT and 12 OSSes (2 OSTs each). The default stripe count is 4.
A couple of questions:
1) Should I care about this? We do see sporadic MDT slowness under ZFS, but that doesn't seem rare. I'm looking for a good way to trace it back to jobs / use cases.
2) What types of operations might be triggering this much getxattr usage on a moderate number of files (e.g. what to watch for in the refactoring process…)? A small spot-check sketch follows below.
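A minimal spot check along those lines, assuming the job's output files sit under one directory tree (the path argument is a placeholder): walk the tree and count which extended-attribute names the files actually carry.

#!/usr/bin/env python3
# Minimal spot check, illustrative only: count which xattr names show up on
# the files under a given output directory.
import os
import sys
from collections import Counter


def xattr_census(root):
    """Count occurrences of each xattr name on files under 'root'."""
    seen = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                seen.update(os.listxattr(os.path.join(dirpath, name)))
            except OSError:
                pass  # vanished file, permission denied, etc.
    return seen


if __name__ == "__main__":
    for attr, count in xattr_census(sys.argv[1]).most_common():
        print(f"{count:8d}  {attr}")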
Thanks,
-Ben
--------------------------
….
TRES : cpu=360,node=30,billing=360
RunTime : 04:59:14
GroupId : eg3(3000)
ExitCode : 0:0
MDT:rename : 373
MDT:snapshot_time : 2018-09-21 08:36:29
MDT:setattr : 444
MDT:mkdir : 361
MDT:getattr : 1570
MDT:getxattr : 16608476
MDT:mknod : 265
MDT:rmdir : 1
MDT:samedir_rename : 373
MDT:close : 6331
MDT:unlink : 113
MDT:open : 6345
OST0009:write_bytes : 3.46 GB
OST0008:write_bytes : 3.11 GB
OST0001:write_bytes : 1.01 GB
OST0000:write_bytes : 396.19 MB
OST0005:read_bytes : 8.19 KB
OST0005:write_bytes : 2.38 GB
OST0005:setattr : 1
OST0004:write_bytes : 790.65 MB
OST0007:write_bytes : 3.02 GB
OST0006:write_bytes : 817.14 MB
OST0016:write_bytes : 4.57 GB
OST0017:write_bytes : 5.15 GB
OST0017:setattr : 1
OST0014:write_bytes : 8.8 GB
OST0015:write_bytes : 1.37 GB
OST0012:write_bytes : 7 GB
OST0012:setattr : 1
OST0013:read_bytes : 8.39 MB
OST0013:write_bytes : 8.4 GB
OST0013:setattr : 1
OST0010:write_bytes : 1.98 GB
OST0011:read_bytes : 27.28 MB
OST0011:write_bytes : 9.42 GB
OST000c:read_bytes : 131.07 KB
OST000c:write_bytes : 5.83 GB
OST000c:setattr : 2
OST000b:read_bytes : 28.12 MB
OST000b:write_bytes : 4.23 GB
OST000e:read_bytes : 8.02 MB
OST000e:write_bytes : 7.48 GB
OST000e:setattr : 1
OST000d:write_bytes : 1.21 GB
OST000f:write_bytes : 2.88 GB