Brock Palen
2008-07-21 15:43:39 UTC
Every so often Lustre locks up on a client. It recovers eventually. The
affected process shows up in state 'D' (uninterruptible I/O wait). In this
case it was 'ar' making an archive.
dmesg then shows:
Lustre: nobackup-MDT0000-mdc-00000101fc467800: Connection to service
nobackup-MDT0000 via nid 141.212.30.184@tcp was lost; in progress
operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by nobackup-MDT0000; in
progress operations using this service will fail.
LustreError: 17575:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID req@0000010189e2f400 x912452/t0
o101->nobackup-MDT0000_UUID at ***@tcp:12 lens 488/768 ref 1
fl Rpc:P/0/0 rc 0/0
LustreError: 17575:0:(mdc_locks.c:423:mdc_finish_enqueue())
ldlm_cli_enqueue: -108
LustreError: 27076:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID req@00000101ed528a00 x912464/t0
o101->nobackup-MDT0000_UUID at ***@tcp:12 lens 440/768 ref 1
fl Rpc:/0/0 rc 0/0
LustreError: 27076:0:(mdc_locks.c:423:mdc_finish_enqueue())
ldlm_cli_enqueue: -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode
12653753 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) inode
12195682 mdc close failed: rc = -108
LustreError: 27489:0:(file.c:97:ll_close_inode_openhandle()) Skipped
46 previous similar messages
Lustre: nobackup-MDT0000-mdc-00000101fc467800: Connection restored to
service nobackup-MDT0000 using nid 141.212.30.184@tcp.
LustreError: 11-0: an error occurred while communicating with
141.212.30.184@tcp. The mds_close operation failed with -116
LustreError: 11-0: an error occurred while communicating with
141.212.30.184@tcp. The mds_close operation failed with -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) inode
11441446 mdc close failed: rc = -116
LustreError: 26930:0:(file.c:97:ll_close_inode_openhandle()) Skipped
113 previous similar messages
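
For reference, the negative return codes in the log above are Linux errno
values: -108 is ESHUTDOWN (seen while the client was evicted) and -116 is
ESTALE (seen on the mds_close calls after the connection was restored). A
minimal sketch of pulling them out of dmesg lines, assuming Linux errno
numbering; the function name, the regex, and the small lookup table are
illustrative only and not part of Lustre:

```python
import re

# Linux errno values, hardcoded so the result does not depend on the
# platform running the script. Only the two codes from the log are listed.
LINUX_ERRNO = {
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches the three ways a return code appears in the log excerpt:
#   "rc = -108", "failed with -116", "ldlm_cli_enqueue: -108"
RC_PATTERN = re.compile(r"(?:rc = |failed with |ldlm_cli_enqueue: )-(\d+)")


def decode_rc(line):
    """Return a list of (code, name, description) for return codes in line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        results.append((code, name, text))
    return results


if __name__ == "__main__":
    for sample in (
        "12653753 mdc close failed: rc = -108",
        "The mds_close operation failed with -116",
    ):
        print(sample, "->", decode_rc(sample))
```

Lines such as "fl Rpc:P/0/0 rc 0/0" are deliberately not matched, since they
carry no negative return code.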
Are there special options that should be set on interactive/login
nodes? I remember something about how much memory should be available
on login vs. batch nodes, but I don't know how to change that; I just
assumed Lustre would use it. The login nodes have 8 GB.
__________________________________________________
www.palen.serveftp.net
Center for Advanced Computing
http://cac.engin.umich.edu
brockp at umich.edu