Discussion:
[lustre-discuss] Lustre FS inodes getting full
Jérôme BECOT
2015-11-06 09:55:18 UTC
Hi,

We are facing a weird situation here, and I'd like to know whether
anything is wrong and what I can do to fix it.

We have a 30 TB system running Lustre 2.6 (1 MDS / 2 OSS). The inode
usage is full, though:

***@SlurmMaster:~# df -i
Filesystem          Inodes    IUsed    IFree IUse% Mounted on
/dev/sda5                0        0        0     - /
udev               8256017      390  8255627    1% /dev
tmpfs              8258094      347  8257747    1% /run
tmpfs              8258094        5  8258089    1% /run/lock
tmpfs              8258094        2  8258092    1% /run/shm
/dev/sdb1                0        0        0     - /home
***@tcp:/lustre   37743327 37492361   250966  100% /scratch
cgroup             8258094        8  8258086    1% /sys/fs/cgroup

***@SlurmMaster:~# lfs df -i
UUID                    Inodes    IUsed      IFree IUse% Mounted on
lustre-MDT0000_UUID 1169686528 37413529 1132272999    3% /scratch[MDT:0]
lustre-OST0000_UUID   17160192 16996738     163454   99% /scratch[OST:0]
lustre-OST0001_UUID   17160192 16996308     163884   99% /scratch[OST:1]

filesystem summary:   37740867 37413529     327338   99% /scratch

What is happening here? I thought we would have a maximum of about
4 billion files, not 16 million?

Thanks

--
Jérome BECOT

Administrateur Systèmes et Réseaux

Molécules à visée Thérapeutique par des approches in Silico (MTi)
Univ Paris Diderot, UMRS973 Inserm
Case 013
Bât. Lamarck A, porte 412
35, rue Hélène Brion 75205 Paris Cedex 13
France

Tel : 01 57 27 83 82
Mohr Jr, Richard Frank (Rick Mohr)
2015-11-06 14:00:58 UTC
Every Lustre file uses an inode on the MDS and at least one inode on an OST (more than one OST if the file's stripe count is >1). If your OSTs don't have free inodes, Lustre cannot allocate an object for the file's contents.

The upper limit on the number of files will be the lesser of:

1) number of MDS inodes
2) sum of inodes across all OSTs

But depending upon file size and stripe count, you could end up with less.
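
In your case (stripe count 1), a quick back-of-envelope check with the numbers
from your "lfs df -i" output shows where the ceiling comes from; this is just a
sketch of the arithmetic:

    # Sum of OST inodes = the effective file limit when every file is unstriped
    echo $((17160192 + 17160192))   # -> 34320384, i.e. ~34M objects across both OSTs
    # versus 1169686528 (~1.2 billion) inodes available on the MDT

So the limit you are hitting is the OSTs' object count, not the MDT's inode count.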

-- Rick

Jérôme BECOT
2015-11-06 14:10:23 UTC
Yes, that's what I understood. We don't use stripes.

What I don't know is what determines the inode limit on the OST. I
guess that the underlying filesystem (i.e. ldiskfs here) is the
culprit. But on a 15 TB OST with ldiskfs, I didn't expect to hit a
17M-inode limit.

We use programs that generate tons of small files, and now we're
running out of inodes while using only 30% of the disk space.

Is there any way to increase the max inode number available on the OSTs?

Here again, I guess I have no choice but to switch to a ZFS backend?



--
Jérome BECOT

Administrateur Systèmes et Réseaux

Molécules à visée Thérapeutique par des approches in Silico (MTi)
Univ Paris Diderot, UMRS973 Inserm
Case 013
Bât. Lamarck A, porte 412
35, rue Hélène Brion 75205 Paris Cedex 13
France

Tel : 01 57 27 83 82
Dilger, Andreas
2015-11-07 06:05:51 UTC
On 2015/11/06, 07:10, "lustre-discuss on behalf of Jérôme BECOT"
<lustre-discuss-***@lists.lustre.org on behalf of
***@inserm.fr> wrote:

>Yes, that's what I understood. We don't use stripes.
>
>What I don't know is what determines the inode limit on the OST. I
>guess that the underlying filesystem (i.e. ldiskfs here) is the
>culprit. But on a 15 TB OST with ldiskfs, I didn't expect to hit a
>17M-inode limit.
>
>We use programs that generate tons of small files, and now we're
>running out of inodes while using only 30% of the disk space.

The default formatting for a 15 TB OST assumes an average file size of
1 MB, which is normally a safe assumption for Lustre.

>Is there any way to increase the max inode number available on the OSTs?

This can be changed at format time by specifying the average file size
(inode ratio) for the OSTs:

mkfs.lustre ... --mkfsoptions="-i <average_file_size>"

But you may want to specify a slightly smaller average file size to give
some safety margin.
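
As a concrete sketch (the index, MGS NID, and target device below are
placeholders, not values from your system), a new OST formatted with one inode
per 64 KB of space would look something like:

    # One inode per 64 KB of OST space; as with mke2fs, the ratio cannot be
    # smaller than the filesystem block size (4096 bytes for ldiskfs)
    mkfs.lustre --ost --fsname=lustre --index=2 --mgsnode=<mgs_nid>@tcp \
        --mkfsoptions="-i 65536" /dev/<new_ost_device>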

>Here again, I guess I have no choice but to switch to a ZFS backend?

The best way to handle this would be to add one or two more OSTs to the
filesystem that are formatted with the smaller inode ratio, and Lustre
will choose these instead of the full ones. You could then migrate files
from the older OSTs to the new ones until they are empty, reformat them
with the smaller inode ratio, and add them back into the filesystem.
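
One possible way to do that migration (a sketch only; the device names follow
the ones in your lfs df output, and newer Lustre releases also offer other ways
to stop allocations on an OST):

    # On the MDS: stop new objects from being created on the full OST
    lctl --device lustre-OST0000-osc-MDT0000 deactivate

    # On a client: move files that have objects on that OST to the emptier OSTs
    lfs find /scratch --ost lustre-OST0000_UUID -type f | lfs_migrate -y

    # Repeat for OST0001, then reformat the emptied OSTs with the new inode
    # ratio and add them back into the filesystem.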

Cheers, Andreas

--
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division
Jérôme BECOT
2015-11-24 14:29:25 UTC
Thanks for your answer.

I am sorry I didn't thank you sooner.

Does reducing the average file size have an impact on performance?
Is there a reasonable size beyond which the filesystem may become
unstable?

We are thinking of an average 100KB file size.

Thank you again


--
Jérome BECOT

Administrateur Systèmes et Réseaux

Molécules à visée Thérapeutique par des approches in Silico (MTi)
Univ Paris Diderot, UMRS973 Inserm
Case 013
Bât. Lamarck A, porte 412
35, rue Hélène Brion 75205 Paris Cedex 13
France

Tel : 01 57 27 83 82
Dilger, Andreas
2015-11-25 01:52:57 UTC
Having more inodes on the OSTs will increase e2fsck time a bit, and reduce free space a bit, but is not otherwise harmful.
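
For scale (just arithmetic, before any filesystem overhead): a 100 KB inode
ratio on a 15 TB OST works out to on the order of 160 million inodes, versus
the ~17 million you have today:

    echo $((15 * 1024**4 / (100 * 1024)))   # ~161 million inodes per 15 TiB OST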

Cheers, Andreas
