Discussion:
[lustre-discuss] dd oflag=direct error (512 byte Direct I/O)
김형근
2018-10-25 06:05:01 UTC
Permalink
Hi.
It's a pleasure to meet you, Lustre specialists.
(I do not speak English well... thank you for your understanding!)


I ran the dd command on a Lustre mount point (using the oflag=direct option):


------------------------------------------------------------
dd if=/dev/zero of=/mnt/testfile oflag=direct bs=512 count=1
------------------------------------------------------------


I need direct I/O with a 512-byte block size.
This is a required check in the software I use.


But unfortunately, if the direct flag is present,
bs must be a multiple of 4K (4096) to work (e.g. 8K, 12K, 256K, 1M, 8M).
If you enter a value such as 512 or 4095, it does not work. The error message is as follows:


dd: error writing '[filename]': Invalid argument


My test systems are all up to date (RHEL, Lustre server, and client).
I have tried both ldiskfs and ZFS as the backing filesystem. The result is the same.




I have just two questions:


1. Why does direct I/O only work with block sizes that are multiples of 4K?
2. Can I change server or client settings to enable 512-byte direct I/O?


I await your answer. Thank you.
Andreas Dilger
2018-10-25 07:47:58 UTC
Permalink
Post by 김형근
Hi.
It's a pleasure to meet you, Lustre specialists.
(I do not speak English well... thank you for your understanding!)
Your English is better than my Korean. :-)
Post by 김형근
I ran the dd command on a Lustre mount point (using the oflag=direct option):
------------------------------------------------------------
dd if=/dev/zero of=/mnt/testfile oflag=direct bs=512 count=1
------------------------------------------------------------
I need direct I/O with a 512-byte block size.
This is a required check in the software I use.
What software is it? Is it possible to change the application to use
4096-byte alignment?
Post by 김형근
But unfortunately, if the direct flag is present,
bs must be a multiple of 4K (4096) to work (e.g. 8K, 12K, 256K, 1M, 8M).
If you enter a value such as 512 or 4095, it does not work. The error message is as follows:
dd: error writing '[filename]': Invalid argument
My test systems are all up to date (RHEL, Lustre server, and client).
I have tried both ldiskfs and ZFS as the backing filesystem. The result is the same.
I have just two questions:
1. Why does direct I/O only work with block sizes that are multiples of 4K?
The client PAGE_SIZE on an x86 system is 4096 bytes. The Lustre client
cannot cache data smaller than PAGE_SIZE, so the current implementation
limits O_DIRECT reads and writes to multiples of PAGE_SIZE.

I think the same would happen if you try to use O_DIRECT on a drive with
a 4096-byte native sector size (https://en.wikipedia.org/w/index.php?title=Advanced_Format&section=5#4K_native).
Post by 김형근
2. Can I change server or client settings to enable 512-byte direct I/O?
This would not be possible without changing the Lustre client code.
I don't know how easy it would be to do this and still ensure that
512-byte writes are handled correctly.

So far we have not had other requests to change this limitation, so
it is not a high priority to change on our side, especially since
applications will have to deal with 4096-byte sectors in any case.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud
김형근
2018-10-29 04:39:41 UTC
Permalink
The software I use is Red Hat Virtualization. When using a POSIX-compatible FS, it seems to perform direct I/O with a block size of 256512 bytes.


If I can't resolve the issue with my storage configuration, I will contact Red Hat.


Your answer was very helpful.
Thank you.
Patrick Farrell
2018-10-30 15:10:55 UTC
Permalink
Andreas,

An interesting thought on this, as the same limitation came up recently in discussions with a Cray customer. Strictly honoring the direct I/O expectations around data copying is apparently optional. GPFS is a notable example: it allows non-page-aligned/non-page-size direct I/O, but apparently (this is second hand from a GPFS-knowledgeable person, so take it with a grain of salt) it uses the buffered path (data copy, page cache, etc.) and then flushes it, O_SYNC style. My understanding from conversations is that this is the general approach taken by file systems that support unaligned direct I/O: they cheat a little and do buffered I/O in those cases.

So rather than refusing to perform unaligned direct I/O, we could emulate the approach taken by (some) other file systems. There's no clear standard here, but this is an option others have taken that might improve the user experience. (I believe we persuaded our particular user to switch their code away from direct I/O, since they had no real reason to be using it.) A rough sketch of the idea follows.
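
Roughly, the fallback could look like the hypothetical userspace wrapper
below (in a real implementation this logic would live inside the filesystem;
the two file descriptors and the fallback_write() name are just for
illustration):
------------------------------------------------------------
#include <stdint.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* PAGE_SIZE on x86, as discussed above */

/* dio_fd is open with O_DIRECT, buffered_fd without, on the same file. */
static ssize_t fallback_write(int dio_fd, int buffered_fd,
                              const void *buf, size_t len, off_t off)
{
        ssize_t rc;

        /* Fully aligned request: honor real O_DIRECT semantics. */
        if (len % DIO_ALIGN == 0 && off % DIO_ALIGN == 0 &&
            (uintptr_t)buf % DIO_ALIGN == 0)
                return pwrite(dio_fd, buf, len, off);

        /* Unaligned request: "cheat" through the buffered path, then
         * flush so the data is stable on return, O_SYNC style. */
        rc = pwrite(buffered_fd, buf, len, off);
        if (rc >= 0 && fdatasync(buffered_fd) != 0)
                return -1;
        return rc;
}
------------------------------------------------------------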


- Patrick

Andreas Dilger
2018-10-30 23:09:54 UTC
Permalink
I would totally be fine with that, so long as it works in a reasonable manner.

In theory, it would even be possible to do sub-page uncached writes from the client, and have the OSS handle the read-modify-write of a single page. That would need some help from the CLIO layer to send the small write directly to the OST without going through the page cache (also invalidating any overlapping page from the client cache), and from LNet to handle the misaligned RDMA properly.

We used to allow misaligned RDMA with the old liblustre client, because it never had any cache, but not with the Linux client. It _might_ be possible to do without major surgery on the servers, and it might even speed up sub-page random writes. This would avoid the need to read a whole page over to the client just to overwrite part of it and send it back. It would also avoid contending on DLM write locks for non-overlapping regions, since the sub-page writes could be sent lockless from the client, with the DLM locking and page-aligned I/O handled on the OSS (that is already in the protocol).
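
For illustration only, the server-side read-modify-write could conceptually
look like the sketch below (none of this is actual OSS code; the fd stands
in for the OST object's backing store, and the DLM locking is elided):
------------------------------------------------------------
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define OSS_PAGE 4096

/* Read the containing page, patch the sub-range, write the page back. */
static ssize_t sub_page_write(int fd, const char *data, size_t len, off_t off)
{
        char page[OSS_PAGE];
        off_t page_start = off - (off % OSS_PAGE);

        if (off % OSS_PAGE + len > OSS_PAGE)
                return -1;      /* single-page sketch only */

        memset(page, 0, sizeof(page));
        if (pread(fd, page, OSS_PAGE, page_start) < 0)  /* read ...   */
                return -1;
        memcpy(page + off % OSS_PAGE, data, len);       /* ... modify */
        return pwrite(fd, page, OSS_PAGE, page_start);  /* ... write  */
}
------------------------------------------------------------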

That said, this is definitely more in your area of expertise, Patrick (and Jinshan, CC'd).

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud
