Discussion:
Is kernel oplocks = yes a good default?
Christian Ambach
2012-04-11 13:25:59 UTC
I was wondering why Samba servers running on Linux are not giving out
Level II oplocks by default and thus cause performance degradation for
certain workloads.

Digging into that, I discovered that this is due to "kernel oplocks" being
set to yes by default: on the two platforms we have kernel oplock support
code for (Linux and IRIX), level 2 oplocks are not supported by the
kernel. (OneFS is the only platform that has support for them.)
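To make the effect concrete, here is a deliberately simplified model of the rule described above. It is not Samba's code path; the `Oplock` enum, the `NO_KERNEL_LEVEL2` set and `grantable_oplock()` are hypothetical names, and the platform list is taken only from this message (Linux and IRIX lack kernel level II support, OneFS has it). In this model a level II request on an affected platform with kernel oplocks enabled ends up with no oplock at all.

```python
from enum import Enum


class Oplock(Enum):
    NONE = 0
    LEVEL_II = 1
    EXCLUSIVE = 2


# Platforms where Samba has kernel oplock code but the kernel cannot back a
# level II oplock (per this thread: Linux and IRIX; OneFS can).
NO_KERNEL_LEVEL2 = {"Linux", "IRIX"}


def grantable_oplock(requested: Oplock, platform: str, kernel_oplocks: bool) -> Oplock:
    """Simplified model: which oplock level can be handed out to a client?"""
    if (requested is Oplock.LEVEL_II
            and kernel_oplocks
            and platform in NO_KERNEL_LEVEL2):
        # Kernel cannot support level II here, so in this model the
        # request is downgraded to no oplock.
        return Oplock.NONE
    return requested


print(grantable_oplock(Oplock.LEVEL_II, "Linux", kernel_oplocks=True))   # Oplock.NONE
print(grantable_oplock(Oplock.LEVEL_II, "Linux", kernel_oplocks=False))  # Oplock.LEVEL_II
```

Turning kernel oplocks off is the only input in this model that brings level II back on Linux, which is the consistency/performance trade-off discussed in this thread.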

Another bad thing is that kernel oplocks is a global parameter. So if an
admin is interested in getting NFS/shell interop for just a certain
share, (s)he cannot turn them off for the other shares to get better
performance from those.

I have worked on a patchset that converts the parameter into a share
option that will allow for more fine-grained configuration.
Please have a look at it.
It makes the raw.oplocks test pass when using kernel oplocks = no for
just the share to be tested.
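As a rough illustration of what a share-level "kernel oplocks" buys an admin, the sketch below models smb.conf inheritance with Python's `configparser`, treating [global] as the defaults section so that a value set inside a share overrides it. This is an assumption-laden stand-in, not Samba's loadparm; the share names, paths and `effective_kernel_oplocks()` helper are hypothetical, and the built-in default of yes reflects the default discussed in this thread.

```python
import configparser

# Hypothetical smb.conf: one share used for NFS/shell interop, one not.
SMB_CONF = """
[global]
kernel oplocks = yes

[nfs_shared]
path = /srv/nfs_shared

[fast]
path = /srv/fast
kernel oplocks = no
"""

# Assumed built-in default when neither [global] nor the share sets it.
BUILTIN_DEFAULT = True


def effective_kernel_oplocks(conf_text: str, share: str) -> bool:
    # [global] acts as the defaults section, so every share inherits it,
    # and a value set inside the share section overrides it.
    parser = configparser.ConfigParser(default_section="global", interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(share, "kernel oplocks", fallback=BUILTIN_DEFAULT)


for name in ("nfs_shared", "fast"):
    print(name, effective_kernel_oplocks(SMB_CONF, name))
```

With a global-only parameter, the "fast" share could not opt out without also disabling kernel oplocks for "nfs_shared".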

Additionally, I would like to question the current default value of
kernel oplocks: we shouldn't cut off our users from the performance
benefits of level II oplocks on one of our major platforms by default.

I can update the patchset to also flip the default if this is considered
to be a good idea.

Cheers,
Christian


-------------- next part --------------
A non-text attachment was scrubbed...
Name: oplocks.patch
Type: text/x-patch
Size: 10361 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20120411/74705fdd/attachment.bin>
Jeremy Allison
2012-04-11 17:37:51 UTC
Post by Christian Ambach
I was wondering why Samba servers running on Linux are not giving
out Level II oplocks by default and thus cause performance
degradation for certain workloads.
Digging into that, I discovered that this is due to "kernel oplocks"
being set to yes by default: on the two platforms we have kernel
oplock support code for (Linux and IRIX), level 2 oplocks are not
supported by the kernel. (OneFS is the only platform that has
support for them.)
Another bad thing is that kernel oplocks is a global parameter. So
if an admin is interested in getting NFS/shell interop for just a
certain share, (s)he cannot turn them off for the other shares to
get better performance from those.
I have worked on a patchset that converts the parameter into a share
option that will allow for more fine-grained configuration.
Please have a look at it.
It makes the raw.oplocks test pass when using kernel oplocks = no
for just the share to be tested.
Looks very good to me. Do you want to include it in a 3.6.x release?
Post by Christian Ambach
Additionally, I would like to question the current default value of
kernel oplocks: we shouldn't cut off our users from the performance
benefits of level II oplocks on one of our major platforms by
default.
I can update the patchset to also flip the default if this is
considered to be a good idea.
It probably is. The default was set to allow for out-of-the-box
safety for Linux servers exporting the same files by both NFS
and CIFS, but that's probably less used than I thought at the
time, and probably can be expertly set by OEMs who know exactly
what they're doing.

+1 from me (for this patch, and also flipping the default).

Jeremy.
J. Bruce Fields
2012-04-11 18:44:09 UTC
Post by Jeremy Allison
Post by Christian Ambach
Additionally, I would like to question the current default value of
kernel oplocks: we shouldn't cut off our users from the performance
benefits of level II oplocks on one of our major platforms by default.
I can update the patchset to also flip the default if this is
considered to be a good idea.
It probably is. The default was set to allow for out-of-the-box
safety for Linux servers exporting the same files by both NFS
and CIFS, but that's probably less used than I thought at the
time, and probably can be expertly set by OEMs who know exactly
what they're doing.
+1 from me (for this patch, and also flipping the default).
That said, ideally we'd have kernel support for level 2 oplocks and then
people wouldn't have to make this consistency/performance choice.

(What's stopping that? There's the downgrade problem reported here:

https://lists.samba.org/archive/samba-technical/2012-March/082376.html

which I have on my todo list. Is there anything else you need from the
kernel?)

--b.
Christian Ambach
2012-04-12 16:45:11 UTC
Post by J. Bruce Fields
That said, ideally we'd have kernel support for level 2 oplocks and then
people wouldn't have to make this consistency/performance choice.
https://lists.samba.org/archive/samba-technical/2012-March/082376.html
which I have on my todo list. Is there anything else you need from the
kernel?)
My assumption is that the main user of leases in the kernel will be the
NFSv4 server to handle delegations, so I tried to look up what NFSv4
would expect from them.
The NFSv4.x RFCs do not have much detail of potential sources for lease
Post by J. Bruce Fields
o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid)
o SETATTR issued by another client
o REMOVE request for the file
o RENAME request for the file as either source or target of the
RENAME
I think those pretty much correlate with what would be reasons for
revocation on Windows. There might be subtle differences for SETATTR as
Windows will not revoke them on every update of file information.

But Windows revokes them on some more occasions, e.g. if byte-range
locks are requested on a file.

A full (and very long) description can be found on MSDN:
http://msdn.microsoft.com/en-us/library/ff469343%28v=prot.10%29.aspx
Most important will be the part that starts with "Switch (Operation):"

The downgrade problem will be the most immediate one to solve, but we'll
have to look at other potential differences as well.
Do you have some more information to share about what happens on NFSv4
that goes beyond what I was able to find in the RFCs?

Maybe that would help in creating a table of differences and similarities.

Cheers,
Christian
J. Bruce Fields
2012-04-12 17:32:09 UTC
Post by Christian Ambach
Post by J. Bruce Fields
That said, ideally we'd have kernel support for level 2 oplocks and then
people wouldn't have to make this consistency/performance choice.
https://lists.samba.org/archive/samba-technical/2012-March/082376.html
which I have on my todo list. Is there anything else you need from the
kernel?)
My assumption is that the main user of leases in the kernel will be
the NFSv4 server to handle delegations, so I tried to look up what
NFSv4 would expect from them.
The NFSv4.x RFCs do not have much detail of potential sources for
Post by J. Bruce Fields
o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid)
o SETATTR issued by another client
o REMOVE request for the file
o RENAME request for the file as either source or target of the
RENAME
Also, LINK. (If you add a new hardlink pointing to the file, that
breaks any read delegation on it.)

Basically, opens plus anything that would change the data, metadata, or
set of names pointing to a file should break a read delegation on that
file.
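The read-delegation rule stated here can be written as a small predicate. This is only a sketch: the `Op` record and its flags are hypothetical labels, the operation is assumed to come from a client other than the delegation holder, and only write opens count as conflicting opens (read opens can coexist with a read delegation, per the summary later in this message thread).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """An operation issued by a client *other than* the delegation holder."""
    name: str
    write_open: bool = False
    changes_data: bool = False
    changes_metadata: bool = False
    changes_names: bool = False


def breaks_read_delegation(op: Op) -> bool:
    # Write opens, plus anything that changes the file's data, metadata,
    # or set of names pointing at it, break a read delegation.
    return op.write_open or op.changes_data or op.changes_metadata or op.changes_names


EXAMPLES = [
    Op("OPEN (read)"),
    Op("OPEN (write)", write_open=True),
    Op("SETATTR", changes_metadata=True),
    Op("REMOVE", changes_names=True),
    Op("RENAME", changes_names=True),
    Op("LINK", changes_names=True),
]

for op in EXAMPLES:
    print(f"{op.name:12} breaks read delegation: {breaks_read_delegation(op)}")
```

Classifying by effect (data, metadata, names) rather than by opcode is what makes LINK fall out of the rule without being listed explicitly.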
Post by Christian Ambach
I think those pretty much correlate with what would be reasons for
revocation on Windows.
I asked this same question before and got more or less the opposite
answer:

http://www.spinics.net/lists/linux-nfs/msg24336.html

Based on that I've implemented a separate lock type for NFSv4
delegations which I'm using in kernel, and I'm leaving the lease
behavior alone for now.

But I now think that's the right thing to do anyway. We can always find
a way to expose the new lock type to userspace if Samba would rather use
that instead of leases. And that way Samba will know whether it's
getting new semantics or old without having to guess based on running
tests or checking the kernel version.
Post by Christian Ambach
There might be subtle differences for SETATTR
as Windows will not revoke them on every update of file information.
I assume a change to the file size would probably still break the
oplock. But modifying any other file metadata (permissions, times, ...)
wouldn't?
Post by Christian Ambach
But Windows revokes them on some more occasions, e.g. if byte-range
locks are requested on a file.
You wouldn't be able to get a write (exclusive) lock on a read-delegated
file. We don't bother stating that, though, just because you wouldn't
be able to get a write lock without getting a write open first, and that
would have already broken the delegation.

That results in a delegation recall for NFSv4 as well. But we don't
need to say that, because you can't get a conflicting byte range lock
without first getting a conflicting open.
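A local analogy for the point that a conflicting lock implies a conflicting open: POSIX record locks (`fcntl` with F_SETLK) refuse an exclusive lock on a descriptor that was not opened for writing and fail with EBADF. The sketch below uses Python's `fcntl.lockf` on a temporary file; it demonstrates local POSIX locking semantics on a Unix system, not NFSv4 LOCK operations, and `try_lock()`/`demo()` are hypothetical helpers.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd: int, kind: int) -> int:
    """Try a non-blocking whole-file POSIX record lock; return 0 or the errno."""
    try:
        fcntl.lockf(fd, kind | fcntl.LOCK_NB)
    except OSError as e:
        return e.errno
    fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
    return 0


def demo() -> tuple[int, int]:
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    fd = os.open(path, os.O_RDONLY)  # a *read-only* open
    try:
        shared = try_lock(fd, fcntl.LOCK_SH)     # read lock: allowed
        exclusive = try_lock(fd, fcntl.LOCK_EX)  # write lock: needs a write open
        return shared, exclusive
    finally:
        os.close(fd)
        os.unlink(path)


shared, exclusive = demo()
print("shared lock:", "ok" if shared == 0 else errno.errorcode[shared])
print("exclusive lock:", "ok" if exclusive == 0 else errno.errorcode[exclusive])
```

Because the write lock is only reachable through a write open, checking for conflicts at open time already covers the byte-range lock case.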
Post by Christian Ambach
http://msdn.microsoft.com/en-us/library/ff469343%28v=prot.10%29.aspx
Most important will be the part that starts with "Switch (Operation):"
Thanks! I'll try to read through it, but more likely I'll need somebody
else to digest it for me....
Post by Christian Ambach
The downgrade problem will be the most immediate one to solve, but
we'll have to look at other potential differences as well.
Do you have some more information to share about what happens on
NFSv4 that goes beyond what I was able to find in the RFCs?
RFC 3530 is the right place to start. Actually, there's a
nearly-completed update that may have some clarifications:

http://datatracker.ietf.org/doc/draft-ietf-nfsv4-rfc3530bis/

Summarizing:

- Read delegations conflict with other clients' write opens (and
therefore also write locks), or any operation by another
client that would modify the file's data, metadata, or set of
names pointing at the file.
- Write delegations conflict with other clients' read and write
opens (of either type; and therefore also byte-range locks of
either type), or any operation (like getattr) by another
client that would *read* the file's data or metadata. But the
server can choose not to break a write lease on GETATTR and
instead request updated attributes from the client holding
the write lease (using a callback, CB_GETATTR).
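The write-delegation half of the summary, including the CB_GETATTR alternative, might be sketched as below. The operation labels, the `Response` enum and `on_write_delegation_conflict()` are hypothetical; the conflict set covers only the operations named in the summary and is not exhaustive, so anything else returns None ("not modelled") rather than a claim that it does not conflict.

```python
from enum import Enum
from typing import Optional


class Response(Enum):
    RECALL = "recall the delegation"
    CB_GETATTR = "fetch attributes from the holder via CB_GETATTR"


# Operations by another client that conflict with a write delegation,
# as listed in the summary (hypothetical labels, not exhaustive).
WRITE_DELEG_CONFLICTS = {"OPEN (read)", "OPEN (write)", "LOCK", "READ", "GETATTR"}


def on_write_delegation_conflict(op: str, use_cb_getattr: bool) -> Optional[Response]:
    """Return how the server reacts, or None if the op is not modelled here."""
    if op not in WRITE_DELEG_CONFLICTS:
        return None
    if op == "GETATTR" and use_cb_getattr:
        # Server may keep the delegation and ask the holder for current attrs.
        return Response.CB_GETATTR
    return Response.RECALL


print(on_write_delegation_conflict("GETATTR", use_cb_getattr=True))      # Response.CB_GETATTR
print(on_write_delegation_conflict("GETATTR", use_cb_getattr=False))     # Response.RECALL
print(on_write_delegation_conflict("OPEN (read)", use_cb_getattr=True))  # Response.RECALL
```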

--b.
J. Bruce Fields
2012-04-12 19:18:38 UTC
Post by J. Bruce Fields
Post by Christian Ambach
But Windows revokes them on some more occasions, e.g. if byte-range
locks are requested on a file.
You wouldn't be able to get a write (exclusive) lock on a read-delegated
file. We don't bother stating that, though, just because you wouldn't
be able to get a write lock without getting a write open first, and that
would have already broken the delegation.
That results in a delegation recall for NFSv4 as well. But we don't
need to say that, because you can't get a conflicting byte range lock
without first getting a conflicting open.
(Looks like sometimes I rewrite a paragraph and then leave the original
version in there. Hey, read whichever one you like best!)

--b.

Christian Ambach
2012-04-12 16:44:35 UTC
Post by Jeremy Allison
Post by Christian Ambach
I have worked on a patchset that converts the parameter into a share
option that will allow for more fine-grained configuration.
Please have a look at it.
It makes the raw.oplocks test pass when using kernel oplocks = no
for just the share to be tested.
Looks very good to me. Do you want to include it in a 3.6.x release?
Thanks for the review and push.

I'll backport it and file a bug against 3.6.x.
Post by Jeremy Allison
Post by Christian Ambach
Additionally, I would like to question the current default value of
kernel oplocks: we shouldn't cut off our users from the performance
benefits of level II oplocks on one of our major platforms by
default.
It probably is. The default was set to allow for out-of-the-box
safety for Linux servers exporting the same files by both NFS
and CIFS, but that's probably less used than I thought at the
time, and probably can be expertly set by OEMs who know exactly
what they're doing.
+1 from me (for this patch, and also flipping the default).
I don't think we should change the default as part of a bugfix.
That would be better done as part of the next major release.

I'll push another patch that changes the default in master.

Cheers,
Christian