Locking, notify collisions using CTDB on non-clustered share?

Discussion:

Christopher R. Hertel via samba-technical

2018-04-13 04:53:30 UTC

I did some digging in Bugzilla but came up empty. I'm wondering if this is
a known issue:

- I have a clustered file system.
- I have Samba running on <n> nodes of the cluster.
- I have one share on each of the <n> nodes that points to an EXT4
filesystem.

I'm running the same smbtorture tests against the various EXT4 shares, and
getting errors like:
- NT_STATUS_OBJECT_NAME_COLLISION
- NT_STATUS_SHARING_VIOLATION
- Others that represent different forms of access collisions.

CTDB is handling locking and other aspects of shared access, so the collisions
are probably occurring at the database level rather than the filesystem
level. Further precautions taken:
- The share names differ from node to node; no duplicates.
- The underlying pathnames also differ. They're in the form:
/path/to/nodename/share

Of course, when clustering is disabled (clustering=no) the errors are no
longer produced. Again, an indication that the collisions are occurring at the
DB layer.

Known issue?

Chris -)-----

Volker Lendecke via samba-technical

2018-04-13 10:02:45 UTC

Hi, Chris!


Hmm. I think your setup needs some explanation. Why do you have ctdb
on top of separate ext4's? ctdb was initially designed to take care of
smb level locking on top of a cluster file system.

What I could imagine is that you have collisions in the inode space on
the different "nodes", and this might lead to problems.
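To make the inode-space collision concrete, here is a minimal Python sketch. It assumes a single cluster-wide lock table (standing in for locking.tdb) keyed only by device and inode number. The class, method names, status strings and numbers are hypothetical, and this is not Samba's actual record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKey:
    """Hypothetical lock-table key: device number plus inode number only."""
    dev: int
    inode: int


class SharedLockTable:
    """Toy stand-in for one cluster-wide locking database (not Samba's layout)."""

    def __init__(self):
        self._holders = {}  # FileKey -> (node, path) of the current exclusive opener

    def open_exclusive(self, node, path, dev, inode):
        key = FileKey(dev, inode)
        holder = self._holders.get(key)
        if holder is not None and holder != (node, path):
            # Another opener already holds this key, even if it is a different file.
            return "NT_STATUS_SHARING_VIOLATION"
        self._holders[key] = (node, path)
        return "NT_STATUS_OK"


if __name__ == "__main__":
    table = SharedLockTable()
    # Two unrelated files on separate EXT4 filesystems on different nodes
    # that happen to report the same device and inode numbers.
    print(table.open_exclusive("Foo-1", "/mnt/ext4/ext4-1/a.txt", dev=2049, inode=131074))
    print(table.open_exclusive("Foo-2", "/mnt/ext4/ext4-2/b.txt", dev=2049, inode=131074))
    # Prints NT_STATUS_OK, then NT_STATUS_SHARING_VIOLATION.
```

In this model, giving each node's filesystem a distinct device number is enough to separate the keys.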

Do you come to SambaXP? There we could have a quick chat about that :-)

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:***@sernet.de

Christopher R. Hertel via samba-technical

2018-04-13 16:48:28 UTC

Post by Volker Lendecke via samba-technical
Hi, Chris!

Hi, Volker!

Answers to your questions, and more details, below. Thanks!

Post by Volker Lendecke via samba-technical

Hmm. I think your setup needs some explanation. Why do you have ctdb
on top of separate ext4's? ctdb was initially designed to take care of
smb level locking on top of a cluster file system.

In this scenario, we have shares mapped to the clustered file system, but we
also have shares mapped to local file systems.

The 'clustering = ' parameter is a global. I don't know of a way to
indicate to Samba that some shares should be clustered while other shares
are independent.

I see this as a real-world scenario; some shares are high availability, but
others are not, and the non-HA shares are scattered across the multiple
Samba nodes. DFS may even be involved just to make it fun.

In my particular case, I am using this configuration to test the underlying
cluster file system. I run the same smbtorture sub-test against both an
EXT4 share and against a clustered FS share. Then I compare the output
looking for meaningful differences.

This testing all worked fine when I was targeting a single node in the
cluster. To speed things up, each node in the cluster exposed an EXT4 share
and the tests were run in parallel across the cluster. That's when we
started getting various random errors that all indicated collisions of some
sort, but these errors only occurred over the EXT4 shares.

We were careful to give the shares different names:
| Node | Sharename |
| Foo-1 | [EXT4-1] |
| Foo-2 | [EXT4-2] |
| Foo-3 | [EXT4-3] |

...but the errors occurred anyway.

Post by Volker Lendecke via samba-technical
What I could imagine is that you have collisions in the inode space on
the different "nodes", and this might lead to problems.

Yes, I am assuming so.

We also tried changing the path mapped to each share, like so:
| Node | Sharename | Local Path |
| Foo-1 | [EXT4-1] | /mnt/ext4/ext4-1 |
| Foo-2 | [EXT4-2] | /mnt/ext4/ext4-2 |
| Foo-3 | [EXT4-3] | /mnt/ext4/ext4-3 |

My thought was that, perhaps, the full pathname was being used as a key
somewhere. In fact, yes. When we made this change one set of errors went
away completely. The smb2.notify.dir tests now pass without generating
collision errors (which they had done previously). So that tells me a lot.

Unfortunately, other tests still fail over EXT4. Collisions still show up.

You are probably exactly right about the inode issue. I know that, in our
test rig, the (virtual) drives with EXT4 all have the same device number.

I am planning on adding more drives so that the shares that I expose can
have different device numbers. Another test would be to put all of the EXT4
shares onto just one machine, under different subdirectories, so that the
inode numbers wouldn't have a chance to collide.
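A quick way to confirm whether the EXT4 shares on different nodes really report the same device number is to print the identifiers for each share path on each node and compare. The sketch below uses only the Python standard library (os.stat and os.statvfs; f_fsid needs Python 3.7 or later on a Unix system). As far as we understand vfs_fileid(8), its fsid algorithm derives the device id from the filesystem id returned by statfs(); treat that as an assumption and check the man page for your Samba version. The script itself is illustrative and not part of Samba.

```python
import os

# Hypothetical: replace with this node's share path(s), e.g. /mnt/ext4/ext4-1.
SHARE_PATHS = ["."]


def file_identity(path):
    """Return (st_dev, st_ino, f_fsid) for path.

    st_dev and st_ino are what a device/inode keyed lookup would see;
    f_fsid is the filesystem id reported by statvfs().
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino, os.statvfs(path).f_fsid


if __name__ == "__main__":
    for p in SHARE_PATHS:
        dev, ino, fsid = file_identity(p)
        print(f"{p}: st_dev={dev} ({os.major(dev)}:{os.minor(dev)}) "
              f"st_ino={ino} f_fsid={fsid:#x}")
```

If st_dev (and the major:minor pair) is identical across nodes, a key built from device and inode alone cannot tell the shares apart.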

Post by Volker Lendecke via samba-technical
Do you come to SambaXP? There we could have a quick chat about that :-)

I plan on being there.

My thought on this problem is that we should be able to support this kind of
mixed-mode cluster without generating collisions for non-clustered shares.
I imagine, however, that adding parameters or changing the keys to identify
non-clustered shares could be a big lift.

Chris -)-----

Ralph Böhme via samba-technical

2018-04-13 18:15:27 UTC

Hi Chris,

Post by Christopher R. Hertel via samba-technical
You are probably exactly right about the inode issue. I know that, in our
test rig, the (virtual) drives with EXT4 all have the same device number.

man vfs_fileid

I guess fileid:hostname should do the trick.

-slow

--
Ralph Boehme, Samba Team https://samba.org/
Samba Developer, SerNet GmbH https://sernet.de/en/samba/
GPG Key Fingerprint: FAE2 C608 8A24 2520 51C5
59E4 AA1E 9B71 2639 9E46

Christopher R. Hertel via samba-technical

2018-04-13 19:01:12 UTC

Sounds like a plan.

If that fixes it... I'll still want to talk about how this all works
internally in June at the conference. Thanks for the clue!

Chris -)-----


Volker Lendecke via samba-technical

2018-04-14 06:31:22 UTC

Christopher R. Hertel via samba-technical

2018-04-14 15:57:26 UTC

Post by Volker Lendecke via samba-technical

Post by Christopher R. Hertel via samba-technical
Sounds like a plan.
If that fixes it... I'll still want to talk about how this all works
internally in June at the conference. Thanks for the clue!

We only have one locking.tdb, indexed by node/device/inode. We have to
make sure that if you share the same file space via different shares
you don't mess up locking. Ralph's proposal fakes up the device on a
per-node basis. This is not 100% bulletproof, as it's based on a hash
into a uint64, but it might help you. The good thing is that this is
per share.
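The hostname approach described above can be illustrated with a small sketch: replace the real device number with a 64-bit hash of the node's hostname, so identical inode numbers on different nodes produce different keys. The hash used here (64-bit FNV-1a) and the function names are assumptions chosen for illustration; this is not the code in vfs_fileid. Because any hash into a uint64 can collide, two hostnames could still map to the same value, which is why the approach is not 100% bulletproof.

```python
import socket

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash, used here only as an example hash."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def hostname_devid(hostname: str) -> int:
    """Fake a per-node device id by hashing the hostname into a uint64."""
    return fnv1a_64(hostname.encode("utf-8"))


def lock_key(hostname: str, inode: int) -> tuple:
    # The real st_dev is ignored; the node's hostname hash takes its place.
    return (hostname_devid(hostname), inode)


if __name__ == "__main__":
    print(f"this node's fake device id: {hostname_devid(socket.gethostname()):#018x}")
    # The same inode number on two nodes now maps to two different keys.
    print(lock_key("Foo-1", 131074) == lock_key("Foo-2", 131074))  # False
```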

Understood and agreed.

Also, one limit to the hostname algorithm is that it only works on a single
non-clustered share. If there are two non-clustered shares, they would wind
up with the same device id.

One thought I had was to add an algorithm to vfs_fileid that would allow a
fixed device number to be assigned to a share, or combine a fixed part with
the hostname hash. Another option I am considering would be to use
gethostid(3) instead of hostname, though there are some cross-platform
issues to consider there.
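The single-share limitation and the "fixed part plus hostname hash" idea can be compared side by side. In this sketch the per-share fixed number (share_fixed_id) is a hypothetical setting, not an existing vfs_fileid parameter, the share names are made up, and FNV-1a again stands in for whatever hash a real implementation would use.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash (illustrative choice)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def hostname_only_devid(hostname: str) -> int:
    # Same value for every non-clustered share on the node.
    return fnv1a_64(hostname.encode("utf-8"))


def combined_devid(hostname: str, share_fixed_id: int) -> int:
    # Hypothetical: hash the hostname together with a fixed per-share number
    # (0 <= share_fixed_id < 2**64) so shares on one node get distinct ids.
    return fnv1a_64(hostname.encode("utf-8") + share_fixed_id.to_bytes(8, "little"))


if __name__ == "__main__":
    # Inode 12 exists on two separate filesystems exported by node Foo-1
    # as hypothetical shares EXT4-1a (fixed id 1) and EXT4-1b (fixed id 2).
    a = (hostname_only_devid("Foo-1"), 12)
    b = (hostname_only_devid("Foo-1"), 12)
    print(a == b)  # True: two unrelated files share one key
    a = (combined_devid("Foo-1", 1), 12)
    b = (combined_devid("Foo-1", 2), 12)
    print(a == b)  # False: the fixed part separates them
```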

I'm doing some additional testing today. The node/device/inode tuple isn't
the only key used. In some of my earlier tests, I found that if the full
path to an object was the same across two machines I would see errors.
Those errors would magically disappear if I simply changed the name of the
directory to which the share was pointing.

I'm going to see if I can reproduce those errors even with vfs_fileid loaded.

Full disclosure: I'm testing on Samba 4.6, which does not include the
hostname algorithm in vfs_fileid. I'm using the fsid algorithm instead, but
it does seem to be working and it does support multiple non-clustered
shares, unlike the hostname algorithm.

Oh... and we're having a marvelous last-blast of winter snowstorm, so I'm
stuck inside anyway. Might as well enjoy some computer time. :-)

Chris -)-----

Christopher R. Hertel via samba-technical

2018-04-15 01:10:48 UTC

So...

The vfs_fileid module helps. I'm using the fsid algorithm and that has
resolved several of the errors that were being generated.

...but not all of them. I have found that I also need to change the share
path so that the non-clustered shares each have different paths.

For example:

Node | ShareName | Path
1 | EXT4-1 | /mnt/ext4/sambaShare
2 | EXT4-2 | /mnt/ext4/sambaShare

With that setup, when I run the smb2.notify torture test, I get errors like
this:
ERROR: nchanges=1 action=2 expectedAction=3 filter=0x00000020
and this:
(../source4/torture/smb2/notify.c:437) wrong value for notify.smb2.out.num_changes 0x14 should be 0x9

These are generated when I run smbtorture against both shares _at the same
time_. There is a certain randomness involved, of course, but these or
similar errors are generated quite reliably with this configuration.

All I have to do is change the paths to:

Node | ShareName | Path
1 | Node-1 | /mnt/ext4/sambaShare-1
2 | Node-2 | /mnt/ext4/sambaShare-2

(...making sure, of course, that those directories exist). Now the errors
magically disappear. (Well, there are some "change_time not setup" errors, but
these also occur on a non-clustered, single-instance server, so I assume that
they are "normal".)

So it seems that the lookup key for Change Notify events is the full local
pathname, not dev/inode.
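The observation that Change Notify is keyed by the full local pathname can be illustrated with a toy model: if a cluster-wide registry is keyed only by the path string, an event on one node is also delivered to watchers on another node that happens to use the same path, which would show up as extra or unexpected change counts. The NotifyRegistry class is hypothetical and far simpler than Samba's notify implementation.

```python
from collections import defaultdict


class NotifyRegistry:
    """Hypothetical cluster-wide change-notify registry keyed only by local path."""

    def __init__(self):
        self._watchers = defaultdict(list)  # path -> list of (node, event queue)

    def watch(self, node: str, path: str) -> list:
        queue = []
        self._watchers[path].append((node, queue))
        return queue

    def fire(self, node: str, path: str, action: str) -> None:
        # The node is not part of the key, so every watcher registered under
        # this path string receives the event, whichever node it lives on.
        for _watcher_node, queue in self._watchers[path]:
            queue.append((node, action))


if __name__ == "__main__":
    shared = NotifyRegistry()
    q1 = shared.watch("node1", "/mnt/ext4/sambaShare")
    q2 = shared.watch("node2", "/mnt/ext4/sambaShare")
    shared.fire("node1", "/mnt/ext4/sambaShare", "ADDED")
    print(len(q1), len(q2))  # 1 1: node2 sees a change made on node1

    unique = NotifyRegistry()
    q1 = unique.watch("node1", "/mnt/ext4/sambaShare-1")
    q2 = unique.watch("node2", "/mnt/ext4/sambaShare-2")
    unique.fire("node1", "/mnt/ext4/sambaShare-1", "ADDED")
    print(len(q1), len(q2))  # 1 0
```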

Some more example errors seen when the share paths are the same (3-node
cluster):
[172.31.47.126] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.47.126] (../source4/torture/smb2/notify.c:324) wrong value for notify.smb2.out.num_changes 0xa should be 0x4
[172.31.47.126] (../source4/torture/smb2/notify.c:636) wrong value for notify.smb2.out.num_changes 0xb should be 0x9
[172.31.47.194] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.47.194] (../source4/torture/smb2/notify.c:394) wrong value for notify.smb2.out.num_changes 0xb should be 0xa
[172.31.47.194] (../source4/torture/smb2/notify.c:636) wrong value for notify.smb2.out.num_changes 0xe should be 0x9
[172.31.39.200] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.39.200] (../source4/torture/smb2/notify.c:324) wrong value for notify.smb2.out.num_changes 0xa should be 0x4
[172.31.39.200] (../source4/torture/smb2/notify.c:634) Incorrect status STATUS_NOTIFY_ENUM_DIR - should be NT_STATUS_OK

Chris -)-----

Christopher R. Hertel via samba-technical

2018-04-15 04:20:47 UTC

I double checked the errors I was getting just to be sure. Some of the ones
I reported are bogus (as in, they occur on stand-alone systems as well as
matching non-clustered shares within a cluster and so are not of interest).
In the list below, there are two that can be ignored:

Ignore: (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK

Ignore: (../source4/torture/smb2/notify.c:636) wrong value for notify.smb2.out.num_changes 0xb should be 0x9

The above two are "normal". The others in the list below are not "normal", in
that they do not appear when the share paths are unique.

Post by Christopher R. Hertel via samba-technical
Some more example errors seen when the share paths are the same (3-node cluster):
[172.31.47.126] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.47.126] (../source4/torture/smb2/notify.c:324) wrong value for notify.smb2.out.num_changes 0xa should be 0x4
[172.31.47.126] (../source4/torture/smb2/notify.c:636) wrong value for notify.smb2.out.num_changes 0xb should be 0x9
[172.31.47.194] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.47.194] (../source4/torture/smb2/notify.c:394) wrong value for notify.smb2.out.num_changes 0xb should be 0xa
[172.31.47.194] (../source4/torture/smb2/notify.c:636) wrong value for notify.smb2.out.num_changes 0xe should be 0x9
[172.31.39.200] (../source4/torture/smb2/notify.c:118) Incorrect status NT_STATUS_INVALID_PARAMETER - should be NT_STATUS_OK
[172.31.39.200] (../source4/torture/smb2/notify.c:324) wrong value for notify.smb2.out.num_changes 0xa should be 0x4
[172.31.39.200] (../source4/torture/smb2/notify.c:634) Incorrect status STATUS_NOTIFY_ENUM_DIR - should be NT_STATUS_OK

Chris -)-----

Jeremy Allison via samba-technical

2018-04-23 20:13:20 UTC

Post by Christopher R. Hertel via samba-technical
So it seems that the lookup key for Change Notify events is the full local
pathname, not dev/inode.

Yes, that's true. ChangeNotify is pathname based, not inode based.

Christopher R. Hertel via samba-technical

2018-04-23 23:47:10 UTC

Below...

Post by Jeremy Allison via samba-technical

Yes, that's true. ChangeNotify is pathname based, not inode based.

Jeremy: Thanks for the confirmation. Quite helpful.

Changing the paths *and* using fileid:fsid in combination has cleared up
most of the errors we were seeing. I have one new SHARING_VIOLATION error
I'm looking into. Again, it's on the non-clustered EXT4 share, so it doesn't
interfere with the cluster we are running, but it does mess with our
testing. I'm digging into it and will let folks know if there are any other
changes that are needed in order to run a mixed-mode cluster.
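Since the working setup needs both fixes, unique device ids (via vfs_fileid) and unique share paths, a small pre-flight check over a hand-written inventory of the non-clustered shares could flag violations of either rule. The ShareInfo fields and the idea of supplying the expected device id by hand are assumptions for illustration; nothing here reads smb.conf or queries Samba. It only compares shares across nodes, so it will not catch two filesystems on the same node that end up with one device id.

```python
from collections import defaultdict
from typing import NamedTuple


class ShareInfo(NamedTuple):
    node: str
    share: str
    path: str
    devid: int  # device id the share is expected to present after vfs_fileid


def find_conflicts(shares):
    """Flag non-clustered shares on different nodes that reuse a path or device id."""
    problems = []
    for field in ("path", "devid"):
        groups = defaultdict(list)
        for s in shares:
            groups[getattr(s, field)].append(s)
        for value, group in groups.items():
            # Only report values that appear on more than one node.
            if len({s.node for s in group}) > 1:
                problems.append((field, value, sorted(f"{s.node}:{s.share}" for s in group)))
    return problems


if __name__ == "__main__":
    inventory = [
        ShareInfo("Foo-1", "EXT4-1", "/mnt/ext4/sambaShare", 0x1001),
        ShareInfo("Foo-2", "EXT4-2", "/mnt/ext4/sambaShare", 0x2002),
    ]
    for problem in find_conflicts(inventory):
        print(problem)  # ('path', '/mnt/ext4/sambaShare', ['Foo-1:EXT4-1', 'Foo-2:EXT4-2'])
```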

Thanks!

Chris -)-----