Discussion:
[Performance] Samba 3 vs. Samba 4 performance in NetBench
Kaplan, Marc
2005-06-09 05:42:35 UTC
List:

I decided it would be fun to compare performance of Samba 3 and Samba 4
using the "industry standard" NetBench tool. I hope to update this
periodically during the development of Samba 3 and Samba 4. Here are
details of the setup:

Clients
* First 8 (higher-performance clients): Gigabit, P3-1GHz, 256MB RAM
* Last 52 (standard): 100Mbit, P3-600MHz, 128MB RAM

Note: I used this configuration to maximize performance and also to
simulate both higher end and lower end clients in a network.

Server
* Pentium 4, 2.4GHz, 512MB RAM.
* RAID 0 on three WD 250GB ATA drives
* Broadcom Gigabit Network card
* SUSE Enterprise Linux 8, running a hand-built 2.6.11.11 kernel

Samba configuration
* Samba 3 and Samba 4 smb.confs attached to this e-mail.
* Samba 3 SVN checkout from 6-4-05 @ 1AM GMT
* Samba 4 SVN checkout from 6-4-05 @ 9AM GMT

Note1: For the Samba 3 oplocks on and off tests I set the proper
smb.conf options.
Note2: For the xattrs test I made sure /Raid was mounted with
user_xattr, and verified that attrs were getting set properly.
Note3: For the eadb test I had posix:eadb = /eadbpath.tdb set in
smb.conf and verified that attrs were getting set properly.

The results are attached in an Excel document (sorry, that's what
NetBench spits out), which you should be able to open in
OpenOffice.

Attachments:
smb.conf.samba3: The smb.conf I used for Samba 3
smb.conf.samba4: The smb.conf I used for Samba 4 (eadb was the last test
I ran, so it's still set)
Samba4vsSamba3.xls: Comparison of Samba 3 and Samba 4 performance in
NetBench.

-Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Samba4vsSamba3.xls
Type: application/vnd.ms-excel
Size: 27648 bytes
Desc: Samba4vsSamba3.xls
Url : http://lists.samba.org/archive/samba-technical/attachments/20050608/cbe2aa38/Samba4vsSamba3.xls
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smb.conf.samba4
Type: application/octet-stream
Size: 164 bytes
Desc: smb.conf.samba4
Url : http://lists.samba.org/archive/samba-technical/attachments/20050608/cbe2aa38/smb.conf.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smb.conf.samba3
Type: application/octet-stream
Size: 562 bytes
Desc: smb.conf.samba3
Url : http://lists.samba.org/archive/samba-technical/attachments/20050608/cbe2aa38/smb.conf-0001.obj
Kaplan, Marc
2005-06-09 06:11:20 UTC
I should note that the reason I did the oplock on vs. oplock off tests
in Samba 3 is that tridge pointed out to me that Samba 4 doesn't have
oplock support yet. As you can see oplocks provide a huge performance
bump in the Samba 3 tests, and the oplock off test provides a good
baseline to compare Samba 4 against.

-Marc
Jeremy Allison
2005-06-09 07:27:52 UTC
Post by Kaplan, Marc
I should note that the reason I did the oplock on vs. oplock off tests
in Samba 3 is that tridge pointed out to me that Samba 4 doesn't have
oplock support yet. As you can see oplocks provide a huge performance
bump in the Samba 3 tests, and the oplock off test provides a good
baseline to compare Samba 4 against.
Marc, FYI: Please hold off on the aio tests against HEAD until
I mail you. I'm tracking down a "lost wakeup" bug. Fun fun fun :-).

Jeremy.
Andrew Tridgell
2005-06-09 08:58:01 UTC
Marc,
Post by Kaplan, Marc
I should note that the reason I did the oplock on vs. oplock off tests
in Samba 3 is that tridge pointed out to me that Samba 4 doesn't have
oplock support yet. As you can see oplocks provide a huge performance
bump in the Samba 3 tests, and the oplock off test provides a good
baseline to compare Samba 4 against.
yes, it's interesting what a difference the oplocks make, although, as I
suspected when you first showed me the results, they don't explain all
of the difference.

I'd like to get to the bottom of the remaining difference. As I've
mentioned to you, my own tests using BENCH-NBENCH, which is supposed
to be a simulation of NetBench, show Samba4 doing better by about as
much as your results show Samba3 doing better! I'd be keen to see
BENCH-NBENCH results for your setup if you can do them. Unfortunately
I can't run a true NetBench run here as I don't have the necessary
hardware.

I did think of one possible explanation this morning. Samba4 and
Samba3 negotiate different buffer sizes. Could that explain such a big
difference? I'd be surprised if it can, but it might be worth looking
at.

The good news is that you see the difference even with a single
client. That should make it possible to analyse the difference by
comparing a sniff of the two situations (with oplocks off). If you
keep the runtime as short as you can while still having a clear
difference then the sniff should be small enough for you to put up
somewhere for download. That will allow us to see exactly which calls
are faster/slower, and whether there is any difference in what calls
are made, and what parameters are used for those calls.

btw, I think NetBench can produce output showing the latency of each
category of call. Do you have that info? It doesn't seem to be in the
spreadsheet you sent.

Cheers, Tridge
Kaplan, Marc
2005-06-10 04:26:27 UTC
Tridge,
Post by Andrew Tridgell
I'd like to get to the bottom of the remaining difference. As I've
mentioned to you, my own tests using BENCH-NBENCH, which is supposed
to be a simulation of NetBench, show Samba4 doing better by about as
much as you results show Samba3 doing better! I'd be keen to see
BENCH-NBENCH results for your setup if you can do them. Unfortunately
I can't run a true NetBench run here as I don't have the necessary
hardware.
The BENCH-NBENCH 4 results, run over //127.0.0.1/share1, showed
Samba 3 at about 26MB/s and Samba 4 at about 23MB/s. I
need to do some more work here... I'll provide updates when I have them.
Post by Andrew Tridgell
The good news is that you see the difference even with a single
client. That should make it possible to analyse the difference by
comparing a sniff of the two situations (with oplocks off). If you
keep the runtime as short as you can while still having a clear
difference then the sniff should be small enough for you to put up
somewhere for download. That will allow us to see exactly which calls
are faster/slower, and whether there is any difference in what calls
are made, and what parameters are used for those calls.
Ok, I can get this if you want it. These will be very large files, so
maybe I'll put them somewhere for you on samba.org when I get them.
Post by Andrew Tridgell
btw, I think NetBench can produce output showing the latency of each
category of call. Do you have that info? It doesn't seem to be in the
spreadsheet you sent.
I thought you might ask for this but I didn't include the full results
because it would make the file rather large. I attached a new version
that has only the necessary pieces of this data (file is only 100K now
though). Please refer to the original spreadsheet for the graph and the
comparison table, as they are not included in this version.

Opens seem faster on average in Samba 4, but both reads and writes are a
good deal slower.

-Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Samba4vsSamba3-detail2.xls
Type: application/vnd.ms-excel
Size: 93696 bytes
Desc: Samba4vsSamba3-detail2.xls
Url : http://lists.samba.org/archive/samba-technical/attachments/20050609/b28a5ec0/Samba4vsSamba3-detail2.xls
Andrew Tridgell
2005-06-10 08:09:33 UTC
Marc,
Post by Kaplan, Marc
The BENCH-NBENCH 4 running over //127.0.0.1/share1 results showed that
samba 3 came in at about 26MB/s and samba 4 came in at about 23MB/s. I
need to do some more work here... I'll provide updates when I have them.
ok, good, being able to reproduce this with BENCH-NBENCH should make
it easier to track down. We'll just need to run through the
differences between our two setups and eliminate them one by one until
we get to the critical difference.

I'll do a series of runs here today, and send you the results this
evening. I'll also send you the scripts I'm using and the kernel
version etc.

The ideal result is that we work out why Samba4 is slow in your tests,
and why Samba3 is slow in mine, and improve both!
Post by Kaplan, Marc
Ok, I can get this if you want it. These will be very large files, so
maybe I'll put them somewhere for you on samba.org when I get them.
that sounds good, thanks. Maybe compress them with rzip? It should
handle them quite well.
Post by Kaplan, Marc
Opens seem faster on average in Samba 4, but both reads and writes are a
good deal slower.
The spreadsheet you sent compares Samba4_xattrs to Samba4_eadb (or at
least that is what the tables are titled). Can you give me the
no-xattr, no-oplock comparison details between Samba3 and Samba4?

Cheers, Tridge
Andrew Tridgell
2005-06-10 09:40:10 UTC
Marc,
Post by Kaplan, Marc
The BENCH-NBENCH 4 running over //127.0.0.1/share1 results showed that
samba 3 came in at about 26MB/s and samba 4 came in at about 23MB/s. I
need to do some more work here... I'll provide updates when I have them.
Attached is a graph of current Samba3 versus Samba4 performance with
BENCH-NBENCH. The details are:

- kernel 2.6.10
- 2.4 GHz dual Xeon with hyperthreading
- 2G ram
- single SCSI disk
- ext2 filesystem
- no xattr support

I chose ext2 as it tends to provide the least variance between runs
(it doesn't have any periodic journal flushing) and thus tends to be
better for looking purely at smbd performance in isolation. I got
similar results with ext3, and I'd be quite surprised if we find the
filesystem matters for Samba3 vs Samba4 results.

I also attach the scripts I used to produce the results, and the
samba3 and samba4 smb.conf files. Notice I recreate the filesystem
between runs to minimise any systematic errors.

Is there any chance you could try to minimise the software differences
between our two setups, to see if we can narrow down what is causing
us to get such different results?

Cheers, Tridge

-------------- next part --------------
A non-text attachment was scrubbed...
Name: samba3-samba4.png
Type: image/png
Size: 4500 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: do_run.sh
Type: application/octet-stream
Size: 467 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20050610/4214c409/do_run.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: extract.sh
Type: application/octet-stream
Size: 157 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20050610/4214c409/extract.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smb.conf.samba3
Type: application/octet-stream
Size: 144 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20050610/4214c409/smb.conf.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smb.conf.samba4
Type: application/octet-stream
Size: 172 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20050610/4214c409/smb.conf-0001.obj
Kaplan, Marc
2005-06-10 21:09:05 UTC
Post by Andrew Tridgell
The spreadsheet you sent compares Samba4_xattrs to Samba4_eadb (or at
least that is what the tables are titled). Can you give me the
no-xattr, no-oplock comparison details between Samba3 and Samba4?
The spreadsheet I sent should have had five tabs in it titled:
Samba3-Oplocks(detail)
Samba3-NoOplocks(detail)
Samba4-noattrs
Samba4-xattrs
Samba4-eadb

Are you not seeing all of these? If not, maybe it's a bug in the
spreadsheet software you're using.

-Marc
John H Terpstra
2005-06-10 21:15:20 UTC
Post by Andrew Tridgell
Post by Andrew Tridgell
The spreadsheet you sent compares Samba4_xattrs to Samba4_eadb (or at
least that is what the tables are titled). Can you give me the
no-xattr, no-oplock comparison details between Samba3 and Samba4?
Samba3-Oplocks(detail)
Samba3-NoOplocks(detail)
Samba4-noattrs
Samba4-xattrs
Samba4-eadb
Are you not seeing all of these? If not, maybe it's a bug in the
spreadsheet software you're using.
-Marc
Marc,

Please could you send me that spreadsheet too. Thanks.

- John T.
--
John H Terpstra
Samba-Team Member
Phone: +1 (650) 580-8668

Author:
The Official Samba-3 HOWTO & Reference Guide, ISBN: 0131453556
Samba-3 by Example, ISBN: 0131472216
Hardening Linux, ISBN: 0072254971
Other books in production.
Andrew Tridgell
2005-06-12 14:14:33 UTC
Marc,
Post by Kaplan, Marc
Are you not seeing all of these? If not, maybe it's a bug in the
spreadsheet software you're using.
nope, it's a bug in the user who doesn't know enough about spreadsheets :-)

I've had a look at the detailed numbers now, and still can't
understand what's happening. One puzzling thing is that the "engines
participating" looks wrong in a lot of cases. For example, for the
dm_60_clients load, only 55 clients participated with Samba4, whereas
Samba3 had 58. Why weren't both 60? Did you use exactly the same load
file for both runs? If you did, then I wonder why NetBench didn't show
an error for one or both of them.

Have you had a chance to try BENCH-NBENCH with the scripts I sent you?

Any chance to get a sniff of the 1 client cases with oplocks and attrs
off?

Cheers, Tridge
Kaplan, Marc
2005-06-16 06:01:14 UTC
Tridge,

The results haven't changed much in the ext2 test with Samba 4 rebuilt
from a straight ./configure (there were no special options configured).

Server attributes: Pentium 4 2.4GHz, 512MB RAM, 3-drive software RAID-0,
Broadcom GigE NIC, SUSE EL8, hand-built 2.6.11.11 kernel, ext2 file
system.

Samba build: Samba3-SVN-7569(oplocks off), Samba4-SVN-7568(posix:xattrs
= no)

NETBENCH (throughput, Mbit/s)
Clients  Samba3   Samba4
1         51.150   42.285
4        203.449  171.702
8        204.407  170.333
12       194.383  159.905
16       189.470  154.434
20       185.257  150.425
24       185.442  149.986
28       184.451  149.125
32       182.569  150.044

This is basically the same pattern as before.
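To put a number on "the same pattern", here is a small sketch that divides each Samba4 figure by the matching Samba3 figure. The values are copied from the table above; the names `NETBENCH` and `relative_throughput` are illustrative only.

```python
# NetBench results from the table above: clients -> (Samba3, Samba4)
NETBENCH = {
    1: (51.150, 42.285),
    4: (203.449, 171.702),
    8: (204.407, 170.333),
    12: (194.383, 159.905),
    16: (189.470, 154.434),
    20: (185.257, 150.425),
    24: (185.442, 149.986),
    28: (184.451, 149.125),
    32: (182.569, 150.044),
}


def relative_throughput(results):
    """Return {clients: Samba4 throughput / Samba3 throughput}."""
    return {n: s4 / s3 for n, (s3, s4) in results.items()}


if __name__ == "__main__":
    for clients, ratio in sorted(relative_throughput(NETBENCH).items()):
        print(f"{clients:>3} clients: Samba4 at {ratio:.1%} of Samba3")
```

Every row lands between roughly 81% and 84%, so Samba4 trails by a fairly constant 16-19% regardless of the client count.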

BENCH-NBENCH (throughput, MB/s)
Clients  Samba3  Samba4
1        31.34   30.41
2        30.87   30.14
3        30.89   30.22
4        30.93   30.13
5        30.85   29.75
6        30.42   29.78
7        27.13   29.79
8        24.96   26.56
9        23.03   24.96
10       23.45   22.88

So still, we're maxed out in terms of CPU and performance at a single
BENCH-NBENCH client. It is worth noting that because I'm doing
this over localhost, the smbtorture process itself is using CPU (it looks
like a 35% smbtorture to 65% smbd split). The difference between Samba3
and Samba4 in the NBENCH test is so small that I would say it's not
significant at all.
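A quick way to see how small the BENCH-NBENCH difference is: compute the percentage difference per row, using the values from the table above (the names below are illustrative only).

```python
# BENCH-NBENCH results from the table above: clients -> (Samba3, Samba4)
BENCH_NBENCH = {
    1: (31.34, 30.41), 2: (30.87, 30.14), 3: (30.89, 30.22),
    4: (30.93, 30.13), 5: (30.85, 29.75), 6: (30.42, 29.78),
    7: (27.13, 29.79), 8: (24.96, 26.56), 9: (23.03, 24.96),
    10: (23.45, 22.88),
}


def percent_difference(results):
    """Return {clients: (Samba4 - Samba3) / Samba3 * 100}.

    Negative values mean Samba4 is slower than Samba3 for that row.
    """
    return {n: (s4 - s3) / s3 * 100 for n, (s3, s4) in results.items()}


if __name__ == "__main__":
    for clients, diff in sorted(percent_difference(BENCH_NBENCH).items()):
        print(f"{clients:>3} clients: {diff:+.1f}%")
```

Up to 6 clients Samba4 trails by about 2-4%; at 7-9 clients it is ahead, and at 10 it trails again. Because the sign of the difference flips, it looks much more like run-to-run noise than the steady gap seen in NetBench.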

The next step is to get you a sniff of a one client NetBench, which I'll
try to do soon. I'd also like to run the test on a faster box --
possibly dual or quad CPU, but I'll have to dig around some to find one.


-Marc
Andrew Tridgell
2005-06-21 09:43:05 UTC
Marc,

Thank you for sending me the sniffs comparing Samba3 and Samba4 in
netbench. They are certainly interesting!

The differences I have noticed so far are:

1) Samba4 negotiates a buffer size of 12288 vs 16644 for Samba3

2) you have Samba3 setup in share level security, Samba4 in user level
(probably not significant)

3) The test files already exist with samba3, not with samba4. Maybe
you started the runs in different ways? This shouldn't matter, but
I'm curious as to how it happened given the way netbench works.

4) samba3 reports 1 sector per unit in QUERY_FS_INFO, Samba3 reports 2

5) the read sizes between the two runs are about the same, but the
write sizes are vastly different!


The last point is probably the key factor. If we look at a histogram
of write sizes for Samba3 and Samba4 (number of writes, percentage of
all writes, write size in bytes) we see:

Samba3:
1378 (3%) 4
1490 (3%) 4096
2239 (4%) 88
2452 (5%) 16384
2506 (5%) 65536
3422 (6%) 2
3490 (7%) 65534
3778 (7%) 1
8190 (15%) 2048
11634 (22%) 512

Samba4:
946 (1%) 4096
1050 (1%) 4
1728 (2%) 88
1858 (2%) 65536
1877 (2%) 16384
2666 (3%) 65534
2735 (3%) 2
6479 (7%) 2048
8867 (10%) 512
49509 (56%) 1

I have omitted tail ends of the histograms.

So this means that for some reason your client is doing a massive
number (56%) of 1 byte write calls with Samba4, but only a moderate
number (7%) for Samba3. The trick will be to work out why!
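A histogram in the same "count (percent) size" layout can be produced from a list of write sizes pulled out of a capture. This is a hypothetical sketch: how the sizes are extracted from the sniff is left out, and `write_size_histogram` and `format_histogram` are illustrative names, not Samba or NetBench tools.

```python
from collections import Counter


def write_size_histogram(sizes, top=10):
    """Return the `top` most common write sizes as (count, percent, size)
    tuples, least frequent first. Percentages are of *all* writes, so they
    do not sum to 100 when the tail is omitted."""
    if not sizes:
        return []
    total = len(sizes)
    rows = [(count, round(100 * count / total), size)
            for size, count in Counter(sizes).most_common(top)]
    return sorted(rows)


def format_histogram(rows):
    """Render rows as lines like '3778 (7%) 1'."""
    return "\n".join(f"{count} ({pct}%) {size}" for count, pct, size in rows)


if __name__ == "__main__":
    # Made-up sample, not taken from the sniffs discussed above.
    sample = [1] * 6 + [512] * 3 + [4096]
    print(format_histogram(write_size_histogram(sample)))
```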

My guess is this is caused by the extreme rounding of the allocation
size in Samba3 QUERY_FILE_INFO calls. Samba3 rounds up to the nearest
1MB, whereas Samba4 rounds to the nearest 512 bytes. In the sniffs I
see the client doing long series of 1 byte writes at gaps of about
2k. It doesn't write the data in between, it just does things like
this:

SMB Write AndX Request, FID: 0x0200, 1 byte at offset 4607
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 8703
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 12799
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 16895
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 20991
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 25087
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 29183
SMB Write AndX Request, FID: 0x0200, 1 byte at offset 33279

going on for hundreds of operations. I'm guessing the client is trying
to force allocation of disk blocks. With Samba3 it skips these, as it
thinks it only has to do one write every 1M to force allocation (in
fact, Samba3 is lying, and the client really does need to write every 4k
or so to force allocation, but it doesn't know that).
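The guess above can be made concrete with a toy model. It assumes, and this is an assumption rather than documented client behaviour, that the client writes the last byte of each 4096-byte chunk between the allocation size the server reported and the size it wants. In the excerpt, consecutive offsets are exactly 4096 bytes apart; starting from a 512-byte reported allocation, the model reproduces those offsets, and starting from a 1MB allocation it issues no probes at all. `probe_offsets` is a hypothetical name, and the 36000-byte target is arbitrary, chosen only so the output matches the eight lines shown.

```python
def probe_offsets(reported_alloc, target_size, stride=4096):
    """Hypothetical model: offsets of the 1-byte writes a client would send
    to force allocation from reported_alloc up to target_size, writing the
    last byte of each `stride`-sized chunk."""
    return list(range(reported_alloc + stride - 1, target_size, stride))


if __name__ == "__main__":
    # Allocation reported rounded to 512 bytes (Samba4 default in this thread)
    print(probe_offsets(512, 36000))
    # Allocation reported rounded up to 1MB (Samba3): nothing left to probe
    print(probe_offsets(0x100000, 36000))
```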

I've now added a tuning parameter in Samba4 for this. You can do:

posix:allocationrounding = 0x100000

and it will round to the nearest 1M (same as Samba3).
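A minimal sketch of what the rounding changes, assuming the reported allocation size is the file size rounded *up* to a multiple of the configured value (the thread says "nearest" in one place and "rounds up" in another; this sketch rounds up). `round_allocation` is a hypothetical helper, not Samba's code.

```python
def round_allocation(size, rounding):
    """Round a file size up to the next multiple of `rounding` bytes
    (assumed semantics of the allocation-size rounding described above)."""
    if size <= 0:
        return 0
    return -(-size // rounding) * rounding  # ceiling division, then scale


if __name__ == "__main__":
    for size in (100, 4608, 0x100001):
        print(size, round_allocation(size, 512), round_allocation(size, 0x100000))
```

With 512-byte rounding a small file reports an allocation close to its real size; with `0x100000` any file under 1MB reports a full 1MB, which is what lets the client skip its probe writes.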

If you are using xattrs in Samba4 then it instead uses the allocation
information as provided by the client, but for the setup you are
testing (not using xattrs for both Samba3 and Samba4) the above change
should make things match better. I'll be interested to hear what
difference it makes. Note you will need svn version 7793 or later to
get the above option.

To test that it's working, try putting a small file on a Samba4 share, and
using the allinfo command in smbclient to see what the allocation size
is being rounded to.

All this is really just a benchmark hack btw, but it will be
interesting to see what effect it has.

Cheers, Tridge
Jeremy Allison
2005-06-21 10:43:00 UTC
Post by Andrew Tridgell
going on for hundreds of operations. I'm guessing the client is trying
to force allocation of disk blocks. For Samba3 is skips these, as it
thinks it only has to do one write every 1M to force allocation (in
fact, Samba3 is lying and it does really need to do a write every 4k
or so to force allocation, but the client doesn't know that).
That's exactly what it's doing. I recently had a reported Samba3 bug
where Word produced corrupted files because these 1 byte probes
didn't return "out of space" when a user was hitting out of quota
(it was a university). The fix was to add a parameter to force
smbd to write out the intervening bytes as zeros before doing the
1 byte write. That then causes the out of space error to be returned
at the correct time so MS-Office doesn't corrupt user files. Samba4
will have to have the same hack to be able to support quota-limited
shares where the admins need to catch this.

The code is found under the "strict allocate" code path in Samba3.
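The idea behind that fix, writing real zeros into the gap before the 1-byte write so that an out-of-space or quota error is reported for the write that actually needs the space, can be sketched on a local file as follows. This illustrates the technique only and is not Samba's "strict allocate" code; `strict_write` and its chunk size are hypothetical.

```python
import os


def strict_write(path, offset, data, chunk=65536):
    """Write `data` at `offset`, first filling any gap between the current
    end of file and `offset` with real zero bytes instead of leaving a hole.
    If the disk or quota is exhausted, the OSError (e.g. ENOSPC or EDQUOT)
    is raised from this call rather than when the hole is filled later."""
    with open(path, "r+b") as f:
        eof = f.seek(0, os.SEEK_END)
        while eof < offset:
            n = min(chunk, offset - eof)
            f.write(b"\0" * n)
            eof += n
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push data to disk so delayed errors surface here


if __name__ == "__main__":
    import tempfile

    fd, path = tempfile.mkstemp()
    os.close(fd)
    strict_write(path, 4607, b"x")  # same offset as the first probe in the trace
    print(os.path.getsize(path))
    os.remove(path)
```

The cost is that every probe now writes up to a few kilobytes of zeros, which is why it is a parameter for shares where quota errors must be caught rather than a default.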

Jeremy.
Andrew Tridgell
2005-06-21 11:25:01 UTC
Jeremy,
Post by Jeremy Allison
That's exactly what it's doing.
yep, I remember you telling me about the word problem. It's an
interesting one!

Marc, can you get me an equivalent sniff of netbench using exactly the
same client but with a w2k3 server? Keep everything else as similar as
you can to the two sniffs you've sent me already. I'd like to find out
if w2k3 is getting this 1 byte write problem, and if not then what it
is that is telling the client to avoid them.

In the sniffs you sent me the client is not making any FS_INFO file
system attribute queries, so it can't be using that to determine if
the server has sparse file support or not, and it's not doing any file
query calls or ioctls that control sparseness. That leaves me stumped
as to how w2k3 could be avoiding this (unless it's not avoiding it!).

I set the rounding size in Samba4 to 512 in order to pass one of the
ifstest tests. I'm reluctant to change the default as a benchmark hack
and would prefer to find out more about what is causing the client
behaviour.

Cheers, Tridge
Jeremy Allison
2005-06-22 01:04:02 UTC
Post by Andrew Tridgell
Jeremy,
Post by Jeremy Allison
That's exactly what it's doing.
yep, I remember you telling me about the word problem. It's an
interesting one!
Marc, can you get me an equivalent sniff of netbench using exactly the
same client but with a w2k3 server? Keep everthing else as similar as
you can to the two sniffs you've sent me already. I'd like to find out
if w2k3 is getting this 1 byte write problem, and if not then what it
is that is telling the client to avoid them.
In the sniffs you sent me the client is not making any FS_INFO file
system attribute queries, so it can't be using that to determine if
the server has sparse file support or not, and its not doing any file
query calls or ioctls that control sparseness. That leaves me stumped
as to how w2k3 could be avoiding this (unless its not avoiding it!).
Well the default on NTFS is non-sparse, and you have to explicitly
set a file sparse to select that, so I'm guessing that netbench
is just doing a good simulation of what Word actually does - ie. it
always does the one-byte allocation write.

Jeremy.
Volker Lendecke
2005-06-22 01:08:46 UTC
Post by Jeremy Allison
Well the default on NTFS is non-sparse, and you have to explicitly
set a file sparse to select that, so I'm guessing that netbench
is just doing a good simulation of what Word actually does - ie. it
always does the one-byte allocation write.
I doubt Word has anything to do with this. I've seen it with cygwin dd as
well, and that almost certainly does sequential writes.

Volker
Andrew Tridgell
2005-06-22 04:50:48 UTC
Jeremy,
Post by Jeremy Allison
Well the default on NTFS is non-sparse, and you have to explicitly
set a file sparse to select that, so I'm guessing that netbench
is just doing a good simulation of what Word actually does - ie. it
always does the one-byte allocation write.
This might be right, but I have reasons to suspect it isn't.

First off, as files on Windows servers are by default non-sparse, the
application would only need to do 1 write at the desired size, not a
write per 2k of file, amounting to hundreds of extra operations. Doing
all those 1 byte writes would only make sense if the application knew
that the server is using sparse files. I don't see any IOCTL calls in
the sniff, so the application is not asking for sparse files, which
means it should be working on the assumption that the files are
non-sparse.

Secondly, the detailed stats in the spreadsheet Marc sent me don't
show a difference in the number of writes between Samba3 and Samba4,
which means at the level that NetBench is measuring the operations
these massive numbers of small writes are not visible. That implies
that the writes are being done at the redirector level, not at the
application level (as we know NetBench measures the operations it asks
for, not the operations actually sent).

The key will be to see if NetBench does these silly 1 byte writes when
talking to a w2k3 server. If it does, then it's just stupid
application/redirector code. If it doesn't then we need to know what
factor the client is using to detect that the writes are not needed.

Cheers, Tridge
Kaplan, Marc
2005-06-23 03:12:03 UTC
Tridge:

Looks like the NetBench hack did indeed bring the performance of Samba 4
up to near parity with Samba 3 (no oplocks). I am using the exact same
config as last time as far as the servers and clients go, the only
changes to the setup are that I used Samba4-SVN-7804 and I set
posix:allocationrounding = 0x100000 in smb.conf.

Here are the results:
NETBENCH (throughput, Mbit/s)
Clients  Samba3-nooplocks  Samba4-normal  Samba4-1M alloc
1         51.150            42.285         43.776
4        203.449           171.702        194.142
8        204.407           170.333        193.482
12       194.383           159.905        181.612
16       189.470           154.434        175.575
20       185.257           150.425        171.574
24       185.442           149.986        170.001
28       184.451           149.125        172.312
32       182.569           150.044        173.536

So right now Samba 4 with the alloc hack is consistently about 10Mbit/s
slower than Samba 3. If you think that a network trace with
posix:allocationrounding = 0x100000 set would be insightful, let me know and
I'll upload it to samba.org.
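The effect of the option can be quantified from the table: for each client count, how much of the original Samba3/Samba4 gap the 1M rounding recovered, and how much remains. The values are copied from the table above (NetBench throughput, Mbit/s); `RESULTS` and `gap_summary` are illustrative names.

```python
# clients -> (Samba3 no oplocks, Samba4 normal, Samba4 with 1M allocation rounding)
RESULTS = {
    1: (51.150, 42.285, 43.776),
    4: (203.449, 171.702, 194.142),
    8: (204.407, 170.333, 193.482),
    12: (194.383, 159.905, 181.612),
    16: (189.470, 154.434, 175.575),
    20: (185.257, 150.425, 171.574),
    24: (185.442, 149.986, 170.001),
    28: (184.451, 149.125, 172.312),
    32: (182.569, 150.044, 173.536),
}


def gap_summary(results):
    """Return {clients: (remaining gap to Samba3, fraction of original gap closed)}."""
    summary = {}
    for n, (s3, s4, s4_rounded) in results.items():
        remaining = s3 - s4_rounded
        closed = (s4_rounded - s4) / (s3 - s4)
        summary[n] = (remaining, closed)
    return summary


if __name__ == "__main__":
    for n, (remaining, closed) in sorted(gap_summary(RESULTS).items()):
        print(f"{n:>3} clients: {remaining:6.3f} behind Samba3, {closed:.0%} of gap closed")
```

From 4 clients upward the option recovers roughly 56-72% of the original gap, leaving about 9-15 Mbit/s; with a single client it recovers much less (about 17%).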

Also: I did the win2k3 single-client NetBench sniff. You can find it in
/home/mkaplan/win2k3-netbench-1client.cap.rz. It's about 50-odd MB
compressed and 2.1GB uncompressed.

-Marc