SMB2 Performance Regression for Huge Numbers of Small Files: Excessive Find and Other Requests
awl1 via samba-technical
2017-10-31 16:11:51 UTC
Hello Ralph, Jeremy, Andrew, hello fellow Samba experts to whom it may
concern,

first of all, I'm sorry that it took such a long time for me to complete
the full preparation of a reproducible test scenario, packet trace
recording and analysis. Unfortunately, I have been fighting with ongoing
health issues which kept me from making progress as intended...

This is the follow-up for the previous thread "Windows SMB2 client doing
excessive, inefficient SMB2 Find (and other) requests" (from
mid-September) here on samba-technical (and a number of earlier threads
on the samba-user list since late July):
https://lists.samba.org/archive/samba-technical/2017-September/123046.html
https://lists.samba.org/archive/samba-technical/2017-September/123082.html


But finally, here it is, and the ABSTRACT ("management summary"...) of
my issue report is:

There is a SEVERE PERFORMANCE REGRESSION from SMB1 (SMB 1.5) to SMB2
(SMB 3.11) in a scenario where a Windows 10 client copies a HUGE NUMBER
OF SMALL FILES from or to an SMB2 share, regardless of whether this
share is served by Samba (any SMB2-capable version; tested with Samba
4.7.0) or by Windows 10 itself (share hosted by Win 10 Pro).

Writing 2000 files to a share slows down by a factor of between 2 and 8
depending on the particular scenario (best SMB1 result: 20 sec, typical
SMB2 result: 90 sec, worst SMB2 result: 164 sec), and reading slows down
by a factor between 1.5 and 4 (best SMB1 result: 15 sec, typical/worst
SMB2 result: 60 seconds).
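
As a quick sanity check of the upper ends of these ranges, here is a minimal sketch in Python that uses only the timings quoted above (the names are purely illustrative; the lower ends of the ranges come from other scenarios and cannot be derived from these numbers alone):

```python
# Timings in seconds, as quoted above, for copying 2000 files.
WRITE_SMB1_BEST = 20
WRITE_SMB2_TYPICAL = 90
WRITE_SMB2_WORST = 164
READ_SMB1_BEST = 15
READ_SMB2_TYPICAL = 60


def slowdown(smb2_seconds, smb1_seconds):
    """Return how many times slower the SMB2 run is than the SMB1 run."""
    return smb2_seconds / smb1_seconds


write_typical = slowdown(WRITE_SMB2_TYPICAL, WRITE_SMB1_BEST)  # 4.5
write_worst = slowdown(WRITE_SMB2_WORST, WRITE_SMB1_BEST)      # 8.2
read_typical = slowdown(READ_SMB2_TYPICAL, READ_SMB1_BEST)     # 4.0
```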

Also, results vary hugely depending on the type of client used to
initiate the copy process: For the write to share scenario, Windows
Explorer using SMB2 gives notably faster results than "xcopy /s", but
both these SMB2-based results are hugely slower than the results of the
respective tools using SMB1 to access the same share! For the read
scenario, it rather is the opposite: Here, "xcopy /s" using SMB2 even
turns out to be the fastest scenario (even slightly faster than "xcopy
/s" using SMB1), but reading 2000 files through Windows Explorer in the
exact same scenario is about three times slower using SMB2 than it used
to be when using SMB1...

My Wireshark packet tracing has uncovered that the root cause for this
seems NOT to be a server-side issue (neither in Samba nor the Windows
SMB2 service), but rather widely varying and hugely inefficient
communication by the Windows SMB2 client implementation (when compared
to either the Windows SMB1 client or a Linux SMB2 client doing the same
thing). "Hugely inefficient" refers to two main issues here (please find
more details further down):

a) Excessive, inefficient communication consisting of calls that are
repeated multiple times (seemingly without plausible reason/need): To
copy 2000 files, I would expect ~ 2000 calls of each SMB2 operation type
(like Find, GetInfo, Create, Close), but we see up to 10000 such
requests of many SMB2 operation types (i.e. up to 5 times as many as
needed) without reason. This wastes both time (to execute the operation
on the SMB2 server, whether Samba or Windows) and network bandwidth (to
communicate SMB2 requests/responses).

b) Inefficient use of the FIND_ID_BOTH_DIRECTORY_INFO operation
(smb2.find.infolevel == 37) (in the Write scenario only!), which is
repeatedly being used with its "Pattern" parameter set to "*" without
plausible reason. The issue here is that with every subsequent file that
has been successfully copied to the share, the
FIND_ID_BOTH_DIRECTORY_INFO response grows in size, as it re-lists all
files that have been successfully copied during previous iterations.
Again, this wastes both network bandwidth and server-side execution time.
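
For anyone who wants to re-count the requests in a trace, here is a minimal sketch in Python. It assumes the command code of every SMB2 request has first been exported to text, one frame per line, e.g. with something like `tshark -r W2_xcopy.pcapng -Y "smb2.flags.response == 0" -T fields -e smb2.cmd`. The file name is hypothetical, and the field names and the comma-separated output for compounded requests are assumptions to check against your Wireshark version. The command codes themselves are the ones defined by the SMB2 protocol (Find is 0x0e, as seen in the trace excerpt further down).

```python
from collections import Counter

# SMB2 command codes for the operations discussed in this report.
SMB2_COMMANDS = {5: "Create", 6: "Close", 14: "Find", 16: "GetInfo", 17: "SetInfo"}


def parse_code(text):
    """Parse a command code printed as decimal ("14") or hex ("0x000e")."""
    text = text.strip()
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def tally_requests(lines):
    """Count SMB2 requests per command name.

    Each line holds the command code(s) of one frame; compounded requests
    within a single frame are assumed to be comma-separated.
    """
    counts = Counter()
    for line in lines:
        for field in line.split(","):
            if field.strip():
                code = parse_code(field)
                counts[SMB2_COMMANDS.get(code, "other(%d)" % code)] += 1
    return counts


# Tiny made-up sample, only to show the input format.
sample = ["5", "14,14", "6", "0x0011", "", "8"]
counts = tally_requests(sample)
```

With a real export, `tally_requests(open("cmds.txt"))` (again a hypothetical file name) should give counts that can be compared against the expected ~ 2000 per operation type.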


TEST CASE / REPRODUCER SETUP:

As proposed by Ralph, my reproducer test scenario is as simple as
possible and consists of a single directory "TestDir" containing 2000
empty files (of length 0 - so the smallest files possible...) named
"emptyfile0000.000" to "emptyfile2000.000".

The client machine is always my home office workstation running either
Windows 10 Pro 64-bit (with the most recent "Fall Creator's Update", fully
patched to current level) or Ubuntu Linux LTS 16.04 (fully patched to
current level).

The SMB1/SMB2 share is served either by my home office NAS (Thecus
N4200Pro), running either the Thecus-provided Samba 3.6.15 (SMB1 version
1.5) or a self-compiled Samba 4.7.0 (SMB2 version 3.11; including all
Samba compile dependencies in their most recent versions), or by another
workstation PC in my home office running Windows 10 Pro (SMB2 version
3.11; using the most recent "Fall Creator's Update", fully patched to
current level).

Global section of my Samba smb.conf on the Thecus NAS has been hugely
stripped down and is identical for Samba 3.6.15 and 4.7.0 (note that I
have already switched on Samba's case sensitivity option in order to
speed up handling of many files):

[global]
log file = /var/log/samba/samba.%m
max log size = 50
log level = 1
lock directory = /var/samba
case sensitive = true
default case = lower
preserve case = yes
short preserve case = yes
security = user
guest account = nobody
map to guest = Bad User
workgroup = WORKGROUP
netbios name = N4200PRO


I have recorded Wireshark packet traces in pcapng format for the
following scenarios (AFAICT, not disclosing any really private details
from my home office network any more):

WRITE TO SHARE:
W1) copy TestDir with 2000 files from local file system on a Win 10 SMB1
client (using both Explorer and command-line "xcopy /s") to SMB1 3.6.15
server share
W2) copy from Win 10 SMB2 client (Explorer, "xcopy /s") to SMB2 4.7.0
server share
W3) copy from Win 10 SMB2 client (Explorer, "xcopy /s") to Win 10 SMB2
server share
W4) copy from Linux SMB1 client (using both krusader and command-line
"cp -r") to SMB1 3.6.15 server share
W5) copy from Linux SMB2 client (krusader, "cp -r") to SMB2 4.7.0 server
share
W6) copy from Linux SMB2 client (krusader, "cp -r") to Win 10 SMB2
server share

READ FROM SHARE:
R1) copy TestDir with 2000 files from SMB1 3.6.15 server share to local
file system on a Win 10 SMB1 client (Explorer, "xcopy /s")
R2) copy from SMB2 4.7.0 server share to Win 10 SMB2 client (Explorer,
"xcopy /s")
R3) copy from Win 10 SMB2 server share to Win 10 SMB2 client (Explorer,
"xcopy /s")
R4) copy from SMB1 3.6.15 server share to Linux SMB1 client (krusader,
"cp -r")
R5) copy from SMB2 4.7.0 server share to Linux SMB2 client (krusader,
"cp -r")
R6) copy from Win 10 SMB2 server share to Linux SMB2 client (krusader,
"cp -r")

The above packet trace files are provided here:
http://home.mnet-online.de/awl1/write_to_share.zip (containing W1) to W6))
http://home.mnet-online.de/awl1/read_from_share.zip (containing R1) to R6))


As I am unable to attach binary files to this mail, I have provided the
detailed results of my trace file analysis using an Excel sheet here:
http://home.mnet-online.de/awl1/Inefficient%20Windows%20SMB2%20Client.xls
http://home.mnet-online.de/awl1/Inefficient%20Windows%20SMB2%20Client.pdf

Basically, background color "yellow" in this sheet means
"average/acceptable", "red" means "poor" (very inefficient), and "green"
means "good", i.e. represents target performance that is proven to be
possible when SMB2 communication is efficient (rather than hugely
suboptimal).


DETAILS about above mentioned ISSUES a) and b):

a) Multiple, identical, repeated calls to SMB2 operations per file for
(seemingly) no reason:

Please have a look at the fourth and fifth columns of the Excel sheet
where I have listed the numbers and SMB/SMB2 operation types from the
packet traces. It turns out that in almost all of the really slow
scenarios, we see a huge overhead of multiple, repeated calls to SMB2
operations for no reason that would be plausible (at least to me): When
copying a single directory with 2000 empty files, why in the world
should this require e.g. (as in the W2 scenario with an "xcopy /s" client)

* ~ 5500 SMB2 Find operations, of which ~ 500
FIND_ID_BOTH_DIRECTORY_INFO and ~ 5000 FIND_NAME_INFO
* ~ 6000 SMB2 SetInfo operations
* ~ 15500 SMB2 Create operations and
* ~ 15500 SMB2 Close operations

summing up to an execution time of ~ 165 seconds (when the same thing
can be done against the exact same SMB2 server from a Linux SMB2 client
without redundant operations in ~ 21 seconds)? I don't know what might
cause this huge level of redundancy in the Windows SMB2 client
implementation; I can only see its detrimental influence on performance...
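
To put the W2 "xcopy /s" numbers above into perspective, here is a minimal sketch in Python that converts them into requests per copied file (the names are illustrative; the counts are the approximate figures listed above):

```python
N_FILES = 2000

# Approximate request counts quoted above for W2 with "xcopy /s".
W2_XCOPY_REQUESTS = {
    "Find": 5500,
    "SetInfo": 6000,
    "Create": 15500,
    "Close": 15500,
}


def requests_per_file(requests, n_files):
    """Express each request count as requests per copied file."""
    return {op: count / n_files for op, count in requests.items()}


per_file = requests_per_file(W2_XCOPY_REQUESTS, N_FILES)
# {"Find": 2.75, "SetInfo": 3.0, "Create": 7.75, "Close": 7.75}

total_per_file = sum(W2_XCOPY_REQUESTS.values()) / N_FILES  # 21.25
```

In other words, these four operation types alone account for roughly 21 requests for every single empty file.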


b) Inefficient use of the FIND_ID_BOTH_DIRECTORY_INFO operation (in
mainstream "Write" scenarios W2, W3 only):

Why does the SMB2 Find request of type SMB2_FIND_ID_BOTH_DIRECTORY_INFO
(smb2.find.infolevel == 37) always use a wildcard Pattern "*"? This
seems completely unnecessary. While it might be needed to check that a
file with the same name (case [in]sensitive depending on Samba
parameters) does not already exist, it clearly is not necessary to
enumerate all files in the current directory, which is what Pattern "*"
causes in my testing:

From each iteration to the next, having copied one more file to the
target share, the Find Response grows in size, i.e.

Find Response (0x0e)
    [Info Level: SMB2_FIND_ID_BOTH_DIRECTORY_INFO (37)]
    StructureSize: 0x0009
    Info: 7000000000000000a0f5d991fe1bd3010070d1ff911bd301...
        Offset: 0x00000048
        Length: 1120
        FileIdBothDirectoryInfo: .
        FileIdBothDirectoryInfo: ..
        FileIdBothDirectoryInfo: emptyfile0001.000
        FileIdBothDirectoryInfo: emptyfile0002.000
(...)
        FileIdBothDirectoryInfo: emptyfile<n>.000

i.e. the size of the Find Response grows with every single file
successfully copied onto the share, and the current Find Response always
contains the names of all the n (with n running between 0 and 2000)
files that have been successfully copied to the share so far.
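
The cost of this pattern can be estimated with a minimal sketch in Python. How often the client actually issues the wildcard Find is an assumption here (the `files_per_listing` parameter is hypothetical; the request counts in my traces suggest it is not exactly once per file), and each response is assumed to contain ".", ".." and all files copied so far, as in the excerpt above:

```python
def total_listed_entries(n_files, files_per_listing=1):
    """Total directory entries returned across all "*" listings.

    Assumes one wildcard listing after every `files_per_listing` copied
    files, each returning ".", ".." and every file copied so far.
    """
    total = 0
    for copied in range(files_per_listing, n_files + 1, files_per_listing):
        total += copied + 2
    return total


once_per_file = total_listed_entries(2000)    # 2005000 entries
every_fourth = total_listed_entries(2000, 4)  # 502000 entries
single_listing = 2000 + 2                     # 2002 entries
```

Whatever the exact frequency, the total grows roughly with the square of the number of files, whereas a single listing of the finished directory would return only ~ 2000 entries.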

This results in a much larger trace file: In my testing, the trace file
size for copying 2000 files from a Win10 machine with the buggy client
to a Samba server is ~ 30 MB and contains no less than ~ 2500 Find
requests, while using a Linux client in the exact same scenario to copy
those 2000 files onto the same share, the session trace file is less
than 5 MB in size and contains at most four (4) Find requests (!!!).

Needless to say, this use of the "*" pattern is also detrimental to
both performance and network throughput...


Finally, my request to you:

Can the Samba team please look into the spreadsheet (and trace data) as
provided, confirm that you are able to reproduce the poor performance
and my analysis results, and finally make your peers at Microsoft aware
of these Windows SMB2 client issues? I'll be happy to provide any
further information, packet traces or other things you would like to see
for your assessment.

I would hope that when you address these issues with the Microsoft SMB
team, the chance of seeing these inefficiencies fixed in subsequent
Windows Updates is much better than when I try to do the same as a
single home office user...


Thanks a million one more time for your kind help with this & best regards
Andreas
