Cifs vfs send error in read 11

Description Brian Wang 2007-01-30 17:33:28 UTC


Description


Brian Wang



2007-01-30 17:33:28 UTC

Most recent kernel where this bug did *NOT* occur:
Distribution: SUSE
Hardware Environment: Intel
Software Environment: SUSE 10, kernel 2.6.16 with CIFS 1.45
Problem Description:

1. with SMP/32 bits, CIFS client freezes the kernel. The problem is the the 
calls to i_size_write doesn't own the inode mutex, which causes i_size_read 
lock the kernel.

2. It looks like CIFS client completely unusable. dmesg shows tons of errors.
(see below). if you run a "find" command agaist a CIFS mount, "find" command 
returns different result everytime. (e.g. run "find /cifs_mnt | wc" returns 
different count everytime even the mounted share is completely static).

The question I'd like to ask: are there any serious users of Linux CIFS 
client? the problem is so easy to reproduce and it just seems completely 
broken and useless. surprisingly, not many bugs are reported?

I have tried different CIFS version from 1.40 to the most recent 1.45 on 
2.6.16 and 2.6.17, the problem is everywhere!


CIFS VFS: Send error in read = -11
 CIFS VFS: No response for cmd 4 mid 17535
 CIFS VFS: No response to cmd 46 mid 17534
 CIFS VFS: No response to cmd 46 mid 17532
 CIFS VFS: Send error in Close = -11
 CIFS VFS: No response to cmd 46 mid 17526
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17530
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17529
 CIFS VFS: No response to cmd 46 mid 17527
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17528
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17531
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17525
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17533
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 17524
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: No response for cmd 50 mid 20566
 CIFS VFS: No response for cmd 50 mid 20565
 CIFS VFS: No response to cmd 46 mid 20567
 CIFS VFS: No response to cmd 46 mid 20559
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20563
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20560
 CIFS VFS: No response to cmd 46 mid 20562
 CIFS VFS: No response to cmd 46 mid 20557
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20555
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20564
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20553
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20551
 CIFS VFS: No response to cmd 46 mid 20550
 CIFS VFS: No response to cmd 46 mid 20561
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20549
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20558
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20548
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20547
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20546
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response to cmd 46 mid 20552
 CIFS VFS: Send error in read = -11
 CIFS VFS: No response for cmd 4 mid 20556
 CIFS VFS: Send error in Close = -11
 CIFS VFS: No response to cmd 46 mid 20554
 CIFS VFS: Send error in read = -11
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in Close = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
 CIFS VFS: Send error in read = -9
init_special_inode: bogus i_mode (777)
init_special_inode: bogus i_mode (777)


Steps to reproduce:


1. Mount a  CIFS share from a Windows or SAMBA server with siginificant amount 
of files. (for example, something like /usr on any Linux system).

2. run "cp -R /cifs-mnt /somedir". It happens quick with multiple cp commands.

3. Do "dmesg" and you will see tons of errors.

4. If your system is SMP/32bits, the system will freeze up in a few minutes. 
you can ping it, but you CANNOT do anything else.


Comment 1


Brian Wang



2007-01-30 17:58:11 UTC

More information.

It looks like it is easy to screw up with multiple cp or find commands. I 
think there are some kind of race within the module.


Comment 2


Shirish Pargaonkar



2007-02-07 22:13:16 UTC

I have a SLES10 test machine with two (cpu0 and cpu1) Pentium III (coppermine) 
which are 32 bit processors running 2.6.20 kernel and cifs client version 1.47.
I do not whether this MP machine is SMP (how do you determine that?).

But I mounted /usr from a samba server running on another machine and started
the cp -R command in three different ttys simultaneously.
Do not see any error messages (at least the kinds mentioned in the bug) 
in dmesg and no hang experienced on the machine running cifs client so far.

/usr on the samba server is pretyy big, 2GBytes, it has been quite a few
minutes since copies started and although all three cp commands have not 
completed (have copied 500 or so MBytes each), I guess I would have seen error 
messages and experienced hang by now.

Will check back in the morning.


Comment 3


Brian Wang



2007-02-08 07:36:07 UTC

Shirish,

Built the kernel with SMP enabled is crucial to reproduce this bug and the 
freeze. because i_size_read and i_size_write are completely different on 
SMP/32 bits than on kernel without SMP enabled or 64 bit machines.

You have to rebuild the kernel by change the config file to enable SMP. I 
guess there are a few flags in the config you need, my kernel was built with 
the following.

CONFIG_SMP=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_SMP=y

I think I filed another bug on the freeze. The freez problem is reported 
numberous of times if you search GOOGLE. if you check the implementation of 
i_size_read and i_size_write, they will give you hints how it happens, they 
take different path on SMP/32bit platform. Bascially there are a few places 
within cifs module calling i_size_write without holding the inode mutex. Once 
we put the mutex lock/unlock around those calls if the mutex is not held, the 
freez problem would be fixed.


Comment 4


Brian Wang



2007-02-08 07:37:59 UTC

by the way, if you see those error messages, pay attention to the number of 
files you copied (or find if you run "find", pipe it through wc, you will see 
different process report different count).


Comment 5


Shirish Pargaonkar



2007-02-08 08:07:13 UTC

I am restarting the three copies test.  The machine was hung, could only ping
and had to reboot.
But then / filesystem was full.  So not sure if hang was a result of that, it is
very likely.  And there were no errors in dmesg, if there were, they probably
were overwritten.

So this time, I am doing three simultaneous copies of a 1G directory under
/usr (/usr/share).


Comment 6


Shirish Pargaonkar



2007-02-15 07:30:50 UTC

I can't recreate the problem, this time / filesystem did not become 100% full.
I tried few times.  I do not see any error messages in the syslog as well.

I always had the config options 

CONFIG_SMP=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_SMP=y

and then I added some more config options like 
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEX=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_VM=y

I will rebuild the kernel by taking out some of the new options I added
and redo the testing, may be they change/affect any timing issues.


Comment 7


Brian Wang



2007-02-15 08:45:19 UTC

Shirish

The freeze problem is easier to see logically by reading the code and put 
WARN_ON(!mutex_is_locked(&file->f_dentry->d_inode->i_mutex)) statement before 
it.

I suggest that you find all the places within the module where "i_size_write" 
is called and check is the inode mutex is held. My undersstanding is that you 
absoultely CANNOT call i_size_wrie without holding the inode mutex, which is 
violated in CIFS module. Sepcifically, in cifs_partialpagewrite, I believe it 
should acquire the i_mutex before cifs_write is called, calling cifs_write 
without holding the mutex result in i_size_write gets called without the lock 
and take the risk of i_size_read freezes the kernel.

There are 2 paths which could end up calling cifs_write, one path is normal 
VFS path, the other is from mm, VFS path seems to always holding i_mutex, but 
mm path doesn't, which could cause the freeze and potential data corruption 
problem.


We have been testing and working on this module for a few weeks, I personally 
believe this module is not for serious use and major work is needed, 
expecially the lock strategy. The errors i filed for this bug we believe is 
caused by the lack of proper lock when the server doesn't respond in time, the 
connection is torn down, a lag was set, but the flag set/check doesn't have 
the proper lock strategy which result in other threads still hold the dead 
connection to throws tons of errors.


Comment 8


Brian Wang



2007-02-15 08:54:59 UTC

sorry about the typos


Comment 9


Shirish Pargaonkar



2007-02-15 09:06:55 UTC

Brian,

Thanks. I am looking at the code and i_size_xxxx calls.
It is just that, if I can_have/have_access_to a system/set-up where I can
consistently recreate the problem, I can verify the fixed code.


Comment 10


Brian Wang



2007-02-15 09:11:11 UTC

All I reported could be reprodued in a few seconds on my machine, which is a 
Intel server with 4 CPUs and 3.6 GB memory.


Comment 11


Shirish Pargaonkar



2007-02-15 09:48:15 UTC

The machine hung.  I could reproduce the problem.  Can ping but nothing else.
Taking out those extra debug config options helped.  The only error message in
syslog I get though is
 init_special_inode: bogus i_mode (755)

Anyway, looking at it.


Comment 12


Steve French



2007-02-21 13:09:56 UTC

*** Bug 7013 has been marked as a duplicate of this bug. ***


Comment 13


Steve French



2007-02-21 13:10:35 UTC

Brian,
I agree - looks like you are correct.

I wonder if the aio/vectored-io/mm changes altered the default locking for that
path.


Comment 14


Brian Wang



2007-02-21 14:18:58 UTC

Thanks , guys.

I added locks to a few places in file.c aand inode.c and the freeze problem is 
pretty much gone,

But I don't think my fix is good fix because I don't have a good overall view 
of VFS&CIFS. I am glad you guys are able to reproduce it and work on it. 

But even without the freeze problem, still tons of those send error in read, 
and tons of errors such as :

Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode
Feb 21 15:41:20 angband kernel:  CIFS VFS: No writable handles for inode



Feb 21 15:47:27 angband kernel:  CIFS VFS: Received no data, expecting 398


"No writable handles for inode" I believe is when fsync is called against a fd.


If you think you have a fix, I am willing to test the patch, my problem could 
reprodue the problems pretty quick.


Comment 15


Brian Wang



2007-02-21 14:20:27 UTC

my problem=my program

sorry


Comment 16


Steve French



2007-02-21 15:21:12 UTC

OK - would like to see your patch in any case although I suspect it is similar
to what Shirish has been running.   My worry about locking in CIFS is not about
cifs specific data structures, global variables, linked lists (although as with
every large piece of code, those need to be rereviewed
Instead, the cases like this (locking rules implicit in the use of a worker
function, in this case i_size_write) are easy to fall through the cracks.

Looking at Documentation/filesystems/Locking and related files to refresh my
memory, I noticed that there seems to be some dated information in there
(although a very useful file) relating to i_mutex.  Grepping for i_mutex usage
in the vfs really worries me - there are lots of places it is held across
blocking calls, and if these ever recursed into the memory manager in low memory
situations (balance_inode_pages kind of stuff) could deadlock ... right?

No sense going from a deadlock to a different deadlock, so am thinking that a
better approach to protecting i_size_write of cifs inodes is not to grab a mutex
that is held all over the place (fsync, readdir, ...) in the vfs but to lock the
 cifs i_size writes with something else.


Comment 17


Steve French



2007-02-21 15:25:51 UTC

-9 error on read is bad file handle presumably caused by the connection to the
server dropping while i/o was in progress.

The case of "No writable handle" on write is probably similar - in writepages we
need the inode to be open to write out data to it (which is not a big deal
because before file close it first writes out data) - but perhaps in your case
the session  to the server dropped - I keep stats of the number of reconnects
and session status in /proc/fs/cifs/Stats or /proc/fs/cifs/DebugData


Comment 18


Steve French



2007-02-22 20:42:21 UTC

Shirish's patch seems to fix the problem, but am concerned that a vfs can not
safely take the i_mutex if we are in the context of an fsync (since that path
already seems to take the i_mutex).

Anyone have ideas or a good test case for this?


Comment 19


Brian Wang



2007-02-22 20:52:39 UTC

Where can I get Shirish's patch? would you post it here? 

Thanks


Comment 20


Brian Wang



2007-02-22 20:58:21 UTC

By the way, the locking fix i out in myself runs for more than 3 days before 
it falls apart. It looks like the mm path calls cifs_write takes i_mutex most 
of the time and fails to take it in some very rare cases which is enough to 
break it down after a while.


Comment 21


Brian Wang



2007-02-22 22:09:16 UTC

Shirish,

would you email me your patch or direct me to where I can download it? Can't 
wait to try it out. I appreciate it.

Thanks


Comment 23


Brian Wang



2007-02-23 13:29:59 UTC

Shirish,

Thanks for the patch. It is similar to what I did, but i think your patch may 
lock too many places?. I don't know why you are not running into deadlock, I 
guess your test only does "read" from a CIFS mount? If you also do "write" to 
it, I think you may run into deadlock.

If i only lock only 2 places in inode.c cifs_get_inode_info_unix, 2 places in 
readdir in function , i can pretty much get rid of the freeze.  but if i put 
WARN_ON(!mutex_is_locked(&file->f_dentry->d_inode->i_mutex)); in cifs_write 
before i_size_write, i got 1 warning after a few days running, which means in 
some corner cases the lock is not held, but it is held most of the time. 

Badness in cifs_write at fs/cifs/file.c:966
 [<c01061b3>] show_trace+0x13/0x20
 [<c010633e>] dump_stack+0x1e/0x20
 [<f8ab072c>] cifs_write+0x51c/0x540 [cifs]
 [<f8ab0fe6>] cifs_writepage+0x186/0x230 [cifs]
 [<c015038d>] shrink_zone+0xe4d/0xfe0
 [<c0150cb2>] try_to_free_pages+0x122/0x220
 [<c014bcce>] __alloc_pages+0x15e/0x340
 [<c014da43>] __do_page_cache_readahead+0x103/0x270
 [<c014dc15>] blockable_page_cache_readahead+0x65/0xc0
 [<c014dcdd>] make_ahead_window+0x6d/0xc0
 [<c014de4f>] page_cache_readahead+0x11f/0x1c0
 [<c01472ad>] do_generic_mapping_read+0x51d/0x580
 [<c0147b80>] __generic_file_aio_read+0xf0/0x250
 [<c0147d29>] generic_file_aio_read+0x49/0x60
 [<c0167149>] do_sync_read+0xb9/0xf0
 [<c0167b91>] vfs_read+0xc1/0x150
 [<c0168127>] sys_read+0x47/0x70
 [<c010436b>] sysenter_past_esp+0x54/0x79


Comment 24


Brian Wang



2007-02-23 14:33:39 UTC

I pulled cifs 1.47 from cifs-2.6 git tree, but unfortuately 1.47 doesn't 
compile on my box with kernel 2.6.16-13. Steve, so 1.47 is is not compaitable 
with 2.6.16 ? 


Comment 25


Shirish Pargaonkar



2007-02-23 19:55:48 UTC

Brian,

You are right, I was doing cp -R from cifs mount to local directory.
I will run the test other way and see what happens.

cifs 1.47 will not build against kernel 2.6.16, Steve can tell better but I am
at kernel 2.6.20 and building 1.47 of cifs.


Comment 26


Brian Wang



2007-02-23 21:17:20 UTC

Steve and Shirish,

So what's your thought on using a module global spin lock to protect 
i_size_write? so we don't need to worry about the interaction with VFS and mm. 
any concerns with the approach? I think Steve made the good point on not to 
grab i_mutex.

Also, a question for steve, we have to stay with 2.6.16 and I noticed that 
there are quite a few important fixes in 1.46 and 1.47, I am just wonder if 
those fixes could be back port to 1.45.


Comment 27


Shirish Pargaonkar



2007-02-24 05:52:57 UTC

Started the copy from local directory to cifs mounted directory, machine
did not hang but individual cp commands deadlocked.  
Will work on spinlock solution.


Comment 28


Brian Wang



2007-02-24 08:46:07 UTC

My spin-lock test over night runs pretty well . I am convinced this is a 
better solution than playing with i_mutex. I will keep my test run but I do 
believe the freeze problem should be fixed.

There are 2 parts about this bug, May be I should file a different one? the 
other part I also believe is because of race condition. You can reproduce by 
the same way, run multiple "cp" on same mount point, you will get lots of read 
errors and the some files are corrupted. same thing with writing into the 
mount point. I think it might be better to file it as a seperate bug, what do 
you guys think?


Comment 29


Steve French



2007-02-24 08:55:59 UTC

The spinlock to use seems to be inode->i_lock (see note in struct inode in
linux/include/fs.h).

Not sure how I missed that, but it is not used enough by filesystems yet.


Comment 30


Steve French



2007-02-24 08:57:00 UTC

And yes, it would be easier to file a distinct problem if we see a second
problem, especially if it turns out to be a server or networking issue, it would
be easier to track.


Comment 31


Brian Wang



2007-02-24 09:02:57 UTC

Thanks, Steve, i guess most people missed that! if you look at the note for 
i_size_write, it actually suggest holding i_mutex, I think that should be 
updated!


Comment 32


Brian Wang



2007-02-24 09:14:39 UTC

bug 8077(missing files and file corruption) is filed for the other part of 
this bug.


Comment 33


Anonymous Emailer



2007-02-24 09:52:00 UTC

Reply-To: akpm@linux-foundation.org

<wakes up>

Yes, i_mutex is required during i_size_write().  It might be that we can fix the
CIFS lockups by takeing i_lock around the i_size_write calls, but if we do this
then we will need to ensure that the VFS absolutely never runs i_size_write() against
a cifs file or directory, because the VFS doens't use i_lock for i_size protection.

Taking i_mutex around i_size_write() would be preferred.


Comment 34


Brian Wang



2007-02-24 10:05:28 UTC

In that case, why not take i_lock inside of i_size_write and make it a rule 
that nowhere else shoudl take i_lock for i_size?


Comment 35


Anonymous Emailer



2007-02-24 18:53:30 UTC

Reply-To: akpm@linux-foundation.org

Because i_size_write() is already protected by i_mutex.  Taking another lock in the
VFS and in all filesystems just because of a couple of unlocked codepaths
in CIFS sounds like a bad dead.


Comment 36


Steve French



2007-02-24 22:33:47 UTC

I don't see anywhere that the vfs could be calling i_size_write on cifs inodes
(doesn't really make sense for the vfs to update the file size for a remote
inode - only the network filesystem can know the real size).

The problem is not really changing the unlocked paths in cifs to taking the
i_mutex, it is avoiding all potential deadlock with the 70+ places in the fs and
mm directory that take the i_mutex - it is much easier to avoid deadlock by
taking something that is guaranteed not to deadlock and still protects all the
writes to this field (i_size).


Comment 37


Steve French



2007-02-24 22:41:40 UTC

Created attachment 10524 [details]
protect i_size_write calls in cifs via spinlock

The same idea, but protected with spinlock on i_lock.

Seems safer to me, although looking more carefully there is one place in the
vfs and mm that calls i_size_write on cifs behalf (vmtruncate).

Not sure the best way to handle that without deadlock.


Comment 38


Steve French



2007-02-24 22:53:25 UTC

> Because i_size_write() is already protected by i_mutex.

That may be true, but it is not obvious - the callers (outside the fs ie in mm
and vfs) of i_size_write include:

1) vmtruncate
2) simple_commit_write/no_bh_commit_write/generic_commit_write
3) fsstack_copy_inode_size
4) generic_file_direct_write
5) xip_file_write

None of them grab i_mutex and all are exported.

A network filesystem of course could update the file size on all kinds of events
... (e.g. lookup/revalidate of a file changed on the server) not just a local
write on the client - so how could one ever tell if the thread already has
i_mutex.   Seems the easiest way is to find a way to avoid calling vmtruncate
(which presumably can't be wrapped with a spinlock because it could block).


Comment 39


Steve French



2007-02-24 22:55:31 UTC

> a way to avoid calling vmtruncate
I meant find a way to do the equivalent of truncate - and protect its
i_size_write with whatever safe lock we decide on.


Comment 40


Anonymous Emailer



2007-02-25 01:11:23 UTC

Reply-To: akpm@linux-foundation.org

> On Sat, 24 Feb 2007 22:53:27 -0800 bugme-daemon@bugzilla.kernel.org wrote:
> 1) vmtruncate
> 2) simple_commit_write/no_bh_commit_write/generic_commit_write
> 3) fsstack_copy_inode_size
> 4) generic_file_direct_write
> 5) xip_file_write

They should all be called under i_mutex with the exception of fsstack_copy_inode_size,
which is I think copying to an inode whcih hasn't been made visible to anyone else
yet.

Sorry, but the rule is that i_size_write() requires i_mutex.

There _are_ exceptions to this (as I discovered when I had an assertion
in i_size_write) but they're all sutiations where we know that no other
thread can be running i_size_write) on that inode at the same time.


Comment 41


Brian Wang



2007-02-25 10:28:33 UTC

The prrferred solution might be fixing it with i_mutex.

But I think the quick and safe fix for this specific CIFS bug might be holding 
i_lock , at the same time, looks like we need to duplicate vmtruncate into 
CIFS module to make it a static cifs_vmtruncate and protect i_size_write with 
i_lock.

Now we know that  fsstack_copy_inode_size won't be a problem as the reason 
stated above. The question is could the other 3 be potential problems for 
CIFS, according to Steve.

> 2) simple_commit_write/no_bh_commit_write/generic_commit_write
> 4) generic_file_direct_write
> 5) xip_file_write


Comment 42


Steve French



2007-02-25 13:07:34 UTC

The only "external" (external to cifs) use of i_size_write is vmtruncate, and
that is easy to handle and can be done within cifs with little code duplication.

But isn't this cut and dried?  Based on Brian and Shirish's reports it seems we
(a network filesystem) can't really know when the vfs has i_mutex on a
particular inode (for this thread) for sure so we can't safely use it in this
case (if  the vfs didn't hold the i_mutex before calling write and/or writepages
we wouldn't have this problem in the first place - cifs would simply grab
i_mutex around all its i_size_write calls and this would be trivial and we would
be done ... but Brian and Shirish's notes seem to indicate that the vfs
sometimes has i_mutex before it gets to cifs on write and other times not and
there is no easy rule as there is in path based calls into the vfs on the use of
 i_mutex).

cifs currently can set file size in its:
1) write (for the direct case)
2) writepage
3) writepages
4) commit_write
5) readdir (cifs_filldir)
6) and in rmdir (on directory rather than a file)
7) and in many other calls
    symlink
    read_inode (on the root inode of the sb)
    mkdir
(and due to revalidate ... it is also called in)
    file_mmap
    dentry_revalidate
    certain lseek calls
    getattr

I don't see any way to figure out based on Documentation/filesystem/Locking or
grepping fs and mm - exactly which of these 13+ cases could hold the i_mutex on
a particular inode.

It is a much more performance sensitive operation on a network filesystem of
course since the file size could change on the server on revalidate of inode
metadata so a spinlock may be preferable to a mutex anyway.


Comment 44


Steve French



2007-02-25 19:38:33 UTC

Did some testing - the latest patch (attached to this bugzilla bug #) seems to
run ok vs. the stable version of the server code (Samba 3.0.23d) (but found an
unrelated problem in the most current Samba server svn code which I am looking at).

Let me know if you see any testing problem.

It seems like it covers all paths safely and I am satisfied with it, although I
really don't want to push it if akpm objects (he knows the mm quite well)


Comment 45


Brian Wang



2007-02-25 20:25:10 UTC

i ran my test without vmtruncate change for more than 20 hours and encountered 
no problem. I started my test 2 hours ago with vmtruncate change included and 
will keep it running and I will let you know if I find anything.

I understand that from Linux OS point of view, this may not be "perfect" fix, 
which might be the reason of akpm's objection. But i use this as part of the 
product development, so it is good enough for me. 

Thanks and I hope you will have time look at #8077,which I believe is a very 
critical bug too.


Comment 46


Anonymous Emailer



2007-02-25 23:41:05 UTC

Reply-To: akpm@linux-foundation.org


I guess we can live with fs-private i_lock usage.  We just have to be
careful about any future vfs-level i_size_write()s popping up.


Comment 47


Steve French



2007-02-26 11:13:54 UTC

checked into cifs-2.6 tree - pending minor additional tests will request a merge
soon


Comment 48


Steve French



2007-02-26 12:08:12 UTC

Created attachment 10540 [details]
missed one i_size_write in cifs truncate from last patch

Shaggy noticed that there are two i_size_writes in the new cifs truncate (we
forgot spinlock around one in the previous patch).

This patch applies on top of the previous one.


Comment 49


Brian Wang



2007-02-26 22:30:11 UTC

Steve,

the 2nd patch above, looks like the first spin_unlock(&inode->i_lock); needs 
to be removed? you got double unlocks.


Comment 50


Brian Wang



2007-02-26 22:48:21 UTC

sorry, my bad, didn't notice the goto after that.


Comment 51


Amit K Arora



2007-07-24 03:51:17 UTC

Please reopen this bug (I am not able to do it from my id), since the patch in this bug report is resulting in a deadlock situation.

The problem is happening since the cifs code calls is_size_safe_to_change()
holding a spinlock, and this function can sleep. Obviously, sleeping inside
spinlock is not a good idea!

Process with 8720 pid is the culprit here. It gets the spinlock on inode->i_lock
and calls is_size_safe_to_change(). This in turn eventually calls
wait_for_response() which puts the process on a waitqueue and makes it go to
uninterruptible sleep.
Now, since the above process is holding the lock, suppose all other CPUs (in
this server we have only two) get to run processes which also start waiting on
the same inode->i_lock, we get the deadlock - since the process with 8720 pid, which is holding the lock can _not_ be woken up now (as both the cpus are busy spinning!).

This is the stack trace for the process with 8720 pid:


locktests     D 000000000fcff7fc  8720 23007  22971         23008 23006 (NOTLB)
Call Trace:
[C0000000725EB070] [C000000000521010] 0xc000000000521010 (unreliable)
[C0000000725EB240] [C000000000010E80] .__switch_to+0x130/0x154
[C0000000725EB2D0] [C000000000357F38] .schedule+0xa98/0xbf4
[C0000000725EB3E0] [D000000000A21DA8] .wait_for_response+0xe8/0x1bc [cifs]
[C0000000725EB4C0] [D000000000A22758] .SendReceive+0x2dc/0x598 [cifs]
[C0000000725EB590] [D000000000A06BEC] .CIFSSMBOpen+0x2d8/0x518 [cifs]
[C0000000725EB690] [D000000000A16FDC] .cifs_reopen_file+0x2e4/0x524 [cifs]
[C0000000725EB780] [D000000000A17E18] .find_writable_file+0xe4/0x184 [cifs]
[C0000000725EB820] [D000000000A185C0] .is_size_safe_to_change+0x24/0x90 [cifs]
[C0000000725EB8B0] [D000000000A1BD08] .cifs_get_inode_info_unix+0x794/0x9a8 [cifs]
[C0000000725EBA20] [D000000000A1A79C] .cifs_open+0x71c/0x8c4 [cifs]
[C0000000725EBB30] [C0000000000E69D8] .__dentry_open+0x13c/0x2bc
[C0000000725EBBE0] [C0000000000E6CCC] .do_filp_open+0x50/0x70
[C0000000725EBD00] [C0000000000E6D60] .do_sys_open+0x74/0x130
[C0000000725EBDB0] [C000000000125148] .compat_sys_open+0x24/0x38
[C0000000725EBE30] [C0000000000086A4] syscall_exit+0x0/0x40


Comment 52


Andrew Morton



2007-07-24 09:45:08 UTC

reopened..


Comment 53


Steve French



2007-07-24 12:08:19 UTC

good catch - in this path the tcp session has died and we are reconnecting while someone is changing file size.

is_size_safe_to_change will need to be changed so that it does not block.  In this case the intent of calling find_writable_file is to check whether the inode might be in the process of being written to, not to get a file handle we can use to write.  

I see a few easy ways to change this - but one is to add a parm to find_writable_file


Comment 54


Steve French



2007-07-24 12:40:48 UTC

Could you check to make sure that this patch is ok and works?


Comment 55


Steve French



2007-07-24 12:42:35 UTC

Created attachment 12124 [details]
fix to test for this hang

also could you reassign this bug back to me, I hit the wrong button when I tried to add the attachment


Comment 56


Amit K Arora



2007-07-25 04:53:36 UTC

Hi Steve,

Thanks for the new patch. I have asked the person who faced this problem to try it out. Will update the bug when I get to hear from him.


Comment 58


Amit K Arora



2007-07-26 02:08:06 UTC

(In reply to comment #56)
> Hi Steve,
> 
> Thanks for the new patch. I have asked the person who faced this problem to
> try
> it out. Will update the bug when I get to hear from him.
 
The tests with this new patch have passed. They have been running fine for more than 12 hours, where we were observing the bug within 1-2 hours.

Steve, thanks again for this patch !


Comment 59


Steve French



2007-07-26 08:48:08 UTC

You may want to monitor /proc/fs/cifs/Stats during the test to see if the number of session reconnects keeps increasing (a possible indication of networking problems or server crash).  Although the cifs client automatically reconnects after a tcp session crash (and it might not be visible to an application or test) it is a possible indication that there is a networking problem and worth keeping an eye out for (the cifs client logs a dmesg entry in most of these cases too).

Оффлайн
Sam Stone

Ночью загнулась Ubuntu 8.04 32bit, на которой крутится сквид. Система почти не откликалась (даже top не запускался >_<), поэтому пришлось пристрелить )
После разбирал логи. Нашел следующее:

May 28 18:02:09 ubuntu-squid2 kernel: [623246.015765] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
May 28 18:02:09 ubuntu-squid2 kernel: [623246.015806]  CIFS VFS: Send error in read = -9
May 28 18:03:08 ubuntu-squid2 kernel: [623304.695599]  CIFS VFS: server not responding
May 28 18:03:08 ubuntu-squid2 kernel: [623304.695634]  CIFS VFS: No response to cmd 47 mid 55300
May 28 18:03:08 ubuntu-squid2 kernel: [623304.695653]  CIFS VFS: Write2 ret -11, wrote 0
May 28 18:03:08 ubuntu-squid2 kernel: [623304.696709]  CIFS VFS: No response to cmd 47 mid 55301
May 28 18:03:08 ubuntu-squid2 kernel: [623304.696715]  CIFS VFS: Write2 ret -11, wrote 0
May 28 18:03:10 ubuntu-squid2 kernel: [623306.677723]  CIFS VFS: Write2 ret -11, wrote 0
May 28 18:03:10 ubuntu-squid2 kernel: [623306.691224]  CIFS VFS: Write2 ret -9, wrote 0
May 28 18:03:10 ubuntu-squid2 kernel: [623306.808990]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.809774]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.810450]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.811258]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.811942]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.812609]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.816633]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.817829]  CIFS VFS: Send error in Close = -9
May 28 18:03:10 ubuntu-squid2 kernel: [623306.998759]  CIFS VFS: Send error in Close = -9
May 28 18:03:12 ubuntu-squid2 kernel: [623308.800953]  CIFS VFS: Send error in Close = -9
May 28 18:03:12 ubuntu-squid2 kernel: [623308.801858]  CIFS VFS: Send error in Close = -9
May 28 18:03:12 ubuntu-squid2 kernel: [623308.803931]  CIFS VFS: Send error in Close = -9
May 28 20:06:37 ubuntu-squid2 kernel: [630702.744998]  CIFS VFS: server not responding
May 28 20:06:37 ubuntu-squid2 kernel: [630702.745019]  CIFS VFS: No response to cmd 46 mid 25601
May 28 20:06:37 ubuntu-squid2 kernel: [630702.745027]  CIFS VFS: Send error in read = -11
May 28 20:06:43 ubuntu-squid2 kernel: [630708.938589]  CIFS VFS: No response to cmd 47 mid 25600
May 28 20:06:43 ubuntu-squid2 kernel: [630708.938613]  CIFS VFS: Write2 ret -11, wrote 0
May 28 20:06:43 ubuntu-squid2 kernel: [630709.016277]  CIFS VFS: Write2 ret -11, wrote 0
May 28 20:06:47 ubuntu-squid2 kernel: [630712.665185]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.666851]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.668491]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.669432]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.670158]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.670952]  CIFS VFS: Send error in Close = -9
May 28 20:06:47 ubuntu-squid2 kernel: [630712.671666]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630722.990778]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630723.021206]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630723.030701]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630723.032759]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630723.088765]  CIFS VFS: Send error in Close = -9
May 28 20:06:57 ubuntu-squid2 kernel: [630723.104688]  CIFS VFS: Send error in Close = -9
May 28 20:06:58 ubuntu-squid2 kernel: [630723.767114] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
May 28 20:06:58 ubuntu-squid2 kernel: [630723.767129]  CIFS VFS: Send error in read = -9
May 28 20:06:58 ubuntu-squid2 kernel: [630724.069174]  CIFS VFS: Send error in Close = -9
May 28 20:06:59 ubuntu-squid2 kernel: [630725.242832] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
May 28 20:06:59 ubuntu-squid2 kernel: [630725.242848]  CIFS VFS: Send error in read = -9
May 28 20:06:59 ubuntu-squid2 kernel: [630725.244136] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
May 28 20:06:59 ubuntu-squid2 kernel: [630725.244142]  CIFS VFS: Send error in read = -9
May 28 20:06:59 ubuntu-squid2 kernel: [630725.246802]  CIFS VFS: Send error in Close = -9
May 28 21:15:36 ubuntu-squid2 kernel: [634836.103254]  CIFS VFS: server not responding
May 28 21:15:36 ubuntu-squid2 kernel: [634836.103283]  CIFS VFS: No response for cmd 50 mid 54018
May 28 21:15:41 ubuntu-squid2 kernel: [634840.709225]  CIFS VFS: No response to cmd 47 mid 54017
May 28 21:15:41 ubuntu-squid2 kernel: [634840.709249]  CIFS VFS: Write2 ret -11, wrote 0
May 28 21:15:41 ubuntu-squid2 kernel: [634840.933319]  CIFS VFS: Write2 ret -11, wrote 0
May 28 21:15:42 ubuntu-squid2 kernel: [634842.438332]  CIFS VFS: Send error in Close = -9
May 28 21:15:42 ubuntu-squid2 kernel: [634842.439220]  CIFS VFS: Send error in Close = -9
May 28 21:15:42 ubuntu-squid2 kernel: [634842.440525]  CIFS VFS: Send error in Close = -9
May 28 21:15:42 ubuntu-squid2 kernel: [634842.441542]  CIFS VFS: Send error in Close = -9
May 28 21:15:43 ubuntu-squid2 kernel: [634842.984735]  CIFS VFS: Send error in Close = -9
May 28 21:15:43 ubuntu-squid2 kernel: [634842.985963]  CIFS VFS: Send error in Close = -9
May 28 21:15:43 ubuntu-squid2 kernel: [634842.986713]  CIFS VFS: Send error in Close = -9
May 28 21:15:43 ubuntu-squid2 kernel: [634842.987439]  CIFS VFS: Send error in Close = -9
Примонтированы 2 шары с win2k сервера, на которые качаются торренты

sudo mount -o smbfs //1.2.3.4/pub/ /media/pub/ -o user=user,password=password,uid=linuxuser,gid=linuxuser,dir_mode=0770,file_mode=0660,iocharset=utf8
В чем причина?.. CIFS смертельно подавился?.. Если да, как можно этого избежать в будущем? Физически ubuntu и w2k находятся в одной серверной стойке, на виртуальных машинах, т.е. с сетью вроде как проблем быть не должно.

16.04.1 4.4.0-59-generic x86_64 GNU/Linux

We recently created several centOS 8 serveur for running some springboot applications, which mount several directories from
a windows server. Mount is working, but we are getting lots of errors at random time of the day

sample dmesg output:

[Wed Oct 21 15:31:21 2020] CIFS VFS: Send error in SessSetup = -11
[Wed Oct 21 18:59:56 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 07:37:51 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 08:28:37 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 08:49:07 2020] CIFS VFS: Server XXXXXXX has not responded in 120 seconds. Reconnecting...
[Thu Oct 22 11:07:38 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 11:07:52 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 14:20:52 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 17:22:24 2020] CIFS VFS: Send error in SessSetup = -11
[Fri Oct 23 06:33:03 2020] CIFS VFS: Send error in SessSetup = -11
[Fri Oct 23 06:35:03 2020] CIFS VFS: Server XXXXXXX has not responded in 120 seconds. Reconnecting...
[Fri Oct 23 08:27:45 2020] CIFS VFS: Send error in SessSetup = -11
[Fri Oct 23 08:28:06 2020] CIFS VFS: Send error in SessSetup = -11

The issue does not arise when we mount the same dir from centOS 7

Mount

centOS 7

//SERVER/DATA on /mnt/data type cifs (rw,nosuid,noexec,relatime,vers=3.0,cache=strict,username=data_user,domain=xxx.xxx.xxx,uid=1000,forceuid,gid=1000,forcegid,addr=xxx.xx.x.xx,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

centOS 8

//SERVER/DATA on /mnt/data type cifs (rw,nosuid,noexec,relatime,vers=3.0,cache=strict,username=data_user,domain=xxx.xxx.xxx,uid=1000,forceuid,gid=1000,forcegid,addr=xxx.xx.x.xx,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,bsize=65536,echo_interval=60,actimeo=1)

Kernel and package version

+-----------+-----------------------------+-----------------------------+
|   prop    |          CentOS 7           |          centOS 8           |
+-----------+-----------------------------+-----------------------------+
| kernel    | 3.10.0-1127.18.2.el7.x86_64 | 4.18.0-147.8.1.el8_1.x86_64 |
| cifs-util | 6.2-10.el7                  | 6.8-3.el8                   |
+-----------+-----------------------------+-----------------------------+

Here is the exact sequence leading to the error on CentOS 8, with cifs debug enabled:

[Thu Oct 22 10:04:58 2020] fs/cifs/dir.c: name: data
[Thu Oct 22 10:04:58 2020] fs/cifs/dir.c: name: data
[Thu Oct 22 10:04:58 2020] fs/cifs/inode.c: Update attributes: data inode 0x00000000bc98fea9 count 1 dentry: 0x00000000bd47d1e0 d_time 15148690417 jiffies 15179170239
[Thu Oct 22 10:04:58 2020] fs/cifs/inode.c: Getting info on data
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2ops.c: set credits to 1 due to smb2 reconnect
[Thu Oct 22 10:04:58 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: In echo request
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: Need negotiate, reconnecting tcons
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: Negotiate protocol
[Thu Oct 22 10:04:58 2020] fs/cifs/transport.c: Sending smb: smb_len=106
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: Negotiate protocol
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: RFC1002 header 0xf8
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2misc.c: SMB2 data length 120 offset 128
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2misc.c: SMB2 len 248
[Thu Oct 22 10:04:58 2020] fs/cifs/transport.c: cifs_sync_mid_result: cmd=0 mid=0 state=4
[Thu Oct 22 10:04:58 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: mode 0x1
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: negotiated smb3.0 dialect
[Thu Oct 22 10:04:58 2020] fs/cifs/asn1.c: OID len = 10 oid = 0x1 0x3 0x6 0x1
[Thu Oct 22 10:04:58 2020] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0xbb92
[Thu Oct 22 10:04:58 2020] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0x1bb92
[Thu Oct 22 10:04:58 2020] fs/cifs/asn1.c: OID len = 8 oid = 0x1 0x2 0x348 0x1bb92
[Thu Oct 22 10:04:58 2020] fs/cifs/asn1.c: OID len = 10 oid = 0x1 0x3 0x6 0x1
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Security Mode: 0x1 Capabilities: 0x300067 TimeAdjust: 0
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Free previous auth_key.response = 00000000a46e9e42
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: Session Setup
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: sess setup type 4
[Thu Oct 22 10:04:58 2020] fs/cifs/transport.c: Sending smb: smb_len=124
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Received no data or error: -104
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_reconnect: will retry 0 target(s)
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Mark tcp session as need reconnect
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons for reconnect
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_reconnect: tearing down socket
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: State: 0x3 Flags: 0x0
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: Socket created
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[Thu Oct 22 10:04:58 2020] fs/cifs/transport.c: cifs_sync_mid_result: cmd=1 mid=0 state=8
[Thu Oct 22 10:04:58 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:04:58 2020] CIFS VFS: Send error in SessSetup = -11
[Thu Oct 22 10:04:58 2020] fs/cifs/connect.c: cifs_put_tcon: tc_count=2
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2ops.c: set credits to 1 due to smb2 reconnect
[Thu Oct 22 10:04:58 2020] fs/cifs/smb2pdu.c: In echo request
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2pdu.c: Negotiate protocol
[Thu Oct 22 10:05:09 2020] fs/cifs/transport.c: Sending smb: smb_len=106
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: RFC1002 header 0xf8
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2misc.c: SMB2 data length 120 offset 128
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2misc.c: SMB2 len 248
[Thu Oct 22 10:05:09 2020] fs/cifs/transport.c: cifs_sync_mid_result: cmd=0 mid=0 state=4
[Thu Oct 22 10:05:09 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2pdu.c: mode 0x1
[Thu Oct 22 10:05:09 2020] fs/cifs/transport.c: Sending smb: smb_len=106
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2pdu.c: negotiated smb3.0 dialect
[Thu Oct 22 10:05:09 2020] fs/cifs/asn1.c: OID len = 10 oid = 0x1 0x3 0x6 0x1
[Thu Oct 22 10:05:09 2020] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0xbb92
[Thu Oct 22 10:05:09 2020] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0x1bb92
[Thu Oct 22 10:05:09 2020] fs/cifs/asn1.c: OID len = 8 oid = 0x1 0x2 0x348 0x1bb92
[Thu Oct 22 10:05:09 2020] fs/cifs/asn1.c: OID len = 10 oid = 0x1 0x3 0x6 0x1
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: Security Mode: 0x1 Capabilities: 0x300067 TimeAdjust: 0
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2pdu.c: Session Setup
[Thu Oct 22 10:05:09 2020] fs/cifs/smb2pdu.c: sess setup type 4
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: Received no data or error: -104
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: cifs_reconnect: will retry 0 target(s)
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: Mark tcp session as need reconnect
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons for reconnect
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: cifs_reconnect: tearing down socket
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: State: 0x3 Flags: 0x0
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks
[Thu Oct 22 10:05:09 2020] fs/cifs/transport.c: cifs_sync_mid_result: cmd=0 mid=1 state=8
[Thu Oct 22 10:05:09 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: Socket created
[Thu Oct 22 10:05:09 2020] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[Thu Oct 22 10:05:09 2020] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[Thu Oct 22 10:05:09 2020] fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate_dentry_attr (xid = 35385880) rc = -112
[Thu Oct 22 10:05:09 2020] fs/cifs/inode.c: CIFS VFS: in cifs_revalidate_dentry_attr as Xid: 35385881 with uid: 1004

Windows server is a windows 2016 server. The only logs that appears on this server at the same time are these:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
   <System>
      <Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
      <EventID Qualifiers="16384">7036</EventID>
      <Version>0</Version>
      <Level>4</Level>
      <Task>0</Task>
      <Opcode>0</Opcode>
      <Keywords>0x8080000000000000</Keywords>
      <TimeCreated SystemTime="2020-10-16T10:42:59.636088900Z" />
      <Channel>System</Channel>
      <Computer>XXX.xxx.xxx</Computer>
   </System>
   <EventData>
      <Data Name="param1">Connection Assistant with a Microsoft account</Data>
      <Data Name="param2">stopped</Data>
      <Binary>77006C00690064007300760063002F0031000000</Binary>
   </EventData>
</Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
   <System>
      <Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
      <EventID Qualifiers="16384">7036</EventID>
      <Version>0</Version>
      <Level>4</Level>
      <Task>0</Task>
      <Opcode>0</Opcode>
      <Keywords>0x8080000000000000</Keywords>
      <TimeCreated SystemTime="2020-10-16T10:42:20.692575900Z" />
      <Channel>System</Channel>
      <Computer>XXX.xxx.xxx</Computer>
   </System>
   <EventData>
      <Data Name="param1">Client licenses service (ClipSVC)</Data>
      <Data Name="param2">stopped</Data>
      <Binary>43006C00690070005300560043002F0031000000</Binary>
   </EventData>
</Event>

After looking at open sessions in windows server, we noticed that user sessions from centOS 8 are automatically disconnected every 30 minutes, whereas user session from centOS 7 are note disconnected

Hi,

I have a fileserver which serves some shares via SMB. I mount these with

//IP_Adress/data /media/data cifs credentials=/path/to/credentials,iocharset=utf8,uid=1000,gid=1000 0 0

Now when I copy large files or a bunch of smaller files (can’t name an exact file size but mostly >800MB) I always get an «Input/Output Error». I tried to copy the files with Thunar, PCManFm and via CLI, it’s always the same. Also the whole system stalls/is very sluggish when copying, I sometimes can’t move the mouse coursor….
It sometimes works when I copy the files one by one but if it’s one large file I get the I/O error again.
The fileserver is a freenas box, don’t know if this is related to the problem….

Any ideas how to narrow this down?

Thanks.

EDIT:
Got a bunch of messages in the logs:

[ 3171.825414] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
[ 3171.825424] CIFS VFS: Send error in read = -9
[ 3176.885044] CIFS VFS: Send error in read = -11

Will try to look into that.

EDIT2:

Found this blogentry which solved the problem so far.

echo 0 > /proc/fs/cifs/OplockEnabled

This will disable Opportunistic Locking.

The entry is 3 years old. So would it help to fill a bug, Arch Bugtracker, upstream?

Last edited by Urfaust (2011-09-11 15:07:49)

hope I can get some help. I have to be honest up-front, I do not have great linux skills — still learning.
I recently upgraded to 20.04.1. I don’t know if these errors started right after that, or after something else I had done.

kernel: [258605.922058] CIFS: VFS: \192.168.2.107 Send error in SessSetup = -11

I’ve tried searching for what -11 means, but I can’t seem to find what this error is indicating. I know these errors weren’t there prior to 20.04.1 LTS upgrade.

This is the statements for my fstab mounts: I’ve been tweaking them quite a bit last few days trying to resolve the error…

//192.168.2.107/Multimedia/New /home/New/ cifs credentials=/etc/.kreditiv3,_netdev,vers=2.0,nobrl,iocharset=utf8 0 0
//192.168.2.107/Multimedia/Parents /home/Multimedia/Parents/ cifs credentials=/etc/.kreditiv3,_netdev,vers=2.0,nobrl,iocharset=utf8 0 0
//192.168.2.94/video/Movies /home/movies/Movies cifs credentials=/etc/.kreditiv1,_netdev,vers=2.0,nobrl,iocharset=utf8 0 0
//192.168.2.107/calibre/ /home/calibre-library cifs nobrl,credentials=/etc/.kreditiv3,uid=rob,gid=family,_netdev,vers=2.0,nobrl,iocharset=utf8,sec=ntlmssp 0 0
//192.168.2.27/movies /home/movies/oldies cifs users,credentials=/etc/.kreditiv2,uid=root,gid=root,_netdev,vers=1.0,nobrl,iocharset=utf8 0 0
//192.168.2.94/NetBackup /home/backup cifs credentials=/etc/.kreditiv1,_netdev,vers=2.0,nobrl,iocharset=utf8 0 0
//192.168.2.94/Audiobooks /home/audiobooks cifs credentials=/etc/.kreditiv1,_netdev,vers=2.0,nobrl,iocharset=utf8 0 0

Everything seems to actually work, except for the fact I’m getting lots of errors. It seems to give the error a couple of times every hour, but I don’t know why, or which mount. Without disabling a mount for a few hours, and breaking my applications (sonarr/radarr/plex/calibre), is there something I can check?
The mounts are physically on external raid boxes (qnap and synology devices).


Description


Emily Brantley



2006-08-14 19:37:31 UTC

Description of problem:

A share mounted using the CIFS mounts correctly.  Files can be viewed, listed,
created, written to.  Files cannot be read.  Kernel dmesg log reports only:  

"CIFS VFS: Send error in read = -13"


Version-Release number of selected component (if applicable):

$ uname -r
2.6.17-1.2174_FC5smp

Steps to Reproduce:
1. mount -t cifs //192.168.1.14/share /media/share -o user=user -o password=pass 
2. cd /media/share
3. cat file.txt
  
Actual results:

$ cat file.txt
cat: two girls.txt: Permission denied

Expected results:

$ cat file.txt
[file contents here...]

Additional info:

CIFS shares browsed other ways (SMB kioslave, gnome-vfs) work normally.


Comment 1


Emily Brantley



2006-08-14 19:40:43 UTC

Sorry, when I pasted the "Actual results":

$ cat file.txt
cat: two girls.txt: Permission denied

I accidentally pasted the name of an actual file on my share instead of some
generic file like I was trying to do (a story on my drive).  It's meant to be
"file.txt" like the rest, not just some mistake.

Also, trying to do cat as root makes no difference.


Comment 2


Christopher Brown



2006-08-28 12:34:42 UTC

Hey Emily,

I was seeing this also. This is a known bug upstream which is fixed in 3.0.23b.
You can get this here:

http://us1.samba.org/samba/ftp/Binary_Packages/Fedora/RPMS/i386/core/5/

however this should really be in the updates repo as I have seen quite a few
people having problems with this and I was banging my head against the wall with
it for a while. I can confirm that downloading, updating and using the above on
the samba server fixes the problem for me.

This is not a samba issue, not a kernel one and I would suggest you change the
component to reflect that above.


Comment 3


Peter Bieringer



2006-09-28 13:14:41 UTC

I have here similar error message, but a different scenario and a different result:

Client: FC5 running kernel-2.6.17-1.2187_FC5 (and earlier)
Server: RHEL4 running samba-3.0.10-1.4E.9

Error message on client side:
kernel:  CIFS VFS: Send error in read = -13

Server side shows in debug log:
  error packet at smbd/reply.c(4602) cmd=36 (SMBlockingX) NT_STATUS_RANGE_NOT_LOCKED
  error packet at smbd/reply.c(2212) cmd=46 (SMBreadX) NT_STATUS_FILE_LOCK_CONFLICT



Symptom:
Opening a document with OpenOffice on client via mounted cifs share works. After
modifying the file, during first time on pressing "save", a message appears like
 "can't create backup copy" and upper shown message appears in kernel log. After
pressing "save" button again, no such message appears (and "save" is successfully).

Can one point me to whether it's more a kernel (client side) or samba (server
side) related problem


Comment 4


Dave Jones



2006-10-16 20:08:59 UTC

A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.


Comment 5


Peter Bieringer



2006-10-19 15:28:30 UTC

Still happen with 2.6.18-1.2200.fc5


Comment 6


Christopher Brown



2006-10-19 15:36:34 UTC

I do not see this error any longer. It disappeared once I installed the rpm from
the samba ftp and I am now running the latest samba update from the fedora
repos. Perhaps it is a config file or packaging issue? Would someone like to
test by removing samba and trashing all related files and re-install?

Just a thought.

Regards
Chris


Comment 7


Peter Bieringer



2006-10-19 16:05:44 UTC

Server side is running RHEL4 with samba-3.0.10-1.4E.9
Which samba version do you use? Perhaps we have to file a bug against the samba
version of RHEL4.


Comment 8


Dave Jones



2006-10-19 18:03:15 UTC

This does sound more like a userspace server-side samba daemon problem.
You'll need to open a separate bug for RHEL4 if this does turn out to be the case.



Comment 10


Jose Plans



2006-10-21 17:23:18 UTC

Hello, 
 it's not clear to me how to reproduce your problem.
Could you share your smb.conf with us and describe samba versions that work and
doesnt ?

--
kernel:  CIFS VFS: Send error in read = -13 -> just states the original report
on permission denied.
--

and for the samba logs I wonder whether cifs tried to unlock ranges that werent
locked previously, given : 

--
  error packet at smbd/reply.c(4602) cmd=36 (SMBlockingX) NT_STATUS_RANGE_NOT_LOCKED
  error packet at smbd/reply.c(2212) cmd=46 (SMBreadX) NT_STATUS_FILE_LOCK_CONFLICT
--

So for now, I would suggest to attach to the RHEL4 bug a smb.conf file that
reproduces the problem. and see if you can get some more detailed logs using
"log level = 10" so we can see this happeneing. 10 as the locks are logged at
that level. 
thanks,
       Jose


Comment 13


Peter Bieringer



2006-12-13 17:02:28 UTC

Steps producing the log:

- start samba server
- mount directory on FC6
- open test.odt with openoffice
- change test.odt
- first try to save test.odt -> error message
- second try to save test.odt -> works
- unmount directory on FC6

FC6 kernel tells:
Dec 13 17:25:41 client kernel:  CIFS VFS: Send error in SETFSUnixInfo = -5
Dec 13 17:26:00 client kernel:  CIFS VFS: Send error in read = -13
Dec 13 17:26:00 client last message repeated 7 times



Comment 14


Christopher Brown



2006-12-13 17:03:09 UTC

I no longer see this.


Comment 15


Simo Sorce



2007-03-14 15:11:32 UTC

Should be fixed with latest updates, reopen if not.


Comment 16


Peter Bieringer



2007-03-14 15:17:42 UTC

This reminds me that this issue is currently still not solved for RHEL4 server
side, the bug hits me every day now (Fedora Core 6 client, RHEL4 server) during
saving OpenOffice files, see also:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=221852
Hopefully, Red Hat will now take care the RHEL4 serverside also.

|
2008-08-02

On a few systems, I’ve noticed CIFS mounts have a tendency to lock up the system when transferring large files, or a shitload of small ones in quick succession. When this happens, the system may or may not completely lock up, and lines like these will appear in your syslog:

Aug 1 22:58:16 dreadnought kernel: CIFS VFS: No response to cmd 46 mid 25661 
Aug 1 22:58:16 dreadnought kernel: CIFS VFS: Send error in read = -11
Aug 1 22:58:16 dreadnought kernel: CIFS VFS: No response for cmd 50 mid 25664
Aug 1 22:58:16 dreadnought kernel: CIFS VFS: No response to cmd 47 mid 25663
Aug 1 22:58:20 dreadnought kernel: CIFS VFS: No response to cmd 47 mid 25662
Aug 1 22:58:20 dreadnought kernel: CIFS VFS: Write2 ret -11, written = 0
Aug 1 22:58:20 dreadnought kernel: CIFS VFS: Send error in read = -9
Aug 1 22:58:20 dreadnought kernel: CIFS VFS: Write2 ret -9, written = 0
Aug 1 22:59:52 dreadnought kernel: CIFS VFS: No writable handles for inode
Aug 1 23:00:29 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:00:36 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:01:16 dreadnought kernel: CIFS VFS: No writable handles for inode
Aug 1 23:02:28 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:03:23 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:03:44 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:05:35 dreadnought kernel: CIFS VFS: No writable handles for inode
Aug 1 23:06:18 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:06:35 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:06:46 dreadnought kernel: CIFS VFS: No writable handles for inode
Aug 1 23:06:46 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:07:02 dreadnought kernel: CIFS VFS: close with pending writes
Aug 1 23:07:46 dreadnought kernel: CIFS VFS: server not responding

I found a workaround which involves disabling Opportunistic Locking. This will have a small performance impact, but at least the thing wont hang up and use 100% CPU for no goddamn reason. Yes, this article has an angry tone, because this issue has eaten way too much of my time today.

To disable Opportunistic Locking, you have to set the contents of the file “/proc/fs/cifs/OplockEnabled” to 0. This file only exists after the cifs module has been loaded, and will be replaced if the module is unloaded and reloaded. Thus, I configured my /etc/rc.local script (use whatever equivalent your distro has to a startup script) to both load the cifs module and set the contents of said file:

# cifs client workaround
modprobe cifs
echo 0 > /proc/fs/cifs/OplockEnabled

There you go.

Понравилась статья? Поделить с друзьями:
  • Cifs mount error 112 host is down
  • Cid font dictionary error
  • Ciblockelement add error
  • Ci модуль ошибка нет доступа
  • Cisco anyconnect secure gateway internal error