Volker Lendecke
2012-02-15 15:59:41 UTC
Hi!
Under
http://git.samba.org/?p=vl/samba.git/.git;a=shortlog;h=refs/heads/dbwrap_record_watch
you will find a patchset that I've been working on for a while now.
It implements the following API:
struct tevent_req *dbwrap_record_watch_send(TALLOC_CTX *mem_ctx,
                                            struct tevent_context *ev,
                                            struct db_record *rec,
                                            struct messaging_context *msg);

NTSTATUS dbwrap_record_watch_recv(struct tevent_req *req,
                                  TALLOC_CTX *mem_ctx,
                                  struct db_record **prec);
The central idea is that you can asynchronously wait for a
dbwrap based tdb record to change. The top commit in the git
branch converts the g_lock code to this new API.
This simplifies the g_lock implementation. The new
implementation tries to acquire a lock. If that fails due
to a lock conflict, it waits for the g_lock record to change.
Upon a change, it just tries again. The old logic had to cope
with pending records and an ugly hack into ctdb itself. As
a bonus, we now get a really clean async
g_lock_lock_send/recv that can asynchronously wait for a
global lock. This would have been almost impossible to do
without the dbwrap_record_watch infrastructure.
Just for the g_lock implementation alone it would not have been
worth the trouble to implement that API and the
infrastructure around it, but if you look at our share mode
and oplock implementation, a lot of the custom smbd messages
can be replaced by the new API. For example, we have a
special message to inform a second opener about an oplock
being released. This can be simplified by sending a message
to the oplock holder and then watching the record for a
change. After a change, just retry. The same holds true for timed byte
range locks and a few others.
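To make the pattern concrete, here is a rough sketch of how a
blocked opener could use the new calls. Only
dbwrap_record_watch_send/recv are taken from the API above; the
state struct, the helper names and the use of MSG_SMB_BREAK_REQUEST
as the poke to the oplock holder are illustrative assumptions, not
code from the patchset:

#include "includes.h"

struct wait_for_break_state {
	uint8_t dummy;
};

static void wait_for_break_done(struct tevent_req *subreq);

struct tevent_req *wait_for_break_send(TALLOC_CTX *mem_ctx,
				       struct tevent_context *ev,
				       struct messaging_context *msg,
				       struct db_record *rec,
				       struct server_id holder)
{
	struct tevent_req *req, *subreq;
	struct wait_for_break_state *state;

	req = tevent_req_create(mem_ctx, &state,
				struct wait_for_break_state);
	if (req == NULL) {
		return NULL;
	}

	/*
	 * Start watching the record before poking the holder, so
	 * that we cannot miss the change.
	 */
	subreq = dbwrap_record_watch_send(state, ev, rec, msg);
	if (tevent_req_nomem(subreq, req)) {
		return tevent_req_post(req, ev);
	}
	tevent_req_set_callback(subreq, wait_for_break_done, req);

	/* Ask the oplock holder to give up its oplock. */
	messaging_send(msg, holder, MSG_SMB_BREAK_REQUEST,
		       &data_blob_null);
	return req;
}

static void wait_for_break_done(struct tevent_req *subreq)
{
	struct tevent_req *req = tevent_req_callback_data(
		subreq, struct tevent_req);
	struct db_record *rec;
	NTSTATUS status;

	status = dbwrap_record_watch_recv(subreq, req, &rec);
	TALLOC_FREE(subreq);
	if (tevent_req_nterror(req, status)) {
		return;
	}
	/*
	 * The record changed: the caller re-inspects it and simply
	 * retries the open.
	 */
	tevent_req_done(req);
}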
What fell out of this work is the start of a reworked
messaging API: we now have msg_read_send/recv, a tevent_req
based version of messaging_register.
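The exact prototypes are in the branch; purely for illustration
(the msg_read_send/recv parameter lists below are guesses, not
authoritative), a one-shot receive would look roughly like this:

/*
 * Wait for exactly one message as a tevent_req, instead of keeping
 * a handler registered via messaging_register.
 */
static void one_msg_done(struct tevent_req *subreq);

static void wait_for_one_message(TALLOC_CTX *mem_ctx,
				 struct tevent_context *ev,
				 struct messaging_context *msg)
{
	struct tevent_req *subreq;

	subreq = msg_read_send(mem_ctx, ev, msg,
			       MSG_SMB_BREAK_RESPONSE);
	if (subreq == NULL) {
		return;
	}
	tevent_req_set_callback(subreq, one_msg_done, NULL);
}

static void one_msg_done(struct tevent_req *subreq)
{
	struct messaging_rec *rec = NULL;
	int ret;

	ret = msg_read_recv(subreq, talloc_tos(), &rec);
	TALLOC_FREE(subreq);
	if (ret != 0) {
		return;
	}
	/* Act on rec->buf here, then free it. */
	TALLOC_FREE(rec);
}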
One consequence of using this API throughout smbd would be a
vastly improved cleanup behaviour after a crashed smbd.
Right now we have custom code to periodically walk the
brlock database. We do not have code to walk locking.tdb,
for good reason: you just don't want to traverse a database
of 100,000 open files when maybe 100 of those are waiting
for an oplock break. By using dbwrap_watchers.tdb for
everyone waiting for a change, it becomes much more feasible
to walk this whole db and wake up all waiters whenever an
smbd dies or a node goes down.
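As a sketch of what that cleanup walk could look like (the record
layout of dbwrap_watchers.tdb as a plain array of struct server_id
and the MSG_DBWRAP_MODIFIED message type are assumptions for
illustration; only the walk-and-wake idea is from the text above):

static int wakeup_all_fn(struct db_record *rec, void *private_data)
{
	struct messaging_context *msg = talloc_get_type_abort(
		private_data, struct messaging_context);
	TDB_DATA value = dbwrap_record_get_value(rec);
	struct server_id *waiters = (struct server_id *)value.dptr;
	size_t i, num_waiters = value.dsize / sizeof(struct server_id);

	for (i = 0; i < num_waiters; i++) {
		/*
		 * Waking a waiter just makes it re-read the record
		 * and retry, so spurious wakeups are harmless.
		 */
		messaging_send(msg, waiters[i], MSG_DBWRAP_MODIFIED,
			       &data_blob_null);
	}
	return 0;
}

/* Called when an smbd died or a node went down. */
void wakeup_all_watchers(struct db_context *watchers_db,
			 struct messaging_context *msg)
{
	NTSTATUS status;

	status = dbwrap_traverse(watchers_db, wakeup_all_fn,
				 msg, NULL);
	if (!NT_STATUS_IS_OK(status)) {
		DEBUG(1, ("walking dbwrap_watchers.tdb failed: %s\n",
			  nt_errstr(status)));
	}
}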
This patchset is not perfect yet. One piece still missing
is proper cleanup: the code does not yet clean up
stale entries when a waiter dies hard.
Comments?
Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt@sernet.de