urcu.git
Christian Babeux [Thu, 8 Nov 2012 19:30:08 +0000 (14:30 -0500)] 
Fallback mechanism not working on platform where TLS is unsupported

The CONFIG_RCU_TLS entry in config.h.in is defined by default to "TLS".
This has the unfortunate consequence of defining CONFIG_RCU_TLS on
platforms where TLS is unsupported, effectively disabling the
pthread-based fallback mechanism. This macro should be #undef by default,
so that the AX_TLS m4 macro can properly detect whether TLS is supported.
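
As an illustration only, the selection between compiler TLS and the
pthread-based fallback can be sketched as below. Only CONFIG_RCU_TLS comes
from this commit; every demo_* name is hypothetical, and the sketch assumes
that configure defines CONFIG_RCU_TLS to the TLS storage keyword (for
example __thread) when TLS is supported and leaves it undefined otherwise.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumption: configure would emit this line only when TLS is supported. */
/* #define CONFIG_RCU_TLS __thread */

#ifdef CONFIG_RCU_TLS
/* Fast path: the compiler provides thread-local storage. */
static CONFIG_RCU_TLS int demo_counter;

static int *demo_counter_ptr(void)
{
	return &demo_counter;
}
#else
/* Fallback path: one heap-allocated slot per thread, found through a key. */
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_key_init(void)
{
	if (pthread_key_create(&demo_key, free))
		abort();
}

static int *demo_counter_ptr(void)
{
	int *p;
	int ret;

	ret = pthread_once(&demo_once, demo_key_init);
	assert(ret == 0);
	(void) ret;
	p = pthread_getspecific(demo_key);
	if (!p) {
		/* First access from this thread: allocate a zeroed slot. */
		p = calloc(1, sizeof(*p));
		if (!p || pthread_setspecific(demo_key, p))
			abort();
	}
	return p;
}
#endif
```

If the macro is defined unconditionally, the #else branch can never be
compiled, which is the failure mode described above.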

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 7 Nov 2012 20:22:57 +0000 (15:22 -0500)] 
Revert "Fix: cross-build: configure.ac should use --target, not --host"

This reverts commit 1eade46a854eb8211be9fd32e0cf6835576deb63.

No. --target is for building cross-compilers. --host was appropriate.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 7 Nov 2012 20:09:28 +0000 (15:09 -0500)] 
Fix: cross-build: configure.ac should use --target, not --host

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 4 Nov 2012 18:04:40 +0000 (13:04 -0500)] 
test_urcu_wfcq: add splice and nosync tests

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 4 Nov 2012 18:03:59 +0000 (13:03 -0500)] 
test_urcu_wfs: cleanup

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 4 Nov 2012 18:03:32 +0000 (13:03 -0500)] 
test_urcu_lfs: cleanup

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 22:34:40 +0000 (18:34 -0400)] 
Fix static linking: add missing static for _defer_rcu

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 21:56:04 +0000 (17:56 -0400)] 
tests: report error value for make check

exit 1 as soon as a test fails.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 21:50:24 +0000 (17:50 -0400)] 
Add multiflavor test program

Add a multiflavor test program to catch symbol name clashes earlier next
time.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 21:49:39 +0000 (17:49 -0400)] 
Fix static linking: fix symbol name namespaces

gp_futex, yield_active, rand_yield, has_sys_membarrier, rcu_defer_exit,
call_rcu_data_free, call_rcu_before_fork, call_rcu_after_fork_parent,
call_rcu_after_fork_child are exported by each urcu flavor.

In order to fix use-cases where multiple flavors are statically linked
into the same application, we need to move these symbols to local
namespaces.

Ensure that all symbols are prefixed by "rcu_".

Also add each of those symbols into urcu/map/*.h headers, so they get
mapped to their flavor-specific symbol name by the preprocessor.

This requires bumping our .so version from 1.0.0 to 2.0.0, because it
changes some symbol names.
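
A minimal sketch of the mapping idea, assuming two hypothetical flavors
("memb" and "qsbr") linked into one program. The rcu_demo_* names are
invented for illustration and are not the symbols touched by this commit.

```c
#include <assert.h>

/* Each flavor exports its own, distinctly named copy of the symbol. */
int rcu_demo_gp_futex_memb = 1;
int rcu_demo_gp_futex_qsbr = 2;

/*
 * Hypothetical map header for the "memb" flavor: code written against
 * the generic name is redirected to the flavor-specific symbol by the
 * preprocessor, so the linker never sees two objects with one name.
 */
#define rcu_demo_gp_futex rcu_demo_gp_futex_memb

int demo_read_gp_futex(void)
{
	/* The generic name resolves to this flavor's own object. */
	assert(&rcu_demo_gp_futex == &rcu_demo_gp_futex_memb);
	return rcu_demo_gp_futex;
}
```

Without the per-flavor suffix, both definitions would carry the same
external name and a static link of the two flavors would fail or silently
share state.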

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 20:37:04 +0000 (16:37 -0400)] 
Fix static linking: add missing static to thr_defer

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 1 Nov 2012 20:33:01 +0000 (16:33 -0400)] 
Fix static linking: add missing static

update_counter_and_wait and call_rcu_data_list are only used locally.
Add the static keyword to ensure their symbols are not exported. This
helps fix static linking of multiple URCU flavors into the same program.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 15:40:37 +0000 (11:40 -0400)] 
deprecation: fix build with gcc < 4.5

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 15:22:56 +0000 (11:22 -0400)] 
wfstack.c: update copyright notice

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 15:02:27 +0000 (11:02 -0400)] 
Update wfstack copyright notice

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 15:00:30 +0000 (11:00 -0400)] 
Comment fix: update associated LGPL header name

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 12:53:56 +0000 (08:53 -0400)] 
Update cds-api.txt following API deprecations

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 12:43:33 +0000 (08:43 -0400)] 
Deprecate wfqueue

Replaced by "wfcqueue", whose semantics allow placing head and tail on
different cache lines, and which does not allocate memory internally.
wfqueue users can easily migrate to wfcqueue.

We choose to deprecate wfqueue rather than reimplementing it on top of
wfcqueue to ensure we keep strong ABI compatibility for existing wfqueue
users.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 23 Oct 2012 12:36:42 +0000 (08:36 -0400)] 
Deprecate rculfstack

Replaced by "lfstack", which has less restrictive semantics and covers
rculfstack completely.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 22 Oct 2012 12:55:22 +0000 (08:55 -0400)] 
wfcqueue: introduce nonblocking API

Introduce nonblocking API in wfcqueue, allowing RT threads to try to
dequeue, splice, or iterate on spliced queues without blocking: the
caller needs to handle CDS_WFCQ_WOULDBLOCK return value (or nonzero
return value for splice).
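
The caller-side contract can be sketched as follows. This is not the
wfcqueue code: DEMO_WOULDBLOCK is a hypothetical sentinel in the spirit of
CDS_WFCQ_WOULDBLOCK, and the sketch assumes the caller already knows that
the node is not the tail, so a NULL next pointer means "an enqueuer has not
linked its node yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_nb_node {
	_Atomic(struct demo_nb_node *) next;
};

/* Hypothetical sentinel: "retry later", distinct from any valid node. */
#define DEMO_WOULDBLOCK ((struct demo_nb_node *) (uintptr_t) -1)

/* Return the next node if it is published, never busy-wait. */
static struct demo_nb_node *demo_next_nonblocking(struct demo_nb_node *node)
{
	struct demo_nb_node *next;

	assert(node != NULL);
	next = atomic_load(&node->next);
	return next ? next : DEMO_WOULDBLOCK;
}

/*
 * Caller-side handling: try a bounded number of times, then report
 * DEMO_WOULDBLOCK so a real-time thread can go do other work.
 */
static struct demo_nb_node *demo_next_bounded(struct demo_nb_node *node,
		int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		struct demo_nb_node *next = demo_next_nonblocking(node);

		if (next != DEMO_WOULDBLOCK)
			return next;
	}
	return DEMO_WOULDBLOCK;
}
```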

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Paul McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Mathieu Desnoyers [Fri, 12 Oct 2012 13:51:41 +0000 (09:51 -0400)] 
lfstack: test pop_all and pop

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 12 Oct 2012 13:30:15 +0000 (09:30 -0400)] 
lfstack: implement empty, pop_all and iterators, document API

We are changing the ABI by adding a mutex into struct cds_lfs_stack.
This ABI has never been exposed in a release so far, so we can change
it.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 20:44:40 +0000 (16:44 -0400)] 
lfstack: implement test

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 19:08:57 +0000 (15:08 -0400)] 
lfstack: implement lock-free stack

This stack does not require holding the RCU read-side lock across push,
and allows multiple strategies to be used for pop.
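
For readers unfamiliar with the technique, here is a generic lock-free
stack sketch written with C11 atomics. It is not the lfstack code, and the
demo_lfs_* names are hypothetical. It shows why push needs no read-side
lock (it never dereferences the old head) and one simple pop strategy:
taking the whole stack at once with an exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_lfs_node {
	struct demo_lfs_node *next;
	int value;
};

struct demo_lfs_stack {
	_Atomic(struct demo_lfs_node *) head;
};

static int demo_lfs_empty(struct demo_lfs_stack *s)
{
	return atomic_load(&s->head) == NULL;
}

/* Push: retry the compare-and-swap until the head is unchanged. */
static void demo_lfs_push(struct demo_lfs_stack *s, struct demo_lfs_node *node)
{
	struct demo_lfs_node *old;

	assert(node != NULL);
	old = atomic_load(&s->head);
	do {
		/* On failure, "old" is refreshed with the current head. */
		node->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
}

/* Pop strategy "pop_all": detach the entire list in one exchange. */
static struct demo_lfs_node *demo_lfs_pop_all(struct demo_lfs_stack *s)
{
	return atomic_exchange(&s->head, NULL);
}
```

Popping a single node is where strategies differ, because a plain
compare-and-swap pop has to deal with concurrent node reuse (the ABA
problem); that part is deliberately left out of this sketch.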

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 13 Oct 2012 02:11:49 +0000 (22:11 -0400)] 
wfstack: implement pop_all and iteration tests

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sat, 13 Oct 2012 01:47:05 +0000 (21:47 -0400)] 
wfstack: implement cds_wfs_pop_all and iterators, document API

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 22 Oct 2012 22:17:24 +0000 (18:17 -0400)] 
rculfhash test: fix trivial memleak and return node leak and errors

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 22 Oct 2012 21:37:38 +0000 (17:37 -0400)] 
rculfhash: add missing extern

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 22 Oct 2012 21:34:31 +0000 (17:34 -0400)] 
Cleanup: fix cppcheck errors

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 14 Oct 2012 15:59:31 +0000 (11:59 -0400)] 
wfcqueue: remove ancient comment

This comment is a leftover from wfqueue and is now inappropriate in the
context of wfcqueue: the dequeue operation busy-waits if it sees a NULL
next pointer from a node that is not the tail node.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Sat, 13 Oct 2012 16:48:54 +0000 (12:48 -0400)] 
test_urcu_lfq: remove rcu_defer_register_thread() from test_urcu_lfq

test_urcu_lfq has already switched to call_rcu(), so
rcu_defer_register_thread() is no longer needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Sat, 13 Oct 2012 16:46:45 +0000 (12:46 -0400)] 
test_urcu_lfq: test for the proper pointer

We should use "if (qnode)" instead of "if (node)", in case
struct cds_lfq_node_rcu is not the first field of struct node.
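
The reasoning can be sketched with hypothetical types (demo_qnode and
demo_node are not the test's structures): when the embedded node is not the
first field, the enclosing pointer differs from the embedded pointer by a
non-zero offset, so NULL must be tested on the pointer that was actually
returned, before converting it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_qnode {
	struct demo_qnode *next;
};

struct demo_node {
	int payload;			/* first field */
	struct demo_qnode list;		/* embedded node is NOT first */
};

static_assert(offsetof(struct demo_node, list) != 0,
	"the embedded node is not at offset 0 in this sketch");

#define demo_container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/* Test the dequeued pointer itself, then convert to the enclosing type. */
static struct demo_node *demo_node_from_qnode(struct demo_qnode *qnode)
{
	if (!qnode)
		return NULL;
	return demo_container_of(qnode, struct demo_node, list);
}
```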

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Sat, 13 Oct 2012 16:45:33 +0000 (12:45 -0400)] 
test_urcu_lfs: remove rcu_defer_register_thread() from test_urcu_lfs

test_urcu_lfs has already switched to call_rcu(), so
rcu_defer_register_thread() is no longer needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Sat, 13 Oct 2012 16:41:17 +0000 (12:41 -0400)] 
test_urcu_lfs: test for the proper pointer

We should use "if (snode)" instead of "if (node)", in case
struct cds_lfs_node_rcu is not the first field of struct node.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 12 Oct 2012 14:33:20 +0000 (10:33 -0400)] 
wfcqueue: clarify locking usage

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 12 Oct 2012 11:47:11 +0000 (07:47 -0400)] 
Document APIs in README

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Fri, 12 Oct 2012 11:29:34 +0000 (07:29 -0400)] 
Test cleanup: replace "l" parameter by "loops"

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 20:41:16 +0000 (16:41 -0400)] 
Add wfcqueue header to cds.h

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 16:44:10 +0000 (12:44 -0400)] 
Fix: urcu-bp, urcu, urcu-qsbr should include wfcqueue

Those are still including wfqueue.h, but need to move to wfcqueue.h,
since this is now needed by call_rcu. It was still working because the
call_rcu headers include wfcqueue.h, but they were doing so _after_
#undef _LGPL_SOURCE was issued, which made wfcqueue.h depend on
liburcu-common.so to find the wfcqueue symbols. This in turn added a
transitive dependency that was not present before, causing build
failures in cross-build environments, especially on Debian systems, due
to the special handling of transitive dependencies in Debian's
autotools.

Reported-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 16:28:23 +0000 (12:28 -0400)] 
Fix: call_rcu list corruption on teardown (documentation)

This commit is a place-holder to document that commit
5161f31e09ce33dd79afad8d08a2372fbf1c4fbe fixed a list corruption bug in
call_rcu.

Introducing __cds_wfcq_splice_blocking() fixed a list corruption bug in
the 0.7.x series. The equivalent fix appeared in 0.6.8 for the
stable-0.6 branch.

Description of the bug:

* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> * Lai Jiangshan (laijs@cn.fujitsu.com) wrote:
> > test code:
> > ./tests/test_urcu_lfs 100 10 10
> >
> > bug produce rate > 60%
> >
> > {{{
> > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or
> +"./tests/test_urcu_lfs 100 100 10"
> > But I just test it about 5 times
> > }}}
> >
> > 4cores*1threads: Intel(R) Core(TM) i5 CPU         760
> > RCU_MB (no time to test for other rcu type)
> > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a
> >
> > I didn't see any bug when "./tests/test_urcu_mb 10 100 10"
> >
> > Sorry, I tried, but I failed to find out the root cause currently.
>
> I think I managed to narrow down the issue:
>
> 1) the master branch does not reproduce it, but commit
>    768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of the
>    time.
>
> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
>    current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
>    moving to wfcqueue.
>
> 3) the bug always arises, for me, at the end of the 10 seconds.
>    However, it might be simply due to the fact that most of the memory
>    gets freed at the end of program execution.
>
> 4) I've been able to get a backtrace, and it looks like we have some
>    call_rcu callback-invocation threads still working while
>    call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free()
>    is nicely waiting for the next thread to stop, and during that time,
>    two callback-invocation threads are invoking callbacks (and one of
>    them triggers the segfault).
>
> So I expect that commit
>
> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
> Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date:   Tue Sep 25 10:50:49 2012 -0500
>
>     call_rcu: use wfcqueue, eliminate false-sharing
>
>     Eliminate false-sharing between call_rcu (enqueuer) and worker threads
>     on the queue head and tail.
>
>     Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
> could have managed to fix the issue, or changed the timing enough that it
> no longer reproduces. I'll continue investigating.

The bug was in call_rcu. The fix is not required for master, because we
fixed it while moving to wfcqueue.  We were erroneously writing to the
head field of the default call_rcu_data rather than tail.

The conditions to reproduce this bug:

1) set up per-cpu callback-invocation threads,
2) use call_rcu,
3) call call_rcu_data_free() while there are still some pending
   callbacks that have not yet been executed by the callback-invocation
   threads,
4) we then get corruption due to the "default" callback-invocation
   thread, which walks through a corrupted queue.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 15:41:48 +0000 (11:41 -0400)] 
call_rcu: remove head field alignement, explain wfcqueue motivation

The following commit:

commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Tue Sep 25 10:50:49 2012 -0500

    call_rcu: use wfcqueue, eliminate false-sharing

    Eliminate false-sharing between call_rcu (enqueuer) and worker threads
    on the queue head and tail.

introduced a change in call_rcu: it moved from "wfqueue" to "wfcqueue".
Its changelog states that the goal is to eliminate false-sharing, but
the changelog rationale is wrong.

The actual primary goal is to use the "splice" operation (which is
similar to the "dequeue_all" operation proposed by Lai Jiangshan),
instead of open-coding this operation directly within the call_rcu
implementation. The objective stated by Lai was to make testing of this
code-path easier, and he was right: we ended up noticing a bug in the
original call_rcu implementation (in this open-coded splice operation)
that was really hard to trigger, which was fixed by the move to
wfcqueue.

About false-sharing: In the case of call_rcu callback-invocation threads
vs call_rcu callers, we do not care about false-sharing because call_rcu
callback-invocation threads use batching ("splice") to get an entire
list of callbacks, which effectively empties the queue, and requires
touching the tail anyway. Ensuring that head and tail are placed on
different cache lines would matter only if we were using "dequeue"
in the callback-invocation thread, which is not the case: we grab the
whole queue, and then iterate from our local head to our local tail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 11 Oct 2012 15:27:37 +0000 (11:27 -0400)] 
wfcqueue: update credits in patch documentation

Give credits to those responsible for the design and implementation of
commit 8ad4ce587f001ae026d5560ac509c2e48986130b, "wfcqueue: implement
concurrency-efficient queue", which happened through rounds of email and
patch exchanges.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 8 Oct 2012 16:11:30 +0000 (12:11 -0400)] 
wfcqueue documentation: hint at for_each iterators

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 8 Oct 2012 14:44:38 +0000 (10:44 -0400)] 
Fix urcu-call-rcu-impl.h: false-sharing

> >  struct call_rcu_data {
> > -   struct cds_wfq_queue cbs;
> > +   /*
> > +    * Align the tail on cache line size to eliminate false-sharing
> > +    * with head.
> > +    */
> > +   struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail;
> > +   /* Alignment on cache line size will add padding here */
> > +
> > +   struct cds_wfcq_head cbs_head;
>
>
> wrong here. In this code, cbs_tail and cbs_head are in the same cache line.
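
The layout issue can be checked with offsetof(). The sketch below uses
hypothetical demo_* types and assumes a 64-byte cache line and the GCC or
Clang aligned attribute. Aligning only the first member aligns its start
but does not pad its size, so the next member lands right behind it on the
same line.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumption, stands in for CAA_CACHE_LINE_SIZE */

struct demo_tail { void *p; };
struct demo_head { void *next; };

/* As in the quoted patch: only the tail is aligned. */
struct demo_shared_line {
	struct demo_tail __attribute__((aligned(DEMO_CACHE_LINE_SIZE))) tail;
	struct demo_head head;	/* directly follows tail, no padding */
};

/* Aligning both members forces them onto different cache lines. */
struct demo_separate_lines {
	struct demo_tail __attribute__((aligned(DEMO_CACHE_LINE_SIZE))) tail;
	struct demo_head __attribute__((aligned(DEMO_CACHE_LINE_SIZE))) head;
};

static_assert(offsetof(struct demo_separate_lines, head)
		% DEMO_CACHE_LINE_SIZE == 0,
	"head starts on its own cache line");

/* Return 1 when two member offsets fall within the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return a / DEMO_CACHE_LINE_SIZE == b / DEMO_CACHE_LINE_SIZE;
}

static int demo_shared_is_same_line(void)
{
	return demo_same_line(offsetof(struct demo_shared_line, tail),
			offsetof(struct demo_shared_line, head));
}

static int demo_separate_is_same_line(void)
{
	return demo_same_line(offsetof(struct demo_separate_lines, tail),
			offsetof(struct demo_separate_lines, head));
}
```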

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 25 Sep 2012 15:50:49 +0000 (10:50 -0500)] 
call_rcu: use wfcqueue, eliminate false-sharing

Eliminate false-sharing between call_rcu (enqueuer) and worker threads
on the queue head and tail.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 23 Sep 2012 23:16:08 +0000 (19:16 -0400)] 
wfcqueue test

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Sun, 23 Sep 2012 23:14:59 +0000 (19:14 -0400)] 
wfcqueue: implement concurrency-efficient queue

This new API simplifies the wfqueue implementation, and brings a 2.3x to
2.6x performance boost due to the ability to eliminate false-sharing
between enqueue and dequeue.

This work is derived from the patch from Lai Jiangshan submitted as
"urcu: new wfqueue implementation"
(http://lists.lttng.org/pipermail/lttng-dev/2012-August/018379.html)

Its changelog:

> Some people may be surprised by this fact:
> There are already TWO implementations of wfqueue in urcu.
>
> The first one is in urcu/static/wfqueue.h:
> 1) enqueue: exchange the tail and then update previous->next
> 2) dequeue: wait for the first node's next pointer and then shift; a dummy
>  node is introduced to keep queue->tail from becoming NULL when shifting.
>
> The second one shares some code with the first one, and the rest of its
> code is spread across urcu-call-rcu-impl.h:
> 1) enqueue: shared with the first one
> 2) no dequeue operation, and no shift, so it does not need a dummy node.
>  The dummy node is queued at initialization, but it is removed
>  after the first dequeue_all operation in call_rcu_thread().
>  call_rcu_data_free() forgets to handle the dummy node if it is not removed.
> 3) dequeue_all: record the old head and tail, and queue->head becomes the
>  special tail node (atomically record the tail and change the tail).
>
> The second implementation's code is spread out, which is bad for review,
> and it is not tested by tests/test_urcu_wfq.
>
> So we need a better implementation that avoids the dummy node dance and can
> serve both the generic wfqueue APIs and the dequeue_all API for call_rcu.
>
> The new implementation:
> 1) enqueue: shared with the first (original) implementation.
> 2) dequeue: shift when node count >= 2, cmpxchg when node count = 1.
>  No dummy node, which saves memory.
> 3) dequeue_all: simply set queue->head.next to NULL, xchg the tail
>  and return the old head.next.
>
> More implementation details are in the code.
> tests/test_urcu_wfq will be updated in the future to test the new APIs.
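
The enqueue and dequeue_all steps listed above can be sketched with C11
atomics as follows. This is an illustration under stated assumptions, not
the wfcqueue implementation: the demo_* names are hypothetical, default
(sequentially consistent) atomics are used instead of tuned barriers, and
a single dequeuer at a time is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	_Atomic(struct demo_node *) next;
	int value;
};

struct demo_queue {
	struct demo_node head;			/* head.next is the first element */
	_Atomic(struct demo_node *) tail;	/* last node, or &head when empty */
};

static void demo_queue_init(struct demo_queue *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* enqueue: exchange the tail, then update previous->next. */
static void demo_enqueue(struct demo_queue *q, struct demo_node *node)
{
	struct demo_node *prev;

	assert(node != NULL);
	atomic_store(&node->next, NULL);
	prev = atomic_exchange(&q->tail, node);
	atomic_store(&prev->next, node);
}

/* Busy-wait until an enqueuer that already swapped the tail links its node. */
static struct demo_node *demo_sync_next(struct demo_node *node)
{
	struct demo_node *next;

	while ((next = atomic_load(&node->next)) == NULL)
		continue;
	return next;
}

/*
 * dequeue_all: set head.next to NULL, xchg the tail, and return the old
 * head.next. The old tail is stored in *last. Returns NULL when empty.
 */
static struct demo_node *demo_dequeue_all(struct demo_queue *q,
		struct demo_node **last)
{
	struct demo_node *first;

	if (atomic_load(&q->tail) == &q->head)
		return NULL;	/* empty queue */
	first = demo_sync_next(&q->head);
	atomic_store(&q->head.next, NULL);
	*last = atomic_exchange(&q->tail, &q->head);
	return first;
}

/* Iterate from the local head to the local tail, summing the values. */
static int demo_sum_batch(struct demo_node *first, struct demo_node *last)
{
	int sum = 0;

	while (first != NULL) {
		sum += first->value;
		first = (first == last) ? NULL : demo_sync_next(first);
	}
	return sum;
}
```

Once the batch is detached, the consumer only touches its local first and
last pointers, so later enqueuers start a fresh list at head.next.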

The patch proposed by Lai brings a very interesting simplification to
the single-node handling (which is kept here), and moves all queue
handling code away from the call_rcu implementation, back into the wfqueue
code. This has the benefit of allowing testing enhancements.

I modified it so the API does not expose implementation details to the
user (e.g. ___cds_wfq_node_sync_next). I added a "splice" operation and
a for loop iterator which should allow wfqueue users to use the list
very efficiently both from LGPL/GPL code and from non-LGPL-compatible
code.

I also changed the API so the queue head and tail are now two separate
structures: it allows the queue user to place these as they like, either
on different cache lines (to eliminate false-sharing), or close to one
another (on the same cache line) in case a queue is spliced onto the stack
and not concurrently accessed.

Benchmarks performed on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
(dual-core, with hyperthreading)

Benchmark invoked:
for a in $(seq 1 10); do ./test_urcu_wfq 1 1 10 -a 0 -a 2; done

(using cpu number 0 and 2, which should correspond to two cores of my
Intel 2-core/hyperthread processor)

Before patch:

testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     97274297 nr_dequeues     80745742 successful enqueues     97274297 successful dequeues     80745321 end_dequeues 16528976 nr_ops    178020039
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     92300568 nr_dequeues     75019529 successful enqueues     92300568 successful dequeues     74973237 end_dequeues 17327331 nr_ops    167320097
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     93516443 nr_dequeues     75846726 successful enqueues     93516443 successful dequeues     75826578 end_dequeues 17689865 nr_ops    169363169
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     94160362 nr_dequeues     77967638 successful enqueues     94160362 successful dequeues     77967638 end_dequeues 16192724 nr_ops    172128000
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     97491956 nr_dequeues     81001191 successful enqueues     97491956 successful dequeues     81000247 end_dequeues 16491709 nr_ops    178493147
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     94101298 nr_dequeues     75650510 successful enqueues     94101298 successful dequeues     75649318 end_dequeues 18451980 nr_ops    169751808
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     94742803 nr_dequeues     75402105 successful enqueues     94742803 successful dequeues     75341859 end_dequeues 19400944 nr_ops    170144908
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     92198835 nr_dequeues     75037877 successful enqueues     92198835 successful dequeues     75027605 end_dequeues 17171230 nr_ops    167236712
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     94159560 nr_dequeues     77895972 successful enqueues     94159560 successful dequeues     77858442 end_dequeues 16301118 nr_ops    172055532
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues     96059399 nr_dequeues     80115442 successful enqueues     96059399 successful dequeues     80066843 end_dequeues 15992556 nr_ops    176174841

After patch:

testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    221229322 nr_dequeues    210645491 successful enqueues    221229322 successful dequeues    210645088 end_dequeues 10584234 nr_ops    431874813
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    219803943 nr_dequeues    210377337 successful enqueues    219803943 successful dequeues    210368680 end_dequeues 9435263 nr_ops    430181280
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    237006358 nr_dequeues    237035340 successful enqueues    237006358 successful dequeues    236997050 end_dequeues 9308 nr_ops    474041698
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    235822443 nr_dequeues    235815942 successful enqueues    235822443 successful dequeues    235814020 end_dequeues 8423 nr_ops    471638385
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    235825567 nr_dequeues    235811803 successful enqueues    235825567 successful dequeues    235810526 end_dequeues 15041 nr_ops    471637370
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    221974953 nr_dequeues    210938190 successful enqueues    221974953 successful dequeues    210938190 end_dequeues 11036763 nr_ops    432913143
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    237994492 nr_dequeues    237938119 successful enqueues    237994492 successful dequeues    237930648 end_dequeues 63844 nr_ops    475932611
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    220634365 nr_dequeues    210491382 successful enqueues    220634365 successful dequeues    210490995 end_dequeues 10143370 nr_ops    431125747
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    237388065 nr_dequeues    237401251 successful enqueues    237388065 successful dequeues    237380295 end_dequeues 7770 nr_ops    474789316
testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    221201436 nr_dequeues    210831162 successful enqueues    221201436 successful dequeues    210831162 end_dequeues 10370274 nr_ops    432032598

Summary: Both enqueue and dequeue speed increase: around 2.3x speedup
for enqueue, and around 2.6x for dequeue.

We can verify that:
   successful enqueues - successful dequeues = end_dequeues

For all runs (ensures correctness: no lost node).
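
The same check can be written down as code. The sketch below is only a
helper for the arithmetic (the demo_* names are hypothetical) and is fed
with the first "before" and first "after" rows of the output above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_run {
	unsigned long long enqueues;		/* successful enqueues */
	unsigned long long dequeues;		/* successful dequeues */
	unsigned long long end_dequeues;	/* nodes drained at the end */
};

/* No node is lost when enqueues - dequeues == end_dequeues. */
static int demo_no_lost_node(const struct demo_run *run)
{
	return run->enqueues - run->dequeues == run->end_dequeues;
}

/* Check every run of a benchmark series. */
static int demo_all_runs_consistent(const struct demo_run *runs, size_t n)
{
	size_t i;

	assert(runs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!demo_no_lost_node(&runs[i]))
			return 0;
	}
	return 1;
}

/* Throughput ratio between an "after" run and a "before" run. */
static double demo_speedup(unsigned long long after, unsigned long long before)
{
	return (double) after / (double) before;
}

/* First row of each series, copied from the output above. */
static const struct demo_run demo_first_rows[] = {
	{ 97274297ULL, 80745321ULL, 16528976ULL },	/* before patch */
	{ 221229322ULL, 210645088ULL, 10584234ULL },	/* after patch */
};
```

For these two rows the ratios come out at roughly 2.27 for enqueue and
2.61 for dequeue, in line with the summary.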

CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Paul E. McKenney [Fri, 7 Sep 2012 01:15:53 +0000 (21:15 -0400)] 
Ensure that read-side functions meet 10-line LGPL criterion

This commit ensures that all read-side functions meet the 10-line LGPL
criterion that permits them to be expanded directly into non-LGPL code,
without function-call instructions.  It also documents this as the
intent.

[ paulmck: Spelling fixes called out by Josh Triplett and name
change called out by Mathieu Desnoyers (_rcu_read_lock_help() ->
_rcu_read_lock_update()). ]

[ Mathieu Desnoyers: _rcu_read_unlock_help renamed to
  _rcu_read_unlock_update_and_wakeup, and spelling fix for
  preced -> precede. ]

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 6 Sep 2012 23:09:28 +0000 (19:09 -0400)] 
tls-compat.h: document sigaltstack(2) limitation

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Thu, 6 Sep 2012 23:07:19 +0000 (19:07 -0400)] 
urcu: add notice to URCU_TLS() for it is not strictly async-signal-safe

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Thu, 6 Sep 2012 13:58:36 +0000 (09:58 -0400)] 
Document sigaltstack(2) limitation

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 3 Sep 2012 17:51:43 +0000 (13:51 -0400)] 
Documentation: update LICENSE file

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Mon, 27 Aug 2012 12:09:26 +0000 (08:09 -0400)]
Update version to 0.7.4 (tag: v0.7.4)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 21 Aug 2012 22:45:37 +0000 (18:45 -0400)] 
rculfhash API documentation: document destroy RCU read-lock constraint

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Tue, 21 Aug 2012 15:01:50 +0000 (11:01 -0400)] 
Fix: rculfhash should be offline while waiting for resize to complete

Causes a hang on destroy with urcu QSBR if destroy is called within an
RCU-registered thread.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mathieu Desnoyers [Wed, 15 Aug 2012 15:36:05 +0000 (11:36 -0400)] 
Add missing entry to gitignore

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Lai Jiangshan [Thu, 9 Aug 2012 14:24:38 +0000 (10:24 -0400)] 
urcu: move busy-wait code and name it ___cds_wfq_node_sync_next()

The code that waits for a node's next pointer to appear will be used in
several places; move it to a helper function named
___cds_wfq_node_sync_next().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agourcu: fix compat_futex_noasync()
Lai Jiangshan [Thu, 9 Aug 2012 14:19:14 +0000 (10:19 -0400)] 
urcu: fix compat_futex_noasync()

This patch fixes two critical problems in the compatibility fallback
compat_futex_noasync():

1) compat_futex_cond is not bound to any @uaddr; it serves all @uaddr.
   If you wake up only one thread (pthread_cond_signal), the @uaddr of
   the waking thread and the @uaddr of the woken-up thread may differ.
   The woken-up thread will very probably go back to sleep because its
   own condition is not true.

   *And* the waking thread (FUTEX_WAKE) wakes up NOTHING.

2) If the caller wants to wake up all waiting threads, it will use
   INT_MAX for @val, and:
                for (i = 0; i < INT_MAX; i++)
                        pthread_cond_signal(&compat_futex_cond);
   becomes an almost infinite loop.
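
The two problems above can be illustrated with a minimal sketch of a
condition-variable fallback. This is not the library's code: the
sketch_* names are hypothetical, and the sketch assumes that writers
update the futex word while holding the same lock. Waiters re-check
their own word in a loop, and a wake-up is a single broadcast rather
than a loop of signals.

```c
#include <pthread.h>
#include <stdint.h>

/* One lock/condition pair shared by every futex word (assumption). */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sketch_cond = PTHREAD_COND_INITIALIZER;

/* FUTEX_WAIT-like: sleep for as long as *uaddr still holds val. */
static void sketch_futex_wait(int32_t *uaddr, int32_t val)
{
	pthread_mutex_lock(&sketch_lock);
	/* Re-check our own word after every wake-up. */
	while (*uaddr == val)
		pthread_cond_wait(&sketch_cond, &sketch_lock);
	pthread_mutex_unlock(&sketch_lock);
}

/* Store a new value in the word, then wake every waiter. */
static void sketch_futex_set_and_wake(int32_t *uaddr, int32_t newval)
{
	pthread_mutex_lock(&sketch_lock);
	*uaddr = newval;
	/* One broadcast, whatever the requested wake-up count is. */
	pthread_cond_broadcast(&sketch_cond);
	pthread_mutex_unlock(&sketch_lock);
}
```

Because the condition variable is shared, a broadcast may wake threads
waiting on unrelated words; they simply observe that their own word is
unchanged and go back to sleep.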

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agourcu: add hint to DEFINE_URCU_TLS() for compound types
Lai Jiangshan [Thu, 9 Aug 2012 14:10:08 +0000 (10:10 -0400)] 
urcu: add hint to DEFINE_URCU_TLS() for compound types

Just a hint.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix: CAA_BUILD_BUG_ON should refer to CAA_BUILD_BUG_ON_ZERO
Mathieu Desnoyers [Mon, 30 Jul 2012 03:45:40 +0000 (23:45 -0400)] 
Fix: CAA_BUILD_BUG_ON should refer to CAA_BUILD_BUG_ON_ZERO

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoAdd MIPS support
Ralf Baechle [Tue, 10 Jul 2012 15:03:08 +0000 (11:03 -0400)] 
Add MIPS support

[ Edit by Mathieu Desnoyers: add explanations about supported
MIPS architectures, extracted from conversation with Ralf Baechle:

* Supported architectures

Ralf Baechle (edited by Mathieu Desnoyers):

This code works on all MIPS architecture variants.  The memory barrier
instruction, SYNC, was introduced for MIPS II.  The original MIPS I
instruction set doesn't have SYNC nor SMP support in the processor
architecture itself so SMP required horrible kludges in the system
hardware.  I think it's safe to say that Linux/MIPS will never support
any of these MIPS I SMP systems.  In the unlikely case this happens
anyway, we have a (Linux) kernel emulation of the SYNC instruction.
Voila - full binary compatibility across all MIPS processors and the
oldest hardware pays the performance penalty.

* Choice of barrier for cmm_mb()/cmm_rmb()/cmm_wmb()

Ralf Baechle:
"RMI (aka Netlogic and now Broadcom) XLR processor cores can be
configured to permit LD-LD, LD-ST, ST-LD and ST-ST reordering; default
is only ST-ST reordering.  To allow Linux to eventually enable full
reordering cmm_mb(), cmm_rmb() and cmm_wmb() all should perform SYNC
and a compiler barrier."

* No-op choice for cmm_read_barrier_depends():

Ralf Baechle:
"Technically there is nothing in the MIPS architecture spec that would
keep a MIPS implementation from reordering as freely as an Alpha or
even more liberally.  In practice most do strong ordering.  However
there is no MIPS implementation that makes full use of all the rope
provided.  So in theory a paranoid implementation of
cmm_read_barrier_depends() for MIPS should perform a SYNC.  In reality
it's not necessary and no sane MIPS core designer would implement
something that would design a core that need a non-empty
cmm_read_barrier_depends().  The reason why my patch had an empty one
is that I was using the Alpha code as a template."

Mathieu Desnoyers:
Moreover, the Linux kernel chooses a no-op for MIPS
read_barrier_depends() implementation, so any MIPS architecture that
would be as weak as Alpha would break the Linux kernel before breaking
the userspace RCU library.

* No need to put ".set noreorder" in cmm_mb() inline assembly:

Ralf Baechle:
"Certain instructions such as SYNC won't get reordered." ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoCompatibility: remove bash-isms from test scripts
Mathieu Desnoyers [Mon, 9 Jul 2012 13:44:52 +0000 (09:44 -0400)] 
Compatibility: remove bash-isms from test scripts

+= is not supported by all shells.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix inappropriate lib behavior: don't call exit()
Mathieu Desnoyers [Fri, 22 Jun 2012 16:48:14 +0000 (12:48 -0400)] 
Fix inappropriate lib behavior: don't call exit()

Use abort() (implemented through the new urcu_die()) instead of exit(-1)
for unrecoverable errors.
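
As a rough illustration of the idea, a library can report the error
location and call abort() instead of exit(-1), which leaves the
decision about process termination handling (core dump, SIGABRT
handler) to the application. The sketch below is an assumption about
how such a helper might look; sketch_die() and the wrappers are
hypothetical names, not the library's urcu_die() implementation.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print where the unrecoverable error happened, then abort(). */
#define sketch_die(cause)                                          \
do {                                                               \
	fprintf(stderr, "(%s:%d) Unrecoverable error: %s\n",       \
		__FILE__, __LINE__, strerror(cause));              \
	abort();                                                   \
} while (0)

/* Lock a mutex; a failure here is treated as unrecoverable. */
static void sketch_mutex_lock(pthread_mutex_t *lock)
{
	int ret = pthread_mutex_lock(lock);

	if (ret)
		sketch_die(ret);
}

/* Unlock a mutex; a failure here is treated as unrecoverable. */
static void sketch_mutex_unlock(pthread_mutex_t *lock)
{
	int ret = pthread_mutex_unlock(lock);

	if (ret)
		sketch_die(ret);
}
```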

Fixes #152

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix: re-enable compatibility with autoconf < 2.64
Mathieu Desnoyers [Thu, 14 Jun 2012 04:56:40 +0000 (00:56 -0400)] 
Fix: re-enable compatibility with autoconf < 2.64

> I tried to build the latest urcu (git master e51500) on a Centos 6.2 box, and got:
>
> jscott@dxi0-62:~/src/userspace-rcu$ make -j4
> CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /users/jscott/src/userspace-rcu/config/missing --run aclocal-1.11 -I
> +config
> CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /users/jscott/src/userspace-rcu/config/missing --run autoconf
>  cd . && /bin/sh /users/jscott/src/userspace-rcu/config/missing --run automake-1.11 --foreign
> configure:4010: error: possibly undefined macro: m4_ifnblank
>       If this token and others are legitimate, please use m4_pattern_allow.
>       See the Autoconf documentation.
> make: *** [configure] Error 1
> make: *** Waiting for unfinished jobs....
>
> Some digging showed that the macro m4_ifnblank requires autoconf 2.64. Centos 6.2 has autoconf 2.63. :(
>
> I just worked around it by reverting commit a767fd locally, then I can build fine.

Reported-by: John Steele Scott <toojays@toojays.net>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix c99 compatibility: use __asm__ and __volatile__ in public headers
Mathieu Desnoyers [Tue, 12 Jun 2012 15:24:31 +0000 (11:24 -0400)] 
Fix c99 compatibility: use __asm__ and __volatile__ in public headers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix c99 compatibility: use __typeof__ instead of typeof in public headers
Mathieu Desnoyers [Mon, 11 Jun 2012 14:16:35 +0000 (10:16 -0400)] 
Fix c99 compatibility: use __typeof__ instead of typeof in public headers

Reported-by: John Steele Scott <toojays@toojays.net>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agowarning fix: tests urcutorture for NetBSD 5
Mathieu Desnoyers [Fri, 1 Jun 2012 21:12:43 +0000 (17:12 -0400)] 
warning fix: tests urcutorture for NetBSD 5

>   CC     rcutorture_urcu-urcutorture.o
> In file included from urcutorture.c:9:
> api.h: In function '__smp_thread_id':
> api.h:160: warning: cast from pointer to integer of different size
> api.h:160: warning: cast from pointer to integer of different size
> api.h: In function 'wait_thread':
> api.h:210: warning: cast from pointer to integer of different size
> api.h:210: warning: cast from pointer to integer of different size

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate version to 0.7.3 v0.7.3
Mathieu Desnoyers [Fri, 1 Jun 2012 17:45:44 +0000 (13:45 -0400)] 
Update version to 0.7.3

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix tests: make dist lib dependency
Mathieu Desnoyers [Fri, 1 Jun 2012 17:58:31 +0000 (13:58 -0400)] 
Fix tests: make dist lib dependency

Some test programs listed the CDS library as a dependency in SOURCES.
Change this to LDADD, which makes "make dist" work after a "make clean".

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate README for OS supported, tests dependency
Mathieu Desnoyers [Fri, 1 Jun 2012 17:43:23 +0000 (13:43 -0400)] 
Update README for OS supported, tests dependency

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoAdd CodingStyle to tarball
Mathieu Desnoyers [Wed, 30 May 2012 13:55:39 +0000 (09:55 -0400)] 
Add CodingStyle to tarball

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoAdd coding style document
Mathieu Desnoyers [Wed, 30 May 2012 13:03:45 +0000 (09:03 -0400)] 
Add coding style document

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoTest fix: test_perthreadlock uninitialized mutex
Mathieu Desnoyers [Tue, 29 May 2012 02:10:54 +0000 (22:10 -0400)] 
Test fix: test_perthreadlock uninitialized mutex

- Initialize the per thread mutexes. (fix)
- Remove unused count_reader/count_writer variables. (cleanup)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agotests: support FreeBSD short "time" args
Hirohisa Yamaguchi [Sun, 27 May 2012 18:16:59 +0000 (14:16 -0400)] 
tests: support FreeBSD short "time" args

time(1) on FreeBSD does not support long option names: change --append
to -a and --output to -o.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agofreebsd 8.2 fix: define MAP_ANONYMOUS for compatibility
Mathieu Desnoyers [Sat, 26 May 2012 15:00:16 +0000 (11:00 -0400)] 
freebsd 8.2 fix: define MAP_ANONYMOUS for compatibility

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate version to 0.7.2 v0.7.2
Mathieu Desnoyers [Thu, 24 May 2012 21:24:21 +0000 (17:24 -0400)] 
Update version to 0.7.2

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix library compatibility
Mathieu Desnoyers [Thu, 24 May 2012 20:56:08 +0000 (16:56 -0400)] 
Fix library compatibility

Commit 4d0d66bb795d1ed938e11a97a4e5f71326e20c71, implementing
tls-compat.h for pthread TLS compatibility, adds a prefix in front of
each TLS symbol (__tls_*). However, some of these symbols are exported
by the URCU library (e.g. rcu_reader_mb, defined in urcu.c as
"rcu_reader", which is overloaded by the urcu/map/urcu.h) to
applications. Therefore, this breaks binary compatibility with 0.6.x
versions of the library. This is not intended, and is therefore a bug,
so we remove this __tls_* prefix from the variables declared, defined,
and referenced through the tls-compat.h API for compilers supporting
"__thread".

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate version to 0.7.1 v0.7.1
Mathieu Desnoyers [Thu, 24 May 2012 17:17:30 +0000 (13:17 -0400)] 
Update version to 0.7.1

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agofix: uatomic_set return value compile fix for non-x86 arch.
Mathieu Desnoyers [Thu, 24 May 2012 15:51:03 +0000 (11:51 -0400)] 
fix: uatomic_set return value compile fix for non-x86 arch.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate version to 0.7.0 v0.7.0
Mathieu Desnoyers [Mon, 21 May 2012 19:09:46 +0000 (15:09 -0400)] 
Update version to 0.7.0

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoCleanup: header comments coding style
Mathieu Desnoyers [Mon, 21 May 2012 22:35:22 +0000 (18:35 -0400)] 
Cleanup: header comments coding style

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoDocument uatomic operations
Mathieu Desnoyers [Fri, 18 May 2012 03:18:35 +0000 (23:18 -0400)] 
Document uatomic operations

Document each atomic operation provided by urcu/uatomic.h, along with
their memory barrier guarantees.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUpdate return value of "set" operations
Mathieu Desnoyers [Fri, 18 May 2012 03:14:26 +0000 (23:14 -0400)] 
Update return value of "set" operations

To follow the way the Linux kernel implements atomic_set(), we change
some API functions so they don't return any value anymore.

This is now the case for:

uatomic_set()
rcu_set_pointer()
rcu_assign_pointer()

This API change is very minor. In all instances of the Linux kernel
using rcu_assign_pointer(), none currently care about its return value.

However, we keep ABI compatibility: rcu_set_pointer_sym() still returns
the "v" value, even though it is not used by its wrapper macro anymore.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoFix mremap wrapper for NetBSD 5
Mathieu Desnoyers [Wed, 16 May 2012 18:30:09 +0000 (14:30 -0400)] 
Fix mremap wrapper for NetBSD 5

NetBSD 5 implements mremap with different semantics. Rename our
wrapper symbol name so it does not clash with the NetBSD 5 symbol.
Eventually, we could envision doing a special-case that uses the NetBSD
5 version instead of the fallback, but let's first get it working before
going into optimization land.

Suggested-by: Marek Vavruša <marek.vavrusa@nic.cz>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoUse urcu/tls-compat.h
Mathieu Desnoyers [Wed, 16 May 2012 13:42:55 +0000 (09:42 -0400)] 
Use urcu/tls-compat.h

Provides compatibility for OpenBSD, NetBSD and Darwin.

Suggested-by: Marek Vavruša <marek.vavrusa@nic.cz>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoImplement urcu/tls-compat.h
Mathieu Desnoyers [Tue, 15 May 2012 20:19:07 +0000 (16:19 -0400)] 
Implement urcu/tls-compat.h

Suggested-by: Marek Vavruša <marek.vavrusa@nic.cz>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoAdd TLS detection m4 macro
Mathieu Desnoyers [Tue, 15 May 2012 18:35:44 +0000 (14:35 -0400)] 
Add TLS detection m4 macro

Will allow urcu to support OSes that require the use of pthread TLS
(and do not provide __thread TLS support).
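
For context, a pthread-based replacement for a "__thread" variable can
be sketched as follows. This is a minimal illustration under stated
assumptions, not the tls-compat.h implementation: the sketch_* names
are hypothetical, and the per-thread storage is allocated lazily on
first access.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t sketch_key;
static pthread_once_t sketch_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() releases each thread's slot at thread exit. */
static void sketch_key_create(void)
{
	if (pthread_key_create(&sketch_key, free))
		abort();
}

/* Return the calling thread's private counter, allocating it on first use. */
static long *sketch_counter(void)
{
	long *p;

	pthread_once(&sketch_once, sketch_key_create);
	p = pthread_getspecific(sketch_key);
	if (!p) {
		p = calloc(1, sizeof(*p));
		if (!p || pthread_setspecific(sketch_key, p))
			abort();
	}
	return p;
}
```

With compiler TLS support the same variable would simply be declared as
"static __thread long counter;". The fallback trades that direct access
for a function call and a lookup on every use.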

Suggested-by: Marek Vavruša <marek.vavrusa@nic.cz>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agodocument concurrent data structures
Mathieu Desnoyers [Tue, 15 May 2012 11:50:30 +0000 (07:50 -0400)] 
document concurrent data structures

Document the concurrent data structures provided by the userspace RCU
library.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agodocumentation: refer to rcu-api.txt
Mathieu Desnoyers [Tue, 15 May 2012 03:01:06 +0000 (23:01 -0400)] 
documentation: refer to rcu-api.txt

API.txt moved to userspace-rcu documentation rcu-api.txt.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoMove API.txt to doc/rcu-api.txt, install in system doc/
Mathieu Desnoyers [Tue, 15 May 2012 02:37:26 +0000 (22:37 -0400)] 
Move API.txt to doc/rcu-api.txt, install in system doc/

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: document implied memory barriers
Mathieu Desnoyers [Tue, 8 May 2012 21:12:20 +0000 (17:12 -0400)] 
rculfhash: document implied memory barriers

We choose to provide full memory barriers before and after successful
hash table update operations. Eventually, a new API with weaker
semantics can be added, but let's make the basic API as fool-proof as
possible.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
9 years agorculfhash: Ensure future-proof memory barrier semantic consistency
Mathieu Desnoyers [Tue, 8 May 2012 21:09:46 +0000 (17:09 -0400)] 
rculfhash: Ensure future-proof memory barrier semantic consistency

Use cmm_smp_mb__before_uatomic_or() prior to the uatomic_or() in
_rcu_lfht_del() to ensure correct memory barrier semantic when we relax
(in the future) the barrier implementation of some architectures.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agoAPI cleanup: use "uatomic_*" in cmm_smp_mb__ API
Mathieu Desnoyers [Tue, 8 May 2012 21:07:03 +0000 (17:07 -0400)] 
API cleanup: use "uatomic_*" in cmm_smp_mb__ API

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agouatomic: add memory barrier API for and/or/add/sub/inc/dec
Mathieu Desnoyers [Tue, 8 May 2012 20:47:28 +0000 (16:47 -0400)] 
uatomic: add memory barrier API for and/or/add/sub/inc/dec

Implement:
cmm_smp_mb__before_and, cmm_smp_mb__after_and
cmm_smp_mb__before_or, cmm_smp_mb__after_or
cmm_smp_mb__before_add, cmm_smp_mb__after_add
cmm_smp_mb__before_sub, cmm_smp_mb__after_sub
cmm_smp_mb__before_inc, cmm_smp_mb__after_inc
cmm_smp_mb__before_dec, cmm_smp_mb__after_dec

For generic and x86.

These currently translate into simple compiler barriers on all
architectures, but the and/or/add/sub/inc/dec uatomics do not provide
memory ordering guarantees by themselves (only uatomic_add_return,
uatomic_sub_return, uatomic_xchg, and uatomic_cmpxchg provide full
memory barrier guarantees before and after the atomic operation).
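
The usage pattern can be sketched with C11 atomics as a stand-in for
the urcu primitives (an analogy under stated assumptions, not the
library's API): a relaxed read-modify-write gives no ordering by
itself, so an explicit fence is placed before it when earlier stores
must be visible first. All sketch_* names are hypothetical.

```c
#include <stdatomic.h>

static _Atomic unsigned long sketch_flags;
static int sketch_payload;

/* Writer: store the payload, then set the "ready" bit. */
static void sketch_publish(int value)
{
	sketch_payload = value;
	/* Plays the role of a "before" barrier for the atomic OR. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Relaxed OR: atomic, but provides no ordering by itself. */
	atomic_fetch_or_explicit(&sketch_flags, 1UL, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the "ready" bit is visible. */
static int sketch_consume(int *out)
{
	if (!(atomic_load_explicit(&sketch_flags, memory_order_relaxed) & 1UL))
		return 0;
	/* Pairs with the writer's fence before reading the payload. */
	atomic_thread_fence(memory_order_seq_cst);
	*out = sketch_payload;
	return 1;
}
```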

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: add runhash.sh test script
Mathieu Desnoyers [Tue, 8 May 2012 04:42:58 +0000 (00:42 -0400)] 
rculfhash: add runhash.sh test script

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash tests: add missing check
Mathieu Desnoyers [Tue, 8 May 2012 04:03:00 +0000 (00:03 -0400)] 
rculfhash tests: add missing check

We need to check if test_ht is NULL.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: fix: race between replace and del operations
Mathieu Desnoyers [Mon, 7 May 2012 23:07:35 +0000 (19:07 -0400)] 
rculfhash: fix: race between replace and del operations

Bug introduced by commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0

Here is the race:

Initially in hash table:  A

T0                                         T1
replace A by B
                                           del A
  read A->next
  -> check REMOVED flag, not set yet.
                                           read A->next
                                           -> check REMOVED flag, not set yet.
  cmpxchg A->next to set REMOVED flag
  -> cmpxchg succeeds
                                           uatomic_or to set REMOVED flag
                                           uatomic_xchg to atomically set the REMOVAL_OWNER flag
                                           -> first to set the flag.
  Replace returns node -> free(A)          Del success -> free(A)

With this race, we have a double-free.
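
One way to picture the missing arbitration is an ownership bit that
only one thread can be the first to set. The sketch below uses C11
atomics and an atomic fetch-or in place of the uatomic operations named
above; the sketch_* names and flag values are hypothetical, and this is
not the rculfhash code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REMOVED_FLAG        ((uintptr_t)1 << 0)
#define SKETCH_REMOVAL_OWNER_FLAG  ((uintptr_t)1 << 1)

struct sketch_node {
	_Atomic uintptr_t next;	/* pointer bits plus low-order flag bits */
};

/*
 * Mark the node as logically removed, then try to claim ownership of
 * the removal. Exactly one caller per node gets "true": only that
 * caller may free the node.
 */
static bool sketch_claim_removal(struct sketch_node *node)
{
	uintptr_t old;

	atomic_fetch_or(&node->next, SKETCH_REMOVED_FLAG);
	old = atomic_fetch_or(&node->next, SKETCH_REMOVAL_OWNER_FLAG);
	return !(old & SKETCH_REMOVAL_OWNER_FLAG);
}
```

If both replace and del go through the same claim step, at most one of
them reports success for node A, so A is freed once.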

The problem with the replace code is that it does not set the
"REMOVAL_OWNER" flag.

Test case to reproduce the bug:

test_urcu_hash 0 2 20 -A -s -M 1 -N 1 -O 1

(2 threads, doing replace/del, with a hash table that has only a single
key for all values). After just a couple of seconds, either the program
hangs, or, more often, it does:

*** glibc detected ***
/media/truecrypt1/compudj/doc/userspace-rcu/tests/.libs/test_urcu_hash:
malloc(): memory corruption (fast): 0x00007ffff3a29e25 ***

Program received signal SIGSEGV, Segmentation fault.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: replace unneeded rcu_dereference by CMM_LOAD_SHARED
Mathieu Desnoyers [Mon, 7 May 2012 15:18:14 +0000 (11:18 -0400)] 
rculfhash: replace unneeded rcu_dereference by CMM_LOAD_SHARED

The difference between the two is that CMM_LOAD_SHARED() does not imply
a read barrier between the read and following uses of the data pointed
to by the pointer read.

All sites that only use the pointer load for its bits (never
dereference) don't need the read barrier implied by rcu_dereference.

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: use do {} while (0) for dbg_printf()
Mathieu Desnoyers [Mon, 7 May 2012 15:10:52 +0000 (11:10 -0400)] 
rculfhash: use do {} while (0) for dbg_printf()

Found by clang (make CC=clang).

Avoid an empty statement:
-------------------------
if (condition)
        dbg_printf()  /* the ";" is forgotten, but the compiler says nothing if dbg_printf() is empty */
statement;
-------------------------

Also add a printf format check.
(We could use the gcc extension "__printf(1, 2)" to declare a dummy inline
function to do the check, but I use "printf()" directly here.)
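
A minimal sketch of the pattern, assuming a hypothetical SKETCH_DEBUG
switch and macro name (not the rculfhash source): the disabled variant
still expands to a full statement, so a forgotten ";" is a compile
error, and the dead printf() call keeps the format string checked.

```c
#include <stdio.h>

#ifdef SKETCH_DEBUG
#define sketch_dbg_printf(...)  printf("[debug] " __VA_ARGS__)
#else
#define sketch_dbg_printf(...)                                     \
do {                                                               \
	/* Never executed, but the format is still type-checked. */ \
	if (0)                                                     \
		printf("[debug] " __VA_ARGS__);                    \
} while (0)
#endif

/* Count the strictly positive entries of v[0..n-1]. */
static int sketch_count_positive(const int *v, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (v[i] > 0)
			sketch_dbg_printf("positive value %d\n", v[i]);
		else
			continue;
		count++;
	}
	return count;
}
```

The if/else in the loop compiles the same way whether or not debugging
is enabled, which is the point of wrapping the macro body in
do {} while (0).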

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 years agorculfhash: cleanup typo
Mathieu Desnoyers [Tue, 1 May 2012 12:09:37 +0000 (08:09 -0400)] 
rculfhash: cleanup typo

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>