Discussion:
Master Plan on rewinding
Alexander E. Patrakov
2014-09-07 15:16:48 UTC
Permalink
Hello.

(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with rewindability)

As Takashi Iwai told me that the audio miniconference is for discussion
only, and not for presentation of anything, I guess that I need to
present the plans and options now. That's why this e-mail. The goal
here, besides merely presenting the plan, is to identify points that
everyone agrees upon, so that they are not discussed pointlessly at the
miniconference. Also, this e-mail serves as a justification for the
pending seemingly-destructive work.

First, the status quo. If anyone disagrees with the facts below, please
complain loudly, before I make any conclusions!

1. PulseAudio does not call snd_pcm_rewindable(), because for some ALSA
plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28,
but in some cases snd_pcm_rewindable() still returns wrong results.

2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is
needed due to reasons that I don't completely understand, but one of
them is imprecise reporting of the hardware pointer, and another one is
that the hardware transfers several bytes at a time, and the bytes we
need to overwrite may be already cached by the hardware.

3. On the hw plugin, I could demonstrate two other bugs regarding
snd_pcm_rewindable(): stale data and bogus negative return values.

4. There are ALSA plugins like "a52" or the "bluetooth" plugin from old
version of bluez that return non-zero for snd_pcm_rewindable() and
snd_pcm_rewind, but actually don't rewind, and where rewinding is almost
impossible to implement (e.g. because this would seemingly involve
unsending of already-sent bluetooth packets). See below why "almost",
you need to search for "Alternative strawman proposal".

5. PulseAudio contains a workaround for non-rewindability of the a52
plugin that also happens to apply to other ioplug-based plugins.
However, this workaround does not match extplug-based plugins (including
"dca"), and my patch extending the workaround has been rejected with the
argument that this has to be fixed in ALSA.

6. The rate plugin has been made non-rewindable in alsa-lib 1.0.28
because the old implementation was wrong and no simple fix exists.

7. Issues regarding rewindability of PulseAudio internals such as
virtual sinks will be discussed separately.

And now the topics for discussion.

My immediate plan is to fix (3) and, if anyone replies to the rewind
safeguard proposal, maybe start fixing (2) from the alsa-lib side.

For all proposals below, unless I say otherwise, I intend to write the
required alsa-lib code myself if the proposal is accepted.

=== On the rewind safeguard ===

The consensus is that rewind_safeguard is a workaround for an ALSA bug.
This information should come from snd_pcm_rewindable() from the hw
plugin. I.e., on a hw device, it should not be equal to
snd_pcm_mmap_hw_avail(). While the method of dealing with stale data and
negative returns is obvious to me, I am not completely sure about the
safeguard. Where should this value come from?

Proposal (credit goes to Raymond Yau): the safeguard should be ideally
equal to the granularity of hardware pointer updates. However, we don't
know this granularity a priori, and cannot know this because nobody will
find out this information for old cards. Since we can't get this, let's
instead use the upper bound: the minimum possible period size for the
given sample rate, sample format and the number of channels.

On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo,
the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases. Thus Raymond's method will overestimate the
required safeguard on my cards, but it is still better than no safeguard
at all, and less than the hard-coded value in PulseAudio.

On ymfpci, the pointer granularity is 5 ms, thus the value currently
hard-coded in PA is insufficient. If I understand the card's limitations
correctly, Raymond's method will yield the spot-on result for this card.

For the allegedly existing cards where the pointer granularity is always
the same as the period size, Raymond's method will not work. However, I
don't know any concrete example of such card (I didn't look at
snd-firewire in detail, though). And in any case, my opinion is that
this "pointer granularity = period size" limitation on such cards can be
treated as a driver bug that needs to be fixed (by someone who knows the
driver, not by me). On cards where pointer updates happen only on
interrupts, the driver should not configure the card in such a way that
one period visible to the userspace corresponds to one interrupt.
Instead, it should always configure the card for the minimum possible
period size, and report only part of the period interrupts to
period_elapsed(). I.e.: userspace asked for 2 periods 64 ms each, but
let's configure the card for, say, 64 periods 2 ms each, and use the
"extra" interrupts only for updating the pointer.

Hopefully, the above heuristic and driver fix will also let PulseAudio
get rid of special treatment of the "batch" cards, simply because they
(in the "pointer granularity = period size" definition which is
currently used in PulseAudio) will no longer exist.

If the heuristic (or something else) is accepted, then the question
remains how to enable the use of snd_pcm_rewindable in PulseAudio. The
problem is compatibility with older ALSA versions that, let me remind
you, crash if snd_pcm_rewindable is called. I don't have proposals or
opinions here. The obvious options are: compile-time alsa-lib version
check with fallback to old code on buggy versions, run-time alsa-lib
version check with the same fallback, add a (both compile-time and
run-time) hard requirement of a fully-fixed alsa-lib version, or do
nothing at all.

Possible discussion: alternative heuristics and, as mentioned, migration
path in PulseAudio.

=== On non-rewindability of the rate plugin ===

I intend to write a rewindable resampler eventually, but don't have time
now. I understand that it is an important task, but issues below (and
the dayjob which you can change by offering me a new one) have higher
priority for me. However, I want everyone to understand the following
point now:

"The resampler has to be written from scratch for the reasons explained
in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and
similar arguments apply to all other kinds of sound processing code that
needs history."

For PulseAudio, it is also needed to figure out the desired interaction
between variable rate and rewindability. Should rewinding other than
"discard everything completely" be allowed at all on variable rate
streams when the rewind crosses the sample rate change point? I.e.,
write 100 samples, change rate, write 50 samples, rewind 100 samples,
what should be the resulting rate? Should we special-case small changes
vs big ones?

Slightly off-topic, but still a discussion point.

=== On possibly-incomplete rewindability of the file plugin ===

The file plugin has quite hairy code involving the use of the write
buffer. I have not verified its correctness or studied the code in
detail. Anyway, it looks like it allows rewinding over that buffer only,
even though its slave may allow rewinding further. If true, this would
make the apparent rewindability useless, as the applications depend on
being able to rewind almost everything, and won't like the
apparently-random limitation (random because it depends on some obscure
"wbuf" implementation detail).

I think that the plugin has to be changed to avoid this limitation if it
really exists (which I still need to verify). Namely, I propose keeping
a shadow copy of the slave's buffer, and, at the beginning of each API
call, querying the slave about its hardware pointer. Only the part of
the shadow buffer that corresponds to the distance covered by the
slave's hardware pointer since the last call has to be written to the file.

The same applies to recording. As the code is much simpler there, it is
obvious that the problem exists: if one rewinds in order to look at the
past recorded samples again, the plugin will read further samples from
the file instead of supplying the already-looked-at samples. The same
solution with the slave hardware pointer should be applicable.

The possible discussion here amounts to suggesting alternative proposals.

=== On bogus rewindability of ladspa and extplug plugins ===

For ioplug, see below, because the situation is different enough.

Neither LADSPA nor snd_pcm_extplug_t have an API that allows to tell the
corresponding plugin to forget the last N samples. We cannot fix LADSPA,
as it is beyond our control. And we cannot really fix extplug, because
this just means moving the problem down one level. External libraries
(which are depended upon by the extplug-based plugins) are almost
universally not rewindable, are beyond our control, and will not be made
rewindable, because rewindability is hard and is not needed by anyone
except ALSA and PulseAudio. Still, neither .rewindable not .rewind
currently return 0.

So, I'd argue that the only correct result from snd_pcm_rewindable() is
0 for ladspa and extplug plugins. However, right now, they forward the
result from the slave. This needs to be fixed, and I have a patch.

With snd_pcm_rewind(), the situation is a bit different. It, obviously,
does nothing to tell the LADSPA or extplug plugin that the rewind
happened (because it can't). It also performs a rewind on the slave. So,
assuming that the rewind succeeds and the DSP algorithm in the plugin
uses N samples from the past, we'll end up with the output of the
correct duration but with N corrupted samples. Not ideal. But if we
don't perform the rewind at all and return 0, we'll end up with
regressing PulseAudio: huge latency when applying software volume
changes or starting new streams. By setting up the large buffer size, it
essentially assumes that any rewind will succeed.

So, my hackish proposal is: make the .rewindable callbacks on ladspa and
extplug return 0, keep the current implementation of the .rewind
callback for compatibility with the current versions of PulseAudio,
document this hack. I already have patch for this, will send out if we
agree that the proposal is good.

An alternative proposal (that I would implement if the proposal above is
rejected and someone actively confirms this one - but I don't like it)
would be to keep the current versions of the .rewindable and .rewind
callbacks and add new versions of them (and new user-visible ALSA API)
for the situation when imperfect results are unacceptable.

A completely strawman proposal would be to introduce a low-latency
thread, see below in the ioplug section.

Same for .forwardable and .forward.

Any other alternatives that I missed?

=== On bogus rewindability of some ioplug-based plugins ===

Ioplug-based plugins currently report rewindability according to the
same "appl.ptr - hw.ptr" rule that is used by the hw plugin. However,
this is incorrect.

In fact, there are two types of ioplug-based plugins. The jack plugin
does not even have a .transfer callback, and does all the work of
irrevocably submitting samples to the JACK server in a thread. So, it is
almost completely rewindable (with that "almost" being equal to JACK
buffer size). Proposal: keep this logic for all plugins without the
.transfer callback. Expose the JACK buffer size as the minimum period
size. Then the same rule proposed in the section about the hw plugin
will work for .rewindable.

Other plugins do their irrevocable work in the .transfer callback and
thus are not really rewindable. Proposal: return 0 from .rewindable and
rewind ioplug callbacks for all plugins with a .transfer callback (but
see the next section for an amendment). Document that the best practice
is to have no .transfer callback.

Alternative strawman proposal (speculative): if a .transfer callback
exists, call it from an alsa-lib-created thread (not from
snd_pcm_writei!) at the last possible moment, using the least possible
amount of data (that truly cannot be rewound) in the mmap-style buffer.
The new mantra is: .transfer is the thing that moves the hardware
pointer. This way, previously non-rewindable plugins could become
rewindable. Not sure if this is possible without changing the ABI or
possible at all, though - it essentially forces all plugins to declare
mmap support. Also not sure what to do with the .pointer callback.

For the strawman proposal, the benefit is a possibility to have dmix on
top of a52 (because freewheeling will now work), the downside is that
all programs (not only those intending to do rewinds) now pay the cost
of the background low-latency thread. Can we avoid this (e.g. with a new
flag for snd_pcm_open), so that only programs that intend to do rewinds
pay the price? Can this new flag apply to the batch hw drivers? Won't
that inflate the test matrix for plugin code?

For discussion: do we implement the first proposal or try the strawman
proposal? Any other suggestions regarding ioplug internals? Anyone else
has a problem with two completely different interfaces (with and without
.transfer) being exposed under the same ioplug framework?

=== On the pulse plugin ===

PulseAudio natively supports rewinding on the wire. However, the
ioplug-based plugin doesn't. It pretends that it rewound something, but
always passes 0, 0 as the last two parameters to pa_stream_write().
That's bad.

It looks possible (but I haven't checked) to figure out the correct
arguments (i.e. to detect the rewinds done before the call to
snd_pcm_writei) by looking at the application pointer. This way, it
looks possible to support rewinds, but does not look possible to get the
correct result (and not the result implied by the circular-buffer model)
from the snd_pcm_rewindable() function.

Also, we apply proposals from the previous section, we'll either end up
with a plugin that is truly non-rewindable and doesn't pretend to be
rewindable, or (with a strawman proposal) a rewindable plugin that
forces low latency on the PulseAudio side and causes a lot of wakeups.
Both situations are suboptimal.

So, I see no way around adding the .rewind and .rewindable callbacks to
snd_pcm_ioplug_callback, and treating the plugins with them as
rewindable. I guess .forward and .forwardable should be added, too.

For discussion: do we do the above or declare that ioplug is not really
a suitable infrastructure for the pulse plugin? What other alternatives
can be proposed?

=== On communication of non-rewindability to the program ===

PulseAudio attempts to use timer-based scheduling and rewinds. It makes
a big hardware buffer in expectation that it will be able to rewind -
with the exception of a workaround for ioplug-based plugins. It should
not set big latency for non-rewindable plugins, but currently has no
idea how to get this bit of information. See
http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-April/020457.html
for the initial problem statement.

If all of the proposals above are implemented, then we'll be at a point
where each ALSA plugin is either almost fully rewindable, or not
rewindable at all. So, contrary to the end of
http://permalink.gmane.org/gmane.linux.alsa.devel/122191 , I think we
need just snd_pcm_hw_params_is_seekable() (or rename it if you wish)
with an essentially boolean result.

For discussion:

1. Decide, finally, on this bit of API, so that I can start working.

2. Decide what to do with old alsa-lib versions in PulseAudio. I.e.
transition plan.

=== On the programmer expectations ===

(social issue)

Some people, including at least Andrew Eikum and Clemens Ladisch, at
least once in the past expressed the opinion that amounts to "any plugin
that does not allow random access is, as far as the ALSA API is
concerned, buggy" (quoting
http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ), i.e. they
are maybe asking for the impossible. We need to do something about that.
Choices:

1. Document clearly that there exist non-rewindable plugins, and that it
is not a bug until someone implements a working time machine.

2. Implement a strawman proposal with the background low-latency thread
if it is doable (and I am not sure that it is doable).

3. Remove ioplug, extplug and ladspa as unfixable, fix the rate and
adpcm plugins (or remove adpcm). Do something about pulse and jack
because they rely on the to-be-removed ioplug infrastructure. Then all
remaining plugins will indeed be fully rewindable.

4. Something else?

Sorry for the negative tone here, I know I am bad at formulating and
resolving social issues.

=== Conclusion ===

When all the proposals are implemented, we'll have correct
implementations of the rewind operation in all plugins (where "return 0"
counts as correct), and "return 0" only where it is truly unavoidable.
PulseAudio will be able to use all of that, and avoid doing rewinds on
non-rewindable plugins. On this positive note, I'd like to finish
writing this long e-mail.
--
Alexander E. Patrakov
Tanu Kaskinen
2014-09-07 18:38:07 UTC
Permalink
Post by Alexander E. Patrakov
=== On non-rewindability of the rate plugin ===
I intend to write a rewindable resampler eventually, but don't have time
now. I understand that it is an important task, but issues below (and
the dayjob which you can change by offering me a new one) have higher
priority for me. However, I want everyone to understand the following
"The resampler has to be written from scratch for the reasons explained
in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and
similar arguments apply to all other kinds of sound processing code that
needs history."
For PulseAudio, it is also needed to figure out the desired interaction
between variable rate and rewindability. Should rewinding other than
"discard everything completely" be allowed at all on variable rate
streams when the rewind crosses the sample rate change point? I.e.,
write 100 samples, change rate, write 50 samples, rewind 100 samples,
what should be the resulting rate? Should we special-case small changes
vs big ones?
This last paragraph isn't related to the rate plugin, right? So you're
talking about PulseAudio internals only? If so, perhaps the best
approach would be to make the current stream buffer contents
non-rewindable when a rate change occurs, at least until someone points
out a real use case where it is important to be able to rewind past rate
change points in the buffer. Without any example use cases, I don't feel
qualified to answer the question what should happen if a rewind crosses
a rate change point (or possibly several!).

Another question is that should we do something to previously buffered
stream data when the rate changes. If the audio rate changes completely,
e.g. from 44.1 kHz to 8 kHz, any previously buffered audio was probably
meant to be played at 44.1 kH, but with the current code it will be
played at 8 kHz. I don't know if there are any applications that (ab)use
the dynamic rate feature this way, though. Maybe we could just document
that the dynamic rate feature is only meant for small adjustments.
--
Tanu
Alexander E. Patrakov
2014-09-07 19:05:06 UTC
Permalink
Post by Tanu Kaskinen
Post by Alexander E. Patrakov
=== On non-rewindability of the rate plugin ===
I intend to write a rewindable resampler eventually, but don't have time
now. I understand that it is an important task, but issues below (and
the dayjob which you can change by offering me a new one) have higher
priority for me. However, I want everyone to understand the following
"The resampler has to be written from scratch for the reasons explained
in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and
similar arguments apply to all other kinds of sound processing code that
needs history."
For PulseAudio, it is also needed to figure out the desired interaction
between variable rate and rewindability. Should rewinding other than
"discard everything completely" be allowed at all on variable rate
streams when the rewind crosses the sample rate change point? I.e.,
write 100 samples, change rate, write 50 samples, rewind 100 samples,
what should be the resulting rate? Should we special-case small changes
vs big ones?
This last paragraph isn't related to the rate plugin, right?
Right.
Post by Tanu Kaskinen
So you're
talking about PulseAudio internals only?
Yes, in the last paragraph only.
Post by Tanu Kaskinen
If so, perhaps the best
approach would be to make the current stream buffer contents
non-rewindable when a rate change occurs, at least until someone points
out a real use case where it is important to be able to rewind past rate
change points in the buffer. Without any example use cases, I don't feel
qualified to answer the question what should happen if a rewind crosses
a rate change point (or possibly several!).
Another question is that should we do something to previously buffered
stream data when the rate changes. If the audio rate changes completely,
e.g. from 44.1 kHz to 8 kHz, any previously buffered audio was probably
meant to be played at 44.1 kH, but with the current code it will be
played at 8 kHz. I don't know if there are any applications that (ab)use
the dynamic rate feature this way, though. Maybe we could just document
that the dynamic rate feature is only meant for small adjustments.
Thanks for the feedback. Let's see if others say anything else on this
topic.
--
Alexander E. Patrakov
Clemens Ladisch
2014-09-07 20:51:07 UTC
Permalink
Post by Alexander E. Patrakov
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
Post by Alexander E. Patrakov
This information should come from snd_pcm_rewindable() from the hw
plugin. I.e., on a hw device, it should not be equal to
snd_pcm_mmap_hw_avail(). [...]
the safeguard should be ideally equal to the granularity of hardware
pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware
pointer position, which is the boundary between safe-to-rewrite and
already-used data. Subtracting the amount of one pointer update step
makes it safe for the next pointer update, but it is not known how much
time will elapse until the hardware does this update. This time might
be too small for the software to have any chance, or even zero, but it
is also possible that there is still enough time to do the rewriting up
until the reported boundary. And if the software is too slow, it is
even possible that two or more pointer updates happen, which would have
required a larger safeguard.

In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of
date, but due to the real-time nature of hardware pointer updates, it is
not possible to define a safeguard that would be more correct. Writing
the buffer near the DMA pointer is always racy.

The only somewhat reliable way to determine if a rewrite is successful
is to check _afterwards_ whether the hardware pointer has advanced by
too much.
Post by Alexander E. Patrakov
[...] On cards where pointer updates happen only on interrupts, the
driver should not configure the card in such a way that one period
visible to the userspace corresponds to one interrupt. Instead, it
should always configure the card for the minimum possible period
size, and report only part of the period interrupts to period_elapsed().
Such stupid DMA controllers are typically found on mobile devices, which
cannot afford extraneous interrupts.
Post by Alexander E. Patrakov
=== On the programmer expectations ===
Some people, including at least Andrew Eikum and Clemens Ladisch, at
least once in the past expressed the opinion that amounts to "any
plugin that does not allow random access is, as far as the ALSA API is
concerned, buggy" (quoting http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ),
i.e. they are maybe asking for the impossible.
This was merely a description of the current API and its implementation,
which assume that all devices/plugins have a rewritable ring buffer.
I am fully aware that this assumption is wrong.


Regards,
Clemens
Raymond Yau
2014-09-08 03:06:01 UTC
Permalink
Post by Clemens Ladisch
Post by Alexander E. Patrakov
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
Post by Alexander E. Patrakov
This information should come from snd_pcm_rewindable() from the hw
plugin. I.e., on a hw device, it should not be equal to
snd_pcm_mmap_hw_avail(). [...]
the safeguard should be ideally equal to the granularity of hardware
pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware
pointer position, which is the boundary between safe-to-rewrite and
already-used data. Subtracting the amount of one pointer update step
makes it safe for the next pointer update, but it is not known how much
time will elapse until the hardware does this update. This time might
be too small for the software to have any chance, or even zero, but it
is also possible that there is still enough time to do the rewriting up
until the reported boundary. And if the software is too slow, it is
even possible that two or more pointer updates happen, which would have
required a larger safeguard.
In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of
date, but due to the real-time nature of hardware pointer updates, it is
not possible to define a safeguard that would be more correct. Writing
the buffer near the DMA pointer is always racy.
The only somewhat reliable way to determine if a rewrite is successful
is to check _afterwards_ whether the hardware pointer has advanced by
too much.
snd_pcm_rewindable is properly not any SAFE value for rewind, it should
return zero for those hardware which don't support rewind(e.g, non
interleaved or cs46xx)

It is just a theoretical value for rewind since the application don't
rewrite immediately

So the application need to estimate the safe guard value from the
processing time of the amount of data need to be reprocess /rewrite and
CPU speed

Should pulseaudio enable timer scheduling only on those cards which support
disable of period interrupt (e.g. hda-Intel and oxygen )?
Alexander E. Patrakov
2014-09-08 07:31:18 UTC
Permalink
Post by Clemens Ladisch
Post by Alexander E. Patrakov
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
OK, disagreements are what this thread is for :)
Post by Clemens Ladisch
Post by Alexander E. Patrakov
This information should come from snd_pcm_rewindable() from the hw
plugin. I.e., on a hw device, it should not be equal to
snd_pcm_mmap_hw_avail(). [...]
the safeguard should be ideally equal to the granularity of hardware
pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware
pointer position, which is the boundary between safe-to-rewrite and
already-used data. Subtracting the amount of one pointer update step
makes it safe for the next pointer update, but it is not known how much
time will elapse until the hardware does this update. This time might
be too small for the software to have any chance, or even zero, but it
is also possible that there is still enough time to do the rewriting up
until the reported boundary. And if the software is too slow, it is
even possible that two or more pointer updates happen, which would have
required a larger safeguard.
I disagree with the words "boundary between safe-to-rewrite and
already-used data". They describe a point that hardware knows, but
software cannot know. My viewpoint is that, even with a perfect
scheduler, known-safe-to-rewrite and known-already-used data are not
separated with a point. See below for an example.

You indeed have a strong argument for keeping the safeguard in
PulseAudio as a protection against scheduler glitches, but look what
happens on ymfpci. Having read David's email, I understand that it is
not the same reason why the safeguard has been originally added.

On ymfpci, interrupts (that don't correspond 1:1 to periods) happen
every 5 ms, and pointer (which is a hardware register) is updated only
during interrupts by the hardware.

|---------|---------P----h----p---------|-a-------|---------|

Here each character corresponds to 0.5 ms, "-" means nothing
interesting, "|", "P" and "p" mark interrupt positions, "h" marks the
place where the card reads the sound data from (i.e. the true boundary
between safe-to-rewrite and already-used data). Thus, the interrupt at
"P" has already happened, but the interrupt at "p" hasn't. "a" is the
application write pointer.

So, what should alsa-lib return for snd_pcm_avail() and
snd_pcm_rewind()? The driver only knows that "P" is already used, can
infer that "p" isn't used yet, and knows nothing about samples in the
middle.

For snd_pcm_avail(), it can only say that the application may only write
up to "P" from the left. Everything else would be a possible lie leading
to not-yet-played data possibly being overwritten.

For snd_pcm_rewindable(), it can only say that the application may try
to go to "p" from the right. Everything further to the left is at risk
of being already-played, even with a perfect scheduler and infinitely
fast CPU.
Post by Clemens Ladisch
In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of
date, but due to the real-time nature of hardware pointer updates, it is
not possible to define a safeguard that would be more correct. Writing
the buffer near the DMA pointer is always racy.
Correct. The problem is that, on some cards, the uncertainty here
(described above) is an order of magnitude bigger than typical scheduler
glitches, and depends on sound card hardware. I don't want to introduce
an unnecessarily-big safeguard for all sound cards because of a few bad
ones.

So it looks like we need both the hardware-independent safeguard in
PulseAudio (against scheduler glitches) and the hardware-dependent
safeguard in alsa-lib (against known imprecise reporting by the card).
Post by Clemens Ladisch
The only somewhat reliable way to determine if a rewrite is successful
is to check _afterwards_ whether the hardware pointer has advanced by
too much.
PulseAudio does that in addition to the safeguard.
Post by Clemens Ladisch
Post by Alexander E. Patrakov
[...] On cards where pointer updates happen only on interrupts, the
driver should not configure the card in such a way that one period
visible to the userspace corresponds to one interrupt. Instead, it
should always configure the card for the minimum possible period
size, and report only part of the period interrupts to period_elapsed().
Such stupid DMA controllers are typically found on mobile devices, which
cannot afford extraneous interrupts.
Thanks for the information, I didn't know that, and hereby retract the
quoted proposal.

Still, to support this type of stupid DMA controllers properly, we need
a vocabulary of such weirdnesses and an agreed-upon way to communicate
them from the kernel to userspace. I don't think that I can build such
vocabulary.
Post by Clemens Ladisch
Post by Alexander E. Patrakov
=== On the programmer expectations ===
Some people, including at least Andrew Eikum and Clemens Ladisch, at
least once in the past expressed the opinion that amounts to "any
plugin that does not allow random access is, as far as the ALSA API is
concerned, buggy" (quoting http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ),
i.e. they are maybe asking for the impossible.
This was merely a description of the current API and its implementation,
which assume that all devices/plugins have a rewritable ring buffer.
I am fully aware that this assumption is wrong.
OK. Now I see that in your case it was a misinterpretation from my side
- sorry for that! But this doesn't cancel the attitude "if a function is
documented without any caveats, it must always work, usefully" that I
have seen from others. So I'll put a documentation patch on my TODO list.
--
Alexander E. Patrakov
Clemens Ladisch
2014-09-09 08:43:06 UTC
Permalink
Post by Alexander E. Patrakov
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()?
The driver only knows that "P" is already used, can infer that "p" isn't
used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible
that it has already moved beyond p when snd_pcm_rewindable() returns.
For the samples between P and p, the risk is larger than for those after
p, but p is not a boundary where the risk abruptly decreases.

It would make sense to report the pointer update granularity, but not
to adjust the return value of snd_pcm_avail/rewindable().


Regards,
Clemens
Alexander E. Patrakov
2014-09-09 08:55:42 UTC
Permalink
Post by Clemens Ladisch
Post by Alexander E. Patrakov
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()?
The driver only knows that "P" is already used, can infer that "p" isn't
used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible
that it has already moved beyond p when snd_pcm_rewindable() returns.
For the samples between P and p, the risk is larger than for those after
p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not
to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the
actual rewind granularity and/or safeguard ... should be enough for PA
to be able to pick a suitable default latency" from David indicates that
he has a similar opinion.

Now the remaining question is: can the proposed heuristic (minimum
period size for a given sample rate, number of channels and sample
format) be useful as an upper-bound approximation of the pointer update
granularity for cards that are "rewindable even further than the nearest
period"?
--
Alexander E. Patrakov
David Henningsson
2014-09-09 09:08:16 UTC
Permalink
Post by Alexander E. Patrakov
Post by Clemens Ladisch
Post by Alexander E. Patrakov
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and
snd_pcm_rewind()?
The driver only knows that "P" is already used, can infer that "p" isn't
used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible
that it has already moved beyond p when snd_pcm_rewindable() returns.
For the samples between P and p, the risk is larger than for those after
p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not
to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the
actual rewind granularity and/or safeguard ... should be enough for PA
to be able to pick a suitable default latency" from David indicates that
he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum
period size for a given sample rate, number of channels and sample
format) be useful as an upper-bound approximation of the pointer update
granularity for cards that are "rewindable even further than the nearest
period"?
Aha, thanks for the explanation. Now I understand that approximation idea.
I don't know if that's a reasonable approximation, but even if it is,
how would you determine if a card actually has that pointer granularity,
or if the pointer granularity varies with period size? (I e without
actually running a stream and measure it)
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
Alexander E. Patrakov
2014-09-09 09:31:46 UTC
Permalink
Post by David Henningsson
Post by Alexander E. Patrakov
Post by Clemens Ladisch
Post by Alexander E. Patrakov
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and
snd_pcm_rewind()?
The driver only knows that "P" is already used, can infer that "p" isn't
used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible
that it has already moved beyond p when snd_pcm_rewindable() returns.
For the samples between P and p, the risk is larger than for those after
p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not
to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the
actual rewind granularity and/or safeguard ... should be enough for PA
to be able to pick a suitable default latency" from David indicates that
he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum
period size for a given sample rate, number of channels and sample
format) be useful as an upper-bound approximation of the pointer update
granularity for cards that are "rewindable even further than the nearest
period"?
Aha, thanks for the explanation. Now I understand that approximation idea.
I don't know if that's a reasonable approximation, but even if it is,
how would you determine if a card actually has that pointer granularity,
or if the pointer granularity varies with period size? (I e without
actually running a stream and measure it)
Currently, as you have already said, we have no such information. This
information is, however, static for a given card model and should, in
the future, come from the kernel. Therefore:

1. We need a new flag alongside SNDRV_PCM_INFO_BATCH that kernel drivers
would set, and alsa-lib to act upon. As indicated in the following
posts, SNDRV_PCM_INFO_BATCH means a different and not-useful-here thing:

http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073817.html

2. We need a volunteer to crawl through kernel sources and mark drivers
that cannot report the pointer position with a better-than-one-period
granularity.

3. Until this is done, we have to either assume that all cards are good,
or that all cards are bad, or maybe misuse the SNDRV_PCM_INFO_BATCH flag
as a pessimistic approximation of what we want (and document this
approximation) if anyone thinks that such misuse will be beneficial in
the short term.

This leaves the question of "old kernel + new alsa-lib" open.
--
Alexander E. Patrakov
Raymond Yau
2014-09-21 02:02:57 UTC
Permalink
Post by Alexander E. Patrakov
Post by David Henningsson
Post by Alexander E. Patrakov
Post by Clemens Ladisch
Post by Alexander E. Patrakov
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and
snd_pcm_rewind()?
The driver only knows that "P" is already used, can infer that "p" isn't
used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible
that it has already moved beyond p when snd_pcm_rewindable() returns.
For the samples between P and p, the risk is larger than for those after
p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not
to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the
actual rewind granularity and/or safeguard ... should be enough for PA
to be able to pick a suitable default latency" from David indicates that
he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum
period size for a given sample rate, number of channels and sample
format) be useful as an upper-bound approximation of the pointer update
granularity for cards that are "rewindable even further than the nearest
period"?
Aha, thanks for the explanation. Now I understand that approximation idea.
I don't know if that's a reasonable approximation, but even if it is,
how would you determine if a card actually has that pointer granularity,
or if the pointer granularity varies with period size? (I e without
actually running a stream and measure it)
Currently, as you have already said, we have no such information. This
information is, however, static for a given card model and should, in the
Post by Alexander E. Patrakov
1. We need a new flag alongside SNDRV_PCM_INFO_BATCH that kernel drivers
would set, and alsa-lib to act upon. As indicated in the following posts,
SNDRV_PCM_INFO_BATCH means a different and not-useful-here thing:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073817.html
Post by Alexander E. Patrakov
2. We need a volunteer to crawl through kernel sources and mark drivers
that cannot report the pointer position with a better-than-one-period
granularity.
Post by Alexander E. Patrakov
3. Until this is done, we have to either assume that all cards are good,
or that all cards are bad, or maybe misuse the SNDRV_PCM_INFO_BATCH flag as
a pessimistic approximation of what we want (and document this
approximation) if anyone thinks that such misuse will be beneficial in the
short term.
Post by Alexander E. Patrakov
This leaves the question of "old kernel + new alsa-lib" open.
--
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/soc/soc-generic-dmaengine-pcm.c?id=478028e088d6a94666d8a776be2cd2291faf3bbd

a) Set the SNDRV_PCM_INFO_BATCH if the granularity is per period or worse.
b) Fallback to the (race condition prone) period counting if the driver
does not support any residue reporting.

Seem soc already have this granularity

How can the granularity worse more than one period ?

https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/linux/dmaengine.h

enum dma_residue_granularity {
DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0,
DMA_RESIDUE_GRANULARITY_SEGMENT = 1,
DMA_RESIDUE_GRANULARITY_BURST = 2,
};

There are three type of granularity

Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?

A few PCI sound cards use SG buffer including hda

It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling

/**
* enum dma_residue_granularity - Granularity of the reported transfer
residue
* @DMA_RESIDUE_GRANULARITY_DESCRIPTOR: Residue reporting is not support.
The DMA channel is only able to tell whether a descriptor has been
completed or not, which means residue reporting is not supported by this
channel. The residue field of the dma_tx_state field will always be 0.
* @DMA_RESIDUE_GRANULARITY_SEGMENT: Residue is updated after each
successfully completed segment of the transfer (For cyclic transfers this
is after each period). This is typically implemented by having the
hardware generate an interrupt after each transferred segment and then the
drivers updates the outstanding residue by the size of the segment. Another
possibility is if the hardware supports scatter-gather and the segment
descriptor has a field which gets set after the segment has been
completed. The driver then counts the number of segments without the flag
set to compute the residue.
* @DMA_RESIDUE_GRANULARITY_BURST: Residue is updated after each transferred
burst. This is typically only supported if the hardware has a progress
register of some sort (E.g. a register with the current read/write address
or a register with the amount of bursts/beats/bytes that have been
transferred or still need to be transferred).
*/
Lars-Peter Clausen
2014-09-22 13:20:25 UTC
Permalink
On 09/21/2014 04:02 AM, Raymond Yau wrote:
[...]
Post by Raymond Yau
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/soc/soc-generic-dmaengine-pcm.c?id=478028e088d6a94666d8a776be2cd2291faf3bbd
a) Set the SNDRV_PCM_INFO_BATCH if the granularity is per period or worse.
b) Fallback to the (race condition prone) period counting if the driver
does not support any residue reporting.
Seem soc already have this granularity
How can the granularity worse more than one period ?
The granularity is the granularity that the DMA driver is able to report. In
some cases the DMA driver is only able to tell whether a transfer has
finished or not, but is not able tell any intermediate position. In case of
a cyclic transfer the transfer never stops, so the DMA driver will only ever
tell us that it hasn't finished the transfer yet. But the DMA driver will
still generate a interrupt after every finished period. So we use this as a
fallback mechanism and simply increase the pointer position after each
interrupt by one period size. So userspace will never see a granularity that
is worse than one descriptor. This is just something kernel internal.

On a side node, this period counting scheme is prone to race conditions and
we strongly discourage new drivers from using it. Its mainly for supporting
old DMA drivers which do not implement progress reporting (yet).
Post by Raymond Yau
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/linux/dmaengine.h
enum dma_residue_granularity {
DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0,
DMA_RESIDUE_GRANULARITY_SEGMENT = 1,
DMA_RESIDUE_GRANULARITY_BURST = 2,
};
There are three type of granularity
Only SEGMENT and BURST granularity are relevant for audio userspace.

For ALSA we'd probably see different kind of categories of granularity
though. E.g. one field which is the unit of the granularity

PERIOD: The granularity is a multiple of the period size
FRAME: The granularity is a multiple of the frame size
BYTE: The granularity is a fixed number of bytes
MS: The granularity is a fixed amount of time

and then a second field that has a integer number of the granularity in that
unit.


And then there is of course the issue that for some chips you can trade
granularity for e.g. power efficiency or similar (e.g. by changing the
transfer burst size). And this needs to be modeled somehow as well.
Post by Raymond Yau
Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not
DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer
scheduling.

- Lars
Alexander E. Patrakov
2014-09-22 13:36:55 UTC
Permalink
Post by Lars-Peter Clausen
Post by Raymond Yau
Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not
DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable
timer scheduling.
For the record, disabling timer-based scheduling is IMHO a matter of
further discussion. As long as there is enough safeguard, I think that
timer-based scheduling can still be used, and is useful. A living proof
is the whole story with the snd-usb-audio driver where (justified)
addition of the BATCH flag was perceived as a performance regression and
not as a fix to some real and obvious problem.

I do agree that such devices need to be marked up with the BATCH flag,
so that PulseAudio chooses reasonable hardware parameters.
--
Alexander E. Patrakov
Lars-Peter Clausen
2014-09-22 13:44:08 UTC
Permalink
Post by Lars-Peter Clausen
Post by Raymond Yau
Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not
DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable
timer scheduling.
For the record, disabling timer-based scheduling is IMHO a matter of further
discussion. As long as there is enough safeguard, I think that timer-based
scheduling can still be used, and is useful. A living proof is the whole
story with the snd-usb-audio driver where (justified) addition of the BATCH
flag was perceived as a performance regression and not as a fix to some real
and obvious problem.
I do agree that such devices need to be marked up with the BATCH flag, so
that PulseAudio chooses reasonable hardware parameters.
When I tried things it looks like it was more than just the safeguard
threshold. It looked as if the algorithm that adjusts the wake-up time gets
confused if the granularity is not good enough and just wakes up at the
wrong point which leads to glitches. This can probably be fixed though by
modifying the algorithm to take the granularity into account.

- Lars
Raymond Yau
2014-09-23 08:29:55 UTC
Permalink
Post by Alexander E. Patrakov
Post by Lars-Peter Clausen
Post by Raymond Yau
Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not
DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable
timer scheduling.
The resolution of pulseaudio volume is higher than the number of steps of
the hardware volume control, this mean any volume change by user force
pulseaudio to rewind because of the change in software volume

As user won't expect the volume change is delayed by one second

Those drivers should not use 2 periods as graunulaity is one period which
is 170ms ~1 second if you are running video conference (e.g. Google
hangout) when video is 15~30 frames per second

The safeguard can only be decreased by reduce the period size

Is it feasible for pulseaudio to use more periods with suitable period
size/time for the requested latency when there is one and only one client
when the sound card cannot provide precise DMA position instead of maximum
period size/time ?
Post by Alexander E. Patrakov
For the record, disabling timer-based scheduling is IMHO a matter of
further discussion. As long as there is enough safeguard, I think that
timer-based scheduling can still be used, and is useful. A living proof is
the whole story with the snd-usb-audio driver where (justified) addition of
the BATCH flag was perceived as a performance regression and not as a fix
to some real and obvious problem.

The point is some drivers use .periods_min = 1

Pulseaudio select minimum number of period

interrupt driven mode should not use one period per buffer since
granularity is one period

one period per buffer work only when sound card can report granularity
which is less than one period

aplay won't use one period per buffer , default is four but fallback to two
when driver use .periods_max = 2

http://0pointer.net/blog/projects/pulse-glitch-free.html

NFRAGS must >= 2
Alexander E. Patrakov
2014-09-23 10:22:54 UTC
Permalink
Post by Raymond Yau
Post by Alexander E. Patrakov
Post by Lars-Peter Clausen
Post by Raymond Yau
Does this mean the those sound card can report
DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support
DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not
DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable
timer scheduling.
The resolution of pulseaudio volume is higher than the number of steps
of the hardware volume control, this mean any volume change by user
force pulseaudio to rewind because of the change in software volume
As user won't expect the volume change is delayed by one second
Absolutely correct, and a similar thing was already discussed. The
interesting part of the discussion starts here:

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-April/020462.html

Please disregard my "factor of 1000" statement - it is no longer true.
Post by Raymond Yau
Those drivers should not use 2 periods as graunulaity is one period
which is 170ms ~1 second if you are running video conference (e.g.
Google hangout) when video is 15~30 frames per second
The safeguard can only be decreased by reduce the period size
Also correct for the BATCH cards. However, please note that for the
BATCH cards PulseAudio looks at the default-fragment-size-msec setting
from daemon.conf by default.
Post by Raymond Yau
Is it feasible for pulseaudio to use more periods with suitable period
size/time for the requested latency when there is one and only one
client when the sound card cannot provide precise DMA position instead
of maximum period size/time ?
Probably not, because the rewinds are exposed in the public PulseAudio
APIs (in particular, via the last two parameters of pa_stream_write()).
So it definitely should not use the maximum period time, but should also
not derive one from the client-requested latency. A hard-coded default
or a default from the config (i.e. the current situation) is therefore
as good as one can get on BATCH cards regarding the period size.
Post by Raymond Yau
Post by Alexander E. Patrakov
For the record, disabling timer-based scheduling is IMHO a matter of
further discussion. As long as there is enough safeguard, I think that
timer-based scheduling can still be used, and is useful. A living proof
is the whole story with the snd-usb-audio driver where (justified)
addition of the BATCH flag was perceived as a performance regression and
not as a fix to some real and obvious problem.
The point is some drivers use .periods_min = 1
Pulseaudio select minimum number of period
It does not do that on BATCH cards. Or if it does, it is a bug.

As for the rest of your arguments, you are just stating obvious and
correct things, so I see no point in quoting them.

Indeed, most of the work on PulseAudio side would mean choosing the
correct period size, number of periods, wakeup threshold and rewind
granularity for each possible situation. I should just do it when the
needed API appears on the ALSA side :)
--
Alexander E. Patrakov
Clemens Ladisch
2014-09-09 13:45:35 UTC
Permalink
can the proposed heuristic (minimum period size for a given sample
rate, number of channels and sample format) be useful as an upper-
bound approximation of the pointer update granularity for cards that
are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional
to (but smaller than) the period size.


Regards,
Clemens
Alexander E. Patrakov
2014-09-09 15:55:05 UTC
Permalink
Post by Clemens Ladisch
can the proposed heuristic (minimum period size for a given sample
rate, number of channels and sample format) be useful as an upper-
bound approximation of the pointer update granularity for cards that
are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional
to (but smaller than) the period size.
Thanks for the explanation. It was very useful.

At this point, I am tempted to not express this "less than the period
size but depends on it" rule, and classify USB audio devices as
"Rewindable down to the period size", even though it is a pessimization.

Also (sorry for the off-topic) this explanation completely crystallizes
my opinion about possible misuse of the SNDRV_PCM_INFO_BATCH flag in the
snd-usb-audio driver or PulseAudio. My opinion now is: there is no
misuse on either side.
--
Alexander E. Patrakov
Takashi Iwai
2014-09-09 16:09:32 UTC
Permalink
At Tue, 09 Sep 2014 21:55:05 +0600,
Post by Alexander E. Patrakov
Post by Clemens Ladisch
can the proposed heuristic (minimum period size for a given sample
rate, number of channels and sample format) be useful as an upper-
bound approximation of the pointer update granularity for cards that
are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional
to (but smaller than) the period size.
Thanks for the explanation. It was very useful.
At this point, I am tempted to not express this "less than the period
size but depends on it" rule, and classify USB audio devices as
"Rewindable down to the period size", even though it is a pessimization.
Also (sorry for the off-topic) this explanation completely crystallizes
my opinion about possible misuse of the SNDRV_PCM_INFO_BATCH flag in the
snd-usb-audio driver or PulseAudio. My opinion now is: there is no
misuse on either side.
Well, it's not entirely a "misuse" in either side. The flag is merely
a hint, and the current attitude of PA is in a safer side of the
behavior this flag may indicate. Whether it's the best choice is
another question, of course.

After all, we certainly need some API to expose the granularity. This
has been asked since long time ago, but it wasn't done simply because
the demands (and specifications) haven't been clarified.

Also, for rewinding, we may need to consider about the FIFO or other
things on top of signal processing path. If the granularity defines
the granularity of the hwptr *update*, it wouldn't be enough for
defining a safe guard range. Currently, most of hardware have a small
amount of FIFO that can be mostly ignored, fortunately, though.


Takashi
David Henningsson
2014-09-07 23:12:52 UTC
Permalink
Post by Alexander E. Patrakov
Hello.
(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with rewindability)
As Takashi Iwai told me that the audio miniconference is for discussion
only, and not for presentation of anything, I guess that I need to
present the plans and options now. That's why this e-mail. The goal
here, besides merely presenting the plan, is to identify points that
everyone agrees upon, so that they are not discussed pointlessly at the
miniconference. Also, this e-mail serves as a justification for the
pending seemingly-destructive work.
First, the status quo. If anyone disagrees with the facts below, please
complain loudly, before I make any conclusions!
1. PulseAudio does not call snd_pcm_rewindable(), because for some ALSA
plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28,
but in some cases snd_pcm_rewindable() still returns wrong results.
I don't know if we ever tried snd_pcm_rewindable() in PulseAudio.
Post by Alexander E. Patrakov
2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is
needed due to reasons that I don't completely understand, but one of
them is imprecise reporting of the hardware pointer, and another one is
that the hardware transfers several bytes at a time, and the bytes we
need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC,
some DMA controllers go nuts (such as breaking the stream, causing
interrupt storms, or something else seriously buggy) when trying to
write to data that the DMA controller is just about to transfer. Pierre
(now cc:ed) would know more about this than I do, though.

In my world, since this is a very hardware near problem, ALSA rather
than PulseAudio should take these kinds of problems into account when
reporting back snd_pcm_rewindable() so PulseAudio does not have to.
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
Pierre-Louis Bossart
2014-09-09 19:56:09 UTC
Permalink
Post by David Henningsson
Post by Alexander E. Patrakov
2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is
needed due to reasons that I don't completely understand, but one of
them is imprecise reporting of the hardware pointer, and another one is
that the hardware transfers several bytes at a time, and the bytes we
need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC,
some DMA controllers go nuts (such as breaking the stream, causing
interrupt storms, or something else seriously buggy) when trying to
write to data that the DMA controller is just about to transfer. Pierre
(now cc:ed) would know more about this than I do, though.
In my world, since this is a very hardware near problem, ALSA rather
than PulseAudio should take these kinds of problems into account when
reporting back snd_pcm_rewindable() so PulseAudio does not have to.
Sorry to chime-in late on this, my real job keeps me busy.
I would like to highlight that there is a fundamental conflict between
requirements here.
- for power consumption optimization, you want to let the hardware
prefetch audio samples opportunistically and buffer as much as possible
as close as possible to the serial link. Given the pressure on power
optimizations these days, no one should expect the hardware buffering to
reduce - and this buffering could vary depending on the platform power
modes with the position of the DMA read pointer becoming less predictable.
- for low-latency and user interaction, you want to rewind the write
pointer as much as possible.

The question is 'how much'. The reason why i introduced this
'rewind_safeguard' was to factor in a good-enough latency that no one
would ever complain about and yet large enough to avoid using stale data.
We could ask drivers to provide a conservative estimate of this
safeguard, but asking for a dynamic query of a safe position or an
empirical determination based on the min period size is really asking
for trouble. At a given point, if you really need very low-latency, i.e.
sub ms, you will need a dedicated configuration that probably doesn't
rely on rewinds and you probably don't use PulseAudio in the first place.
Alexander E. Patrakov
2014-09-10 05:38:06 UTC
Permalink
Post by Pierre-Louis Bossart
Post by David Henningsson
Post by Alexander E. Patrakov
2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is
needed due to reasons that I don't completely understand, but one of
them is imprecise reporting of the hardware pointer, and another one is
that the hardware transfers several bytes at a time, and the bytes we
need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC,
some DMA controllers go nuts (such as breaking the stream, causing
interrupt storms, or something else seriously buggy) when trying to
write to data that the DMA controller is just about to transfer. Pierre
(now cc:ed) would know more about this than I do, though.
In my world, since this is a very hardware near problem, ALSA rather
than PulseAudio should take these kinds of problems into account when
reporting back snd_pcm_rewindable() so PulseAudio does not have to.
Sorry to chime-in late on this, my real job keeps me busy.
Better late than never.
Post by Pierre-Louis Bossart
I would like to highlight that there is a fundamental conflict between
requirements here.
- for power consumption optimization, you want to let the hardware
prefetch audio samples opportunistically and buffer as much as possible
as close as possible to the serial link. Given the pressure on power
optimizations these days, no one should expect the hardware buffering to
reduce - and this buffering could vary depending on the platform power
modes with the position of the DMA read pointer becoming less predictable.
- for low-latency and user interaction, you want to rewind the write
pointer as much as possible.
Spot-on.
Post by Pierre-Louis Bossart
The question is 'how much'. The reason why i introduced this
'rewind_safeguard' was to factor in a good-enough latency that no one
would ever complain about and yet large enough to avoid using stale data.
We could ask drivers to provide a conservative estimate of this
safeguard, but asking for a dynamic query of a safe position or an
empirical determination based on the min period size is really asking
for trouble. At a given point, if you really need very low-latency, i.e.
sub ms, you will need a dedicated configuration that probably doesn't
rely on rewinds and you probably don't use PulseAudio in the first place.
Well, now there are two people criticizing the heuristic. I would like
to retract it, but I currently can't, because there are no other
constructive proposals.

The best solution would indeed be if a driver provides a conservative
estimate of the rewind safeguard, but the problem is that nobody will
add this to old drivers for no-longer-manufactured hardware. So we
either need some other heuristic here, or we have to allow rewinding
further than the nearest period only on cards where drivers explicitly
say OK by providing a safeguard value. But the later proposal kills
power-efficient PulseAudio on old kernels that simply don't know how to
provide this information. Surely you won't like it.
--
Alexander E. Patrakov
Lars-Peter Clausen
2014-09-08 07:34:07 UTC
Permalink
On 09/07/2014 05:16 PM, Alexander E. Patrakov wrote:
[...]
Post by Alexander E. Patrakov
2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed
due to reasons that I don't completely understand, but one of them is
imprecise reporting of the hardware pointer, and another one is that the
hardware transfers several bytes at a time, and the bytes we need to
overwrite may be already cached by the hardware.
Also at the time you will process the result form snd_pcm_avail() will be
more than what was reported as the audio DMA keeps going in the background.

[...]
Post by Alexander E. Patrakov
For the allegedly existing cards where the pointer granularity is always the
same as the period size, Raymond's method will not work. However, I don't
know any concrete example of such card (I didn't look at snd-firewire in
detail, though). And in any case, my opinion is that this "pointer
granularity = period size" limitation on such cards can be treated as a
driver bug that needs to be fixed (by someone who knows the driver, not by
me). On cards where pointer updates happen only on interrupts, the driver
should not configure the card in such a way that one period visible to the
userspace corresponds to one interrupt. Instead, it should always configure
the card for the minimum possible period size, and report only part of the
period interrupts to period_elapsed(). I.e.: userspace asked for 2 periods
64 ms each, but let's configure the card for, say, 64 periods 2 ms each, and
use the "extra" interrupts only for updating the pointer.
There are quite a few ASoC drivers that do have this restriction. For some
of the drivers the restriction comes from the hardware itself for others it
is while the hardware in theory supports it nobody bothered so far to
implement this.

I do not agree that this should be treated as a driver bug and that those
drivers should increase the interrupt rate. Mainly because this is policy
and the interrupt rate is configured by the application by setting the
period size and the driver should not have to guess what kind of latency
requirements the application has. If the application has lower latency
requirements it needs to set a smaller period size. Unnecessarily increasing
the interrupt rate has for example has a impact on power consumption which
is something you'd like to keep low on battery powered devices.

We should though expose the granularity with which the pointer is updated to
userspace so it can make educated decisions on how to configure the period size.

- Lars
David Henningsson
2014-09-08 07:59:00 UTC
Permalink
Post by Alexander E. Patrakov
Hello.
(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with rewindability)
So, having looked this through another time, it looks like we have three
categories of ALSA devices, from rewindability point of view:

1) Not rewindable at all.

2) Rewindable down to the period size.

3) Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to
have better granularity than the period size.

So I think any is_seekable() call or flag should indicate whether the
device is case 1), 2) or 3). And for case 3) perhaps also some indicator
of the actual rewind granularity and/or safeguard. This should be enough
for PA to be able to pick a suitable default latency.

Case 1) is simple. Just let snd_pcm_rewindable return 0 and
snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem
could be solved by PA having ifdefs depending on alsa-lib version,
rather than making snd_pcm_rewind and snd_pcm_rewindable behave
inconsistent.)

For case 2) you seem to suggest to emulate case 3) by using either a
low-latency thread, or by increasing the number of interrupts from the
hardware. Either method will inevitably increase power consumption, and
the former might also increase the risk of glitches. Therefore I think
this is replacing something bad with something worse, because I would
value low power consumption higher than better rewinding.

Could we enable this functionality by explicit request by the
application? Probably. E g, if the application sets a low period size
but also sets the "disable period interrupts" flag, that could be an
indicator that it wants lots of interrupts just to update the pointer,
but nothing else. Maybe that's even the behaviour today (haven't checked).

The low-latency thread approach could be implemented by a separate
ioplug layer, so that people who want it could open
"flexiblerewind:plug:hw:0" instead of "plug:hw:0".

However, with my PA hat on, I would still say no to having PA use either
of those by default, for power consumption reasons.

With that in mind I suggest, just as you, that we add
.rewind/.rewindable callbacks to the ioplug layer. Any ioplug using
.transfer needs to implement that if it wants to support rewinding,
otherwise that ioplug would fall into category 1).

Does that make sense? It was a long email, so I might have missed
something. :-)
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
Alexander E. Patrakov
2014-09-08 08:46:54 UTC
Permalink
08.09.2014 13:59, David Henningsson wrote:

(First of all, a big thanks for being the first to talk constructively
about the important problem of rewindability classes!)
Post by David Henningsson
Post by Alexander E. Patrakov
Hello.
(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with
rewindability)
So, having looked this through another time, it looks like we have three
1) Not rewindable at all.
2) Rewindable down to the period size.
3) Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to
have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the
device is case 1), 2) or 3). And for case 3) perhaps also some indicator
of the actual rewind granularity and/or safeguard. This should be enough
for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data
from the kernel to decide between cases 2 and 3? I am definitely not
qualified enough to answer this or to add such interface, though.
Post by David Henningsson
Case 1) is simple. Just let snd_pcm_rewindable return 0 and
snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem
could be solved by PA having ifdefs depending on alsa-lib version,
rather than making snd_pcm_rewind and snd_pcm_rewindable behave
inconsistent.)
OK. But I don't think that the ifdef proposal is sufficient. If I
understand correctly, it only covers the "new PA on old alsa-lib"
problem, because we can't add these ifdefs to old PA. That's still
progress! But the "keep the old implementation of rewind" was for the
opposite problem - old PA on new alsa-lib which rightfully refuses to
rewind.
Post by David Henningsson
For case 2) you seem to suggest to emulate case 3) by using either a
low-latency thread, or by increasing the number of interrupts from the
hardware. Either method will inevitably increase power consumption, and
the former might also increase the risk of glitches. Therefore I think
this is replacing something bad with something worse, because I would
value low power consumption higher than better rewinding.
There is a slight misunderstanding above. I indeed proposed (and
retracted the proposal when I knew that it happens on mobile devices) to
emulate case 3 in case 2 by increasing the number of interrupts from the
hardware. The proposal of a low-latency background thread was about
emulating case 3 on case 1, and, so far, I have not retracted it, but I
may do so in the future.

As for power consumption increase, the argument is essentially a
duplicate of the phrase "the downside is that all programs (not only
those intending to do rewinds) now pay the cost of the background
low-latency thread" from my original mail. In other words, I partially
agree. This, if enabled unconditionally, indeed would lead to
unnecessary power consumption increase for those applications that don't
use rewinds. But for applications that want low latency or dynamic
latency, there is no other way to get it, so it is wrong to talk about
power consumption increase for such applications.
Post by David Henningsson
Could we enable this functionality by explicit request by the
application? Probably. E g, if the application sets a low period size
but also sets the "disable period interrupts" flag, that could be an
indicator that it wants lots of interrupts just to update the pointer,
but nothing else. Maybe that's even the behaviour today (haven't checked).
A very good idea.
Post by David Henningsson
The low-latency thread approach could be implemented by a separate
ioplug layer, so that people who want it could open
"flexiblerewind:plug:hw:0" instead of "plug:hw:0".
Also a good idea.
Post by David Henningsson
However, with my PA hat on, I would still say no to having PA use either
of those by default, for power consumption reasons.
I would also say no, but for a different reason (because, as I wrote
above, the power-consumption reason is valid only for software that
never wants to rewind and does not need low/dynamic latency, and PA does
not fall into this category). My argument is that, in the case of
PulseAudio, these emulations would just create an unnecessary layer of
complexity. PulseAudio actively wants to know the device type (via the
is_seekable() call), so it is reasonable for it to also deal with
consequences, without the need for additional processing in the driver
or for an extra thread.
Post by David Henningsson
With that in mind I suggest, just as you, that we add
.rewind/.rewindable callbacks to the ioplug layer. Any ioplug using
.transfer needs to implement that if it wants to support rewinding,
otherwise that ioplug would fall into category 1).
Does that make sense? It was a long email, so I might have missed
something. :-)
In fact, _I_ missed something when writing the original e-mail. I forgot
to address your phrase from
Post by David Henningsson
I understand that you have a mathematically perfect approach to this, as
well as other algorithms. This would indeed be the best goal, but given
an imperfect world, where we're forced to choose between
1) no rewinding at all
2) imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting
imperfect rewind results? [anyway, I would say "no" to using it in
PulseAudio]
--
Alexander E. Patrakov
David Henningsson
2014-09-08 09:26:31 UTC
Permalink
Post by Alexander E. Patrakov
(First of all, a big thanks for being the first to talk constructively
about the important problem of rewindability classes!)
Post by David Henningsson
Post by Alexander E. Patrakov
Hello.
(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with
rewindability)
So, having looked this through another time, it looks like we have three
1) Not rewindable at all.
2) Rewindable down to the period size.
3) Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to
have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the
device is case 1), 2) or 3). And for case 3) perhaps also some indicator
of the actual rewind granularity and/or safeguard. This should be enough
for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data
from the kernel to decide between cases 2 and 3? I am definitely not
qualified enough to answer this or to add such interface, though.
I'm quite certain that we don't have enough information coming through
from the kernel. It's something we need to add if we end up deciding to
implement hw_params_is_seekable.
Post by Alexander E. Patrakov
Post by David Henningsson
Case 1) is simple. Just let snd_pcm_rewindable return 0 and
snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem
could be solved by PA having ifdefs depending on alsa-lib version,
rather than making snd_pcm_rewind and snd_pcm_rewindable behave
inconsistent.)
OK. But I don't think that the ifdef proposal is sufficient. If I
understand correctly, it only covers the "new PA on old alsa-lib"
problem, because we can't add these ifdefs to old PA. That's still
progress! But the "keep the old implementation of rewind" was for the
opposite problem - old PA on new alsa-lib which rightfully refuses to
rewind.
Post by David Henningsson
For case 2) you seem to suggest to emulate case 3) by using either a
low-latency thread, or by increasing the number of interrupts from the
hardware. Either method will inevitably increase power consumption, and
the former might also increase the risk of glitches. Therefore I think
this is replacing something bad with something worse, because I would
value low power consumption higher than better rewinding.
There is a slight misunderstanding above. I indeed proposed (and
retracted the proposal when I knew that it happens on mobile devices) to
emulate case 3 in case 2 by increasing the number of interrupts from the
hardware. The proposal of a low-latency background thread was about
emulating case 3 on case 1, and, so far, I have not retracted it, but I
may do so in the future.
Right. I guess the low-latency background thread can be used to emulate
3) on top of both 1) and 2).
Post by Alexander E. Patrakov
This, if enabled unconditionally, indeed would lead to
unnecessary power consumption increase for those applications that don't
use rewinds. But for applications that want low latency or dynamic
latency, there is no other way to get it, so it is wrong to talk about
power consumption increase for such applications.
Uhm, is the set of applications that want "low or dynamic latency"
necessarily the same as the set that want rewinds?

But that might be more of an academic question, I guess PA is the only
one that does rewinds currently. Or are there more apps out there that
do rewinds?

I'm not sure I buy this point completely, but I can't come up with a
good counterexample either, so maybe you're right.
Post by Alexander E. Patrakov
My argument is that, in the case of
PulseAudio, these emulations would just create an unnecessary layer of
complexity. PulseAudio actively wants to know the device type (via the
is_seekable() call), so it is reasonable for it to also deal with
consequences, without the need for additional processing in the driver
or for an extra thread.
Agreed.
Post by Alexander E. Patrakov
Post by David Henningsson
I understand that you have a mathematically perfect approach to this, as
well as other algorithms. This would indeed be the best goal, but given
an imperfect world, where we're forced to choose between
1) no rewinding at all
2) imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting
imperfect rewind results? [anyway, I would say "no" to using it in
PulseAudio]
Well, if we had one, and it defaulted to the old behaviour, I suppose
that would solve the new alsa-lib on old PA problem? But I'm not sure
it's worth the added complexity.
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
Alexander E. Patrakov
2014-09-08 10:21:02 UTC
Permalink
Post by David Henningsson
Post by Alexander E. Patrakov
(First of all, a big thanks for being the first to talk constructively
about the important problem of rewindability classes!)
Post by David Henningsson
Post by Alexander E. Patrakov
Hello.
(TL;DR: nothing really new except the strawman proposal about threads
and the note about interaction of variable sample rate with
rewindability)
So, having looked this through another time, it looks like we have three
1) Not rewindable at all.
2) Rewindable down to the period size.
3) Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to
have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the
device is case 1), 2) or 3). And for case 3) perhaps also some indicator
of the actual rewind granularity and/or safeguard. This should be enough
for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data
from the kernel to decide between cases 2 and 3? I am definitely not
qualified enough to answer this or to add such interface, though.
I'm quite certain that we don't have enough information coming through
from the kernel. It's something we need to add if we end up deciding to
implement hw_params_is_seekable.
OK, but, given that we need this interface in order to avoid seeking on
the DTS encoder (and to avoid creating a big buffer on it), I'd rather
have this sooner than later. As a compromise, let's design the API with
case 2 in mind, but only return cases 1 and 3 until we have enough
information from the kernel to detect case 2 reliably. And, unless
someone sends out a constructive proposal, let's defer the discussion of
case 2 detection to the miniconference.

(And users do complain about rewinding on dcaenc, because this rewind
creates one or two invalid DTS frames in the middle of every volume
change, and some receivers are slow to resynchronize).
Post by David Henningsson
Post by Alexander E. Patrakov
This, if enabled unconditionally, indeed would lead to
unnecessary power consumption increase for those applications that don't
use rewinds. But for applications that want low latency or dynamic
latency, there is no other way to get it, so it is wrong to talk about
power consumption increase for such applications.
Uhm, is the set of applications that want "low or dynamic latency"
necessarily the same as the set that want rewinds?
It is a superset. Applications that want low latency don't necessarily
want rewinds, but they want high interrupt rate => high power
consumption anyway. Applications that want dynamic latency will do
rewinds, because this is the only way to get it, except always
defaulting to sometimes-unnecessary low latency.
Post by David Henningsson
But that might be more of an academic question, I guess PA is the only
one that does rewinds currently. Or are there more apps out there that
do rewinds?
Right now it is indeed an academic question. However, earlier in this
year I was contacted by Andrew Eikum who was trying to use rewinds in
Wine and insisted that it is an ALSA bug when I told him that they
cannot work on the default device. I am not sure whether I convinced
him, because I did not use the "unsend bluetooth packets" wording back
then. I cannot quote him now, because this was via IM, so quoting would
be impolite. And, the log is on the other PC anyway.
Post by David Henningsson
I'm not sure I buy this point completely, but I can't come up with a
good counterexample either, so maybe you're right.
Post by Alexander E. Patrakov
My argument is that, in the case of
PulseAudio, these emulations would just create an unnecessary layer of
complexity. PulseAudio actively wants to know the device type (via the
is_seekable() call), so it is reasonable for it to also deal with
consequences, without the need for additional processing in the driver
or for an extra thread.
Agreed.
Post by Alexander E. Patrakov
Post by David Henningsson
I understand that you have a mathematically perfect approach to this, as
well as other algorithms. This would indeed be the best goal, but given
an imperfect world, where we're forced to choose between
1) no rewinding at all
2) imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting
imperfect rewind results? [anyway, I would say "no" to using it in
PulseAudio]
Well, if we had one, and it defaulted to the old behaviour, I suppose
that would solve the new alsa-lib on old PA problem? But I'm not sure
it's worth the added complexity.
Understood. Yes, it would solve the problem, and I have no opinion so
far whether it is worth the cost of complexity and of the bad default.

This phrase from the documentation of snd_pcm_rewindable is also
something to think about:

"""
Note: The snd_pcm_rewind() can accept bigger value than returned by this
function. But it is not guaranteed that output stream will be consistent
with bigger value.
"""
--
Alexander E. Patrakov
Clemens Ladisch
2014-09-09 08:43:11 UTC
Permalink
Post by Alexander E. Patrakov
This phrase from the documentation of snd_pcm_rewindable is also
"""
Note: The snd_pcm_rewind() can accept bigger value than returned by
this function. But it is not guaranteed that output stream will be
consistent with bigger value.
This phrase again assumes a ring buffer, i.e., it says that
1) there is a buffer, and applications can write _anywhere_ in it, but
2) if a plugin reduces the allowed range (with snd_pcm_rewindable()),
it is likely that the plugin will not notice that some part of the
buffer has changed.

(The first part is not true for most unrewindable plugins.)


Regards,
Clemens
Raymond Yau
2014-09-11 03:49:22 UTC
Permalink
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo,
the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is
32 samples in both cases.

Do you mean hda-Intel does not support arbritray period size when you say
the granularity is 32 bytes ?

However the granularity of the emulated hda sound card inside any VM depend
on the vm and the backend audio system and sound card
A. C. Censi
2014-09-11 04:19:47 UTC
Permalink
Post by Raymond Yau
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card

Obviously it is a "virtual granularity".

ACC
Post by Raymond Yau
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo,
the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is
32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say
the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend
on the vm and the backend audio system and sound card
_______________________________________________
Alsa-devel mailing list
http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Raymond Yau
2014-09-13 09:15:38 UTC
Permalink
Post by A. C. Censi
Post by Raymond Yau
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
Post by A. C. Censi
Obviously it is a "virtual granularity".
Seem emulate dma transfer

http://git.kernel.org/cgit/virt/kvm/qemu-kvm.git/plain/hw/intel-hda.c

But not all backend can provide same granularity

http://git.kernel.org/cgit/virt/kvm/qemu-kvm.git/tree/audio
Alexander E. Patrakov
2014-09-11 05:28:18 UTC
Permalink
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
The precise meaning is defined here:

http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html

Of course the VM result would be different.
--
Alexander E. Patrakov
Raymond Yau
2014-09-11 06:21:26 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Of course the VM result would be different.
-
http://git.kernel.org/cgit/virt/kvm/qemu-kvm.git/plain/hw/hda-audio.c

/* audio in/out widget */
case AC_VERB_SET_CHANNEL_STREAMID:
st = a->st + node->stindex;
if (st->node == NULL) {
goto fail;
}
hda_audio_set_running(st, false);
st->stream = (payload >> 4) & 0x0f;
st->channel = payload & 0x0f;
dprint(a, 2, "%s: stream %d, channel %d\n",
st->node->name, st->stream, st->channel);
hda_audio_set_running(st, a->running_real[st->output * 16 +
st->stream]);
hda_codec_response(hda, true, 0);
break;

It seem some vm just emulate start/stop of playback/capture , it does not
update any hda controller registers, this mean that it won't support rewind
Raymond Yau
2014-09-13 08:57:06 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different granularity ?

Do your two hda controllers have different Fifo size ?
Alexander E. Patrakov
2014-09-13 10:43:26 UTC
Permalink
Post by Raymond Yau
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different granularity ?
Yes. My two hda controllers have different granularity.
Post by Raymond Yau
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then both
cards return 0. I call this function after snd_pcm_hw_params(), as
recommended by the documentation.
--
Alexander E. Patrakov
Raymond Yau
2014-09-13 11:33:14 UTC
Permalink
Post by Alexander E. Patrakov
Post by Raymond Yau
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Post by Alexander E. Patrakov
Post by Raymond Yau
Do you mean the different hda controllers may have different granularity ?
Yes. My two hda controllers have different granularity.
Post by Raymond Yau
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then both
cards return 0. I call this function after snd_pcm_hw_params(), as
recommended by the documentation.

Do your hda controllers OSDnFIFOS register match the granularity ?

3.3.40 Offset 90: {IOB}SDnFIFOS – Input/Output/Bidirectional Stream
Descriptor n FIFO Size

Length: 2 bytes

Table 39. Stream Descriptor n FIFO Size Bit Type Reset Description 15:0 RO
Imp.Dep

FIFO Size (FIFOS): Indicates the maximum number of bytes that could be
fetched by the controller at one time. This is the maximum number of bytes
that may have been DMA‟d into memory but not yet transmitted on the link,
and is also the maximum possible value that the LPIB count will increase by
at one time. This number may be static to indicate a static buffer size,
or may change after the data format has been programmed if the controller
is able to vary its FIFO size based on the stream format. If it is able to
change value after the data format has been programmed, the value update
must happen immediately before the next read of the FIFOS register, and
remain static until the next programming of data format.
Alexander E. Patrakov
2014-09-13 11:36:00 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Raymond Yau
Post by Alexander E. Patrakov
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the
pointer
Post by Alexander E. Patrakov
Post by Raymond Yau
granularity of 64 bytes (=16 samples) over HDMI. The minimum
period size
Post by Alexander E. Patrakov
Post by Raymond Yau
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size
when you
Post by Alexander E. Patrakov
Post by Raymond Yau
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Post by Alexander E. Patrakov
Post by Raymond Yau
Do you mean the different hda controllers may have different
granularity ?
Post by Alexander E. Patrakov
Yes. My two hda controllers have different granularity.
Post by Raymond Yau
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then
both cards return 0. I call this function after snd_pcm_hw_params(), as
recommended by the documentation.
Do your hda controllers OSDnFIFOS register match the granularity ?
I don't know. Please send a program or a kernel patch that prints this
value (including units - samples or bytes), otherwise I won't be able to
answer.
--
Alexander E. Patrakov
Alexander E. Patrakov
2014-09-13 18:35:30 UTC
Permalink
Post by Alexander E. Patrakov
1. PulseAudio does not call snd_pcm_rewindable(), because for some ALSA
plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28,
but in some cases snd_pcm_rewindable() still returns wrong results.
Bad news: even in 1.0.28, snd_pcm_rewindable() crashes for the file
plugin due to recursion. So I was wrong when saying "fixed completely" :(
Post by Alexander E. Patrakov
3. On the hw plugin, I could demonstrate two other bugs regarding
snd_pcm_rewindable(): stale data and bogus negative return values.
For all of the above issues, I have sent patches.
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function. This would require documentation changes for
snd_pcm_rewindable(), though, as it officially no longer returns "safe
count of frames which can be rewinded". I have difficulty designing a
better wording what the function actually means now.

Result 2: the proposed heuristic has been rightfully and convincingly
busted, but no alternatives were proposed. It is the top priority to get
some constructive proposal here where to get the data, or a fallback
plan for alsa-lib. A fallback plan is also needed for old kernels if the
constructive proposal involves kernel changes.
Post by Alexander E. Patrakov
=== On non-rewindability of the rate plugin ===
Nothing more to discuss.
Post by Alexander E. Patrakov
=== On possibly-incomplete rewindability of the file plugin ===
Nothing more to discuss.
Post by Alexander E. Patrakov
=== On bogus rewindability of ladspa and extplug plugins ===
No conclusion. The proposed hack to leave the .rewind implementation in
place for use by old PulseAudio has met some opposition, but not such
definite and convincing opposition as to the rewind-safeguard and
low-latency-thread proposals. The proposed alternative (to make a
"disallow imperfect rewinds" flag, off by default) means more work for
no gain, and David also doubts whether such flag is worth the
complexity. It's also notable that nobody explicitly said "let's regress
old pulseaudio on new alsa-lib on the obscure dca plugin in the name of
clean code" - which would have also closed the question :)

In fact, I am not really sure that everyone understands the problem.
Post by Alexander E. Patrakov
=== On bogus rewindability of some ioplug-based plugins ===
The "low-latency-thread in ioplug" idea has been rightfully busted, and
transformed into the "flexiblerewind" plugin idea. However, that plugin
found no users (as PulseAudio won't use it), and thus there is no
incentive for me or anyone else to implement it. Nothing left to discuss.
Post by Alexander E. Patrakov
=== On the pulse plugin ===
We have reached an agreement that .rewindable and .rewind should be
added to ioplug and used by the pulse plugin. Nothing more to discuss.
Post by Alexander E. Patrakov
=== On communication of non-rewindability to the program ===
We have very good progress here: recognition of three rewindability
classes that seemingly nobody objects to. The wanted behavior of
PulseAudio for each rewindability class is not something we currently
fully agree upon, though, but I think that it will be more appropriate
to discuss further when someone sends a PulseAudio patch making use of
these classes. I.e. not now. It is very good that there is at least one
more person (Pierre Bossart) who understands the inherent conflict of
requirements.

Transition plan remains to be discussed, but that's the same discussion
as the one called for in the "rewind safeguard" part of the post.
Post by Alexander E. Patrakov
=== On the programmer expectations ===
(social issue)
Resolved (was misinterpretation on my side). I have to submit the
documentation patch.
--
Alexander E. Patrakov
Raymond Yau
2014-09-14 11:37:29 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.

It is unlikely to return any value which is safe, it is the responsiability
of the application to decide how much can be rewind

If pulseaudio assume 20ms process time is require to process two seconds of
audio and sleep for 1980ms, it should not assume cpu have infinite power

what is the purpose of the timestamp ?

Can the timestamp use to predict when will next period update occur if the
timestamp is obtained at previous period update ?

Why snd_pcm_rewind cannot return error when application just set stop
threshold to buffer size and rewind more than the stop threshold ?

:This would require documentation changes for snd_pcm_rewindable(), though,
as it officially no longer returns "safe count of frames which can be
rewinded". I have difficulty designing a better wording what the function
actually means now.

The word "safe" should be removed
Alexander E. Patrakov
2014-09-14 12:07:23 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.
It is unlikely to return any value which is safe, it is the
responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel,
but definitely not OK on ymfpci even on infinitely fast CPU (because of
the fixed 5 ms interrupt interval). The whole question here is: how is
an application supposed to know that?
Post by Alexander E. Patrakov
If pulseaudio assume 20ms process time is require to process two seconds
of audio and sleep for 1980ms, it should not assume cpu have infinite power
You are right. The safeguard interval is a sum of the CPU-independent
but card-specific part (which is what is being talked about) and a
CPU-specific part (which the application indeed should know).
Post by Alexander E. Patrakov
what is the purpose of the timestamp ?
Can the timestamp use to predict when will next period update occur if
the timestamp is obtained at previous period update ?
See http://0pointer.de/blog/projects/pulse-glitch-free.html
Post by Alexander E. Patrakov
Why snd_pcm_rewind cannot return error when application just set stop
threshold to buffer size and rewind more than the stop threshold ?
I think it does. The problem (due to which we need a safeguard) is that
the card-specific minimum number of not-really-rewindable samples
exists. E.g., for ymfpci, that would be 5 ms.
Post by Alexander E. Patrakov
:This would require documentation changes for snd_pcm_rewindable(),
though, as it officially no longer returns "safe count of frames which
can be rewinded". I have difficulty designing a better wording what the
function actually means now.
The word "safe" should be removed
That's insufficient. The returned value is valid only in the ring-buffer
model for a card that fetches samples absolutely uniformly one at a
time, i.e. without any batching, updates the pointer at every
played-back sample, and doesn't have any "don't overwrite what I am
DMA-ing or I will kill you with IRQs" quirk. I.e., even with the
infinitely fast CPU, an attempt to rewind as much as that function
returns (and then overwrite) will yield a glitch (that may or may not be
detected as xrun even if xrun detection is enabled).
--
Alexander E. Patrakov
Raymond Yau
2014-09-15 02:43:53 UTC
Permalink
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.
It is unlikely to return any value which is safe, it is the
responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but
definitely not OK on ymfpci even on infinitely fast CPU (because of the
fixed 5 ms interrupt interval). The whole question here is: how is an
application supposed to know that?
Take a look at patent US 20100131783

System and Method of Dynamically Switching Queue Threshold

HDA may have different fifo threshold in different power states, the
granularity is not fixed

Twice the minimum period size/time is not any over estimate

Glitching still occurs at switch sink / change in power state when you
allow sound card run with lowest latency ?

I don't think your proposal of having three different class of granularity
is good idea
Takashi Iwai
2014-09-15 09:19:38 UTC
Permalink
At Sun, 14 Sep 2014 18:07:23 +0600,
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.
It is unlikely to return any value which is safe, it is the
responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel,
but definitely not OK on ymfpci even on infinitely fast CPU (because of
the fixed 5 ms interrupt interval). The whole question here is: how is
an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as
a driver API. There is no "safety" at all there. There is only
"theoretically minimal" (and it often lies even if the hardware chip
says so). How much value to be taken as "safeguard" is rather a
choice by each application or sound backend.

Right now, the only information we give from the sound driver is
INFO_BATCH flag. And I agree with a bit more detailed information to
be exposed from the driver -- but only if possible. This must be an
optional information and not mandatory.


Takashi
Alexander E. Patrakov
2014-09-15 09:58:42 UTC
Permalink
Post by Takashi Iwai
At Sun, 14 Sep 2014 18:07:23 +0600,
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.
It is unlikely to return any value which is safe, it is the
responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel,
but definitely not OK on ymfpci even on infinitely fast CPU (because of
the fixed 5 ms interrupt interval). The whole question here is: how is
an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as
a driver API. There is no "safety" at all there. There is only
"theoretically minimal" (and it often lies even if the hardware chip
says so). How much value to be taken as "safeguard" is rather a
choice by each application or sound backend.
I agree with this "bad wording" remark. Let's talk about the
"theoretically minimal non-rewindable amount of samples" from now on.
Post by Takashi Iwai
Right now, the only information we give from the sound driver is
INFO_BATCH flag. And I agree with a bit more detailed information to
be exposed from the driver -- but only if possible. This must be an
optional information and not mandatory.
Here I mostly agree.

Indeed, this INFO_BATCH flag exists. However, its documentation is so
vague that IMHO a documentation patch is needed to legitimize its use as
a "you must not leave less than one period when rewinding" indicator.
Also, the documentation says about snd_pcm_hw_params_is_batch(): "This
function should only be called when the configuration space contains a
single configuration" - I believe that it is an error (and PulseAudio
calls it without obeying this restriction), because this function is
helpful exactly for choosing hardware parameters such as period size.
May I write "if the configuration space contains more than one
configuration, the result indicates whether a configuration exists where
such double-buffering is done"? Also the documentation talks about
"hardware", should I extend it to the whole plugin chain?

As for the desire to export other information if available, it is
certainly good.

What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?

Clemens Ladisch mentioned USB as a counterexample, but it is a batch
device for which the current period size is more relevant than the
minimal one. Pierre-Louis Bossart just said that it is "really asking
for trouble" without much explanation.

Can anyone name a sound controller that is not of a batch variety, where
the theoretically minimal non-rewindable amount of samples is, at some
settings, higher than the reported minimum period size at the same
number of channels, sample rate and sample format?

[and yes, I understand that I can always cheat and say: let's change the
definition of INFO_BATCH so that it includes such cases]
--
Alexander E. Patrakov
Takashi Iwai
2014-09-15 10:08:12 UTC
Permalink
At Mon, 15 Sep 2014 15:58:42 +0600,
Post by Alexander E. Patrakov
Post by Takashi Iwai
At Sun, 14 Sep 2014 18:07:23 +0600,
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Alexander E. Patrakov
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a
separate function.
It is unlikely to return any value which is safe, it is the
responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel,
but definitely not OK on ymfpci even on infinitely fast CPU (because of
the fixed 5 ms interrupt interval). The whole question here is: how is
an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as
a driver API. There is no "safety" at all there. There is only
"theoretically minimal" (and it often lies even if the hardware chip
says so). How much value to be taken as "safeguard" is rather a
choice by each application or sound backend.
I agree with this "bad wording" remark. Let's talk about the
"theoretically minimal non-rewindable amount of samples" from now on.
Post by Takashi Iwai
Right now, the only information we give from the sound driver is
INFO_BATCH flag. And I agree with a bit more detailed information to
be exposed from the driver -- but only if possible. This must be an
optional information and not mandatory.
Here I mostly agree.
Indeed, this INFO_BATCH flag exists. However, its documentation is so
vague that IMHO a documentation patch is needed to legitimize its use as
a "you must not leave less than one period when rewinding" indicator.
Also, the documentation says about snd_pcm_hw_params_is_batch(): "This
function should only be called when the configuration space contains a
single configuration" - I believe that it is an error (and PulseAudio
calls it without obeying this restriction), because this function is
helpful exactly for choosing hardware parameters such as period size.
May I write "if the configuration space contains more than one
configuration, the result indicates whether a configuration exists where
such double-buffering is done"? Also the documentation talks about
"hardware", should I extend it to the whole plugin chain?
The whole hw_params_*() descriptions are something like bot-style
tweets, we need a bit better texts there :)

The batch parameter was (and partly is) indeed very ambiguous. Its
usage has been materialized a few years ago.
Post by Alexander E. Patrakov
As for the desire to export other information if available, it is
certainly good.
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal
period size without actually knowing the real limit. It tends to be
smaller than the hardware really supports. This is, in most cases,
just because no hardware spec defines that. So, it can't be blindly
taken as the bottom line, unfortunately. That's why I suggested the
new field would be optional; we simply don't know the value.


Takashi
Post by Alexander E. Patrakov
Clemens Ladisch mentioned USB as a counterexample, but it is a batch
device for which the current period size is more relevant than the
minimal one. Pierre-Louis Bossart just said that it is "really asking
for trouble" without much explanation.
Can anyone name a sound controller that is not of a batch variety, where
the theoretically minimal non-rewindable amount of samples is, at some
settings, higher than the reported minimum period size at the same
number of channels, sample rate and sample format?
[and yes, I understand that I can always cheat and say: let's change the
definition of INFO_BATCH so that it includes such cases]
--
Alexander E. Patrakov
Pierre-Louis Bossart
2014-09-15 17:01:26 UTC
Permalink
Post by Takashi Iwai
Post by Alexander E. Patrakov
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal
period size without actually knowing the real limit. It tends to be
smaller than the hardware really supports. This is, in most cases,
just because no hardware spec defines that. So, it can't be blindly
taken as the bottom line, unfortunately. That's why I suggested the
new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various
ways (buses, system agents, arbiters, DMA controllers, DDR controllers)
and no one really knows what the rewind granularity is, it's not a
metric that's tracked.

I actually liked the heuristic that's present in PulseAudio: constrain
the safeguard to 256 bytes or 1ms (the last part would actually be
fixed, it's currently not dependent on the sink actual rate and number
of channels). we could introduce an optional query drivers for this but
I wonder if it's worth the effort.
-Pierre
Alexander E. Patrakov
2014-09-15 17:14:41 UTC
Permalink
Post by Pierre-Louis Bossart
Post by Takashi Iwai
Post by Alexander E. Patrakov
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal
period size without actually knowing the real limit. It tends to be
smaller than the hardware really supports. This is, in most cases,
just because no hardware spec defines that. So, it can't be blindly
taken as the bottom line, unfortunately. That's why I suggested the
new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various
ways (buses, system agents, arbiters, DMA controllers, DDR controllers)
and no one really knows what the rewind granularity is, it's not a
metric that's tracked.
I actually liked the heuristic that's present in PulseAudio: constrain
the safeguard to 256 bytes or 1ms (the last part would actually be
fixed, it's currently not dependent on the sink actual rate and number
of channels). we could introduce an optional query drivers for this but
I wonder if it's worth the effort.
OK, a direct question then.

The "256 bytes or 1 ms" heuristic is known not to work on ymfpci. Should
we bump it to "5.3 ms or 256 samples", or make configurable, or ask the
ymfpci maintainer to add the INFO_BATCH flag to that driver?
--
Alexander E. Patrakov
Takashi Iwai
2014-09-15 18:08:39 UTC
Permalink
At Mon, 15 Sep 2014 23:14:41 +0600,
Post by Alexander E. Patrakov
Post by Pierre-Louis Bossart
Post by Takashi Iwai
Post by Alexander E. Patrakov
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal
period size without actually knowing the real limit. It tends to be
smaller than the hardware really supports. This is, in most cases,
just because no hardware spec defines that. So, it can't be blindly
taken as the bottom line, unfortunately. That's why I suggested the
new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various
ways (buses, system agents, arbiters, DMA controllers, DDR controllers)
and no one really knows what the rewind granularity is, it's not a
metric that's tracked.
I actually liked the heuristic that's present in PulseAudio: constrain
the safeguard to 256 bytes or 1ms (the last part would actually be
fixed, it's currently not dependent on the sink actual rate and number
of channels). we could introduce an optional query drivers for this but
I wonder if it's worth the effort.
OK, a direct question then.
The "256 bytes or 1 ms" heuristic is known not to work on ymfpci. Should
we bump it to "5.3 ms or 256 samples", or make configurable, or ask the
ymfpci maintainer to add the INFO_BATCH flag to that driver?
Adding INFO_BATCH would be a quick solution, indeed, although this
isn't 100% right.


Takashi
Raymond Yau
2014-09-18 01:15:42 UTC
Permalink
Post by Alexander E. Patrakov
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the minimal
period size) is not useful. I understand that two people are against
this idea, so it must be bad. But I must understand why. Is it because
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
Does this mean the granularity of most drivers are only one period since
most of them cannot reporte the dma position in realtime ?

(e.g. pointer callback of Intel8x0 use a timeout loop to read the last
valid index)

The safeguard will be two periods

If some HDA have FIFO size of 192 bytes which is more than the minimum
period bytes

Should we limit the start threshold to FIFO threshold or FIFO size ?

Does it need Brust length = 1 for those hda controller to support arbitrary
period size ?

Seem only some hda controller can trigger DMA transfer at 1/3 or 2/3 FIFO
buffer at different power states and report the dma position in pointer
callback

Does snd-oxygen provide this position with granularity which is less than
the minimum period size ?
Alexander E. Patrakov
2014-09-21 09:22:42 UTC
Permalink
Post by Takashi Iwai
Post by Alexander E. Patrakov
What remains not fully understood for me is the claim that the
information already exposed by every driver (in the form of the
minimal
Post by Alexander E. Patrakov
period size) is not useful. I understand that two people are
against
Post by Alexander E. Patrakov
this idea, so it must be bad. But I must understand why. Is it
because
Post by Alexander E. Patrakov
the minimum period size reported by some drivers (which ones are
suspected, if any?) may be a lie?
Does this mean the granularity of most drivers are only one period since
most of them cannot reporte the dma position in realtime ?
I guess you are right.
Post by Takashi Iwai
(e.g. pointer callback of Intel8x0 use a timeout loop to read the last
valid index)
The safeguard will be two periods
Here I don't see how that can be right. Even if the position is updated
at each period interrupt, we always know which period is currently
playing. So the theoretical minimum safeguard is exactly one period. Add
1 ms on top if you want to account for scheduler glitches.
Post by Takashi Iwai
If some HDA have FIFO size of 192 bytes which is more than the minimum
period bytes
Should we limit the start threshold to FIFO threshold or FIFO size ?
I think that in this case it is a bug to report such small minimum
period size.
Post by Takashi Iwai
Does it need Brust length = 1 for those hda controller to support
arbitrary period size ?
Seem only some hda controller can trigger DMA transfer at 1/3 or 2/3
FIFO buffer at different power states and report the dma position in
pointer callback
I am not a specialist in HDA hardware.
Post by Takashi Iwai
Does snd-oxygen provide this position with granularity which is less
than the minimum period size ?
I have a friend with a Xonar card, will ask him to perform a test for
you when he appears online.
--
Alexander E. Patrakov
Clemens Ladisch
2014-09-21 09:53:58 UTC
Permalink
Post by Raymond Yau
Does snd-oxygen provide this position with granularity which is less than
the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer registers
are documented as reporting the position with 4-byte accuracy, but there were
problems when the period/buffer sizes were not aligned to 32 bytes. I've set
the minimum period size to 64 bytes, just to be safe.


Regards,
Clemens
Alexander E. Patrakov
2014-09-21 10:56:47 UTC
Permalink
Post by Clemens Ladisch
Post by Raymond Yau
Does snd-oxygen provide this position with granularity which is less than
the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer registers
are documented as reporting the position with 4-byte accuracy, but there were
problems when the period/buffer sizes were not aligned to 32 bytes. I've set
the minimum period size to 64 bytes, just to be safe.
Well, my friend has tested the card anyway, and I am afraid that the
situation is in fact more complex than documented. Look how fast (8
samples per ioctl in a busy loop!) the card eats the first 256 samples.
This is very suspicious, and I am not really sure whether we can
actually perform a rewind over these 256 samples.

The "usual" step of the pointer updates (via snd_pcm_avail) is indeed 8
samples = 32 bytes with some fine-grained updates sometimes showing up
in the middle, which matches the documented burst size.
--
Alexander E. Patrakov
Raymond Yau
2014-09-22 03:27:17 UTC
Permalink
Post by Clemens Ladisch
Post by Raymond Yau
Does snd-oxygen provide this position with granularity which is less than
the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer registers
are documented as reporting the position with 4-byte accuracy, but there were
problems when the period/buffer sizes were not aligned to 32 bytes. I've set
the minimum period size to 64 bytes, just to be safe.
The brust size is usually added to the constriants but the application
cannot deduce this value

err = snd_pcm_hw_constraint_step(runtime, 0,
SNDRV_PCM_HW_PARAM_PERIOD_BYTES, 32);

For those audio controllers have FIFO buffer, the application must keep the
FIFO buffer full at any time for glitch free.

Someone may argue that keeping at least one frame above FIFO threshold is
just enough

http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/081509.html

I don't understand why the minimum buffer size can be less than FIFO size
Loading...