Discussion:
crash/reboot with rawmidi on ice1712 dual opteron
Takashi Iwai
2007-04-24 12:54:41 UTC
At Wed, 18 Apr 2007 17:15:41 +0200,
- system is RH Enterprise AS 4 update 4
- the soundcard is in a 64-bit PCI-X slot (so few soundcards fit
at all)
- pcm with ice1712 is working fine
- I disabled /proc/sys/kernel/panic_on_oops, but it still rebooted
the same way. Is there another way to prevent rebooting on panic
or so?
- by using printk with serial console, I could trace the reboot to
occur at the very first outb() call in mpu401_uart.c:64
with data 0, addr 0x304C
The address looks a bit strange to me, but it may depend on the BIOS.
Check /proc/ioports whether this really is within the range of the
corresponding soundcard.
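
The check suggested above can also be done programmatically. The following is only a sketch: `port_owned_by` is a hypothetical helper name, and it assumes the text is in the usual `lo-hi : owner` layout of /proc/ioports with hexadecimal ranges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/ioports-style text and return 1 if `port` lies inside a
 * range whose owner string contains `owner` (e.g. "ICE1712"), else 0.
 * Hypothetical helper, not part of ALSA or the kernel.
 */
static int port_owned_by(const char *ioports_text, unsigned port,
                         const char *owner)
{
    const char *line = ioports_text;

    assert(ioports_text != NULL && owner != NULL);

    while (*line) {
        unsigned lo, hi;
        char name[128];
        const char *nl;

        /* The leading space in the format skips the nesting indent. */
        if (sscanf(line, " %x-%x : %127[^\n]", &lo, &hi, name) == 3 &&
            port >= lo && port <= hi && strstr(name, owner))
            return 1;

        nl = strchr(line, '\n');
        if (!nl)
            break;
        line = nl + 1;
    }
    return 0;
}
```

Called with the contents of /proc/ioports, the port 0x304c and the owner "ICE1712", it reports whether the driver has actually claimed the address that outb() is writing to.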


Takashi
Florian
2007-04-24 13:27:23 UTC
Thanks for the reply. The relevant excerpt from /proc/ioports is

3000-3fff : PCI Bus #02
  3000-303f : 0000:02:01.0
    3000-303f : ICE1712
  3040-305f : 0000:02:01.0
    3040-305f : ICE1712
  3060-306f : 0000:02:01.0
    3060-306f : ICE1712
  3070-307f : 0000:02:01.0
    3070-307f : ICE1712
[full listing at end of message]

so I guess 304C is in range of the M-Audio Audiophile 24/96. Any
other ideas what I can try, either in ALSA code or elsewhere?

Thanks,
Florian
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
02f8-02ff : serial
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f8-03ff : serial
04d0-04d1 : pnp 00:05
1000-10ff : 0000:00:07.5
  1000-10ff : AMD AMD8111
1100-117f : pnp 00:05
1180-11ff : pnp 00:05
1400-143f : 0000:00:07.5
  1400-143f : AMD AMD8111
1440-145f : 0000:00:07.2
  1440-145f : amd8111_smbus2
1460-146f : 0000:00:07.1
  1460-1467 : ide0
  1468-146f : ide1
2000-2fff : PCI Bus #01
  2000-200f : 0000:01:02.0
    2000-200f : sata_sil
  2010-2013 : 0000:01:02.0
    2010-2013 : sata_sil
  2014-2017 : 0000:01:02.0
    2014-2017 : sata_sil
  2018-201f : 0000:01:02.0
    2018-201f : sata_sil
  2020-2027 : 0000:01:02.0
    2020-2027 : sata_sil
3000-3fff : PCI Bus #02
  3000-303f : 0000:02:01.0
    3000-303f : ICE1712
  3040-305f : 0000:02:01.0
    3040-305f : ICE1712
  3060-306f : 0000:02:01.0
    3060-306f : ICE1712
  3070-307f : 0000:02:01.0
    3070-307f : ICE1712
4000-4fff : PCI Bus #81
  4000-4fff : PCI Bus #83
    4000-40ff : 0000:83:04.0
    4400-44ff : 0000:83:04.0
    4800-48ff : 0000:83:04.1
    4c00-4cff : 0000:83:04.1
8000-8003 : PM1a_EVT_BLK
8004-8005 : PM1a_CNT_BLK
8008-800b : PM_TMR
8010-8015 : ACPI CPU throttle
8020-8023 : GPE0_BLK
80b0-80b7 : GPE1_BLK
80e0-80ef : amd756_smbus
Takashi Iwai
2007-04-24 13:39:41 UTC
At Tue, 24 Apr 2007 15:27:23 +0200,
Post by Florian
Thanks for the reply. The relevant excerpt from /proc/ioports is
3000-3fff : PCI Bus #02
3000-303f : 0000:02:01.0
3000-303f : ICE1712
3040-305f : 0000:02:01.0
3040-305f : ICE1712
3060-306f : 0000:02:01.0
3060-306f : ICE1712
3070-307f : 0000:02:01.0
3070-307f : ICE1712
[full listing at end of message]
so I guess 304C is in range of the M-Audio Audiophile 24/96. Any
other ideas what I can try, either in ALSA code or elsewhere?
What does /proc/asound/cards show? Does it point to 0x3000 or 0x3040?


Takashi
Florian
2007-04-24 13:43:54 UTC
Hi Takashi,

the ice1712 points to 0x3040.

Thanks,
Florian
--
Florian Bomers
bome.com

-------------------------------------------------------
Music Software, Development Tools: http://www.bome.com
Java Sound extensions, plugins: http://www.tritonus.org
The Java Sound Resources: http://www.jsresources.org
-------------------------------------------------------
Please quote this email in your reply. Thanks!
Takashi Iwai
2007-04-24 13:51:20 UTC
At Tue, 24 Apr 2007 15:43:54 +0200,
Post by Florian
Hi Takashi,
the ice1712 points to 0x3040.
Hm... could you show the output of "lspci -v" (regarding ice1712) ?
I wonder why 0x3000-0x303f is ignored.


Takashi
Florian
2007-04-24 14:12:10 UTC
lspci -v shows:

02:01.0 Multimedia audio controller: VIA Technologies Inc. ICE1712 [Envy24] PCI Multi-Channel I/O Controller (rev 02)
        Subsystem: VIA Technologies Inc. M-Audio Delta Audiophile
        Flags: bus master, medium devsel, latency 64, IRQ 20
        I/O ports at 3040 [size=32]
        I/O ports at 3070 [size=16]
        I/O ports at 3060 [size=16]
        I/O ports at 3000 [size=64]
        Capabilities: [80] Power Management version 1

Florian
_______________________________________________
Alsa-devel mailing list
http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Takashi Iwai
2007-04-24 14:23:59 UTC
At Tue, 24 Apr 2007 16:12:10 +0200,
Post by Florian
02:01.0 Multimedia audio controller: VIA Technologies Inc. ICE1712
[Envy24] PCI Multi-Channel I/O Controller (rev 02)
Subsystem: VIA Technologies Inc. M-Audio Delta Audiophile
Flags: bus master, medium devsel, latency 64, IRQ 20
I/O ports at 3040 [size=32]
I/O ports at 3070 [size=16]
I/O ports at 3060 [size=16]
I/O ports at 3000 [size=64]
Capabilities: [80] Power Management version 1
Ah OK, it's non-linear...
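
To make the layout concrete: the four I/O regions from the lspci output are not in ascending address order, and 0x304C falls into the first one listed. A small sketch of that arithmetic follows; the struct and function names are made up for illustration, and the table simply copies the base/size pairs in the order lspci printed them.

```c
#include <assert.h>
#include <stddef.h>

/* One I/O region as reported by "lspci -v" (base port and size in bytes). */
struct io_region {
    unsigned base;
    unsigned size;
};

/* Regions of 02:01.0, in the order lspci listed them. */
static const struct io_region ice1712_regions[] = {
    { 0x3040, 32 },
    { 0x3070, 16 },
    { 0x3060, 16 },
    { 0x3000, 64 },
};

/*
 * Return the index of the region containing `port` and store the offset
 * inside that region in *offset; return -1 if no region matches.
 */
static int find_region(const struct io_region *r, size_t n,
                       unsigned port, unsigned *offset)
{
    size_t i;

    assert(r != NULL);

    for (i = 0; i < n; i++) {
        if (port >= r[i].base && port - r[i].base < r[i].size) {
            if (offset)
                *offset = port - r[i].base;
            return (int)i;
        }
    }
    return -1;
}
```

For port 0x304C this yields index 0 (the region at 0x3040) with offset 0x0C, so the address is inside the card's first listed region rather than a stray BIOS assignment.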

When the hang-up occurs at the first write, it must be in
snd_mpu401_uart_cmd(). At the very beginning, it calls
mpu->write(mpu, 0x00, MPU401D(mpu));
Try to comment out this and see what happens.

Do I understand correctly that this bug happens when you open a
rawmidi device for read, e.g.
% cat /dev/snd/midiC0D0 > /dev/null
??


Takashi
Florian
2007-04-24 14:53:20 UTC
Post by Takashi Iwai
When the hang-up occurs at the first write, it must be in
snd_mpu401_uart_cmd(). At the very beginning, it calls
mpu->write(mpu, 0x00, MPU401D(mpu)); Try to comment out this and
see what happens.
I had tried that - I think that I just commented out the reset
command. It would not crash or reboot, but it did not have any
functionality either.
Post by Takashi Iwai
Do I understand correctly that this bug happens when you open a
rawmidi device for read, e.g. % cat /dev/snd/midiC0D0 > /dev/null
yes. I usually used
amidi -p hw:0 -d
Post by Takashi Iwai
Perhaps an easiest but foolishest way to trace this is to put
printk at each io-port access and any other important points, and
give some sleep at each point, then watch the kernel message.
You can get rid of spin_lock_*() around that, just for testing.
I've done this until I traced it to the first outb() call, i.e. the
initialization mentioned above. The first outb() will cause the reboot.

Florian
Takashi Iwai
2007-04-24 15:04:06 UTC
At Tue, 24 Apr 2007 16:53:20 +0200,
Post by Florian
Post by Takashi Iwai
When the hang-up occurs at the first write, it must be in
snd_mpu401_uart_cmd(). At the very beginning, it calls
mpu->write(mpu, 0x00, MPU401D(mpu)); Try to comment out this and
see what happens.
I had tried that - I think that I just commented out the reset
command.
The reset command contains a series of writes. The write access (zero
to 0x304c) is the very first part, and this isn't always necessary.
For example, trident doesn't like this sequence. So, just commenting
out this write should be fairly harmless to the later behavior.

So, commenting out only the first zero write is worth trying (if you
haven't done so yet).
Post by Florian
It would not crash or reboot, but it did not have any
functionality either.
Post by Takashi Iwai
Do I understand correctly that this bug happens when you open a
rawmidi device for read, e.g. % cat /dev/snd/midiC0D0 > /dev/null
yes. I usually used
amidi -p hw:0 -d
Post by Takashi Iwai
Perhaps an easiest but foolishest way to trace this is to put
printk at each io-port access and any other important points, and
give some sleep at each point, then watch the kernel message.
You can get rid of spin_lock_*() around that, just for testing.
I've done this until I traced it to the first outb() call, i.e. the
initialization mentioned above. The first outb() will cause the reboot.
And this causes an immediate reboot, not panic or oops, right?
You shouldn't do this kind of debug on X but on VGA console, BTW.


Takashi
Florian
2007-04-24 17:51:50 UTC
IT WORKED! I fixed it in this way:

mpu401_uart.c:229

    if (mpu->hardware != MPU401_HW_TRID4DWAVE
        && mpu->hardware != MPU401_HW_ICE1712) {
            mpu->write(mpu, 0x00, MPU401D(mpu));
            /*snd_mpu401_uart_clear_rx(mpu);*/
    }

I don't know if this will work on non-AMD machines and if it will
work on all ice1712 machines... Next week I can test it with
different M-Audio cards on single-processor machines (Pentium and AMD).
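
The effect of the guard can be tried out in user space without touching real hardware. The following is a mock, not the driver code: every `mock_*` and `MOCK_*` name is invented here, the write callback only records what would have been sent, and the command port is assumed to sit at data port + 1 as in the classic MPU-401 layout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the MPU401_HW_* hardware types. */
enum mock_hw { MOCK_HW_GENERIC, MOCK_HW_TRID4DWAVE, MOCK_HW_ICE1712 };

struct mock_mpu {
    enum mock_hw hardware;
    unsigned long port;    /* data port; command port assumed at port + 1 */
    int writes;            /* number of port writes issued so far */
    unsigned char first;   /* first byte that was written */
    void (*write)(struct mock_mpu *mpu, unsigned char data,
                  unsigned long addr);
};

/* Records the write instead of calling outb(). */
static void mock_write(struct mock_mpu *mpu, unsigned char data,
                       unsigned long addr)
{
    if (mpu->writes == 0)
        mpu->first = data;
    mpu->writes++;
    printf("write 0x%02x -> port 0x%lx\n", data, addr);
}

/* Mirrors the guarded sequence: optional zero write, then the command. */
static void mock_uart_cmd(struct mock_mpu *mpu, unsigned char cmd)
{
    assert(mpu != NULL && mpu->write != NULL);

    if (mpu->hardware != MOCK_HW_TRID4DWAVE
        && mpu->hardware != MOCK_HW_ICE1712)
        mpu->write(mpu, 0x00, mpu->port);   /* initial zero to data port */

    mpu->write(mpu, cmd, mpu->port + 1);    /* the command itself */
}
```

With `hardware` set to `MOCK_HW_ICE1712` only the command byte is written; with `MOCK_HW_GENERIC` the zero write to the data port comes first, which is the access that triggered the reboot here.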

Thanks a lot!
Florian
Daniel James
2007-04-25 09:12:46 UTC
Hi Florian,
Post by Florian
mpu401_uart.c:229
if (mpu->hardware != MPU401_HW_TRID4DWAVE
&& mpu->hardware != MPU401_HW_ICE1712) {
mpu->write(mpu, 0x00, MPU401D(mpu));
/*snd_mpu401_uart_clear_rx(mpu);*/
}
I don't know if this will work on non-AMD machines and if it will
work on all ice1712 machines...
I can test it here - which version is your patch against?

Cheers!

Daniel
Florian
2007-04-25 13:07:44 UTC
it's the hg version from last Monday. In any case, just look for the
string "MPU401_HW_TRID" - it only appears once in the file, then
add this condition:

&& mpu->hardware != MPU401_HW_ICE1712) {

One other thing: this fix does not fix MIDI output. Unfortunately I
can't test this now.

Florian
Takashi Iwai
2007-04-25 13:11:39 UTC
At Wed, 25 Apr 2007 15:07:44 +0200,
Post by Florian
it's the hg version from last Monday. In any case, just look for the
string "MPU401_HW_TRID" - it only appears once in the file, then
&& mpu->hardware != MPU401_HW_ICE1712) {
One other thing: this fix does not fix MIDI output.
I don't know about the MIDI output problem. Could you describe it?


Takashi
Florian
2007-04-25 13:56:25 UTC
Post by Takashi Iwai
I don't know of the MIDI output problem? Descriptions?
it's the same symptom: MIDI input works, but opening MIDI output
will reboot immediately. I haven't found a similar line to exclude
for MIDI output...

Florian
Daniel James
2007-04-24 14:23:52 UTC
Hi Takashi, hi Florian,

I found a similar MIDI crashing bug on a dual Opteron machine in 2005,
even with only one processor installed, which I reported at the time on
alsa-devel. The card was an M-Audio Audiophile 24/96, normally reliable.

I couldn't replicate this problem on an Asus single processor Opteron
board, so I concluded it was a quirk of my dual socket Tyan S2875
motherboard:

http://article.gmane.org/gmane.linux.alsa.devel/24682/
http://article.gmane.org/gmane.linux.alsa.devel/25323/

Last time I tested it, earlier this year I think, the bug was still
there. I even flashed the BIOS of the Tyan board in case it was a BIOS
bug, but it made no difference. I installed a 32-bit distro and that
made no difference either.

Maybe there is something more generally wrong here, which only affects
dual-processor AMD64 hardware. I can make the Tyan machine available
over SSH if that helps, it has a fixed IP address.

Cheers!

Daniel
Takashi Iwai
2007-04-24 14:32:36 UTC
At Tue, 24 Apr 2007 15:23:52 +0100,
Post by Daniel James
Hi Takashi, hi Florian,
I found a similar MIDI crashing bug on a dual Opteron machine in 2005,
even with only one processor installed, which I reported at the time on
alsa-devel. The card was an M-Audio Audiophile 24/96, normally reliable.
I couldn't replicate this problem on an Asus single processor Opteron
board, so I concluded it was a quirk of my dual socket Tyan S2875
http://article.gmane.org/gmane.linux.alsa.devel/24682/
http://article.gmane.org/gmane.linux.alsa.devel/25323/
Last time I tested it, earlier this year I think, the bug was still
there. I even flashed the BIOS of the Tyan board in case it was a BIOS
bug, but it made no difference. I installed a 32-bit distro and that
made no difference either.
Maybe there is something more generally wrong here, which only affects
dual-processor AMD64 hardware. I can make the Tyan machine available
over SSH if that helps, it has a fixed IP address.
The most important thing is to find out what triggers which result.
As far as I understand from Florian's analysis, the io-port access
results in a machine reboot, not a kernel panic or so. It's scary
because control is then completely out of the kernel's hands.

Perhaps the easiest, if crudest, way to trace this is to put a printk
at each io-port access and any other important points, add a short
sleep at each point, and then watch the kernel messages. You can remove
the spin_lock_*() calls around them, just for testing.
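
The idea is to log before every port access and make sure the message is out before the access happens, so the last line seen on the console names the write that took the machine down. A user-space analogue is sketched below; `traced_write` and the mock callback are hypothetical names, and in the driver itself the logging would be a printk() followed by a delay rather than fprintf()/fflush().

```c
#include <assert.h>
#include <stdio.h>

/* Signature of whatever performs the real port write (outb() in the driver). */
typedef void (*port_write_fn)(unsigned char data, unsigned long addr);

/* Mock target so the sketch runs without hardware access. */
static unsigned long mock_last_addr;
static unsigned char mock_last_data;

static void mock_port_write(unsigned char data, unsigned long addr)
{
    mock_last_data = data;
    mock_last_addr = addr;
}

/*
 * Log, flush, then write. If the write never returns, the "about to
 * write" line is the last thing in the log and identifies the culprit.
 */
static void traced_write(port_write_fn real_write, unsigned char data,
                         unsigned long addr, FILE *log)
{
    assert(real_write != NULL && log != NULL);

    fprintf(log, "about to write 0x%02x to port 0x%lx\n", data, addr);
    fflush(log);

    real_write(data, addr);

    fprintf(log, "write to port 0x%lx returned\n", addr);
    fflush(log);
}
```

Flushing before the access matters more than the message text: an unflushed buffer is lost on an instant reboot, which is also why a serial console is useful for this kind of trace.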


Takashi
Daniel James
2007-04-24 15:03:56 UTC
Hi Takashi,
Post by Takashi Iwai
The most important thing is to find out what triggers which result.
As far as I understand from Florian's analysis, the io-port access
results in a machine reboot, not a kernel panic or so.
In my case, I saw a hard lock-up as soon as I typed the name of any MIDI
program, even 'aconnect'. There was no panic or log information, just a
complete freeze.

Cheers!

Daniel