Comment 85 for bug 351990

Martin Olsson (mnemo) wrote :

@juhuu, yeah, that's strange actually. For now, let me update the main bug to clarify this.

It might of course be that we're dealing with two different bugs triggered by the same repro steps (one XAA bug that causes a black screen on Wolfgang's hardware and another bug that causes a lockup with EXA). It's extremely hard to know for sure. My gut feeling (and it's not much more than that) tells me that once we figure out what's special about that webpage we can fix both bugs.

I've been trying various things here so far, for example using wget to download all images off that site to see if any of them can cause the crash in isolation, but no luck so far. I've also tried various versions of the upstream DDX (I was hoping we'd find a range to bisect), but unfortunately I think the default operational mode for the -ati driver was XAA in intrepid, so it might very well be that EXA has never been freeze-free in the past (and in that case we can't bisect, which makes it a lot harder). I'm not sure more data and/or bug sorting can help us get closer to a fix right now. Personally, I'll focus on testing different versions of the driver now (6.12 branch master first and then maybe the radeon rewrite branch if I can get that to build).

Finally, whether we can ship an update for jaunty depends on how disruptive / risky the fix is. There are many -ati users who have perfectly stable systems and we cannot risk introducing regressions on their systems (there is a very strict testing process for shipping updates, you can read about it under topic "SRU" in the wiki).

Long term there is good news for bugs like this one though, because the -ati / -radeon rewrite that is currently being developed has a special debugging facility that allows the GPU command buffer to be dumped to a file. If we had that feature today, this bug would be much, much simpler to understand because we could see exactly the set of commands that were sent to the GPU (and that made it stop responding). The problem is that today the only thing we can see is that the driver is waiting for the GPU to finish "some task" that never finishes, but we can't see what that task is. When compiz is ON this manifests itself as an ioctl() call that never returns, and when compiz is OFF it manifests itself as very high CPU activity in xorg itself, but that only happens because xorg is repeatedly asking the GPU "dude, are you done yet?". It just keeps asking again and again, but the GPU is never idle so xorg can never start to process the next request. Even though the symptoms look very different, it's actually the same bug. Because of the lack of a good debugging facility, this type of bug is actually quite hard to fix/narrow down and I won't hold my breath for a fix (even though I hope for one just as much as you do).
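
To illustrate why the compiz OFF case shows up as high CPU usage, here is a toy sketch of that polling pattern. It is not driver code: `wait_for_idle`, `make_gpu` and the `max_polls` cap are all made up for illustration, and the cap exists only so the example terminates (the behaviour described above keeps asking forever).

```python
def wait_for_idle(gpu_is_idle, max_polls):
    """Keep asking the GPU whether it is idle, up to max_polls times.

    Returns (number_of_polls, became_idle). The cap is an assumption of
    this sketch; the point is that every poll burns CPU time.
    """
    polls = 0
    while polls < max_polls:
        polls += 1
        if gpu_is_idle():
            return polls, True
    return polls, False


def make_gpu(busy_checks):
    """Hypothetical GPU stand-in.

    Reports busy for the first busy_checks status checks, then idle.
    Pass None to model a hung GPU that never becomes idle.
    """
    remaining = [busy_checks]

    def is_idle():
        if remaining[0] is None:
            return False
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return is_idle


if __name__ == "__main__":
    # Healthy case: the wait ends after a few polls.
    print(wait_for_idle(make_gpu(2), max_polls=1_000_000))
    # Hung case: every allowed poll is spent without progress.
    print(wait_for_idle(make_gpu(None), max_polls=1_000_000))
```

With a healthy GPU the loop exits almost immediately; with a hung one it uses up every poll it is allowed, which is what the CPU load in xorg amounts to.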

Maybe in the meantime you can run with "DRI" "off" or "NoAccel" "true" or something like that?
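
Something along these lines in the "Device" section of /etc/X11/xorg.conf should do it. This is only a sketch: the Identifier and Driver values are assumptions, so keep whatever your existing file uses, and try one option at a time.

```
Section "Device"
    # Assumed values; keep the Identifier and Driver lines you already have.
    Identifier "Configured Video Device"
    Driver     "ati"
    # First thing to try: turn off direct rendering.
    Option     "DRI" "off"
    # Heavier fallback: turn off acceleration entirely (expect slow drawing).
    # Option   "NoAccel" "true"
EndSection
```

Restart X after editing for the change to take effect.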