Frequent segmentation faults

Bug #692635 reported by Severin H
This bug affects 47 people
Affects: LottaNZB
Status: Confirmed
Importance: High
Assigned to: LottaNZB Development Team

Bug Description

Some users, including myself, experience regular crashes with various stack traces like the following. They are the main reason why the release of LottaNZB 0.6 has been delayed for such a long time. It seems like the bug has still not been resolved.

Unfortunately, I don't have any clue about the underlying cause, and debugging is hard because the crashes occur in the middle of the downloading process. The fact that they originate from the low-level library GDK makes things even worse. I wonder whether there's a problem in LottaNZB's code that causes the API of GDK to be used in an invalid way or whether there's actually a bug in GDK.

Attempts have been made to ensure that the GTK lock is acquired whenever GUI-related code is executed, but this measure has either not been thorough enough or the bug is caused by something completely different.

/home/todd/projects/lottanzb/lottanzb/lottanzb/core/__init__.py:95: PangoWarning: shaping failure, expect ugly output. shape-engine='BasicEngineFc', font='DejaVu Sans 7.03125', text=''
  gtk.main()
**
Gdk:ERROR:/build/buildd/gtk+2.0-2.20.1/gdk/gdkregion-generic.c:1108:miUnionNonO: assertion failed: (y1 < y2)

Severin H (severinh)
Changed in lottanzb:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → LottaNZB Development Team (lottanzb)
milestone: none → 0.6
Revision history for this message
Severin H (severinh) wrote :

Unless someone comes up with a solution sooner, I'll try to pinpoint the crash to an operation performed by LottaNZB (if possible) and maybe obtain a proper stacktrace using gdb. After having done a web search, it seems to me that LottaNZB is definitely not the only application affected by this mysterious bug, but I haven't seen any solutions so far.

Revision history for this message
Severin H (severinh) wrote :

After waiting about 20 minutes for a crash to happen, I was finally able to obtain a proper stacktrace.

Revision history for this message
Severin H (severinh) wrote :

Just came up with the following theory:

gui/main/info_bar.py connects to the general hub in order to be notified of changes to the download speed and to update the corresponding label in the GUI. It seems like this handler is run in the same thread as the one that updated the download speed value in the general hub, which is not necessarily the main thread. It might be possible that updating the GUI state from another thread results in rare race conditions throughout the execution of the application. Thus, forcing the handler to run in the main thread might solve the problem.

I'll make this change and look for other such occurrences. After that, we'll see whether we still experience these mysterious crashes.
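A minimal sketch of the idea in the theory above: handlers that may run in a worker thread only enqueue the GUI update, and the main thread is the only one that executes it. This uses the Python 3 standard library rather than PyGTK, and `MainThreadDispatcher`, `on_speed_changed` and `update_speed_label` are hypothetical names, not LottaNZB code. In PyGTK the usual way to get the same effect is to schedule the callback on the main loop with `gobject.idle_add`.

```python
import queue
import threading


class MainThreadDispatcher(object):
    """Collects callbacks from any thread and runs them in a single thread."""

    def __init__(self):
        self._pending = queue.Queue()

    def call_in_main_thread(self, func, *args):
        # Safe to call from worker threads: it only enqueues the call.
        self._pending.put((func, args))

    def run_pending(self):
        # Must be called from the main thread, e.g. from a periodic timer.
        ran = 0
        while True:
            try:
                func, args = self._pending.get_nowait()
            except queue.Empty:
                return ran
            func(*args)
            ran += 1


dispatcher = MainThreadDispatcher()
seen_updates = []


def update_speed_label(speed):
    # Stand-in for the real GUI update; records which thread executed it.
    seen_updates.append((threading.get_ident(), speed))


def on_speed_changed(speed):
    # Hub handler: may run in a worker thread, so it only defers the update.
    dispatcher.call_in_main_thread(update_speed_label, speed)


worker = threading.Thread(target=on_speed_changed, args=(512,), name="poller")
worker.start()
worker_ident = worker.ident
worker.join()

caller_ident = threading.get_ident()
dispatcher.run_pending()
```

The point of the design is that the worker never touches the label itself, so no GUI call can race with the main loop.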

Revision history for this message
Todd Allen (speedebikes) wrote :

Severin, sounds like a good theory. Writing multi-threaded code is rather challenging and prone to bugs. Whenever data can be accessed by multiple threads great care must be taken. It's usually best to isolate tasks (such as updating the gui) to a single thread.

Revision history for this message
Severin H (severinh) wrote :

Just came across a rather helpful text regarding thread-awareness of GTK: http://blogs.operationaldynamics.com/andrew/software/gnome-desktop/gtk-thread-awareness.html

Looks like it would be a good idea to go over the whole application and check whether all interactions are protected by the GDK lock, either implicitly within the main thread or explicitly using gtk.gdk.lock.
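One way to make such an audit less error-prone is to wrap GUI-touching functions in a decorator that takes the lock, so the locking cannot be forgotten at individual call sites. The sketch below is an assumption-laden illustration: `gui_lock` is a plain `threading.RLock` standing in for `gtk.gdk.lock`, and `with_gui_lock`, `FakeLabel` and `show_speed` are hypothetical names.

```python
import functools
import threading

# Stand-in for the GDK lock; in PyGTK this role is played by gtk.gdk.lock.
gui_lock = threading.RLock()


def with_gui_lock(func):
    """Run func while holding gui_lock (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with gui_lock:
            return func(*args, **kwargs)
    return wrapper


class FakeLabel(object):
    """Stand-in for a GTK label so the sketch runs without a GUI."""

    def __init__(self):
        self.text = ""

    def set_text(self, text):
        self.text = text


label = FakeLabel()


@with_gui_lock
def show_speed(speed_kib):
    # Every call, from any thread, holds the lock while touching the label.
    label.set_text("%d KiB/s" % speed_kib)


threads = [threading.Thread(target=show_speed, args=(n,)) for n in (100, 200, 300)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The stand-in lock is re-entrant so that a decorated function can call another decorated function; whether that matches the behaviour of the real GDK lock is not something this sketch establishes.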

Revision history for this message
Severin H (severinh) wrote :

@Todd: I didn't see your comment before. I definitely agree that it's hard to get multi-threaded applications 100% right. And debugging is often a nightmare.

Correction for my previous comment: interactions *with GTK*.

Revision history for this message
Severin H (severinh) wrote :

Also, LottaNZB spawns incredible amounts of short-lived threads, namely one for each query to SABnzbd (at least one per second). For the sake of performance, one might consider some form of pooling, or evaluate whether falling back to running only one query at a time negatively affects the UI's responsiveness.
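The pooling idea could look roughly like the following, using `concurrent.futures.ThreadPoolExecutor` from the Python 3 standard library. This is only a sketch: `run_query` is a hypothetical stand-in for a request to SABnzbd, and the pool size of 2 is an arbitrary assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # assumed value; would need tuning against UI responsiveness


def run_query(name):
    # Stand-in for an HTTP request to SABnzbd.
    # Returns the query name and the name of the worker thread that ran it.
    return name, threading.current_thread().name


queries = ["queue", "history", "queue", "history", "queue"]

# The same few worker threads are reused for all queries.
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(run_query, queries))

worker_names = set(thread_name for _, thread_name in results)
```

Setting `POOL_SIZE` to 1 gives the "only one query at a time" variant mentioned above, which makes it easy to compare both options.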

Revision history for this message
Todd Allen (speedebikes) wrote :

@Severin: It's been slow going for me so far, but I read through http://docs.python.org/tutorial/ and am starting to get a feel for Python. I can now run lottanzb under the eclipse/pydev debugger. I'm not yet happy with what I can do in eclipse but I'm making progress.

I was able to see threads spawning off about 1 per second as you described. That was with an empty download queue. Tomorrow I'll do some exploring with what happens with an active download queue and when items are being added/deleted from the queue.

My gut feeling is that 1 spawn per second while ugly is no big deal. At least that would be the case at the OS level for Windows or Linux. However, I imagine it is possible though hopefully unlikely that the overhead of spawning Python threads is much higher. I don't yet know what sort of profiling or performance benchmarking tools are readily available but it should be fairly easy to come up with a way to measure the performance cost of spawning/killing threads.

But the more urgent issue is to get an understanding of what the threads are doing and whether they are violating the synchronization requirements of the code and data that they access. It sounds like you are already looking. Please post anything you find.
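A rough way to measure the thread-spawning overhead discussed above, as a sketch using only the Python 3 standard library. `measure_spawn_cost` is a hypothetical helper; it times starting and joining no-op threads, so it gives a lower bound and says nothing about the cost of the queries themselves.

```python
import threading
import time


def measure_spawn_cost(count=200):
    """Return the average time in seconds to start and join one no-op thread."""
    start = time.perf_counter()
    for _ in range(count):
        thread = threading.Thread(target=lambda: None)
        thread.start()
        thread.join()
    return (time.perf_counter() - start) / count


average = measure_spawn_cost()
print("average per thread: %.1f microseconds" % (average * 1e6))
```

Comparing the printed figure with the one-second polling interval gives a first indication of whether the spawning itself is worth optimizing.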

Severin H (severinh)
Changed in lottanzb:
milestone: 0.6 → 0.6.1
Severin H (severinh)
summary: - Crashes due to Gdk:ERROR...miUnionNonO: assertion failed: (y1 < y2)
+ Frequent segmentation faults
description: updated
Revision history for this message
Severin H (severinh) wrote :

@Todd: Even though I think that most of the GUI code now properly acquires the GTK lock, crashes still seem to occur, although from my perspective they've become less frequent. Considering the number of people who have reported such crashes, it will definitely be necessary to further investigate the problem.

If you're interested, the module used to handle the GTK locking mechanism is 'lottanzb.util.gtk_extras'.

description: updated
Revision history for this message
Antoine Dessaigne (antoine-dessaigne) wrote :

My LottaNZB just crashes but SABnzbd is still running. There is absolutely no information in the ~/.config/lottanzb/log file near the time it crashes. Is there a way to set the log level of LottaNZB to debug in case this happens again?

Revision history for this message
Severin H (severinh) wrote :

~/.config/lottanzb/log already contains all debug messages, no matter if LottaNZB was launched using --debug or not. The problem is that the crashes are not caused by unhandled exceptions in Python, but by nasty segmentation faults in the low-level GUI libraries (GTK, etc.) used by LottaNZB. This is why the problem is so hard to track down. The only possible way to get somewhat reasonable data is by running LottaNZB under GDB and waiting for a crash to happen, so that one is given a stacktrace.
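As a complement to GDB, Python 3.3 and later ship a `faulthandler` module that writes the Python-level traceback of every thread when the process receives a fatal signal such as SIGSEGV or SIGABRT. The sketch below shows how it could be switched on. It assumes a Python 3 interpreter, which is not what the PyGTK-based LottaNZB discussed here runs on, and `enable_crash_tracebacks` and the log file name are hypothetical.

```python
import faulthandler
import os
import tempfile


def enable_crash_tracebacks(log_path):
    """Dump the Python tracebacks of all threads to log_path on a fatal signal."""
    # The file object must stay open for as long as the handler is installed.
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical location; a real application would use its own log directory.
crash_log_path = os.path.join(tempfile.gettempdir(), "lottanzb-crash.log")
crash_log = enable_crash_tracebacks(crash_log_path)
```

This only shows which Python code was running at the time of the crash; the C-level stack inside GTK still requires a debugger.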

Revision history for this message
fafa_its (fafa-its) wrote :

I got the following error on Ubuntu 11.04:

[code]
/usr/share/lottanzb/lottanzb/core/__init__.py:95: PangoWarning: shaping failure, expect ugly output. shape-engine='BasicEngineFc', font='DejaVu Sans 7.03125', text=''
  gtk.main()
**
Gdk:ERROR:/build/buildd/gtk+2.0-2.24.4/gdk/gdkregion-generic.c:1110:miUnionNonO: assertion failed: (y1 < y2)
Aborted
[/code]

Revision history for this message
aproposnix (aproposnix) wrote :

Hi guys, is any progress being made on this? I'm still having the issue on Natty. Unfortunately I've had to revert to 0.5.4 for some time now. I'd love to upgrade to version 0.6.1.

Revision history for this message
Stéphane Maniaci (stephh) wrote :

Hey Harry,

I'm working on a GTK3 branch of LottaNZB, which might solve the problem. We'll see when it's finished and working.

Revision history for this message
JerryH (jerry-metalcat) wrote :

Same here, lots of this :

Traceback (most recent call last):
  File "/usr/share/lottanzb/lottanzb/backend/interface/__init__.py", line 214, in internal_handler
    handler(interface, query, *args)
  File "/usr/share/lottanzb/lottanzb/backend/hubs/polling.py", line 54, in internal_handler
    handler(query)
  File "/usr/share/lottanzb/lottanzb/backend/hubs/statistics.py", line 72, in on_history_query
    value = DataSize.from_string(query.response[response_key])
  File "/usr/share/lottanzb/lottanzb/backend/hubs/__init__.py", line 130, in from_string
    raise ValueError("Invalid data size.")
ValueError: Invalid data size.

till it eventually seg faults

Revision history for this message
rawdmon (raw-dmon) wrote :

This issue is still present and it has been a long time since it was reported. It seems to only happen when actually downloading something. There must be something in the code that handles the actual status updates between lottanzb and sabnzbd that is causing these crashes. This app was amazing back when it used to use hellanzb. I would very honestly not mind if it went back to using hellanzb and if the developers of the project started providing support for maintaining hellanzb as well. I just checked out the hellanzb page and it would appear that developers are starting to use trac to manage the project. The last update is from just yesterday. When it used hellanzb it was a nice, clean, simple, compact solution. Since it has switched to using sabnzbd it has become both bloated and unreliable.

Revision history for this message
rawdmon (raw-dmon) wrote :

Here's an strace of the crash if it helps...

Gdk:ERROR:/build/buildd/gtk+2.0-2.24.4/gdk/gdkregion-generic.c:1110:miUnionNonO: assertion failed: (y1 < y2)
 <unfinished ...>
+++ killed by SIGABRT +++
Aborted

Revision history for this message
JerryH (jerry-metalcat) wrote :

If there was a vote, I'd go for hellanzb too :)

Revision history for this message
aproposnix (aproposnix) wrote :

What I have noticed is that if I keep lottanzb running in the indicator taskbar (i.e. "do not show ") it usually stays open the entire download. I can even open it and check it, but if I have it open for 30-60 seconds, it almost always crashes.

I miss the hellanzb days as well but we need to respect the developer.

Revision history for this message
Sander Tuit (avirulence) wrote : Re: [Bug 692635] Re: Frequent segmentation faults

Hi all. This is not a problem with Sabnzbd, but with pyGTK. The problem
doesn't lie with the back-end, but with our implementation. The problem is
that pyGTK should offer a high-level solution and these kinds of problems
are not supposed to be even possible. We're still busy trying to track down
the cause of this problem.

Sander

Revision history for this message
Todd Allen (speedebikes) wrote :

I very briefly started to look into the problem but in the process came to
realize that SABnzbd works well by itself using its web browser interface,
unlike hellanzb, which really benefited from the added interface of
lottanzb. SABnzbd's browser interface is not quite as pretty as lottanzb but
it has a lot more functionality and the interface is somewhat customizable.

I found it satisfied my needs sufficiently such that my desire to work on
and use lottanzb evaporated. SABnzbd isn't perfect and I could imagine a
future version of lottanzb might eclipse it, but my guess is that is
unlikely to be easy or happen soon.

Revision history for this message
pazuzuthewise (pazuzuthewise) wrote :

I confirm the observation by harry (clarifyubuntu) that lottanzb works perfectly when minimized to the appindicator ("show Lottanzb"), but crashes frequently when its window is shown (even if no download is running).

A strange behavior I've noticed (may be pertinent or not) is that with lottanzb pinned to the unity launcher, its launcher icon is highlighted (showing that the program is running) only if the program window is not hidden. When the program is hidden from view, its unity launcher loses the background highlighting, as if the program was not running. When invoked by clicking on the appindicator, the unity launcher becomes animated with the launching animation.

I'm running oneiric x32, with the pae kernel.

Revision history for this message
pb (pe-ba) wrote :

Having had the same issue, I added a print before the line where the ValueError occurs.
Apparently, a string where it fails is u"44.3 G". Looking at the code, I assume the string is meant to be "44.3 GB". Crudely fixing it by appending a B if there is a G at the end (lines 107 and 108)

    if value.endswith("G"):
        value += "B"

before doing the regular expression seems to fix it for me. I wouldn't call that a fix though, more a hint at the actual cause. Something somewhere generates G, not GB.

3.0.0-20-generic #34-Ubuntu SMP Tue May 1 17:24:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux, Linux Mint 12, lottanzb 0.6-1ubuntu1
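Instead of patching the string before the regular expression runs, the parser itself could accept the unit with or without the trailing "B". The following is a hedged sketch and not LottaNZB's actual DataSize.from_string: `parse_data_size` is a hypothetical function, and it assumes that prefixes are powers of 1024 and that a bare number means bytes. It only addresses the ValueError; a later comment in this thread reports that a similar workaround did not stop the crashes.

```python
import re

# Exponent of 1024 for each unit prefix (assumed binary units).
_UNIT_EXPONENTS = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5}

# A number, an optional prefix, and an optional trailing "B".
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGTP]?)B?\s*$", re.IGNORECASE)


def parse_data_size(text):
    """Return the size in bytes for strings such as '44.3 G' or '44.3 GB'."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError("Invalid data size: %r" % (text,))
    number, prefix = match.groups()
    return int(float(number) * 1024 ** _UNIT_EXPONENTS[prefix.upper()])
```

With this, `parse_data_size("44.3 G")` and `parse_data_size("44.3 GB")` return the same value, while unrelated strings still raise ValueError.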

Revision history for this message
rawdmon (raw-dmon) wrote :

@pb: you could have specified the exact file/line where you saw that issue. I'm having to duplicate the work that you've already done to track down the line now so that I can take a stab at tracing this back to a root cause.

Revision history for this message
rawdmon (raw-dmon) wrote :

Never mind, I noticed that someone posted it higher up...

/usr/share/lottanzb/lottanzb/backend/hubs/__init__.py around line 120

Revision history for this message
rawdmon (raw-dmon) wrote :

I tried the crude fix suggested just to see if it would do anything...

/usr/share/lottanzb/lottanzb/backend/hubs/__init__.py

line 107:

        if value.endswith("K") or value.endswith("M") or value.endswith("G") \
          or value.endswith("T") or value.endswith("P"):
            value += "B"

Still crashing the same as before.
