How do I enter non-ASCII letters in typography extension?

Asked by Hachmann on 2015-03-04

I've been playing around with the typography extension, and couldn't find a way to add a German 'ä'.

I tried to enter the letter directly, resulting in

Traceback (most recent call last):
  File "new_glyph_layer.py", line 54, in <module>
    e.affect()
  File "/usr/share/inkscape/extensions/inkex.py", line 268, in affect
    self.effect()
  File "new_glyph_layer.py", line 43, in effect
    layer.set(inkex.addNS('label', 'inkscape'), 'GlyphLayer-'+char)
  File "lxml.etree.pyx", line 746, in lxml.etree._Element.set (src/lxml/lxml.etree.c:42955)
  File "apihelpers.pxi", line 547, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)
  File "apihelpers.pxi", line 1395, in lxml.etree._utf8 (src/lxml/lxml.etree.c:26485)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I also tried the Unicode code point for ä, which is U+00E4, with and without the U+ prefix, and also 228, but all this does is create several layers named GlyphLayer-U, GlyphLayer-+, GlyphLayer-0, GlyphLayer-0, etc.

So how does this work?

(Inkscape 0.91 on LM17.1, 64bit)

Question information

Language: English
Status: Solved
For: Inkscape
Assignee: No assignee
Solved by: Hachmann
Solved: 2015-03-16
Last query: 2015-03-16

su_v (suv-lp) said : #1

<off-topic comment>
The Typography extensions were commissioned work (paid development), and never fully finished nor further maintained by its author to the best of my knowledge (the client decided to move the focus to fontforge for font design, abandoning the idea to get inkscape up to the task in a timely manner). The documentation for the committed state of the extension suite has yet to be written ... i.e. you are somewhat on your own in discovering how it is supposed to work, what works, what is missing (not fully implemented), and what are bugs.
</off-topic comment>

Hachmann (marenhachmann) said : #2

Thank you, suv.

I have now also posted this question on the blog of the person who wrote the extension; maybe he'll be able to shed some light on this.

Hachmann (marenhachmann) said : #3

No reply from the author of the extension until now - thanks again, suv!

me-kell (me-kell) said : #4

At least in Inkscape 0.91 you can find the extension in "<inkscape>/share/extensions/new_glyph_layer.py"

The extension works as follows:
1. It defines/adds an option "unicodechars" to the OptionParser of inkex.Effect (lines 26ff)
2. The entered string in the field "Unicode character" in dialog "2 - Add Glyph Layer" is passed to the option "unicodechars"
3. For every char in "unicodechars" (line 40) a Layer is created (line 42) with label 'GlyphLayer-'+char (line 43)
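
The steps above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the extension's actual code (which runs under Python 2 in Inkscape 0.91); `glyph_layer_labels` is a hypothetical helper name. It also shows why entering "U+00E4" produced one layer per typed character.

```python
def glyph_layer_labels(unicode_chars):
    """Return one layer label per character of the entered string.

    Hypothetical helper: it mimics the loop described above, where every
    character of the "Unicode character" field gets its own layer.
    """
    return ['GlyphLayer-' + char for char in unicode_chars]


# Each typed character becomes a separate label, so a code such as
# "U+00E4" is split into U, +, 0, 0, E and 4.
print(glyph_layer_labels('U+00E4'))
```

Because the loop treats the input as a plain sequence of characters, it has no notion of code point notation.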

At this point there is a problem if the char is not Unicode or ASCII.

A workaround is to decode the char [char.decode('ISO-8859-15')] in line 43 as follows:
layer.set(inkex.addNS('label', 'inkscape'), 'GlyphLayer-'+char.decode('ISO-8859-15'))

In this example I use the 'ISO-8859-15' encoding. Please adapt it to the encoding in your system.
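
A small sketch of what the decoding step does, written for Python 3 where byte strings and text are separate types. `decode_input` is a hypothetical helper, and the assumption that the entered value arrives as raw bytes reflects Python 2 behaviour rather than anything confirmed about Inkscape itself. One caveat: with a multi-byte encoding such as UTF-8, 'ä' is two bytes, so decoding the whole string before looping over it is likely safer than decoding each element separately.

```python
def decode_input(raw_bytes, encoding):
    """Decode the whole entered byte string into text in one step.

    Hypothetical helper; `encoding` must match how the bytes were produced.
    """
    return raw_bytes.decode(encoding)


# In ISO-8859-15, 'ä' is the single byte 0xE4.
latin_text = decode_input(b'\xe4', 'ISO-8859-15')

# In UTF-8 the same letter takes two bytes, which only form one
# character when they are decoded together.
utf8_bytes = 'ä'.encode('utf-8')
utf8_text = decode_input(utf8_bytes, 'utf-8')

labels = ['GlyphLayer-' + char for char in utf8_text]
```

Decoding first and iterating afterwards yields one label per character regardless of how many bytes each character occupies.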

Another workaround would be to adapt the extension so that ranges can be entered, e.g. 1-4,6-8.
As an example, replace line 40 with the following:

for char in [i for r in (r.split('-') for r in unicode_chars.split(',')) for i in range(int(r[0]), int(r[-1]) + 1)]:

me-kell (me-kell) said : #5

Please replace the last line in the comment above by the following:

for char in [str(i) for r in (r.split('-') for r in unicode_chars.split(',')) for i in range(int(r[0]), int(r[-1]) + 1)]:

The list comprehension "[str(i) for r in (r.split('-') for r in unicode_chars.split(',')) for i in range(int(r[0]), int(r[-1]) + 1)]" lets you enter comma-separated ranges and turns them into a list.

If unicode_chars is the string "1-4,6-8", the list comprehension above yields the list ["1", "2", "3", "4", "6", "7", "8"].
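
The same range expansion can be written as a small function, which may be easier to read and test than the one-liner. This is only a sketch; `expand_ranges` is a hypothetical name and is not part of the extension.

```python
def expand_ranges(spec):
    """Expand a string such as "1-4,6-8" into a list of number strings."""
    values = []
    for part in spec.split(','):
        bounds = part.strip().split('-')
        # A single number like "5" gives bounds == ['5'], so the first and
        # last element are the same and the range has one entry.
        start, end = int(bounds[0]), int(bounds[-1])
        values.extend(str(i) for i in range(start, end + 1))
    return values


print(expand_ranges('1-4,6-8'))
```

Note that the result is a list of decimal number strings, exactly as in the comprehension above; it does not convert the numbers to characters.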

Hachmann (marenhachmann) said : #6

me-kell, thanks :) At the time, I was only playing around with everything I could find in Inkscape, and eventually ended up trying to create an svg font.

Would you be interested in submitting a fix/patch for this extension, which other users who don't dare to edit python code and inx files could profit from (I'm not entirely sure I could do that...)?

It looks as if you've already gotten quite far (if I understand correctly, only some generalization for the encoding is missing, but I might have gotten that wrong).

Would be great to have a new contributor!

me-kell (me-kell) said : #7

I'm new to Inkscape and the possibilities of Inkscape's extensions (Inkscape SVG Effects).
At the moment I'm investigating the extension bundle "Typography": layers2svgfont, new_glyph_layer,
next_glyph_layer, previous_glyph_layer, setup_typography_canvas, svgfont2layers.

I need to make sure that modifications do not cause conflicts with the other extensions in this "bundle".
Other extensions in this bundle also depend on this convention (i.e. "GlyphLayer-###").

Though I'm not yet sure what would be the best workflow for this extension bundle,
for the time being we could simply decode the char to one of the following:

    sys.stdin.encoding
    sys.getfilesystemencoding()
    locale.getdefaultlocale()[1]

None of them is an optimal solution, though. On MS Windows 7 they return:

    sys.stdin.encoding -> cp0
    sys.getfilesystemencoding() -> mbcs
    locale.getdefaultlocale()[1] -> cp1252

On some Unix systems, however, sys.getfilesystemencoding() could return None.

Hachmann (marenhachmann) said : #8

I can't help with that, unfortunately - maybe someone on the devel mailing list has some deeper knowledge about character encoding using python on different OSs?

Or you can ask on the user mailing list if someone would like to try it out, so you get info on what happens on different systems. I'd volunteer for Ubuntu testing ;) Or maybe you could just ask people to input the unicode code, instead of the actual char.
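
The idea of entering the code instead of the actual character could look roughly like this. It is a Python 3 sketch under an assumed input convention that is not part of the extension: tokens starting with "U+" are read as hexadecimal, everything else as decimal. `chars_from_code_points` is a hypothetical helper; under Python 2, `unichr` would take the place of `chr`.

```python
def chars_from_code_points(spec):
    """Convert a string such as "U+00E4, 228" into a list of characters."""
    chars = []
    # Accept commas and/or whitespace as separators.
    for token in spec.replace(',', ' ').split():
        if token.upper().startswith('U+'):
            code_point = int(token[2:], 16)   # hexadecimal notation
        else:
            code_point = int(token)           # decimal notation
        chars.append(chr(code_point))
    return chars


labels = ['GlyphLayer-' + c for c in chars_from_code_points('U+00E4, 228')]
```

Both notations in the example refer to the same letter, ä, so the two resulting labels are identical.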

me-kell (me-kell) said : #9

Thank you for the hints. I think I know how Python encoding works.

The problem is: why is 'sys.stdin.encoding' returning 'cp0'?
In other words: which environment does Inkscape pass to Python when it calls the extensions in a subprocess?

For now, let me do some checks before I post a meaningful question.

me-kell (me-kell) said : #10

I can confirm that Inkscape is not passing the PYTHONIOENCODING variable to the environment of the python command.
In that case Python guesses the encoding of stdin, which has no encoding when Python is called as a subprocess of Inkscape, and the result is the undesirable encoding 'cp0'.
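
The effect of PYTHONIOENCODING on a child process can be checked with a short experiment. The sketch below uses Python 3 and the standard `subprocess` module; it illustrates how a parent process can pass the variable, and is not how Inkscape actually launches extensions. `child_stdin_encoding` is a hypothetical helper.

```python
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding=None):
    """Start a child Python with stdin as a pipe and report its stdin encoding.

    If `io_encoding` is given, it is passed via PYTHONIOENCODING; otherwise
    the variable is removed so that the child has to guess.
    """
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)
    if io_encoding is not None:
        env['PYTHONIOENCODING'] = io_encoding
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
        input=b'',                 # connects the child's stdin to a pipe
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode('ascii').strip()


print(child_stdin_encoding('utf-8'))   # encoding requested by the parent
print(child_stdin_encoding())          # whatever the child guesses
```

The second value depends on the platform, the locale and the Python version, which is the kind of variation described in this thread.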

Ideally, Inkscape's extension system would allow passing environment variables to the Python process. Otherwise these variables would have to be set system-wide, which is not desirable. Can you give me a hint where I can post this issue?

The recommended way in Python is to guess the encoding (1) from 'sys.stdin.encoding' and, if this is None, (2) from 'sys.getdefaultencoding()'. Unfortunately 'sys.getdefaultencoding()' returns 'ascii' and is not useful for the case in this extension.
Moreover, the Python version packaged with Inkscape 0.91 is 2.6.5. In this Python version 'sys.stdin.encoding' does not return 'None' as expected but 'cp0', suggesting an encoding which definitely does not exist. It would be advisable to package a newer Python version (e.g. 2.7.10) with the next Inkscape release.

For the extension above I suggest to set the encoding as follows:

    import locale
    import sys

    encoding = sys.stdin.encoding
    if encoding is None or encoding == 'cp0':
        encoding = locale.getpreferredencoding()

and decode the chars with the guessed encoding

    char.decode(encoding)

That way the extension would take into account the PYTHONIOENCODING variable if set in the system and, if not, the encoding would be determined through the locale.
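
A slightly more general variant of this fallback, as a Python 3 sketch: instead of comparing against the literal 'cp0', it asks the codec registry whether the reported name exists at all. This is a swapped-in technique rather than the check proposed above, and `choose_encoding` is a hypothetical helper.

```python
import codecs
import locale
import sys


def choose_encoding(reported, fallback):
    """Return `reported` if it names a known codec, otherwise `fallback`."""
    if reported:
        try:
            codecs.lookup(reported)   # raises LookupError for unknown names
            return reported
        except LookupError:
            pass
    return fallback


# sys.stdin can be missing or lack an encoding when run without a console.
encoding = choose_encoding(
    getattr(sys.stdin, 'encoding', None),
    locale.getpreferredencoding(),
)
```

With this approach any nonexistent encoding name falls through to the locale, not only 'cp0'.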

I'll check if there are other parts in this "bundle" that depend on the encoding set here and submit a patch. BTW, can you give me a link to read how to contribute? Thanks in advance

Hachmann (marenhachmann) said : #11

For discussion, I think the developer mailing list would be the best place:
https://inkscape.org/en/community/mailing-lists/

I'm not sure though if they will want to make any bigger changes to the extensions system, as there is a (far-in-the-future) plan to create a real API for extensions to use: http://wiki.inkscape.org/wiki/index.php/Roadmap (Inkscape 1.4 - so *very* far currently - and I don't know if this change is easy to implement or would entail many other tasks).

For submitting a change (most easily as either a patch file against trunk, or maybe just the extension files in a zip), and getting feedback about it, I believe you should visit the 'bug' section here on lp, create a new bug report (I think there isn't one for this issue yet), explain what you did and why, link to this question, append your changes, and ask for feedback. You'll get extra points (at least from me and other users ;) ) for good documentation!

I couldn't find the process of 'how to commit a change when you're new' documented currently, and - being a member of the inkscape-web team - I know we have a bug report asking to make this info available on our website...

But as I'm not one of the developers, and I'm just writing up here what I've learnt from watching the process, there's probably someone out there who knows about the proper protocol better than I do ;)

Hope I didn't scare you now - it's just that I'm not the best person to ask this. The devs will just tell you if they want something done differently, and they are a friendly (but quite busy) bunch, as far as I can say :)

Hachmann (marenhachmann) said : #12

Btw. I just found that the calendar extension allows you to explicitly choose the system encoding. Maybe that's a possible alternative way to go?

me-kell (me-kell) said : #13

Thank you for the hint. The calendar extension lets you choose the encoding, but AFAIK Inkscape's extension architecture cannot detect the system encoding and set it as the default in the params dialog. This would mean building a dialog with two new params, e.g. a checkbox "use alternative encoding" and a "select encoding" field. This would increase the complexity of this (otherwise not so well documented) bunch of extensions.

I'm inclined not to modify this extension more than absolutely necessary (e.g. as proposed above in #10). I'll post this as a bug and propose a patch.

me-kell (me-kell) said : #14

After manually creating a bug report for this, I noticed the options "Create bug report" and "Link existing bug".
I have linked this question to the bug (https://bugs.launchpad.net/inkscape/+bug/1518302).