Kanji Stroke Data

Asked by Enrique Saul Gonzalez

Hello,

My name is Enrique, I'm currently a graduate student at the University of Tokyo. For my thesis project (description at http://www.iii.u-tokyo.ac.jp/~qq69528/webhappyo/, though mostly in Japanese) I need to repurpose an existing kanji recognition engine to fit the requirements of the software I am developing.

However, I am a complete newbie to the field of on-line (real-time) kanji recognition. Tagaini Jisho has some of the best kanji stroke animations I've seen, so I was wondering if you could answer my question:

Are there any introductory references to the field? I would like to better understand how kanji stroke data is analyzed and represented.

Also, if you know of any other projects that seem particularly suited to my needs, I would be very grateful to hear about them. Most of the development is being done in C# with the XNA engine, and the target hardware is Tablet PCs running Windows.

Sorry to bother you, and thanks in advance if you can answer any of my enquiries.

Question information

Language:
English
Status:
Solved
For:
Tagaini Jisho
Assignee:
No assignee
Solved by:
Gnurou
Gnurou (gnurou) said :
#1

Hi Enrique,

This is not really a Tagaini-related question, but since the topic is
interesting, let's discuss it publicly.

The kanji stroke data used by Tagaini comes from the KanjiVG project
(http://kanjivg.tagaini.net). KanjiVG is a separate project created by
Ulrich Apel that aims to provide a complete description of how kanji
are broken down into components, as well as a set of SVG data that
describes their stroke order. Tagaini uses this data for kanji
navigation and, of course, for the stroke animations. I am working
with Ulrich to release this data; it has taken more time than
expected, but it should happen soon. In the meantime, you can read
the articles Ulrich has published on the subject:
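To give a rough idea of what SVG stroke data looks like to a program, here is a minimal Python sketch that lists the strokes of one character in order. It assumes a KanjiVG-style file in which each stroke is an SVG `<path>` element carrying a `d` attribute, and that stroke order is simply the document order of those paths. The inline sample, its ids and its path data are hypothetical stand-ins, not real KanjiVG data; real files carry additional attributes and grouping.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, simplified sample in the spirit of a KanjiVG file for 二.
# The ids and path data below are made up for illustration.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109">
  <g id="sample-strokes">
    <path id="s1" d="M30,35c10,1,40,-1,50,-2"/>
    <path id="s2" d="M15,75c20,1,60,0,80,-1"/>
  </g>
</svg>"""

def strokes_in_order(svg_text):
    """Return (id, path data) pairs for every SVG <path>, in document order."""
    root = ET.fromstring(svg_text)
    # iter() walks the tree depth-first in document order,
    # which this sketch assumes is the stroke order
    return [(p.get("id"), p.get("d")) for p in root.iter(f"{{{SVG_NS}}}path")]

for stroke_id, d in strokes_in_order(SAMPLE):
    print(stroke_id, d)
```

Relying on document order keeps the sketch short; if the data also labels strokes explicitly, reading that label would be more robust.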

http://www.aclweb.org/anthology-new/W/W04/W04-2106.pdf

Another project worth mentioning is the Taka database, which also
provides stroke animation data: http://www.sf.net/projects/taka

I think both projects could be useful for creating the data needed
for kanji recognition, although I have never done that myself. Most
operating systems come with their own kanji recognition engine, so I
see no need to embed one in Tagaini. If you want some code, you will
probably want to look at KanjiPad. Otherwise, a search on Google
Scholar should give you a good overview of the state of the art in
that field.

I do believe, though, that KanjiVG and Taka data could be used to
implement new recognition engines. It may be worth describing more
precisely what you want to achieve.
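Pen input arrives as sequences of sampled points, whereas SVG stroke data describes curves, so one plausible first step toward a recognizer built on such data is to turn each reference stroke into points. The Python sketch below assumes a stroke's path has already been parsed into absolute cubic Bézier segments (parsing the SVG `d` string is not shown) and evaluates each segment at evenly spaced parameter values. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# start point, control point 1, control point 2, end point (absolute coordinates)
Segment = Tuple[Point, Point, Point, Point]

def cubic_point(seg: Segment, t: float) -> Point:
    """Evaluate a cubic Bezier segment at parameter t in [0, 1] (Bernstein form)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = seg
    u = 1.0 - t
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (a * x0 + b * x1 + c * x2 + d * x3,
            a * y0 + b * y1 + c * y2 + d * y3)

def sample_stroke(segments: List[Segment], per_segment: int = 8) -> List[Point]:
    """Turn a chain of cubic segments into a polyline of sampled points."""
    points: List[Point] = []
    for i, seg in enumerate(segments):
        # After the first segment, skip t=0: it equals the previous segment's end point
        start = 0 if i == 0 else 1
        for k in range(start, per_segment + 1):
            points.append(cubic_point(seg, k / per_segment))
    return points

# Hypothetical stroke made of two segments joined at (50, 30)
stroke = [((10, 30), (20, 31), (40, 31), (50, 30)),
          ((50, 30), (60, 29), (80, 29), (90, 30))]
pts = sample_stroke(stroke)
print(len(pts), pts[0], pts[-1])
```

Even steps in `t` do not give evenly spaced points along the curve, so a resampling pass by arc length would usually follow before comparing against user input.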

Alex.

Enrique Saul Gonzalez (esaulgd) said :
#2

Hi,

Thanks for your quick answer. You're right that this is not really
the proper forum for the question; sorry about that.

The problem I'm facing is that the requirements for the recognition
engine I need are diametrically opposed to the direction kanji
recognition research has taken for decades.

To be more specific: in my project the user will be shown a keyword
and asked to draw the corresponding kanji. The system then has to make
a yes/no judgement about whether the user wrote the correct strokes in
the correct order.

So while almost all engines try to choose, from among many patterns,
the one that best matches the user's input (tolerating some variation
in strokes and stroke order), my system needs to match strictly
against a single predefined pattern.
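Because the expected character is known in advance, a strict check like this can be much simpler than general recognition. Below is a minimal Python sketch of one possible approach, not a method taken from any existing engine: reject if the stroke count differs, then compare user stroke i with reference stroke i after resampling both to a fixed number of points and scaling the whole character into a unit box. The point count `n` and the `threshold` value are arbitrary assumptions that would need tuning on real tablet input, and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]

def resample(stroke: Stroke, n: int = 16) -> Stroke:
    """Resample a polyline to n points spaced evenly along its arc length."""
    if len(stroke) == 1:
        return stroke * n
    seg = [math.dist(a, b) for a, b in zip(stroke, stroke[1:])]
    total = sum(seg)
    if total == 0:
        return [stroke[0]] * n
    out, i, acc = [], 0, 0.0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains this arc-length position
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(1.0, (target - acc) / seg[i])
        (x0, y0), (x1, y1) = stroke[i], stroke[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalize(strokes: List[Stroke]) -> List[Stroke]:
    """Translate and scale the whole character into a unit box, keeping its aspect ratio."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    x0, y0 = min(xs), min(ys)
    size = max(max(xs) - x0, max(ys) - y0) or 1.0
    return [[((x - x0) / size, (y - y0) / size) for x, y in s] for s in strokes]

def matches(user: List[Stroke], reference: List[Stroke],
            threshold: float = 0.15, n: int = 16) -> bool:
    """Yes/no check: same stroke count, and user stroke i resembles reference stroke i."""
    if len(user) != len(reference):
        return False
    u = normalize([resample(s, n) for s in user])
    r = normalize([resample(s, n) for s in reference])
    for us, rs in zip(u, r):
        # Points are compared in drawing order, so a stroke drawn backwards usually fails too
        mean = sum(math.dist(a, b) for a, b in zip(us, rs)) / n
        if mean > threshold:
            return False
    return True

# Hypothetical reference for 二 (two horizontal strokes, each drawn left to right)
ref = [[(30, 35), (80, 33)], [(15, 75), (95, 74)]]
good = [[(32, 36), (78, 35)], [(14, 77), (96, 75)]]
swapped = [good[1], good[0]]
backwards = [good[0][::-1], good[1]]
print(matches(good, ref), matches(swapped, ref), matches(backwards, ref))
```

Normalizing the character as a whole, rather than each stroke separately, keeps the relative position of strokes, which is what lets a swapped pair of otherwise similar strokes be rejected.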

As you probably realize, the task I need to accomplish is
significantly easier, but I cannot find much prior art.

Thanks for the database links. I particularly like the KanjiVG format
(still trying to figure out the organization of the Taka DB). Do you
have an estimate on the release time of the KanjiVG data?

I can probably use some of the KanjiPad code, but it looks like I'll
have to experiment a lot on my own. Thanks for the advice.

Enrique Saul Gonzalez (esaulgd) said :
#3

Thanks Gnurou, that solved my question.

Gnurou (gnurou) said :
#4

Hi,

Unfortunately, I have no real knowledge of kanji recognition, but it does seem to me that you are trying to solve an easier problem than regular kanji handwriting recognition. Both KanjiVG and Taka provide vector data describing the shapes of kanji. You will probably want to check that the user's stroke vectors match the reference shapes closely enough, and that they are drawn in the right order.
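If a plain yes/no is not enough and the user should be told which stroke was out of order, one option (an assumption, not something KanjiVG or Taka provides) is to find, for each user stroke, the reference stroke it most resembles and flag every position where that is not the expected one. This Python sketch assumes the stroke counts already agree and that all strokes have been resampled to the same number of points and normalized to a common box; the function names and the three-stroke sample are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # assumed resampled to a common length and normalized

def stroke_distance(a: Stroke, b: Stroke) -> float:
    """Mean distance between corresponding points of two equal-length strokes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def out_of_order(user: List[Stroke], reference: List[Stroke]) -> List[int]:
    """Return positions of user strokes whose closest reference stroke is not the expected one."""
    wrong = []
    for i, us in enumerate(user):
        best = min(range(len(reference)),
                   key=lambda j: stroke_distance(us, reference[j]))
        if best != i:
            wrong.append(i)
    return wrong

# Hypothetical three-stroke reference (e.g. 三), already normalized, 3 points per stroke
ref = [[(0.2, 0.0), (0.5, 0.0), (0.8, 0.0)],
       [(0.1, 0.5), (0.5, 0.5), (0.9, 0.5)],
       [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]]
user = [ref[0], ref[2], ref[1]]  # second and third strokes swapped
print(out_of_order(user, ref))
```

Since each user stroke is matched independently, two user strokes can map to the same reference stroke; a one-to-one assignment would be stricter, at the cost of more code.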

KanjiVG should be released "soon", though I was already saying that two months ago. :p I need to clean up the data before making it public. It's not a big job, but my TODO list has been huge recently... :(

seifip (seifip) said :
#5

@Enrique Saul Gonzalez: I'm the author of http://nihongoup.com/ and I'm working on something similar to what you need. Could you please contact me at <email address hidden>? Thanks.