Several documentation-related questions

Asked by Jamie

1) Would you detail the logic behind enforcing Unicode, other than Python 3.0 compatibility and the fact that it's a good idea? In general, I tend to agree, except that it may be helpful in some circumstances to be able to store arbitrary binary data without first base64-encoding it. (CouchDB had this issue as well, but now permits binary attachments.) Most likely, searching binary data would be almost meaningless, but perhaps Hyper Estraier could be used as a multi-purpose data store if Unicode were not enforced. I.e., rather than enforcing Unicode, data could be treated as arbitrary byte() streams and results returned in the same format as submitted. This would allow users to encode data as they please rather than enforcing any restrictions, and would also permit the use of binary objects, or at least non-Unicode objects, for arbitrary fields or even the body. Of course, data would then have to be converted back to Unicode by hand after retrieval.

2) Is it possible to extend the documentation with more examples and/or fuller API docs? It is very difficult to find good information (short of reading undocumented unit tests) on how to do things like attribute searches, or even simple things like finding out how many documents are in the index.

3) Is it possible to use estmaster with an index created using HyPy?

4) Reads and writes in another process seem to block when a DB is open. Also, occasionally I receive "Access Denied" with no other details when attempting to open a database. Is this a HyPy or HyperEstraier error?

5) Is a HyPy mailing list or IRC channel a more appropriate forum for these questions?

Finally, I must congratulate you on a well-written and designed system. It seems to be very Pythonic and reduces some of the complexity of the C++ interface without seeming to affect the flexibility and power.

Regards,
Jamie

Question information

Language: English
Status: Answered
For: Hypy
Assignee: No assignee

This question was originally filed as bug #320994.

Cory Dodt (corydodt) said :
#1

Hi Jamie,

Good list of issues. Most of my answers boil down to "please open a
separate bug for that", but see below :)

On Sat, Jan 24, 2009 at 4:56 PM, Jamie <email address hidden> wrote:
>
> Public bug reported:
>
> 1) Would you detail the logic behind enforcing Unicode, other than
> Python 3.0 compatibility and the fact that it's a good idea? In
> general, I tend to agree, except that it may be helpful in some
> circumstances to be able to store arbitrary binary data without first
> base64-encoding it. (CouchDB had this issue as well, but now permits
> binary attachments.) Most likely, searching binary data would be
> almost meaningless, but perhaps Hyper Estraier could be used as a
> multi-purpose data store if Unicode were not enforced. I.e., rather
> than enforcing Unicode, data could be treated as arbitrary byte()
> streams and results returned in the same format as submitted. This
> would allow users to encode data as they please rather than enforcing
> any restrictions, and would also permit the use of binary objects, or
> at least non-Unicode objects, for arbitrary fields or even the body.
> Of course, data would then have to be converted back to Unicode by
> hand after retrieval.

The Unicode enforcement comes from my experience using some of the
Divmod products such as Axiom, and from using Storm which is based on
Axiom and adopted the same philosophy toward Unicode. In essence it
boils down to this: *force developers to think about encodings*. The
alternative is "allow developers to be stupid and introduce bugs,
which they might blame on you". This is also implicitly Python 3.0's
philosophy, so at least I'm in good company thinking this.

If your API forces you to use unicode, you are forced to decode your
byte strings, which forces you to think about the encoding your byte
strings are in, which forces you to write applications that are
explicit about encodings. Byte-encoded text belongs on disks and
network wires; once text finds its way inside programs, it should all
be Unicode except right at the boundaries where bytes need to be
converted.
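
To make the "decode at the boundary" idea concrete, here is a minimal
sketch in plain Python 3. It does not use Hypy at all; the helper names
read_text and write_text are made up for illustration, and the Latin-1
input is just an assumed example of bytes arriving from outside the
program.

```python
def read_text(raw: bytes, encoding: str) -> str:
    """Input boundary: the caller must state which encoding the bytes use."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: only Unicode text is accepted, and it is encoded here."""
    if not isinstance(text, str):
        raise TypeError("expected Unicode text, got %r" % type(text).__name__)
    return text.encode(encoding)


# Bytes as they might arrive from a Latin-1 encoded file or socket.
raw = "caf\u00e9".encode("latin-1")

# Inside the program, everything is Unicode.
text = read_text(raw, "latin-1")

# On the way out, the text is encoded again, this time as UTF-8.
out = write_text(text)
```

Because read_text has no default encoding, the caller cannot avoid
deciding what the incoming bytes actually are, which is the whole point
of the argument above.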

I don't think that philosophically it's a good idea for a special
purpose library like this to be something it's not, e.g. a
multi-purpose data store. That said, if you can propose a different
API I can live with (or better yet implement it, with unit tests :-) I
will probably put it in. Possibly a flag on addText / addHiddenText?
Or allow those APIs to accept a Bytes object, so you're forced to
think about why you wanted to put Bytes into this API meant for text.
This should become a separate bug if you want to do that. (FYI:
Axiom, at least, does have a Byte type iirc, so it would be
hypocritical for me to defer to them on philosophy and then implement
it half-assed.)
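
Until something like that exists, the base64 route mentioned in the
question is the usual way to push binary data through a text-only API.
A minimal sketch using only the Python 3 standard library follows; the
helper names binary_to_text and text_to_binary are hypothetical, and
nothing here is Hypy code.

```python
import base64


def binary_to_text(data: bytes) -> str:
    """Encode arbitrary bytes as ASCII-only Unicode text."""
    return base64.b64encode(data).decode("ascii")


def text_to_binary(text: str) -> bytes:
    """Reverse binary_to_text after the text has been retrieved."""
    return base64.b64decode(text.encode("ascii"))


# Every possible byte value, to show that nothing is lost in the round trip.
blob = bytes(range(256))
stored = binary_to_text(blob)      # safe to hand to a Unicode-only API
restored = text_to_binary(stored)  # identical to the original bytes
```

The cost is size (roughly four characters of text for every three
bytes), and the encoded text is not meaningfully searchable.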

> 2) Is it possible to extend the documentation with more examples and/or
> fuller API docs? It is very difficult to find good information (short
> of reading undocumented unit tests) on how to do things like attribute
> searches, or even simple things like finding out how many documents
> are in the index.

This is something I should target to 0.9 or so. Want to submit a
separate bug for this, briefly listing the things you'd like to see
documented better? e.g.

- attribute searches
- len() on indexes
- ...

> 3) Is it possible to use estmaster with an index created using HyPy?

Hypy doesn't do anything unusual with indexes, or even really interact
with them on that level - all access to the database, including
creation, is done through Hyper Estraier APIs. So yeah, it should
work, but I haven't tried it. However, Hypy itself doesn't yet
support any of that federated search stuff, so an application that
accessed an index through Hypy would not, e.g., be able to eclipse
results; but any *other* application that accessed the same index
would be able to use it that way with no problems.

> 4) Reads and writes in another process seem to block when a DB is open.
> Also, occasionally I receive "Access Denied" with no other details when
> attempting to open a database. Is this a HyPy or HyperEstraier error?

When a database is open, it's locked by Hyper Estraier, and the APIs
Hypy uses obey those locks. I think there might be a way to get a
non-exclusive lock on it (possibly if you open it read-only?). Open a
separate bug for this issue and I'll work on it. It's definitely a
priority for me that Hypy applications work well in a
multi-process environment, since I use Twisted in my applications.
Worst case scenario, I should be able to document some kind of
workaround for it.
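
One possible shape for such a workaround is to retry the open a few
times before giving up. This is only a sketch under assumptions:
open_with_retry is a made-up helper, open_db stands for whatever
callable opens the database in your application, and OSError is a
placeholder for whichever exception the open actually raises when the
database is locked. It is plain Python 3 and calls no Hypy or Hyper
Estraier API.

```python
import time


def open_with_retry(open_db, attempts=5, delay=0.2, retry_on=(OSError,)):
    """Call open_db() until it succeeds or the attempts are used up.

    The wait grows linearly (delay, 2 * delay, ...) between attempts.
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return open_db()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
```

Note that a blocking sleep like this is not suitable inside a Twisted
reactor thread; there the same idea would need to be expressed with
deferred calls instead.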

> 5) Is a HyPy mailing list or IRC channel a more appropriate forum for
> these questions?

I'm not getting much feedback on it, so I don't think I'd get much
traction from a public forum, yet. This bug tracker is fine for now.

> Finally, I must congratulate you on a well-written and designed system.
> It seems to be very Pythonic and reduces some of the complexity of the
> C++ interface without seeming to affect the flexibility and power.

Thanks! About 4 years ago there was another Python interface to Hyper
Estraier written, called Hype. It fell off the Internet in the last
2-3 years, and search in Python has sucked ever since. I'm hoping
Hypy can become something of a standard in the community, so I'm
paying attention to presentation details.

C

Jamie (jamieson-becker) said :
#2

do you have an offline email? mine is jamieson aht jamiesonbecker daht
cahm, (just pretend you're from boston ;-))

Cory Dodt (corydodt) said :
#3

OK, resolving this one. I will open a new one for documentation. It looks like:

1) Unicode to UTF-8: I was already doing this "correctly" I guess, according to http://hyperestraier.sourceforge.net/uguide-en.html

2) is a real problem, I will open a new bug.

3) was informational, nothing to do. (If you want to send me any stories about using estmaster, even informal ones, I will include something in the docs about them.)

4) went away on its own?

5) was informational

So everything here will be resolved once I open a "fix the docs" bug.

Cory Dodt (corydodt) said :
#4

Made this a question, since it was.

Cory Dodt (corydodt) said :
#5

Answered above.
