Scraper class names cannot begin with a number

Asked by Ged Walsh on 2009-12-17

For example, this scraper definition is rejected:

class 1977(BasicScraper):
    latestUrl = 'http://www.1977thecomic.com/'
    imageUrl = 'http://www.1977thecomic.com/%s'
    imageSearch = compile(r'<img src="(http://www.1977thecomic.com/comics-1977/.+?)"')
    prevSearch = compile(r'<a href="(.+?)"><span class="prev">')
    help = 'Index format: yyyy/mm/dd/strip-name'

Windows, Python 2.6.4, Twisted 9.0.0

Question information

Language: English
Status: Answered
For: Dosage
Assignee: No assignee
Last query: 2009-12-17
Last reply: 2010-01-12

This question was originally filed as bug #497951.

Ged Walsh (bleedingheart) said : #1
Tristan Seligmann (mithrandi) said : #2

This is a fundamental consequence of scrapers being defined as Python classes; a Python identifier (variable name, function name, class name, etc.) cannot start with a digit.

We can probably do something about this, at least partially, through some kind of "alias" mechanism, but this has not yet been implemented.
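
The identifier rule can be checked directly. The following is a minimal sketch, not part of Dosage: the helper names is_valid_class_name and defines_class are made up for illustration, and it uses str.isidentifier(), which exists in Python 3 only (the report above is against Python 2.6.4).

```python
import keyword


def is_valid_class_name(name):
    """Return True if `name` could be used as a Python class name."""
    # isidentifier() rejects names that start with a digit;
    # keywords such as "class" are identifiers but still unusable.
    return name.isidentifier() and not keyword.iskeyword(name)


def defines_class(source):
    """Return True if `source` compiles, False on a SyntaxError."""
    try:
        compile(source, "<scraper>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("1977", "_1977"):
        print(candidate, is_valid_class_name(candidate))
    print(defines_class("class 1977(object):\n    pass\n"))
```

The last call feeds the interpreter the same kind of class statement as in the question, so the failure happens at compile time, before any scraper code runs.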

Ged Walsh (bleedingheart) said : #3

Thanks for the info.

A cheap workaround is to begin the identifier with an underscore and set the name attribute, e.g.:

class _1977(BasicScraper):
    name = '1977'
    latestUrl = 'http://www.1977thecomic.com/'
    imageUrl = 'http://www.1977thecomic.com/%s'
    imageSearch = compile(r'<img src="(http://www.1977thecomic.com/comics-1977/.+?)"')
    prevSearch = compile(r'<a href="(.+?)"><span class="prev">')
    help = 'Index format: yyyy/mm/dd/strip-name'

Ged Walsh (bleedingheart) said : #4

comic.py

49c49
< comicDir = os.path.join(basepath, self.moduleName.replace('/', os.sep))
---
> comicDir = os.path.join(basepath, self.moduleName.replace('/', os.sep).replace('_', ''))
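
The effect of the changed line can be tried in isolation. This is only a sketch of that one expression: comic_dir is a made-up wrapper, and 'Comics' and 'Group/_1977' are placeholder values, not names taken from Dosage.

```python
import os


def comic_dir(basepath, module_name):
    """Mirror the patched expression from the diff above."""
    # '/' in the module name becomes the platform path separator,
    # then every underscore is removed from the result.
    return os.path.join(
        basepath, module_name.replace('/', os.sep).replace('_', ''))


if __name__ == "__main__":
    print(comic_dir('Comics', '_1977'))
    print(comic_dir('Comics', 'Group/_1977'))
    # Underscores in the middle of a name are stripped as well.
    print(comic_dir('Comics', 'Some_Name'))
```

Note that replace('_', '') removes all underscores, not just the leading one, so a module name that legitimately contains an underscore would also get a different directory name.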

Tristan Seligmann (mithrandi) said : #5

As Ged mentions above, the correct way to do this is to use a valid Python identifier for the class name, and then set the name attribute on the class to override it. Prefixing the number with an underscore might not be the best convention, because a leading underscore usually indicates a private name in Python, but I'm not sure what would be a better idea.
