Ubuntu

I need a text editor similar to Windows Notepad that works in Ubuntu 10.10 and will open 50 MB files.

Asked by alan detwiler on 2011-03-11

Where can I get a text editor similar to Windows Notepad that works in Ubuntu 10.10 and will open 50 MB files? Gedit will only open files smaller than about 4 MB. The OpenOffice word processor apparently will not open large files either. I would much prefer a very simple text editor, as simple as Notepad, with a graphical user interface. I would prefer a free application.

Question information

Language:
English
Status:
Solved
For:
Ubuntu software-center
Assignee:
No assignee
Solved by:
Manfred Hampl
Solved:
2011-03-16
Last query:
2011-03-16
Last reply:
2011-03-16
Curtis Hovey (sinzui) said : #1

Ubuntu ships with vim, a command-line modal editor. gvim, the GUI version, can be installed from the Software Center.

alan detwiler (gizmocrafter) said : #2

I installed gvim. When I use it to open files larger than about 4 MB, garbage is displayed with no readable text. I assumed that gvim is not capable of displaying large files.

If you need read-write access to large text files, you can install cream (text editor):

http://cream.sourceforge.net/featurelist.html

You can install it with the following Terminal command:

sudo apt-get update && sudo apt-get install cream

If you only need read-only access to the large text file, you can install rowscope using the following Terminal commands:

wget http://sourceforge.net/projects/rowscope/files/1.0/rowscope_1_0_linux_gtk_32.jar

java -jar rowscope_1_0_linux_gtk_32.jar

More info about rowscope is here:

http://rowscope.sourceforge.net/usage.html

alan detwiler (gizmocrafter) said : #4

I installed cream. Cream opened a 4 MB file okay. Two other files larger than 4 MB also opened, but between all the text characters there are extra characters in light blue: lots of @s and ^s. What's with that?

mycae (mycae) said : #5

alan:

The "@" and "^" symbols are how the editor displays non-printing characters, such as "bell", "data link escape", "record separator" and so forth.
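A small sketch of where the "^" and "@" come from, assuming the editor uses the usual caret notation for control characters (vim and gvim do): a byte value below 32 is drawn as "^" followed by the character whose code is 64 higher, and DEL (127) as "^?". So "^@" is a NUL byte (value 0), while a stray carriage return would appear as "^M". The function name caret is hypothetical, for illustration only.

```python
def caret(byte: int) -> str:
    """Show a 7-bit ASCII byte the way caret notation displays it."""
    if not 0 <= byte <= 0x7F:
        raise ValueError("this sketch only handles 7-bit ASCII bytes")
    if byte < 0x20:              # C0 control characters NUL..US
        return "^" + chr(byte + 0x40)
    if byte == 0x7F:             # DEL
        return "^?"
    return chr(byte)             # printable ASCII is shown unchanged

# NUL, bell, carriage return, record separator, and a plain letter
for b in (0x00, 0x07, 0x0D, 0x1E, ord("A")):
    print(f"0x{b:02X} -> {caret(b)}")
```

A long run of "^@" on screen therefore points to NUL bytes in the file itself, which ordinary typing in Notepad does not produce.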

The problem is probably that your text file contains a mix of binary and non-binary data, and decoding the bitstream that is the file with the automatically chosen character set is failing.

There are two possible reasons:
1) The auto-detection routine for character encoding is wrong. Do you know the encoding used by the file? (ASCII, UTF-8/16/32, Latin-1, etc.)
2) Your file actually contains binary data that is not supposed to be interpreted as text. What is the file? I quite often open large files in either gedit (~200 MB) or vim. It's not speedy, but it does work.
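One way to test reason 1 without relying on an editor's auto-detection is to try decoding the raw bytes with a few candidate encodings in strict mode and see which ones fail. This is a minimal sketch; the function name decodes_as and the candidate list (ASCII, UTF-8, and cp1252, the usual "ANSI" code page of Notepad on Western-locale Windows) are assumptions for illustration. A success only means an encoding is not ruled out: cp1252 accepts almost every byte value, so the check is most useful for ruling encodings out.

```python
def decodes_as(data: bytes, encodings=("ascii", "utf-8", "cp1252")) -> list:
    """Return the candidate encodings under which `data` decodes cleanly."""
    ok = []
    for enc in encodings:
        try:
            data.decode(enc)        # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok

# Usage (the file name is a placeholder):
# with open("notes.txt", "rb") as f:
#     print(decodes_as(f.read()))
print(decodes_as(b"Apple cilantro seed and\r\n"))
```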

alan detwiler (gizmocrafter) said : #6

The file was created in Microsoft Notepad by keying in letters to make words. So (I'm guessing) it's just plain ASCII.

Andre Mangan (kyphi) said : #7

If you have "Wine" installed you will find Wine Notepad there.

alan detwiler (gizmocrafter) said : #8

I installed Wine and used Wine Notepad to open a file of about 1 MB. It worked okay. I tried a couple of files over 5 MB and the vertical scroll bar would not work properly. The bar can be dragged down through the first quarter of the file; after that, the displayed text jumps back to the beginning of the file, and when the scroll bar is released it snaps back to the start-of-file position. The scroll bar cannot be used to view more than the first MB or so of a large file.

mycae (mycae) said : #9

I take it you didn't key in 50 MB worth of letters -- that's a lot of keystrokes!

>The file was created in Microsoft Notepad by keying in letters to make words. So (I'm guessing) it's just plain ASCII.

Not quite. Windows, unlike every other OS, uses a two-byte sequence to denote line endings: carriage return followed by line feed (think old typewriters). All Unix machines (this includes Mac) use line feed only. So the extra ^@ symbols denote the bonus byte.

https://secure.wikimedia.org/wikipedia/en/wiki/Newline#Common_problems

You can fix this in vim by setting the fileformat from "unix" to "dos", by running the "dos2unix" tool at the command line, or by piping the contents of your file through the dos2unix program from inside vim: the command (:1,$!dos2unix) should do it, without the parentheses. You need the dos2unix program installed for the last two options.
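The line-ending conversion that dos2unix performs can be illustrated with a few lines of Python. This is a minimal sketch, assuming the whole file fits in memory and that only CR LF pairs should change; it is not a replacement for the real dos2unix tool, and the names crlf_to_lf and convert_file are hypothetical. Note that it leaves every other byte alone, so it would not remove NUL bytes (shown as ^@).

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace each Windows CR LF line ending with a Unix LF."""
    return data.replace(b"\r\n", b"\n")

def convert_file(src: str, dst: str) -> None:
    # Work on raw bytes so no character-set decoding is involved.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(crlf_to_lf(data))
```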

alan detwiler (gizmocrafter) said : #10

I did type in all 50 MB, over something like 6 years. So how come no extra characters appear in smaller files when using gvim, but in larger files extra characters do appear? And the extra characters are not just at the beginnings or ends of lines: every letter has 2 extra characters. The same thing happens with cream. Changing the file type in gvim from unix to dos does not help either. I didn't try changing the file type in vim (I didn't see how to do that), but I suspect the result would be the same as with gvim.

mycae (mycae) said : #11

Goodness! That is a lot of keystrokes!

>So how come with smaller files no extra characters appear when using gvim but on larger files, extra characters do
>appear, and the extra characters are not just at beginnings of lines or ends of lines. Every letter has 2 extra
>characters. Same thing happens with cream

I suspect this has to do with the encoding detection. There may be a low probability in your data of hitting a non-ASCII (e.g. UTF-8) character for some reason. For example, the TM symbol, the copyright symbol and lots of maths symbols are not in the ASCII set (e.g. TM is at U+2122, which is outside the extended ASCII 0-255 range).
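To make the code-point argument concrete: the trade mark sign is U+2122, which does not fit in one byte, so UTF-8 stores it as three bytes, and an editor that guesses a single-byte encoding shows those bytes as three unrelated characters. A short sketch; cp1252 is used here only as an example of a wrong single-byte guess.

```python
tm = "\u2122"                        # TRADE MARK SIGN
print(hex(ord(tm)))                  # its code point, well above 255
utf8 = tm.encode("utf-8")            # UTF-8 needs three bytes for it
print(utf8, len(utf8))
misread = utf8.decode("cp1252")      # the same bytes read with a single-byte guess
print(ascii(misread), len(misread))  # ascii() keeps the output printable on any terminal
```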

I am unsure how vim does the encoding detection, but it may just use a short sequence from the beginning of the file, or random byte sequences through your file, rather than the full lot. If it misses/hits this special char, then it may alter the behaviour.

Can you report the fileencoding and fileformat values that vim/gvim is using to read your file? You can do this with the commands ":set fileencoding" and ":set fileformat".

Are these the same on the small and large files?

Andre Mangan (kyphi) said : #12

The two text editors that are reputed to be able to handle very large files are vim and emacs.

Quote from Wikipedia on emacs:
Handling large files on 32 bit systems is still a weak point for Emacs. Before version 23.2 Emacs could handle files up to around 256 MB, with 23.2 this was raised to around 512 MB. Emacs on 64 bit systems does not suffer from this problem, it can open files up to 1024 petabytes.

Emacs 23.1 is in Synaptic.

alan detwiler (gizmocrafter) said : #13

I've never used vim, gvim, cream, emacs, or any other command-line oriented text editors, so I don't know where to key in what, or in what exact format. I just spent half an hour trying to get ":set fileencoding" to do something in gvim. All I got were error messages, "not found" messages and such. I would rather have a GUI text editor similar to Microsoft Notepad than spend many hours learning how to use a command-line editor. Is there no such thing in Linux that works with 50 MB files?

Andre Mangan (kyphi) said : #14

At the following URL you can compare the features of all the text editors for all platforms:

http://en.wikipedia.org/wiki/Comparison_of_text_editors

For your information, MS Notepad's file handling capacity is limited to 45 to 54 KB (sourced from http://support.microsoft.com/kb/59578).

Perhaps installing more memory may help.

alan detwiler (gizmocrafter) said : #15

That limit on Notepad is incorrect. I very often open a 40-some MB file using MS Notepad. It opens and edits without problems. The 50 KB limit may have been true of versions of Notepad before Windows XP.

Maybe the additional memory would help. I am using 512 MB. That's double the amount that the Canonical website says is needed. It would seem, however, that 256 MB over the minimum would at least allow working with 10 MB files, which do not load properly for me with vim, gvim, or cream.

marcus aurelius (adbiz) said : #16

notepad does INDEED have a file size limit. it conks out when the file gets large. you must be thinking of wordpad which has a limit as well.

i know of people using openoffice writer to write theses which are several hundred megs, so i don't see why you're having problems with your 40 mb file.

it is recommended that you have 512 megs of ram minimum to run ubuntu. maybe you're thinking of some other operating system that only requires 256.

alan detwiler (gizmocrafter) said : #17

I just checked the Ubuntu 10.10 CD that Canonical sent me about 3 weeks ago. The CD jacket says that 256 MB of RAM is the minimum. No mention is made that the CD contains a stripped-down version; the jacket does say "this edition requires at least 256 MB ram". After installing from the CD, the system automatically reinstalled via a download. I suppose that download requires the 512 MB of RAM.

I'm sure I'm using Windows Notepad to load and edit 40 MB files. I've been doing so almost daily for a couple of years. Perhaps Notepad can only load large files if sufficient memory is available.

I suppose the thing for me to do is to install more ram on my Ubuntu powered system and see if that fixes the problem.

So, I'll post again, probably several weeks from now, if I decide to invest $50, or whatever, for new memory.

mycae (mycae) said : #18

I don't think additional memory will solve your problem; I am reasonably convinced it's a software issue.

Apologies for pointing you at vim; you originally mentioned you had installed gvim, so I assumed you were still keeping with it.

If you can upload an example file somewhere that exhibits the problem (maybe it contains private information, so you can't), then we can have a look at it. The only reason you should ever see non-printing characters in any text editor is if there is an encoding problem.

With an example file, we may be able to generate a set of steps to guide you.

alan detwiler (gizmocrafter) said : #19

There's a 40 MB file that won't display correctly in gVim, and there's a roughly 10 MB file that won't open in gedit. Which do you want, and how do I upload it?

mycae (mycae) said : #20

Try this host.

http://2shared.com/

If that doesn't work, here is a list:
http://filesharing.wikidot.com/compare:one-click-hosters

mycae (mycae) said : #21

P.S. I won't be able to look at this for at least 12 hrs; someone else might be able to help you in the interim however.

alan detwiler (gizmocrafter) said : #22

Gedit will not display the following file:
http://www.2shared.com/document/DpnDUnvB/Menu.html

gVim displays this file with very many extra non printing characters:
http://www.2shared.com/document/XtIG9ggV/good_eats_working_copy.html

Manfred Hampl (m-hampl) said : #23

I looked at your first file with MS Notepad on XP and I see 20 very long lines of unprintable characters near the end (search for "Apple cilantro seed and"; they start 8 lines below that string). So maybe your file is already somewhat broken, leading to the loading problems with some editors?

I have not yet opened the second file.

Manfred Hampl (m-hampl) said : #24 (best answer)

I found a similar file corruption also in the middle of the second file, search for "of 1 ear sweet corn, cutting"

So it seems to me that the problem is not the absolute file size, but strange characters in the file.

Apparently all the text editors that you tried crash if there are certain unprintable characters in the text, whilst MS Notepad still can open these files.

alan detwiler (gizmocrafter) said : #25

I loaded the two files in MS Notepad and deleted the garbage. Both files now display correctly in gvim and gedit.

Previously, I had put both files onto the Ubuntu computer by copying them from a flash drive. The files on the flash drive both had the corrupted sections. Do you have any insight on how I can avoid this in the future? Get a new flash drive? Use System/Administration/Disk utility/Check file system to repair the flash drive? Use some method other than a flash drive to back up and transfer files?

How did you locate the bad sections? That's a lot of text to scroll through visually.

Thanks,
Alan Detwiler

alan detwiler (gizmocrafter) said : #26

Thanks Manfred Hampl, that solved my question.

mycae (mycae) said : #27

Hi Alan,

Glad to see that your problem has been solved. Yes, I would recommend discarding the flash drive.

Manfred Hampl (m-hampl) said : #28

@Alan Re: "How did you locate the bad sections? That's a lot of text to scroll through visually."

When I opened the first file in Notepad, I noticed that the horizontal scroll bar indicated there was at least one very long line in the file (setting: no automatic line wrapping).
So I pulled down the vertical scroll bar with the mouse as slowly as I thought necessary to get a glimpse of a line extending to the right of the visible window. And there it was. It took me another minute to identify a unique string near that part to guide you there.

And searching the second file did not take much longer than its download over my slow internet line ;-)

I have not checked whether the two parts described above are the only corruption in the files. I assume that if, when you open them in Notepad with 'no automatic line wrapping', the horizontal scroll bar no longer indicates overlong lines, you can be fairly confident that the files are OK again. (I think you have to toggle line wrap back and forth to update the horizontal scroll bar size.)
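For a file too large to scroll through, the same two symptoms Manfred looked for (overlong lines and unprintable characters) can be searched for with a short script. This is a sketch under stated assumptions: the 2,000-byte line limit and the choice to flag every control byte except tab, CR and LF are arbitrary thresholds, and find_suspect_lines and scan_file are hypothetical names, not an existing tool.

```python
# Control bytes other than tab (0x09), LF (0x0A) and CR (0x0D), plus DEL (0x7F).
CONTROL = (set(range(0x20)) - {0x09, 0x0A, 0x0D}) | {0x7F}

def find_suspect_lines(lines, max_len=2000):
    """Yield (line_number, length, control_count) for lines that look corrupt.

    `lines` is any iterable of byte strings, e.g. a file opened in "rb" mode.
    """
    for number, line in enumerate(lines, start=1):
        body = line.rstrip(b"\r\n")
        controls = sum(1 for b in body if b in CONTROL)
        if controls or len(body) > max_len:
            yield number, len(body), controls

def scan_file(path, max_len=2000):
    # Binary mode: reading raw bytes means no decoding step can fail.
    with open(path, "rb") as f:
        return list(find_suspect_lines(f, max_len))

# Usage (the file name is a placeholder):
# for number, length, controls in scan_file("good_eats_working_copy.html"):
#     print(f"line {number}: {length} bytes, {controls} control bytes")
```

Reading bytes rather than text keeps the scan independent of whatever encoding the file is in, which is exactly what was in doubt in this thread.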

alan detwiler (gizmocrafter) said : #29

Thanks for your help, Manfred. I might never have thought of checking the files that way.

Jeruvy (jeruvy) said : #30

I often compare files after copying them to flash drives.

$ diff -f file1 file2

Any output means it didn't copy exactly.

You could also store md5/sha1/sha256 hashes of the files and compare them.
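The hash comparison can be sketched with Python's standard hashlib module: compute a SHA-256 digest of the original and of the copy on the flash drive, and treat any difference as a bad copy. Reading in chunks keeps memory use low even for 50 MB files; the function names and the 1 MiB chunk size are illustrative choices. On the command line, the sha256sum tool prints the same hex digest for a file.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(original, copy):
    """True if both files have identical SHA-256 digests."""
    return sha256_of(original) == sha256_of(copy)
```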

Cheers.

alan detwiler (gizmocrafter) said : #31

I've never used text commands, so it would take me a while to learn how to use $ diff -f
For now I'll probably just open the file in gedit. If it opens, I suppose that's an indication that the file is not corrupt.