finding what files are stored in a specific physical range on disk

Asked by Eliah Kagan

How do I find out what files are stored in a specific range on a specific physical disk? I have a 1.8 TiB partition, and I need to find out which of the files in it have any (some or all) of their data stored in the first 713,996 KiB of that partition.

Here's why I have to do that:

ek@Del:~$ lsb_release -a; uname -a; echo; mount | grep /dev/sdb; df -h | grep /dev/sdb; sudo fdisk -l /dev/sdb
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.04
Release: 11.04
Codename: natty
Linux Del 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:18:14 UTC 2011 i686 i686 i386 GNU/Linux

/dev/sdb1 on /media/dAlembertian type ext4 (rw,commit=0)
/dev/sdb1 1.8T 1.4T 318G 82% /media/dAlembertian

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
19 heads, 24 sectors/track, 8568046 cylinders
Units = cylinders of 456 * 512 = 233472 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x40004468

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        3132      713996   17  Hidden HPFS/NTFS

I created this icky situation by accidentally writing a CD .iso image to the wrong device (actually, to the "right" device, but on the wrong computer--I failed to realize that I was SSHed into a different machine!). I terminated dd a couple of seconds after launching it, so it is unlikely that the entire ~700 MiB range was written--but unless there is an efficient way to find out how much was written (the dd command does not provide this information when it quits from SIGINT), I figured that I'd better assume that it was.

All the important data on the drive are backed up. Some of the rest I could live without, but would prefer to keep. More importantly, some of the backups would be very time-consuming to restore (for example, I would have to go and re-rip a number of DVDs and audio CDs, redownload various content, and so forth).

Since the partition with the old layout was still mounted[1], I went ahead and copied its entire contents to a separate 1.8 TiB drive. Now, even if I unmount /dev/sdb1 or reboot, I will still have the data readily accessible on that drive. But I fear that some of the data are probably corrupted, that is, that a number of files have had some of their data overwritten by the data written from the CD .iso image.

That is why I want to be able to get a list of files that have some or all of their data stored in the first 713,996 KiB of /dev/sdb1. Then I can just restore those files. That would be enormously easier and faster than attempting to restore all 1.4 TiB. Of course, if there is a way for me to narrow down the overwritten portion of the disk to less than 713,996 KiB and discover what files have some of their data in the smaller portion, that would be even better.
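For reference, the arithmetic behind that range can be sketched as follows. This is only a sketch: it assumes the filesystem uses 4 KiB blocks (a common ext4 default that is not confirmed anywhere in this thread; `sudo dumpe2fs -h /dev/sdb1` would report the real value) and it counts the suspect region from the start of the partition, as the question does. The function name is made up for illustration.

```python
KIB = 1024

def blocks_covering(size_kib, block_size=4096):
    """Return how many filesystem blocks are needed to cover size_kib KiB.

    block_size is an ASSUMPTION (4 KiB); check the real value with dumpe2fs.
    """
    size_bytes = size_kib * KIB
    # Round up so that a partially covered block still counts as suspect.
    return -(-size_bytes // block_size)

suspect_blocks = blocks_covering(713996)
print(f"suspect filesystem blocks: 0..{suspect_blocks - 1}")
# prints: suspect filesystem blocks: 0..178498
```

Any file with at least one data block numbered 178498 or lower would then be a candidate for restoring from backup.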

[1] In case this somewhat strange situation is better explained other than in my own words, here's the message that is printed out when gparted enumerates the partition table on /dev/sdb: "Partition(s) 1 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes." Conveniently, the kernel still knows about / believes in the old partition table, so I was able to copy the data off it (sort of like cartoon characters not falling until they look down and see they have run off the edge of a cliff).

Question information

Language: English
Status: Solved
For: Ubuntu util-linux
Assignee: No assignee
Solved by: Eliah Kagan
Ubfan (ubfan1) said :
#1

Compare a list of your last full backup to the files you rescued; the diff would be the lost files (or those not backed up).
Do a filesystem repair before unmounting; otherwise, unmounting may be a one-way operation.
Since all (most?) of the inode and dir info may have been wiped out along with the files, I know no way to directly recover them, unless you had a backup list of them somewhere.
For important text files, you could dd the 700M, split it up into editable chunks, identify interesting chunks with grep, and edit out the text.
Sigh, these things do happen, but at least you could rescue the rest of the disk.

Bob Bib (bobbib) said :
#2

You can try to make an image from that area with dd, then use testdisk / photorec (http://www.cgsecurity.org/wiki/Main_Page) to do something with it.
No results are guaranteed though.

Eliah Kagan (degeneracypressure) said :
#3

@Ubfan
"Compare a list of your last full backup to the files you rescued, the diff would be the lost files (or not backed up)."

The backups (where the data are important) are sufficient to avoid information loss, but they do not have the same file structure; in many cases the data are not even encoded in the same way.

However, I do not think this approach would work, because as far as I can tell, nothing is missing from the file listing on the drive: I can still access all the files. My concern is that some of the files have probably been partially overwritten. That is why I need to know where they are physically stored. If a file was physically stored in a part of the partition that dd may have written to, then I cannot trust its integrity. If it was not, then I can trust its integrity.

"Since all (most?) of the inode and dir info may have been wiped out along with the files, I know no way to directly recover them, unless you had a backup list of them somewhere."

I don't want to recover them. I just want to know what files, of the ones that I copied off to the other 1.8 TiB drive (which I really do think is all of them), are intact. Then I'll restore/recreate the rest of them from backups.

"For important text files, you could dd the 700M, split it up into editable chunks, identify interesting chunks with grep, and edit out the text."

I am happy to say that the backups are sufficiently accessible as to be easier (and much more likely to succeed) than such an operation.

"Do a filessystem repair before dismounting, that may be a one way operation otherwise."

How should I go about that? (I'd like to wait to do it until I have obtained the information I'm looking for -- about the physical locations of the data in the files on the disk -- but I suppose that, however that goes, I will then attempt a repair with the hope of saving some 20 hours that it would otherwise take to copy the 1.4 TiB of data back from the other 1.8 TiB drive, which is attached via USB 2.0.)

@Bob Bib
"You can try to make an image from that area with dd, then use testdisk / photorec (http://www.cgsecurity.org/wiki/Main_Page) to do something with it."

I could do that, but as I said, I don't want to attempt to rescue any files that I have not already rescued. I want to find out which of the rescued files have data stored in the first 700 MiB of the partition (or in a smaller region, if there is some way to find out how much of the new partition was actually written).

I still have the system up and running, and the partition mounted. Any ideas?

Ubfan (ubfan1) said :
#4

If you can see all the files, look at their inode information. I think that contains the file location (or was it the directory entry? I forget).

Eliah Kagan (degeneracypressure) said :
#5

Do you know of any existing utility that shows this information, or do I have to write a program that uses the POSIX API to access and process it?
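One utility that reports this kind of information is filefrag (part of e2fsprogs), whose -v option lists the physical extents of a file in filesystem blocks. The following is a minimal sketch of pulling the physical block ranges out of that listing. It assumes the column layout printed by recent e2fsprogs releases, which may differ from the version shipped with Ubuntu 11.04; the sample text is made up for illustration and was not produced on the drive discussed here.

```python
import re

# Matches extent rows of `filefrag -v`, for example:
#    0:        0..     999:     100000..    100999:   1000:
EXTENT_ROW = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)

def parse_filefrag(text):
    """Return a list of (physical_start, physical_end) block pairs, inclusive."""
    extents = []
    for line in text.splitlines():
        m = EXTENT_ROW.match(line)
        if m:
            # Groups 3 and 4 are the physical_offset column.
            extents.append((int(m.group(3)), int(m.group(4))))
    return extents

# Made-up sample output (ASSUMED format), for illustration only.
SAMPLE = """\
Filesystem type is: ef53
File size of example.mkv is 8192000 (2000 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     999:     100000..    100999:   1000:
   1:     1000..    1999:    3145728..   3146727:   1000:     101000: last,eof
example.mkv: 2 extents found
"""

print(parse_filefrag(SAMPLE))
```

In real use the text would come from running `sudo filefrag -v FILE` on the mounted filesystem; the physical offsets are relative to the start of the filesystem, not the start of the disk.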

Ubfan (ubfan1) said :
#6

Try a package like istat. After some googling, I was amazed this important info seems to be buried so deeply.

Ubfan (ubfan1) said :
#7

Take a look at dumpe2fs also

Eliah Kagan (degeneracypressure) said :
#8

istat (provided by the sleuthkit package) does not seem to support ext4 filesystems. I am inferring that from the error messages in the terminal session reproduced below, and also from http://sleuthkit.org ("...can be used to analyze NTFS, FAT, HFS+, Ext2, Ext3, UFS1, and UFS2 file systems and several volume system types.").

ek@Del:/media/dAlembertian/Anime/[Froth-Bite_Menclave]_Toshokan_Sensou_-_01-12_[1280x720_H.264_AAC]$ ls -i | head -1
786491 [Froth-Bite_Menclave]_Toshokan_Sensou_-_01_[1280x720_H.264_AAC][9352C863].mkv
ek@Del:/media/dAlembertian/Anime/[Froth-Bite_Menclave]_Toshokan_Sensou_-_01-12_[1280x720_H.264_AAC]$ mount | grep /media/dAlembertian
/dev/sdb1 on /media/dAlembertian type ext4 (rw,commit=0)
ek@Del:/media/dAlembertian/Anime/[Froth-Bite_Menclave]_Toshokan_Sensou_-_01-12_[1280x720_H.264_AAC]$ sudo istat /dev/sdb1 786491
inode: 786491
Allocated
Group: 96
Generation Id: 227488711
uid / gid: 1000 / 1002
mode: rrw-r--r--
Flags:
size: 359455176
num of links: 1

Inode Times:
Accessed: Tue Oct 18 03:13:19 2011
File Modified: Fri Aug 5 18:32:36 2011
Inode Modified: Fri Aug 5 18:32:36 2011

Direct Blocks:

Error reading file: Error in metadata structure (unix: Indirect block address too large: 1421435139)
ek@Del:/media/dAlembertian/Anime/[Froth-Bite_Menclave]_Toshokan_Sensou_-_01-12_[1280x720_H.264_AAC]$ sudo istat -f ext4 /dev/sdb1 786491
Unsupported file system type: ext4
usage: istat [-B num] [-f fstype] [-i imgtype] [-b dev_sector_size] [-o imgoffset] [-z zone] [-s seconds] [-vV] image inum
 -B num: force the display of NUM address of block pointers
 -z zone: time zone of original machine (i.e. EST5EDT or GMT)
 -s seconds: Time skew of original machine (in seconds)
 -i imgtype: The format of the image file (use '-i list' for supported types)
 -b dev_sector_size: The size (in bytes) of the device sectors
 -f fstype: File system type (use '-f list' for supported types)
 -o imgoffset: The offset of the file system in the image (in sectors)
 -v: verbose output to stderr
 -V: print version
ek@Del:/media/dAlembertian/Anime/[Froth-Bite_Menclave]_Toshokan_Sensou_-_01-12_[1280x720_H.264_AAC]$ sudo istat -f list
Supported file system types:
 ntfs (NTFS)
 fat (FAT (Auto Detection))
 ext (ExtX (Auto Detection))
 iso9660 (ISO9660 CD)
 hfs (HFS+)
 ufs (UFS (Auto Detection))
 raw (Raw Data)
 swap (Swap Space)
 fat12 (FAT12)
 fat16 (FAT16)
 fat32 (FAT32)
 ext2 (Ext2)
 ext3 (Ext3)
 ufs1 (UFS1)
 ufs2 (UFS2)

It does seem to work for some files (http://paste.ubuntu.com/719833/), though I am a bit unsure as to how to interpret the output.

As for dumpe2fs, how is that relevant to the problem of finding out where blocks from an individual file are stored or (better) finding out which individual files correspond to a specified range of blocks (which is what I really need to do)?
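For the reverse lookup described here (blocks to files), debugfs from e2fsprogs offers two requests that may help: `icheck` maps block numbers to inode numbers, and `ncheck` maps inode numbers to path names. Below is a minimal sketch of building the `icheck` call and reading its result. It assumes the output is tab-separated rows under a "Block / Inode number" header; the helper names and the sample values are made up, and nothing here was run against the drive in question.

```python
def icheck_command(device, blocks):
    """Build an argv list for a `debugfs -R "icheck ..."` call.

    debugfs opens the filesystem read-only unless -w is given.
    """
    request = "icheck " + " ".join(str(b) for b in blocks)
    return ["debugfs", "-R", request, device]

def parse_icheck(text):
    """Map block number -> inode number, skipping blocks with no owner."""
    owners = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # The header row and "<block not found>" rows fail the digit checks.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            owners[int(fields[0])] = int(fields[1])
    return owners

# Made-up sample output (ASSUMED format), for illustration only.
SAMPLE = "Block\tInode number\n1000\t<block not found>\n1001\t12\n"

print(icheck_command("/dev/sdb1", [1000, 1001]))
print(parse_icheck(SAMPLE))
```

The distinct inode numbers collected this way could then be passed to `ncheck` in the same manner. For a range of roughly 178,000 blocks, the requests would probably need to be issued in batches rather than on one command line.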

I had been thinking that if no utility were suitable, then I could write a utility myself. But the errors I got from istat, unless they are just because ext4 is unsupported, suggest to me that this might be an enormous project[1], and that it might be better for me to either restore the data from backups, or live with the (increased, compared to normal) probability that some of it is corrupted.

[1] A utility, to be suitable, would have to be able to print out the blocks associated with thousands of different files, with no errors, so that these data could be processed to produce a list of files containing blocks that may have been overwritten by dd.
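The outer loop of such a utility could be sketched as below. This is only an outline under stated assumptions: `get_extents` is a hypothetical caller-supplied function that returns a file's physical extents as (start, end) block pairs (obtained however is convenient, for instance from an extent-listing tool), and the suspect region is assumed to begin at block 0 of the filesystem.

```python
import os

def suspect_files(root, get_extents, last_suspect_block):
    """Yield paths under root with an extent in blocks 0..last_suspect_block.

    get_extents(path) is a HYPOTHETICAL caller-supplied function returning
    a list of (physical_start, physical_end) block pairs.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files are checked
            try:
                extents = get_extents(path)
            except OSError as exc:
                # Report and keep going: one bad file should not stop the scan.
                print(f"skipped {path}: {exc}")
                continue
            # The range starts at block 0, so comparing start blocks suffices.
            if any(start <= last_suspect_block for start, _end in extents):
                yield path
```

Catching errors per file addresses the "with no errors" concern only partly: skipped files would still have to be treated as suspect.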

Ubfan (ubfan1) said :
#9

Use dumpe2fs to look at the area that was written to and the area just past it. If the groups are all "empty", or become "empty" soon after the start of the 700 MiB, maybe NO file data was actually present. The fact that you seem to be able to "see" all the files indicates that this may be the case.
Starting with the "ls -i" command, getting the inode and then the blocks/indirect blocks shouldn't be that difficult a programming task; just ignore any ext4 complications.
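The dumpe2fs check suggested above could be automated roughly as follows. This is a sketch that assumes the per-group section of dumpe2fs output has a header of the form `Group N: (Blocks A-B)` followed by a line beginning `X free blocks`; the sample text and the function name are made up for illustration.

```python
import re

GROUP = re.compile(r"^Group (\d+): \(Blocks (\d+)-(\d+)\)")
FREE = re.compile(r"^\s*(\d+) free blocks")

def used_blocks_in_range(text, last_suspect_block):
    """Return {group: used_block_count} for groups starting inside the range."""
    result = {}
    current = None  # (group number, total blocks) of the group being read
    for line in text.splitlines():
        g = GROUP.match(line)
        if g:
            num, first, last = (int(x) for x in g.groups())
            in_range = first <= last_suspect_block
            current = (num, last - first + 1) if in_range else None
            continue
        f = FREE.match(line)
        if f and current:
            result[current[0]] = current[1] - int(f.group(1))
            current = None
    return result

# Made-up sample (ASSUMED format), for illustration only.
SAMPLE = """\
Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
  1000 free blocks, 8000 free inodes, 2 directories
Group 1: (Blocks 32768-65535) [ITABLE_ZEROED]
  32000 free blocks, 8192 free inodes, 0 directories
Group 6: (Blocks 196608-229375) [ITABLE_ZEROED]
  32768 free blocks, 8192 free inodes, 0 directories
"""

print(used_blocks_in_range(SAMPLE, 178498))
```

Used blocks include filesystem metadata (bitmaps, inode tables and so on), so a nonzero count does not by itself prove that file data lived in a group; it only marks the group as worth a closer look.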

Launchpad Janitor (janitor) said :
#10

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Eliah Kagan (degeneracypressure) said :
#11

I'm going to go ahead and mark this as Solved. The most recent advice you gave me looked like it would work...but before I had a chance to implement it, there was a power failure and the machine went down. Now the machine--old and already possessed of BIOS problems--doesn't boot. The drive itself should still contain the data, but I'd have to recover the partition first, which I don't consider worthwhile.

Eliah Kagan (degeneracypressure) said :
#12

Thanks Ubfan, post #9 would have Solved my problem. ;-)