using xlrd: Error attempting to delete file after wb.release_resources()

Asked by Ron Turrentine on 2018-09-04

OS: Windows 10

This is more of a Jython question that pure Sikuli, but I have not had success in finding answers elsewhere. Hoping that you guys here may be able to help!

My script is attempting to do the following:

1.) Download an Excel workbook (XLSX)
2.) Move the file from the WIndows "downloads" folder to a shared folder
3.) Split the individual worksheets from the original workbook into their own workbooks
4.) Delete the original XLSX file

The error that I am getting is at step #4. It is an error type of "OSError" that says this:

"unlink(): an unknown error occurred: C:\screenshots\SS Case Cancellation Monthly Report.xlsx"

Note that the worksheet splitting is successful. The only failure is the file deletion.

I am using the xlrd library to manipulate the Excel files. The error always occurs on the "os.remove" line.

Here is the portion of my function that is handling the worksheet splits and file deletion:

    def SplitWorksheets(self):
        import xlrd
        from xlutils.copy import copy

        print "Attempting to split worksheets..."

        try:
            x = 0
            base = os.path.splitext(os.path.basename(self.NewFile))[0]

            targetdir = (self.FullPath + "\\" + base) #where you want your new files

            #print base

            if not os.path.exists(targetdir): #makes your new directory
                os.makedirs(targetdir)

            ws_to_split = self.NewFile

            wb = xlrd.open_workbook(ws_to_split, on_demand=True)

            for sheet in wb.sheets(): #cycles through each sheet in the workbook
                newwb = copy(wb) #makes a temp copy of that book
                newwb._Workbook__worksheets = [ worksheet for worksheet in newwb._Workbook__worksheets if worksheet.name == sheet.name ] #keeps only current sheet
                newwb.save(targetdir + "\\" + sheet.name + ".xls") #Saves single sheet as a new wb with its sheet name
                x += 1

            #Closes the workbook, allowing it to be deleted.
            wb.release_resources()

            wb = None

            print "Preparing to delete file: " + self.NewFile
            #Delete original file.
            os.remove(self.NewFile)

            print "Workbook deleted -> " + self.NewFile

            self.NewFile = ""

        except:
            exc_type, exc_val, exc_tb = sys.exc_info()

            ErrorText = "***** ERROR IN SCRIPTNAME = " + sys.argv[0] + " *****\n"
            ErrorText += "Date/Time : " + str(datetime.datetime.now()) + "\n"
            ErrorText += "Host name : " + socket.getfqdn() + "\n"
            ErrorText += "User name : " + getpass.getuser() + "\n"
            ErrorText += "Line Number: " + str(exc_tb.tb_lineno) + "\n"
            ErrorText += "Error Type : " + exc_type.__name__ + "\n"
            ErrorText += "Error Value: " + str(exc_val.message) + "\n"
            ErrorText += "***************************************************************\n\r"

            print ErrorText

            self.LogError()

        else:
            dpmsg = "=*=*=*=*=* " + str(x) + " Worksheets Split from Workbook *=*=*=*=*="

            print dpmsg

        finally:
            sys.exc_clear()

##########################

Any help is greatly appreciated!

Best regards,

Ron Turrentine

Question information

Language:
English Edit question
Status:
Answered
For:
Sikuli Edit question
Assignee:
No assignee Edit question
Last query:
2018-09-05
Last reply:
2018-09-08
RaiMan (raimund-hocke) said : #1

the description of os.remove():
Remove (delete) the file path. If path is a directory, OSError is raised; see rmdir() below to remove a directory. This is identical to the unlink() function documented below. On Windows, attempting to remove a file that is in use causes an exception to be raised;

Your problem definitely is caused by the fact, that the file is not closed in the Windows sense (which is much more restrictive, then on ...ix systems (Mac, Linux, ...).

meaning:
wb.release_resources()

does not do the close() of the file itself.

BTW:
for path creation like
targetdir = (self.FullPath + "\\" + base)

we have
os.path.join(self.FullPath, base)

which saves the path delimiter handling

Ron Turrentine (oldtimerocker) said : #2

Thanks Raiman. I'm going "back to the drawing board" and will doing a bit more research on the proper way to "close" the workbook in xlrd then. From what I've read, "release resources" was *supposed* to do the trick, but it obviously is not releasing the file properly.

Oh, and thank you for the os.path.join tip. I will definitely be using that in my script!

Best,

Ron

Ron Turrentine (oldtimerocker) said : #3

I'm still struggling with this issue. If someone out there has experience with opening XLRD manipulated workbooks in Windows, and then deleting them, I would appreciate your input.

Although I am calling "release_resources" before attempting the file deletion, it appears that the script is still holding tight to the file in memory. From what I can tell from reading the XLRD documentation, that method is the only way to remove the file from memory (other than the script itself ending).

Thanks in advance!

Ron

Manfred Hampl (m-hampl) said : #4

I have searched the web for similar problems like yours.
There are some comments recommending to use the instruction
del wb
instead of
wb = none
Just give it a try. (To be honest, I do not think that it will help.)

Ron Turrentine (oldtimerocker) said : #5

Thanks for you reply Manfred!

I actually found that same suggestion and tried it.
And you are correct, it made no noticeable difference. It's a very strange problem!

If the workbook were the only file in the folder, I would simply try removing the directory and it's contents altogether.
However, there are valuable files inside the directory that I must preserve.

Thanks again,

Ron

Ron Turrentine (oldtimerocker) said : #6

Here is a rather humorous solution, but it works:

My script opens an explorer window to the folder where my Excel file lies (using subprocess.Popen), then finds the Excel icon, right clicks on it, then clicks "Delete".

This is more Sikuli-esque than Pythonic, but at the moment I can't think of any other way. :-/

You can view a screenshot of the script here: https://imgur.com/a/1YZSH45

I'd rather use os.remove, though, if I can figure out how to get it to work.

Thanks!

Ron

Manfred Hampl (m-hampl) said : #7

If deleting in an explorer window works, then it might also be possible to do it in a command line window with

run('del ' + ws_to_split )

Ron Turrentine (oldtimerocker) said : #8

Manfred,

Thanks so much! I actually tried something similar by using "os.system".
Namely: os.system(" del /f " + ws_to_split). It was not successful, but I have not tried it with the run command. I will definitely give that a go this morning and then report back on how it goes.

Thanks again!

Ron

Ron Turrentine (oldtimerocker) said : #9

I think I may have found a clue as to the possible root cause of my issue.

I tried deleting the file using the subprocess library. I tried it with both the "Popen" And "call" methods. Both failed, giving me an error of "OSError: [Errno 2] No such file or directory ". Both the file & directory indeed exist.

So I decided to try a simple "dir" command on the directory in question. It also returned a "No such file or directory" error.

Intrigued, I decided to simplify my test. I ran the following script to run a simple "dir" command at the root of my C: drive (the script follows this post). It also returned a "No such file or directory" error (see the notes following the script for the verbose error returned by Sikuli).

Am I running this command with the correct syntax? Or is there possibly a problem with the subprocess on Windows 10?

Raiman, would love to hear your input on this.

Thanks,

Ron

####################

import subprocess

ret = subprocess.Popen(["dir","c:\\"], stdout=subprocess.PIPE).communicate()

print ret

###################

[error] script [ TestPopen ] stopped with error at line --unknown--
[error] Error caused by: Traceback (most recent call last): File "\\corpDpt02\CHMShare\Testing_Services\Automation Comparison\Sikuli Scripts\TestPopen.sikuli\TestPopen.py", line 3, in <module> ret = subprocess.Popen(["dir","c:\\"], stdout=subprocess.PIPE).communicate() File "C:\Users\bye8557\AppData\Roaming\Sikulix\Lib\subprocess.py", line 859, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Users\bye8557\AppData\Roaming\Sikulix\Lib\subprocess.py", line 1369, in _execute_child raise OSError(errno.ENOENT, os.strerror(errno.ENOENT)) OSError: [Errno 2] No such file or directory

RaiMan (raimund-hocke) said : #10

the Python docs about subprocess.Popen say:

On Windows with shell=True, the COMSPEC environment variable specifies the default shell. The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.

So dive into the docs about the possible parameters.

I finally found the culprit to the issue by using subprocess.Popen. It returned a much more useful error than os.remove did.

The error it is returning is: "The process cannot access the file becaus4e it is being used by another process."

This makes me believe that the XLRD "release_resources" method for the workbook is not really truly releasing the workbook from memory to allow me to delete it.

Any other suggestions on how I can clear the workbook from memory in order to successfully delete it?

Ron

PS: Here's my working function using Popen that finally helped me see the issue:

    def DelFile(self):
        from subprocess import Popen, PIPE, STDOUT

        try:

            command = "cmd.exe /c del /f \"" + self.NewFile + "\""

            output = Popen(command, stderr=PIPE)

            ret = output.communicate()

            if ret:
                raise StandardError(str(ret[1]))

        except:

            exc_type, exc_val, exc_tb = sys.exc_info()

            ErrorText = "***** ERROR IN SCRIPTNAME = " + sys.argv[0] + " *****\n"
            ErrorText += "Date/Time : " + str(datetime.datetime.now()) + "\n"
            ErrorText += "Host name : " + socket.getfqdn() + "\n"
            ErrorText += "User name : " + getpass.getuser() + "\n"
            ErrorText += "Line Number: " + str(exc_tb.tb_lineno) + "\n"
            ErrorText += "Error Type : " + exc_type.__name__ + "\n"
            ErrorText += "Error Value: " + str(exc_val.message) + "\n"
            ErrorText += "***************************************************************\n\r"

            print ErrorText

            self.LogError()

###############################

And here is the output:

***** ERROR IN SCRIPTNAME = //corpDpt02/CHMShare/Testing_Services/Automation Comparison/Sikuli Scripts/SurgCancDownloadOnly.sikuli *****
Date/Time : 2018-09-07 09:09:55.462000
Host name : CORPBYE8557A.hca.corpad.net
User name : BYE8557
Line Number: 66
Error Type : StandardError
Error Value: The process cannot access the file because it is being used by another process.

***************************************************************

RaiMan (raimund-hocke) said : #12

I looked into the xlrd sources (those bundled with SikuliX):
"release_resources" definitely does not close any file, only memory is released.

By just looking at the code it is hard to find out, wether at all and if yes, what file is opened and not closed (XLSX is not a compound file, but something like a disguised zip-file).

In any case: the problem is already with the open_workbook function, since here already the complete file content is taken into memory.

BTW: are you using the bundled xlrd?

Hey Raiman,

Thanks for the reply! Yes, I am using the bundled XLRD. I have not d/l or installed it externally.

BTW, I downloaded the sysintertals "handle.exe" from technet.
It helped me to determine that it is java.exe that has a firm grip on the Excel file.
Oddly, Java.exe actually had 4 separate holds onto the file:
https://i.imgur.com/B2UlMXO.png

Apparently I can forcibly "unlock" the grip on the file using handle.exe, but I hesitate doing it with such brute force. I'd much rather do it in a more "elegant" fashion.

I really appreciate everyone's patience & help with this.

Ron

RaiMan (raimund-hocke) said : #14

--- It helped me to determine that it is java.exe
... not a surprise ;-)
with SikuliX, everything is Java and File handling is one of the basic support features for everything running on Java.

A JVM session has many, many threads open, representing the different app-aspects and your handle-tool only connects the file-handles to the process-top-level (does not know anything of Java internal structures).

On last implementation idea:
Just do it the other way round:
delegate your workbook-handling to a subprocess (so the file is opened there) and when coming back from there, try the delete.

RaiMan (raimund-hocke) said : #15

I digged a bit deeper into the xlsx handling:
(xlrd::xlsx.py::open_workbook_2007_xml)
When processing the internal xml-files in the xlsx-zip, they use only
del file_handle

... which does not guarantee to close the file behind the handle (and apparently does not in the Jython/Jave environment)

... and with the standard XLSX workbook, there seem to be at least 4 xml-files, that need to be processed (your handle-tool result ;-)

I added a
file_Handle.close()

before each
del file_handle

... but did not test wether it helps (but should do).

Please test with the build from today (build #79)

Good information, Raiman. Thanks for the tip.

I'll definitely test this out this week and report back.

Best regards,

Ron

Can you help with this problem?

Provide an answer of your own, or ask Ron Turrentine for more information if necessary.

To post a message you must log in.