Hunting down a failure in atomic file copying

I've written previously about how to atomically copy files or folders.

In Sandvox, we make atomic copies of files for media-handling purposes; new files generally needs to be copied into the document package at the next save. If this fails for some reason, Sandvox can handle that fairly gracefully by trying again at the next save. I figured it was better to carry on and do the best job possible, rather than treat that as an outright failure to save. Particularly since there's rarely any useful recovery options available that we can communicate.

One customer wrote to us to complain that Sandvox appeared to be taking up a huge amount of disk space. He had no idea where it was going, but the amount of free space had dropped drastically while the app was running. This seemed to be a pretty bad thing. Digging into the diagnostic logs revealed some unusual behaviour. The sequence went something like:

  1. Copy media file into the temporary directory
  2. Try to atomically move file into document package; this fails complaining of a permissions problem
  3. Repeat steps 1-2 with other media files
  4. Try to delete the temp directory, which also complains of a permissions problem

Well here was quite a puzzler. A file that our own process had just created was now being denied some sort of access to!

For the customer who first contacted us, Sandvox was trying to copy quite a large collection of photos into the package. Each time it failed, something like 500 MB of temporary disk space was being left abandoned with us unable to clean it up. Manually clearing it, or rebooting the Mac were about the only options.

Many diagnostics were tried to try and figure this out. What made it doubly annoying was we had no way to reproduce the situation ourselves!

After putting some diagnostic UI in for recent betas, another customer came forward with the same issue. They did some digging through the console logs themselves, found the source file that was failing, noted it had some unusual permissions according to the Finder and changed over to using a different copy of the file. We didn't have an answer yet, but it did seem the source file was the culprit somehow, and not there being a problem with the temp directory as I first expected.

I discussed the matter with Keith Duncan while visiting Brighton and he reckoned it sounded like something related to ACLs (Access Control Lists). So back I go and add some more diagnostic logging of the ACLs involved.

Happily, just before the latest logs could come in, Warren was able to track down that the problem occurs if a source file has an ACL applied to disallow deleting the file. NSFileManager's copy routine faithfully recreates the source file's ACL, leaving us with a temporary file that can't be deleted or moved. A customer's log showed that this was the case for them too.

Accidentally creating files with an ACL is actually quite hard to do as a casual user in OS X! As best we can tell, you need to be running as an admin user. And from there the easiest way to get into this position is to take a directory which already has such an ACL, and apply that to the files inside the folder. Most of the standard directories in OS X home folders are set up like this to avoid people deleting them by accident.

So now you know the history of this, go read the second half of the original article to learn about fixing it.

© Mike Abdullah 2007-2015