
Copying cards with byte compare



Hi, I am a freelance camera operator/film-maker. A job I did recently had some corruption of the files (I did not do the copy, but it has got me thinking that I need to find the most robust way of copying off camera cards). I am not sure where the corruption came from, but a friend told me an interesting story.

 

He was doing DIT for a project and, being over-paranoid, he made two copies using totally different drives/card readers. It turns out it was a good thing. The copy they were using in the edit had corrupted footage. He got the backup sent to him and tested his hardware. It turned out that one of the card readers was introducing the corruption, so he was not being over-paranoid after all.

So, to be extra careful, I want to do a byte compare after the copy (i.e. check that the two files contain exactly the same data on a byte-by-byte basis). What software is there that can do this? It would be good if it was not too expensive, preferably open source. A simple byte-compare utility would be fine (I did some googling but had no joy). Alternatively, a suite of utilities that does the copy and the compare, and possibly maintains a database (but I guess this is specialist software that may not be cheap).
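
For what it's worth, a byte-for-byte compare is only a few lines in most scripting languages. A minimal sketch in Python (the file paths are hypothetical):

```python
import filecmp

# Compare two copies byte for byte. shallow=False forces a full
# content comparison instead of the default size/metadata check.
def files_identical(path_a: str, path_b: str) -> bool:
    return filecmp.cmp(path_a, path_b, shallow=False)
```

On Unix-like systems the stock cmp utility does the same job from the command line.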

 

Regards,

Ben

PS I realise doing a byte compare is not a substitute for using two sets of hardware; it is an extra check.

Edited by Ben Edwards

I suppose this is a question for a DIT expert....

 

If you make two copies of the data from a card and they don't match, which one is correct?

If you make three copies and two match, is the one that doesn't match bad, or have you simply replicated the error in both of the other copies?

If you copy the files from the card using two different readers and they don't match, which one is correct?

 

Is there no error detection and correction in the camera hardware/software and the reader that prevents data corruption?


  • Premium Member

I highly recommend always doing checksum-verified transfers for important material. When you know the checksums of the original camera files, you can check all the copies and see which ones are correct. I use Double Data for Epic and Alexa material. Usually I transfer the material from the camera cards using source and destination verification, and when the transfer is complete I do two normal copies to separate drives (the first copy goes to a RAID array, I have another RAID array for the secondary copy, and the third copy goes to a 3.5" bare SATA hard drive via a docking station). I check the second and third copies manually and also compare the source and destination MD5s, which Double Data stores in text files next to the copy.

At the end of the day I clone the 3.5" drive in the docking station two times and verify all the copies with Double Data using the checksums from the original transfer. I make two LTO5 copies of the material weekly and can then reuse some of the 3.5" drives; the RAIDs have to be reset every five or six days, and by then I need to have three more durable copies of the material.

 

When I have to leave the DIT set during the day, I usually take the 3.5" drive with me. The biggest risks to the data are electrical problems, theft and fires on set. Everything is UPS-powered, so I can run the system from a generator without problems.

 

You can never be overly paranoid about data; if you are sloppy you will run into lots of problems and destroy everyone's work.

Edited by aapo lettinen

  • Premium Member

I would recommend ShotPut Pro; Double Data never worked for me, but the previous versions of R3D and Alexa Data Manager worked fine. MD5 checksum copies to two separate locations at the minimum. I've seen readers corrupt footage occasionally when they get too hot, so it's a good idea to have a backup reader and swap them out if you are constantly being thrown cards.

 

Most importantly, you need a system of physically labeling cards so that there is no confusion about what is exposed, what is downloaded but not verified, and what is OK to format. Don't reuse cards that have not been copy-verified! That way you can always recopy from the card if there are issues.

 

The system I like is: 1" red paper tape with the roll #. Gets put on the card right when it comes out of the camera. Also, if you put a small piece of white p-touch labeling tape on the card, you can write the roll # in sharpie on the card itself before placing it in the camera. I do both just in case the tape falls off. Next, the card goes in a small pelican case marked "Exposed Media" in red. That goes to the download station. The loader or DIT puts the tape on the reader and copies the card. After a verified copy, the DIT puts 1/2" green tape on the card with the Roll # and check mark. It stays on the DIT cart until the footage has been verified. Then the card goes into a small pelican case marked "Ok to Format" in green and comes back to the camera cart.


  • Premium Member

Sure, there are cameras that allow dual recording, like the Canon C300, and most will work with an external recorder like the Pix240 or Atomos products. Monitor/ProRes recorders have been all the rage the last few years. However, that doesn't replace a proper system for safely offloading data.


Second ShotPut Pro.. it's the industry standard really.. a lot of productions will insist on a download log and checksum verification log as part of their insurance.. and SPP does it for you.. with the downloaded files or to any other location you want.. it will even send you an email when it's copied.. so you can hang in the bar while you wait.. it will also copy to multiple drives (I never do more than 2 at a time).. faster than doing it separately.. $99.. easy, and it works..


  • Premium Member

I have the following system for card labelling:

 

1. The 2nd AC attaches a piece of tape with the card name to the full card

2. When I have done the 1st copy, I take that tape off and attach a blank tape to the card

3. When I have checked all the copies and they are OK, I write "format and reuse" or similar text on the blank tape (if there are a couple of blank-labelled cards on the table, I always double-check the card before writing anything on the tape)

4. Then I attach the original card-name tape to the case of the 3.5" HDD so everyone knows that the card is backed up there

5. I bring the original card back to the 2nd AC personally and take the 3.5" drive with me when doing so, so if someone nukes my DIT set we won't lose the data forever

 

I only do one checksum-verified copy first because I want to check the other two copies manually. Maybe I'm being overly paranoid, but I don't fully trust even checksum verification :D

A checksum-verified copy takes at least 2x the time compared to a simple copy, so doing only one is also a bit more time-efficient.


  • Premium Member

One thing which is absolutely critical is to have a UPS (uninterruptible power supply) for the whole DIT set, especially if you are running it from a generator (and it is absolutely mandatory if someone else is also using the generator, because of sudden voltage drops). It has saved the data maybe a hundred times during our production when I have had to use a generator shared with, for example, Catering, and they were using lots of power irregularly.

This way you can also stop the generator without affecting the transfers when you need to add gasoline or move the generator for whatever reason.


  • Premium Member

 

 

one thing about byte comparisons is that they usually do not work when you have drives with different file systems

 

It should; the filesystem should be transparent to applications reading the files. I'd be interested to see how you got results that indicate otherwise.

 

 

 

MD5 checksum copies to two separate locations at the minimum

 

Point of order, just to avoid misconceptions - checksums just produce a hash (sort of a nearly-unique-for-these-purposes fingerprint of the file) of which MD5 is one example. An MD5 hashing program doesn't, by definition, copy anything anywhere. Applications using MD5 may copy things around, hashing both copies, and if the hashes don't match then that's your indication that there's been a problem.
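
To make that fingerprinting step concrete, here is a minimal sketch using Python's standard hashlib, reading in chunks so large camera files don't need to fit in memory:

```python
import hashlib

# Return the MD5 hex digest of a file, reading it in 1 MB chunks.
def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing md5_of(source) against md5_of(destination) is the "hashing both copies" step described above.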

 

 

 

If you make two copies of the data from a card and they don't match, which one is correct?

 

You can't tell. Usually the approach would be to ditch both copies and retry the entire process from scratch, doing that a limited number of times automatically before deciding that there's probably a more serious problem (such as a dying hard disk) and raising an error to let the user intervene manually. If you were feeling particularly adventurous, you could keep copying files until you got any two that matched, but I wouldn't advocate that approach.
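
That copy-verify-retry policy might look something like this sketch (the retry count of three is an arbitrary assumption, not anyone's recommended figure):

```python
import hashlib
import shutil

def _md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Copy src to dst, verify the copy against the source hash, and retry
# a limited number of times before raising so a human can intervene.
def verified_copy(src: str, dst: str, retries: int = 3) -> str:
    want = _md5(src)
    for _ in range(retries):
        shutil.copyfile(src, dst)
        if _md5(dst) == want:
            return want
    raise IOError("copy of %s failed verification %d times" % (src, retries))
```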

 

 

 

Is there no error detection and correction in the camera hardware/software and the reader that prevents data corruption?

 

Ideally, yes. The parts of operating systems that handle files implement something a bit like MD5 hashing internally. The issue is simply that there is no implementation of what used to be called confidence replay. Back when quarter-inch tape was used for audio recording, the playback head was situated after the record head in the tape path, so it was possible to listen back to what had just been recorded. Hard disks have no such capability. Various levels of hashing and checksums try to ensure that the data transmitted to the magnetic read/write head is correct, but if the magnetic recording system itself fails (a speck of dust on the disk surface is more than enough) then what's called a hard write error occurs. This, and other errors, will be detected by hash comparison - but crucially only when you try to read the data back. There's no way of detecting a hard write error without reading the file, which is what we do when we make an MD5 hash of it.

 

LTO tape, by comparison, absolutely does have a confidence replay head and absolutely does detect hard write errors, for exactly this reason (it's one of the conveniences of linear rather than helically-scanned tape formats). Really careful people will still run hashes over the stored files, however, as the internal check will not, for instance, detect glitches in the cabling sending the data to the LTO deck, which is another source of problems.

 

The only way to ensure it's readable, sadly, is to read it, and that doubles the time all this takes.

 

P


  • Premium Member

You don't actually need specialised programs for calculating checksums; for example, on MacOS you can use Terminal for that (for an MD5 checksum, type md5, add a space, drag the file to the Terminal window and press Enter).

 

--

I get different results all the time when comparing folders against each other. When comparing the files themselves the byte counts match, but it is WAY too time-consuming to compare every one of them manually.


  • Premium Member

I'm not sure what the situation would be with folders; the MD5 app itself would be responsible for walking the folder tree structure and performing an MD5 on all the files therein. Personally I'd write a script to do it myself.
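
Such a script is short. A sketch of the tree-walking part in Python, returning a relative-path-to-digest map so two copies of a card can be compared entry by entry:

```python
import hashlib
import os

# Walk a folder tree and return {relative path: MD5 hex digest}
# for every file under root.
def hash_tree(root: str) -> dict:
    digests = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests
```

Two copies match when hash_tree(copy_a) == hash_tree(copy_b), and any differing entry names the exact file that is wrong, which a folder-size comparison can never tell you.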

 

P


  • Premium Member

I was referring to the common practice of comparing the copied folders against each other in Finder or Explorer to see how many bytes the folder takes up on the drive and how many files there are in it, and using that information as confirmation that the copy was successful.


  • 3 weeks later...

The best is Silverstack. Second best is YoyottaId. These two are the only ones maintaining a library with rich metadata and reporting capabilities. Silverstack has great, I repeat, great support. I've asked for implementations and seen them appear in the next release. If you spot an issue, they might even make a release the next day with a bug fix. It's also a great QC tool.

YoyottaId ain't bad. But what with the absence of an actual user manual, I seriously can't be bothered to spend my days performing voodoo magic until I hit the right combination.

 

Then there's, as mentioned, ShotPut Pro. I find it a bit too much on the bare side. It does the copy just fine, so if that is the only thing you need, I see no reason not to use it.

 

As for free tools, there are great ones hiding in the terminal. Rsync is simply fantastic. It'll create a checksum if you ask for it, and it uses checksums even when you don't ask. I use it for cloning entire projects onto a safe server and/or LTO, or for copying other elements of a day's folder that do not require reporting, such as dailies, sound, ALEs, grades, etc. Another great little tool is iSFV. It doesn't do the copy, but it creates checksums for entire folders, which are then simple to verify. It uses a CRC checksum - smaller, but enough for this purpose. I send it to edit assistants so they can easily verify the quality of the cloning they're doing in the cutting room.

 

File size comparison won't show errors at the byte level, so it's basically a waste of time. Likewise with doing two copies with two readers to two different destinations. As long as you are calculating the checksum off the source, then comparing that to the cloned copies, you are basically fine, and can then move on to the actual DIT work.
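
That workflow (hash the source once, then judge every clone against that reference digest rather than against each other) can be sketched in a few lines of Python; the paths are hypothetical:

```python
import hashlib

def _md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hash the source once, then return the clones whose digests do not
# match it (an empty list means all copies are good).
def bad_clones(source: str, clones: list) -> list:
    reference = _md5(source)
    return [c for c in clones if _md5(c) != reference]
```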


Hi All.


I am a programmer at the post-production house L'espace Vision in Japan.



The theme of this thread is very interesting to me.

I am developing a simple checksum-copy program.

rsync is traditional and reliable, but it has some problems; it does not support the most recent checksum algorithms, such as xxHash.

SilverStack is great software and reliable, but it is expensive.

I thought I should make a simple, general-purpose and inexpensive verified checksum-copy program.


Do you know FastCopy, checksum-copy software that has been trusted on Windows for over 10 years?

I have ported FastCopy to Mac, with improvements.

The name of the software is RapidCopy.


These are RapidCopy's features:

- High-speed differential file copy by modification date or file size.

- Stable copy (ignores minor errors).

- Checksum-verified copy.

- Generates a log file (including checksums).

- LTFS (Linear Tape File System) read/write support.

- Various copy modes: Sync (like rsync), Move, Delete, and more.

- Classic look and feel; usage is the same as the great original FastCopy.

- Source and destination verification (verify-only mode) support.

- Various checksum algorithms (xxHash, MD5, SHA-1, ...).

- etc.


RapidCopy sells for roughly $10 in the Mac App Store.

There are also more features in the Pro version.

My company provides, as best we can, a Pro version that sells for $25 at amazon.com.

There is also English documentation covering the technical details.

Please take a look if you are interested.




I am now developing a copy-batch function and a CSV output function.

I would like feedback from overseas users.



E-Mail:sawatsu@lespace.co.jp


Regards

