[Update: a related post has been added, where I make available the scripts for reading and writing Canon RAW files.]
What if I told you that I could write a camera raw file, after editing the content? Would that change anything? I’m thinking most people’s answer is probably some variant of “really? I thought you couldn’t edit a camera raw file.” Assuming they know what a camera raw file is to begin with.
So let’s start at the beginning: a camera raw file is a format used by many digital cameras (excluding most camera phones and point-and-shoot models) to store a faithful record of the image sensor data with as little processing as possible. This is very useful with photography-oriented (as opposed to snapshot-oriented) cameras like DSLRs, where the user is likely to edit their images after capture. By keeping as much of the captured information as possible, raw images allow more freedom in the editing process, and generally keep a greater level of detail (e.g. a greater dynamic range and wider colour gamut) than processed JPEG images. Because they need to be processed (or converted) before display, raw images are often thought of as digital negatives.
Camera raw files can be read and processed by a number of software products. Adobe Lightroom seems to be a popular one with photographers; I have used Canon’s Digital Photo Professional myself for many years, only recently moving to DxO Optics Pro. However, in all software I’m aware of, even where some editing of the image is allowed, the original sensor data remains untouched. I believe that this limitation, together with the psychological effect of considering the raw image to be a digital negative, causes most people to think that a camera raw file remains a true representation of what was captured. I have seen cases (though unfortunately I didn’t keep a record of the references/links) where camera raw files were called for either as proof of ownership / originality of the image in question, or to make sure that the resultant image was not a composite and had not otherwise had its content tampered with. Examples of where this may be useful include photojournalism and photographic competitions (where the rules of entry may prohibit certain processing steps).
Unfortunately, the existence of a raw image is insufficient evidence. Like any other digital file with a known format that requires no secrets (I’ll get to this later), a raw file can be written by software on a regular computer just as it was written by the camera. After all, the camera is nothing but an embedded computer system, running some (usually proprietary) software. I do not think this statement will come as a surprise to any researcher working in digital imaging or image forensics. Unless some of the components are encrypted or signed with a secret key, there is nothing to stop someone else from fabricating an image in the same format.
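To illustrate what “signed with a secret key” would change, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library. It does not describe what any camera actually does: the key, the stand-in file bytes, and the function names are all hypothetical, and a real scheme would more likely use an asymmetric signature so that verification does not require the secret.

```python
import hashlib
import hmac


def sign_file_bytes(data: bytes, secret_key: bytes) -> bytes:
    """Return an HMAC-SHA256 tag computed over the file contents."""
    return hmac.new(secret_key, data, hashlib.sha256).digest()


def verify_file_bytes(data: bytes, tag: bytes, secret_key: bytes) -> bool:
    """Check a tag in constant time; True only if the data is unchanged."""
    return hmac.compare_digest(sign_file_bytes(data, secret_key), tag)


# Hypothetical key that only the camera would hold.
camera_key = b"hypothetical-secret-held-by-the-camera"

# Stand-in bytes; a real raw file would be read from disk instead.
original = b"II*\x00 stand-in for raw file bytes"
tag = sign_file_bytes(original, camera_key)

# Someone without the key edits the content but cannot recompute the tag.
edited = original.replace(b"stand-in", b"tampered")

print(verify_file_bytes(original, tag, camera_key))
print(verify_file_bytes(edited, tag, camera_key))
```

Without such a tag (or with a key that can be extracted from the camera), a file written on a regular computer is indistinguishable, at this level, from one written by the camera.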
Of course, this means that there is nothing ‘special’ about camera raw files, and no reason why an edited image cannot be written in the camera raw format. So this is what I set out to do for current-generation Canon raw files (CR2). While Canon does not publish the format description, public documentation of CR2 files exists thanks to Laurent Clévy. So over the last couple of weeks or so I set about writing some software, first to read and interpret CR2 files, and next to reassemble them from the read components. Today I decided to write about what this implies, and to publish an example of what can be done. Eventually (with some luck, soon) I plan to clean up the code a bit and make it available under an open-source license, together with instructions.
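To give a flavour of what reading a CR2 file involves, here is a minimal sketch of parsing the 16-byte file header. It is not the software described in this post. It assumes the layout given in the public documentation: a standard TIFF header (byte order mark, magic number 42, offset of the first IFD), followed by the characters “CR”, a major and minor version byte, and the offset of the IFD that holds the raw data. The names `Cr2Header` and `parse_cr2_header` are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class Cr2Header(NamedTuple):
    byte_order: str             # "II" (little-endian) or "MM" (big-endian)
    first_ifd_offset: int       # offset of the first IFD, from the TIFF header
    version: Tuple[int, int]    # (major, minor) from the CR2 extension
    raw_ifd_offset: int         # offset of the IFD holding the raw data


def parse_cr2_header(data: bytes) -> Cr2Header:
    """Parse the first 16 bytes of a CR2 file (assumed layout, see lead-in)."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    order = data[0:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF-style byte order mark")
    # Standard TIFF header: 16-bit magic number, 32-bit first IFD offset.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF-based file")
    # CR2 extension: "CR" marker, version bytes, raw IFD offset.
    if data[8:10] != b"CR":
        raise ValueError("missing CR2 marker")
    major, minor = data[10], data[11]
    (raw_ifd,) = struct.unpack(endian + "I", data[12:16])
    return Cr2Header(order.decode("ascii"), first_ifd, (major, minor), raw_ifd)


# Synthetic header for demonstration; a real one is the first 16 bytes of a file.
demo = b"II" + struct.pack("<HI", 42, 16) + b"CR" + bytes([2, 0]) + struct.pack("<I", 0x1234)
print(parse_cr2_header(demo))
```

Everything after the header is a chain of TIFF-style directories, which is what makes the format practical to take apart and reassemble.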
To start the demonstration, I found a suitable Canon raw image file. I chose one from the BossBase v1.0 data set. This data set, which includes 10,000 raw images from several different cameras, was published with the materials for the Break Our Steganographic System challenge [1]. Unfortunately, the original source does not seem to be available any longer, though there appears to be a copy here. The image I chose was taken with a Canon Digital Rebel XSi, the equivalent of Europe’s 450D. You can see a downscaled version of this below, converted with Canon DPP.
I extracted the sensor data from this raw file, and converted it to the sRGB colour space so I could edit it in Photoshop. There I cloned out the person using the content-aware tools and some old-fashioned cloning. I saved that, and did the reverse process to obtain an edited ‘sensor image’. I embedded that as a CR2 file, and that was that.
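The last step of that reverse process, going from a full-colour image back to a ‘sensor image’ with one sample per pixel, can be sketched as below. This is only an illustration of the re-mosaicing idea, not the actual conversion: it assumes an RGGB Bayer layout (an assumption, not something stated here), and it leaves out undoing the tone curve, colour matrix, white balance, and black level. The function name `mosaic_rggb` is hypothetical.

```python
def mosaic_rggb(rgb):
    """Sample an RGB image down to one value per pixel on an RGGB grid.

    `rgb` is a list of rows, each row a list of (r, g, b) tuples.
    Even rows alternate R, G; odd rows alternate G, B (assumed layout).
    """
    mosaic = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0:
                out_row.append(r if x % 2 == 0 else g)
            else:
                out_row.append(g if x % 2 == 0 else b)
        mosaic.append(out_row)
    return mosaic


# A flat 2x2 test image: every pixel has R=10, G=20, B=30.
flat = [[(10, 20, 30)] * 2 for _ in range(2)]
print(mosaic_rggb(flat))  # [[10, 20], [20, 30]]
```

The forward direction (demosaicing) interpolates the two missing colours at each pixel, which is why an edited image has to be resampled back onto the sensor grid before it can be embedded.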
However, this process only changes one of the four images actually embedded in the raw file. Most people won’t realise this, but alongside the full sensor image, a raw file also contains a small thumbnail (in JPEG format), a small-size JPEG version, and a very small raw image. To complete the editing process, I generated a small version of the raw image from my edited sensor image. I also loaded the edited CR2 file in Canon DPP and converted this to JPEG. I down-scaled that JPEG to obtain the necessary small-size and thumbnail JPEG images. (Note that I used Canon DPP to convert the image just to be sure that the tonal reproduction was the same as in the original raw file.) Once these steps were complete, I recreated the edited CR2 file by replacing all four images with their edited versions. To give you an idea of the editing I did, you can see a downscaled version of the edited image below, again converted with Canon DPP.
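Finding the embedded images comes down to walking the chain of TIFF-style image file directories (IFDs). The sketch below assumes, as the public CR2 documentation suggests, that each embedded image has its own IFD and that the IFDs are linked through the standard TIFF “next IFD” offset. It only lists the directory offsets and does not decode any entries. The function name `list_ifd_offsets` is hypothetical, and the demonstration uses a tiny synthetic file with two empty-ish directories rather than a real CR2.

```python
import struct


def list_ifd_offsets(data: bytes) -> list:
    """Follow the TIFF 'next IFD' links and return each IFD's file offset."""
    endian = "<" if data[0:2] == b"II" else ">"
    (offset,) = struct.unpack(endian + "I", data[4:8])
    offsets = []
    # Stop at the terminating zero offset, or if the chain loops back.
    while offset != 0 and offset not in offsets:
        if offset + 2 > len(data):
            raise ValueError("IFD offset beyond end of file")
        # Each IFD starts with a 16-bit count of 12-byte entries...
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        next_pos = offset + 2 + 12 * count
        if next_pos + 4 > len(data):
            raise ValueError("truncated IFD")
        offsets.append(offset)
        # ...and ends with the 32-bit offset of the next IFD (0 = last).
        (offset,) = struct.unpack(endian + "I", data[next_pos:next_pos + 4])
    return offsets


# Synthetic little-endian file: header, one IFD with a single (zeroed)
# entry at offset 8, and a second IFD with no entries at offset 26.
synthetic = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 1) + bytes(12) + struct.pack("<I", 26)
    + struct.pack("<H", 0) + struct.pack("<I", 0)
)
print(list_ifd_offsets(synthetic))  # [8, 26]
```

On a real CR2 file, under the assumptions above, this would be expected to return four offsets, one per embedded image, all of which need replacing for a consistent edit.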
I checked that the CR2 file could be loaded by the software I had at hand (Canon DPP, DxO Optics Pro, and Photoshop). If you want to play around with them, you can download the files from the following links:
- Original raw file (falls under the original BossBase license)
- Edited raw file (please see below for license details)
The BossBase data set came with a license that allowed re-use of the images for scientific purposes (exact text below). I am happy for the edited raw file to fall under similar license conditions.
Please note that there are details in the writing process that allow me to distinguish between an original and an edited raw file. However, it would not take a huge effort to remove the obvious distinguishing features. In any case, I am confident that the edited raw file will pass scrutiny by a non-expert. Given the ease with which this was done, I think it’s about time people stopped assuming that a raw file is sufficient evidence of authenticity or originality. It would also be a good thing if researchers started looking at the forensics of camera raw files with the same level of attention they have given to JPEG-encoded images.
Do you have a question, or would you like to weigh in? Let me know.
[1] Patrick Bas, Tomáš Filler, and Tomáš Pevný. 2011. “Break our steganographic system”: the ins and outs of organizing BOSS. In Proceedings of the 13th international conference on Information hiding (IH’11), Springer-Verlag, Berlin, Heidelberg, pp. 59-70.
BossBase Images License:
- This work can only be used for scientific purposes.
- It has to be cited as the “BOSSbase” or the “database from the BOSS contest”.
- You may not use this work for commercial purposes.
- For any reuse or distribution, you must make clear to others the license terms of this work.
- Any of the above conditions can be waived if you get permission from the copyright holder.