Scripts for Reading and Writing Canon RAW files

[Update: a related post has been added, where I document two new arguments to the RGB decoding/encoding scripts.]

Some time ago I wrote a blog post demonstrating that it’s possible to change the image content of a camera RAW file, thereby making it clear that the RAW file itself is insufficient evidence of authenticity. I achieved this by writing some Python scripts that convert the original RAW file to a format that can easily be edited, and then replace the image content of the RAW file with the edited image. When I wrote that post I said that I would eventually make the scripts available under an open-source license, with instructions. It has taken longer than I would have liked, but at last they are publicly available.

The scripts themselves may be obtained from GitHub, together with a Python module containing classes for handling PNM and TIFF files (since Canon CR2 files are actually TIFF format with an extension for storing the raw sensor data). As an example of how these scripts can be used, I’ll detail the commands used to create the edited raw file in my earlier blog post. You will need:

  • the scripts from GitHub
  • a working Python installation, including NumPy and Matplotlib
  • the Stanford PVRG JPEG software
  • ImageMagick
  • the original raw file (for license details see my earlier post)
  • Canon Digital Photo Professional, to create correct JPEG thumbnails
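
To make the "CR2 is really TIFF" point concrete, here is a minimal sketch that reads the first 16 bytes of a file and checks for a TIFF header followed by the Canon extension. The layout assumed here (a "CR" marker at byte 8, two version bytes, then the offset of the raw IFD) comes from publicly available descriptions of the CR2 format, not from the scripts in the repository, and the function name `parse_cr2_header` is made up for this example.

```python
import struct


def parse_cr2_header(data: bytes) -> dict:
    """Parse the 16-byte header assumed to start a CR2 file (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of header")
    # Bytes 0-1: TIFF byte-order mark ("II" = little-endian, "MM" = big-endian).
    order = data[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # Bytes 2-3: TIFF magic number (42); bytes 4-7: offset of the first IFD.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    # Assumed CR2 extension: "CR" at bytes 8-9, major/minor version at
    # bytes 10-11, and the offset of the raw-data IFD at bytes 12-15.
    (raw_ifd,) = struct.unpack(endian + "I", data[12:16])
    return {
        "little_endian": endian == "<",
        "first_ifd_offset": first_ifd,
        "is_cr2": data[8:10] == b"CR",
        "version": (data[10], data[11]),
        "raw_ifd_offset": raw_ifd,
    }
```

Passing it the first 16 bytes of a raw file (for example `open("2908.cr2", "rb").read(16)`) should report `is_cr2` as true if the file follows this layout.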

What follows was tested on a 64-bit Ubuntu 14.04 LTS setup, but should work on other operating systems. The scripts should work as-is, but the commands you enter on the command prompt will need to be modified to take into account your file system’s conventions.
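
One way to sidestep differences in path separators and in how scripts are launched is to build each command from Python instead of typing it at the prompt. The sketch below does this for the extraction command used in step 3 further down. The helper names are invented for this example, the flags are copied verbatim from that step, and it assumes the scripts run under the same interpreter that executes the sketch.

```python
import subprocess
import sys
from pathlib import Path


def script_command(script_dir, script, *args):
    """Build an argv list that runs one of the scripts with the current Python."""
    # Path joins with the separator of the local operating system.
    return [sys.executable, str(Path(script_dir) / script), *[str(a) for a in args]]


def extract_command(raw_file, prefix, script_dir=".."):
    """Equivalent of: ../cr2_extract.py -i RAW -o PREFIX -d"""
    return script_command(script_dir, "cr2_extract.py",
                          "-i", raw_file, "-o", prefix, "-d")


def run_with_log(command, log_file):
    """Run a command and save its standard output, like '> 2908.txt' in the shell."""
    with open(log_file, "w") as log:
        subprocess.run(command, stdout=log, check=True)
```

For example, `run_with_log(extract_command("2908.cr2", Path("Components") / "2908"), "2908.txt")` would stand in for the shell command in step 3 when run from the ‘example’ folder.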

  1. First, clone the scripts repository, and start a command prompt in the ‘example’ folder.
  2. Download the original raw file and save this in the ‘example’ folder.
  3. To extract the component images in the raw file:
    ../cr2_extract.py -i 2908.cr2 -o Components/2908 -d > 2908.txt

    This saves the components in the ‘Components’ folder, with ‘2908’ as a filename prefix, as specified. It also prints a human-readable account of what it found in the file.

  4. Since we want to edit the sensor data, we next need to decode the corresponding component file ‘2908-3.dat’, which is in lossless JPEG format, and convert it to a PGM file:
    ../raw_decode.py -r 2908.cr2 -i Components/2908-3.dat -o Sensor/2908.pgm

    Note that we also need to pass the raw file as a parameter so that the script can determine the image parameters (which are not included in the sensor component).

  5. The PGM file can be read by most image editing programs, but it contains only an unprocessed version of the sensor data. That is, each pixel gives the contents of the corresponding sensor cell site. For colour cameras, this is still in the Bayer mosaic format, and it also contains the (usually invisible) sensor border from which the black level of the sensor image is determined (the script also displays this value). We decode the colour image using:
    ../rgb_decode.py -r 2908.cr2 -i Sensor/2908.pgm -o Sensor/2908.ppm -C "Canon EOS 450D"

    In this case, we again need to pass the raw image as a parameter, and we also need to specify the exact camera identifier string, from which the script can determine the colour table to use. (Passing the string separately makes it easy to use an alternative colour table.)

  6. (Note that the above command sequence can be found in the extract.sh shell file.)
  7. At this point, the file ‘2908.ppm’ can be edited with any image editing program. For the following re-assembly process, it is assumed that the edited file is saved as ‘2908r.ppm’.
  8. We start the inverse process by creating a fake sensor image from the edited file:
    ../rgb_encode.py -r 2908.cr2 -i Sensor/2908r.ppm -o Sensor/2908r.pgm -s Components/2908r-2.dat -B 1025 -C "Canon EOS 450D"

    We need to specify the black level to use, which should be the same as what was displayed earlier. The above command saves the fake sensor image as ‘2908r.pgm’ and also creates the small RGB thumbnail image ‘2908r-2.dat’.

  9. The sensor image is then encoded in lossless JPEG format:
    ../raw_encode.py -r 2908.cr2 -i Sensor/2908r.pgm -o Components/2908r-3.dat -C 2 -P 14
  10. The number of components (-C) and the precision (-P) to use depend on the camera.
  11. Ignoring (for now) the two JPEG thumbnails, the edited raw file can be assembled using:
    ../cr2_embed.py -i 2908.cr2 -b Components/2908r -o 2908r.cr2
  12. This produces a valid raw file, but keeps the old thumbnails. To correct this, load the edited raw file in Canon DPP, and export as a JPEG image without changing any settings. This ensures that the JPEG file produced is the same as what the camera itself would have created. Save this as ‘Manual/2908r.jpg’.
  13. The newly converted JPEG can be resized to the exact sizes needed for the thumbnails using ImageMagick:
    convert Manual/2908r.jpg -strip -rotate "90<" -resize "2256x1504" -define jpeg:optimize-coding=false -quality 50 -define jpeg:q-table=canon-q-table.xml jpeg:Components/2908r-0.dat
    convert Manual/2908r.jpg -strip -rotate "90<" -resize "160x120" -gravity center -background black -extent "160x120" -define jpeg:optimize-coding=false -quality 50 -define jpeg:q-table=canon-q-table.xml jpeg:Components/2908r-1.dat

    The above commands use the same quantization tables as the Canon camera, as specified in the XML file.

  14. Finally, we need to repeat the embedding process, this time with the newly minted JPEG thumbnail images:
    ../cr2_embed.py -i 2908.cr2 -b Components/2908r -o 2908r.cr2
  15. (Note that the above command sequence can be found in the embed.sh shell file.)
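
To illustrate what step 5 means by an unprocessed Bayer mosaic with a border used for the black level, here is a small pure-Python sketch. It is not how rgb_decode.py works internally. It assumes an RGGB layout, a masked border in the leftmost columns, and a simple mean as the black-level estimate. All three are assumptions chosen for illustration, and the function names are hypothetical.

```python
from statistics import mean


def split_bayer(mosaic):
    """Split a mosaic (list of rows) into four planes, assuming an RGGB layout."""
    return {
        "R": [row[0::2] for row in mosaic[0::2]],   # even rows, even columns
        "G1": [row[1::2] for row in mosaic[0::2]],  # even rows, odd columns
        "G2": [row[0::2] for row in mosaic[1::2]],  # odd rows, even columns
        "B": [row[1::2] for row in mosaic[1::2]],   # odd rows, odd columns
    }


def black_level(mosaic, border_cols):
    """Estimate the black level as the mean of the leftmost border columns."""
    samples = [value for row in mosaic for value in row[:border_cols]]
    return mean(samples)


def subtract_black(mosaic, level):
    """Subtract the black level from every cell, clamping at zero."""
    return [[max(value - level, 0) for value in row] for row in mosaic]
```

With a real sensor image, the border width and the mosaic layout would have to be taken from the camera's parameters rather than guessed as they are here.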

8 comments

    • 🙂 that was a rather technical post on my side, so don’t feel bad at all. Many people with SLRs don’t even shoot RAW, never mind worrying about whether they can be edited or not. That’s also ok, of course! Though personally I find that shooting RAW and using that kind of workflow is very helpful in getting the most out of the camera. Nowadays there are many software options for processing RAW images, you won’t have to use what comes with the camera.

  1. @Johann Now all you have to do is to get with the German guys who described the term “BadUSB” and get them to code this into the microchip of a USB drive. And then when someone stores their selfies onto their USB drive your code will automatically replace the photos with Rick Astley. Best prank ever.

    • Hi Laurent! I worked mostly off the documentation on your website, and standard documentation of the compressed formats. I think I also needed to experiment a bit to determine a couple details, particularly for my (newer) 6D. Were you looking for something specific?
