Friday, March 1, 2013

Importing 16-bit TIFFs into NumPy

A colleague approached me today asking whether I've tried importing TIFFs into Python for image processing with SciPy. My image processing experience has been mainly focussed within a MATLAB environment, so we set about going through this together. Unfortunately, none of the images we tried to process with imread and imshow would work properly and would only display a white result. Inspection of the imread output showed that it returned a matrix of 255s. This was despite the source TIFF containing a greyscale image of something more interesting. Further investigation was thus warranted.

I firstly tried looking at the properties of the greyscale TIFF image and saw it was saved in 16 bits. I initially overlooked this thinking that this was the default. To cut a long story short, I realise now that this was the key fact as downsampling the file to 8 bits imported the TIFF file without any problem. So we were faced with a dilemma. Do we: 1. convert an entire library of TIFF files from 16 bits to 8 bits, 2. revert back to using MATLAB, or 3. try and get imread to open 16 bit TIFFs. Option 1 was immediately rejected as the library was enormous leaving options 2 and 3. We decided to continue with Python and investigate a solution.

A quick search on the Internet revealed others had experienced this problem, but few offered workable solutions. One such place was from another blogger, Philipp Klaus, who listed several methods to overcome this problem. Installing some of the packages required to accomplish this task proved too challenging on the Mac with its limited Python library, so I stopped. Fortunately, my colleague picked up where I left off and found an interesting comment on this site, posted by Mathieu Leocmach. This included a snippet of code that my colleague had tried on the data but did not work as expected. Here's the code in full:


import numpy as np
import Image

def readTIFF16(path):
    """Read 16bits TIFF"""
    im = Image.open(path)
    out = np.fromstring(
        im.tostring(), 
        np.uint8
        ).reshape(tuple(list(im.size)+[2]))
    return (np.array(out[:,:,0], np.uint16)<<8)+out[:,:,1]

Overviewing the code, I noticed the variable out consists of three dimensions. The third dimension translates to which one of the two bytes to use. According to the code, the most significant byte is in the index [:, :, 0], whereas the least significant is at [:, :, 1]. When we tried it out, the result of this sum produced mostly noise but certain elements looked to have some order. Further analysis revealed the more interesting aspects of the image were contained with [:, :, 0] instead, and the noise was contained within the other. However, there was also some order to index [:, :, 1], despite being mostly noise, so we couldn't simply reject this byte. This indicated a possible byte sequence shift which may be the result of Endian-ness. Changing the code to the following allowed us to see the image:


import numpy as np
import Image

def readTIFF16(path):
    """Read 16bits TIFF"""
    im = Image.open(path)
    out = np.fromstring(
        im.tostring(), 
        np.uint8
        ).reshape(tuple(list(im.size)+[2]))
    return (np.array(out[:,:,1], np.uint16)<<8)+out[:,:,0]

But if we're writing code like this, there's no reason why we're splitting the original image into two byte sequences. We could accomplish the same task by immediately using np.uint16 instead of np.uint8 and removing the need for a third dimension in the output. I settled with the following bit of code that worked well for us. Note that this code could, of course, be tidied up further, but I won't do that here.


import numpy as np
import Image

def readTIFF16(path):
    """Read 16bits TIFF"""
    im = Image.open(path)
    out = np.fromstring(
        im.tostring(), 
        np.uint16
        ).reshape(tuple(list(im.size)))
    return out

Now we're able to use 16 bit greyscale images within Python without having to resort to converting entire image libraries or using other packages or software. There's the possibility that the original code will work for some people and our version looks like white noise. I believe this is possibly due to Endian differences between the data and the computer. Hopefully future versions of Python will address this issue and making the necessary checks to TIFF source files so that we don't have to find workarounds. Not that we don't enjoy finding and creating these solutions!

Read more on this article...