Friday, May 23, 2008

Native and fast lossless compression within ERDAS IMAGINE

While ERDAS IMAGINE supports lossless compression through several different means (MrSID, JPEG2000, Packbits, etc.), this discussion focuses on IMAGINE's native run length encoding (RLE) compression. Typical RLE algorithms compress runs of adjacent duplicate pixel values and are a common form of lossless compression. ERDAS IMAGINE's implementation uses dynamic range run length compression (DR-RLE) rather than typical RLE. DR-RLE not only compresses adjacent duplicate pixel values but also compresses the pixel values left unused within the data's defined data range. This has been the case for both athematic (continuous) and thematic data since the earliest days of .img.
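
The two ideas combined in DR-RLE can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IMAGINE's actual on-disk format: we assume a block of pixels stores its own minimum value and the smallest bit width that spans its range, followed by runs of (offset from minimum, repeat count). The function names are hypothetical.

```python
def bits_for_range(lo: int, hi: int) -> int:
    """Smallest bit width that can hold every offset 0..(hi - lo)."""
    return max(1, (hi - lo).bit_length())


def dr_rle_encode(pixels: list[int]) -> tuple[int, int, list[tuple[int, int]]]:
    """Hypothetical DR-RLE sketch for one block of integer pixels.

    Returns (block minimum, bit width per stored offset, runs), where each run
    is (offset from the minimum, number of repeats). Only the offsets need
    `bit width` bits, so values outside the block's range cost nothing.
    """
    lo, hi = min(pixels), max(pixels)
    runs: list[list[int]] = []
    for value in pixels:
        offset = value - lo  # dynamic-range step: store distance from the minimum
        if runs and runs[-1][0] == offset:
            runs[-1][1] += 1  # run-length step: extend the current run
        else:
            runs.append([offset, 1])
    return lo, bits_for_range(lo, hi), [(o, n) for o, n in runs]


def dr_rle_decode(lo: int, runs: list[tuple[int, int]]) -> list[int]:
    """Invert dr_rle_encode: expand each run and add the minimum back."""
    out: list[int] = []
    for offset, count in runs:
        out.extend([lo + offset] * count)
    return out


# A made-up row of 16-bit pixels whose values only span 1000..1010.
row = [1000, 1000, 1000, 1003, 1003, 1010]
lo, bits, runs = dr_rle_encode(row)
print(lo, bits, runs)  # 1000 4 [(0, 3), (3, 2), (10, 1)]
assert dr_rle_decode(lo, runs) == row
```

Here six 16-bit pixels (96 bits raw) become three runs whose offsets need only 4 bits each. How the counts and block header are actually laid out inside an .img file is not something this sketch claims to reproduce.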

What do we mean when we say it compresses the dynamic range of the data? An example explains it best. Many modern sensors collect 11- or 12-bit data, yet we must store these data as 16-bit. In an 11-bit dataset, every possible pixel value from 2048 (the unsigned 11-bit maximum, 2047, plus 1) through 65,535 (the unsigned 16-bit maximum) goes unused, yet uncompressed storage still reserves disk space for that range. When we store these 11-bit data in the .img format using DR-RLE compression, the unused range is compressed away and the data shrink to a fraction of their uncompressed 16-bit size.

Elevation data can also benefit from .img lossless compression, because few DEMs need values anywhere near the 16-bit maximum of 65,535. Thematic data can benefit too: they often have 100 or fewer categories (classes), yet must be stored as 8-bit data. When encoded using DR-RLE, the unused values are compressed.
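
The arithmetic behind these examples is simple enough to check by hand. The Python sketch below counts how many bits a value range really needs and what a single 512 x 512 band would occupy if each pixel used exactly that many bits. It assumes tight bit packing and ignores run-length savings; the DEM range (0 to 3000 m) is a hypothetical example, and the function names are ours.

```python
def bits_needed(max_value: int, min_value: int = 0) -> int:
    """Bits required to store any value in [min_value, max_value] as an offset."""
    return max(1, (max_value - min_value).bit_length())


def packed_bytes(rows: int, cols: int, bits: int) -> int:
    """Bytes for one band if every pixel used exactly `bits` bits (no run savings)."""
    return (rows * cols * bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    rows = cols = 512
    examples = {
        "11-bit imagery (0..2047)": (2047, 0),
        "hypothetical DEM (0..3000 m)": (3000, 0),
        "thematic, 100 classes (0..99)": (99, 0),
    }
    print(f"16-bit storage: {packed_bytes(rows, cols, 16):,} bytes")
    for label, (hi, lo) in examples.items():
        bits = bits_needed(hi, lo)
        print(f"{label}: {bits} bits -> {packed_bytes(rows, cols, bits):,} bytes")
```

By range alone, 11-bit data needs 11/16 (about 69%) of the 16-bit footprint, and 100 classes need 7 of 8 bits. Runs of repeated values, and narrower ranges within individual blocks if ranges are tracked per block (as we assume), are what would push a real DR-RLE file further down; IMAGINE may also round bit widths differently than this sketch does.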

One gotcha, discovered by Donn Rodekohr at Auburn University: floating-point data stored with .img DR-RLE should not be processed with math-intensive functions in the Spatial Modeler. Using such floating-point data in complicated models has been shown to degrade processing speed somewhat. We will address this issue in future versions of IMAGINE. For the time being, limit your DR-RLE data storage to integer data.

Below are a few size comparisons. From my previous GeoTIFF post (http://field-guide.blogspot.com/2008/05/what-is-wrong-with-my-geotiff.html) you will recall that Packbits is TIFF's RLE compression. Packbits compression is not a DR-RLE compression.

DR-RLE comparison using a 512 x 512 single band .img file:
271KB 8-bit uncompressed (with pyramid layers)
271KB 8-bit RLE compressed (with pyramid layers)
527KB rescaled to 16-bit uncompressed (with pyramid layers)
272KB rescaled to 16-bit RLE compressed (with pyramid layers)

Packbits comparison using a 512 x 512 single band .tif file:
258KB 8-bit uncompressed (without pyramid layers)
258KB 8-bit Packbits compressed (without pyramid layers)
514KB rescaled to 16-bit uncompressed (without pyramid layers)
514KB rescaled to 16-bit Packbits compressed (without pyramid layers)
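
Why does Packbits gain nothing on the 16-bit file? The small Python encoder below follows the PackBits description in the TIFF 6.0 specification (a header byte n of 0 to 127 means copy n+1 literal bytes; -127 to -1 means repeat the next byte 1-n times; -128 is a no-op). The test row is a hypothetical stand-in for image data, and we assume the "rescaled" 16-bit values still fall within 0 to 255 and are written little-endian.

```python
def packbits_encode(data: bytes) -> bytes:
    """PackBits as described in TIFF 6.0: literal and replicate runs of at most 128 bytes."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            # Replicate run: header -(run - 1) as a signed byte, then the byte once.
            out.append(257 - run)
            out.append(data[i])
            i += run
        else:
            # Literal run: copy bytes until a repeat starts or 128 bytes are taken.
            start = i
            i += 1
            while i < n and i - start < 128:
                if i + 1 < n and data[i] == data[i + 1]:
                    break
                i += 1
            out.append(i - start - 1)  # header n means n + 1 literal bytes follow
            out.extend(data[start:i])
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Inverse of packbits_encode; header 128 (-128) is a no-op per the spec."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:    # 0..127: copy header + 1 literal bytes
            out.extend(data[i:i + header + 1])
            i += header + 1
        elif header > 128:  # -127..-1: repeat the next byte 257 - header times
            out.extend(data[i:i + 1] * (257 - header))
            i += 1
    return bytes(out)


# Hypothetical row: 256 distinct 8-bit values with no repeats.
raw8 = bytes(range(256))
# Same values converted to 16-bit little-endian; every high byte is 0.
raw16 = b"".join(v.to_bytes(2, "little") for v in range(256))

print(len(raw8), len(packbits_encode(raw8)))    # 256 258
print(len(raw16), len(packbits_encode(raw16)))  # 512 516
```

PackBits only sees bytes, and the always-zero high bytes alternate with the low bytes, so they never form runs; the 16-bit row comes out slightly larger than it went in. A range-aware scheme working on whole pixel values would instead notice that every value fits in 8 bits.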

But what about speed? When ERDAS implemented DR-RLE back in 1993, we focused on making it efficient. Because of that effort, and because DR-RLE is so simple to compress and decompress, we see some speed gains with DR-RLE compressed .img data over uncompressed .img data. This holds for today's processors as well. An efficient access mechanism and a small data footprint, stored in an ultra-simple compression and decompression format, speed up processing whether the data are going to disk or to the screen.

When an .img file exceeds 2.1GB, IMAGINE rolls the data over into an .ige file, and the .img file stores the metadata, spatial indices and so forth. DR-RLE compression is not supported within .ige files. The .ige was originally designed for simple, fast access to very large uncompressed data files, and its design does not allow for any file compression whatsoever. (See: http://field-guide.blogspot.com/2010/04/when-does-img-image-roll-over-to-ige.html)
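
As a rough pre-check, you can estimate whether an uncompressed raster will cross the limit before you write it. This is a back-of-the-envelope sketch, not IMAGINE's actual rule: it assumes the threshold is 2^31 bytes (our reading of the "2.1GB" figure), ignores header and metadata overhead, and, when asked, assumes pyramid layers built by 2x2 reduction, which add roughly one third to the base size. The function names are hypothetical.

```python
ASSUMED_LIMIT_BYTES = 2**31  # ~2.147e9 bytes, our reading of the "2.1GB" figure


def raw_raster_bytes(rows: int, cols: int, bands: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel payload, ignoring headers and metadata."""
    return rows * cols * bands * bits_per_pixel // 8


def may_roll_over(rows: int, cols: int, bands: int, bits_per_pixel: int,
                  with_pyramids: bool = False) -> bool:
    """Rough check of whether an uncompressed .img would pass the assumed limit."""
    size = raw_raster_bytes(rows, cols, bands, bits_per_pixel)
    if with_pyramids:
        # 2x2 reduction levels add 1/4 + 1/16 + ..., which approaches 1/3 of the base.
        size = size * 4 // 3
    return size > ASSUMED_LIMIT_BYTES


print(may_roll_over(30_000, 30_000, 3, 8))        # True  (2.7e9 bytes)
print(may_roll_over(20_000, 20_000, 3, 8, True))  # False (about 1.6e9 bytes)
```

If the estimate lands well above the limit, plan for uncompressed .ige storage or one of the compressed formats discussed next.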

For files >2.1GB that can tolerate a little value loss, we suggest our own ECW format or LizardTech's MrSID format. We have added ECW read capability to IMAGINE Essentials 9.2 and encoding capability to IMAGINE Professional 9.2. While IMAGINE supports MrSID Generation 2 and Generation 3 (MG2 and MG3, respectively), we must use MG3 for MrSID files >2.1GB and for lossless compression. ECW does not support lossless compression.

Are there any side effects (good or bad) when pulling these .img DR-RLE data into ArcGIS? Not that I have seen in 15 years. In fact, ESRI and ERDAS both use the same DR-RLE algorithm: ESRI uses it in GRID, ERDAS in .img. As well, ERDAS provides (writes) many of the raster data objects (RDO) in ESRI products, and .img support is one of them (RLE being part of .img support).

Any positive side effects in ArcGIS? Where ESRI uses the ERDAS/ESRI RDO within its product lines, users will see the same performance improvements in ArcGIS. In other words, apart from the floating-point issue, rock-n-roll in ArcGIS.

How do you take advantage of the .img dynamic range run length encoding in ERDAS IMAGINE?
1. Set “RLE” as your “Default Compression” in the “IMAGINE Image Files (Native)” preference category.
2. Set “Default” as your “Data Compression” in the “Spatial Modeler” preference category.
These settings cover Import, Save As, Subset (which is currently a modeler function), Mosaic Tool, MosaicPro, Spatial Modeler and all of IMAGINE for files less than 2.1GB. We are planning to turn RLE on by default in the 9.3 preferences if I hear demand growing from the user community. Comments?

IMAGINE now supports encoding and decoding DR-RLE, JPEG, LZW, TIFF Packbits, MrSID, JPEG2000, ECW, et al. What other compressions do you folks wish to see within the ERDAS IMAGINE suite?

Tuesday, May 13, 2008

What is wrong with my GeoTIFF?

Maybe it is not really a GeoTIFF?

Many people have tried over the last decade to standardize on GeoTIFF as a neutral data delivery format. This was the intent and design of the format. So, why are many still having problems with GeoTIFF?

GeoTIFF is built upon Adobe's TIFF Revision 6.0 specification (see: http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf). While this data format is well documented, some software does not follow it when creating and modifying data. This creates a challenge because some very prominent software companies are among the offenders. We are not immune to this non-standard software tweaking in the remote sensing, photogrammetry and GIS (geospatial) communities. Typically, today's geospatial data providers have painstakingly located software that conforms strictly to the GeoTIFF specification (see: http://www.remotesensing.org/geotiff/geotiff.html). So, when we take receipt of data from today's data vendors, we can be confident we have a proper GeoTIFF.

Again, some very prominent software does not conform to the TIFF and GeoTIFF specifications. When GeoTIFF data are modified after delivery to GIS, remote sensing and photogrammetry practitioners, the practitioner often uses non-conforming software to modify them. This is where problems occur. When software creates a TIFF with non-standard GeoTIFF tags, it ceases to be a GeoTIFF and is simply a TIFF with associated geographic information. Such files are seldom readable by packages other than the one that created them, which decreases the portability of the image data. When you must modify GeoTIFF data, be sure you can create a true GeoTIFF.

While ERDAS IMAGINE is flexible and forgiving when reading many non-standard TIFF files with associated geographic information, it follows the GeoTIFF specification very carefully when creating GeoTIFFs. Even so, there are several things to be aware of when creating the most portable GeoTIFF data possible. If we follow a few simple rules, we can provide data with a high level of portability, even to users beyond the geospatial community.

First, determine the lowest level of support needed. I typically assume non-geospatial people will try to view the data in Microsoft Windows Picture and Fax Viewer or insert the image into Microsoft Word. Hey, I know that is a bad idea, but a whole lot of people do it by accident, or because that is the best they have. Now, let's not call them idiots, because my mom has done this by accidentally double-clicking on a TIFF. (Sorry, mom.)

So, which TIFF parameters create an image that the ‘lowest common denominator’ software will support?


1. GeoTIFF tags do not decrease portability. Software that does not use the GeoTIFF tags will simply ignore them. Because GeoTIFF follows the TIFF 6.0 specification, true GeoTIFFs will not cause a problem.

2. Unless you intend to open a lot of TIFF images at the same time, do not use tiling. If you need to open a lot of images at the same time, use tiling. While tiling is a standard TIFF option and improves display speed in many software packages (especially for large images and lots of images), it decreases the portability of data. This is not a problem within ERDAS IMAGINE, which handles a few dozen striped or tiled images rapidly. But again, many lower-end software packages cannot handle TIFF data that use the tiling option, so use striped TIFF (BIL) when in doubt.

3. Use the Red, Green, Blue (RGB, denoting 3 colors in the TIFF) option rather than the Multispectral option. Multispectral is a standard TIFF option that allows more than three bands in a single TIFF. Many software products cannot support more than 3 colors per TIFF and thus will not display the data.

In the modern geospatial industry, 4+ band data delivery is becoming more prevalent. Many providers are required to deliver each frame as both a red, green, blue image and a NIR, red, green image. This means the deliverable duplicates two bands, green and red. Sadly, until the key software vendors decide to support Multispectral TIFF, this will continue to be a problem for TIFF and thus GeoTIFF.

4. You typically want to keep compression to None or Packbits. Packbits is the TIFF implementation of run-length compression (RLE). I have never run across a product that does not support Packbits. In the past I have run across software that did not support JPEG and LZW inside a TIFF wrapper, but such software is becoming less and less common.

My lowest-common-denominator test software for TIFF data is Microsoft Windows Picture and Fax Viewer. It supports: striping, RGB, no compression; striping, RGB, Packbits compression; and striping, RGB, LZW compression. If this viewer can read a TIFF, everything can. (Microsoft is kind of cheap.)

5. While BigTIFF (aka TIFF-64) is the standard for TIFF data >4GB in file size, it is not widely adopted yet. Until it is, consider delivering a compressed image instead. ECW and MrSID are excellent solutions within the geospatial community. In the future, JPEG 2000 may become a good format for portability in and out of the geospatial community.
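
Rules 2 through 4 above can be turned into a quick checklist. The Python sketch below is illustrative only: it takes a dictionary of TIFF tag IDs and integer values (filled in by hand or from whatever tag dump you have), uses the tag numbers from the TIFF 6.0 specification (Compression 259, PhotometricInterpretation 262, SamplesPerPixel 277, TileWidth 322) and the standard Compression codes (1 none, 5 LZW, 32773 PackBits), and the function name is hypothetical. Rule 1 needs no check, and rule 5 (file size) is not covered.

```python
# TIFF 6.0 tag IDs and values referenced by the rules above.
COMPRESSION = 259
PHOTOMETRIC = 262
SAMPLES_PER_PIXEL = 277
TILE_WIDTH = 322

COMPRESSION_NONE = 1
COMPRESSION_LZW = 5
COMPRESSION_PACKBITS = 32773
PHOTOMETRIC_RGB = 2


def portability_warnings(tags: dict[int, int]) -> list[str]:
    """Flag settings that the 'lowest common denominator' rules advise against.

    `tags` maps TIFF tag IDs to their (single) integer values. Missing tags
    fall back to the TIFF defaults (SamplesPerPixel 1, Compression 1).
    """
    warnings: list[str] = []
    if TILE_WIDTH in tags:
        warnings.append("tiled layout: use strips when in doubt")
    samples = tags.get(SAMPLES_PER_PIXEL, 1)
    if samples > 3:
        warnings.append(f"{samples} bands: many viewers only handle 3 (RGB)")
    elif samples == 3 and tags.get(PHOTOMETRIC) != PHOTOMETRIC_RGB:
        warnings.append("3 bands not marked as RGB photometric interpretation")
    compression = tags.get(COMPRESSION, COMPRESSION_NONE)
    if compression == COMPRESSION_LZW:
        warnings.append("LZW: widely read today, but some older readers lack it")
    elif compression not in (COMPRESSION_NONE, COMPRESSION_PACKBITS):
        warnings.append(f"compression code {compression}: prefer None or Packbits")
    return warnings


print(portability_warnings({COMPRESSION: COMPRESSION_PACKBITS,
                            PHOTOMETRIC: PHOTOMETRIC_RGB,
                            SAMPLES_PER_PIXEL: 3}))  # []
# Tiled, 4-band, JPEG-compressed (code 7): all three rules are flagged.
print(portability_warnings({COMPRESSION: 7, SAMPLES_PER_PIXEL: 4,
                            TILE_WIDTH: 256}))
```

LZW gets a softer warning rather than a failure, since the test viewer described above reads it.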

So, how do I know I have a true GeoTIFF? A true GeoTIFF contains all horizontal map coordinate and projection information within its TIFF tags; it even supports vertical projection information. As a test, move any ancillary (non-TIFF) files to another folder and display the TIFF in ERDAS IMAGINE 9.1 SP4 or a later version (9.2 SP1 is the current version). Use the TIFF and GeoTIFF tag viewer within the Image Info capability to examine the projection information. If you see your projection defined correctly, the GeoTIFF is a true GeoTIFF.
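
For a quick scripted screen across many files, the sketch below reads the first IFD of a classic TIFF using Python's struct module and looks for the GeoKeyDirectoryTag (34735) plus at least one raster-to-model tag (ModelPixelScaleTag 33550, ModelTiepointTag 33922 or ModelTransformationTag 34264), with tag IDs taken from the GeoTIFF specification. It only confirms that the tags exist: it does not decode the geokeys, does not handle BigTIFF, and is no substitute for checking the projection in IMAGINE's tag viewer. The function names and the demo file builder are hypothetical.

```python
import struct

# Tag IDs from the GeoTIFF specification.
GEO_KEY_DIRECTORY = 34735
GEOREF_TAGS = {33550, 33922, 34264}  # ModelPixelScale, ModelTiepoint, ModelTransformation


def first_ifd_tag_ids(data: bytes) -> set[int]:
    """Tag IDs in the first IFD of a classic TIFF (not BigTIFF)."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tag_ids = set()
    for k in range(count):
        entry = ifd_offset + 2 + 12 * k  # each IFD entry is 12 bytes, tag ID first
        (tag_id,) = struct.unpack(endian + "H", data[entry:entry + 2])
        tag_ids.add(tag_id)
    return tag_ids


def has_geotiff_tags(data: bytes) -> bool:
    """True if the GeoKey directory and at least one raster-to-model tag exist."""
    tag_ids = first_ifd_tag_ids(data)
    return GEO_KEY_DIRECTORY in tag_ids and bool(tag_ids & GEOREF_TAGS)


def build_demo_tiff(tag_ids: list[int]) -> bytes:
    """Hypothetical helper: a header plus one IFD of dummy SHORT entries (no pixels)."""
    entries = b"".join(struct.pack("<HHII", t, 3, 1, 0) for t in sorted(tag_ids))
    ifd = struct.pack("<H", len(tag_ids)) + entries + struct.pack("<I", 0)
    return b"II" + struct.pack("<HI", 42, 8) + ifd


with_geo = build_demo_tiff([256, 257, 33550, 33922, 34735])
plain = build_demo_tiff([256, 257])
print(has_geotiff_tags(with_geo), has_geotiff_tags(plain))  # True False
```

To screen a real file, pass its bytes, for example `has_geotiff_tags(open("image.tif", "rb").read())` with a file name of your own.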