Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve been adding topological maps of South Africa to the saffer-data-map
repository. The maps are originally in MrSID format (.sid
files), which is a proprietary file format developed by LizardTech (a company which has been consumed by Extensis).
Although MrSID is efficient from a storage perspective, it’s challenging to work with. Especially on a Linux machine (the support has always been focused on Windows).
I wanted to translate the .sid
files into GeoTIFF. And I wanted to do this on Linux.
Docker to the Rescue
I spent some time researching the problem and learned that there was a MrSID plugin for GDAL. But installation seemed to be tricky. Fortunately I stumbled across a GDAL Docker image with integrated support for MrSID developed by Klokan Technologies.
< !-- The repository for the Docker image can be found at https://github.com/klokantech/gdal-docker. -->Pull the image.
docker pull klokantech/gdal
Check that MrSID plugin is available.
docker run klokantech/gdal gdalinfo --formats | grep MrSID
The expected output looks like this:
MrSID -raster- (rov): Multi-resolution Seamless Image Database (MrSID) MG4Lidar -raster- (ro): MrSID Generation 4 / Lidar (.sid)
Yes! MrSID is supported. 🙂
Using GDAL to Extract Information from Raster Data Files
The gdalinfo
utility can be used to extract general informtion from spatial raster data files. Let’s see how to do that with the Docker image.
docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo WGS2929BC.sid Driver: MrSID/Multi-resolution Seamless Image Database (MrSID) Files: WGS2929BC.sid WGS2929BC.sid.aux.xml Size is 6922, 6921 Coordinate System is `' Origin = (26.749632892002410,-26.750512179722136) Pixel Size = (0.000036126395571,-0.000036126395571) Metadata: IMAGE__COMPRESSION_BLOCK_SIZE=512 IMAGE__COMPRESSION_GAMMA=2.000000 IMAGE__COMPRESSION_NLEV=8 IMAGE__COMPRESSION_VERSION=1,6,1 IMAGE__COMPRESSION_WEIGHT=4.000000 IMAGE__CREATION_DATE=Thu Jul 24 08:20:08 2003 IMAGE__INPUT_FILE_SIZE=12033322.000000 IMAGE__INPUT_NAME=D:\Free State\5\WGS2929BC.TIF IMAGE__TARGET_COMPRESSION_RATIO=29.999962 IMAGE__Z_ORIGIN=0.000000 IMAGE__Z_RESOLUTION=0.000000 VERSION=MG2 Image Structure Metadata: INTERLEAVE=PIXEL Corner Coordinates: Upper Left ( 26.7496329, -26.7505122) Lower Left ( 26.7496329, -27.0005430) Upper Right ( 26.9996998, -26.7505122) Lower Right ( 26.9996998, -27.0005430) Center ( 26.8746663, -26.8755276) Band 1 Block=1024x128 Type=Byte, ColorInterp=Red Min=145.000 Max=255.000 Minimum=145.000, Maximum=255.000, Mean=235.534, StdDev=13.500 Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=255 STATISTICS_MEAN=235.53421487603 STATISTICS_MINIMUM=145 STATISTICS_STDDEV=13.499711766257 STATISTICS_VALID_PERCENT=100 Band 2 Block=1024x128 Type=Byte, ColorInterp=Green Min=149.000 Max=255.000 Minimum=149.000, Maximum=255.000, Mean=238.647, StdDev=12.531 Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=255 STATISTICS_MEAN=238.64727272727 STATISTICS_MINIMUM=149 STATISTICS_STDDEV=12.531463080668 STATISTICS_VALID_PERCENT=100 Band 3 Block=1024x128 Type=Byte, ColorInterp=Blue Min=152.000 Max=255.000 Minimum=152.000, Maximum=255.000, Mean=237.746, StdDev=12.490 Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=255 STATISTICS_MEAN=237.74611570248 STATISTICS_MINIMUM=152 STATISTICS_STDDEV=12.490247658278 STATISTICS_VALID_PERCENT=100
Cool. Very informative. However, that lengthy command with all of the Docker options is a little clunky. To make it more efficient I’ll create a BASH alias.
alias gdalinfo="docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo"
Now I just need to do this:
gdalinfo WGS2929BC.sid
Something to note from the gdalinfo
output above is that the .sid
file does not specify a coordinate system. It appears that these files are simply projected onto a longitude/latitude. We’ll want to be specific about that when we translate them.
Translating MrSID with GDAL
To transtale .sid
files to GeoTIFF I’ll use the gdal_translate
utility. An alias will also make this easier.
alias gdal_translate="docker run -ti -v $(pwd):/data klokantech/gdal gdal_translate"
Now translation is as simple as this:
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif
The -of GTiff
flag just makes it explicit that the output format is GeoTIFF.
We can take this up a notch and specify the correct coordinate system, EPSG:4326 (which is equivalent to WGS84).
gdal_translate -of GTiff WGS2929BC.sid -a_srs EPSG:4326 WGS2929BC.tif
The size of the resulting .tif
file is pretty big (around 140 Mb), whereas the .sid
file is relatively miniscule (only around 4 Mb). It’d be great to make the .tif
a little smaller.
Let’s explore a few compression options.
Lossless Compression
First let’s try out some lossless compression options. We use the -co
(creation options) flag to specify the compression algorithm.
# Compression: LZW gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzw # Compression: DEFLATE gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=deflate # Compression: PACKBITS gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=packbits # Compression: LZMA gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzma
Here are the results:
- LZW — 86 Mb
- DEFLATE — 78 Mb
- PACKBITS — 118 Mb and
- LZMA — 71 Mb (compression is seriously slow!).
Looks like DEFLATE might be the best option for lossless compression (factoring in compression ratio and speed).
Lossy Compression
What about some lossy compression options?
# Compression: JPEG gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=jpeg # Compression: LERC gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lerc
Here are the results:
- JPEG — 22 Mb and
- LERC — 78 Mb.
The compression ratio with JPEG is by far the winner. We’ll stick with that and see what the image quality is like.
Colourspace
We also have some freedom in choosing the colourspace.
# Colourspace: YCbCr (with JPEG compression) gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co photometric=ycbcr -co compress=jpeg
Here are the results:
- YCbCr (with JPEG compression) — 8 Mb.
Wow! Okay, that seriously shrinks the size of the file!
Tiling
Tiling can also be used to generate a more efficient .tif
file.
# Tiling gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co tiled=yes
Tiling does produce a slightly larger file, but this file is, in principle, much more efficient to query.
Using the Results in R
My ultimate goal is to have data which I can easily import into R. The translated GeoTIFF files are perfect for this.
Setup location of the GeoTIFF file.
FILEPATH <- "WGS2929BC.tif"
Using {rgdal}
Now load up the {rgdal}
package.
library(rgdal)
Use GDALinfo()
to extract metadata from the .tif
file (compare this with the metadata extracted from the corresponding .sid
file above).
GDALinfo(FILEPATH) rows 6903 columns 6904 bands 3 lower left origin.x 29.49971 lower left origin.y -29.50041 res.x 3.622124e-05 res.y 3.622124e-05 ysign -1 oblique.x 0 oblique.y 0 driver GTiff projection +proj=longlat +datum=WGS84 +no_defs file WGS2929BC.tif apparent band summary: GDType hasNoDataValue NoDataValue blockSize1 blockSize2 1 Byte FALSE 0 256 256 2 Byte FALSE 0 256 256 3 Byte FALSE 0 256 256 apparent band statistics: Bmin Bmax Bmean Bsd 1 0 255 NA NA 2 0 255 NA NA 3 0 255 NA NA Metadata: AREA_OR_POINT=Area IMAGE__COMPRESSION_BLOCK_SIZE=512 IMAGE__COMPRESSION_GAMMA=2.000000 IMAGE__COMPRESSION_NLEV=8 IMAGE__COMPRESSION_VERSION=1,6,1 IMAGE__COMPRESSION_WEIGHT=4.000000 IMAGE__CREATION_DATE=Thu Jul 24 15:01:46 2003 IMAGE__INPUT_FILE_SIZE=17737854.000000 IMAGE__INPUT_NAME=D:\KZN\3\WGS2929BC.TIF IMAGE__TARGET_COMPRESSION_RATIO=29.999962 IMAGE__Z_ORIGIN=0.000000 IMAGE__Z_RESOLUTION=0.000000 VERSION=MG2
Note that
- the projection is WGS84 as specified when we translated from
.sid
format and - there are three colour bands.
Using {raster}
We can also use the {raster}
package to read the file.
library(raster)
The raster()
function will read in just a single colour band.
grey <- raster(FILEPATH)
What’s the Coordinate Reference System (CRS)?
crs(grey) CRS arguments: +proj=longlat +datum=WGS84 +no_defs
Check whether the coordinate system is longitude/latitude.
isLonLat(grey) [1] TRUE
Confirmed.
What is the extent of the data?
grey@extent class : Extent xmin : 29.49971 xmax : 29.74978 ymin : -29.50041 ymax : -29.25037
This is the range of the data in longitude and latitude.
How many layers are there?
nlayers(grey) [1] 1
Well, that makes sense since we only loaded a single colour band. Let’s now use the brick()
function to load all of the colour bands.
rgb <- brick(FILEPATH)
How many layers?
nlayers(rgb) [1] 3
Excellent.
Take a look at the results.
plotRGB(rgb)
Looks good. But let’s zoom in to see the details.
plotRGB( rgb, interpolate = TRUE, ext = extent(29.65, 29.70, -29.40, -29.35) )
Very nice indeed. One of my favourite parts of the Drakensberg. And the image quality does not seem to be compromised by the aggressive compression (these images are a little “fuzzy” to start with!).
Conclusion
You can access the original MrSID and processed GeoTIFF files in the saffer-data-map repository. Have fun.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.