Generating Source Code Contour Images
For many years, DrScheme has had a nifty feature called “Program Contour” that displays a microscopic copy of your source code alongside your editor window. Each character is represented by a single pixel, which allows you to see the entire source file at a glance. This feature doesn’t seem to have gotten much attention, but recently the concept popped up on the radar when this Reddit thread and this Stack Overflow question pointed toward a similar image and raised the issue of the shape of good code.
I don’t claim to know what good code is shaped like, and I don’t know what impact an awareness of your code’s shape might make on its design, but I think the images are neato and I wanted to see the contour of my current project, which consists of around 34K lines of C++ code spread across 156 source and header files. The means of generating such images was not immediately obvious to me, so I developed the following technique using utilities commonly found on any Unix-like system.
As an overview: we convert text to image by way of PostScript. Enscript generates color syntax-highlighted vector output, which is then converted to a raster image by ImageMagick.
The primary difficulty lies in eliminating page breaks and generating a continuous scroll. To accomplish this, define a new Enscript media type called “Scroll,” representing a very long narrow page. To use it, place the following text in the Enscript configuration file, ~/.enscriptrc:
Media: Scroll 512 32768 0 0 512 32768
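The fields of a Media line are the name, the page width and height, and then the lower-left and upper-right corners of the printable area, all in PostScript points. If you'd rather not edit the file by hand, here is a minimal sketch of a helper that appends the line only when no Scroll definition is present yet. The function name add_scroll_media and its arguments are my own invention for illustration, not something Enscript provides.

```shell
# add_scroll_media [RCFILE] [WIDTH] [HEIGHT]
# Hypothetical helper: appends a "Scroll" media definition to an Enscript
# configuration file unless one is already there.
add_scroll_media() {
    rc=${1:-$HOME/.enscriptrc}
    w=${2:-512}
    h=${3:-32768}

    # Leave the file alone if it already defines a Scroll media type.
    if [ -f "$rc" ] && grep -q '^Media:[[:space:]]*Scroll[[:space:]]' "$rc"; then
        return 0
    fi

    # name, width, height, then the printable area (llx lly urx ury)
    printf 'Media: Scroll %s %s 0 0 %s %s\n' "$w" "$h" "$w" "$h" >> "$rc"
}
```

Calling add_scroll_media with no arguments targets ~/.enscriptrc with the 512 by 32768 page used here. Calling it a second time changes nothing.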
Given this, here is a command line to generate hello.cpp.png from the text of hello.cpp. Enscript writes its output to stdout, which is piped into the stdin of convert.
enscript --no-header -Ecpp --color -MScroll -p - hello.cpp | convert ps:- -flatten -resize 25% -trim -normalize -quality 100 -colors 16 hello.cpp.png
This code…
#include <iostream>
int main()
{
std::cout << "Hello, World." << std::endl;
}
… produces this image.
![]()
Here’s a larger example (122×2510)
There are many opportunities for customization. Let's begin with the Enscript options. The --no-header option tells Enscript to print nothing but the file’s contents, omitting the header line it would normally include. The -Ecpp option tells Enscript to highlight the syntax of C++ code. Run enscript --help-pretty-print to see a list of all supported languages (newer Enscript releases name this option --help-highlight). The --color option requests colored syntax, which defaults to an Emacs style. -MScroll selects the long page media type we’ve just defined. Finally, -p - tells Enscript to write the generated PostScript to stdout.
Now take a look at the ImageMagick options. PostScript is assumed to be transparent, and I want an opaque image, so the first step is to -flatten the input. Then we do the work we set out to do in the first place: -resize 25% scales the image down. A value of 10% is probably more appropriate if you want something resembling one pixel per character, as Enscript renders the text using 10-point Courier. The -trim option eliminates blank space at the end of a page (more on that issue below). -normalize enhances the contrast. This is necessary because text tends to have more background than foreground. The default down-sampling filter averages these, resulting in colors near white. Finally, the -quality 100 and -colors 16 options configure the PNG compression. Sixteen colors are probably enough, as highlighted syntax usually has only a few colors, but we’d like a few extras for antialiasing.
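Since the language and the scale factor are the two knobs you're most likely to turn, it can be handy to wrap the pipeline in a shell function. What follows is only a sketch under a few assumptions: the Scroll media type is already defined in ~/.enscriptrc, and the name contour, its positional arguments, and the CONTOUR_DRY_RUN variable are hypothetical names of my own, not part of Enscript or ImageMagick.

```shell
# contour FILE [LANG] [SCALE_PERCENT]
# Hypothetical wrapper around the enscript | convert pipeline.
contour() {
    src=$1
    lang=${2:-cpp}     # any language Enscript can highlight
    scale=${3:-25}     # try 10 for roughly one pixel per character
    out=$src.png

    # With CONTOUR_DRY_RUN=1, only report what would be done.
    if [ "${CONTOUR_DRY_RUN:-0}" = 1 ]; then
        printf '%s -> %s (lang=%s, scale=%s%%)\n' "$src" "$out" "$lang" "$scale"
        return 0
    fi

    enscript --no-header "-E$lang" --color -MScroll -p - "$src" |
        convert ps:- -flatten -resize "$scale%" -trim -normalize \
                -quality 100 -colors 16 "$out"
}
```

With this in place, contour hello.cpp reproduces the command line above, and contour hello.cpp cpp 10 produces the smaller one-pixel-per-character variant.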
Now, for this to be really useful we’ll want to automate it. We want to generate a complete set of images including every source file in a project. This can be easily done with GNU make. Here’s a Makefile:
SRCS= $(shell find . -name "*.cpp" -or -name "*.hpp")
IMGS= $(SRCS:%=%.png)
ESFLAGS= -Ecpp --no-header --color -MScroll
CVFLAGS= -flatten -resize 25% -trim -normalize -quality 100 -colors 16

%.png : %
	enscript $(ESFLAGS) -p - $< | convert ps:- $(CVFLAGS) $@

images : $(IMGS)
Given this, all we need to do is type make images and make will seek out all C++ source and header files (that's the find operation in the definition of SRCS), determine an appropriate PNG image name for each (the definition of IMGS) and generate PNGs as above. As with most make-based processes, the next time we make images, make will generate PNGs only for those source files that changed since the last run.
Lastly, a note on source length: notice that the PostScript Scroll media type defined above is only 32768 points long. That's about 5000 lines of program text. If your source file is longer than that, Enscript will paginate it in the pipe, and convert will still generate only one image. Yet we can't freely set that value larger, because the generated PostScript page will, in fact, always be exactly 32768 points long, however short the source file is. Thus when ImageMagick begins processing, it must briefly hold a raster buffer large enough to accept the whole page. The -trim option cleans up afterward by slicing away the blank space. This is the ugly side of this technique. If you know of a way to get Enscript to automatically adapt its media size to the output, let me know. If you know of a completely different and vastly superior means of program contour image generation, let me know that too.
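Until a better answer turns up, one workaround is to find out ahead of time which files won't fit on a single scroll. Here is a rough sketch that takes the "about 5000 lines" figure above as its default limit. The function name fits_on_scroll and its optional second argument are hypothetical, and the limit is an estimate rather than something Enscript reports.

```shell
# fits_on_scroll FILE [MAX_LINES]
# Hypothetical check: succeeds if FILE has no more than MAX_LINES lines
# (default 5000, the rough capacity of a 32768-point page).
fits_on_scroll() {
    max=${2:-5000}
    # The arithmetic expansion strips any padding that wc puts around the count.
    n=$(($(wc -l < "$1")))
    [ "$n" -le "$max" ]
}
```

For example, fits_on_scroll big.cpp || echo "big.cpp will be paginated" flags a file before you spend time rendering it.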
Darren LeGrange:
Thank you very much Robert. I’ve been considering how to accomplish this very task and your instructions are spot-on. Saved me a bunch of work with an elegant output.
9 April 2009, 3:04 pm