Thursday, May 20, 2010

This is a story+tutorial about how I learned to marry OCaml and GNU make. Since I'll be leaving out my mistakes, I'll appear to be unusually intelligent. Excellent!

Some of my projects that I write in OCaml have gotten too large for a simple "run these commands" build script to be convenient. There are many, many options for building OCaml projects and some are designed specifically for OCaml. OMake, Ocamlbuild, and GODI are the ones I know of. I find that these are overkill for my projects and not trivial to learn.

Another option is GNU make, which is not designed for OCaml but is general enough that it works. Its syntax is a little on the shell scripting side, but it has the advantage that if you learn to use it with OCaml, it's easy to apply your knowledge to projects in other languages. GNU make also supports parallel building. If your Makefile is written properly, make can spawn multiple "jobs" to potentially increase build speed on systems with multiple CPUs. This is done with the -jn flag, where n is the number of jobs.

make's syntax is very much like shell scripting, but its semantics aren't entirely procedural. It's something like a combination of a General Problem Solver, a database with rewriting rules, and a shell. It builds a dependency graph for the first target listed in the file (or for a target named on the command line) and then runs commands to satisfy that target's dependencies. For instance, if you have a list of object files that must be compiled before linking some executable, "executable_file", you would need the rule "executable_file: object1.o object2.o object3.o ..." The command to perform the linking goes on the next line after a tab. Like Python and Haskell, make gives white space syntactic meaning. So, keeping in mind that the tab is required, a common sort of entry in a Makefile is

executable_file: object1.o object2.o object3.o
    linker -o executable_file object1.o object2.o object3.o

You can replace the target with "$@" and the dependencies with "$^". "$@" and "$^" are automatic variables.

executable_file: object1.o object2.o object3.o
    linker -o $@ $^

For debugging, I found the most useful thing to do was "make -n", which tells make to print what commands it would run but not run them. Here is what my Makefile prints:

jeff$ make -n
ocamlc atn.mli
ocamlc -c -o atn.cmo atn.ml
ocamlc iMap.mli
ocamlc -c -o iMap.cmo iMap.ml
ocamlc q.mli
ocamlc -c -o q.cmo q.ml
ocamlc matching.mli
ocamlc -c -o matching.cmo matching.ml
ocamlyacc regex.mly
ocamlc regexTypes.mli
ocamlc regex.mli
ocamlc -c -o regex.cmo regex.ml
ocamllex regexLex.mll
ocamlc -i regexLex.ml > regexLex.mli
ocamlc regexLex.mli
ocamlc -c -o regexLex.cmo regexLex.ml
ocamlc regexATN.mli
ocamlc -c -o regexATN.cmo regexATN.ml
ocamlc -a -o regexLib.cma atn.cmo iMap.cmo q.cmo matching.cmo regex.cmo regexLex.cmo regexATN.cmo
rm regex.ml

As you might have guessed, my project involves regular expressions. The final file is a library of several modules, regexLib.cma. Interestingly, make will try to deduce what temporary files can be cleaned up after it's done. In the above case it's deleting the file generated with ocamlyacc, which is fine with me. Unfortunately there was one file it wanted to delete that I wanted to keep for debugging reasons. To keep it I simply added it to the list of dependencies for regexLib.cma.
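An alternative that GNU make provides for this situation is the .SECONDARY special target. The following is only a sketch; kept_file.ml is a hypothetical name standing in for whichever generated file should survive the build:

```make
# Sketch: kept_file.ml is a hypothetical placeholder name.
# Prerequisites of .SECONDARY are treated as intermediate files,
# but make never deletes them automatically.
.SECONDARY: kept_file.ml
```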

To switch between compiling bytecode or native code I just change the variable TARGET before running make, or give make the argument TARGET="BYTE" or TARGET="OPT". My Makefile assumes every file will be a module, so it creates an .mli for any .ml file that has no corresponding .mli file. I'm also building a library, not an executable, but for an executable only minor changes would be needed.

#TARGET is BYTE or OPT (i.e. machine code).
#On the command line, set this with TARGET="OPT" or TARGET="BYTE".
TARGET = BYTE

#Set OC to the compiler being used and the file suffixes appropriate
#for TARGET.
ifeq ($(TARGET),OPT)
OC = ocamlopt
SUFFIX = .cmx
SUFFIX_LIB = .cmxa
else
OC = ocamlc
SUFFIX = .cmo
SUFFIX_LIB = .cma
endif

#Command to compile object code file.
OC_OBJ = $(OC) -c -o
#Command to compile a library.
OC_LIB = $(OC) -a -o
#Command to generate an interface (.mli) from a source file.
OC_I = ocamlc -i

I hope the syntax is obvious so far. Referencing variables will look odd to some. It's done with $(name) and the names are case sensitive. Technically, the above will set OC to ocamlc for any value of TARGET that is not OPT, but since it's a binary choice I don't see that as a bug. Having references to SUFFIX and such later in the file makes it look a bit messy, but it saves the trouble of making a new Makefile or some other complicated procedure when I want to compile to machine code. Having compile commands in variables makes it easier to change them globally.
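If the fall-through to bytecode ever does become a concern, a guard along these lines could reject unexpected values. This is a sketch rather than part of the Makefile above; it would go after the TARGET assignment and relies on GNU make's built-in filter and error functions:

```make
# Sketch: stop with a message unless TARGET is exactly BYTE or OPT.
# $(filter ...) returns the empty string when TARGET matches neither word.
ifeq ($(filter $(TARGET),BYTE OPT),)
$(error TARGET must be BYTE or OPT but is "$(TARGET)")
endif
```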

Several guides I looked at recommended putting all of your object files in one variable so that they are easy to put into dependency rules. For OCaml, generated interface files (.mli) might be necessary to keep for reference. In addition, some compiled interface files (.cmi) should be kept for users of the library. I'll put such files in the INTERFACE variable.

OBJ = atn$(SUFFIX) iMap$(SUFFIX) q$(SUFFIX) matching$(SUFFIX) \
    regex$(SUFFIX) regexLex$(SUFFIX) regexATN$(SUFFIX)

INTERFACE = q.cmi matching.cmi regexTypes.cmi regex.cmi regex.mli \
    regexLex.cmi regexLex.mli regexATN.cmi

$(SUFFIX) will be replaced with either .cmo or .cmx depending on the definition of TARGET.

So far I haven't told make to do anything. Since the first target in the Makefile is the one built by default, I list the final library file first.

regexLib$(SUFFIX_LIB): $(OBJ) $(INTERFACE)
    $(OC_LIB) regexLib$(SUFFIX_LIB) $(OBJ)

The first line says that the file regexLib.(cma|cmxa) depends on all the object code files and all the compiled interface files. The second line gives the command to create the file regexLib.(cma|cmxa). For bytecode, you can see the full expansion on the second to last line of the log (the above output from make -n).

It would be tedious and repetitive to manually write the dependency rules and associated commands for every file, like the following:

regex.cmo: regex.ml regex.cmi
    $(OC_OBJ) $@ $<

regex.cmi: regex.mli regexTypes.cmi
    ocamlc $<

regex.ml regex.mli: regex.mly
    ocamlyacc regex.mly

so I don't. Luckily I discovered make's template system. (I think of these rules as templates, but GNU calls them pattern rules.) Targets that match a specified prefix and suffix can have an automatically generated dependency rule. Here is the template for compiling a bytecode object file:

%.cmo: %.ml %.cmi
    ocamlc -c -o $@ $<

"%.cmo" means "match any target that has the suffix ".cmo"." The part of the target name that "%" matches is called the stem, and every "%" in the dependency list is replaced with it. So since regexLib.cma depends on q.cmo, if I have not specified a rule for q.cmo, make uses the template and fills in "q" wherever there is a "%", "q.cmo" wherever there is a "$@", and "q.ml" wherever there is a "$<". Note that "%" is not expanded in the commands, only in the target and dependency lists; in commands the stem is available as "$*". I say lists because you can have more than one target for a particular pair of dependency and command lists. Since ocamlyacc generates both .ml and .mli files from a .mly file, the following template will compile any ocamlyacc grammar files:

%.ml %.mli: %.mly
    ocamlyacc $<
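To make the substitution concrete, here is roughly what the %.cmo template from earlier amounts to when make needs q.cmo. The expanded form appears only as comments; it illustrates the matching and is not something to add to the Makefile:

```make
# The pattern rule as written (the recipe line starts with a tab):
%.cmo: %.ml %.cmi
	ocamlc -c -o $@ $<

# For the target q.cmo the stem is "q", so make behaves as if it had read:
#   q.cmo: q.ml q.cmi
#       ocamlc -c -o q.cmo q.ml
```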

Here are the rest of my templates:

%.cma: %.ml %.cmi
    ocamlc -a -o $@ $<

%.cmx: %.ml %.cmi
    ocamlopt -c -o $@ $<

%.cmi: %.mli
    ocamlc $<

%.ml: %.mll
    ocamllex $<

#Create an interface for any .ml file that doesn't have one.
%.mli: %.ml
    ocamlc -i $< > $@

These templates are nice for simple cases, but make requires that we manually enter the dependencies between files. For my project, regex.cmi requires regexTypes.cmi, so the rule to compile regex.cmi would be

regex.cmi: regex.mli regexTypes.cmi
    ocamlc regex.mli

Writing something like that manually for every file defeats the purpose of having templates. This is where the sometimes-not-procedural part of make comes in handy. In addition to the template, I include "regex.cmi: regexTypes.cmi". When using the %.cmi template to target regex.cmi, make will add regexTypes.cmi to regex.cmi's dependency list. Thanks to this useful functionality, the only special case in the Makefile is the final library.
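Put together, the two pieces look like this. It is a minimal sketch using the file names from my project; the second rule has no commands, so it only adds a prerequisite and the recipe still comes from the template:

```make
# Template: compile any interface file (the recipe line starts with a tab).
%.cmi: %.mli
	ocamlc $<

# No recipe here, so this only adds regexTypes.cmi as a prerequisite.
# make still finds the recipe for regex.cmi in the %.cmi template.
regex.cmi: regexTypes.cmi
```

Because "$<" names only the first prerequisite, the command remains "ocamlc regex.mli" even though regexTypes.cmi is now in the dependency list.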

There is one downside to how make handles these rules. In OCaml, to compile an executable or object code file, we often need to have available an interface file (.cmi) to some library, but OCaml compilers fail when given .cmi files as arguments. To further automate the Makefile, I would like to use a template for making library files.

%.cma: %.ml %.cmi
    ocamlc -a -o $@ $^

The problem with this is that any .cmi files that the target is dependent upon are given to ocamlc and ocamlc then chokes on them.

jeff$ ocamlc -a -o regexATN.cma regexATN.ml regexTypes.cmi
/usr/bin/ocamlc: don't know what to do with regexTypes.cmi.

Furthermore, for an executable, the %.ml file will need to be the last one on the argument list. There is perhaps a work-around involving functions, but I have yet to play with those to find out.
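One candidate for that work-around is GNU make's filter-out function, which removes words matching a pattern from a list. The following is a sketch rather than part of the Makefile described here, and the .byte suffix in the second rule is a hypothetical naming choice for executables:

```make
# Sketch: hand ocamlc every prerequisite except compiled interfaces.
# Recipe lines start with a tab.
%.cma: %.ml %.cmi
	ocamlc -a -o $@ $(filter-out %.cmi,$^)

# Sketch for an executable (".byte" is a hypothetical suffix): drop the
# .cmi files and the main source, then put the main source ($<) last.
%.byte: %.ml %.cmi
	ocamlc -o $@ $(filter-out %.cmi $<,$^) $<
```

Because "%" is not substituted inside commands, "%.cmi" reaches filter-out as a pattern rather than as a file name.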

The final thing to add is a "clean" target, so that "make clean" will remove any files generated during compilation. Since clean is a target that does not create a file, make could be confused by it (for example, if a file named "clean" ever existed, make would consider the target up to date), so we include the ".PHONY: clean" directive to tell make that clean does not produce a file.

.PHONY: clean
clean:
    rm -f regex.ml regex.mli regexLex.mli regexLex.ml
    rm -f *.cma *.cmx *.cmxa *.cmi *.o *.a *.cmo

That's the end of my Makefile. You can download it if you like. It should be easy to adapt it to other simple OCaml projects. I hope I helped make make your life a little easier.
