Space Leak Diagnostic and Remedial Tool

If you’re reading this, you probably don’t need to be told that space leak is the bane of lazy languages like Haskell. Quoting briefly from Neil Mitchell’s 2013 article in ACM Queue:

“[Writers of] compilers for lazy functional languages have been dealing with space leaks for more than 30 years and have developed a number of strategies to help. ... Despite all the improvements, space leaks remain a thorn in the side of lazy evaluation, producing a significant disadvantage to weigh against the benefits.”

“Pinpointing space leaks is a skill that takes practice and perseverance. Better tools could significantly simplify the process.”

Seqaid (package seqaid on hackage) is a tool to help debug space leak in Haskell projects.

The key contribution of seqaid is runtime strictness optimisation, through dynamic, principled forcing, as supported by deepseq-bounded.

This package consists of

Seqaid can auto-plug many leaks. [1] However, not all leaks can be addressed by adding strictness; and, although most can (in my experience), usually these could also be addressed by refactoring in such a way that no added strictness was required. It may be that clues to such refactorings are offered by observation of optimal forcing patterns.

I believe that seqaid has potential to be a useful resource, but this is only a bare-bones first-version. The ultimate goal would be to see some evolution of this make it into compilers (perhaps even the GHC RTS), supposing a better solution hasn’t arrived first.

Seqaid is more general-purpose than just a space leak tool. It contains TH and GHC plugin code to automatically [2] instrument Haskell packages with an arbitrary wrapper function around select expressions. There’s nothing (much) to stop you from substituting your own wrapper functions. When the dust settles here, I should probably try to split off a wrapper-agnostic auto-instrumentation tool.

We do not have the luxury of changing the types of the functions we’re instrumenting, so passing extra state in any pure way is out of the question. In seqaid specifically, per-call-site data is maintained by judicious use of unsafe IORef operations. The unsafe operations are no more dangerous than seq (no matter how wrong things go). But it is true that, technically, program semantics can be affected by strictification: Adding strictness to a program may introduce bottoms (in addition to any that might already be lurking).

Happily, although seq breaks free theorems, seqaid does not. [3]

Dynamically Configurable Parallelism

With recent upgrades to deepseq-bounded, (the) dynamically-configurable parallelisation harness now comes for free, and should also prove useful. The strictness harness machinery of seqaid will work without change for this purpose, including the optimiser, and promises some interesting experiments!

How to Use Seqaid

A good introduction to using seqaid is generated by running seqaid demo. This will create a fresh, uniquely-named subdirectory in the current directory, and populate it with a sample project (leaky), build it with seqaid auto-instrumentation in effect, and run it. The README file in the generated directory has more information about the meaning of the output.

Note the file seqaid.config, a small text file with a very simple format which you write to control seqaid. Here is the seqaid.config file for leaky:

package    leaky
module     Main
  binds    duty
    types  Types.TA
instances  Types.TA, Types.TB, Types.TC

This file instructs the seqaid preprocessor to build with a conditional forcing wrapper (strictness harness) for all subexpressions of type TA occurring in the equations of the bind duty in module Main. It furthermore instructs seqaid to automatically derive necessary instances for instrumenting user-defined data types Types.TA etc. [4]

Besides the seqaid.config file, all that is needed to use seqaid on your own project are some additions to the .cabal file. This includes some extra dependencies (notably, seqaid; possibly a few others; Cabal will let you know), and the following GHC options:

  ghc-options: -fplugin=Seqaid.Plugin
               -F -pgmF seqaidpp

Manual instrumentation is also supported, should you prefer. Simply wrap any expressions you’d like in harness with a seqaid application. Although seqaid is not an actual function, from the user’s perspective it must be treated as if it was, in the fixity/associativity context of each call site. For technical reasons [no good reason really...], this presently also requires adding a SeqaidAnnManual annotation for each bind you edit.

Leaky Contrived to Showcase Seqaid?

Some words about the derivation of leaky are here, where the output of seqaid applied to leaky is also discussed.

Work in Progress...


I’m just fretting that I should have investigated using a GHC API app as a GHC replacement instead, if by chance it can be done light-weight. Well, at worst the GHC program is only a bit over 1 MB in size. Er ... but compiling it is a serious build, this is not like any old cabal install from hackage... But also, it seems GHC API can be used with TH. The problem seems to be that due to staging restrictions you still cannot modify existing declarations this way. Best bet is really some plugin facility between Parser and Renamer/TC.

Most all the work up till now has been to get the basic infrastructure in place. I hustled this preliminary version of seqaid together in the wake of writing the deepseq-bounded library, just as proof of concept. Unfortunately, auto-instrumentation turned out to be a nightmare, particularly as I had no TH or Core programming experience, setting me back by more than a month...

So I’ve determined to release leaky, deepseq-bounded, and seqaid without further delay.

After a much-needed vacation from these topics (which are not my interest; I’m an applications programmer!...), there are a few more things I’m intending to do.


From this point on (including note [3]), I may be talking nonsense in places.
I’ll ammend this preliminary stuff subject to discussion and new learning.

Immediate priorities

Two necessary things are still pending:

Other issues and plans


It would be interesting to see whether instrumentation could be pushed further down the compilation pipeline, or even implemented "from the other side" in the RTS. We might say that RTS can perform reciprocal injection, and think of this as another level of injection deferral.

But the RTS is not a compilation stage of your code; it is a virtual machine (mostly written in C) which evaluates your code, so implementing seqaid in the RTS will probably be quite a departure from the above.

The new Seqable module of deepseq-bounded is a step in the right direction, but I’ve not had time to really investigate it let alone document it.

And anyway, it turns out seq is completely desugared by the time it hits the simplifier, so we’re dealing with Core let versus case antics, and not even function applications.


[1] See for example the Space Leak wiki page. 
[2] A preprocessor takes care of all necessary pragmas and imports. The TH and Core (plugin) code then work in tandem to complete the refactoring. 
[3] Patricia Johann and Janis Voigtländer (2004),
Free Theorems in the Presence of seq We avoid this problem entirely, because seqaid never modifies the original source code: generic strictifying code is injected during compilation, and then configured dynamically. Thus there is no impediment to program transformation based on free theorems. Since parts of the GHC simplifier depend on free theorems, it is necessary to inject seqaid instrumentation after these optimisation phases. This also seems to indicate the desireability of an RTS solution.

Of course, FT transformations may affect the space use behaviour of your program, resulting in shifts in optimal forcing patterns, like rocks tumbling in the freshet will shift the rapids. If performance tuning was your game, you could even turn the tide and use feedback from seqaid to search for optimising transformations...

In case you are somehow able to perform your transformations at runtime, even that is no problem. You don’t even need to turn off the plugin! The forcing harness can be made truly transparent by simply asking it, dynamically and selectively, to lay off while you transform. 

[4] The instances required are some subset of Typeable, SOP Generic, and NFDataP and superclasses, and the exact set currently depends on Cabal flags.

Another point to note about the seqaid preprocessor concerns inlining. GHC inlines very aggressively, even across module boundaries. If you’ve requested instrumentation of bind duty, but somehow duty is inlined before the Core plugin runs, the expressions you hoped to harness will slip through the net. If you can afford a complete (blanket) harness, this problem will never arise, as it has to be inlined somewhere! Another possible solution would be to have TH add a NOINLINE pragma to any binds mentioned in seqaid.config; but weakening the inlining powers of a functional language compiler is generally a bad idea — not just for the function call overhead: it blocks potential optimisations at the call site. So both of these solutions come at a performance cost, but both guarantee your fish will go to market.