Posted by Jesper on February 10, 2024
As a person who has worked quite a bit on Agda, I sometimes get the question why Agda does not have a proper core language. And it’s a valid question, given that most - if not all - closely related languages do have a (reasonably) well defined core language.
You might also have heard that I am working on implementing a core language for Agda - implementing it in Agda no less. The work on that is progressing slowly with fits and jumps, and I hope to tell you more about it soon. In the meantime, let’s look at some reasons why it would be a good idea to have a core language, and other reasons why it might not be such a good idea.
This post is mostly meant as an overview, not going very deep into any of the points. But I hope it’ll be useful as a reference and a starting point for further discussion.
PRO 1: It reduces the trusted code base
The first and most obvious reason why you would want a core language is known as the de Bruijn criterion: if we can separate the generation of a proof from its verification, then we only need to care about the soundness of the checker and not the rest of the system. Since a core language is typically quite small, it is much easier to trust its implementation through either manual inspection or formal analysis.
PRO 2: You can implement independent checkers
Related to the first point, even if you do not trust the result of the provided type checker for the core language, implementing an independent checker for a small core language is not a huge task: it will probably take days rather than years. One example of this is the Kontroli checker for the LambdaPi language. There are probably many other examples of independent checkers but (surprisingly?) I haven’t heard of them.
Update: Sebastian Ullrich writes:
Regarding existing external checkers, Chris Bailey has not only implemented one for Lean 4 in Rust but also written a whole book on how to do that https://ammkrn.github.io/type_checking_in_lean4/
PRO 3: You can’t do meta-theory without one
In the same line as the previous two, for the ultimate trustworthiness of your language you might want to develop its formal meta-theory either on paper or - ideally - in a proof assistant. You might prove basic properties such as confluence, type preservation (i.e. subject reduction), and progress, or tackle the big questions of logical soundness and strong normalization. Either way, you will not get very far doing this to a full-fledged surface language intended to be convenient to use, so having a core language is a must.
PRO 4: Syntactic sugar can be added without changing the backend
Moving on to the more practical benefits of a core language, it provides a nice separation between the front-end of your language (which elaborates into the core language) and the back-end (which takes the core and compiles it to something executable). In particular, you can easily add new syntactic features to the surface language, as long as they can be desugared into the existing core, all without ever touching the core or backend.
PRO 5: It serves as a basis for metaprogramming
Another practical benefit of a core language is that the AST of the core language is much simpler than that of the surface language, so it is much easier to analyze or transform code in it. This is particularly useful for user-facing features that generate or transform code, variously known as tactic languages, metaprogramming APIs, or compile-time reflection frameworks.
PRO 6: It enables easier export to other languages
As a final practical benefit of a well-defined core language, it makes exporting your programs and proofs to other languages a much more feasible task. I’m not saying it becomes trivial: proof export still requires a lot of work for aligning or encoding language constructs, typing features, and library concepts. But the progress on the EuroProofNet objectives shows that it is not impossible either!
CONTRA 1: You already have a core language
Even in a language that does not have an official core language (such as Agda), chances are that there will be something very much like a core language hiding in its implementation. In Agda’s case, that is its internal syntax - consisting among others of the
Defn types. It certainly doesn’t count as a full core language because it is not separated from the rest of the codebase (failing PRO 1), there is no independent checker for
Defn (failing PRO 2, though there is one for
Term), and I haven’t met anyone crazy enough to do meta-theory for it (failing PRO 3). Nevertheless it is still used as a target for syntactic sugar such as
with-abstraction and pattern-matching lambdas (PRO 4), it is the basis of Agda’s reflection API (PRO 5), and it is used for exporting proofs to Dedukti and programs to Haskell (PRO 6). So building a true core language is more a matter of increasing trust rather than adding functionality.
CONTRA 2: Desugared code might be less efficient
While a small core language is good for increasing trust, making it too small can also mean we lose information that could help the compiler to produce more efficient code. While a sufficiently smart compiler could in theory reconstruct this information, in practice compilers are often not “sufficiently smart”. One example of this is the Equations plugin for Rocq, which compiles pattern matching definitions to primitive case statements. In this process, it bakes in a specific order of case analysis in the term and produces large terms witnessing the unification steps (or at least, it did so when I looked at it a couple of years back, it might have improved since). So if you want to output efficient code it might make sense to start from a less minimalist representation of the program.
CONTRA 3: Error messages become harder to read
Related to the previous point, if (some part of) our programs and proofs are only checked after they are translated to the core, then there is a risk that error messages will become harder to read as a result. This is not inevitable, as it is possible to carefully keep track of the origins of terms to they can be printed like the user wrote them, but that is not easy to do everywhere. In general, producing error messages for dependent type checkers that can actually be understood by someone unaware of the underlying type theory is very much an open problem, and one I would like to investigate more in the future.
Update. Jan de Muijnck-Hughes writes:
idris2 does a good job in translating core terms back into the surface language syntax. Also @d_christiansen implemented error reflection in idris1 and this allowed one to rewrite error messages for your eDSL in your surface language.
CONTRA 4: Non-sugar features become harder to add
While adding new syntactic sugar that can be desugared into the existing core language is relatively easy, a monolithic core language does make it much harder to experiment with new features that change the type theory itself. Examples of this are definitional proof irrelevance, rewrite rules, quantitative types, sized types, universe polymorphism, and of course cubical type theory. As anecdotal evidence, Agda has all of these features while Rocq only has some (definitional irrelevance and universe polymorphism), and those took much more effort to integrate. But there is also such a thing as feature bloat, so having higher friction against adding new features might actually sometimes be a good thing!
CONTRA 5: It distracts from other things
Perhaps my biggest hesitation against working towards a real core language for Agda is that it takes a lot of effort to build, especially since Agda was never really designed with a separate core in mind. This is effort that does not go towards other tasks that could end up having more impact towards the goal of actually having more (partially or fully) verified software. A few of these other tasks that might have a higher impact on the margin:
- Increasing the user-friendliness and accessibility (since a core is useless if no-one is using your language).
- Making the task of writing formal specifications easier (since proving something is useless if we are not proving the right thing).
- Verifying the correctness of the elaboration process (since even a correct specification could be elaborated to a wrong statement in the core language).
None of these other goals invalidate the necessity of having a core language, but they do put it into perspective of the grander overall goal: to make it more likely that we mean what we say by improving our ability to say what we mean (thank you for that quote, Conor).
So these are the six reasons why I’m working on a core language for Agda, and five pitfalls I’m trying to be mindful of while doing so. Let me know if you agree with these reasons or not, which ones you think weigh the strongest, and if there are any others I missed! You can send comments on the Agda Zulip, to @email@example.com via any Fediverse portal, or email me at firstname.lastname@example.org.