Personal debriefing of CICM'2016

Nicolas M. Thiéry - CICM 2016, Białystok, Poland - 01 Aug 2016

About this blog post
About CICM
Some talks I particularly enjoyed
Knowledge use in systems: stuff I learned or rethought about
Using views + MMT + alignments for Sage profit?

About this blog post

For a number of years, I have been following the announcements of the CICM (Conference in Intelligent Compupter Mathematics), not quite daring to hop in given that this is on the far fringe of my usual research topic (Computational Algebraic Combinatorics). This year gave me the occasion to attend for the first time, being honoured to deliver an invited talk there about the category infrastructure in Sage.

After the conference, I enjoyed two days of hiking and biking in the nearby Bielowiesa national park which gave me plenty of time for afterthoughts (well, when not busy dodging mosquitoes), hence this blog post.

About CICM

Since a decade, CICM collocates several workshops: Calculemus (automated reasoning systems and Computer algebra), DML (Digital Math Library), MKM (Mathematical Knowledge Management), AISC (Artificial Intelligence and Symbolic Computation) with the deliberate strategy to bring together a diverse crowd of people. In particular, the Tetrapod workshop is meant to exchange ideas between people involved in the four main areas of mechanized mathematics: computation, data, knowledge management, and deduction (proof systems).

I very much enjoyed the experience, discovering a nice and welcoming community, and participating to many enlightening discussions. It was a good opportunity to step back and enlarge my views.

Some talks I particularly enjoyed

Invited talk of John Harrison on HOL Light: Algebraic Combinatorics at CICM!

John Harrison is the lead developer of HOL Light, a proof system based on higher order logic. I did not quite expect to hear about algebraic combinatorics in this conference! So I much enjoyed the part about the formalization of the Wilf-Zeilberger algorithm for proving identities.

John’s reflections reminded me of Florent’s on his work on the formalization of the Littlewood-Richardon’s rule: in algebraic combinatorics, the nitty-gritty details can become very cumbersome (lengthy case studies) to write down while being, most of the time, irrelevant. Hence the natural – and actually vital – tendency in this field to skip them in proofs. Here proof systems could contribute added value, by mechanizing at least the proof verification, if not parts of the proof of the nitty-gritty details.

In the case of WZ, the nitty gritty details all come from the behavior differences between rational functions as algebraic versus analytic functions (simplification, convergence).

Michael Kohlhase’s talk on OpenDreamKit and Math-in-the-Middle

Energetic as usual! The goal was to expose our ongoing work to the audience, and this triggered quite some interesting exchanges.

Dennis Müller’s talk on alignments (Rosetta Stone’s on steroids)

Dennis presented his ideas about a database of alignments between systems, in MMT. Think the Rosetta stone dear to Tim Daly’s heart, crowd-sourced and mechanized.

An alignment is a matching between two names for the same concept, typically in two different systems, like:

“cardinality” in Sage vs “Size” in GAP
“Groups” in GAP vs https://en.wikipedia.org/wiki/Groups
“Parent” in Sage vs “carrier” in formal logic systems
“Groups” vs “AdditiveGroups” in Sage

An alignment can be formal as in 1 (exactly the same concept). Even more formal would be a proven alignment between e.g. two proof systems. On the other hand, 3 is informal (the alignment only occurs when speaking about the elements of an algebraic structure; also the carrier is the underlying set while the parent is the set endowed with its structure), and 2 as well since the page content on wikipedia’ is. The alignment can also be up to a “functor”, as in 3 where multiplication is mapped to addition.

Alignments are at the core of our so-called Math-in-the-Middle approach.

Dennis’s thesis:

We should care both about formal and informal alignments, i.e. about flexiformality as the KWARC group would put it.
The Rosetta stone should extend to proof systems, knowledge management systems, …
We can achieve this by crowd sourcing & mechanizing: in short, have system authors and users progressively contribute the alignments they know of (manually or systematically), and compute adequate transitive closures.

To bootstrap the social process, it’s important that each contribution gives immediate added value for the contributor.

Florian Rabe’s MMT tutorial

Florian led, with the help of the other KWARC members, the first official introductory tutorial to MMT (Meta-Meta-Theory). This was a good occasion for me to learn more about this system for representation of mathematical knowledge, with a strong emphasis on genericity (the meta-meta keyword): bring your own logic, your own syntax, …

Jacques Carette’s talk on MathScheme and code generation

After years as core developer in the very informal world of Maple, Jacques Carette is now exploring the other extreme, and in particular how to best exploit the mathematical knowledge in the design of mathematical systems (rings a bell?).

Together with William Farmer, he has been developing MathScheme “An Integrated Framework For Computer Algebra And Computer Theorem Proving”. A particular focus of his talk (or of my attention?) was on automatic “code” generation, in particular using views (I’ll be using “code”, but this includes as well proofs, …). For example, he manages to avoid all the code duplication that we have in Sage / GAP between the hierarchies for additive structures and multiplicative structures by generating automatically one for the other (or maybe both from yet another?); in practice he uses Haskell’s substitution feature to transliterate statically all the operations appearing in the code: 1->0, *->+, …

My invited talk on Sage category infrastructure :-)

I enjoyed delivering it! And there were quite some discussion during and after the talk. Here are my slides.

I first introduced my pet analogy that Sage categories are the bookshelves of the Sage library: as mathematicians, to lookup features of groups (e.g. definitions, theorems, proofs, algorithms, data), we go to the library, and search through the bookshelf about group theory. Similarly, when computing with groups, Sage searches through the Sage category of groups to find code (and doc, and tests).

The bookshelves are organized hierarchically: sometimes the feature we need is more specific, and we lookup the sub bookshelf about e.g. the theory of finite groups. Sometimes it will be more general, and we lookup the larger bookshelf about semigroup theory.

I then put some focus on code bloat taming by:

Generating automatically the inheritance
Using the lattice structure of categories/theories to only implement the non trivial categories (e.g. those containing code)

Knowledge use in systems: stuff I learned or rethought about

From the discussions at CICM, there is a consensus that, with the growth of proof/computation/… mathematical systems, explicitly using the mathematical knowledge is central in the design and implementation of the former. Of course, given the themes of the conference, there is a strong bias. Still I was glad to hear it was not just a fancy of mine :-)

In practice, many systems have developed infrastructure to use such knowledge:

Computer algebra systems: categories in Axiom/Aldor/MuPAD/Sage
Proof systems: Gonthier’s team MathComp library for Coq
Knowledge management system: MMT

In particular, the development of such infrastructure has always been prompted by concrete scaling needs, and in particular code bloat taming. The key ideas are of the same nature, but different systems have different implementations and are more advanced in different directions due to the variety of constraints:

using an existing language or not
static vs dynamic
formal vs informal
current breath and depth of the system. E.g. mathcomp was originally designed for the proof of a specific theorem which made it relatively deep but narrow.

So it’s good to be discussing across systems to learn from each other what kind of issues our own systems may be facing next, and how such issues have possibly been tamed elsewhere.

Below are notes about various things that were discussed and matured, or that I learned at the occasion of the workshop.

Terminology

Of course, coming from different communities, we use different terminologies. In fact terminology was the main problem I have had when typing those notes. So one thing to discuss further would be the standardization of terminology for all the concepts involved.

In the mean time, here are some more or less informal alignments:

Sage	logic world (e.g. MMT)
Sage category	theory
subcategory	category extension
join or intersection of categories	push out of theories
meet of categories	pull back?
parent	carrier
functor	view

Here are some other concepts for which we need to decide on common terminology:

realms, bookshelves, …
Mixins
functorial construction
category with axiom, category defining/implementing an axiom
stating that the implementation of a feature (e.g. a view, an axiom, a construction) should be split into little chunks, one for each category it is relevant for. By lack of something better, I’ll speak for now of split implementation of a feature (better suggestions welcome!). Paul suggests this has been called “small theory approach” in the logic world.

“Mixins”

Compared to other areas, in mathematics there tends to be relatively few fundamental concepts (e.g. magmas, associativity, …); all the richness comes from the many ways those concepts can be combined to obtain, e.g. fields. More so, new concepts tend to appear very low in the concept hierarchy (with exceptions; e.g. “connected” which appears at the level of graded modules); there may be a bias in algebra w.r.t. other areas (analysis?).

A key point is that knowledge enables splitting the features into small feature sets (e.g. features of commutative groups that are not necessarily valid for all groups) that are manageable, reusable, and composable according to the hierarchy of realms, to cater for the variety of combinations of the basic concepts.

In Sage, each such chunk is essentially a collection of “Mixin classes”, together with some metadata.

Terminology: do we want to use the word “Mixin” to speak about those feature sets?

Formally defining versus implementing a realm

Implementing “Groups” can have two rather distinct means:

formally defining what groups are (operations / axioms)
providing the features that holds for groups

Typically, in a book about group theory, the first pages will be about the definition (and recap about semigroups), and all the rest about the features of groups. In Sage, we focus mainly on the later. Jacques liked the bookshelf analogy to express that.

Those two goals may be better achieved in two steps. Indeed, the formal definitions tend to be tricky, in particular to avoid definition loops: for example, to state that additive groups are modules over the integer ring, one first need to define this ring which is … an additive group itself. On the other hand, the amount of material is much smaller, since one really only need to define the basic concepts (magmas, associativity, …) and combine them by trivial push outs.

Providing the features can then be done in a later step. So we would have:

a hierarchy of defining theories, with just the theories needed to define new concepts, and possibly some trivial push outs for naming purposes; e.g. we would have trivial push out theory that just states that a “monoid” is a “magma” that is both “associative” and “unital”.
a hierarchy of feature theories

the latter being typically ``conservative theory extensions’’ of the former.

Somehow this is exactly what we are doing for Sage in the Math-in-the-Middle approach: we use MMT’s latin1 as defining theories, Sage categories as feature theories, and the semantic annotations of Sage categories (aka Sage <-> MMT alignments) to produce the interface theories aligning the two.

Inheritance versus composition and views

There are several ways to express that a ring is a set endowed with both a multiplicative monoid and an additive monoid structure. In Sage, we use plain inheritance, having Ring inherit from AdditiveMonoid and MultiplicativeMonoid. These two are implemented independently (duplication!!!). If both define a function “cayley_graph”, there is an ambiguity for rings: is the “cayley graph” about the additive or multiplicative structure.

An alternative – used in MMT, MathComp, and … – is to use composition and views, saying that Ring is the composition of MultiplicativeMonoid and AdditiveMonoid, both being views on Monoid. Then, the operation cayley_graph can be disambiguated by accessing it through MultiplicativeMonoid.cayley_graph or AdditiveMonoid.cayley_graph.

Using composition only would be very cumbersome: to access the zero of the ring we would need to do use something like AdditiveCommutativeMonoid.AdditiveMonoid.UnitalMagma.unit. This is resolved in MMT by doing explicit imports/aliases: “Ring.zero” is an alias for “AdditiveMonoid.zero” which itself is a view on “Monoid.unit” which itself is a view on “UnitalMagma.unit”. That’s all fine because this is all resolved statically, with no runtime overhead (when it’s about computation).

Claudio mentioned that one particular reason for using composition in MathComp is that Coq does not support multiple inheritance.

Referencing subcategories from super categories

In Sage, one can create the subcategory of a category (say Groups) by applying axioms or constructions (say Finite) with the syntax:

Groups().Finite()

This is reflected in the code by having the class Groups holding an attribute Finite that references the class FiniteGroups.

Jacques rightfully pointed out that this is an anti-pattern: a class (in a rather large sense) should not need to be aware of its subclasses, in particular so that adding new subclasses does not require changing the code of preexisting classes.

That’s a valid concern which is mitigated by concrete added value:

From a user interface point of view, it’s very handy to be able to create new categories from basic ones: this enables having a single entry points for groups, from which one can create all variants (finite, finitely generated, compact, quotient, …).

An alternative syntax would be to use push outs only: Groups() & FiniteSets(). This would not require a priori references to subcategories. But from a Pythonic perspective, the former syntax further enables the discovery by introspection of which axioms/constructions can be applied.
This enables a fast algorithm for computing joins (roughly linear in the size of the created hierarchy)
The reference Groups -> FiniteGroups could be set a posteriori by the FiniteGroups source code, rather than the Groups source code. This is not done so just to support lazy loading of the library.

Using views + MMT + alignments for Sage profit?

Jacques’s talk was thought provoking, as not only he could use views to reduce duplication in knowledge representation, but also in code, thanks to Haskel’s code transliteration features.

Combined with the various other discussion points, I am now wondering whether we could use dynamic views for producing uniformly, from views and alignments represented in MMT, automatically generated split implementations of

the exp / log constructions between additive structures and multiplicative structures
the subquotient, quotient, subobjects and isomorphic objects constructions
a proper forgetful functor
the semantic handle Sage-XXX interface
the additive structures hierarchy from its multiplicative counterpart
facade from non facade parents (aka the OO vs modular programming disease)
…

Context: code duplication in Sage

It’s been a while that the code duplication between additive and multiplicative structures in Sage (and most if not all other computational systems I know of) has been itching.

I should say that it’s not yet critical though: we currently live with only a small subset of features for additive structures (basically those really needed for rings) and having those being duplicates of the equivalent features of multiplicative structures. That’s manageable and a plausible trade-off for some simplicity (using plain OO).

Still interesting to scratch further, if not just out of curiosity.

No static transliteration in Python

The first thing that had stopped me from scratching this issue so far is that, unlike Haskel, Python has no transliteration feature. At least not that I know of. Is this intrinsic? After all Python allows for full read-write reflection on the code AST, so nothing prevents code rewriting (what about Cython?).

The main difficulty however is that there is little type information to determine, in a piece of code, when a method call should be transliterated or not. Let’s take the following method:

def f(self, i):
    return self+self, i+i

If we want to transliterate from additive to multiplicative, it’s clear that we want to replace self+self by self*self. However should i+i be replaced by i*i? Maybe this can be derived by type inference from the context, but this is a non trivial endeavour.

But we have adapters for dynamic transliteration

Let’s take the Sage-GAP interface. For the longer while we have had a semantic handle interface which allows for manipulating a handle h to a GAP group G as if it was a native Sage group. It works using the classical adapter design pattern: the handle h is wrapped in an adapter object Gh with adapter methods like cardinality that delegates calls of the form Gh.cardinality() to the equivalent GAP call Size(G) thanks to an

It had been itching me for a long time that this was available only for groups. Last january in Saint Andrews I started making a split implementation of this. So now we had semantic handle interfaces for semigroups, magmas, etc for free, and for rings and such for the price of implementing just the additional adapter methods.

The next step I initiated in Saint Andrews was to generate automatically this split implementation from alignements. Indeed, most of the time all we need to know to generate an adapter method is the name alignment and some basic signature information.

This is still work in progress with Samuel Lelièvre:

 https://github.com/nthiery/sage-gap-semantic-interface

Putting things together: the Exp/Log functors for free

What I realized over last week-end is that putting together

The interface theory for additive structures in Sage
The additive -> multiplicative view in MMT
The interface theory for multiplicative structures in Sage
Dennis’ service for doing the transitive closure of the above alignments
An automatic generation of adapters as above

we could generate automatically, in Sage, the Exp construction that takes an additive monoid and adapts it into a multiplicative monoid.

We already have a crude implementation of the Exp construction in Sage. We use it when we have an additive monoid and get frustrated by the lack of features available. However this construction is manually generated and thus very far from complete. The automatic generation would also make for a much smaller code base which is thus easier to optimize (e.g. compile with Cython).

Of course the same could be done for the Log functor.

The additive structures hierarchy from its multiplicative counterpart

The hierarchy of categories for additive structures could be generated from the multiplicative one. To compute an operation x on an additive monoid M, one would apply the equivalent operation on the corresponding multiplicative monoid.

When no alignment exist for an operation, one could provide one automatically by adding an additive_ prefix to the operation name. This trick could be used as well for disambiguation, so that a ring would have two operations additive_cayley_graph and cayley_graph, mimicking composition.

Further applications

An adapter method for an alignment between f and g typically looks like:

def f(x, y):
    wrap( g( unwrap(x), unwrap(y) ) )

An other location in Sage where we use very similar methods are quotients. Typically the product in a quotient A of an ambient space B is computed using:

def mul(x,y):
    retract( g(lift(x), lift(y)) )

where retract is the quotient map from B to A, and lift a section. More generally the same adapter can work for subquotients, assuming proper lift and retract maps.

Based on this principle, we currently have in Sage a manually written split implementation of a hierarchy of categories for computing the basic operations on A just from the lift and retract map to B. The number of features available depend on whether A is a subquotient, a subobject, a quotient of or is isomorphic to B.

Using the trivial alignment, this hierarchy of categories could be automatically generated. This would require some additional semantic information, to specify which operation actually pass down to quotients, subquotients, or subobjects, and not just isomorphic objects.

About the dynamic vs static overhead

Using dynamic adapter classes and methods rather than static code transliteration has a cost. Let’s consider for example the Log functor, and analyse the overhead of calling a method g through an adapter method f on adapter objects x and y, e.g. as in x.f(y):

def f(x, y):
    wrap( g( unwrap(x), unwrap(y) ) )

We further assume that the adapter class and methods are written in Cython. The adapter method will typically be a generic Cython implementation shared between all adapter methods with a given signature, and specialized through a closure.

Costs:

unwrap calls: each call to unwrap above costs a Cython method calls and a C pointer dereferencing. In a static world both could be done at compile time, in particular through inlining.
wrap call: one Cython function call and the creation of a Cython object. In a static world, whenever the wrapped object is a temporary value, there are chances that the compiler could avoid it’s creation.
g call: usual cost of calling g, potentially saving the virtual resolution if the adapter is specialized on a specific monoid.
f call: a Cython method call; potentially saving the virtual resolution?

Altogether, one could tentatively hope that the overall cost would be of the order of magniture of a Cython method call (with virtual resolution). For very low granularity Cython methods like arithmetic on small fields, there still will be a high overhead compared to what could achieve with static code transliteration. On the other hand, the overhead would be tentatively negligible at higher granularity and for plain Python code. This remains of course to be confirmed by benchmarks.

Workshops and Conferences WP6: Data/Knowledge/Software bases Math-in-the-Middle CICM