Survey on knowledge representation in Sage

# Survey on knowledge representation in Sage Nicolas M. Thiéry --- ## Overview at a high level of your system - [SageMath](http://www.sagemath.org): General purpose computational (pure) mathematics software - 300 contributors - 1.5M lines of Python/Cython code - 40k functions - 4k classes - hundreds of open source (math) software/libraries in the distribution -- --- ## What data do you have? ### A collection of (optional) databases: - Usually coming from external software/databases - Examples: - GAP databases - OEIS - John's database of elliptic curves - ... --- ### Pickling / Serialization Objects can be converted to strings and reconstructed - Applications: - Persistence - Sage databases - Exchange of data between Sage instances - Format: Python's pickle protocol - Code to reconstruct the object + sanity checks - By default: pickling by class + plain data (no encapsulation) - Aiming for: pickling by construction (more semantic) --- ## What knowledge do you have? Mathematical properties and theorems, algorithms, ... ## Two key points that conditioned the design - There are only a handful of fundamental concepts: operations: *, +, ... axioms: associativity, commutativity, ... Richness in the combinations of them (e.g. Fields) - Using an existing language and its object oriented features for modelling and method selection ### Sources of external knowledge? Each Sage contributer brings on specific mathematical knowledge about the objects studied, which might not be available to others in the collaboration. -- ### Can you point to implicit knowledge? - The algorithms rely heavily on the mathematical properties of the objects they manipulate. - Sage uses the Object Oriented features of Python The properties of a Sage object are specified by its *class*: - what mathematical object does it represent? - how is it represented - The class information is often of technical flavor, and complemented by additional information on its universe (parent, category) -- - Sage strives to model mathematics closely: Not only matrices are instances of a specific classes and not plain list of lists But linear maps are instances of specific classes and not just represented by matrices ==> Reduced risk of calling a meaningless function -- - The abstract algebraic properties of an object (e.g. being a group or a field) are modelled relatively explicitely: Objects know the names of their categories and axioms Meaning essentially implicit except, in the good cases, informally in the documentation and as testing methods - The names of the operations are hardcoded ==> duplication: additive / multiplicative / lie magmas Morphisms by automatic renaming? Lacks static typing? - Taming the exponential blow Size of the code linear in the number of methods -- - It's not always defined explicitly which methods an object in a given category should implement. Methods/operations are documented, but their exact specifications of is not always completely defined/defined consistently accross the class hierarchy. - Some theorems (e.g. wedderburn) are embedded in actionable form, but that information cannot be extracted / operated on -- ### Is it common knowledge? The meaning of the relevant categories / axioms (e.g. ring, associativity) is relatively well known by the users and developpers. -- ### What would you gain if it was made explicit/machine actionable? - Dynamic generation of documentation that the user can navigate - Sanity/correctness checks; proofs? - Semantic handles to communicate with other systems - Avoiding duplication (additive magmas / multiplicative magmas)? ### Have you gone in this direction? How did you represent the knowledge then? - Categories / axioms were a first step :-) ### How do you collaborate on knowledge representation? - Collaborative development of code / doc / tests in the Sage sources --- ## What software do you have? ### What custom software are you running? - 1.5 M lines for the Sage library + all the rest ### In which language? - Python/Cython + myriads of languages used by subsystems ### How does it use the data and the knowledge? - As a fundation for its hierarchy of classes/categories - Those are used for code factorization, documentation, and generic testing -- ### How does your software act on represented knowledge? - Some computations in the lattices of categories: X is a division ring and X is a finite set ==> X is a finite field --- ## Mixing (revisit?) ### Which knowledge is implicit in the data you have? ### Which knowledge is implicit in the software you have? - Formal definition of axioms That can be manipulated by the machine - Formal specifications of methods That can be manipulated by the machine 1. Anything else? Nope