The Plan

Input:

The input to the program is mathematical text. For example:

Lecture transcripts.
Textbook pages.
Research papers.

Step 1 — POS Tagging:

In this step, the program tags the input text for parts of speech (POS). We have opted not to use the mainstream methods, for reasons that will be explained later. In short, the program is fragile: a single poor tagging can significantly disrupt later steps. We therefore want to capture POS more intuitively, and so we do not use the standard POS system (adverb, adjective, verb, etc.).
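As a rough illustration of what a more intuitive tag set could look like, here is a minimal sketch of a lexicon-based tagger. The tag names (OBJECT, RELATION, LINK, ACTION, PROPERTY, OTHER), the lexicon, and the function name `tag` are all assumptions made for this example; the plan does not specify the actual tag set or tagging method.

```python
# Hypothetical tag set and lexicon, invented for illustration only.
LEXICON = {
    "circle": "OBJECT",
    "triangle": "OBJECT",
    "square": "OBJECT",
    "inside": "RELATION",
    "on": "RELATION",
    "is": "LINK",
    "draw": "ACTION",
    "unit": "PROPERTY",
    "right": "PROPERTY",
}


def tag(text):
    """Return (word, tag) pairs; words missing from the lexicon get OTHER."""
    # Very rough tokenization: lowercase, drop periods and commas, split on spaces.
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [(word, LEXICON.get(word, "OTHER")) for word in words]


tagged = tag("The triangle is inside the unit circle.")
```

A lookup table like this fails in an obvious way (unknown words become OTHER) rather than guessing, which may suit a pipeline where one bad tag can derail later steps.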

Step 2 — Grammatical Analysis:

After tagging for POS, the program needs to know where the sections of interest in the text are, for use in later steps. In this step, the program converts the nonstandard POS tags developed in Step 1 to the standard system; while the standard tags aren't intuitive, they do convey incredibly useful semantic information. After this conversion, the analysis highlights various special sections (subjects, copular verbs, etc.).
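The conversion and highlighting could be sketched roughly as follows. The nonstandard tags on the left of `TO_STANDARD`, the mapping itself, and the "last noun before the copula" rule for finding a subject are assumptions made for this example, not the project's actual rules. The standard tag names loosely follow the Universal POS naming (NOUN, ADJ, ADP, VERB, X).

```python
# Hypothetical mapping from an invented nonstandard tag set to standard-style tags.
TO_STANDARD = {
    "OBJECT": "NOUN",
    "PROPERTY": "ADJ",
    "RELATION": "ADP",
    "ACTION": "VERB",
    "LINK": "VERB",
    "OTHER": "X",
}


def analyze(tagged):
    """Convert tags and mark the index of the copular verb and its subject."""
    standard = [(word, TO_STANDARD[t]) for word, t in tagged]
    # The first LINK word is treated as the copular verb (an assumption).
    copula = next((i for i, (_, t) in enumerate(tagged) if t == "LINK"), None)
    subject = None
    if copula is not None:
        # Naive rule: the subject is the last noun before the copula.
        nouns = [i for i in range(copula) if standard[i][1] == "NOUN"]
        subject = nouns[-1] if nouns else None
    return {"tags": standard, "subject": subject, "copula": copula}


result = analyze([
    ("the", "OTHER"), ("triangle", "OBJECT"), ("is", "LINK"),
    ("inside", "RELATION"), ("the", "OTHER"), ("circle", "OBJECT"),
])
```

Here `result["subject"]` and `result["copula"]` are positions in the word list, so later steps can look up both the word and its converted tag.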

Step 3 — Database of Mathematical Objects:

This step (the DoMO) is the heavy CV aspect of the program. It takes in a term (like "square") and outputs a series of points, and segment connections between those points, that describe a wireframe image that can be nicely displayed and manipulated with Manim.
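To make the output format concrete, here is a minimal sketch of a lookup that returns points and segment connections for a term. The names `regular_polygon`, `DOMO`, and `lookup` are hypothetical, and the hard-coded table only stands in for whatever the real DoMO does to produce a wireframe.

```python
import math


def regular_polygon(n, radius=1.0, phase=0.0):
    """Return (points, segments) for a regular n-gon centered at the origin."""
    points = [
        (radius * math.cos(phase + 2 * math.pi * k / n),
         radius * math.sin(phase + 2 * math.pi * k / n))
        for k in range(n)
    ]
    # Each segment is a pair of indices into `points`; the last one closes the shape.
    segments = [(k, (k + 1) % n) for k in range(n)]
    return points, segments


# Hypothetical term table; a real database would cover far more objects.
DOMO = {
    "triangle": lambda: regular_polygon(3, phase=math.pi / 2),
    "square": lambda: regular_polygon(4, phase=math.pi / 4),
}


def lookup(term):
    """Map a term such as "square" to its wireframe description."""
    return DOMO[term.lower()]()


points, segments = lookup("square")
```

Storing segments as index pairs keeps the wireframe independent of any drawing library. Each pair could later be drawn as a Manim `Line` after padding the 2D points with a z-coordinate of 0.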

Step 4 — Semantic Processing:

This step takes the information from the grammatical analysis (perhaps the parts that the DoMO has not used, such as preposition-like words, verb-like words, etc.) and visually conveys the connections between the objects created by the DoMO. For example, if you are proving the squeeze theorem geometrically, you will want to show a set of coordinate axes, a unit circle, and a right triangle. But you want these objects to be shown in the correct relations to each other, not just arbitrarily displayed in space.
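One way to picture this step is as a function that turns a relation word into a placement. The sketch below is only an illustration: the `Placed` class, the relation names "inside" and "right_of", and the bounding-radius model are all assumptions, not part of the plan.

```python
from dataclasses import dataclass


@dataclass
class Placed:
    """An object reduced to a center and a bounding radius (a simplification)."""
    name: str
    x: float
    y: float
    radius: float


def relate(obj, relation, anchor, gap=0.5):
    """Return a copy of `obj` positioned relative to `anchor`."""
    if relation == "inside":
        # Share the anchor's center and shrink to at most half its radius.
        return Placed(obj.name, anchor.x, anchor.y, min(obj.radius, 0.5 * anchor.radius))
    if relation == "right_of":
        # Shift right so the two bounding circles are separated by `gap`.
        return Placed(obj.name, anchor.x + anchor.radius + gap + obj.radius,
                      anchor.y, obj.radius)
    raise ValueError(f"unknown relation: {relation}")


circle = Placed("unit circle", 0.0, 0.0, 1.0)
triangle = relate(Placed("right triangle", 3.0, 3.0, 2.0), "inside", circle)
```

In this example the triangle ends up centered on the circle and scaled to fit within it, rather than staying at its arbitrary starting position.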

Step 5 — Scene Construction:

All of the information created by the rest of the program needs to be contained in an animated scene. This step is basically the stage director, telling everyone where to be and when.
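The "when" half of the stage director's job can be sketched as a simple timeline builder. The function name `build_timeline`, the step format, and the fixed pause between steps are assumptions for illustration; the actual scene construction may schedule animations quite differently.

```python
def build_timeline(steps, pause=0.5):
    """Schedule (action, target, duration) steps one after another.

    Returns a list of dicts with start and end times in seconds.
    """
    t = 0.0
    timeline = []
    for action, target, duration in steps:
        timeline.append(
            {"start": t, "end": t + duration, "action": action, "target": target}
        )
        # Leave a short pause before the next step begins.
        t += duration + pause
    return timeline


timeline = build_timeline([
    ("create", "axes", 1.0),
    ("create", "unit circle", 1.0),
    ("create", "right triangle", 2.0),
])
```

Each entry could then be handed to whatever plays the animation, for instance as one animation call per entry with its duration as the run time.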

Output:

The program outputs a Manim-based animated scene that visually portrays the input text. We have a few examples of such videos, created at various stages of the development process.

First Working Test

(Proof of concept DoMO and some scene construction):

Third Working Test

(Tune-ups to the DoMO):