Bridging LaTeX and SymPy

Key considerations and issues when converting LaTeX into an equivalent SymPy expression.

How to convert in the first place?

There are two possible ways to do it: latex2sympy or latexlambda. Both are ANTLR-based parsers, which means the input must be strictly well-formed LaTeX. For example, \log_2(5) is not parsable; it has to be written as \log_{2}(5). The same goes for many other LaTeX expressions.
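Because the parsers are this strict, a small pre-processing pass can smooth over some of the differences before the input reaches them. Below is a minimal sketch that assumes the only problem is an unbraced single-character subscript or superscript. brace_scripts is a hypothetical helper and is not part of latex2sympy or latexlambda. (SymPy also ships an ANTLR-based parse_latex in sympy.parsing.latex.)

```python
import re

# Hypothetical pre-processing step (an assumption, not part of either parser):
# wrap a bare single-character sub/superscript in braces,
# e.g. "\log_2" -> "\log_{2}" and "x^2" -> "x^{2}".
# Already-braced groups such as "_{10}" are left alone because "{" is not matched.
BARE_SCRIPT = re.compile(r"([_^])([A-Za-z0-9])")


def brace_scripts(latex: str) -> str:
    """Return `latex` with single-character scripts wrapped in braces."""
    return BARE_SCRIPT.sub(r"\1{\2}", latex)


print(brace_scripts(r"\log_2(5)"))     # \log_{2}(5)
print(brace_scripts(r"x^2 + y_{10}"))  # x^{2} + y_{10}
```

As it happens, x^10 becomes x^{1}0, which is how LaTeX itself reads it (only the first token is raised). Command scripts like x^\alpha are not handled, and an escaped \_ would be mangled.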

Arrays are out of the game

Parsers of this kind can't handle arrays: anything between \begin{array} and \end{array} is a nightmare for them. So how do we deal with all of these problems?
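One way to keep arrays away from the parser is to take the array apart first and give it only the individual cells. We didn't describe this approach above. It is a minimal sketch under the assumption of a single, non-nested array whose column spec contains no braces (so {cc} or {rcl}, but not {p{2cm}}). split_array is a hypothetical name.

```python
import re

# Hypothetical helper (an assumption, not our production code): pull the body
# out of \begin{array}{...} ... \end{array} and split it into rows on "\\" and
# cells on "&", so each cell can be handed to the parser separately.
# Nested arrays and braces inside the column spec are not handled.
ARRAY = re.compile(r"\\begin\{array\}\{[^}]*\}(.*?)\\end\{array\}", re.DOTALL)


def split_array(latex: str) -> list[list[str]]:
    """Return the array's cells as a list of rows; [] if no array is found."""
    match = ARRAY.search(latex)
    if match is None:
        return []
    # r"\\" is the two-character LaTeX row separator; drop empty trailing rows.
    rows = [row for row in match.group(1).split(r"\\") if row.strip()]
    return [[cell.strip() for cell in row.split("&")] for row in rows]


print(split_array(r"\begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array}"))
# [['1', '2'], ['3', '4']]
```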

Solution #1: LaTeX to MathML

Our first idea was to convert LaTeX to MathML and then parse that, because XML is one of the easiest languages to parse. But it didn't work, damn interMATHZ! You see, MathML is not plain, good old XML; it is a bit "different" (is that the right word?). It makes parsing easier, but it doesn't make it perfect, so bye-bye MathML...
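To illustrate the "easier, but not perfect" part, here is a minimal sketch (not our original code) that walks presentation MathML with Python's built-in xml.etree.ElementTree. Only a handful of elements are handled, and the output notation is our own assumption.

```python
import xml.etree.ElementTree as ET


def _strip_ns(tag: str) -> str:
    # "{http://www.w3.org/1998/Math/MathML}mfrac" -> "mfrac"
    return tag.rsplit("}", 1)[-1]


def to_text(node: ET.Element) -> str:
    """Turn a small subset of presentation MathML into a flat expression string."""
    tag = _strip_ns(node.tag)
    children = list(node)
    if tag in ("mi", "mn", "mo"):
        # Tokens: identifier, number, operator.
        return (node.text or "").strip()
    if tag == "mfrac":
        num, den = (to_text(c) for c in children)
        return f"({num})/({den})"
    if tag == "msup":
        base, exp = (to_text(c) for c in children)
        return f"({base})**({exp})"
    if tag == "msqrt":
        return "sqrt(" + "".join(to_text(c) for c in children) + ")"
    # <math>, <mrow> and anything unknown: just concatenate the children.
    return "".join(to_text(c) for c in children)


def mathml_to_text(source: str) -> str:
    return to_text(ET.fromstring(source))


print(mathml_to_text("<math><mi>x</mi><mo>+</mo><mfrac><mn>1</mn><mn>2</mn></mfrac></math>"))
# x+(1)/(2)
print(mathml_to_text("<math><mrow><mi>sin</mi><mi>x</mi></mrow></math>"))
# sinx
```

The XML part is trivial. The trouble is that presentation MathML describes layout, not meaning. Two adjacent <mi> elements come out as "sinx", and unless the producer emits the invisible function-application operator (&#x2061;), the converter can't tell whether that means sin applied to x or s·i·n·x.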

Solution #2: Usage of Regular Expressions

A complete failure. Why did we even try that? *facepalm*
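Nested braces are one common reason regex-based LaTeX conversion falls apart. Treat this as an illustration, not necessarily the exact problem we hit. Python's re module can't match arbitrarily nested balanced braces, so a pattern for \frac{...}{...} ends up forbidding braces inside the groups. naive_frac is a hypothetical name.

```python
import re

# Naive pattern: \frac{<no braces>}{<no braces>}
FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")


def naive_frac(latex: str) -> str:
    """Rewrite \\frac{a}{b} as (a)/(b), but only when a and b contain no braces."""
    return FRAC.sub(r"(\1)/(\2)", latex)


print(naive_frac(r"\frac{5}{5}"))      # (5)/(5)
print(naive_frac(r"\frac{x^{2}}{y}"))  # \frac{x^{2}}{y}  (silently left unchanged)
```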

Solution #3: Find and replace system

Super messy, but it works. Everyone is probably laughing at us for using the .replace() function for something like this. We understand, but I can tell you: it worked! We replaced things like \frac{5}{5} with (5) / (5).
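Plain str.replace() only works on fixed strings, so turning \frac{a}{b} into (a) / (b) in general needs a little brace matching. Here is a minimal sketch of that idea, not the code we actually shipped. read_group and replace_frac are hypothetical names, only \frac is handled, and whitespace between \frac and its braces is not skipped.

```python
def read_group(s: str, i: int) -> tuple[str, int]:
    """Return the contents of the brace group starting at s[i] == '{'
    and the index just past its matching '}'."""
    if i >= len(s) or s[i] != "{":
        raise ValueError(f"expected '{{' at position {i}")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return s[i + 1 : j], j + 1
    raise ValueError("unbalanced braces")


def replace_frac(s: str) -> str:
    """Rewrite every \\frac{a}{b} as (a) / (b), recursing into a and b."""
    out = []
    i = 0
    while True:
        k = s.find(r"\frac", i)
        if k == -1:
            out.append(s[i:])
            return "".join(out)
        out.append(s[i:k])
        num, j = read_group(s, k + len(r"\frac"))
        den, j = read_group(s, j)
        out.append(f"({replace_frac(num)}) / ({replace_frac(den)})")
        i = j


print(replace_frac(r"\frac{5}{5}"))                # (5) / (5)
print(replace_frac(r"\frac{\frac{1}{2}}{x^{2}}"))  # ((1) / (2)) / (x^{2})
```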

Solution #4: Bye MathPix, Hi Tesseract

Our best move, though a long and time-consuming one: we migrated to Tesseract, which, yes, we trained ourselves. That gave us more control over what the scanner supports and how the solver parses its output. ...and yes, Tesseract pushed us to move from SymPy Gamma to our own solver, which we had to write from scratch! It was a good move.

We hope this blog post helped you at least a bit, just as it helped us. See you soon in our upcoming blog post on step-by-step solutions.