
I'd like to take user input (sometimes large paragraphs) and generate a LaTeX document from it. I'm considering a couple of simple regular expressions that replace every instance of \ with \textbackslash and every instance of { or } with \{ or \}.

I doubt that this is sufficient. What else do I need to do? Note: I'm using Python, in case there is a library made for this.

To clarify, I do not want anything to be parsed or treated as LaTeX syntax: $a$ should be replaced with \$a\$.

TRiG
Conley Owens
  • Because of the complex semantics and parsing rules for TeX, the solution probably won't be in processing with Python but in how you dump the data to LaTeX. If you provide details about what input data can contain (to what extent should it be processed as LaTeX? Should things like `---` and `\ae` and math mode work?), someone might be able to get you a great answer. – Mike Graham Apr 13 '10 at 05:28
  • This question is substantially the same as http://stackoverflow.com/questions/2541616/how-to-escape-strip-special-characters-in-the-latex-document – Charles Stewart Apr 13 '10 at 07:40
  • The other question is focused on keeping the user from doing harmful things (gaining shell access), not on making sure input looks the same in both the plain text input and in the document. – Conley Owens Apr 13 '10 at 14:58

1 Answer


If your input is plain text and you are in a normal catcode regime, you must do the following substitutions:

  • \ → \textbackslash{} (note the empty group!)
  • { → \{
  • } → \}
  • $ → \$
  • & → \&
  • # → \#
  • ^ → \textasciicircum{} (requires the textcomp package)
  • _ → \_
  • ~ → \textasciitilde{}
  • % → \%

In addition, the following substitutions are useful at least when using the OT1 encoding (and harmless in any case):

  • < → \textless{}
  • > → \textgreater{}
  • | → \textbar{}

And these three disable the curly quotes:

  • " → \textquotedbl{}
  • ' → \textquotesingle{}
  • ` → \textasciigrave{}
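
Since the question mentions Python, here is a minimal sketch of how the substitutions listed above could be applied. The names `LATEX_ESCAPES` and `escape_latex` are made up for illustration and do not come from any library. The sketch does all replacements in a single regex pass; chaining one replacement after another would be a problem, because the braces in an already inserted `\textbackslash{}` would be escaped again by the later brace rules.

```python
import re

# Character -> LaTeX replacement, following the lists in this answer.
# (Hypothetical helper names; adjust the table to the packages you load.)
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "^": r"\textasciicircum{}",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "%": r"\%",
    # Useful at least with the OT1 encoding.
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    "|": r"\textbar{}",
    # These three disable the curly quotes.
    '"': r"\textquotedbl{}",
    "'": r"\textquotesingle{}",
    "`": r"\textasciigrave{}",
}

# One alternation over all special characters, so every input character
# is examined exactly once and replacement text is never re-scanned.
_ESCAPE_PATTERN = re.compile("|".join(re.escape(ch) for ch in LATEX_ESCAPES))


def escape_latex(text: str) -> str:
    """Return `text` with the special characters replaced as listed above."""
    return _ESCAPE_PATTERN.sub(lambda m: LATEX_ESCAPES[m.group(0)], text)


if __name__ == "__main__":
    print(escape_latex("$a$ costs 50% & more"))
```

If you want to keep the curly quotes, dropping the last three entries from the table should be enough.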
doncherry
Philipp
  • Will a space suffice in place of the empty group? – Conley Owens Apr 13 '10 at 06:08
  • Also, what about the `---` that Mike Graham mentioned? – Conley Owens Apr 13 '10 at 06:53
  • Notice that `\textasciitilde` is actually really ugly because it’s too high and that is rarely what is wanted. Similarly, `\texttildelow` is too low. The best workaround that I know is posted here: http://stackoverflow.com/questions/256457/how-does-one-insert-a-backslash-or-a-tilde-into-latex/2037332#2037332 – Konrad Rudolph Apr 13 '10 at 10:48
  • @Conley Owens: No, a space won't suffice; it will be gobbled by the input processor. The empty group is the easiest solution; you could also check whether a space follows in the input text and insert a control space (`\ `, backslash–space) in that case. – Philipp Apr 13 '10 at 21:52
  • @Conley Owens: What do you mean by `---`? The dash is implemented as a ligature in (pdf)TeX. If you don't want “---” converted to “—”, you must replace it explicitly (e.g., with `-{}-{}-`). The opposite direction is unproblematic: if you use a Unicode-capable engine (XeTeX, LuaTeX) or load the `inputenc` package with an appropriate encoding, you can use typographic characters like — or “ directly. – Philipp Apr 13 '10 at 21:52
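
The `-{}-{}-` replacement from the last comment can be sketched in Python as well. The function name `break_dash_ligatures` is hypothetical. The sketch assumes it runs after the braces in the user input have already been escaped; otherwise the empty groups it inserts would be turned into `\{\}` by the brace substitutions.

```python
import re


def break_dash_ligatures(text: str) -> str:
    """Insert an empty group between consecutive hyphens.

    This keeps (pdf)TeX from merging "--" and "---" into en and em dashes.
    Assumption: `text` has already had its special characters escaped.
    """
    # A hyphen that is directly followed by another hyphen gets "{}" appended.
    return re.sub(r"-(?=-)", "-{}", text)


if __name__ == "__main__":
    print(break_dash_ligatures("wait---what"))
```

Single hyphens, as in `well-known`, are left unchanged.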