
So I've tried using tokenizers, but I can only figure out how to replace or remove single delimiters in Java.

For example, take this input: \box { Boxed words } {\boldface This line in bold. }

I want to be able to remove \box, and I also have to follow some other guidelines. The rules we are going to apply are very simple:

  1. Remove all commands: a backslash followed by one or more lowercase letters and terminated by a blank.
  2. Remove all braces: { or }.
  3. Substitute every math display (the characters between $ signs) with the words FORMULA 1, FORMULA 2, etc.
  4. The enumerate environment (a special command): \begin{enumerate} \item First item, \fer and only this. \item Second line \iterate and maybe more. \item Third. ... \end{enumerate} puts everything between \item markers in a new paragraph with a number. So the above should look like:
     1. First item and only this.
     2. Second line and maybe more.
     3. Third.
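The rules above map fairly directly onto regular expressions from java.util.regex. What follows is a minimal sketch, not the assignment's reference solution, and it assumes: a math display is a single $...$ pair with no nested $; enumerate environments are not nested; and rule 1 consumes the one blank that ends a command. The class name TexToText and the final whitespace clean-up are my own additions. Order matters: math is replaced first so commands inside $...$ disappear with it, and \item is handled before rule 1 would delete it. Because none of the rules mentions punctuation, this sketch keeps the comma in "First item, and only this." that the expected output above drops.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TexToText {

    // Rule 3 (assumed form): "$...$" containing no other '$'.
    private static final Pattern MATH = Pattern.compile("\\$[^$]*\\$");

    // Rule 4: body of an enumerate environment; DOTALL lets it span lines.
    private static final Pattern ENUMERATE = Pattern.compile(
            "\\\\begin\\{enumerate\\}(.*?)\\\\end\\{enumerate\\}", Pattern.DOTALL);

    // Rule 1: backslash, one or more lowercase letters, then one blank.
    private static final Pattern COMMAND = Pattern.compile("\\\\[a-z]+ ");

    // Rule 2: any opening or closing brace.
    private static final Pattern BRACE = Pattern.compile("[{}]");

    public static String convert(String tex) {
        String text = replaceMath(tex);              // rule 3 first
        text = numberItems(text);                    // rule 4 before rule 1 removes \item
        text = COMMAND.matcher(text).replaceAll(""); // rule 1
        text = BRACE.matcher(text).replaceAll("");   // rule 2
        // Cosmetic extra, not one of the stated rules: collapse spaces/tabs and trim.
        return text.replaceAll("[ \\t]+", " ").trim();
    }

    // Replaces each math display with FORMULA 1, FORMULA 2, ... in order.
    static String replaceMath(String text) {
        Matcher m = MATH.matcher(text);
        StringBuffer out = new StringBuffer();
        int n = 0;
        while (m.find()) {
            n++;
            m.appendReplacement(out, "FORMULA " + n);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns each \item of an enumerate environment into a numbered line.
    static String numberItems(String text) {
        Matcher m = ENUMERATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder list = new StringBuilder();
            int n = 0;
            // parts[0] is whatever precedes the first \item and is ignored.
            String[] parts = m.group(1).split("\\\\item", -1);
            for (int i = 1; i < parts.length; i++) {
                String item = parts[i].trim();
                if (!item.isEmpty()) {
                    n++;
                    list.append('\n').append(n).append(". ").append(item);
                }
            }
            list.append('\n');
            // quoteReplacement: items contain backslashes that appendReplacement would misread.
            m.appendReplacement(out, Matcher.quoteReplacement(list.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("\\box { Boxed words } {\\boldface This line in bold. }"));
    }
}
```

For the input in the question this prints "Boxed words This line in bold.".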
Avo
  • You are essentially asking how to write a parser for TeX, so that you can distinguish between data (the text) and markup instructions (the TeX commands). This is a large task and much too broad for StackOverflow. As this is a school project, you should be asking your teacher for help, not StackOverflow. – Jim Garrison Jan 21 '17 at 05:52
  • Okay, thanks. I thought it would be a small task, since I'm only in my second Java class and the professor gave us only a week to figure this out, even though he hasn't mentioned anything about parsing. – Avo Jan 21 '17 at 06:10

1 Answer


The sensible way (IMO) to do this is to use a stand-alone TeX-to-text (or TeX-to-HTML) converter. That should:

  • Save you a lot of work in implementing your own converter.
  • Do a better job ... assuming you pick a decent converter.
  • Insulate you from having to deal with a stream of special cases where your heuristic / pattern-based approach fails.
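The answer doesn't name a particular converter. Pandoc is one widely used tool that reads LaTeX and writes plain text; assuming it is installed and on the PATH, a Java program could delegate the whole job to it roughly as below. The class name PandocConvert and the file names input.tex and output.txt are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PandocConvert {

    // Builds the pandoc command line: read LaTeX (-f latex), write plain text (-t plain).
    public static List<String> command(Path texFile, Path txtFile) {
        return Arrays.asList("pandoc", "-f", "latex", "-t", "plain",
                texFile.toString(), "-o", txtFile.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes pandoc is installed and on the PATH; file names are hypothetical.
        Process p = new ProcessBuilder(command(Paths.get("input.tex"), Paths.get("output.txt")))
                .inheritIO()   // show pandoc's own error messages, if any
                .start();
        int exit = p.waitFor();
        System.out.println("pandoc exited with " + exit);
    }
}
```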
Stephen C
  • I'm supposed to write a Java program that acts as a simple converter from TeX files to text files: for example, removing every command in the TeX file (\box or \boldface) and removing all braces. – Avo Jan 21 '17 at 04:42
  • Why? Is this an exercise? If yes, then (surely) your teacher has explained to you how to use either parsing or pattern matching. I suggest you ask your teacher for some guidance. – Stephen C Jan 21 '17 at 04:43
  • I have asked the teacher, but he is very hard to understand and very vague about what he wants. I even asked a TA, and the TA says he doesn't understand the professor's instructions either. Can you explain how parsing or pattern matching would help with this exercise, or point me to a page that shows it? Thanks in advance. I've been trying to use what we have been covering for 7 hours now, with no luck. We've been covering strings and tokenizing, but I can only get rid of the symbols, not the words right after them, such as "\bold". – Avo Jan 21 '17 at 04:53
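On the specific problem of removing the word after the backslash: using only String and StringBuilder, you can scan the text one character at a time and, on seeing a backslash, keep skipping while the following characters are lowercase letters. Below is a rough sketch of that idea covering rules 1 and 2 only (the class name CommandStripper is hypothetical). Unlike rule 1 as written, it also removes a command that is not followed by a blank, such as \begin in \begin{enumerate}.

```java
public class CommandStripper {

    // Drops "\" + lowercase letters (plus one following blank, if present)
    // and every brace, by scanning the input character by character.
    public static String strip(String tex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tex.length()) {
            char c = tex.charAt(i);
            if (c == '\\') {
                int j = i + 1;
                // Skip the command name: one or more lowercase letters.
                while (j < tex.length() && tex.charAt(j) >= 'a' && tex.charAt(j) <= 'z') {
                    j++;
                }
                if (j > i + 1) {
                    // It was a command; also drop the blank that terminates it.
                    if (j < tex.length() && tex.charAt(j) == ' ') {
                        j++;
                    }
                    i = j;
                    continue;
                }
                // A backslash not followed by a lowercase letter is kept as-is.
                out.append(c);
                i++;
            } else if (c == '{' || c == '}') {
                i++; // rule 2: drop braces
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("\\box { Boxed words } {\\boldface This line in bold. }"));
    }
}
```

The same loop structure extends naturally: when the skipped name turns out to be "item" or "begin", you can react to it instead of discarding it.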