
Question: Is there an automatic way to add the line numbers of the original R Markdown source code to the formatted code portions of the HTML output produced by knitr?

Purpose: My ultimate goal is to be able to quickly move to parts of my source R Markdown code that I identify need editing while reviewing the HTML output. Using line numbers is the fastest way I know to do this, but I welcome hearing others' strategies.

Solutions I've tried:

  • The chunk option attr.source = '.numberLines' attractively adds line numbers to the code parts of the HTML output, but it does not use the source-code line numbers automatically (you must force that manually with .startFrom). Instead, the lines are renumbered at the beginning of each chunk and after each piece of output. In the following illustration, I've included .startFrom to force the numbering to start at 10, matching the source line number of test_data <- rnorm(10), which is the number I want to see. A practical solution, however, needs the starting number to be set automatically. Also, in the HTML output (shown beneath the code), the hist(test_data) line is renumbered from the same starting number, 10; I want it to be 12, as in the source code.
    [Screenshot: example source R Markdown, showing line numbers]
    [Screenshot: output HTML after knitting]
  • This question (How can I add line numbers that go across chunks in Rmarkdown?) is related, but the OP just needed any unique identifier for each line, not necessarily the line numbers of the source code, with the solution being sequential numbers unrelated to the source-code line numbers.

Considered option: I've considered preprocessing my code by running an initial script that will add line numbers as comments at the end of lines, but I'd prefer a solution that is contained within the main knitr file.

nuthatch

1 Answer


Reverted to this update based on your request

I'm glad you figured out the issue. I hadn't considered or tested the code for a chunk that only had a single line of code. However, based on your feedback, I've accounted for it now.

If you'd like it accounted for and would like to keep the color in the code, let me know. I'll add it back with the fix for single lines of code. (I'll stick to the ES6.)

This version uses the line numbers you'll see in the source pane of RStudio. You must use RStudio for this to work, because the R chunk reads the open document through the rstudioapi package. The following changes to the RMD are necessary:

  • Load library(jsonlite) and library(dplyr) (the R chunk uses toJSON() and the %>% pipe)
  • An R chunk, for which you can set include and echo to FALSE
  • A set of script tags outside of and after that chunk
  • A JS chunk (modified from my original answer)

The R chunk and the script tags need to be placed at the end of your RMD. The JS chunk can be placed anywhere.

The R chunk and script tags, **in order**!

Put me at the end of the RMD.

```{r ignoreMe,include=F,echo=F}

# get all lines of the RMD into an object for numbering; R => JS object
# (requires jsonlite for toJSON() and dplyr or magrittr for %>%)
cx <- rstudioapi::getSourceEditorContext()$contents
cxt <- data.frame(rws = cx, id = seq_along(cx)) %>% toJSON()

```

<script id='dat'>`r cxt`</script>
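
For orientation, here is a minimal sketch of what the JS side receives from those script tags. It assumes jsonlite's default row-wise encoding of a data frame (an array of objects with the `rws` and `id` keys from the R chunk). The three sample lines and the helper `lineAt` are hypothetical and only illustrate the shape.

```javascript
// Hypothetical payload for a 3-line RMD, in the shape toJSON() is assumed
// to produce for data.frame(rws = ..., id = ...): one object per source line.
const payload =
  '[{"rws":"---","id":1},{"rws":"title: Demo","id":2},{"rws":"---","id":3}]';

// In the browser this string would come from
// document.querySelector("#dat").innerHTML.
const rows = JSON.parse(payload);

// Hypothetical helper: text of source line n (1-based, like RStudio's gutter).
function lineAt(rows, n) {
  return rows[n - 1].rws;
}

console.log(lineAt(rows, 2));
```

Because the array is 0-based in JS while the source pane counts from 1, every lookup needs the `- 1` / `+ 1` adjustment that also appears in `finder`.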

The JS chunk

This chunk reads the R object that you made in the R chunk. Its placement does not matter: all of the R code executes at knit time, before this JS runs in the browser, regardless of where you place it in your RMD.

```{r gimme,engine="js",results="as-is",echo=F}

setTimeout(function(){
  scrCx = document.querySelector("#dat"); // call R created JSON*
  cxt = JSON.parse(scrCx.innerHTML);
  echoes = document.querySelectorAll('pre > code'); // capture echoes to #
  j = 0;
  for(i=0; i < echoes.length; i++){ // for each chunk
    txt = echoes[i].innerText;
    ix = finder(txt, cxt, j);  // call finder, which calls looker
    stxt = txt.replace(/^/gm, () => `${ix++} `); // for each line
    echoes[i].innerText = stxt;          // replace with numbered lines
    j = ix; // all indices should be bigger than previous numbering
  }
}, 300);

function looker(str) {  //get the first string in chunk echo
  k = 0;
  if(str.includes("\n")) {
    ind = str.indexOf("\n");
  } else {
    ind = str.length + 1;
  }
  sret = str.substring(0, ind);
  oind = ind; // start where left off
  while(sret === null || sret === "" || sret === " "){
    nInd = str.indexOf("\n", oind + 1);       // loop if string is blank!
    sret = str.substring(oind + 1, nInd);
    k++;
    ind = oind;
    oind = nInd;
  }
  return {sret, k};  // return string AND how many rows were blank/offset
}

function finder(txt, jstr, j) {
  txsp = looker(txt);
  xi = jstr.findIndex(function(item, idx){ // search JSON for a match
    // only accept a match at or after the last numbered line
    return idx >= j - 1 && item.rws === txsp.sret;
  })
  xx = xi - txsp.k + 1; // minus # of blank lines; add 1 (JS starts at 0)
  return xx;
}

```
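
The lookup idea behind `looker` and `finder` can also be sketched as two pure functions that run outside the browser. This is an illustrative variant, not the code above: the names `firstCodeLine` and `startLine`, the `from` parameter, and the sample `rows` are hypothetical, and it treats any whitespace-only line as blank (the chunk above only checks for `""` and `" "`).

```javascript
// Find the first non-blank line of an echoed block and how many blank
// lines precede it. Returns null when the block has no content at all.
function firstCodeLine(text) {
  const lines = text.split("\n");
  const offset = lines.findIndex((line) => line.trim() !== "");
  return offset < 0 ? null : { line: lines[offset], offset };
}

// Return the 1-based source line at which the echoed block starts,
// searching rows[from..] only; -1 if the block's first line is not found.
function startLine(text, rows, from) {
  const first = firstCodeLine(text);
  if (first === null) return -1;
  const idx = rows.findIndex((row, i) => i >= from && row.rws === first.line);
  return idx < 0 ? -1 : idx - first.offset + 1;
}

// Hypothetical source file in which two chunks start with the same line.
const rows = [
  "---", "```{r}", "x <- 1", "```", "",
  "```{r}", "x <- 1", "plot(x)", "```",
].map((rws, i) => ({ rws, id: i + 1 }));

console.log(startLine("x <- 1", rows, 0));          // first chunk
console.log(startLine("x <- 1\nplot(x)", rows, 3)); // second chunk
```

Passing a starting index is what lets two chunks that begin with the same line resolve to different source lines, and the explicit "not found" result keeps R output blocks (which never match a source line) from being mistaken for code.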

To validate the line numbers, you can use the object cx: for example, cx[102] should match line 102 in the HTML and line 102 in the source pane.

I've added comments so that you're able to understand the purpose of the code. However, if something's not clear, let me know.


ORIGINAL

What I think you're looking for is a line number for each line of the echoes, not necessarily anything else. If that's the case, add this to your RMD. If there are any chunks that you don't want to be numbered, add the chunk option include=F. The code still runs, but it won't show the content in the output. You may want to add that chunk option to this JS chunk.

```{r gimme,engine="js",results="as-is"}

setTimeout(function(){
  // number all lines that reflect echoes
  echoes = document.querySelectorAll('pre > code');
  j = 1;
  for(i=0; i < echoes.length; i++){ // for each chunk
    txt = echoes[i].innerText.replace(/^/gm, () => `${j++} `); // for each line
    echoes[i].innerText = txt;          // replace with numbered lines
  }
}, 300)


```

It doesn't matter where you put this (at the end, at the beginning). You won't get anything from this chunk if you try to run it inline. You have to knit for it to work.
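
The numbering step itself is just a multiline regex replace, which can be tried in any JS console without knitting. A minimal sketch, where `numberLines` is a hypothetical helper name and not part of the chunk above:

```javascript
// Prefix every line of `text` with a running number, starting at `start`.
// With the `m` flag, /^/ matches at the beginning of each line, so the
// callback fires once per line.
function numberLines(text, start) {
  let n = start;
  const numbered = text.replace(/^/gm, () => `${n++} `);
  return { numbered, next: n }; // `next` is the number for the following block
}

const first = numberLines("test_data <- rnorm(10)\nhist(test_data)", 1);
console.log(first.numbered);
// 1 test_data <- rnorm(10)
// 2 hist(test_data)

// Carrying `next` into the following block keeps the count running.
const second = numberLines("summary(test_data)", first.next);
console.log(second.numbered);
// 3 summary(test_data)
```

Returning the next number alongside the text plays the same role as the shared counter `j` in the chunk above.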

I assembled some arbitrary code to number with this chunk.

Kat
  • Thank you, Kat. I added more explanation in my bullet about the `.numberLines` option, which should clarify my goal better. The line numbers should be those of the source code; your solution numbers all the echoes sequentially within the HTML file. That you mentioned "echoes" helps me clarify further -- although I do need the code parts numbered, I don't need the R output (like the output from, say, `summary`) numbered. Your solution is a good answer to [this question](https://stackoverflow.com/questions/57867878/how-can-i-add-line-numbers-that-go-across-chunks-in-rmarkdown). – nuthatch Aug 18 '22 at 15:41
  • Okay, but I've got an update that will work for you. It still just numbers the echoes, but it will use the number lines assigned as source code. – Kat Aug 18 '22 at 22:50
  • I'm making progress -- the JSON is constructed and is within the script tags (I verified this at the end of the intermediate `md` file) -- but the line numbers don't appear in the output HTML. I suspect the JavaScript isn't running. After some research, as a test I tried changing that JS chunk option to `include=T`, but that causes RStudio to freeze (seems a process called `QtWebEngineProc` runs forever taking over resources). I've tried knitting this from RStudio on both Windows and Linux; same result. Added both `jsonlite` and `magrittr` (for the pipe in your R chunk) libraries. – nuthatch Aug 20 '22 at 00:26
  • Sorry I missed that library in my answer! I just created a new RMD and copied what was in my answer. It didn't run, but I only had to change `include=T`, which you already identified as something you tried. It ran as expected then. It looks like you're running Ubuntu, based on the screenshot. I have a few different OS I run through VirtualBox. On Ubuntu Bionic, I installed R and RStudio, then ran it there. (What a pain to set up!) It ran as expected. On Linux, QtWebEngineProcess is for X11. I have problems with it all the time. Try running updates; restarting the machine usually works for me. – Kat Aug 20 '22 at 05:54
  • No luck -- I must be missing something basic, and having run out of ideas I'm thinking of posting a new question to see if anyone knows why a JavaScript chunk like this runs as expected on some systems but on other systems doesn't complete and then freezes RStudio. Tried: update and restart, new Ubuntu 22.04 VM with up-to-date R and RStudio, install JRE, and commenting parts of the code to isolate the problem. I can run a simple JavaScript, though, and can avoid the freezing of RStudio by commenting out your call to your function "finder", but reached the limit of what I can do. – nuthatch Aug 23 '22 at 02:19
  • Okay, to test out if the problem is strictly the JS, add `eval=F` to the chunk options for the JS chunk, then knit. Open the resulting webpage in your browser. Go to developer tools (in your browser) -> console (in your browser). At the prompt, paste the JS code. (Don't use RStudio's developer tools for this.) Do you get any errors? If so, what? Does it change the webpage? (You can hit refresh on the browser; it will revert to the original webpage. Any changes through the console are temporary.) If it works here, then it rules out the JS being the issue, at least. – Kat Aug 23 '22 at 16:13
  • Thanks for this tip. No errors that I can see, but the web page isn't changed, either. What I do see is really high CPU usage by the browser (on Linux the process is called "file:// Content") and the browser tells me that the page is slowing it down. The CPU hogging is similar to the symptoms I see running from RStudio, except there the process that is taking up the CPU is QtWebEngineProc (Linux)/QtWebEngineProcess (Windows). So, something in the JS seems to be taking up lots of resources but neither RStudio nor a browser is telling us why. Appreciate all the coaching on this! – nuthatch Aug 25 '22 at 23:15
  • On a hunch... I made one last shot, to see if it helps. After running the JS through a check tool, it pointed out that one single line of this code was JS ES6. I'm not sure if that's the problem. I figured it was worth a shot; I've made another update to my answer. – Kat Aug 26 '22 at 02:54
  • I figured it out: solitary lines of code provide no newline to count, in which case `ind` is assigned the value -1, resulting in an infinite `while` loop. I inserted the line `if (ind < 0) {ind = str.length + 1};` after the line `ind = str.indexOf("\n");` in the `looker` function, and now it works! The new non-ES6 code didn't work for me, but the older code with the ES6 line did. I'll let you decide how you want to address the `ind` = -1 problem, and if you want to look at the non-ES6 code again or revert to the older one, and then I'll accept. (I learned `console.log` today, which helped!) – nuthatch Aug 26 '22 at 23:32
  • I updated the answer with the fix for single lines of code. I definitely didn't think to check for that issue! It's pretty awesome that you figured it out, though! That being said, I didn't use exactly what you've got there. I used `str.includes("\n")` to look for a match (with a boolean return). It does essentially exactly the same thing as what you did, but it is potentially more readable. I could update the answer again so that you can keep the code colors, just let me know. – Kat Aug 27 '22 at 00:39
  • Looks good. FYI: I discovered that there appears to be an RStudio limitation of about max 1k lines for the RMarkdown file. I can circumvent this by setting `eval=F` for the JS chunk, knitting the RMarkdown in RStudio as normal, opening/refreshing the HTML in a browser (not the RStudio window), and running the JS code in the developer-tools console in that browser. It runs quickly. Thanks for your patience, persistence, and first lessons on JS! This solution gives me what I wanted. – nuthatch Aug 28 '22 at 01:24
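
The single-line failure discussed in these comments comes down to how `indexOf` reports a missing newline. A small sketch of the behavior and of a guard in the spirit of the one nuthatch describes; `firstLine` is a hypothetical helper, not a function from the answer:

```javascript
// A one-line chunk contains no "\n", so indexOf returns -1 rather than
// a position. Using -1 as a substring boundary is what broke the loop.
const single = "hist(test_data)";
console.log(single.indexOf("\n")); // -1

// Guarded version: fall back to the whole string when there is no newline.
function firstLine(str) {
  const ind = str.indexOf("\n");
  return ind < 0 ? str : str.substring(0, ind);
}

console.log(firstLine(single));                  // hist(test_data)
console.log(firstLine("x <- 1\nhist(x)"));       // x <- 1
```

Checking `str.includes("\n")` first, as the updated answer does, is an equivalent way to express the same guard.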