
Running grep in a Cygwin terminal opened over Microsoft Remote Desktop on Windows Server 2012 R2 (presumably the same as running it natively):

Administrator@MYSERV /cygdrive/d/bin/beta
$ time grep -inowf matchfile_431184247462809.temp infile_431184247462809.temp > delme

real    1m40.568s
user    1m40.405s
sys     0m0.140s

Exact same command, on same files, executed when connected via Cygwin SSH:

Administrator@MYSERV /cygdrive/d/bin/beta
$ time grep -inowf matchfile_431184247462809.temp infile_431184247462809.temp > delmessh

real    0m0.148s
user    0m0.140s
sys     0m0.000s

The grep.exe executable is the same and the output files are identical, but the run time is a split second versus almost 2 minutes.

Given that Cygwin SSH runs under a special user setup, I tried ssh localhost from within the Remote Desktop session; the runtime was 1 minute and 40 seconds.

Is there some logical or illogical explanation for this? Are there any settings I can check on Windows Server 2012 that artificially slow down Remote Desktop processes?

Update: running C:\cygwin\bin\grep.exe from the Windows command prompt (cmd) is also instant, so the issue lies with the Cygwin terminal.

Update 2: I read that dead file shares in PATH can slow down the Bash terminal. Contrary to my initial hope, clearing the $PATH variable did not help, and I do not have any dead entries in PATH anyway.
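Checking PATH for dead entries can be scripted. The sketch below is a minimal bash helper; `check_path` is a hypothetical name, and it only tests whether each entry is an existing directory, which is the usual symptom of a dead share or a stale path.

```bash
#!/bin/bash
# check_path: hypothetical helper that prints each entry of a colon-separated
# search path (default: $PATH) that is not an existing directory.
check_path() {
  printf '%s\n' "${1:-$PATH}" | tr ':' '\n' | while IFS= read -r dir; do
    # Skip empty entries (e.g. from a trailing or doubled colon).
    [ -n "$dir" ] || continue
    [ -d "$dir" ] || printf 'missing: %s\n' "$dir"
  done
}

check_path
```

No output means every PATH entry resolved to a directory.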

Solution, kudos to @Paul Haldane: grep seems to be thrown off by a $LANG value of en_US.UTF-8, which is the default in Cygwin. This hits regex performance especially hard; running grep -F was also slower, but only by a factor of 4.

Here is a verification on a separate server:

$ echo $LANG
en_US.UTF-8

$ time grep -inowf matchfile_431184247462809.temp infile_431184247462809.temp > delme

real    1m56.425s
user    1m56.218s
sys     0m0.171s

$ LANG=''

$ time grep -inowf matchfile_431184247462809.temp infile_431184247462809.temp > delme2

real    0m0.286s
user    0m0.265s
sys     0m0.015s

$ diff delme delme2
** no difference **
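The same comparison can be repeated without changing LANG for the whole session by setting the locale on each grep command individually. This is a sketch assuming bash (it relies on the `time` keyword); `bench_locales` and the `out.*` result files are hypothetical names. LC_ALL is used because it overrides both LANG and the individual LC_* variables. If en_US.UTF-8 is not installed on a given system, the second run may silently fall back to the C locale.

```bash
#!/bin/bash
# bench_locales PATTERN_FILE INPUT_FILE
# Hypothetical helper: times the question's grep command once per locale
# and reports whether the two runs produced the same output.
bench_locales() {
  local patterns=$1 input=$2 loc
  for loc in C en_US.UTF-8; do
    echo "== LC_ALL=$loc"
    # The assignment applies to this grep process only, not to the shell.
    time LC_ALL=$loc grep -inowf "$patterns" "$input" > "out.$loc"
  done
  # Same check as the diff above: compare the two result files.
  if cmp -s out.C out.en_US.UTF-8; then
    echo "outputs identical"
  else
    echo "outputs differ"
  fi
}
```

Usage with the files from the question: `bench_locales matchfile_431184247462809.temp infile_431184247462809.temp`. The result files are written to the current directory.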
Muposat
  • Can you run it from the console? – Ryan Babchishin Aug 17 '16 at 21:39
  • As it's only slow inside an RDP session, I would guess it's a terminal buffering problem somehow (which of course would make more sense if you were outputting a lot of data to stdout) ... did you already try to run the command from the RDP session using a terminal multiplexer (GNU screen / tmux) instead? Would be interesting to see how it behaves then. – s1lv3r Aug 17 '16 at 21:57
  • @Ryan Babchishin I did run it from the DOS console (cmd), and it was very fast; see my update. I do not know how to time DOS commands, so "fast" is all I can say about the runtime. – Muposat Aug 17 '16 at 22:08
  • @s1lv3r it outputs 231 lines, too few to justify a two-minute delay. I also ran it from a Python subprocess, with and without the shell=True argument, and it is always slow when invoked from the Cygwin terminal. – Muposat Aug 17 '16 at 22:15
  • Are the locale settings (LANG etc.) the same between the RDP and SSH sessions? What does `echo $LANG` say in each case? There was a bug in grep which resulted in slow searches when LANG was set to something other than C (http://savannah.gnu.org/bugs/?14472). – Paul Haldane Aug 17 '16 at 22:33
  • @Paul Haldane thank you kind sir. This works. I even verified this on a separate server. – Muposat Aug 18 '16 at 13:51
  • @Muposat Can you post an answer explaining what worked and what was in the variable? I think this is a very interesting situation. – Ryan Babchishin Aug 18 '16 at 20:47
  • @Ryan Babchishin done – Muposat Aug 18 '16 at 21:22
  • @Muposat Thanks, I need to burn this one into my memory for the future – Ryan Babchishin Aug 18 '16 at 21:38
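Paul Haldane's check can be made a little more precise. What matters for grep's character handling is the effective LC_CTYPE, which POSIX resolves by taking the first of LC_ALL, LC_CTYPE and LANG that is set and non-empty. The bash sketch below (`show_locale_env` and `effective_ctype` are hypothetical names) can be run in both the RDP and the SSH session and the outputs compared; the standard `locale` command gives similar information.

```bash
#!/bin/bash
# Print the variables that determine the locale grep will use.
show_locale_env() {
  local v
  for v in LC_ALL LC_CTYPE LANG; do
    # ${!v} is bash indirect expansion: the value of the variable named by $v.
    printf '%s=%s\n' "$v" "${!v}"
  done
  printf 'effective LC_CTYPE: %s\n' "$(effective_ctype)"
}

# First non-empty value in POSIX precedence order; empty output means the
# implementation's default locale applies.
effective_ctype() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

show_locale_env
```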

2 Answers


Solution, kudos to @Paul Haldane:

There was a bug in grep which resulted in slow searches when LANG was set to something other than C. – Paul Haldane
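Clearing LANG affects every program started from that shell. A narrower option is to apply the C locale to grep alone, for example from ~/.bashrc. This is a minimal sketch assuming bash; `cgrep` is a hypothetical name.

```bash
# cgrep: hypothetical wrapper that runs grep in the C locale.
# LC_ALL=C affects only this grep process; LANG stays unchanged for
# everything else started from the shell.
cgrep() {
  LC_ALL=C grep "$@"
}

# Usage, with the file names from the question:
#   cgrep -inowf matchfile_431184247462809.temp infile_431184247462809.temp > delme
```

The trade-off: in the C locale, -i only folds ASCII letters and multibyte UTF-8 characters are matched byte by byte, so -i and -w results can differ on non-ASCII input. On the data above, the diff showed no difference.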

Muposat

For one, SSH adds overhead because of the encryption, but that doesn't explain the jump from seconds to minutes. What does explain it is the fact that Cygwin emulates a Unix environment, and emulation is slow. You can find more details on Wikipedia: https://en.wikipedia.org/wiki/Cygwin

This passage explains it pretty well:

The fork system call for duplicating a process is fully implemented, but it does not map well to the Windows API. For example, the copy-on-write optimization strategy could not be used.[5][6][7] As a result, Cygwin's fork is rather slow compared with Linux and others. (That overhead can often be avoided by replacing uses of the fork/exec technique with calls to the spawn functions declared in the Windows-specific process.h header).
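The fork overhead described in the quote can be measured directly, which also helps judge whether it applies here: a single grep invocation creates only one process, so the per-fork cost is paid once per command. The bash sketch below (`fork_cost` is a hypothetical name) compares launching an external program N times with running a shell builtin N times.

```bash
#!/bin/bash
# fork_cost [N]: hypothetical helper comparing N external command launches
# (each needs a new process) with N shell builtin calls (no new process).
fork_cost() {
  local n=${1:-200} i
  echo "external /bin/true x $n:"
  time (for ((i = 0; i < n; i++)); do /bin/true; done)
  echo "builtin : x $n:"
  time (for ((i = 0; i < n; i++)); do :; done)
}

fork_cost 200
```

Dividing the difference between the two timings by N gives a rough per-process cost for the current system.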

sebastienvg
  • I don't see how this explains or solves his problem. Does it? Maybe I just don't understand. I downvoted, but I'll upvote if you explain yourself. – Ryan Babchishin Aug 17 '16 at 21:00
  • @RyanBabchishin you are right, it doesn't solve the problem, but it does explain why things run slower on Cygwin. I would have commented instead of answering, but I can't yet. – sebastienvg Aug 17 '16 at 21:09
  • @RyanBabchishin it does not explain anything as I doubt fork is related to this. It was an honest attempt though. – Muposat Aug 18 '16 at 13:33