
I have put together a simple bash script that generates random four-word passphrases from a list of thousands of words. I am not sure whether it is really secure or efficient, but it is only for my personal use; let me know if you can think of any improvements. That is not the main point, though.

When I run it on my laptop, the input and output look like this:

$ time sh genpass
astrology cringe tingling massager

real    0m0.319s
user    0m0.267s
sys     0m0.077s

A second time:

$ time sh genpass
prankish askew siren fritter

real    0m0.318s
user    0m0.266s
sys     0m0.077s

Can be quite funny sometimes.

Anyway, this is the script:

# EDITABLES ###########################################
target="/path/to/my/wordList.txt" 
# END EDITABLES #######################################

getWordList() {
  case $1 in
    "verb")  mawk '/ing$|ed$|en$/ {print $2}' $target ;;
    "adjective")  mawk '/y$|ish$/ {print $2}' $target ;;
    "noun")  mawk '!/ing$|ed$|en$|y$|ish$/ {print $2}' $target ;; 
    *) printf "%s" "'${1}' is an invalid argument." && echo && exit 1
  esac
}

pickRandomLineNumber() {
  # Get the list in an array
  declare -a list_a=("${!1}")
  # How many items in the list
  local length="${#list_a[@]}"
  # Generate a random number between 1 and the number of items in the list
  local number=$RANDOM 
  let "number %= $length"
  # Print the word at random line
  printf "%s\n" ${list_a[@]} | mawk -v line=$number 'NR==line {print}' 
}

read -ra verbList <<< $( getWordList verb )
verb=$(pickRandomLineNumber verbList[@])

read -ra adjectiveList <<< $( getWordList adjective )
adjective=$(pickRandomLineNumber adjectiveList[@])

read -ra nounList <<< $( getWordList noun )
noun1=$(pickRandomLineNumber nounList[@])
noun2=$(pickRandomLineNumber nounList[@])

printf "%s %s %s %s\n" "${adjective}" "${noun1}" "${verb}" "${noun2}"

See how I have to create an array for each type of word? Three types, three arrays. So I thought about moving that code into a function, so that I would just have to call the function four times, once for each of my four words, with a different argument. I really thought it would be faster.

Here is the code change:

# EDITABLES ###########################################
target="/path/to/my/wordList.txt"  
# END EDITABLES #######################################

getWordList() {
  case $1 in
    "verb")  mawk '/ing$|ed$|en$/ {print $2}' $target ;;
    "adjective")  mawk '/y$|ish$/ {print $2}' $target ;;
    "noun")  mawk '!/ing$|ed$|en$|y$|ish$/ {print $2}' $target ;; 
    *) printf "%s" "'${1}' is an invalid argument." && echo && exit 1
  esac
}

pickRandomLineNumber() {
  # Get the list in an array
  declare -a list_a=("${!1}")
  # How many items in the list
  local length="${#list_a[@]}"
  # Generate a random number between 1 and the number of items in the list
  local number=$RANDOM 
  let "number %= $length"
  # Print the word at random line
  printf "%s\n" ${list_a[@]} | mawk -v line=$number 'NR==line {print}' 
}

#### CHANGE ####
getWord() {
  read -ra list <<< $( getWordList $1)
  local word=$(pickRandomLineNumber list[@])
  printf "%s" "${word}"
}

printf "%s %s %s %s\n" $(getWord adjective) $(getWord noun) $(getWord verb) $(getWord noun)

Now here is the input/output:

$ time sh genpass
overstay clench napping palace

real    0m0.403s
user    0m0.304s
sys     0m0.090s

And again:

$ time sh genpass
gainfully cameo extended nutshell

real    0m0.369s
user    0m0.304s
sys     0m0.090s

The difference in timing is not a big deal, but I expected the second version to be faster, not slower.

So do you have any idea why the second script is slower than the first?

Jeanmichel Cote
  • Of course, "astrology" and "overstay" are not adjectives. Your rules need some tweaking. – tripleee Sep 11 '16 at 17:19
  • It does not really matter; the randomness is what matters. I could have skipped that "verb - adj - noun" thing and just let the script output 4 totally random words, but I thought it would be sweet. Either I adjust the rules or I get rid of them. It would certainly be faster to get rid of them. But not as sweet... – Jeanmichel Cote Sep 11 '16 at 17:23
  • 1. If you need to call `awk` so many times, then just script the whole thing in `awk`. 2. Anyway, if you're parsing the same file several times, you're certainly doing something wrong. 3. Thou shalt not use `awk` to retrieve a random element from an array. That's really silly. You have direct access to any field of the array. 4. If you feel you need “references” (that's what you're doing with `declare -a list_a=("${!1}")`), then either your design is wrong, or you're just using the wrong language for the job: shell scripts shouldn't use such features. – gniourf_gniourf Sep 11 '16 at 17:25
  • If you want to properly optimize this, inline the arrays and get rid of all the Awk code. Printing the entire array to get the *n*-th element is particularly wasteful, when you could simply `printf '%s\n' "${array[number]}"`. – tripleee Sep 11 '16 at 17:26
  • In terms of localizing the performance penalty of the second version, my guess as to the culprit would be the (needless!) copying of the input array with the variable indirection `${!1}`. – tripleee Sep 11 '16 at 18:27
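
The suggestions in the comments above can be sketched in plain bash: read the word list once, sort the words into three arrays, and index each array directly instead of piping it through awk. This is only a sketch under stated assumptions: it needs bash (arrays and `+=`), it assumes one word per line (the original script prints `$2`, so the real list may be laid out differently), the inline sample list is taken from the outputs shown in the question, and `$RANDOM` is not a cryptographically strong source.

```shell
#!/usr/bin/env bash
# Sketch: one pass over the list, no awk, no array copies, no subshells.
verbs=() adjectives=() nouns=()

# Classify each word by its ending, mirroring the rules in getWordList.
# Sample data only; for a real list, end the loop with: done < "$target"
while IFS= read -r w; do
  case $w in
    *ing|*ed|*en) verbs+=("$w") ;;
    *y|*ish)      adjectives+=("$w") ;;
    *)            nouns+=("$w") ;;
  esac
done <<'EOF'
askew
astrology
cameo
clench
cringe
extended
fritter
gainfully
massager
napping
nutshell
overstay
palace
prankish
siren
tingling
EOF

# Index the arrays directly; RANDOM % count yields 0 .. count-1,
# which matches bash's zero-based array indices.
adjective=${adjectives[RANDOM % ${#adjectives[@]}]}
noun1=${nouns[RANDOM % ${#nouns[@]}]}
verb=${verbs[RANDOM % ${#verbs[@]}]}
noun2=${nouns[RANDOM % ${#nouns[@]}]}

printf '%s %s %s %s\n' "$adjective" "$noun1" "$verb" "$noun2"
```

Compared with the second script in the question, this reads the file once instead of once per word and starts no external processes, which is where most of the measured time is likely to go.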

1 Answer


You have more code doing more stuff, all of it unnecessary. Here's how to do what you are trying to do:

$ cat tst.awk
function grw(arr) {     # Get Random Word
    return arr[int(rand() * length(arr)) + 1]
}

{
    if ( /(ing|ed|en)$/ ) verbs[++numVerbs] = $0
    else if ( /(y|ish)$/ ) adjectives[++numAdjectives] = $0
    else nouns[++numNouns] = $0
}

END {
    srand()
    printf "%s %s %s %s\n", grw(adjectives), grw(nouns), grw(verbs), grw(nouns)
}

$ awk -f tst.awk words
overstay clench siren clench
$ awk -f tst.awk words
prankish nutshell tingling cameo
$ awk -f tst.awk words
astrology clench tingling palace

The above was run against this "words" file created from the sample output you provided in your question:

$ cat words
askew
astrology
cameo
clench
cringe
extended
fritter
gainfully
massager
napping
nutshell
overstay
palace
prankish
siren
tingling
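
One portability note, offered as an assumption rather than something tested here: not every awk accepts `length()` on an array, so if `length(arr)` fails with your awk, a possible variant is to pass the element count explicitly. The script already tracks `numVerbs`, `numAdjectives` and `numNouns`, so nothing else changes. The two-argument `grw(arr, n)` and the `prog` shell variable below are illustrative only, not part of the script above.

```shell
# Sketch: same script, but grw() receives the element count as n
# instead of calling length() on the array.
prog='
function grw(arr, n) {     # Get Random Word from arr[1..n]
    return arr[int(rand() * n) + 1]
}
{
    if ( /(ing|ed|en)$/ ) verbs[++numVerbs] = $0
    else if ( /(y|ish)$/ ) adjectives[++numAdjectives] = $0
    else nouns[++numNouns] = $0
}
END {
    srand()
    printf "%s %s %s %s\n", grw(adjectives, numAdjectives), grw(nouns, numNouns), grw(verbs, numVerbs), grw(nouns, numNouns)
}'

# Four sample words: one adjective, one verb, two nouns.
passphrase=$(printf '%s\n' astrology clench tingling palace | awk "$prog")
echo "$passphrase"
```

With this tiny input the first word is always "astrology" and the third is always "tingling", because each is the only member of its category; the two nouns vary between runs.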
Ed Morton
  • That certainly does the job, thanks. And so much quicker. I need to get more into awk, I guess. Could you explain what `srand()` is for here, please? Does it just mean that you generate a seed just before calling the `grw` function four times? Does it mean that we use the same seed all four times? – Jeanmichel Cote Sep 11 '16 at 22:44
  • `time awk -f genpass.awk ~/wordList.txt` -> stuffy grandpa unblessed skydiver real 0m0.045s user 0m0.038s sys 0m0.005s – Jeanmichel Cote Sep 11 '16 at 22:46
  • It generates a seed for the first call to rand() based on the current time. The 2nd call to rand() uses as a seed the output of the first call to rand(), etc. Without the srand(), every time you call awk the first rand() would start with the same seed value, and so every call to awk would generate the same sequence of "random" numbers. Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn awk, and learn immediately that the shell is an environment from which to call tools, with a language to sequence those calls; it is NOT a tool to manipulate text, that's what awk is for. – Ed Morton Sep 12 '16 at 03:56