
I have a large JSON file that I am using jq to pare down to only those elements I need. I have that working, but some of the values are strings in all caps. Unfortunately, while jq has ascii_downcase and ascii_upcase, it does not have a built-in function for uppercasing only the first letter of each word.

I need to perform this only on brand_name and generic_name, while ensuring that the manufacturer name also has the first letter of each word capitalized, with the exception of things like LLC, which should remain fully capitalized.

Here's my current jq statement:

jq '.results[] | select(.openfda.brand_name != null or .openfda.generic_name != null or .openfda.rxcui != null) | select(.openfda|has("rxcui")) | {brand_name: .openfda.brand_name[0], generic_name: .openfda.generic_name[0], manufacturer: .openfda.manufacturer_name[0], rxcui: .openfda.rxcui[0]}' filename.json > newfile.json

This is a sample output:

{
  "brand_name": "VELTIN",
  "generic_name": "CLINDAMYCIN PHOSPHATE AND TRETINOIN",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

I need the output to be:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
peak
kittonian

2 Answers

Split at space characters to get an array of words, then split each word at the empty string to get an array of characters. For each inner array, apply ascii_downcase to all elements but the first (the input is already all caps, so the first character can stay as is), then put everything back together using add on the inner arrays and join with a space character on the outer array.

(.brand_name, .generic_name) |= (
  (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
)

Output:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
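
For reference, here is a minimal sketch of how this could be combined with the filter from the question in a single jq call. The inline `sample` document is a hypothetical stand-in for filename.json, reconstructed from the question's sample output, and the question's first select is left out for brevity. It assumes the values are already all caps, since only the tail of each word is lowercased.

```shell
# Hypothetical one-record sample; the real file is assumed to share this shape.
sample='{"results":[{"openfda":{"brand_name":["VELTIN"],"generic_name":["CLINDAMYCIN PHOSPHATE AND TRETINOIN"],"manufacturer_name":["Almirall, LLC"],"rxcui":["882548"]}}]}'

result=$(printf '%s' "$sample" | jq -c '
  .results[]
  | select(.openfda | has("rxcui"))
  # Build the trimmed-down object exactly as in the question.
  | {brand_name: .openfda.brand_name[0],
     generic_name: .openfda.generic_name[0],
     manufacturer: .openfda.manufacturer_name[0],
     rxcui: .openfda.rxcui[0]}
  # Lowercase everything but the first character of each space-separated word.
  | (.brand_name, .generic_name) |= (
      (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
    )
')

printf '%s\n' "$result"
```

A record that lacks brand_name or generic_name would yield null for that field, and dividing null by a string raises an error in jq, so such records may need a guard (for example an extra select) before the update.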


To exclude certain words from processing, test for them with an if condition and pass them through unchanged:

map_values((. / " ") | map(
  if IN("LLC", "AND") then .
  else . / "" | .[1:] |= map(ascii_downcase) | add end
) | join(" "))

Output:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate AND Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

pmf
  • This works perfectly, except for the second part of my question where I need to exclude certain strings from being included (i.e. LLC) – kittonian Jun 14 '22 at 21:20
  • @kittonian How would you want to handle those cases. If the value was `ABC, LLC`, do you want to ignore the **word** `LLC`, leaving you with `Abc, LLC`, or do you want to ignore the **whole value** if it contains `LLC`, leaving it as is? – pmf Jun 14 '22 at 21:26
  • That's a very valid question. What I would like to do in a perfect world is be able to have a list of strings to ignore. This way, if I notice something was incorrectly modified I can just add it to the list and rerun. – kittonian Jun 14 '22 at 21:40
  • @kittonian Understood, but ignore a word or ignore a value? – pmf Jun 14 '22 at 21:47
  • I'm not sure the difference in what you are asking but if it's John, LLC I want to only ignore LLC, likewise if it's ABC, LLC I want to ignore both ABC and LLC so I would need to add both of those to my list. – kittonian Jun 14 '22 at 21:49
  • Also, it seems your solution doesn't catch everything. Any words that are hyphenated without a space don't modify correctly and for some reason, some values don't get modified at all (can't figure out why) – kittonian Jun 14 '22 at 21:49
  • @kittonian Your examples just showed space-separated words. To split on more than one option, use `splits` with a regex. For "space or hyphen", a character class would suffice: `splits("[ -]")`. – pmf Jun 14 '22 at 21:56

Suppose we are given an array of words that are to be left as is, e.g.:

def exceptions: ["LLC", "USA"];

We can then define a capitalization function as follows:

# Capitalize all the words in the input string other than those specified by exceptions:
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length>0)]
  | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:] |ascii_downcase) end)
  | join("");

For example, given "abc-DEF ghi USA" as input, the result would be "Abc-Def Ghi USA".
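
As a rough sketch of how these definitions might be wired into an actual run, the snippet below keeps them in a shell variable and applies capitalize to the three name fields. The variable names (`defs`, `record`, `demo`, `result`) and the all-caps manufacturer value are made up for illustration; INDEX requires jq 1.6 or later.

```shell
# The two definitions from above, kept in a shell variable (hypothetical name)
# so they can be prepended to any jq program.
defs='
def exceptions: ["LLC", "USA"];
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length>0)]
  | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:]|ascii_downcase) end)
  | join("");
'

# The example string from the text above.
demo=$(jq -n -r "$defs"'"abc-DEF ghi USA" | capitalize')

# Hypothetical record; the manufacturer is upper-cased here to exercise the LLC exception.
record='{"brand_name":"VELTIN","generic_name":"CLINDAMYCIN PHOSPHATE AND TRETINOIN","manufacturer":"ALMIRALL, LLC","rxcui":"882548"}'
result=$(printf '%s' "$record" | jq -c "$defs"'(.brand_name, .generic_name, .manufacturer) |= capitalize')

printf '%s\n%s\n' "$demo" "$result"
```

Because the split is on word boundaries rather than on spaces, punctuation and hyphens are kept as separate pieces and survive the round trip. Adding another word to exceptions is then enough to protect it on the next run.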

peak