Parsing Firefox bookmarks using a regular expression

Question

I tried to parse a Firefox bookmarks file (the exported JSON version) with these attempts:

cat boo.json | grep '\"uri\"\:\"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}\"'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}"'

I tried a few others as well, but they all fail. The JSON bookmarks file looks like this:

.........."uri":"http://www.google.com/?"......"uri":"http://stackoverflow.com/"

So, the output should be like this:

"uri":"http://www.google.com/?"
"uri":"http://stackoverflow.com/"

What is missing from my regular expression?

UPDATE:

URLs in the bookmarks file are followed by one of these special characters:

/, ex: "uri":"http://stackoverflow.com/"

", ex: "uri":"http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression"

}, ex: "uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"}

With this modified regular expression:

$ egrep -o "(http|https)://([^ ]*).(*\/)"  boo.json

Result:

http://fr.fxfeeds.mozilla.com/fr/firefox/headlines.xml"},{"name":"livemark/siteURI","flags":0,"expires":4,"mimeType":null,"type":3,"value":"http://www.lemonde.fr/"}],"type":"text/x-moz-place-container","children":[]}]},{"index":2,"title":"Tags","id":4,"parent":1,"dateAdded":1344432674984000,"lastModified":1344432674984000,"type":"text/
http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression","charset":"UTF-8"},{"index":29,"title":"adrusi/
http://stackoverflow.com/
...

But this still doesn't give me only the URLs.

I'm unfamiliar with JSON format but from the very small snippet you posted it LOOKS like it'd be a very brief, simple awk script to pull out the URLs. If you posted a bit more sample input (say a 10-line file) and expected output, I'd take a look. — Ed Morton, Dec 08 '12 at 09:03

score 0 · Answer 1 · answered Oct 30 '12 at 22:59

Have you tried JSON.sh? It works great!

https://github.com/dominictarr/JSON.sh

Nicholas Terry

What do you mean it doesn't work? Did you use it the way the README says to? It'll still need some parsing, but this will get you a standard output for any JSON, from which you can extract values. – Nicholas Terry Oct 31 '12 at 16:21

score 0 · Answer 2 · answered Dec 28 '19 at 07:24

I use this regex to extract URLs; it works great:

cat *.html | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq
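For the JSON export in the question, the same `grep -Eo` idea can be anchored on the `"uri"` key rather than on a whitelist of URL characters. The following is only a minimal sketch: it assumes every bookmark URL is stored as a double-quoted `"uri"` value with no escaped quotes inside it, and `extract_uris` is a hypothetical helper name. The sample input is the snippet shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: print every "uri":"..." pair found on stdin, one per line.
# Assumption: URL values never contain an escaped double quote.
extract_uris() {
    # -E  extended regex, so ? works as a quantifier without a backslash
    # -o  print only the matched text instead of the whole (very long) line
    # [^"]* stops at the closing quote, so a match cannot run into the next field
    grep -Eo '"uri":"https?://[^"]*"'
}

# Sample input taken from the question (the export is essentially one long line).
printf '%s\n' '.........."uri":"http://www.google.com/?"......"uri":"http://stackoverflow.com/"' | extract_uris
```

For the sample line this prints exactly the two lines given as the desired output in the question. Against a real export it could be run as `extract_uris < boo.json`. As a side note, the attempts in the question use grep's basic syntax, where a `^` in the middle of the pattern is a literal caret and `+` and `{2,3}` are literal characters too, and they omit `-o`, so they either match nothing or print the whole line.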

BuGaU0

score -1 · Accepted Answer · answered Dec 08 '12 at 02:25

Jeff Atwood posted an article on the problem with URLs. With the regular expression he proposes there, I managed to extract all the URLs from my Firefox bookmarks:

egrep -o "\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"  my-bookmark.json
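The expression above only matches `http://`, while the question also shows `https://` bookmarks. One possible variant is sketched below. It is not Atwood's exact expression: it reuses the same character classes, makes the `s` optional, and removes duplicates with `sort -u`. The name `extract_urls` is a hypothetical helper, and the sample input is made up from URLs that appear in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the unique http/https URLs found on stdin.
# Same character classes as the answer's pattern, with "s" made optional.
extract_urls() {
    # The last bracket expression excludes trailing punctuation such as . , ; : ! ?
    # LC_ALL=C keeps the sort order independent of the user's locale.
    grep -Eo 'https?://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]' | LC_ALL=C sort -u
}

# Sample input: three "uri" fields, one of them repeated.
printf '%s\n' '{"uri":"http://stackoverflow.com/"},{"uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"},{"uri":"http://stackoverflow.com/"}' | extract_urls
```

With a real export the call would be `extract_urls < my-bookmark.json`. Because the pattern is not tied to the `"uri"` key, it also picks up URLs stored in other fields, such as the `"value"` entries visible in the question's output.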
