Problem: data ingested from one particular source into our Accumulo instance is not being returned to our client application by a subset of our search interfaces.

When we use search method "A" we get results, but when we use search method "B" we do not.

I have a hunch that method "A" and method "B" are actually hitting different tables.
To prove that, I need a way to hook into the stream of data coming into the tables and grep for data indicating the source of the message. I can't do anything programmatically because that would require taking the system down, which isn't an option right now.

I see from the manual that there are 'grep' and 'egrep' commands. The help text for grep says not to use it for regex, and I can't seem to get egrep to return a record I know to be in the database.

example:
A record in the exchange contains the line <gml:pos>23.05507 113.5268</gml:pos>. To egrep for it, I log into the Accumulo shell, select the table in which the record exists, then enter the following: egrep ^:pos>23.*113.*

Nothing comes back.
I've tried all the variants of the command I could think of (quoting, not quoting, searching only for 23.*, etc).

What am I missing here?

Scott Solmer
snerd

2 Answers

Your regex is incorrect. Yours starts with ^:pos, which means "match where :pos starts the record".

You need to change it to something like:

egrep "^.*pos>23.*113.*"

This says "match any number of any characters from the beginning up to pos>23, then match any number of characters until I see 113, then match any remaining characters". The key is the .* between ^ and pos.
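The effect of the leading anchor can be checked outside the shell. The following is a minimal sketch that uses Python's `re` module as a stand-in for the shell's regex engine (an assumption; the anchor behaves the same way in common regex dialects). The helper names `matches_anywhere` and `matches_whole` are hypothetical and exist only for this illustration.

```python
import re

# The sample record from the question.
record = "<gml:pos>23.05507 113.5268</gml:pos>"

original = r"^:pos>23.*113.*"    # ":pos" must sit at the very start
corrected = r"^.*pos>23.*113.*"  # ".*" absorbs the "<gml:" prefix


def matches_anywhere(pattern, text):
    """True if the pattern matches some substring of the text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern, text):
    """True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


for name, pattern in (("original", original), ("corrected", corrected)):
    print(name, matches_anywhere(pattern, record), matches_whole(pattern, record))
```

The original pattern fails under both substring and whole-string matching because the record starts with `<gml`, not `:pos`. The corrected pattern passes under both, since its leading and trailing `.*` cover the rest of the record.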

Do note, however, that this will match anything that starts with pos>23 and later contains 113. For instance:

root@accumuloinstance testTable> insert "<gml:pos>23.05507 113.5268</gml:pos>" "" "" ""
root@accumuloinstance testTable> insert "<gml:pos>232.05507 113.5268</gml:pos>" "" "" ""
root@accumuloinstance testTable> insert "<gml:pos>232XXX113.5268</gml:pos>" "" "" ""
root@accumuloinstance testTable> egrep "^.*pos>23.*113.*"
<gml:pos>23.05507 113.5268</gml:pos> : []
<gml:pos>232.05507 113.5268</gml:pos> : []
<gml:pos>232XXX113.5268</gml:pos> : []

Don't know exactly what you're looking for, but you might want to try:

root@accumuloinstance testTable> egrep "^.*pos>23[.].* 113[.].*"
<gml:pos>23.05507 113.5268</gml:pos> : []

This matches 23.xxxx 113.xxxxx, so you get exactly 23.something followed by a space and 113.something.
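The difference between the loose and the tightened pattern can be reproduced on the three test rows above. This is a small sketch in Python, with `re.fullmatch` assumed as a stand-in for how the shell applies the regex; the names `rows`, `loose`, `strict`, and `matching_rows` are hypothetical.

```python
import re

# The three rows inserted into testTable in the example above.
rows = [
    "<gml:pos>23.05507 113.5268</gml:pos>",
    "<gml:pos>232.05507 113.5268</gml:pos>",
    "<gml:pos>232XXX113.5268</gml:pos>",
]

loose = r"^.*pos>23.*113.*"             # "." matches any character
strict = r"^.*pos>23[.].* 113[.].*"     # "[.]" matches a literal dot only


def matching_rows(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [row for row in candidates if compiled.fullmatch(row)]


print(len(matching_rows(loose, rows)))   # all three rows match
print(matching_rows(strict, rows))       # only the 23.05507 row matches
```

Writing `[.]` (or `\.`) forces a literal dot after 23 and 113, and the literal space before 113 rules out the row where the two numbers run together.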

If that doesn't give you the result you're looking for, try egrep ".*". If you don't get any records back, then either the table is empty or your visibilities don't match.

FuriousGeorge
  • w00t - t/y! I'll try this when I get into the office tomorrow morning. – snerd Jun 16 '14 at 07:05
  • No go. I attempted egrep "^.*pos>22[.].* 113[.].*" against a table containing a record with the line 22.31997 113.3154. The command egrep ".*" returns a plethora of results. – snerd Jun 16 '14 at 17:19
  • How much data is in your table? How is it laid out? Can you do a scan and confirm that the row is actually there? – FuriousGeorge Jun 16 '14 at 20:08
  • Do you *think* that row is there, or do you *know* it's there (i.e. can you do a scan and find it)? Can you egrep for any of the other values besides that particular value? – FuriousGeorge Jun 16 '14 at 20:12
  • as far as I know there's millions of entries in the system. i know the data is there because I did a scan on the table and picked the first entry I found to craft an egrep query. – snerd Jun 16 '14 at 20:33
  • That's very odd. I guess start broad and work your way narrower. So first do an `egrep ".*05507.*"`. You should get lots of records. Then add more to the regex until it stops working and figure out why it stopped working. – FuriousGeorge Jun 16 '14 at 20:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55731/discussion-between-user1146334-and-david-daedalus). – FuriousGeorge Jun 16 '14 at 20:56
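The "start broad, then narrow" approach from the comments can be sketched offline against a copy of one record. This is only an illustration: the record below is hypothetical (it uses a comma between the coordinates, which is an assumption and not something confirmed in the thread), `first_failing` is a made-up helper, and Python's `re.fullmatch` stands in for the shell's matcher.

```python
import re

# Hypothetical record: a comma instead of a space between the coordinates.
record = "<gml:pos>22.31997,113.3154</gml:pos>"

# Patterns ordered from broadest to narrowest.
steps = [
    r".*31997.*",
    r".*pos>22[.].*",
    r".*pos>22[.].*113[.].*",
    r".*pos>22[.].* 113[.].*",  # additionally requires a literal space
]


def first_failing(text, patterns):
    """Return the first pattern that does not match the whole text, or None."""
    for pattern in patterns:
        if re.fullmatch(pattern, text) is None:
            return pattern
    return None


print(first_failing(record, steps))
```

Whatever was added in the first failing step (here, the literal space) is the part of the regex that disagrees with the stored data.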

The leading ^ in your regex appears to be anchoring ":pos" to the beginning of the line. Since the line begins with "<gml", the anchored pattern never matches; without the ^ it does:

$ echo '<gml:pos>23.05507 113.5268</gml:pos>' | egrep '^:pos>23.*113.*'
$ echo '<gml:pos>23.05507 113.5268</gml:pos>' | egrep ':pos>23.*113.*'
<gml:pos>23.05507 113.5268</gml:pos>
brownlee