1

I'm reading a book about Postgres and there is an exercise with regex. So, I created table and loaded csv into it.

CREATE TABLE crime_reports (
    crime_id bigserial PRIMARY KEY,
    date_1 timestamp with time zone,
    date_2 timestamp with time zone,
    street varchar(250),
    city varchar(100),
    crime_type varchar(100),
    description text,
    case_number varchar(50),
    original_text text NOT NULL
);

COPY crime_reports (original_text)
FROM 'C:\YourDirectory\crime_reports.csv'
WITH (FORMAT CSV, HEADER OFF, QUOTE '"');

Here is CSV:

"4/16/17-4/17/17
2100-0900 hrs.
46000 Block Ashmere Sq.
Sterling
Larceny: The victim reported that a
bicycle was stolen from their opened
garage door during the overnight hours.
C0170006614"
"4/8/17
1600 hrs.
46000 Block Potomac Run Plz.
Sterling
Destruction of Property: The victim
reported that their vehicle was spray
painted and the trim was ripped off while
it was parked at this location.
C0170006162"
"4/4/17
1400-1500 hrs.
24000 Block Hawthorn Thicket Ter.
Sterling
Larceny: The complainant reported that
multiple windows were stolen from this home
under construction. C0170006079"
"04/10/17
1605 hrs.
21800 block Newlin Mill Rd.
Middleburg
Larceny: A license plate was reported
stolen from a vehicle.
SO170006250"
"04/09/17
1200 hrs.
470000 block Fairway Dr.
Sterling
Destruction of Property: Unknown
subject(s) wrote graffiti on a sign in the
area.
SO170006211"

And when I'm trying to execute this query I have a lot of NULL values returned.

SELECT
    regexp_match(original_text, '(?:C0|SO)[0-9]+') AS case_number,
    regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}') AS date_1,
    regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') AS crime_type,
    regexp_match(original_text, '(?:Sq.|Plz.|Dr.|Ter.|Rd.)\n(\w+ \w+|\w+)\n')
        AS city
FROM crime_reports;

enter image description here

If I use explicit string the query works well:

SELECT  regexp_match(
'4/16/17-4/17/17
2100-0900 hrs.
46000 Block Ashmere Sq.
Sterling
Larceny: The victim reported that a
bicycle was stolen from their opened
garage door during the overnight hours.
C0170006614',   '\n(?:\w+ \w+|\w+)\n(.*):');

enter image description here

But in my case there are NULL values:

SELECT crime_id,
       regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}') AS date_1,
       CASE WHEN EXISTS (SELECT regexp_matches(original_text, '-(\d{1,2}\/\d{1,2}\/\d{1,2})'))
            THEN regexp_match(original_text, '-(\d{1,2}\/\d{1,2}\/\d{1,2})')
            ELSE NULL
            END AS date_2,
       regexp_match(original_text, '\/\d{2}\n(\d{4})') AS hour_1,
       CASE WHEN EXISTS (SELECT regexp_matches(original_text, '\/\d{2}\n\d{4}-(\d{4})'))
            THEN regexp_match(original_text, '\/\d{2}\n\d{4}-(\d{4})')
            ELSE NULL
            END AS hour_2,
       regexp_match(original_text, 'hrs.\n(\d+ .+(?:Sq.|Plz.|Dr.|Ter.|Rd.))') AS street,
       regexp_match(original_text, '(?:Sq.|Plz.|Dr.|Ter.|Rd.)\n(\w+ \w+|\w+)\n') AS city,
       regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') AS crime_type,
       regexp_match(original_text, ':\s(.+)(?:C0|SO)') AS description,
       regexp_match(original_text, '(?:C0|SO)[0-9]+') AS case_number
FROM crime_reports;

This is what I have: This is what I have

What should be:

enter image description here

So what am I doing wrong?

I dont' know what to do

By the way I remembered that when I execute:

SELECT original_text FROM crime_reports;

I receive this: enter image description here

Instead of this like in a book: enter image description here

And it doesn't display popup msg with a full text. Does that matter?

qwerty 1999
  • 121
  • 7

0 Answers0