I'm reading a book about Postgres and there is an exercise with regex. So, I created table and loaded csv into it.
CREATE TABLE crime_reports (
crime_id bigserial PRIMARY KEY,
date_1 timestamp with time zone,
date_2 timestamp with time zone,
street varchar(250),
city varchar(100),
crime_type varchar(100),
description text,
case_number varchar(50),
original_text text NOT NULL
);
COPY crime_reports (original_text)
FROM 'C:\YourDirectory\crime_reports.csv'
WITH (FORMAT CSV, HEADER OFF, QUOTE '"');
Here is CSV:
"4/16/17-4/17/17
2100-0900 hrs.
46000 Block Ashmere Sq.
Sterling
Larceny: The victim reported that a
bicycle was stolen from their opened
garage door during the overnight hours.
C0170006614"
"4/8/17
1600 hrs.
46000 Block Potomac Run Plz.
Sterling
Destruction of Property: The victim
reported that their vehicle was spray
painted and the trim was ripped off while
it was parked at this location.
C0170006162"
"4/4/17
1400-1500 hrs.
24000 Block Hawthorn Thicket Ter.
Sterling
Larceny: The complainant reported that
multiple windows were stolen from this home
under construction. C0170006079"
"04/10/17
1605 hrs.
21800 block Newlin Mill Rd.
Middleburg
Larceny: A license plate was reported
stolen from a vehicle.
SO170006250"
"04/09/17
1200 hrs.
470000 block Fairway Dr.
Sterling
Destruction of Property: Unknown
subject(s) wrote graffiti on a sign in the
area.
SO170006211"
And when I'm trying to execute this query I have a lot of NULL values returned.
SELECT
regexp_match(original_text, '(?:C0|SO)[0-9]+') AS case_number,
regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}') AS date_1,
regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') AS crime_type,
regexp_match(original_text, '(?:Sq.|Plz.|Dr.|Ter.|Rd.)\n(\w+ \w+|\w+)\n')
AS city
FROM crime_reports;
If I use explicit string the query works well:
SELECT regexp_match(
'4/16/17-4/17/17
2100-0900 hrs.
46000 Block Ashmere Sq.
Sterling
Larceny: The victim reported that a
bicycle was stolen from their opened
garage door during the overnight hours.
C0170006614', '\n(?:\w+ \w+|\w+)\n(.*):');
But in my case there are NULL values:
SELECT crime_id,
regexp_match(original_text, '\d{1,2}\/\d{1,2}\/\d{2}') AS date_1,
CASE WHEN EXISTS (SELECT regexp_matches(original_text, '-(\d{1,2}\/\d{1,2}\/\d{1,2})'))
THEN regexp_match(original_text, '-(\d{1,2}\/\d{1,2}\/\d{1,2})')
ELSE NULL
END AS date_2,
regexp_match(original_text, '\/\d{2}\n(\d{4})') AS hour_1,
CASE WHEN EXISTS (SELECT regexp_matches(original_text, '\/\d{2}\n\d{4}-(\d{4})'))
THEN regexp_match(original_text, '\/\d{2}\n\d{4}-(\d{4})')
ELSE NULL
END AS hour_2,
regexp_match(original_text, 'hrs.\n(\d+ .+(?:Sq.|Plz.|Dr.|Ter.|Rd.))') AS street,
regexp_match(original_text, '(?:Sq.|Plz.|Dr.|Ter.|Rd.)\n(\w+ \w+|\w+)\n') AS city,
regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') AS crime_type,
regexp_match(original_text, ':\s(.+)(?:C0|SO)') AS description,
regexp_match(original_text, '(?:C0|SO)[0-9]+') AS case_number
FROM crime_reports;
What should be:
So what am I doing wrong?
I dont' know what to do
By the way I remembered that when I execute:
SELECT original_text FROM crime_reports;
Instead of this like in a book:
And it doesn't display popup msg with a full text. Does that matter?