Extracting String in R

Question

I want to extract strings from elements in a data frame. Having gone through numerous previous questions, I am still unable to work out what to do. This is what I have tried so far:

unlist(strsplit(pcode2$Postcode,"'"))

I get the following error:

Error in strsplit(pcode2$Postcode, "'") : non-character argument

I think this happens because I am referencing a column of the data frame rather than putting the text in the code itself. I also have 16,000 cases in the data frame, so I am not sure how to vectorise the operation.

Any help would be greatly appreciated.
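A minimal sketch of the likely cause, assuming (as the `class = "factor"` fragment of the `dput()` output in the comments suggests) that `Postcode` was read in as a factor. The three-row data frame below is a hypothetical stand-in for the real data. `strsplit()` only accepts character vectors, and it is already vectorised, so converting the column with `as.character()` is enough to process all rows in one call.

```r
# Hypothetical stand-in for the real data: Postcode stored as a factor.
pcode2 <- data.frame(
  Postcode = factor(c("('200',", "('221',", "('3030',"))
)

# Calling strsplit() on a factor fails; capture the message instead of stopping.
result <- tryCatch(
  strsplit(pcode2$Postcode, "'"),
  error = function(e) conditionMessage(e)
)
print(result)

# Convert to character first; strsplit() then handles every row at once.
parts <- strsplit(as.character(pcode2$Postcode), "'", fixed = TRUE)

# Each element looks like c("(", "200", ","), so the postcode is the second piece.
postcode <- vapply(parts, function(p) p[2], character(1))
print(postcode)
```

With 16,000 rows the same two lines apply unchanged, since both `strsplit()` and `vapply()` work over the whole vector.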

Data:

    Postcode  Locality                        State  Latitude  Longitude
1   ('200',   Australian National University  ACT    -35.280,  149.120),
2   ('221',   Barton                          ACT    -35.200,  149.100),
3   ('3030',  Werribee                        VIC    -12.800,  130.960),
4   ('3030',  Point Cook                      VIC    -12.800,  130.960),

I want to get rid of the commas, parentheses, quotes, etc. so that I am left with only the numeric part of the Postcode, Latitude and Longitude columns. This is how I am hoping the final result will look:

    Postcode  Locality                        State  Latitude  Longitude
1   200       Australian National University  ACT    -35.280   149.120
2   221       Barton                          ACT    -35.200   149.100
3   3030      Werribee                        VIC    -12.800   130.960
4   3030      Point Cook                      VIC    -12.800   130.960
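One way to get from the first table to the second is a vectorised `gsub()` that drops every character that is not a digit, a minus sign or a decimal point. This is only a sketch: the data frame is a hypothetical reconstruction of the four rows shown above, and `strip_noise` is a helper name made up for illustration.

```r
# Hypothetical reconstruction of the four rows shown in the question.
pcode2 <- data.frame(
  Postcode  = c("('200',", "('221',", "('3030',", "('3030',"),
  Locality  = c("Australian National University", "Barton",
                "Werribee", "Point Cook"),
  State     = c("ACT", "ACT", "VIC", "VIC"),
  Latitude  = c("-35.280,", "-35.200,", "-12.800,", "-12.800,"),
  Longitude = c("149.120),", "149.100),", "130.960),", "130.960),"),
  stringsAsFactors = TRUE  # mimic columns that were read in as factors
)

# Illustrative helper: keep only digits, "." and "-"; as.character() guards
# against factor columns.
strip_noise <- function(x) gsub("[^0-9.-]", "", as.character(x))

clean <- pcode2
clean$Postcode  <- strip_noise(pcode2$Postcode)              # kept as text
clean$Latitude  <- as.numeric(strip_noise(pcode2$Latitude))
clean$Longitude <- as.numeric(strip_noise(pcode2$Longitude))
clean$Locality  <- as.character(pcode2$Locality)
clean$State     <- as.character(pcode2$State)

print(clean)
```

Postcode is left as a character column here so that it is treated as a label rather than a quantity; wrapping it in `as.integer()` would be the alternative if a numeric column is really needed.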

Lastly, I would also like to understand how to format data nicely in questions.

You can find out how to format code by clicking the question mark on the right side of the edit box ("Markdown Editing Help"). — Rich Scriven, Dec 17 '15 at 03:15
I think the easiest thing to do might be to just read in the original data using `read.csv()`. You might have to do something about the parentheses and trailing comma. — Tim Biegeleisen, Dec 17 '15 at 03:24
@Chris, the result of dput(head(pcode2)) is a lot of data; a couple of screens fly past when I run it. Should I "factor" the columns first, as pcode2 is a small subset of the data? — Amit Verma, Dec 17 '15 at 03:36
@Tim, the original data is in the following format: INSERT INTO postcodes_geo (postcode, suburb, state, latitude, longitude) VALUES , ('200', 'Australian National University', 'ACT', -35.280, 149.120), , ('221', 'Barton', 'ACT', -35.200, 149.100), — Amit Verma, Dec 17 '15 at 03:40
You could just show us `dput(head(pcode2$Postcode))`. But my guess is that you **don't actually have a column called Postcode**, i.e. your column headers are messed up; `strsplit(NULL,"'")` gives the same error message. — Ben Bolker, Dec 17 '15 at 03:41
I guess this is what you are wanting to see: class = "factor")), .Names = c("Postcode", "Locality", "State", "Latitude", "Longitude"), row.names = c(NA, 6L), class = "data.frame") — Amit Verma, Dec 17 '15 at 03:44
I was looking for the values, not the structure (which is above what you selected in your copy/paste). Based on the data you posted, there should not be many screens of data going by, unless you did not type `head` in the call — Chris, Dec 17 '15 at 03:49
Postcode Locality State Latitude Longitude 1 ('200', Australian National University ACT -35.280, 149.120), 2 ('221', Barton ACT -35.200, 149.100), 3 ('800', Darwin NT -12.800, 130.960), 4 ('801', Darwin NT -12.800, 130.960), 5 ('804', Parap NT -12.430, 130.840), 6 ('810', Alawa NT -12.380, 130.880), — Amit Verma, Dec 17 '15 at 03:54
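Given the original format quoted in the comments above (an SQL `INSERT ... VALUES` statement), it may be simpler to parse the raw lines directly than to clean up a badly imported data frame. The sketch below uses a regular expression with capture groups rather than the `read.csv()` route suggested in the comments. It assumes one tuple per line, exactly five fields, and no apostrophes inside suburb names; the `raw` vector stands in for the real file, which would normally be loaded with `readLines()`.

```r
# Stand-in for the real file contents (normally obtained with readLines()).
raw <- c(
  "INSERT INTO postcodes_geo (postcode, suburb, state, latitude, longitude) VALUES",
  "('200', 'Australian National University', 'ACT', -35.280, 149.120),",
  "('221', 'Barton', 'ACT', -35.200, 149.100),"
)

# Three quoted text fields followed by two numbers, inside parentheses.
# An optional leading comma is tolerated before the opening parenthesis.
pattern <- paste0(
  "^\\s*,?\\s*\\(",
  "'([^']*)',\\s*'([^']*)',\\s*'([^']*)',",
  "\\s*(-?[0-9.]+),\\s*(-?[0-9.]+)\\)"
)

# Keep only the tuple lines, then pull out the capture groups.
rows   <- grep(pattern, raw, value = TRUE)
fields <- do.call(rbind, regmatches(rows, regexec(pattern, rows)))

# Column 1 of 'fields' is the full match; columns 2 to 6 are the five groups.
pcode <- data.frame(
  Postcode  = fields[, 2],
  Locality  = fields[, 3],
  State     = fields[, 4],
  Latitude  = as.numeric(fields[, 5]),
  Longitude = as.numeric(fields[, 6]),
  stringsAsFactors = FALSE
)

print(pcode)
```

Lines that do not match the pattern (such as the `INSERT INTO` header) are silently skipped, so it is worth comparing `length(rows)` with the expected number of records.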
