
I want to extract strings from the elements of a data frame. I have gone through numerous previous questions but still cannot work out what to do. This is what I have tried so far:

unlist(strsplit(pcode2$Postcode,"'"))

I get the following error:

Error in strsplit(pcode2$Postcode, "'") : non-character argument

which I understand to mean that I am referencing the data rather than putting the text in the code itself. I also have 16,000 cases in the data frame, so I am not sure how to vectorise the operation.
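For context, `strsplit()` raises this error whenever its first argument is not a character vector, for example a factor column or the `NULL` returned by a misnamed column. The sketch below reproduces the message on a made-up two-row stand-in for `pcode2` (the values and the factor type are assumptions, not the real data) and shows that converting with `as.character()` first lets the call run over every row at once.

```r
# Hypothetical stand-in for pcode2; the real data frame has ~16,000 rows.
pcode2 <- data.frame(
  Postcode = c("('200',", "('3030',"),
  stringsAsFactors = TRUE          # assumed: the column was read in as a factor
)

class(pcode2$Postcode)             # a factor, not a character vector

# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  strsplit(pcode2$Postcode, "'"),
  error = function(e) conditionMessage(e)
)
msg

# strsplit() is already vectorised: given a character vector it returns
# one list element per row, so no explicit loop is needed.
parts <- strsplit(as.character(pcode2$Postcode), "'", fixed = TRUE)

# Each element is c("(", "<postcode>", ","); keep the middle piece.
codes <- vapply(parts, function(p) p[2], character(1))
codes
```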

Any help would be greatly appreciated.

Data:

    Postcode  Locality                        State  Latitude  Longitude
1   ('200',   Australian National University  ACT    -35.280,  149.120),
2   ('221',   Barton                          ACT    -35.200,  149.100),
3   ('3030',  Werribee                        VIC    -12.800,  130.960),
4   ('3030',  Point Cook                      VIC    -12.800,  130.960),

I want to get rid of the commas, parentheses, etc. so that I am left with only the numeric part of Postcode (column 1), Latitude and Longitude. This is how I am hoping the final result will look:

    Postcode  Locality                        State  Latitude  Longitude
1   200       Australian National University  ACT    -35.280   149.120
2   221       Barton                          ACT    -35.200   149.100
3   3030      Werribee                        VIC    -12.800   130.960
4   3030      Point Cook                      VIC    -12.800   130.960
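One possible way to get from the first table to the second is to strip every character that is not a digit, minus sign or decimal point with `gsub()`, which works on a whole column in one call. This is only a sketch: the `pcode2` below is a hypothetical reconstruction from the printed rows, it assumes the stray punctuation is stored inside the values, and `strip_punct` is a helper name invented for illustration.

```r
# Hypothetical reconstruction of pcode2 from the rows printed above.
pcode2 <- data.frame(
  Postcode  = c("('200',", "('221',", "('3030',", "('3030',"),
  Locality  = c("Australian National University", "Barton",
                "Werribee", "Point Cook"),
  State     = c("ACT", "ACT", "VIC", "VIC"),
  Latitude  = c("-35.280,", "-35.200,", "-12.800,", "-12.800,"),
  Longitude = c("149.120),", "149.100),", "130.960),", "130.960),"),
  stringsAsFactors = TRUE          # assumed: columns were read in as factors
)

# Illustrative helper: keep only digits, "." and "-".
# as.character() guards against factor input; gsub() is vectorised.
strip_punct <- function(x) gsub("[^0-9.-]", "", as.character(x))

cleaned <- data.frame(
  Postcode  = strip_punct(pcode2$Postcode),   # kept as text on purpose
  Locality  = as.character(pcode2$Locality),
  State     = as.character(pcode2$State),
  Latitude  = as.numeric(strip_punct(pcode2$Latitude)),
  Longitude = as.numeric(strip_punct(pcode2$Longitude)),
  stringsAsFactors = FALSE
)

cleaned
```

Postcode is left as character here so that any leading zeros would survive; wrapping it in `as.integer()` would give a numeric column instead.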

Lastly, I would also like to understand how to format data nicely in questions.

hrbrmstr
Amit Verma
  • You can find out how to format code by clicking the question mark on the right side of the edit box ("Markdown Editing Help"). – Rich Scriven Dec 17 '15 at 03:15
  • Can you post the results of `dput(head(pcode2))`? – Chris Dec 17 '15 at 03:23
  • I think the easiest thing to do might be to just read in the original data using `read.csv()`. You might have to do something about the parentheses and trailing comma. – Tim Biegeleisen Dec 17 '15 at 03:24
  • @Chris, the result of dput(head(pcode2)) is a lot of data; a couple of screens fly past when I run it. Should I "factor" the columns first, as pcode2 is a small subset of the data? – Amit Verma Dec 17 '15 at 03:36
  • @Tim - The original data is in the format below: INSERT INTO postcodes_geo (postcode, suburb, state, latitude, longitude) VALUES , ('200', 'Australian National University', 'ACT', -35.280, 149.120), , ('221', 'Barton', 'ACT', -35.200, 149.100), – Amit Verma Dec 17 '15 at 03:40
  • You could just show us `dput(head(pcode2$Postcode))`. But my guess is that you **don't actually have a column called Postcode**, i.e. your column headers are messed up; `strsplit(NULL,"'")` gives the same error message. – Ben Bolker Dec 17 '15 at 03:41
  • I guess this is what you are wanting to see: class = "factor")), .Names = c("Postcode", "Locality", "State", "Latitude", "Longitude"), row.names = c(NA, 6L), class = "data.frame") – Amit Verma Dec 17 '15 at 03:44
  • I was looking for the values, not the structure (which is above what you selected in your copy/paste). Based on the data you posted, there should not be many screens of data going by, unless you did not type `head` in the call. – Chris Dec 17 '15 at 03:49
  • Postcode Locality State Latitude Longitude 1 ('200', Australian National University ACT -35.280, 149.120), 2 ('221', Barton ACT -35.200, 149.100), 3 ('800', Darwin NT -12.800, 130.960), 4 ('801', Darwin NT -12.800, 130.960), 5 ('804', Parap NT -12.430, 130.840), 6 ('810', Alawa NT -12.380, 130.880), – Amit Verma Dec 17 '15 at 03:54
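Following the suggestion in the comments to start again from the original file, the rows of the `INSERT` statement could be pulled apart with a regular expression instead of being cleaned up after import. The sketch below is an assumption-heavy illustration: `raw` stands in for what `readLines()` would return on the SQL file, the exact line layout is guessed from the comment above, and the pattern would miss any suburb name containing an apostrophe.

```r
# Hypothetical lines, as readLines() might return them from the SQL file.
raw <- c(
  "INSERT INTO postcodes_geo (postcode, suburb, state, latitude, longitude) VALUES",
  "('200', 'Australian National University', 'ACT', -35.280, 149.120),",
  "('221', 'Barton', 'ACT', -35.200, 149.100),"
)

# Three quoted text fields followed by two signed decimal numbers.
pattern <- "\\('([^']*)', *'([^']*)', *'([^']*)', *(-?[0-9.]+), *(-?[0-9.]+)\\)"

# Keep only the lines that look like value tuples (drops the INSERT header).
rows <- raw[grepl(pattern, raw)]

# regexec() + regmatches() return, per line, the full match followed by
# the five captured groups; drop the full match and stack into a matrix.
matches <- regmatches(rows, regexec(pattern, rows))
fields  <- do.call(rbind, lapply(matches, function(m) m[-1]))

postcodes <- data.frame(
  Postcode  = fields[, 1],
  Locality  = fields[, 2],
  State     = fields[, 3],
  Latitude  = as.numeric(fields[, 4]),
  Longitude = as.numeric(fields[, 5]),
  stringsAsFactors = FALSE
)

postcodes
```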
