0

How can I bulk import lat, long from csv file into sql server as spatial data type, not two separate columns of type double?

Given input .csv file with two columns:
Latitude, Longitude

want to create sql db that has one column corresponding to a spatial data type.

morpheus
  • 18,676
  • 24
  • 96
  • 159

1 Answers1

4

Did some research and found this article:

Convert Latitude/Longitude (Lat/Long) to Geography Point

So as given in article, I've created table and inserted some test data using given script:

CREATE TABLE [dbo].[Landmark] (
    [ID]                INT IDENTITY(1, 1),
    [LandmarkName]      VARCHAR(100),
    [Location]          VARCHAR(50),
    [Latitude]          FLOAT,
    [Longitude]         FLOAT
)
GO

INSERT INTO [dbo].[Landmark] ( [LandmarkName], [Location], [Latitude], [Longitude] )
VALUES ( 'Statue of Liberty', 'New York, USA', 40.689168,-74.044563 ),
       ( 'Eiffel Tower', 'Paris, France', 48.858454, 2.294694),
       ( 'Leaning Tower of Pisa', 'Pisa, Italy', 43.72294, 10.396604 ),
       ( 'Great Pyramids of Giza', 'Cairo, Egypt', 29.978989, 31.134632 ),
       ( 'Sydney Opera House', 'Syndey, Australia', -33.856651, 151.214967 ),
       ( 'Taj Mahal', 'Agra, India', 27.175047, 78.042042 ),
       ( 'Colosseum', 'Rome, Italy', 41.890178, 12.492378 )
GO

And then added calculated column using this query:

ALTER TABLE [dbo].[Landmark]
ADD [GeoLocation] AS geography::STPointFromText('POINT(' + CAST([Longitude] AS VARCHAR(20)) + ' ' + CAST([Latitude] AS VARCHAR(20)) + ')', 4326)
GO

Now when I query table using

SELECT *
FROM dbo.Landmark

I'm getting calculated spatial results too: enter image description here

And results in axis: enter image description here

Hopefully I understood you right.

Update: I'm not sure if this will satisfy you. It's quite dirty, but it does the job: That's how I formatted CSV file. I've used same structure as in previous example:

Statue of Liberty| New York, USA| 40.689168|-74.044563
Eiffel Tower| Paris, France| 48.858454| 2.294694
Leaning Tower of Pisa| Pisa, Italy| 43.72294| 10.396604
Great Pyramids of Giza| Cairo, Egypt| 29.978989| 31.134632
Sydney Opera House| Syndey, Australia| -33.856651| 151.214967
Taj Mahal| Agra, India| 27.175047| 78.042042
Colosseum| Rome, Italy| 41.890178| 12.492378

Columns seperator is | symbol and rows seperator is break symbol.

So what I did is, I used OPENROWSET to open CSV file and format this into rows instead having one long string( that's how OPENROWSET read my csv file, unfortunately). I've used this SplitString function: https://stackoverflow.com/a/19935594/3680098

Now I need to turn these buddies into columns instead of one string. I've used this answer provided on SO: https://stackoverflow.com/a/15108499/3680098

Summing things up, that's the query:

SELECT LTRIM(RTRIM(xmlValue.value('/values[1]/value[1]','nvarchar(100)'))) AS LandmarkName
    , LTRIM(RTRIM(xmlValue.value('/values[1]/value[2]','nvarchar(100)'))) AS Location
    , LTRIM(RTRIM(xmlValue.value('/values[1]/value[3]','nvarchar(20)'))) AS Lon
    , LTRIM(RTRIM(xmlValue.value('/values[1]/value[4]','nvarchar(20)'))) AS Lat
    , GEOGRAPHY::STPointFromText('POINT(' + xmlValue.value('/values[1]/value[4]','nvarchar(20)') + ' ' + xmlValue.value('/values[1]/value[3]','nvarchar(20)') + ')', 4326)
FROM dbo.SplitString((SELECT Document.* FROM OPENROWSET(BULK N'C:\Temp\test.csv', SINGLE_CLOB) AS Document), CHAR(10)) AS T
CROSS APPLY (SELECT CAST('<values><value>' + REPLACE(T.Value, '|', '</value><value>') + '</value></values>' AS XML)) AS T1(xmlValue);

It generates me required data as in my first screenshot and it seems just fine. So what I need to do, is to create my table and insert these into it:

CREATE TABLE [dbo].[Landmark] (
    [ID]                INT IDENTITY(1, 1),
    [LandmarkName]      VARCHAR(100),
    [Location]          VARCHAR(50),
    [GeoLocation]       GEOGRAPHY
)
GO

INSERT INTO dbo.Landmark (LandmarkName, Location, GeoLocation)
SELECT LTRIM(RTRIM(xmlValue.value('/values[1]/value[1]','nvarchar(100)'))) AS LandmarkName
    , LTRIM(RTRIM(xmlValue.value('/values[1]/value[2]','nvarchar(100)'))) AS Location
    , GEOGRAPHY::STPointFromText('POINT(' + xmlValue.value('/values[1]/value[4]','nvarchar(20)') + ' ' + xmlValue.value('/values[1]/value[3]','nvarchar(20)') + ')', 4326)
FROM dbo.SplitString((SELECT Document.* FROM OPENROWSET(BULK N'C:\Temp\test.csv', SINGLE_CLOB) AS Document), CHAR(10)) AS T
CROSS APPLY (SELECT CAST('<values><value>' + REPLACE(T.Value, '|', '</value><value>') + '</value></values>' AS XML)) AS T1(xmlValue)

Results:

Final results

Community
  • 1
  • 1
Evaldas Buinauskas
  • 13,739
  • 11
  • 55
  • 107
  • thanks for your detailed answer. unfortunately it does not address my question. my input is a csv file and i want to output directly a sql db that has a spatial column. i don't want to create a temp db in the process. – morpheus Jun 23 '15 at 19:23
  • If you would look at calculated column, you can see that it's made from latitude and longitude combination. So you could create column with geography data type and pass in your data straight to calculation. I hope you understood. I'll try to do that and update my answer. – Evaldas Buinauskas Jun 23 '15 at 19:26
  • Hi @morpheus I've updated my answer. It inserts records directly into table now. Hope it helps. – Evaldas Buinauskas Jun 23 '15 at 20:35
  • thanks Evaldas. Let me go through it and see if it does what I want. – morpheus Jun 23 '15 at 22:08
  • Wouldn't it be possible to use a [format file](https://msdn.microsoft.com/en-us/library/ms178129.aspx) so that you don't have to convert to xml and then extract values from xml? again, the problem is how to tell sql server that the lat, long columns in csv files need to be converted into a STPoint. – morpheus Jun 23 '15 at 23:55
  • Hi, yes. That would be possible and a better solution. – Evaldas Buinauskas Jun 24 '15 at 04:12
  • Well, this line of code converts your lat/long values into STPoint column: `GEOGRAPHY::STPointFromText('POINT(' + xmlValue.value('/values[1]/value[4]','nvarchar(20)') + ' ' + xmlValue.value('/values[1]/value[3]','nvarchar(20)') + ')', 4326)`, you just have to ensure they're always lat/long, that's it. Don't you think so? – Evaldas Buinauskas Jun 24 '15 at 05:30
  • I think in terms of performance above code that first converts csv to xml and then parses xml is bound to be slower than a solution that first imports to a temp table in memory and then processes the columns in that table to create a STPoint column in the final table. btw, i have given your answer an upvote for your time and effort. – morpheus Jun 24 '15 at 15:42
  • Well I've even read here that it's **NOT RECOMMENDED** to bulk insert into table directly, use temp table. http://stackoverflow.com/a/10851169/3680098 Another reason to use temp table is that bulk inserts are not transactional. If something fails, you cannot roll it back. I don't think it's a big issue to do that, just encapsulate this in a procedure or your code and that's all about it. – Evaldas Buinauskas Jun 24 '15 at 15:45