0

I'm trying to read a column from a database using a SQL query. The column consists of empty string or numbers as strings, such as

"7500" "4460" "" "2900" "2640" "1850" "" "2570" "9050" "8000" "9600"

I'm trying to find the right sql query to extract all the numbers (as integers) and removing the empty ones, but I'm stuck. So far I've got

SELECT * 
FROM   base 
WHERE  CONVERT(INT, code) IS NOT NULL

Done in program R (package sqldf)

ABC
  • 125
  • 1
  • 10
  • Added `sql-server` tag based on the usage of `convert()` –  Jul 27 '14 at 14:19
  • Is your example a single value (with the quotation marks included) or does that example indicate multiple rows with contents between each quotation mark being a value on one row? – Brian DeMilia Jul 27 '14 at 14:37
  • Are the blank ones null or a blank space? – Brian DeMilia Jul 27 '14 at 14:38
  • The quotation marks are included and the example above is just a part of the column named code (so each row/instance of the database consists of exactly one code element) – ABC Jul 27 '14 at 15:01

4 Answers4

1

If all columns are valid integers, you could use:

select * , cast(code as int) IntCode
from base 
where code <> ''

To prevent cases when field code is not a valid number, use:

select *, cast(codeN as int) IntCode
from base
cross apply (select case when code <> '' and not code like '%[^0-9]%' then code else NULL end) N(codeN)
where codeN is not null

SQL Fiddle

UPDATE

To find rows where code is not a valid number, use

select * from base where code like '%[^0-9]%'
Nizam
  • 4,569
  • 3
  • 43
  • 60
  • @ABC I have changed my answer to prevent cases when code is not a valid number – Nizam Jul 27 '14 at 14:33
  • Both suggestions aren't working out. The resulting elements are still strings (characters). – ABC Jul 27 '14 at 15:15
  • I have changed the query. You just need to convert the archer to int – Nizam Jul 27 '14 at 15:46
  • Is there a change that this 'apply' function doesn't work in R (postgresql)? – ABC Jul 27 '14 at 16:14
  • 1
    @n: Note that `IsNumeric()` return true for all of these strings: '.' (ie single period), ',' (ie single comma), and '1e10', as they are valid *numeric* representations. A better test of whether a string contains only digits is `case when Value not like '%[^0-9]%' then 1 else 0 end as IsNumeric` – Pieter Geerkens Jul 27 '14 at 16:14
  • 1
    Updated as your comments. Tks a lot. – Nizam Jul 27 '14 at 16:42
  • Cross Apply only works in SQL Server (you have a tag sql-server). Aren't you using it? – Nizam Jul 27 '14 at 16:43
  • 1
    Oh, my bad. Went too fast with the sql tags. I'm gonna select this as the right answer. Thanks for your efforts – ABC Jul 27 '14 at 18:47
0
select *
from base
where col like '[1-9]%'

Example: http://sqlfiddle.com/#!6/f7626/2/0

If you don't need to test for the number being valid, ie. a string such as '909XY2' then this may run marginally faster, more or less depending on the size of the table

Brian DeMilia
  • 13,103
  • 1
  • 23
  • 33
  • But the query `select * from base where code <> ''` would be even faster, right? – Nizam Jul 27 '14 at 14:49
  • 1
    @nizam did not think of that, is probably the simplest of the 2 as far as syntax goes, but both ways would probably perform equally as well. the execution plan would likely be identical, w/ just one table scan – Brian DeMilia Jul 27 '14 at 14:55
0

Is this what you want?

SELECT (case when code not like '%[^0-9]%' then cast(code as int) end)
FROM   base 
WHERE code <> '' and code not like '%[^0-9]%'; 

The conditions are repeated in the where and case on purpose. SQL Server does not guarantee that where filters are applied before logic in the select, so you can get an error with conversions. More recent versions of SQL Server have try_convert() to fix this problem.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Using sqldf with the default sqlite database and this test data:

 DF <- data.frame(a = c("7500", "4460", "", "2900", "2640", "1850", "", "2570", 
                        "9050", "8000", "9600"), stringsAsFactors = FALSE)

try this:

library(sqldf)
sqldf("select cast(a as aint) as aint from DF where length(a) > 0")

giving:

  aint
1 7500
2 4460
3 2900
4 2640
5 1850
6 2570
7 9050
8 8000
9 9600

Note In plain R one could write:

transform(subset(DF, nchar(a) > 0), a = as.integer(a))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341