I'm dealing with some sensitive data, so I'm concerned about using RPostgreSQL. I have all the data necessary loaded into dataframes in R. I'm trying to run queries on the data using the sqldf()
function in R. These queries were written for Oracle SQL Developer years ago, so we're trying to avoid rewriting the scripts entirely. Being able to reuse the pre-written SQL scripts will save us a massive amount of time. The script seems to trip up when we hit the over()
SQL function. I'm aware that base sqldf doesn't support the over()
function. I've read that the over()
function works with the RPostgreSQL package, but does that require me to send my dataframes to an external database? From my understand of RpostgreSQL you need to connect to PostgreSQL and create a new database. We can't send this data to an external data storage system. Is there another way to use the over()
function while keeping the dataframes local to my PC?
select program, importance_level, count( distinct subject_id )
from
(
select r.subject_id,
case
when rc_level is not null and rc_level <> 'NA'
then 'bad_guy'
when (rc_level is null or rc_level = 'NA') and
(substr( r.base_category, 2, 2 ) in ( '5R', '8Q', '8P' )
or r.process_name in ('On The Way'))
then 'run_away'
when (rc_level is null or rc_level = 'NA') and r.process_name =
'Fancy Order'
then 'repeater'
when (rc_level is null or rc_level = 'NA') and
(a.current_program_code in ( 'BOP', 'IAS', 'LIS', 'SIS' )
or method_code in ( 'SIP', 'POB' )
or substr( r.base_category, 2, 2 ) in ( '9F', '7G' ))
then 'NEWBIE'
else 'Other'
end
as importance_level,
case
when a.current_program_code in ('123', 'ABC', 'DEF', 'HIJ', 'KLM', 'NOP', 'QRS' ) then 'YAW'
when a.current_program_code in ( 'RE', 'FDS', 'QWE', 'WER', 'ERT','RTY','TYU' ) then 'PO'
when a.current_program_code in ( 'LEP' ) then 'MOM'
else a.current_program_code
end
as program
from FY16DATA r left join (select distinct * from (select subject_id, first_value(current_program_code) over (partition by subject_id order by start_date desc) as current_program_code, first_value(process_name) over (partition by subject_id order by start_date desc) as process_name, first_value(method_code) over (partition by subject_id order by start_date desc) as method_code, max(load_fy) over (partition by subject_id) as load_fy from FY16NAME)) a on r.subject_id = a.subject_id
where r.load_fy = '2016' and r.thing_status <> 'Over' and r.thing_status in ('Head','Hair','Face')
)
group by program, importance_level;