2

Each day, I get an email with the quantities of fruit sold on a particular day. The structure of the email is as below:

Date of report:,04-JAN-2022
Time report produced:,5-JAN-2022 02:04
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2

I'm trying to build some code in R that will search through my emails, find all emails with a particular subject, iterate through each email to find the variables I'm looking for, take the values and place them in a dataframe with the "Date of report" put in a date column.

With the assistance of people in the community, I was able to achieve the desired result in Python. However as my project has developed, I need to now achieve the same result in R if at all possible.

Unfortunately, I'm quite new to R and therefore if anyone has any advice on how to take this forward I would greatly appreciate it.

For those interested, my Python code is below:

#PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"
searchf = [
    Fruit_1, 
    Fruit_2
]

#DEF THE STUFF
def get_report_vals(report, searches):
    dct = {}
    for line in report:
        term, *value = line
        if term.casefold().startswith('date'):
            dct['date'] = pd.to_datetime(value[0])
        elif term in searches:
            dct[term] = float(value[0])
    if len(dct.keys()) != len(searches):
    dct.update({x: None for x in searches if x not in dct})
return dct


#DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6) 
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)

results = []

for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 in message.body and Fruit_2 in message.body:
            data = [line.strip().split(",") for line in message.body.split('\n')]
            results.append(get_report_vals(data, searchf))
        else:
            pass

fruit_vals = pd.DataFrame(results)
fruit_vals.columns = map(str.upper, fruit_vals.columns)

I'm probably going about this the wrong way, but I'm trying to use the steps I took in Python to achieve the same result in R. So for example I create some variables to hold the fruit sales I'm searching for, then I create a vector to store the searchables, and then when I create an equivalent 'get_vals' function, I create an empty vector.

library(RDCOMClient)

Fruit_1 <- "Apples"
Fruit_2 <- "Pears"
##Create vector to store searchables
searchf <- c(Fruit_1, Fruit_2)
## create object for outlook
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")

search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")

vec <- c()

for (x in emails)
{
  subject <- emails(i)$Subject(1)
  if (grepl(search, subject)[1])
  {
    text <- emails(i)$Body()
    print(text)
    break
  }
}
alec22
  • 735
  • 2
  • 12
  • 2
    You seem to be asking (at least) 3 separate questions - connecting to email, identifying the target emails, and parsing the identified emails. Which are you having problems with the most? Consider splitting this into multiple questions – Marcus Jan 28 '22 at 16:09
  • 1
    I think my major issue is recreating the function (get_vals) due to the nature of vectors in R. – alec22 Jan 28 '22 at 16:23

3 Answers3

2

read.table could be a good start for get_report_vals.
Code below outputs result as a list, exception handling still needs to be implemented :

report <- "
Date of report:,04-JAN-2022
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
"


get_report_vals <- function(report,searches) {
  data <- read.table(text=report,sep=",")
  colnames(data) <- c('key','value')

  # find date
  date <- data[grepl("date",data$key,ignore.case=T),"value"]
  
  # transform dataframe to list
  lst <- split(data$value,data$key)
  
  # output result as list
  c(list(date=date),lst[searches])
}

get_report_vals(report,c('Lemons','Oranges'))
$date
[1] "04-JAN-2022"

$Lemons
[1] "4"

$Oranges
[1] "2"

The results of various reports can then be concatenated in a data.frame using rbind:

rbind(get_report_vals(report,c('Lemons','Oranges')),get_report_vals(report,c('Lemons','Oranges')))
     date          Lemons Oranges
[1,] "04-JAN-2022" "4"    "2"    
[2,] "04-JAN-2022" "4"    "2"
Waldi
  • 39,242
  • 6
  • 30
  • 78
2

The code now functions as intended. Function was written quite a bit differently from those recommended:

get_vals <- function(email) {
  body <- email$body()
  date <- str_extract(body, "\\d{2}-[:alpha:]{3}-\\d{4}") %>% 
    as.character()
  data <- read.table(text = body, sep = ",", skip = 9, strip.white = T) %>% 
    row_to_names(1) %>% 
    mutate("Date" = date)

  return(data)
}

In addition I've written this to bind the rows together:

info <- sapply(results, get_vals, simplify = F) %>% 
  bind_rows()
alec22
  • 735
  • 2
  • 12
0

May this is not what you are expecting to get as an answer, but I must state that here to help other readers to avoid such mistakes in future.

Unfortunately your Python code is not well-written. For example, I've noticed the following code where you iterate over all items in a folder and check the Subject and message bodies for keywords:

for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 in message.body and Fruit_2 in message.body:

You need to use the Find/FindNext or Restrict methods of the Items class instead. So, you don't need to iterate over all items in a folder. Instead, you get only items that correspond to your conditions. Read more about these methods in the following articles:

You may combine all your search criteria into a single query. So, you just need to iterate over found items and extract the data.

Also you may find the AdvancedSearch method helpful. The key benefits of using the AdvancedSearch method in Outlook are:

  • The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
  • Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
  • Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
  • You can stop the search process at any moment using the Stop method of the Search class.

See Advanced search in Outlook programmatically: C#, VB.NET for more information.

Eugene Astafiev
  • 47,483
  • 3
  • 24
  • 45
  • Hi there, thanks for providing that. Why does that make it "badly written"? While the format of the email itself doesn't change, there have been various version of the email that I want to check for. – alec22 Feb 01 '22 at 11:01
  • Iterating over all items in a folder is not really a good idea. The `Find`/`FindNext` or `Restrict` methods should be used instead to optimize the Outlook performance. – Eugene Astafiev Feb 01 '22 at 22:43