I retrieve a set of email addresses from a Hive table called users with the Spark code below:

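import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("YOUR_APP_NAME").setMaster("local[10]")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val hiveContext = new HiveContext(sc)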

hiveContext.setConf("hive.metastore.uris", "METASTORE_URI_NAME_HERE")

val df = hiveContext.sql("SELECT email FROM USERS")

Now df is a DataFrame with a single column containing all the email addresses. Is there a way in Scala to validate these addresses, for example something like validate_email (https://pypi.python.org/pypi/validate_email)? That library is for Python, and I need one for Scala. Or would NLP be a good fit for this as well?

I am stuck on how to validate these email addresses, and I need more than a regex: I need a way to check whether the domain of each email address has an SMTP server.

Something like this (except in Scala):

is_valid = validate_email('example@example.com',check_smtp_connection = True)
  • The question is actually "how to validate email in Scala", and the answer is the same as for "how to validate email in Java". – Łukasz Sep 06 '16 at 17:11

1 Answer

You definitely don't need natural language processing to validate email addresses. Use javamail for that; it supports SMTP validation.

Also note that the only way to check whether an email address really exists is to send the user a unique link and ask them to follow it.
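
As a rough illustration of the "does the domain accept mail at all" step, here is a minimal sketch that looks up MX records through the JDK's built-in JNDI DNS provider rather than javamail. The object and method names (`EmailDomainCheck`, `domainOf`, `hasMxRecord`, `mayBeDeliverable`) are made up for this example, and an MX record only shows that the address *may* exist, not that the mailbox does.

```scala
import java.util.Hashtable
import javax.naming.Context
import javax.naming.directory.{Attribute, InitialDirContext}
import scala.util.Try

object EmailDomainCheck {
  // Extracts the lower-cased domain part, or None when there is no usable "@".
  def domainOf(email: String): Option[String] = {
    val trimmed = email.trim
    val at = trimmed.lastIndexOf('@')
    if (at <= 0 || at == trimmed.length - 1) None
    else Some(trimmed.substring(at + 1).toLowerCase)
  }

  // True if a DNS query returns at least one MX record for the domain.
  // Assumption: any lookup failure (unknown domain, timeout, no network)
  // is treated as "no mail server".
  def hasMxRecord(domain: String): Boolean = Try {
    val env = new Hashtable[String, String]()
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory")
    val ctx = new InitialDirContext(env)
    try {
      val mx: Attribute = ctx.getAttributes(domain, Array("MX")).get("MX")
      mx != null && mx.size() > 0
    } finally ctx.close()
  }.getOrElse(false)

  // Combines both steps: a well-formed "local@domain" whose domain has an MX record.
  def mayBeDeliverable(email: String): Boolean =
    domainOf(email).exists(hasMxRecord)
}
```

With the DataFrame from the question, it would probably make sense to extract the distinct domains first and check each domain once, rather than running a DNS query per address.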

dveim
  • Can you give me some pointers on how to use `javamail`? Basically, I do not know whether the domain in each email even exists in the first place. I can get values like `1234.com` or `jsdklfj@jsdaklf.com`. So the pipeline is: first I need to check whether the domain exists, then I need to "send them some unique link and ask to follow it", as you stated. Is there some sample code I can follow to achieve this? –  Sep 06 '16 at 17:16
  • To verify SMTP server credentials, you can use https://stackoverflow.com/questions/3060837/validate-smtp-server-credentials-using-java-without-actually-sending-mail. But you cannot check whether a given email address exists, only whether it *may* exist. That can be done with something like http://crunchify.com/how-to-validate-email-address-using-java-mail-api/. However, as I wrote in the answer, the only reliable check is to send the user a unique link and ask them to follow it. – dveim Sep 06 '16 at 17:27
  • Hello, sorry for the late response, but can you please explain a little more what you mean by "ask user to follow some already sent unique link"? If the user does click on that link, how will I get that notification? Is that something you are saying I should implement? Is that feature in `javamail`? –  Sep 06 '16 at 18:41
  • Yes, the user should click on the link. Usually a special controller is created to track those clicks (you cannot expect a response immediately; it may take hours or days). To my knowledge, `javamail` does not have this built in. – dveim Sep 06 '16 at 18:46
  • Okay, I will take a look into that. Thank you for the input and pointers! –  Sep 06 '16 at 18:47
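
For the unique-link approach discussed in the comments, one possible shape is sketched below. Everything here is hypothetical: the `EmailVerification` object, its method names, and the in-memory map (a stand-in for whatever database table a real application would use). Sending the mail itself and the web controller that receives the click are left out.

```scala
import java.security.SecureRandom
import java.util.Base64
import scala.collection.concurrent.TrieMap

object EmailVerification {
  private val random = new SecureRandom()

  // token -> email awaiting confirmation (in-memory only, for illustration)
  private val pending = TrieMap.empty[String, String]

  // Generates an unguessable, URL-safe token and remembers which email it belongs to.
  def issueToken(email: String): String = {
    val bytes = new Array[Byte](32)
    random.nextBytes(bytes)
    val token = Base64.getUrlEncoder.withoutPadding.encodeToString(bytes)
    pending.put(token, email)
    token
  }

  // Builds the link to put in the outgoing message; baseUrl is whatever
  // endpoint the (hypothetical) tracking controller listens on.
  def verificationLink(baseUrl: String, token: String): String =
    s"$baseUrl?token=$token"

  // Called by the controller when the link is followed. Returns the confirmed
  // email and removes the token so the same link cannot be reused.
  def confirm(token: String): Option[String] = pending.remove(token)
}
```

The controller would call `confirm` with the `token` query parameter; a `Some(email)` result means that address has been confirmed, which may happen hours or days after the message was sent.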