I have a set of email addresses that I retrieve from a Hive table called users, using the Spark code below:
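import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// Run locally with 10 worker threads
val sparkConf = new SparkConf().setAppName("YOUR_APP_NAME").setMaster("local[10]")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val hiveContext = new HiveContext(sc)
// Point the Hive context at the metastore
hiveContext.setConf("hive.metastore.uris", "METASTORE_URI_NAME_HERE")
val df = hiveContext.sql("SELECT email FROM USERS")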
Now df is a DataFrame with a single column containing all the email addresses. Is there a way in Scala to validate these addresses, for example with something like validate_email (https://pypi.python.org/pypi/validate_email)? That library is written in Python and I need an equivalent in Scala. Or would NLP be a good fit for this as well?
I am stuck on how to validate these email addresses, and I need more than a regex. I need a way to check whether the domain of each email address has an SMTP server.
Something like this (except in Scala):
is_valid = validate_email('example@example.com',check_smtp_connection = True)
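To make the requirement concrete, here is a rough sketch of the kind of check I have in mind, assuming that an MX record lookup through the JDK's built-in JNDI DNS provider is an acceptable stand-in for "the domain has an SMTP server". The names EmailDomainCheck, domainOf and hasMxRecord are my own placeholders, not part of any library, and this only confirms that the domain advertises a mail exchanger, not that the server accepts connections or that the mailbox exists.

```scala
import java.util.Hashtable
import javax.naming.directory.InitialDirContext
import scala.util.Try

// Hypothetical helper object; names are illustrative only.
object EmailDomainCheck {

  // Returns the lower-cased part after the last '@', or None if the
  // address has no local part or no domain part.
  def domainOf(email: String): Option[String] = {
    val at = email.lastIndexOf('@')
    if (at > 0 && at < email.length - 1) Some(email.substring(at + 1).toLowerCase)
    else None
  }

  // Assumption: a domain "has an SMTP server" if DNS returns at least one
  // MX record for it. Any lookup failure (unknown domain, no network,
  // timeout) is treated as "no MX record".
  def hasMxRecord(domain: String): Boolean = Try {
    val env = new Hashtable[String, String]()
    env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory")
    val ctx = new InitialDirContext(env)
    try {
      val mx = ctx.getAttributes(domain, Array("MX")).get("MX")
      mx != null && mx.size() > 0
    } finally {
      ctx.close()
    }
  }.getOrElse(false)

  // Combines both steps for a single address.
  def hasMailDomain(email: String): Boolean =
    domainOf(email).exists(hasMxRecord)
}
```

Presumably this could be applied to the email column of df through a UDF or a map over the rows, but each lookup is a network call, so I am unsure how well it would scale without caching the result per domain.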