1

//EDIT: This question is kind of a sub-question. For a shorter and better example, which has better replies, check This Post

I'm very new into python and even newer into pandas. I'm working with it for at least a month and I think I got most of the basics together.

My current task is to write values into a certrain cell, in a certain space inside of an xslx-file.

Situation

  • I have a very big excel-file including various data, from names to email-adresses and everything. As well I have two lists (.txt-files) with the same email-adresses of the excel-file in it, yet those emails got verified either if they match certain security-cheks or not. Depending on the outcome, they got stored inside of the "Secured.txt" or the "Unsecured.txt" file.

  • To write and read in the excel-file, I use pandas.

Task

Next to the 'Emails'-column in the excel file, there is a column in which you mark with an entry either if the email is secured, or unsecured. My actual task is to insert those entrys, depending in which text-file the email lies.

Possible Solution

My approach to solve this problem is to read out each .txt-file and store each email-adress in a variable using a list and a for-loop. Iterating through those emails, I know want to look for the location of the email-adress inside of the excel-file and access the cell right next to it. Same row, different column. Since the emails got sorted matching for their security-validation before, I just can put in the according value into the validation-cell right next to the email.

Question

My question is the following: How do I approach a specific row based on a value in it? I want to find the place of the cell which includes the actual content of the variable "mails", so I can move over to the cell right next to it. Since I know all the names of the columns, I actually just need the index of the row in which the email lies. I got the x-coordinate and need the y-coordinate.

Example

What I have up until now is the readout of the .txt-file:

import pandas as pd
import os
import re


#fetching the mail adress through indexnumber out of the list
with open('Protected/Protected G.txt', 'r') as file:

    #creating the regex pattern to sort out the mail adresses
    rgx = '\S+@\S+'

    #read the file and convert the list into a string
    content = file.readlines()
    content_str = ''.join(content)

    #get the mails out of the "list" with regex
    mails = re.findall(rgx, content_str)

    #put each mailadress in a variable
    for item in mails:
        print(item)

This dummy-dataframe represents the excel sheet I'm working with:

Dummy-Dataframe:

    Forename    Last Name   Email           Protection

1   John        Kennedy     John@gmx.net

2   Donald      Trump       Donald@gmx.net

3   Bill        Clinton     Bill@gmx.net

4   Richard     Nixton      Richard@gmx.net

I know want to pass the actual adress, stored in the variable 'item', to some kind of "locate"-function of pandas in order to find out in which row the actual email lies. As soon as I know in which row the adress lies, I can now tell pandas to write either an "x", saying the mail is protected, or an "o", meaning the mail is unprotected, in the very next column.

My finished dataframe could look like this:

Finished Dataframe:

    Forename    Last Name   Email           Protection

1   John        Kennedy     John@gmx.net    x

2   Donald      Trump       Donald@gmx.net  o

3   Bill        Clinton     Bill@gmx.net    x

4   Richard     Nixton      Richard@gmx.net x

I really appreciate the help.

Mowgli
  • 157
  • 1
  • 10
  • This is not related to the question, but you can use `for adress in mails:` directly. This way you don’t need to calculate length nor use `range` – Mirac7 Dec 24 '18 at 22:34
  • Changed this. Appreciated. – Mowgli Dec 24 '18 at 22:59
  • Sorry I didn’t explain it well. By replacing the range loop, you no longer need to perform `adress = mails[item]`. Correct values are already assigned and they can be used e.g. `for adress in mails: print(adress)` – Mirac7 Dec 24 '18 at 23:05
  • could you please post a dummy dataframe which resembles your data and n expected output dataframe(the way you want to write to an excel). Thank you. :) – anky Dec 25 '18 at 07:49
  • seems like this is a duplicate of [this](https://stackoverflow.com/questions/53985081/conditional-writing-to-xlsx/53985157?noredirect=1#comment94811250_53985157) one? – anky Dec 31 '18 at 11:49

2 Answers2

1

To make sure I understand you have a text file for protected and one for unprotected. I am making a large assumption you never have an email in both.

import pandas as pd

df = pd.read_csv('Protected/Protected G.txt', header = None, sep = " ")
df.columns = ['Protected Emails']

df2 = pd.read_excel('dummy-excel')

if df2['Email'].isin(df) :
    df2['Protection'] = 'x'
else :
    df2['Protection'] = 'o'

writer = pd.ExcelWriter('ProtectedEmails.xlsx')
df2.to_excel(writer,'Sheet1') #or whatever you want to name your sheet
writer.save()

maybe something like that, though I don't know what the text file of emails looks like.

Shenanigator
  • 1,036
  • 12
  • 45
  • This is a very good answer. First off, for clarification: Yes, I have a seperate text file for protected, and one for unprotected. Your assumption is right, there's never an email in both, since it can only be either protected or unprotected. Since the question is too long, complicated and confusing, I refered to another question which displays the same content, but way smaller. But for this post, you totally answered my question. Thank you! – Mowgli Jan 01 '19 at 15:27
0

Your question is different from the content. This is a simple answer might, somehow, be useful.

Assume that this is a dataframe: Z = pd.DataFrame([1,2,4,6])

Now, let us access to number 4. There is a single column. Usually, the first column is assigned the name 0 as a heading. The required number, 4, exists in the third place of the dataframe. As python starts the indexes of lists, dfs, arrays.. etc from 0, then the number of index of number 4 is 2.

print(Z[0][2])

This would output [4]

Try applying the same thing on your data. Just male sure to know the names of the headings. Sometimes they are not numbers, but strings.

Abdulaziz Al Jumaia
  • 436
  • 2
  • 5
  • 22