I am new to machine learning and Python. Recently I have been working with the Amazon Fine Food Reviews data from Kaggle and its code. What I don't understand is how the 'partition' function is used here. Also, what do the last 3 lines of code actually do?

    %matplotlib inline
    import sqlite3
    import pandas as pd
    import numpy as np
    import nltk
    import string
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import confusion_matrix
    from sklearn import metrics
    from sklearn.metrics import roc_curve, auc
    from nltk.stem.porter import PorterStemmer



    # using the SQLite Table to read data.
    con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite') 




    #filtering only positive and negative reviews i.e. 
    # not taking into consideration those reviews with Score=3
    filtered_data = pd.read_sql_query("""
    SELECT *
    FROM Reviews
    WHERE Score != 3
    """, con) 




    # Give reviews with Score > 3 a positive rating, and reviews with
    # Score < 3 a negative rating.
    def partition(x):
        if x < 3:
            return 'negative'
        return 'positive'

    # changing reviews with score less than 3 to be negative, and vice versa
    actualScore = filtered_data['Score']
    positiveNegative = actualScore.map(partition) 
    filtered_data['Score'] = positiveNegative

2 Answers

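This line takes the Score column from filtered_data as a pandas Series called actualScore:

    actualScore = filtered_data['Score']

This line creates a new Series, positiveNegative, by calling partition on every value: scores below 3 become 'negative' and scores above 3 become 'positive' (rows with a score of exactly 3 were already excluded by the SQL query):

    positiveNegative = actualScore.map(partition)

This line overwrites the old Score column with the new coded values:

    filtered_data['Score'] = positiveNegative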
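
To see the three lines in isolation, here is a minimal sketch on a made-up four-row table. The `toy` DataFrame and its values are assumptions for illustration only and are not taken from the Kaggle data; `partition` is copied from the question.

```python
import pandas as pd

def partition(x):
    # Scores below 3 are labelled negative; everything else is positive
    if x < 3:
        return 'negative'
    return 'positive'

# Hypothetical stand-in for filtered_data (no rows with Score == 3)
toy = pd.DataFrame({'Score': [1, 2, 4, 5]})

actualScore = toy['Score']                     # the Score column as a pandas Series
positiveNegative = actualScore.map(partition)  # partition is called once per value
toy['Score'] = positiveNegative                # numeric scores replaced by labels

print(toy['Score'].tolist())
# ['negative', 'negative', 'positive', 'positive']
```

The point to notice is that `map` receives the function itself (no parentheses after `partition`) and applies it to each element, returning a new Series of the same length.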

Vince Miller

I think the partition function is used to replace the Score column in the table with positive or negative labels. The Score column is first taken as a Series, actualScore. map then applies partition to each value, turning it into either positive or negative. Finally, the values in the Score column are replaced with positiveNegative.