I have a CSV file named movie_reviews.csv, and the data inside looks like this:
1 Pixar classic is one of the best kids' movies of all time.
1 Apesar de representar um imenso avanço tecnológico, a força
1 It doesn't enhance the experience, because the film's timeless appeal is down to great characters and wonderful storytelling; a classic that doesn't need goggles or gimmicks.
1 As such Toy Story in 3D is never overwhelming. Nor is it tedious, as many recent 3D vehicles have come too close for comfort to.
1 The fresh look serves the story and is never allowed to overwhelm it, leaving a beautifully judged yarn to unwind and enchant a new intake of young cinemagoers.
1 There's no denying 3D adds extra texture to Pixar's seminal 1995 buddy movie, emphasising Buzz and Woody's toy's-eye- view of the world.
1 If anything, it feels even fresher, funnier and more thrilling in today's landscape of over-studied demographically correct moviemaking.
1 If you haven't seen it for a while, you may have forgotten just how fantastic the snappy dialogue, visual gags and genuinely heartfelt story is.
0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.
1 Some thrills, but may be too much for little ones.
1 Like the rest of Johnston's oeuvre, Jumanji puts vivid characters through paces that will quicken any child's pulse.
1 "This smart, scary film, is still a favorite to dust off and take from the ""vhs"" bin"
0 All the effects in the world can't disguise the thin plot.
The first column, containing 0s and 1s, is my label.
I want to first turn the texts in movie_reviews.csv into vectors, then split the dataset by label (all 1s for training, all 0s for testing), and finally feed the vectors into a classifier such as a random forest.
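For reference, the pipeline described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not confirmed by the file itself: the label and the text are taken to be tab-separated with no header row, TF-IDF is picked as the vectorization method, and scikit-learn is used for both the vectorizer and the random forest. A few rows from the sample above are inlined through io.StringIO so the snippet runs on its own; with the real file, the StringIO object would be replaced by open("movie_reviews.csv", newline="", encoding="utf-8").

```python
import csv
import io

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for the real file. Assumption: tab-separated, no header row.
sample = io.StringIO(
    "1\tPixar classic is one of the best kids' movies of all time.\n"
    "0\tThe humans are wooden, the computer-animals have that floating, "
    "jerky gait of animated fauna.\n"
    "1\tSome thrills, but may be too much for little ones.\n"
    '1\t"This smart, scary film, is still a favorite to dust off '
    'and take from the ""vhs"" bin"\n'
    "0\tAll the effects in the world can't disguise the thin plot.\n"
)

# csv.reader handles the quoted row and unescapes the doubled quotes.
rows = [(int(label), text) for label, text in csv.reader(sample, delimiter="\t")]

# Split by label as described: every 1 goes to train, every 0 goes to test.
train_texts = [text for label, text in rows if label == 1]
train_labels = [1] * len(train_texts)
test_texts = [text for label, text in rows if label == 0]
test_labels = [0] * len(test_texts)

# Learn the vocabulary on the training texts only, then reuse it for the
# test texts so both matrices have the same columns.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# random_state is fixed only to make the sketch reproducible.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, train_labels)

predictions = clf.predict(X_test)
print(predictions)
```

One thing to be aware of with this split: the training set contains only label 1, so the forest has never seen a 0 and will predict 1 for every test row. If the goal is to tell positive from negative reviews, both labels would need to appear in the training data.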