Let's start with a reproducible example: a character matrix called key
with 8 columns and 3 rows:
key <- structure(c("Make Professional Maps with QGIS and Inkscape",
"Gain the skills to produce original, professional, and aesthetically pleasing maps using free software",
"English", "Inkscape 101 for Beginners - Design Vector Graphics",
"Learn how to create and design vector graphics for free!", "English",
"Design & Create Vector Graphics With Inkscape 2016", "The Beginners Guide to designing and creating Vector Graphics with Inkscape. No Experience needed!",
"English", "Design a Logo for Free in Inkscape", "Learn from an award winning, published logo design professional!",
"English", "Inkscape - Beginner to Pro", "If you want to have a decent learning curve, you are new to the program or even in design, this course is for you.",
"English", "Creating 2D Textures in Inkscape", "A guide to creating colorful and interesting textures in inkscape.",
"English", "Vector Art in Inkscape - Icon Design | Make Vector Graphics",
"Learn Icon Design by creating Vector Graphics using the .SVG and PNG format with the Free Software Inkscape!",
"English", "Inkscape and Bootstrap 3 -> Responsive Web Design!",
"Design responsive websites using Free tools Inkscape and Bootstrap 3! Mood Boards and Style Tiles to Mobile First!",
"English"), .Dim = c(3L, 8L), .Dimnames = list(c("Title", "Short_Description",
"Language"), c("1", "2", "4", "5", "6", "9", "13", "15")))
I would like to extract keywords from each column independently. For this purpose, I use the udpipe
package for R.
Since I want to run the functions on every column, I use a for
loop.
Before starting, I download and load the English model (see this link for more info):
library(udpipe)
ud_model <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(ud_model$file_model)
Ideally, my final output would be a data frame with 8 columns and as many rows as keywords were extracted.
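To make the target shape concrete, here is a minimal sketch in base R. The keyword vectors are made-up placeholders, not real udpipe output, and the helper name pad_to is hypothetical. Since different columns will probably yield different numbers of keywords, the sketch assumes shorter columns are padded with NA; only three of the eight columns are shown.

```r
# Hypothetical per-column keywords (placeholder values, NOT udpipe output).
# The list names mirror the first three column names of `key`.
kw <- list(
  "1" = c("map", "skill", "software"),
  "2" = c("graphic", "vector"),
  "4" = c("guide")
)

# Hypothetical helper: pad a character vector with NA up to length n.
pad_to <- function(x, n) c(x, rep(NA_character_, n - length(x)))

# The longest keyword vector decides the number of rows.
n_max <- max(lengths(kw))

# One column per original column, one row per keyword slot.
expected <- data.frame(
  lapply(kw, pad_to, n = n_max),
  check.names = FALSE,       # keep "1", "2", "4" as column names
  stringsAsFactors = FALSE
)
expected
```

In the real case the list would hold one element per column of the input, so the result would have 8 columns.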
I tried two methods (in the code below, keywords_en_t is the object shown above as key):
Method 1: using dplyr
library(dplyr)
keywords <- list()
for(i in ncol(keywords_en_t)){
keywords[[i]] <- keywords_en_t %>%
udpipe_annotate(ud_model,s)
as.data.frame()
}
Method 2:
key <- list()
stats <- list()
for(i in ncol(keywords_en_t)){
key[[i]] <- as.data.frame(udpipe_annotate(ud_model, x = keywords_en_t[,i]))
stats[[i]] <- subset(key[[i]], upos %in% "NOUN")
stats <- txt_freq(x = stats$lemma)
}
Output
In both cases, I either get errors or the output is not what I expected.
As said, the output I expect is a data frame with 8 columns, with the extracted keywords in the rows.
Any ideas?