I'm working with a dataframe that has really long organization names, many of them longer than 25 characters. I'm trying to make a bar graph (with plotly) of all these organization names, but the names get cut off because they're so long. I've already tried adjusting the margins like this:

plot_ly(x = number, y = org_name, type = 'bar') %>% 
layout(margin = list(l = 150))

It works, but the bar graph doesn't look nice, so as an alternative I'm trying to abbreviate any organization name that is longer than 25 characters. However, I'm having a hard time doing so. One approach I tried was to create a new column called abbrv, use substring to get the first 25 characters of the organization name, append "...", and put the result in that column. For organization names that aren't longer than 25 characters, I would just put an NA in the abbrv column, like this:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
 dataframe.name$abbrv <- paste0(substring(i, 0, 25), "...")
 }
 else{
  dataframe.name$abbrv <- "NA"
}

The only issue with this approach is that once I have the abbrv column (if it works), how do I make sure plotly displays the abbrv column when the organization name is longer than 25 characters, and the normal organization name otherwise?

Anyway, that was one approach I tried, but it doesn't quite work: the abbrv column ends up with "NA" in ALL of the rows, no matter how long the organization names are. Another approach I tried is a replace function, such as:

for(i in dataframe.name$org_name){
 if(nchar(i) > 25){
   dataframe.name[i].replace(
     to_replace=i,
     value= abbreviate(i)
   )
}

But I get errors for that one as well. At this point, I'm not sure how to abbreviate the long names in my dataframe. If anyone can help me out, that'd be great! Thanks.

*******Edit*******

So now I'm using this code:

for(i in 1:nrow(dfname)){
 if(nchar(dfname$orgname[i]) > 25){
   dfname$abbrv.column <- substring(dfname$orgname[i], 0, 25)
 }  
 else{
   dfname$abbrv.column <- dfname$orgname
 }
}

This isn't quite working though, because all of the entries end up as the same organization name.

fairlyMinty
  • What if you'd put the full name in the abbrv column if it's shorter than 25 characters? And then you'd just use the abbrv column for the plot. Just change this `dataframe.name$abbrv <- "NA"` to this `dataframe.name$abbrv <- dataframe.name$org_name` – f.lechleitner Nov 30 '17 at 07:36
  • Hmmm, that isn't a bad idea either, but unfortunately now the abbrv column is filled with only ONE organization name, in this case, it's the organization name of the last row in the dataframe. I don't know why it's doing this, it was doing the same thing earlier too, except the column was filled with "NA." – fairlyMinty Nov 30 '17 at 07:49

2 Answers

dataframe.name$abbrv is a vector holding the abbreviation for every row in the dataframe, not just a single name.

That is why all entries in dataframe.name$abbrv end up as "NA": each pass through the loop assigns to the whole column, so only the last iteration survives. The last name in the dataframe is 25 characters or fewer, so every entry is set to "NA".
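
As a minimal sketch of that point, assuming a small made-up data frame (only `org_name` and `abbrv` come from the question; the sample names are hypothetical), the loop behaves as intended only when each pass assigns to a single element, `abbrv[i]`, rather than to the whole column:

```r
# Hypothetical sample data; column names mirror the question.
dataframe.name <- data.frame(
  org_name = c("A Very Long Organization Name Indeed", "Short Org"),
  stringsAsFactors = FALSE
)

# Pre-allocate the column, then fill it one row at a time.
dataframe.name$abbrv <- NA_character_
for (i in seq_len(nrow(dataframe.name))) {
  name <- dataframe.name$org_name[i]
  if (nchar(name) > 25) {
    # Index with [i] so only this row is changed.
    dataframe.name$abbrv[i] <- paste0(substring(name, 1, 25), "...")
  } else {
    dataframe.name$abbrv[i] <- name
  }
}
```

Writing `dataframe.name$abbrv <- value` without the `[i]` recycles `value` across every row, which is what produced the identical entries.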

@brettljausn has a decent suggestion: just do away with the NAs completely and only truncate where the character count exceeds 25.

Something like this should work a treat:

dataframe.name$abbrv <- substring( dataframe.name$org_name, 0, 25 )

I would try to use abbreviate first though:

dataframe.name$abbrv <- abbreviate( dataframe.name$org_name )
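
If the "..." suffix from the question matters, one possible vectorized sketch uses `ifelse` to truncate only the long names and keep the short ones unchanged. The data frame `orgs` and its sample names are hypothetical, used here only for illustration:

```r
# Hypothetical sample data for illustration.
orgs <- data.frame(
  org_name = c("International Federation of Example Societies", "Short Org"),
  stringsAsFactors = FALSE
)

# TRUE for names longer than 25 characters.
is_long <- nchar(orgs$org_name) > 25

# Truncate and append "..." where the name is long; otherwise keep it as is.
orgs$abbrv <- ifelse(
  is_long,
  paste0(substring(orgs$org_name, 1, 25), "..."),
  orgs$org_name
)
```

Because every row of `abbrv` then holds a usable label, that single column can be passed as `y` in the `plot_ly` call from the question, with no need to switch between two columns.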
Zaid
  • Would you suggest to keep the for loop still to go through each entry in the dataframe? – fairlyMinty Nov 30 '17 at 07:56
  • Now I get this warning: `Warning message: In abbreviate(i) : abbreviate used with non-ASCII chars`. Does this matter? – fairlyMinty Nov 30 '17 at 07:58
  • You don't need the for loop at all – Zaid Nov 30 '17 at 07:58
  • Oh, I don't need a for loop? I'm trying to check and see if each organization name has more than 25 characters though, and if it does then I abbreviate it. – fairlyMinty Nov 30 '17 at 07:59
  • Based on the code in your edit, I'd rewrite that as: `dfname$abbrv.column <- ifelse( nchar(dfname$abbrv.column) > 25, substring(dfname$orgname,0,25), "NA" )`. But why do you insist on keeping the `NA`s? – Zaid Nov 30 '17 at 08:04
  • I actually did just try that, but it seems like it isn't applying the "NA." I think it'll be okay though because when I tried to use the abbreviate function, it worked! Thank you for all of your help!! The NAs don't matter as much :) I was just using NA as a replacement for the organization names who didn't have character length > 25 – fairlyMinty Nov 30 '17 at 08:06
  • Don't forget to accept the answer if it addressed your question – Zaid Nov 30 '17 at 08:07
  • Accept the answer? How do you do that? Sorry, I'm new to making posts on stackoverflow – fairlyMinty Nov 30 '17 at 08:08
  • Oh yeah, one more question... When I did `dfname$abbrv.column <- ifelse( nchar(dfname$abbrv.column) > 25, substring(dfname$orgname,0,25), "NA" )`, it didn't quite work and didn't replace the organizations > 25 with NA.. Any reason why that didn't work? – fairlyMinty Nov 30 '17 at 08:11
  • Change the check to `nchar(dfname$orgname) > 25`? – Zaid Nov 30 '17 at 08:13
  • Thank you for all your help! – fairlyMinty Nov 30 '17 at 08:18

Base R has `abbreviate`. Here `minlength = 8` sets the minimum length of the abbreviations to 8 characters, including the ".":

> abbreviate(names(iris), minlength = 8)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
  "Spl.Lngt"   "Spl.Wdth"   "Ptl.Lngt"   "Ptl.Wdth"    "Species" 
Antex