Circlize chord diagram with multiple levels of data

Question

I am a bit stuck. I want to show flows between regions for trafficked species with a chord diagram in circlize, but I cannot work out how to plot the data when columns 1 and 2 represent the "connection", column 3 is the "factor" of interest and column 4 holds the values. I have included a sample of the data below (yes, I am aware Indonesia is a region). As you can see, each species is not unique to a particular region. I would like to produce a plot similar to the one linked below, but with the "countries" replaced by "species" within each region. Is this possible?

import_region    export_region  species                flow
North America    Europe         Acanthosaura armata     0.0104
Southeast Asia   Europe         Acanthosaura armata     0.0022
Indonesia        Europe         Acanthosaura armata     0.1971
Indonesia        Europe         Acrochordus granulatus  0.7846
Southeast Asia   Europe         Acrochordus granulatus  0.1101
Indonesia        Europe         Acrochordus javanicus   2.00E-04
Southeast Asia   Europe         Acrochordus javanicus   0.0015
Indonesia        North America  Acrochordus javanicus   0.0024
East Asia        Europe         Acrochordus javanicus   0.0028
Indonesia        Europe         Ahaetulla prasina       4.00E-04
Southeast Asia   Europe         Ahaetulla prasina       4.00E-04
Southeast Asia   East Asia      Amyda cartilaginea      0.0027
Indonesia        East Asia      Amyda cartilaginea      5.00E-04
Indonesia        Europe         Amyda cartilaginea      0.004
Indonesia        Southeast Asia Amyda cartilaginea      0.0334
Europe           North America  Amyda cartilaginea      4.00E-04
Indonesia        North America  Amyda cartilaginea      0.1291
Southeast Asia   Southeast Asia Amyda cartilaginea      0.0283
Indonesia        West Asia      Amyda cartilaginea      0.7614
South Asia       Europe         Amyda cartilaginea      2.8484
Australasia      Europe         Apodora papuana         0.0368
Indonesia        North America  Apodora papuana         0.324
Indonesia        Europe         Apodora papuana         0.0691
Europe           Europe         Apodora papuana         0.0106
Indonesia        East Asia      Apodora papuana         0.0129
Europe           North America  Apodora papuana         0.0034
East Asia        East Asia      Apodora papuana         2.00E-04
Indonesia        Southeast Asia Apodora papuana         0.0045
East Asia        North America  Apodora papuans         0.0042

An example of a diagram similar to what I would like is linked here: chord diagram

Have you written some code for this which you can showcase in your post? — RBT, Aug 28 '16 at 07:01
I am honestly unsure how to add a second level and every attempt I have made has been a bit of a guess. Code to produce a Region-region flow chord diagram with data similar to my own (minus the species level) can be found here https://github.com/gjabel/migest/blob/master/demo/cfplot_reg2.R — Koalaitystats, Aug 28 '16 at 09:07

score 5 · Accepted Answer · answered Aug 28 '16 at 15:34

In the circlize package, the chordDiagram() function only accepts a "from" column, a "to" column and an optional "value" column. In your case, however, the original data frame can be transformed into such a three-column data frame.

In your example, you want to distinguish e.g. Acanthosaura_armata in North America from Acanthosaura_armata in Europe. One solution is to merge region names and species names, such as Acanthosaura_armata|North_America, to form a unique identifier. Next I will demonstrate how to visualize this dataset with the circlize package.
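
The merging idea can be tried on a toy data frame in base R before touching the real data. This is only a sketch: the `toy` rows are copied from the question, and `split_key` is a hypothetical helper, not part of circlize.

```r
# Two rows from the question, with spaces replaced by underscores.
toy <- data.frame(
  import_region = c("North_America", "Indonesia"),
  export_region = c("Europe", "Europe"),
  species       = c("Acanthosaura_armata", "Acanthosaura_armata"),
  flow          = c(0.0104, 0.1971),
  stringsAsFactors = FALSE
)

# Merge species and region into one identifier per sector.
from_id <- paste(toy$species, toy$import_region, sep = "|")
to_id   <- paste(toy$species, toy$export_region, sep = "|")

# Hypothetical helper: recover the two parts of an identifier.
# fixed = TRUE matters because "|" is a regex metacharacter.
split_key <- function(id) {
  parts <- strsplit(id, "|", fixed = TRUE)
  data.frame(species = vapply(parts, `[`, character(1), 1),
             region  = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

keys <- split_key(from_id)
```

The same species now yields a different sector for each region, and the separator makes it possible to get the species or region back later, for example when choosing colors.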

Read in the data. Note that I replaced spaces with underscores.

df = read.table(textConnection(
"import_region    export_region  species                flow
North_America    Europe         Acanthosaura_armata     0.0104
Southeast_Asia   Europe         Acanthosaura_armata     0.0022
Indonesia        Europe         Acanthosaura_armata     0.1971
Indonesia        Europe         Acrochordus_granulatus  0.7846
Southeast_Asia   Europe         Acrochordus_granulatus  0.1101
Indonesia        Europe         Acrochordus_javanicus   2.00E-04
Southeast_Asia   Europe         Acrochordus_javanicus   0.0015
Indonesia        North_America  Acrochordus_javanicus   0.0024
East_Asia        Europe         Acrochordus_javanicus   0.0028
Indonesia        Europe         Ahaetulla_prasina       4.00E-04
Southeast_Asia   Europe         Ahaetulla_prasina       4.00E-04
Southeast_Asia   East_Asia      Amyda_cartilaginea      0.0027
Indonesia        East_Asia      Amyda_cartilaginea      5.00E-04
Indonesia        Europe         Amyda_cartilaginea      0.004
Indonesia        Southeast_Asia Amyda_cartilaginea      0.0334
Europe           North_America  Amyda_cartilaginea      4.00E-04
Indonesia        North_America  Amyda_cartilaginea      0.1291
Southeast_Asia   Southeast_Asia Amyda_cartilaginea      0.0283
Indonesia        West_Asia      Amyda_cartilaginea      0.7614
South_Asia       Europe         Amyda_cartilaginea      2.8484
Australasia      Europe         Apodora_papuana         0.0368
Indonesia        North_America  Apodora_papuana         0.324
Indonesia        Europe         Apodora_papuana         0.0691
Europe           Europe         Apodora_papuana         0.0106
Indonesia        East_Asia      Apodora_papuana         0.0129
Europe           North_America  Apodora_papuana         0.0034
East_Asia        East_Asia      Apodora_papuana         2.00E-04
Indonesia        Southeast_Asia Apodora_papuana         0.0045
East_Asia        North_America  Apodora_papuans         0.0042"),
header = TRUE, stringsAsFactors = FALSE)

I also removed the rows with very small values.

df = df[df[[4]] > 0.01, ]

Assign colors for species and regions.

library(circlize)
library(RColorBrewer)
all_species = unique(df[[3]])
color_species = structure(brewer.pal(length(all_species), "Set1"), names = all_species)
all_regions = unique(c(df[[1]], df[[2]]))
color_regions = structure(brewer.pal(length(all_regions), "Set2"), names = all_regions)
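
One caveat with the color assignment above: the RColorBrewer palettes have a fixed maximum size (9 colors for "Set1", 8 for "Set2"), which is enough for the filtered sample but may not be for a full dataset. A possible fallback, sketched here with base R only, is to generate as many colors as there are labels. `named_colors` is a hypothetical helper, and the sketch assumes R >= 3.6.0, where `grDevices::hcl.colors()` is available.

```r
# Hypothetical helper: build a named color vector of any length.
# Assumes R >= 3.6.0 for grDevices::hcl.colors().
named_colors <- function(labels, palette = "Dark 3") {
  labels <- unique(labels)
  structure(grDevices::hcl.colors(length(labels), palette = palette),
            names = labels)
}

# Twelve made-up species names, more than "Set1" could color.
many_species <- paste0("species_", 1:12)
color_many <- named_colors(many_species)
```

The result has the same shape as color_species and color_regions (a character vector of colors named by label), so it could be used in their place.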

Group by species

First I will demonstrate how to group the chord diagram by species.

As mentioned before, we use species|region as the unique identifier.

df2 = data.frame(from = paste(df[[3]], df[[1]], sep = "|"),
                 to = paste(df[[3]], df[[2]], sep = "|"),
                 value = df[[4]], stringsAsFactors = FALSE)

Next we adjust the order of all sectors, sorting first by species and then by region.

combined = unique(data.frame(regions = c(df[[1]], df[[2]]), 
    species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
combined = combined[order(combined$species, combined$regions), ]
order = paste(combined$species, combined$regions, sep = "|")
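
Since the ordering vector is built separately from the link table, it is worth checking that every sector used by a link also appears in the ordering. A minimal base R sketch on three rows from the question (the names `toy`, `links`, `sector_order` and `missing_sectors` are illustrative only):

```r
toy <- data.frame(
  import_region = c("Indonesia", "Indonesia", "South_Asia"),
  export_region = c("Europe", "West_Asia", "Europe"),
  species       = c("Apodora_papuana", "Amyda_cartilaginea", "Amyda_cartilaginea"),
  flow          = c(0.0691, 0.7614, 2.8484),
  stringsAsFactors = FALSE
)

# Link table with species|region identifiers, as in the answer.
links <- data.frame(from  = paste(toy$species, toy$import_region, sep = "|"),
                    to    = paste(toy$species, toy$export_region, sep = "|"),
                    value = toy$flow, stringsAsFactors = FALSE)

# All species/region pairs, sorted by species and then by region.
combined <- unique(data.frame(regions = c(toy$import_region, toy$export_region),
                              species = c(toy$species, toy$species),
                              stringsAsFactors = FALSE))
combined <- combined[order(combined$species, combined$regions), ]
sector_order <- paste(combined$species, combined$regions, sep = "|")

# Sectors used by the links but missing from the ordering (should be none).
missing_sectors <- setdiff(union(links$from, links$to), sector_order)
```

An empty `missing_sectors` means the ordering covers every sector that the links refer to.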

We want the color of the links to be the same as the color of the regions.

grid.col = structure(color_regions[combined$regions], names = order)

Since the chord diagram is grouped by species, the gaps between species should be larger than the gaps within each species.

gap = rep(1, length(order))
gap[which(!duplicated(combined$species, fromLast = TRUE))] = 5
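
The `!duplicated(..., fromLast = TRUE)` idiom marks the last element of each group, which is where the wider gap has to go. A toy illustration in base R (the vector `species` is made up and must already be sorted by group, as `combined` is above):

```r
# Sector groups in plotting order; each group is contiguous.
species <- c("A", "A", "A", "B", "B", "C")

# Start with a 1-degree gap after every sector.
gap <- rep(1, length(species))

# TRUE for the last sector of each group when scanning from the end.
last_in_group <- !duplicated(species, fromLast = TRUE)

# Widen the gap after the last sector of each group.
gap[last_in_group] <- 5
# gap is now 1 1 5 1 5 5
```

The gaps are given in degrees, so with many sectors their sum has to leave enough of the 360 degrees for the sectors themselves.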

With all settings ready, we can now make the chord diagram.

In the following code, we set preAllocateTracks so that the circular lines which represent species can be added afterwards.

circos.par(gap.degree = gap)
chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
    grid.col = grid.col, directional = TRUE,
    preAllocateTracks = list(
        track.height = 0.04,
        track.margin = c(0.05, 0)
    )
)

Circular lines are added to represent species:

for(species in unique(combined$species)) {
    l = combined$species == species
    sn = paste(combined$species[l], combined$regions[l], sep = "|")
    highlight.sector(sn, track.index = 1, col = color_species[species], 
        text = species, niceFacing = TRUE)
}
circos.clear()

And the legends for regions and species:

legend("bottomleft", pch = 15, col = color_regions, 
    legend = names(color_regions), cex = 0.6)
legend("bottomright", pch = 15, col = color_species, 
    legend = names(color_species), cex = 0.6)

The plot looks like this:

Group by regions

The code is so similar that I will not explain it again and simply attach it below. The plot looks like this:

## group by regions
df2 = data.frame(from = paste(df[[1]], df[[3]], sep = "|"),
                 to = paste(df[[2]], df[[3]], sep = "|"),
                 value = df[[4]], stringsAsFactors = FALSE)

combined = unique(data.frame(regions = c(df[[1]], df[[2]]), 
    species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
combined = combined[order(combined$regions, combined$species), ]
order = paste(combined$regions, combined$species, sep = "|")
grid.col = structure(color_species[combined$species], names = order)

gap = rep(1, length(order))
gap[which(!duplicated(combined$regions, fromLast = TRUE))] = 5

circos.par(gap.degree = gap)
chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
    grid.col = grid.col, directional = TRUE,
    preAllocateTracks = list(
        track.height = 0.04,
        track.margin = c(0.05, 0)
    )
)
for(region in unique(combined$regions)) {
    l = combined$regions == region
    sn = paste(combined$regions[l], combined$species[l], sep = "|")
    highlight.sector(sn, track.index = 1, col = color_regions[region], 
        text = region, niceFacing = TRUE)
}
circos.clear()

legend("bottomleft", pch = 15, col = color_regions, 
    legend = names(color_regions), cex = 0.6)
legend("bottomright", pch = 15, col = color_species, 
    legend = names(color_species), cex = 0.6)

I'm not sure whether to make a new post for this or not, but is it possible to use a log transformation on this type of diagram? The problem seems to arise with the axis, which does not display the correct scale when plotted. Thank you so much in advance :) — Koalaitystats, Sep 14 '16 at 05:15
This is not supported directly, but you can log-transform your data first and drop the axis when making the chord diagram (by setting `annotationTrack = c("grid")`), then manually add axes that show the log scale, e.g. circos.axis(at = c(0, 1, 2), labels = c(1, 10, 100), sector.index = ..., track.index = ...). — Zuguang Gu, Sep 17 '16 at 20:56
Also @Koalaitystats, how do you want to log-transform your data? Just `log10(df$flow)` or `log10(df$flow*1e4)`? — Zuguang Gu, Sep 17 '16 at 20:59
Would I need to manually add an axis to each sector? For example, logged values of 3 + 5 and logged values of 4 + 4 do not correspond to the same raw values, so I could not amend the circos axis for all sectors at once. — Koalaitystats, Sep 18 '16 at 04:24
I ended up making a separate post, if that is easier to reply on :) http://stackoverflow.com/questions/39504096/r-circlize-chord-diagram-with-log-scale — Koalaitystats, Sep 18 '16 at 04:25
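
The arithmetic behind the log-axis suggestion in the comments can be sketched in base R. This only computes the transformed values and the tick positions that would be passed to the `at` and `labels` arguments of circos.axis() mentioned above. The three flows are taken from the question, and the scaling factor 1e4 is the one floated in the comment, not a recommendation.

```r
flow <- c(0.0104, 0.1971, 2.8484)

# Scale before taking logs so that all transformed values are positive,
# as in the log10(df$flow * 1e4) variant from the comment.
flow_log <- log10(flow * 1e4)

# Labels on the scaled raw scale and their positions on the log scale.
tick_labels <- c(1, 10, 100, 1000, 10000)
tick_at <- log10(tick_labels)   # 0 1 2 3 4
```

As the follow-up comment points out, a sector's width is the sum of the transformed values of its links, so such tick labels only read back to raw values for a single link measured from zero.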
