4

Hi I'm struggling to get a CDF plot from a function in R.

Context: I've produced a graph and done a log-log plot of the degree distribution of the nodes, which seems to follow a power-law. But I want to do the CDF on a log scale as an added measure.

All the examples I have found so far only show how to do a CDF for one column or row in a data frame/table, can anyone help me please??

So this works to get the CDF of x or y:

x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)
ecdf(x)
ecdf(y)
plot(ecdf(x)) ### or same with y

But I can't get the CDF for a data frame for xy or a data frame for example:

x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)
dt <- data.table(x, y)
ecdf(dt)               ####obviously won't work

Here's an example I found on Google of what I want to do: example http://igraph.org/r/doc/fit_power_law.html

Note the box in the upper right corner is not needed, but knowing how to find alpha and p would be useful.

Thanks for the help! :)

UPDATE 1 July 2018

I found the solution a while ago but forgot to add it here. As I'm at it I'll add all the steps to how I verified the power-law fit for my data (methodology from Clauset, Shalizi and Newman (2009) Power-law distributions in empirical data. SIAM Review 51(4):661-703).

First test the fit for a power-law:

x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)

logEstimate <- lm(log(y) ~ log(x))
summary(logEstimate)

Then plot against the expected fit:

logypred <- predict(logEstimate)
d.f <- data.frame( x = x, y = y )

plot(
    y ~ x,
    data = d.f,
    type = "n",
    log  = "xy",
    xlab = "Degree k",
    ylab = "Vertices",
    xlim = c( 1, 100 ),
    ylim = c( 1, 100 ) )
abline(
    h   = c( seq( 1, 9, 1 ), seq( 10, 90, 10 ), seq( 100, 1000, 100 ) ),
    lty = 3,
    col = colors()[ 440 ] )
abline(
    v   = c( seq( 1, 9, 1 ), seq( 10, 90, 10 ), seq( 100, 1000, 100 ) ),
    lty = 3,
    col = colors()[ 440 ] )
points( y ~ x, data = d.f )
box()
lines(exp(logypred)~x, col=2)

To test a better fit against other models I did the following (this is for exponential model, removing log() to do linear):

 exponential.model <- lm(log(x)~ y)
 summary(exponential.model)

Last step is to use the KS-test from the igraph package I managed to get the CDF rather easily (KS.p): http://igraph.org/r/doc/fit_power_law.html

pandanotp
  • 41
  • 8

1 Answers1

1

The package Emcdf provides the means to calculate and plot bivariate or multivariate cumulative distribution functions.

For your data:

library(Emcdf)
df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14),
                 y = c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1))

plotcdf(as.matrix(df))

Levelplot

plotcdf(as.matrix(df), type = "wireframe")

Wireframe

Is this what you are looking for?

LAP
  • 6,605
  • 2
  • 15
  • 28
  • Thanks for the answer LAP. Unfortunately this isn't what I'm looking for, though I will be able to use it for somethign else!! :). A search has given me some detail on what I am exactly trying to do though, it's to plot Pr(X>x) on the y axis for the x I defined above. Not sure if that helps. – pandanotp Feb 27 '18 at 13:17
  • I've edited my question, maybe it makes more sense now. – pandanotp Feb 27 '18 at 13:23