Hi I'm struggling to get a CDF plot from a function in R.
Context: I've produced a graph and done a log-log plot of the degree distribution of the nodes, which seems to follow a power-law. But I want to do the CDF on a log scale as an added measure.
All the examples I have found so far only show how to do a CDF for one column or row in a data frame/table, can anyone help me please??
So this works to get the CDF of x or y:
x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)
ecdf(x)
ecdf(y)
plot(ecdf(x)) ### or same with y
But I can't get the CDF for a data frame for xy or a data frame for example:
x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)
dt <- data.table(x, y)
ecdf(dt) ####obviously won't work
Here's an example I found on Google of what I want to do: example http://igraph.org/r/doc/fit_power_law.html
Note the box in the upper right corner is not needed, but knowing how to find alpha and p would be useful.
Thanks for the help! :)
UPDATE 1 July 2018
I found the solution a while ago but forgot to add it here. As I'm at it I'll add all the steps to how I verified the power-law fit for my data (methodology from Clauset, Shalizi and Newman (2009) Power-law distributions in empirical data. SIAM Review 51(4):661-703).
First test the fit for a power-law:
x <- c(1, 2, 3, 4, 5, 6, 7, 11, 13, 14)
y <- c(48, 18, 9, 7, 5, 2, 2, 1, 1, 1)
logEstimate <- lm(log(y) ~ log(x))
summary(logEstimate)
Then plot against the expected fit:
logypred <- predict(logEstimate)
d.f <- data.frame( x = x, y = y )
plot(
y ~ x,
data = d.f,
type = "n",
log = "xy",
xlab = "Degree k",
ylab = "Vertices",
xlim = c( 1, 100 ),
ylim = c( 1, 100 ) )
abline(
h = c( seq( 1, 9, 1 ), seq( 10, 90, 10 ), seq( 100, 1000, 100 ) ),
lty = 3,
col = colors()[ 440 ] )
abline(
v = c( seq( 1, 9, 1 ), seq( 10, 90, 10 ), seq( 100, 1000, 100 ) ),
lty = 3,
col = colors()[ 440 ] )
points( y ~ x, data = d.f )
box()
lines(exp(logypred)~x, col=2)
To test a better fit against other models I did the following (this is for exponential model, removing log()
to do linear):
exponential.model <- lm(log(x)~ y)
summary(exponential.model)
Last step is to use the KS-test from the igraph package I managed to get the CDF rather easily (KS.p): http://igraph.org/r/doc/fit_power_law.html