Questions tagged [zipf]

Zipf's law (/ˈzɪf/) is an empirical law formulated using mathematical statistics that refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions.

The law is named after the American linguist George Kingsley Zipf (1902–1950), who popularized it and sought to explain it, though he did not claim to have originated it.
Source: https://en.wikipedia.org/wiki/Zipf%27s_law
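
As a rough illustration of the description above, the sketch below builds a finite Zipfian distribution over ranks 1..n and recovers its exponent from the slope of the log-log rank-frequency line. The function names `zipf_pmf` and `estimate_exponent` are hypothetical, chosen only for this example, and the least-squares fit is a simple heuristic rather than a recommended estimator (maximum-likelihood fitting is usually more reliable on real data).

```python
import math

def zipf_pmf(n, s=1.0):
    """Probabilities for ranks 1..n, proportional to rank ** -s (assumed finite support)."""
    weights = [rank ** -s for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def estimate_exponent(frequencies):
    """Estimate the exponent as minus the least-squares slope of
    log(frequency) against log(rank). All frequencies must be positive."""
    freqs = sorted(frequencies, reverse=True)  # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance

pmf = zipf_pmf(1000, s=1.2)
estimated = estimate_exponent(pmf)  # close to 1.2 for exact Zipfian input
```

On exact Zipfian input the log-log relationship is a straight line, so the fit returns the exponent used to generate it; word counts from a real corpus will only follow the line approximately.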

25 questions
0 votes, 1 answer

How to calculate zipf exponent in R?

The generalized Zipf's law states that, if we rank a collection of n objects in non-decreasing order according to their size, the product of a power of the rank and of the size of each object is constant throughout the collection, i.e. where r is…
Mark
0 votes, 1 answer

Finding 'a' value of zipf distribution

I found this Python function that generates a Zipf distribution based on an 'a' value and a 'size' value, where size is analogous to the total number of elements in a frequency table:…
samuel
0 votes, 1 answer

Why is '[UNK]' word the first in word2vec vocabulary?

If the vocabulary is ordered from the most frequent word to the least frequent, placing '[UNK]' at the beginning means that it occurs most often. But what if '[UNK]' isn't the most frequent word? Should I put it at another place in the vocabulary,…
Ricardo
0 votes, 1 answer

How to add a zipf curve to a bar plot of word frequency?

plt.figure() plt.bar([key for val,key in lst], [val for val,key in lst]) plt.xlabel("Terms") plt.ylabel("Counts") plt.show() I have a list of tuples (count, term) that has been sorted in descending order of count (i.e. the number of times a term…
Paw in Data
0 votes, 1 answer

Generate missing values on the dataset based on ZIPF distribution

Currently, I want to observe the impact of missing values on my dataset. I replace a percentage of the data points (10, 20, 90%) with missing values and observe the impact. The function below replaces a given percentage of data points with missing values. def dropout(df,…
user46543
0 votes, 2 answers

Tidy text: Compute Zipf's law from the following term-document matrix

I tried the code from http://tidytextmining.com/tfidf.html. My result can be seen in this image. My question is: How can I rewrite the code to produce the negative relationship between the term frequency and the rank? The following is the…
SChatcha
0 votes, 1 answer

Zipf_plot() : How to compare two objects in one graph?

I'm trying to use the Zipf_plot function from the tm package to compare two different document-term matrices, and I'm not an R expert. Maybe you could tell me if there's a way to fit both in this function? Zipf_plot(x, type = "l", ... ) I…
Bay
-1 votes, 1 answer

Unable to Plot Zipf's Distribution Graph

I am new to Python and machine learning. I want to plot a Zipf distribution graph for a text file, but my code gives an error. The following is my Python code: import re from itertools import islice #Get our corpus of medical words frequency =…
-1 votes, 1 answer

Why do I get a TypeError in this Python program?

# I'm trying to make a Zipf's Law observation using a dictionary that stores every word, and a counter for it. I will later sort this from ascending to descending to prove Zipf's Law in a particular text. I'm taking most of this code from Automate…
-2 votes, 2 answers

How to edit a graph in Python (Zipf's Law)

I need help making a bar chart showing the frequency of the ten most common words in the file. Next to each bar is a second bar whose height is the frequency predicted by Zipf’s Law. (For example, suppose the most common word appears 100 times.…
Stiff