
I have a table called posts in my database and each record represents a post with title, url and body.

Is it possible to use Faker to generate titles and bodies in English? The body should be actual English text.

I went through the docs and didn't find what I'm looking for.

the Tin Man
Amanda Ferrari
  • If Faker can't do it, find a different tool that does. This sounds like a feature request to Faker. Generating arbitrary English text is non-trivial, which is why Faker emits junk Latin by default. If you want to do this yourself look at a Markov Chain to generate it based on sample text you collect into a corpus for training. – tadman Nov 29 '19 at 19:25
  • Why do you want to generate fake "real English posts and text"? For testing and development regular old Lorem works fine. The only reason I can think of for generating fakery is in spam or phishing. – the Tin Man Nov 29 '19 at 19:58
  • @AmandaFerrari we may appreciate a sense of humor, but you'll very likely want help again from Tin Man. – lacostenycoder Nov 29 '19 at 20:32
  • I was able to find a very good dataset with all that I needed :D – Amanda Ferrari Nov 29 '19 at 21:12
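
The Markov chain approach mentioned in the comments can be sketched in plain Ruby. This is a minimal illustration rather than anything Faker provides: `build_chain`, `generate`, and the tiny corpus string are hypothetical names and data made up for the example, and a real corpus would need to be far larger to produce readable text.

```ruby
# Map every run of `order` consecutive words to the words seen right after it.
def build_chain(text, order: 2)
  chain = Hash.new { |hash, key| hash[key] = [] }
  text.split.each_cons(order + 1) do |gram|
    chain[gram[0...order]] << gram[order]
  end
  chain
end

# Walk the chain from a random starting state until max_words is reached
# or the current state has no recorded followers.
def generate(chain, max_words: 30, rng: Random.new)
  return "" if chain.empty?

  state = chain.keys.sample(random: rng)
  output = state.dup
  while output.size < max_words
    followers = chain.fetch(state, [])
    break if followers.empty?

    output << followers.sample(random: rng)
    state = output.last(state.size)
  end
  output.join(" ")
end

# Hypothetical toy corpus; replace with text you have collected.
corpus = "the cat sat on the mat and the cat ran to the door and the dog sat on the rug"
chain = build_chain(corpus)
text = generate(chain, max_words: 12, rng: Random.new(1))
puts text
```

Every three-word window in the output also occurs in the corpus, so the result reads as locally plausible English while the overall sentence is new. A higher `order` gives more coherent but less varied text.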

1 Answer


For English texts, why can't you do this?

Faker::Quote.matz
=> "I believe that the purpose of life is, at least in part, to be happy. Based on this belief, Ruby is designed to make programming not only easy but also fun. It allows you to concentrate on the creative side of programming, with less stress."

Or try:

Faker::Hipster.sentences.sample
=> "Cornhole drinking actually pop-up brooklyn williamsburg wayfarers."

For titles, just do:

[Faker::Company.name, Faker::Company.industry].join(' - ')
=> "Schmitt-Kohler - E-Learning"

But if you really need random, actual English sentences, you can pull them from book texts and use the tactful_tokenizer gem to split the text into sentences.

require "tactful_tokenizer"
# Load the sentence-boundary model
t = TactfulTokenizer::Model.new
# Download a plain-text book (shells out to curl)
string = `curl http://www.textfiles.com/etext/FICTION/alger-cast-544.txt`;
# Split the text into an array of sentences
sentences = t.tokenize_text(string);
sentences.count
=> 6322
sentences.sample
=> "I'm just staying at a place on Fourteenth Street, but I can't
afford to stay there long, for they charge a dollar a day."
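
Once you have an array of sentences, turning them into post records with a title, url and body could look roughly like the sketch below. It uses only the standard library so it runs without any gems: `split_sentences` is a naive regex stand-in for a real tokenizer, and `slugify`, `build_post`, and `SAMPLE_TEXT` are hypothetical names invented for this example.

```ruby
# Naive stand-in for a real sentence tokenizer:
# collapse whitespace, then split after ., ! or ? followed by a space.
def split_sentences(text)
  text.gsub(/\s+/, " ").strip.split(/(?<=[.!?])\s+/)
end

# Turn a title into a URL-friendly slug.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Build one post hash: a random sentence (minus its final punctuation)
# as the title, and body_size distinct random sentences as the body.
def build_post(sentences, body_size: 5, rng: Random.new)
  title = sentences.sample(random: rng).sub(/[.!?]+\z/, "")
  body = sentences.sample(body_size, random: rng).join(" ")
  { title: title, url: "/posts/#{slugify(title)}", body: body }
end

# Hypothetical sample text; in practice this would be the downloaded book.
SAMPLE_TEXT = <<~TEXT
  The rain stopped before noon. Nobody had expected the road to flood!
  Was the bridge still open? They decided to walk the rest of the way.
  It took them three hours to reach the town.
TEXT

sentences = split_sentences(SAMPLE_TEXT)
post = build_post(sentences, body_size: 3, rng: Random.new(42))
puts post[:title]
puts post[:url]
puts post[:body]
```

Passing a seeded `Random` makes the generated posts reproducible, which is handy for seed scripts and tests. The resulting hash could be handed to whatever creates your records, for example a model's create method, if the attribute names match your posts table.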
lacostenycoder