Questions tagged [words]

A word is a single distinct meaningful element of data. Programming-related questions concerning Microsoft Word should NOT use this tag - use the tag [ms-word] instead. Questions on general usage of Microsoft Word are off-topic for Stack Overflow and should be asked on Super User instead.

692 questions
15
votes
4 answers

Split sentence into words but having trouble with the punctuations in C#

I have seen a few similar questions but I am trying to achieve this. Given a string, str="The moon is our natural satellite, i.e. it rotates around the Earth!" I want to extract the words and store them in an array. The expected array elements…
Richard N
  • 895
  • 9
  • 19
  • 36
14
votes
6 answers

Generating random words in Java?

I wrote up a program that can sort words and determine any anagrams. I want to generate an array of random strings so that I can test my method's runtime. public static String[] generateRandomWords(int numberOfWords){ String[] randomStrings = new…
Mr_CryptoPrime
  • 628
  • 2
  • 11
  • 25
13
votes
3 answers

Finding the most popular words in a list

I have a list of words: words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah'] And I want to get a list of tuples: [(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')] where each tuple is... (number_of_occurrences, word) The list should…
Maciej Ziarko
  • 11,494
  • 13
  • 48
  • 69
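One possible way to build such a list is sketched below, assuming Python's standard `collections.Counter`. The helper name `most_popular` is hypothetical, and because the excerpt is truncated, the order of words with equal counts is an assumption: here ties keep the order in which the words first appear.

```python
from collections import Counter


def most_popular(words):
    """Return (number_of_occurrences, word) tuples, highest count first."""
    counts = Counter(words)
    # most_common() yields (word, count) pairs sorted by descending count;
    # swap each pair to get the (count, word) shape from the question.
    return [(count, word) for word, count in counts.most_common()]


words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
print(most_popular(words))
```

If a specific tie order is needed, sorting the result with an explicit key is probably the simplest adjustment.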
13
votes
2 answers

Calculating a relative Levenshtein distance - make sense?

I am using both Daitch-Mokotoff soundexing and Damerau-Levenshtein to find out if a user entry and a value in the application are "the same". Is Levenshtein distance supposed to be used as an absolute value? If I have a 20 letter word, a distance of…
Joseph Tura
  • 6,290
  • 8
  • 47
  • 73
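A minimal sketch of one way to make the distance relative, assuming it is divided by the length of the longer string. This normalization is an assumption rather than something the question confirms, and the code uses plain Levenshtein distance (no transpositions), not Damerau-Levenshtein. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row buffer as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def relative_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the longer string's length (assumed)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


print(relative_distance("kitten", "sitting"))
```

With this scaling, the same absolute distance counts for less on a long word than on a short one, which is the effect the question is asking about.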
12
votes
4 answers

Vim: Invert string (by words)

This is my string: "this is my sentence" I would like to have this output: "sentence my is this" I would like to select a few words on a line (in a buffer) and reverse it word by word. Can anyone help me?
Reman
  • 7,931
  • 11
  • 55
  • 97
12
votes
3 answers

How to generate a list of (unique) words from a text file in Ubuntu?

I have an ASCII text file. I want to generate a list of all "words" from that file using one or more Ubuntu commands. A word is defined as an alpha-num sequence between delimiters. Delimiters are by default whitespaces but I also want to experiment…
I Z
  • 5,719
  • 19
  • 53
  • 100
11
votes
20 answers

Calculating frequency of each word in a sentence in java

I am writing a very basic Java program that calculates the frequency of each word in a sentence. So far I managed to do this much: import java.io.*; class Linked { public static void main(String args[]) throws IOException { BufferedReader…
Sigma
  • 742
  • 2
  • 9
  • 24
10
votes
4 answers

Can I tell if a std::string represents a number using stringstream?

Apparently this is supposed to work in showing if a string is numerical, for example "12.5" == yes, "abc" == no. However I get a no regardless of the input. std::stringstream ss("2"); double d; ss >> d; if(ss.good())…
alan2here
  • 3,223
  • 6
  • 37
  • 62
9
votes
1 answer

Postgres word_similarity not comparing words

"Returns a number that indicates how similar the first string to the most similar word of the second string. The function searches in the second string a most similar word not a most similar substring. The range of the result is zero (indicating…
Cristiano Coelho
  • 1,675
  • 4
  • 27
  • 50
9
votes
1 answer

Python regex for finding all words in a string

Hello, I am new to regex and I'm starting out with Python. I'm stuck at extracting all words from an English sentence. So far I have: import re shop="hello seattle what have you got" regex = r'(\w*) ' list1=re.findall(regex,shop) print list1 This…
TNT
  • 480
  • 1
  • 4
  • 11
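A pattern that requires a trailing space, such as `(\w*) `, cannot match the last word of a string that does not end in a space. One possible alternative, sketched here under the assumption that a "word" means a run of letters, digits and underscores, is to match `\w+` directly. The helper name `find_words` is hypothetical.

```python
import re


def find_words(text):
    """Return every run of word characters (letters, digits, underscore)."""
    # \w+ needs at least one character and no trailing delimiter,
    # so the final word is matched like any other.
    return re.findall(r"\w+", text)


shop = "hello seattle what have you got"
print(find_words(shop))
```

Note that `\w` also matches digits and underscores, so a stricter definition of a word would need a different character class.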
8
votes
3 answers

Extract Images and Words with coordinates and sizes from PDF

I've read much about PDF extraction and libraries (such as iText) but I just haven't found a solution to extract images and text (with coordinates) from a PDF. The task is to scan a PDF with a catalog of products and extract each image. There is an image…
Alex
  • 1,237
  • 3
  • 18
  • 29
8
votes
5 answers

Where can I find a good wordlist?

I'm looking for a file that is a wordlist and also is set up by type of word. For example, something in this format: Nouns: { bus car deck elephant ... } Adjectives { awful bashful ... } Adverbs { ... } Any ideas?
qwertymk
  • 34,200
  • 28
  • 121
  • 184
7
votes
2 answers

Regular Expression - Exclude list of words for a name

I'm trying to make a regular expression that accepts this: only a-z, 0-9 and _ chars, with a minimum length of 3; admin, static, my and www are rejected. For the first part, I already managed to do it with: ^[a-zA-Z0-9\\_]{3,}$ But I don't know how…
Cyril N.
  • 38,875
  • 36
  • 142
  • 243
7
votes
7 answers

What's a good measure for classifying text documents?

I have written an application that measures text importance. It takes a text article, splits it into words, drops stopwords, performs stemming, and counts word-frequency and document-frequency. Word-frequency is a measure that counts how many times…
bodacydo
  • 75,521
  • 93
  • 229
  • 319
7
votes
4 answers

How do I count the total number of words in a Pandas dataframe cell and add those to a new column?

A common task in sentiment analysis is to obtain the count of words within a Pandas data frame cell and create a new column based on that count. How do I do this?
muninn
  • 473
  • 1
  • 4
  • 12