Questions tagged [tokenize]

Tokenizing is the act of splitting a string into discrete elements called tokens.

Tokenizing is the act of splitting a stream of text into discrete elements called tokens, using a delimiter present in the stream. These tokens can then be processed further, for example to search for a value or to fill an array for looping.

Example (VBA):

Dim tokens As Variant
Dim sampleString As String
Dim i As Long

sampleString = "The quick brown fox jumps over the lazy dog."

' tokenize string based on space delimiter
tokens = Split(sampleString, " ")

' list tokens
For i = LBound(tokens) To UBound(tokens)
  MsgBox tokens(i)
Next i
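
Example (Python):

A minimal sketch of the same delimiter-based split, assuming Python's built-in str.split. The variable names mirror the VBA example; has_fox is a hypothetical name added here to illustrate searching the tokens for a value.

```python
sample_string = "The quick brown fox jumps over the lazy dog."

# tokenize string based on space delimiter
tokens = sample_string.split(" ")

# search the tokens for a value (has_fox is an illustrative name)
has_fox = "fox" in tokens

# list tokens
for token in tokens:
    print(token)
```

Note that the final token keeps its trailing period, because only the space is treated as a delimiter.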

2964 questions

1 answer

C++ tokenizer separator not compiling

I want to split by comma, and I have the following class which is instantiated with a comma-separated line. The class is as follows: #include #include #include #include #include #include…

c++ boost tokenize

asked Sep 01 '14 at 11:14

David Villasmil

4 answers

Strtok removes first character in token C

I have this little problem when trying to tokenize a string from an HTTP request directed at my "home made" HTTP server. Basically I am using these lines of code to tokenize: token = strtok(bufptr, "\n"); while(token != NULL){ …

c string pointers tokenize strtok

asked Aug 25 '14 at 13:08

Thomas Holden

1 answer

SWI-Prolog tokenize_atom/2 replacement?

What I need to do is break an atom into tokens, e.g.: tokenize_string('Hello, World!', L). would unify L=['Hello',',','World','!'], exactly as tokenize_atom/2 does. But when I try to use tokenize_atom/2 with non-Latin letters it fails. Is there any…

prolog tokenize dcg

asked Mar 27 '10 at 11:15

evgeniuz

2 answers

Solr: Combining PatternTokenizerFactory and PathHierarchyTokenizerFactory?

In short: In schema.xml I want to declare an analyzer that breaks apart a field with the PatternTokenizer, and then have those values processed by PathHierarchyTokenizer. (Path Tokenizer breaks up paths like "a/b/c" into [a, a/b,…

csv solr tokenize taxonomy

asked Aug 02 '14 at 00:05

Mark Bennett

2 answers

How to tokenize cpp source?

For example, there is C++ source code like: #include <iostream> using namespace std; int main() { int counta=0; int countb=0; while(cin.get()!='*') count++; cout<

c++ tokenize code-analysis

asked Jul 26 '14 at 13:33

Jialin

2 answers

Tokenizing a String to Pass as char * into execve()

My knowledge of C is very limited. I'm trying to tokenize a string passed to a server from a client, because I want to use the passed arguments to execve. The arguments passed via buffer need to be copied to *argv and tokenized such that buffer's tokens can…

c pointers tokenize arrays execve

asked Jul 17 '14 at 06:25

rice2007

1 answer

Tokenizing japanese string and converting to hiragana

I am using the string tokenizer and transform APIs to convert kanji characters to hiragana. The code in the question (What is the replacement for Language Analysis framework's Morpheme analysis deprecated APIs) converts most kanji characters to hiragana but…

c++ objective-c macos tokenize cjk

asked Jul 15 '14 at 10:00

Nitesh

1 answer

Creating a syntax tree from tokens

I'm trying to create a tiny interpreter for TI-BASIC syntax. This is a snippet of TI-BASIC I'm trying to interpret: A->(2+(3*3)). I've tokenized the code above into this sequence of tokens: Token{type=VARIABLE, content='A'} Token{type=ASSIGN,…

java tokenize abstract-syntax-tree

asked Jul 09 '14 at 19:06

August

1 answer

Error in Keyword extraction using lucene

I'm completely new to the concept of text extraction. When I was searching for an example I found one that is implemented using Lucene. I tried to run it in Eclipse but it gave an error. This is the error I'm getting: (TokenStream contract…

java lucene tokenize feature-extraction

asked Jun 25 '14 at 08:42

user3774248

2 answers

feed treetagger in R with text in string rather than text in file

I use TreeTagger from R, through the Korpus package. Calling the treetag function requires me to indicate a filename, which contains the text to be processed. However, I would like to provide a string rather than a filename, because I have to do some…

r nlp tokenize

asked Jun 14 '14 at 06:46

Marc G.

2 answers

Recognition of first and last name as one entity

I am interested in natural language processing. I am wondering if there is a well-known algorithm that can identify a first and last name in a text as one entity. For example, if we have this: "Last week John Wayne went to Europe." I want to have a…

nlp tokenize named-entity-recognition

asked Jun 11 '14 at 06:01

TJ1

1 answer

How to tokenize only certain words in Lucene

I'm using Lucene for my project and I need a custom Analyzer. Code is: public class MyCommentAnalyzer extends Analyzer { @Override protected TokenStreamComponents createComponents( String fieldName, Reader reader ) { Tokenizer source =…

java dictionary lucene tokenize

asked Jun 10 '14 at 16:01

PatrickBateman1981

1 answer

Tokenizing a string in Clojure

I am trying to tokenize a string using Clojure. The basic tokenization rules require the string to be split into separate symbols as follows: String literals of the form "hello world" are a single token. Every word that is not part of a string…

regex clojure tokenize

asked Jun 05 '14 at 12:25

Yechiel Labunskiy

2 answers

XSLT: Splitting element contents on a text delimiter, keeping elements

I'm trying to parse element contents and split them on a delimiter, while keeping all elements in the parent. I don't need -- don't want -- to find the delimiter inside the child elements.

Some text more text;…

xslt xml-parsing tokenize

asked Jun 01 '14 at 23:19

Keith Davies

1 answer

Tokenizing a string with strtok() causes crash in c

I'm trying to create a function that tokenizes a given string with given delimiters, puts the tokens in a 2D char array and returns it. The code is shown below: char** stringTokenizer(const char* str, const char* delims){ char** tokens; …

c string segmentation-fault tokenize strtok

asked May 13 '14 at 14:26

Chris Kon
