Questions tagged [tokenize]

Tokenizing is the act of splitting a string into discrete elements called tokens.

Tokenizing is the act of splitting a stream of text into discrete elements called tokens using a delimiter present in the stream. These tokens can then be processed further, for example to search for a value or assign to an array for looping.

Example (VBA):

Sub TokenizeExample()
    Dim tokens As Variant
    Dim sampleString As String
    Dim i As Long

    sampleString = "The quick brown fox jumps over the lazy dog."

    ' Tokenize the string on the space delimiter
    tokens = Split(sampleString, " ")

    ' Show each token in turn
    For i = LBound(tokens) To UBound(tokens)
        MsgBox tokens(i)
    Next i
End Sub
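
For comparison, the same steps can be sketched in Python. This is only an illustration of the description above, not part of the original VBA example, and the variable names are made up for the sketch. It splits on the space delimiter, searches the tokens for a value, and loops over them.

```python
sample_string = "The quick brown fox jumps over the lazy dog."

# Tokenize on the space delimiter.
tokens = sample_string.split(" ")

# Search the tokens for a value.
has_fox = "fox" in tokens

# Loop over the tokens together with their positions.
for index, token in enumerate(tokens):
    print(index, token)
```

Note that the final token is "dog." with the period attached, because only the space is treated as a delimiter.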

2964 questions
1
vote
1 answer

Lucene Tokenizer deprecated

The following Analyzer extension uses a number of deprecated subclasses. What are the non-deprecated replacements for StandardTokenizer, StandardFilter, LowerCaseFilter, and StopFilter, as used below? public class PorterAnalyzer extends Analyzer…
Katedral Pillon
  • 14,534
  • 25
  • 99
  • 199
1
vote
2 answers

QUEX_PATH issue while using tokenizer

I'm trying to install trainable-tokenizer (https://github.com/jirkamarsik/trainable-tokenizer). I have installed all the dependencies as per the README. I have installed quex.deb using the installer from quex.org, which is a…
Tejus Prasad
  • 6,322
  • 7
  • 47
  • 75
1
vote
1 answer

ElasticSearch search for special characters with pattern analyzer

I'm currently using a custom analyzer with the tokenizer set to the pattern (\W|_)+, so each term contains only letters and the text is split on any non-letter. As an example, I have a document with the contents [dbo].[Material_Get] and another with…
Nived
  • 1,804
  • 1
  • 15
  • 29
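
A rough way to see what a split pattern like (\W|_)+ does to such content is to emulate it outside Elasticsearch. The sketch below uses Python's re module as a stand-in. Elasticsearch applies Java regular expressions plus whatever token filters the analyzer configures, so the real index terms may differ. The helper name pattern_tokens is hypothetical.

```python
import re

# Non-capturing form of the pattern from the question, so that re.split
# does not return the separators themselves.
SEPARATOR = re.compile(r"(?:\W|_)+")

def pattern_tokens(text):
    """Split text on runs of non-word characters or underscores."""
    return [token for token in SEPARATOR.split(text) if token]

print(pattern_tokens("[dbo].[Material_Get]"))  # ['dbo', 'Material', 'Get']
```

Under these assumptions the brackets, the dot, and the underscore are discarded during tokenization, which would be consistent with searches for those characters finding nothing.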
1
vote
0 answers

Multi-(programming)-language tokenization

Is there a tool (or the like) that tokenizes multiple programming languages? The input should be a source code file; the tool should then auto-detect the language, tokenize the file, and output the tokens as xml/json/..
Felix Engelmann
  • 399
  • 5
  • 17
1
vote
1 answer

Tokenizer only prints the first token

I am having trouble building a tokenizer. I am new to C++ and was wondering if anyone could help. When I run the program, I enter the user input as "x = a + 1". When I do this, the only token output is the x. I want to display "x\n = a\n +\n…
1
vote
2 answers

Choose formats in sscanf in c

I am trying to parse the string Connected to a:b:c:d completed (reauth) id=5 using sscanf() in C. My format string is Connected to %s completed %s id=%s. But in some cases my string is Connected to a:b:c:d completed id=5. I am not getting…
Krishna M
  • 1,135
  • 2
  • 16
  • 32
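
sscanf() format strings have no way to mark a field as optional, so in C one possible approach is to check sscanf()'s return value and retry with a second format. As a language-neutral illustration of the optional field itself, here is a sketch that swaps sscanf() for a Python regular expression. The pattern and the function name parse_status are assumptions made for this example, not taken from the question.

```python
import re

# Hypothetical pattern: the parenthesized word after "completed" is optional.
LINE = re.compile(r"Connected to (\S+) completed(?: \((\w+)\))? id=(\S+)")

def parse_status(line):
    """Return (address, reason or None, id), or None if the line does not match."""
    match = LINE.fullmatch(line)
    return match.groups() if match else None

print(parse_status("Connected to a:b:c:d completed (reauth) id=5"))
print(parse_status("Connected to a:b:c:d completed id=5"))
```

When the optional part is absent, the second element of the result is None rather than an empty string.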
1
vote
1 answer

C++ Tokenize String - Not Working

I am having trouble tokenizing a string in order to add the substrings to vectors in an iterative loop. I have this below. When I run it, I am getting a return value of 1 from this function call, which I'm pretty sure is not…
Jonathon Anderson
  • 1,162
  • 1
  • 8
  • 24
1
vote
1 answer

Conditional jump or move depends on uninitialised value(s) strcat

I understand that this valgrind error occurs because I was trying to use something uninitialized. The code below is what causes this error. It tries to read Racket code and get each symbol, such as + or define.…
harumomo503
  • 371
  • 1
  • 7
  • 16
1
vote
3 answers

Trouble tokenizing for binary tree

I am trying to tokenize a text file and then put the tokens in a binary tree, where a token with a lower value goes on the left branch, a token with a higher value goes on the right, and repeated values have their count updated.…
sukurity
  • 55
  • 3
  • 8
1
vote
3 answers

Tokenizing a String - C

I'm trying to tokenize a string in C based upon \r\n delimiters, and want to print out each string after subsequent calls to strtok(). Inside my while loop, each token is processed. When I include the processing code, the only…
Delfino
  • 967
  • 4
  • 21
  • 46
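
strtok() treats its delimiter argument as a set of characters and skips runs of them, so "\r\n" splits on either character and never yields empty tokens. It also keeps a single internal position, so calling strtok() on another string inside the loop disturbs the outer tokenization; whether that applies here cannot be told from the excerpt. The Python sketch below only emulates the splitting behavior. The name strtok_like and the sample text are made up for illustration.

```python
import re

def strtok_like(text, delimiters):
    """Split on any run of the given delimiter characters, dropping empty tokens."""
    separator = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(separator, text) if token]

print(strtok_like("GET /\r\nHost: example\r\n\r\nbody", "\r\n"))
```

The blank line between the header and the body disappears, because consecutive delimiter characters are collapsed into one separator.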
1
vote
4 answers

boost tokenizer but keeping delimiter

Maybe it is easy, but I could not find the answer myself. I want to use boost::tokenizer but keep the delimiters with the string. My string is a bunch of numbers like this: "1.00299 344.2221-25.112-33112". The result should be: "1.00299" …
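
One way to keep a delimiter attached to its token is to match the tokens directly instead of splitting on separators. The sketch below swaps boost::tokenizer for a Python regular expression and assumes that a "-" starts a new number and stays with it. The excerpt is cut off before the full expected output, so that assumption is not confirmed by the question. The name split_numbers is hypothetical.

```python
import re

# Assumption: a "-" starts a new number and stays attached to it.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def split_numbers(text):
    """Return every optionally signed decimal number found in text."""
    return NUMBER.findall(text)

print(split_numbers("1.00299 344.2221-25.112-33112"))
```

Under that assumption the sample string yields four tokens, two of which keep their leading minus sign.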
1
vote
1 answer

ElasticSearch: Attempting to get spelling suggestion on proper names

Before I begin, let me just say that I'm no ElasticSearch expert, but I am currently tasked with tweaking some analyzers to get spelling suggestions working better in a couple of different situations. I've seen examples of people who are doing…
Cari
  • 997
  • 1
  • 10
  • 16
1
vote
1 answer

XSLT - Tokenizing template to italicize and bold XML element text

I have the following tokenizing template implemented in my XSLT.
user2285167
  • 55
  • 1
  • 2
  • 11
1
vote
2 answers

how to separate tokens, using multiple ways

I have the following code, and it is currently working. However, I am trying to read the tokens in three separate ways. The first token or number is to select, the second token is to select an operation (insert or delete), and the rest of the tokens…
CrisAlfie
  • 111
  • 13
1
vote
1 answer

XQuery on string with substrings

Here's an example of code from the database I'm using: Sajama Andes
Dreamus
  • 45
  • 1
  • 9