Questions tagged [burrows-wheeler-transform]

The Burrows-Wheeler transform (BWT) is a reversible block-sorting algorithm used in data compression.

The Burrows-Wheeler transform was developed at the DEC Systems Research Center by Michael Burrows and David Wheeler, who published it in 1994. The algorithm reversibly permutes the characters of its input so that characters with similar contexts end up next to each other, which makes repetitive data easier to compress. More recently, the algorithm has been used to compress and index data in high-throughput DNA sequencing.
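As a rough illustration of the permutation described above, here is a minimal sketch of the transform and its inverse built from sorted rotations. The function names `bwt` and `inverse_bwt`, and the use of an explicit sentinel character, are assumptions made for this example; this is the naive textbook approach, not the suffix-sorting method from the original paper, and it is far too slow for large inputs.

```python
def bwt(text: str, eof: str = "\0") -> str:
    """Naive Burrows-Wheeler transform using sorted rotations."""
    if eof in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + eof  # the sentinel marks the end of the original string
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, eof: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to every row, then re-sort.
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(eof))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana", eof="$")
    print(encoded)                          # annb$aa
    print(inverse_bwt(encoded, eof="$"))    # banana
```

Note how the output groups identical characters together (`nn`, `aa`), which is what lets a later stage such as move-to-front or run-length coding compress it well.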

29 questions
12
votes
2 answers

How to sort array suffixes in block sorting

I'm reading the block sort algorithm from the Burrows and Wheeler paper. This is a step of the algorithm: Suppose S = abracadabra. Initialize an array W of N words W[0, ... , N - 1], such that W[i] contains the characters S'[i, ... , i + k - 1] arranged…
Guido Tarsia
7
votes
1 answer

Using suffix array algorithm for Burrows Wheeler transform

I've successfully implemented a BWT stage (using regular string sorting) for a compression testbed I'm writing. I can apply the BWT and then the inverse BWT, and the output matches the input. Now I wanted to speed up creation of the BW index…
Bim
6
votes
3 answers

Burrows-Wheeler Transform without EOF character

I need to perform the well-known Burrows-Wheeler Transform in linear time. I found a solution with suffix sorting and an EOF character, but appending EOF changes the transformation. For example: consider the string bcababa and two rotations s1 =…
4
votes
1 answer

Why is bzip2's maximum blocksize 900k?

bzip2 (i.e. this program by Julian Seward) lists available block sizes between 100k and 900k: $ bzip2 --help bzip2, a block-sorting file compressor. Version 1.0.6, 6-Sept-2010. usage: bzip2 [flags and input files in any order] -1 .. -9 …
saladi
4
votes
1 answer

Fast algorithm for move-to-front transform

I'm trying to find the fastest algorithm for the move-to-front transformation, the one that's used, for example, in conjunction with the Burrows-Wheeler transform. The best I've managed so far does about 15MB/s on a Core i3 2.1GHz. But I'm sure that it's…
Martin
2
votes
2 answers

Burrows-Wheeler Transform (BWT) repeating string

I'm writing Burrows-Wheeler Transform and its reverse in Python. It works fine for small strings, but it fell apart when I tested a bigger string. At certain points, the string seems to loop over. I'm sure it must have to do with the final loop of…
2
votes
1 answer

Fast implementation of BWT in Lua

local function fShallowCopy(tData) local tOutput = {} for k,v in ipairs(tData) do tOutput[k] = v end return tOutput end local function fLexTblSort(tA,tB) --sorter for tables for i=1,#tA do if tA[i]~=tB[i] then …
HDeffo
2
votes
5 answers

Idiomatic string rotation in Clojure

How to idiomatically rotate a string in Clojure for the Burrows-Wheeler transform? I came up with this, which uses (cycle "string"), but feels a bit imperative: (let [s (str "^" "banana" "|") l (count s) c (cycle s) m (map #(take l…
Petrus Theron
2
votes
2 answers

Recursively create all rotations of a string in Scala

I've been playing around with trying to recreate the example of the Burrows-Wheeler transform on Wikipedia. To add to the fun, I'm trying to do so with a recursive strategy. However, I get stuck at the first step, creating all rotations of the string.…
Johan
1
vote
1 answer

Is radix sort used for suffix sorting?

I'm trying to implement block sorting. This is from the Burrows-Wheeler paper. (Before this step, you create a suffix array V of S.) Q4. [radix sort] Sort the elements of V, using the first two characters of each suffix as the sort key. This can be…
1
vote
2 answers

Optimization of the Burrows Wheeler transform

Hello, I am having some difficulty optimizing the Burrows-Wheeler transform. I'm trying to transform text files; however, transforming large text files like the Bible takes way too long. Any idea on how to proceed? public…
nope
1
vote
2 answers

Burrows Wheeler Transform (BWT)

I am having difficulty grasping the decode algorithm for the Burrows-Wheeler transform (BWT). I've done some reading online and gone through some sample code, but they all seem to be using a 'primary index' to decode an encoded string. My question…
DeepHouse
1
vote
1 answer

Reverse BWT without knowing last character

Usually in the Burrows-Wheeler Transform algorithm, a $ character is used to signal the end of the string, but in many cases this $ is omitted. I was wondering how it can be reversed without knowing the position of the last character? For example, I…
Thang Do
1
vote
1 answer

Burrows-Wheeler implementation for large strings

I have tried rotating a really large string in the Burrows-Wheeler cyclic string array. But my input is about 200000 characters, and when the input is this big I am unable to run the code, as it runs out of heap space. My prof said that the only way to…
Tj2491
0
votes
1 answer

Distance Coding (DC) BWT

I am trying to write a BWT with Huffman compression program in Java. In the BWT I want to implement Distance Coding (DC). I am looking for some examples, but there aren't many of them. I found this…
Streetboy