Questions tagged [longest-substring]

Longest Substring is a classic computer science problem: given two strings, find the substrings they have in common, then return the common substring(s) of greatest length.
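
As a rough illustration of the problem statement, here is a minimal Python sketch using the standard dynamic-programming approach, which tracks the length of the common suffix ending at each pair of positions. The function name `longest_common_substrings` is made up for this example, and the sketch assumes both inputs fit comfortably in memory.

```python
def longest_common_substrings(a: str, b: str) -> set:
    """Return every longest substring shared by a and b (empty set if none)."""
    best_len = 0
    best = set()
    # prev[j] is the length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best = {a[i - curr[j]:i]}
                elif curr[j] == best_len:
                    best.add(a[i - curr[j]:i])
        prev = curr
    return best


print(longest_common_substrings("xabcdy", "zabcdw"))
```

This runs in time proportional to `len(a) * len(b)` and keeps only two rows of the table, so memory grows with the length of the second string only.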

The longest common substring problem is a derivative of the edit distance problem, which models the most common typing errors, namely:

  • character omissions
  • insertions
  • substitutions
  • reversals

The idea is to compute the minimum number of operations needed to transform one string into another. The longest common substring problem follows from it under these constraints:

  • substitutions are forbidden
  • only exact character match, insert, and delete are allowable edit operations
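
The restricted edit model in the list above (match, insert, and delete only) can be sketched in Python as follows. This is an illustration only: the function name `indel_distance` is hypothetical, and the sketch assumes a cost of 1 for each insert or delete and 0 for an exact match. Note that the characters matched under these rules do not have to be contiguous.

```python
def indel_distance(source: str, target: str) -> int:
    """Minimum number of single-character inserts and deletes turning source into target."""
    # prev[j]: cost of turning the current prefix of source into target[:j].
    # Turning the empty string into target[:j] takes j inserts.
    prev = list(range(len(target) + 1))
    for i in range(1, len(source) + 1):
        # Turning source[:i] into the empty string takes i deletes.
        curr = [i] + [0] * len(target)
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                # Exact match: no cost, substitutions are never considered.
                curr[j] = prev[j - 1]
            else:
                # Either delete source[i-1] or insert target[j-1].
                curr[j] = 1 + min(prev[j], curr[j - 1])
        prev = curr
    return prev[len(target)]


print(indel_distance("kitten", "sitting"))  # 5
```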

181 questions
0
votes
1 answer

Having trouble with two of my functions for text analysis

I'm having trouble trying to find the number of unique words in a speech text file (well, actually 3 files). I'm just going to give you my full code so there are no misunderstandings. #This program will serve to analyze text files for the number of…
BBEng
  • 155
  • 3
  • 17
0
votes
2 answers

Trying to figure out longest path algorithm python

I'm trying to make a python script, that gets me the longest repeated character in a given matrix (horizontally and vertically). Example: I have this matrix: afaaf rbaca rlaff Giving this matrix for input, it should result: a 3 You can see that…
0
votes
3 answers

C-Program Determine the largest substring in a char array without 'e' or 'E'

I have a problem as stated in the title. Here are more details. My problem is: a) develop a C function which gets a char array as input parameter and which determines the largest substring in this char array without 'e' or 'E'. Print the length of that…
0
votes
2 answers

Extracting the longest sequence from the tab delim file

I have a tab-delimited file which contains the following information: >fasta >ss_23_122_0_1 MJSDHWTEZTZEWUIASUDUAISDUASADIASDIAUSIDAUSIDCASDAS >ss_23_167_0_1 WEIURIOWERWKLEJDSAJFASDGASZDTTQZWTEZQWTEZUQWEZQWTEZQTWEZTQW …
Carol
  • 367
  • 2
  • 3
  • 18
0
votes
1 answer

longest common subsequence function does not work for all examples

EDIT: UP The code does not work properly with the strings below. "1 11 23 1 18 9 15 23 5" "11 1 18 1 20 5 11 1" EDIT: I noticed, that if I change 20 to 40 in second string, the function works properly... For strings: "12 4 55 11 8 43 22 90 5 88…
user4132350
0
votes
0 answers

longest common substring for two strings

I am looking to find the substring of two different strings; the problem is as follows: Given two strings x = X1...Xn and y = Y1...Ym, find the length of the longest common substring, and the largest k for which in the indices i and j with…
0
votes
3 answers

Detect and remove signature in forum text messages using R

I've a collection of text messages scraped from a forum into a data frame. Here's a reproducible example: example.df <- data.frame(author=c("Mikey", "Donald", "Mikey", "Daisy", "Minnie", "Daisy"), message=c("Hello World!…
Gabriele B
  • 2,665
  • 1
  • 25
  • 40
0
votes
2 answers

Longest Common Subsequences of 2 Arrays of Bytes

I wanted to compare the LCS of two files from their binary, therefore I used the usual LCS source code, using the GenStr command to change the bytes of the file to String first. The problem is, I received a memory out of bound error because…
Anonymous
  • 1
  • 1
0
votes
1 answer

longest common substring between 2 HUGE files - out of memory: java heap space

I'm completely brain fried after this. I need to find the longest common substring between 2 files, a small one and a HUGE one. I don't even know where to begin the search. Here's what I have so far: import java.io.BufferedReader; import…
Steven R
  • 323
  • 2
  • 6
  • 18
0
votes
0 answers

How to find top 10 frequent substring from string databases

Assume I have a txt file where each line represents a string. Is there an efficient way to find the top 10 most frequent substrings? The difficulty is that the number of possible substrings of a given string is too large. Given a string of length N, it…
shijie xu
  • 1,975
  • 21
  • 52
0
votes
1 answer

Go : longest common subsequence back tracing

My code works for computing the length of the LCS, but when I apply the same code for reading out an LCS, following http://en.wikipedia.org/wiki/Longest_common_subsequence_problem, some strings are missing. Could you tell me what I am…
user2671513
0
votes
1 answer

perl loops within subroutines to display the longest repeating string that's selected for a particular subsection of the string

I was wondering if anyone knows how to simplify, or generalize this code. It gives the correct answer, however it is only applicable to the current situation. My code is as follows: sub longestRepeat{ # list of…
0
votes
3 answers

Optimisation ideas - Longest common substring

I have this program which is supposed to find the Longest Common Substring of a number of strings. Which it does, but if the strings are very long (i.e. >8000 characters long), it works slowly (1.5 seconds). Is there any way to optimise that? The…
Chiffa
  • 1,486
  • 2
  • 19
  • 38
0
votes
2 answers

Clojure performance - why does the "ugly" "array swap trick" improve lcs performance?

This is a follow-up to @cgrand's answer to the question "Clojure Performance For Expensive Algorithms." I have been studying it and trying to apply some of his techniques to my own experimental Clojure perf tuning. One thing I am wondering about is…
noahlz
  • 10,202
  • 7
  • 56
  • 75
0
votes
1 answer

How is python's difflib.find_longest_match implemented?

I originally wanted an algorithm to find the longest common substring between two Python strings. The general answer for the best runtime was "to construct a suffix tree", based on the online consensus for a linear runtime. However, there are zero examples…
Lucas Ou-Yang
  • 5,505
  • 13
  • 43
  • 62