
Alright, so I made a program that processes data, and I need to run this function on a string that is sometimes 29,000,000+ characters long. If I run an empty loop like

for x = 1, 29000000, 1 do
end
print("Done")

it finishes instantly. I'm not asking anyone to make my function finish instantly, but how can I make it finish faster? It currently takes 3+ hours to get to 10 percent done. Is there a way to let Lua use more CPU, or to make my function more efficient?

local function interpret(action, input, key)
    local byte, char, decrypt, encrypt, input, output, sub = string.byte, string.char, key.decrypt, key.encrypt, input, '', string.sub
    if (action == "decrypt") then
        for x = 1, (#input), 1 do
            output = (output .. (char(((byte(decrypt[sub(input, x, x)]) - (x + 2)) + 1) % 256)))
            if x % 10000 == 0 then print(x) end
        end
    else
        for x = 1, (#input), 1 do
            output = output .. (encrypt[char(((byte(sub(input, x, x)) + x) + 1) % 256)])
            if x % 10000 == 0 then print(x) end
        end
    end
    return (output);
end
Pedro Lobito
128Gigabytes
  • Thanks for fixing the formatting, I instantly saw I messed up and was fixing it haha – 128Gigabytes Aug 09 '16 at 18:55
  • I'm not a Lua expert, but if it is doing what I think it is doing, it probably detects that the first loop does nothing and cuts it out. If you make it do actual work, it'll slow down a lot. I think your 'instant' is just the compiler being smart. – Cody Aug 09 '16 at 18:56
  • No, I tried making it do stuff, simple stuff, and it still finished in like 2 seconds. % x is just a heavy thing to process 29,000,000 times, but I have to use it. – 128Gigabytes Aug 09 '16 at 19:00
  • Instead of constantly creating a new string (which the `output = (output .. ` does), go through https://www.lua.org/pil/11.6.html or http://stackoverflow.com/questions/19138974/does-lua-optimize-the-operator and implement a similar solution. – nos Aug 09 '16 at 19:00
  • I tried it with the % 256 taken out and had to make a few changes to keep it from breaking (that is the whole reason the % 256 is there). It's pretty much instant now, but it no longer does what I need. I need something faster than % 256 that will do the same thing. – 128Gigabytes Aug 09 '16 at 19:05
  • Oh sweet, I didn't realize I was creating a string every time I did that. It's much faster now, thanks. – 128Gigabytes Aug 09 '16 at 19:29
  • `x % 256` and `x & 0xff` (or, if you prefer, `x & 255`) are equivalent (at least for non-negative numbers). I don't know whether Lua optimizes the latter to the former. But even if it doesn't, that's not going to have as much impact as creating a new string each time. – Keith Thompson Aug 09 '16 at 20:00
  • @KeithThompson: Only the most recent versions of Lua (>= 5.3) have bitwise operators like `&`. When in doubt I would stick to using `%`. If you really want to push performance to the limit I would recommend doing the computation in C anyway... – hugomg Aug 09 '16 at 20:21
  • You will have much faster encryption using a 3rd party library with real encryption such as AES. – zaph Aug 09 '16 at 20:48
  • I know using someone else's encryption software would be faster and easier, but this is more fun, plus it can run on my phone. – 128Gigabytes Aug 10 '16 at 01:24
  • As long as you are not concerned about security and no one else is using it that's fine – zaph Aug 10 '16 at 13:44
  • @zaph I don't think it's really a security concern, because the way it scrambles the data is based on your password, so unless they know your password they would be pretty hard up to reverse the encryption. If your password starts with an M it will do something different than if it started with any other possible letter, if the second letter is i it will do other stuff, if the third letter is 0 it will do something else, and so on. Reversing it would be hard because you have no idea what any of the bytes originally were and no idea what even the first letter of the password is. – 128Gigabytes Aug 18 '16 at 05:15
  • ["Schneier's Law"](https://www.schneier.com/blog/archives/2011/04/schneiers_law.html): Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. – zaph Aug 18 '16 at 11:17
  • @zaph Yeah... That's kinda the point ._. I'm a little confused now. I wasn't bragging, I actually wanted to know what you saw wrong with it. – 128Gigabytes Aug 18 '16 at 15:32
  • Compare it to AES, see [Advanced Encryption Standard](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard). Assume there is a reason it is more complex, that a number of rounds are required. AES was selected from submissions in a multi-year competition where entries were reviewed by many cryptographic domain experts and revised as shortcomings were found. Unless you feel your encryption is superior use AES. Finally, AES has hardware support instructions in many Intel CPUs and full hardware support in some phones. An iPhone-6S can encrypt ~450MB/s. – zaph Aug 18 '16 at 15:59
  • @Zaph I feel there is a difference between what you are saying and what I am saying, you are basically saying there are better ways to do it, and I'm just saying mine is not a bad way. – 128Gigabytes Aug 19 '16 at 00:48
  • OK, so I understand that you are attached to the code you designed and it was fun, we all are. But the professional thing to do is to choose the more secure option over personal preference. If others will be using the app you owe them the best security you can provide; they are relying on you to make the right choice. But being a free-range code monkey is more fun. Let's hope the engineer of the bridge I drive over every day did not choose its strength based on what was more fun. Don't feel bad, most developers do not behave as professionals. Perhaps being professional is over-rated. – zaph Aug 19 '16 at 03:30
  • No one except me is ever going to use it for anything. You are basically just saying no one should ever try to do something themselves because it's been done before, and then you just straight up insult me by saying I'm an example of how to do something wrong. It's as unbreakable as it needs to be for my purposes, and I came here for help making it a little faster, not to have someone tell me to give up and use what already exists because mine sucks. – 128Gigabytes Aug 19 '16 at 03:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/121305/discussion-between-128gigabytes-and-zaph). – 128Gigabytes Aug 19 '16 at 03:57

1 Answer


When a program takes hours to run when it doesn't look like it should, it's likely that you are using an algorithm with bad asymptotic time complexity. Run some experiments and see how the running time varies with the length of the input string.

For this problem we ideally want the time to grow linearly with the size of the input: doubling the length of the input string should double the computation time. However, in your case (as the comments suggest) I suspect that your algorithm is quadratic. Doubling the input size is probably quadrupling the computation time, and growing the input by 10x is probably increasing computation time by 100x.
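One way to run that experiment is sketched below. The helper names `build_by_concat` and `time_it` and the test sizes are made up for illustration, and `string.char(i % 256)` only stands in for the real per-character work. If the printed time roughly quadruples each time n doubles, the loop is behaving quadratically.

```lua
-- Hypothetical helper: builds an n-character string one character at a
-- time with `..`, the same way the loops in the question do.
local function build_by_concat(n)
    local output = ''
    for i = 1, n do
        output = output .. string.char(i % 256)
    end
    return output
end

-- Hypothetical helper: CPU time in seconds taken by one call to f(n).
local function time_it(f, n)
    local start = os.clock()
    f(n)
    return os.clock() - start
end

-- Double the input size a few times and compare the timings.
for _, n in ipairs({20000, 40000, 80000}) do
    print(n, string.format("%.3f s", time_it(build_by_concat, n)))
end
```

The absolute numbers depend on the machine and the Lua version, so only the ratio between successive lines is meaningful.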

If your algorithm is quadratic, then the only way to make it run in a reasonable time on a long input is to replace it with one that has better asymptotic complexity. No matter how much you fine tune your program or how beefy your hardware is, the asymptotic complexity will catch up with you. For example, even if you fine tune your algorithm so that it runs 10000x faster, it is only going to be able to deal with inputs that are 100x as large as before, because 100 squared is 10000.


In your particular program the source of the problem is using the `..` operator to build a large string one character at a time. In Lua, `..` takes time proportional to the length of the strings you pass to it, because it works by copying its inputs into a brand new string. Each pass through the loop therefore copies everything built so far, which is what makes the whole loop quadratic.
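As a rough back-of-the-envelope estimate (counting only the bytes copied, and ignoring allocation and garbage collection costs): appending the k-th character creates a new string of length k, so building an n-character string copies about

```latex
\sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;\approx\; \frac{n^{2}}{2},
\qquad
n = 2.9 \times 10^{7} \;\Rightarrow\; \frac{n^{2}}{2} \approx 4.2 \times 10^{14} \text{ bytes copied.}
```

Compare that with the 2.9 × 10^7 bytes that a linear-time approach has to write once.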

The most common workaround is to store the parts of your string in a table and use table.concat to join them together at the end.

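-- Replaces the decrypt loop; char, byte, sub and decrypt are the locals from the question.
local result = {}
for i = 1, #input do
    -- Store each output character in the table instead of appending to a string.
    result[i] = char(((byte(decrypt[sub(input, i, i)]) - (i + 2)) + 1) % 256)
end
-- Join all the pieces once, in a single pass.
return table.concat(result)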
hugomg
  • Alright, so it's much faster now, but the new problem is that at around 33,000,000 characters it errors with "not enough memory" because the table is too long. What should I do? – 128Gigabytes Aug 09 '16 at 22:04
  • At that point I would consider not using Lua to write my decrypt function. You could write your function in C and then use the C API to call that function from Lua. (btw, how much memory is your Lua process consuming before it errors out?) – hugomg Aug 09 '16 at 22:10
  • About 33 megabytes, also I think I solved the problem, I just make it table.concat the output table whenever it gets to 1,000,000 indexes. Testing it on a 255 MB text file. – 128Gigabytes Aug 09 '16 at 22:14
  • It didn't work. It worked all the way until the end, when I tried to concat all the 1,000,000 character strings into one. Is there a way to write to a file without doing it all at once, so I could write 1,000,000 characters to it, then remove the string from memory and do the next 1,000,000? io.write just replaces the file, it doesn't add to the end of it. – 128Gigabytes Aug 09 '16 at 22:35
  • io.write always appends to the end of the file. The operation that can be destructive is io.open. If you use the "w" flag it erases the file before opening it for writing, and if you use the "a" flag it does not. – hugomg Aug 10 '16 at 03:55
  • try `str:gsub('.', function(symbol) .... end)` or library like https://github.com/starwing/lbuffer – FareakyGnome Aug 10 '16 at 04:19
  • Depending on what you are doing, it might be easier to just print the output to stdout, without saving it to memory, and then use your shell to redirect the output file descriptor to a file. – hugomg Aug 10 '16 at 04:23
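The chunked-write idea discussed in these comments could look something like the sketch below. Everything named here is hypothetical: `process_to_file`, its parameters, and the `transform` callback, which stands in for the per-character encrypt or decrypt step. The file is opened once, so each `write` continues where the previous one ended, and the buffer table is reused so it never grows beyond one chunk.

```lua
-- Sketch only. `transform(c, i)` is a placeholder for the per-character
-- work and must return a string; `chunk_size` defaults to the 1,000,000
-- characters mentioned in the comments.
local function process_to_file(input, path, transform, chunk_size)
    chunk_size = chunk_size or 1000000
    -- "wb" truncates the file once, when it is opened; later writes
    -- through the same handle are added after what was already written.
    local out = assert(io.open(path, "wb"))
    local buffer, n = {}, 0
    for i = 1, #input do
        n = n + 1
        buffer[n] = transform(string.sub(input, i, i), i)
        if n == chunk_size then
            -- Flush the full chunk, then reuse the same table slots.
            out:write(table.concat(buffer, "", 1, n))
            n = 0
        end
    end
    -- Flush whatever is left over (possibly nothing).
    out:write(table.concat(buffer, "", 1, n))
    out:close()
end
```

Note that this still assumes the whole input string fits in memory; only the output is streamed. A call might look like `process_to_file(data, "out.bin", function(c, i) return c end)`, where `data` and the file name are placeholders.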