How many is "millions" and how long is a "huge amount of time"? Porter stemming isn't a complicated algorithm and should be reasonably quick. I suspect you're I/O-bound rather than anything else. Still... there may be some improvements you can eke out.
If order is not important and you don't need every copy of each stem, you may find it simpler (and more memory-efficient) to use a dictionary and/or set to store your stems. This will let you avoid needing to stem words you've already seen, which should improve performance, and store each stem only once.
For example:
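    seenwords = set()
    seenstems = set()
    for line in input_file:
        tokens = line.lower().split()
        # update() changes the set in place; union() returns a new set and leaves the original untouched
        seenstems.update(porter.stem(token) for token in tokens if token not in seenwords)
        seenwords.update(tokens)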
This can stem a word more than once if it appears several times on the same line, but once a line has been processed, its words won't be stemmed again on later lines. You could also process the words one by one, which avoids stemming repeats within a line, but there's some speed advantage in using the generator expression rather than a for loop.
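If you'd rather go the word-by-word route with a dictionary, something like the following might work. It's only a rough sketch: `collect_stems` and `toy_stem` are names I made up for illustration, and `toy_stem` is just a stand-in (it is not the Porter algorithm), so you would pass in your own `porter.stem` instead.

```python
def collect_stems(lines, stem):
    """Return the distinct stems, calling stem() once per distinct word."""
    stem_cache = {}  # maps each word seen so far to its stem
    for line in lines:
        for token in line.lower().split():
            if token not in stem_cache:
                stem_cache[token] = stem(token)
    return set(stem_cache.values())


calls = []  # records every word actually passed to the stemmer


def toy_stem(word):
    # Hypothetical stand-in for porter.stem: strips trailing "s" characters only.
    calls.append(word)
    return word.rstrip("s")


stems = collect_stems(["Cats chase cats", "dogs chase cat"], toy_stem)
print(sorted(stems))  # ['cat', 'chase', 'dog']
print(len(calls))     # 4 distinct words stemmed, out of 6 tokens
```

The dictionary also gives you the word-to-stem mapping for free if you need it later, at the cost of holding every distinct word in memory.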