
[ Python ]

I have a string and I know its size in memory. I want a rough estimate of the number of characters it contains.

The actual case: I want to send a report by mail, but the content of the mail exceeds the maximum size allowed. I want to split the mail into multiple messages according to the maximum size, but I don't have a way to correlate the maximum size with the number of characters in the string.

import smtplib

smtp = smtplib.SMTP('server.name')
smtp.ehlo()
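# SIZE is advertised as a string (the key is absent if the server sends no SIZE)
max_size = int(smtp.esmtp_features['size'])

message_data = "..."  # placeholder for the report text, which exceeds max_size
# Now, how can I get the number of characters in message_data that corresponds to max_size?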
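One way to do the split without estimating at all is to measure each character's encoded size directly. The sketch below assumes the mail body is sent as UTF-8; `split_by_encoded_size` is a hypothetical helper, not part of `smtplib`. Note that the server's SIZE limit applies to the whole message (headers plus any MIME transfer encoding; base64, for example, adds roughly a third), so the byte budget passed in should be somewhat below `max_size`.

```python
def split_by_encoded_size(text, max_bytes):
    """Hypothetical helper: split text into pieces whose UTF-8 encoding is at most max_bytes.

    Encoding one character at a time is exact for UTF-8, because the UTF-8
    bytes of a string are just the concatenated bytes of its characters.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    chunks = []
    current = []       # characters of the chunk being built
    current_size = 0   # UTF-8 size of that chunk, in bytes
    for ch in text:
        ch_size = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
        if ch_size > max_bytes:
            raise ValueError("max_bytes is smaller than a single character")
        if current_size + ch_size > max_bytes:
            # Adding ch would overflow the budget: close the current chunk
            chunks.append("".join(current))
            current = []
            current_size = 0
        current.append(ch)
        current_size += ch_size
    if current:
        chunks.append("".join(current))
    return chunks


# A tiny 4-byte budget makes the splitting visible ("h\u00e9llo w\u00f6rld" is "héllo wörld")
parts = split_by_encoded_size("h\u00e9llo w\u00f6rld", 4)
print(parts)  # → ['hél', 'lo w', 'örl', 'd']
```

In the question's setting you would pass something like `max_size - headroom`, where `headroom` is a margin you choose (hypothetical here), and send each chunk as its own message.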

Thanks,

Program Questions
  • If it's a string, `len(message_data)` will give you a character count. If it's a unicode string, you still need to know its size once it's encoded. If that's an issue, pick your favorite encoding and try it: `encoded = message_data.encode('utf-8')`. Then it's `len(encoded)`. – tdelaney Apr 07 '16 at 03:40
  • Thanks for such a quick reply, but how would I know how many characters add up to the max size? Would 2 ** n (n = number of characters) be a good expression to go with? – Program Questions Apr 07 '16 at 03:42
  • I don't understand... `len(message_data)` is the number of characters in the string. `2**n` grows rather quickly so I don't see how that applies! – tdelaney Apr 07 '16 at 03:54
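To make the comments' point concrete: `len()` counts characters, while the size that matters for mail is the length of the encoded bytes. In UTF-8 each character takes 1 to 4 bytes, so a byte limit alone only gives a range: a budget of N bytes always fits N // 4 characters and can never fit more than N. A short sketch, assuming Python 3 and UTF-8; the function names are illustrative, not from any library.

```python
def char_and_byte_counts(text, encoding="utf-8"):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


def char_bounds_for_bytes(max_bytes):
    """Range of UTF-8 characters that can fit in max_bytes bytes.

    UTF-8 uses 1 to 4 bytes per character, so max_bytes // 4 characters
    always fit, and at most max_bytes characters (all ASCII) can fit.
    """
    return max_bytes // 4, max_bytes


# "na\u00efve caf\u00e9" is "naïve café": the two accented letters take 2 bytes each
print(char_and_byte_counts("na\u00efve caf\u00e9"))  # → (10, 12)
print(char_bounds_for_bytes(1000))                   # → (250, 1000)
```

The bound grows linearly with the byte budget, not exponentially, which is why an expression like `2 ** n` does not apply.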

1 Answer


The number of characters in a string is its size in memory in bytes, as reported by `sys.getsizeof`, minus 37 (Python 2.7 / Mac OS):

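from __future__ import print_function  # lets print(...) run on Python 2.7 and 3
import sys

def estimate_chars():
    """Print the character count, sys.getsizeof() in bytes, and len() as s grows."""
    s = ""
    for idx in range(100):
        print(idx * 10, sys.getsizeof(s), len(s))
        s += '1234567890'

estimate_chars()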

Result (columns: chars | bytes | len):

0 37 0
10 47 10
20 57 20
30 67 30
40 77 40
50 87 50
...
960 997 960
970 1007 970
980 1017 980
990 1027 990
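The 37-byte constant is a detail of CPython 2.7's `str` on that platform. On CPython 3.3+ (PEP 393) the fixed overhead is different and the storage cost per character depends on the string's widest character (1, 2, or 4 bytes), so "subtract a constant" only holds for pure-ASCII text. A sketch that checks this, assuming CPython 3 (`sys.getsizeof` results are implementation-specific and may not be supported on other interpreters); the helper names are illustrative.

```python
import sys


def getsizeof_overhead(s):
    """Bytes sys.getsizeof reports beyond one byte per character."""
    return sys.getsizeof(s) - len(s)


def storage_bytes_per_char(ch):
    """Bytes CPython 3 stores per character in a string made only of ch.

    Compares 3- and 2-character strings (rather than 2 and 1) because
    1-character strings may be shared cached objects whose reported size differs.
    """
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)


# For pure-ASCII strings the overhead is constant, as in the Python 2.7 table
ascii_overheads = {getsizeof_overhead("1234567890" * k) for k in range(1, 100)}
print(ascii_overheads)  # a single value, but not necessarily 37

# Wider characters change the per-character cost, so the rule breaks
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:  # a, é, €, 😀
    print(ascii(ch), storage_bytes_per_char(ch))
```

On CPython 3 the loop reports 1, 1, 2 and 4 bytes per character, which is why `len()` (or the length of the encoded bytes) is the dependable measure.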
Reblochon Masque
  • The number of characters in a string is `len(some_string)`. This seems a roundabout way of using an implementation detail to calculate something that already has a better way to do it. – tdelaney Apr 07 '16 at 03:55
  • Yes, indeed, you are entirely correct; I went that way because my first thought was that the OP's text was not in memory and the number of characters needed to be estimated from its known byte size. – Reblochon Masque Apr 07 '16 at 04:00
  • Thanks ReblochonMasque and tdelaney – Program Questions Apr 07 '16 at 06:14