
encode string

Posted: Mon Feb 23, 2015 11:03 am
by Stefano58
Hi everyone,
in Python 3, how can I convert strings into bytes using the encode methods?
The string is not constant, and the bytes must be in the range 0x00 to 0xFF, with exactly one byte per character.
Calling b = s.encode('utf_8') does not work as I need.
Regards
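A quick illustration of why UTF-8 does not meet the "one byte per character" requirement: any character at or above U+0080 becomes two or more bytes. The sample string is an assumption chosen for the example.

```python
s = "café"                 # 4 characters; 'é' is U+00E9
b = s.encode("utf_8")      # UTF-8 uses 2 bytes for U+0080..U+07FF
print(len(s), len(b))      # 4 5
print(b)                   # b'caf\xc3\xa9'
```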

Re: encode string

Posted: Mon Feb 23, 2015 2:19 pm
by jojopi
In Python 3, the str type is for a sequence of Unicode characters. If what you actually have is a sequence of octet values, then you should already be using the bytes type.
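If the data really is a sequence of octets, a minimal sketch of keeping it as bytes from the start (the values are arbitrary examples):

```python
# Build bytes directly from integer values in the range 0x00..0xFF.
data = bytes([0x00, 0x41, 0x7F, 0xFF])
print(len(data))   # 4, exactly one byte per value
print(data[3])     # 255, indexing bytes yields an int
print(data)        # b'\x00A\x7f\xff'
```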

Alternatively, if you can decide how you want to map Unicode characters with code points ranging from 0x0 to 0x10ffff onto bytes ranging from 0x0 to 0xff, with exactly one byte per character, then you can write code to achieve that.
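One possible mapping, sketched under an assumption of my own: code points 0x00 to 0xFF map to the byte with the same value, and anything higher becomes a fallback byte. The helper name `to_one_byte_per_char` and the `?` fallback are hypothetical choices, not anything standard.

```python
def to_one_byte_per_char(s: str, fallback: int = 0x3F) -> bytes:
    """Hypothetical helper: map each character to exactly one byte.

    Code points <= 0xFF map to themselves; higher code points become
    `fallback` (0x3F, i.e. '?', by default).
    """
    return bytes(ord(c) if ord(c) <= 0xFF else fallback for c in s)

out = to_one_byte_per_char("café €5")
print(out)                 # b'caf\xe9 ?5'
print(len(out) == 7)       # True, one byte per input character
```

Any other rule (a lookup table, transliteration, raising an error) could be swapped in; the point is that the output length always equals the input length.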

Re: encode string

Posted: Mon Feb 23, 2015 2:39 pm
by paddyg
If your string contains characters that are always mappable to bytes (<256), then you can use
u = u'The quick brown fox etc'
b = bytes(u, 'ascii')
I would think other encodings are available (but I don't know). If none fit your requirements, then, as jojopi says, you could easily make your own mapping.

Re: encode string

Posted: Mon Feb 23, 2015 2:54 pm
by Stefano58
Hi,
my string contains characters that are always mappable to bytes (<256), but 'ascii' only works if every character is mappable to a byte below 128.
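A small demonstration of that limit, using an example string with a character between 128 and 255:

```python
s = "café"  # 'é' is U+00E9 (233): below 256 but not below 128
try:
    b = s.encode("ascii")  # bytes(s, 'ascii') fails the same way
except UnicodeEncodeError as exc:
    reason = exc.reason
    print(reason)          # ordinal not in range(128)
```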

Re: encode string

Posted: Mon Feb 23, 2015 4:44 pm
by jojopi
The first 256 code points of Unicode are the same as those of ISO 8859-1, so you could almost get away with .encode('iso-8859-1', 'replace').
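A sketch of what the 'replace' error handler does here (the sample text is an assumption): characters above U+00FF are each replaced by a single `?`, so the result still has one byte per character.

```python
s = "café costs 5€"                    # '€' (U+20AC) is not in ISO 8859-1
b = s.encode("iso-8859-1", "replace")  # unencodable chars become b'?'
print(b)                               # b'caf\xe9 costs 5?'
print(len(b) == len(s))                # True
```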

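Almost, because depending on how your Unicode is normalized, even characters that could be represented using ISO 8859-1 code points may not be (for example, an accented letter stored in decomposed form as a base letter plus a combining accent).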
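A minimal sketch of the normalization issue, assuming the input arrives in decomposed form: NFC composes "e" plus COMBINING ACUTE ACCENT into the single code point U+00E9, which ISO 8859-1 can represent. NFC only helps when such a precomposed character exists.

```python
import unicodedata

decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
print(decomposed.encode("iso-8859-1", "replace"))  # b'cafe?'

composed = unicodedata.normalize("NFC", decomposed)  # 'é' as U+00E9
print(composed.encode("iso-8859-1", "replace"))    # b'caf\xe9'
```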

Re: encode string

Posted: Mon Feb 23, 2015 5:33 pm
by paddyg
Yes, quite right. Looking back at what I vaguely remembered, rather than the first thing that popped into my head:
bytes(u, 'latin1')
which is an alias of iso-8859-1. If that doesn't work, you could work your way through the ones available https://docs.python.org/2/library/codec ... -encodings
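A quick check that 'latin1' covers every value from 0x00 to 0xFF one-to-one, and that Python treats it as the same codec as ISO 8859-1. Characters above U+00FF still raise UnicodeEncodeError unless an error handler such as 'replace' is given.

```python
import codecs

all_chars = "".join(chr(i) for i in range(256))  # U+0000..U+00FF
b = bytes(all_chars, "latin1")
print(b == bytes(range(256)))         # True: code point n -> byte n
print(b.decode("latin1") == all_chars)  # True: lossless round trip
print(codecs.lookup("latin1").name)   # iso8859-1
```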