Stefano58
Posts: 11
Joined: Tue Feb 03, 2015 11:37 am

encode string

Mon Feb 23, 2015 11:03 am

Hi everyone,
In Python 3, how can I convert a string into bytes using the encode functions?
The string is not constant, each byte must be in the range 0x00 - 0xFF, and there must be exactly one byte per character.
The call b = str.encode('utf_8') does not do what I need.
regards

jojopi
Posts: 3274
Joined: Tue Oct 11, 2011 8:38 pm

Re: encode string

Mon Feb 23, 2015 2:19 pm

In Python 3, the str type is for a sequence of Unicode characters. If what you actually have is a sequence of octet values, then you should already be using a bytes type.

Alternatively, if you can decide how you want to map Unicode characters with code points ranging from 0x0 to 0x10ffff onto bytes ranging from 0x0 to 0xff, with exactly one byte per character, then you can write code to achieve that.
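A minimal sketch of such a hand-written mapping, assuming that code points 0x00 to 0xFF should keep their value and that anything higher should collapse to a single placeholder byte. The function name to_single_bytes and the '?' fallback are hypothetical choices for illustration, not something specified in this thread:

```python
def to_single_bytes(text, fallback=0x3F):
    """Map each character of `text` to exactly one byte.

    Code points 0x00-0xFF keep their numeric value. Anything higher
    becomes `fallback` (0x3F is '?'). The name and the fallback policy
    are assumptions made for this example.
    """
    return bytes(ord(ch) if ord(ch) < 0x100 else fallback for ch in text)


# 'é' (U+00E9) fits in one byte, the euro sign (U+20AC) does not.
data = to_single_bytes("caf\u00e9 \u20ac5")
print(data)       # b'caf\xe9 ?5'
print(len(data))  # 7, one byte per input character
```

Whether out-of-range characters should be replaced, dropped, or raise an error depends on what the receiving side expects.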

paddyg
Posts: 2555
Joined: Sat Jan 28, 2012 11:57 am
Location: UK

Re: encode string

Mon Feb 23, 2015 2:39 pm

If your string contains only characters that are always mappable to bytes (<256) then you can use
u = u'The quick brown fox etc'
b = bytes(u, 'ascii')
and I should think other encodings are available (but I don't know which). If none fits your requirements then, as jojopi says, you could easily make your own mapping.
also https://groups.google.com/forum/?hl=en-GB&fromgroups=#!forum/pi3d

Stefano58
Posts: 11
Joined: Tue Feb 03, 2015 11:37 am

Re: encode string

Mon Feb 23, 2015 2:54 pm

Hi,
My string contains characters that are always mappable to bytes (<256), but 'ascii' works only if the characters are below 128.
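The limit described here can be checked with a small probe. This is only a sketch; the helper name encodes_as_ascii is made up for the example:

```python
def encodes_as_ascii(ch):
    """Return True if `ch` can be encoded with the 'ascii' codec."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised for any code point above 0x7F.
        return False


print(encodes_as_ascii(chr(0x7F)))  # True, last code point ASCII covers
print(encodes_as_ascii(chr(0x80)))  # False, first code point it rejects
print(encodes_as_ascii("\u00e9"))   # False, 'é' is 0xE9
```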

jojopi
Posts: 3274
Joined: Tue Oct 11, 2011 8:38 pm

Re: encode string

Mon Feb 23, 2015 4:44 pm

The first 256 code points of Unicode are the same as those of ISO 8859-1, so you could almost get away with .encode('iso-8859-1', 'replace').

Almost, because depending on how your Unicode is normalized, even characters that could be represented using ISO 8859-1 code points may not be, for example when an accented letter is stored as a base letter followed by a combining mark.
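To illustrate the normalization caveat, here is a short sketch using the standard unicodedata module. Normalizing to NFC before encoding is one possible workaround and an assumption on my part, since nothing in the thread says how the input strings are produced:

```python
import unicodedata

composed = "\u00e9"     # 'é' as a single code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

print(composed.encode("iso-8859-1", "replace"))    # b'\xe9'
# The combining mark has no ISO 8859-1 equivalent, so it becomes '?'.
print(decomposed.encode("iso-8859-1", "replace"))  # b'e?'

# NFC composes the pair back into U+00E9, which fits in one byte.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized.encode("iso-8859-1", "replace"))  # b'\xe9'
```

Note that with 'replace' the decomposed form yields two bytes for what looks like one character, so the one-byte-per-character requirement only holds per code point.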

paddyg
Posts: 2555
Joined: Sat Jan 28, 2012 11:57 am
Location: UK

Re: encode string

Mon Feb 23, 2015 5:33 pm

Yes, quite right. Now that I look back at what I vaguely remembered, rather than going with the first thing that popped into my head:
bytes(u, 'latin1')
which is the same as iso-8859-1. If that doesn't work you could work your way through the ones available https://docs.python.org/2/library/codec ... -encodings
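A quick way to check both claims, sketched with the standard codecs module. The variable names are only for illustration:

```python
import codecs

# Both spellings resolve to the same underlying codec.
same_codec = codecs.lookup("latin1").name == codecs.lookup("iso-8859-1").name
print(same_codec)  # True

# Every code point 0x00-0xFF maps to the byte with the same value.
all_chars = "".join(chr(i) for i in range(256))
encoded = bytes(all_chars, "latin1")
print(encoded == bytes(range(256)))           # True
print(encoded.decode("latin1") == all_chars)  # True, lossless round trip
```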
