tony power
Posts: 33
Joined: Tue Mar 08, 2016 9:07 pm

Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:07 pm

Hi

I am new to Python. I have a Python Telegram bot running on my Raspberry Pi. The bot displays markup keyboard buttons; some of them contain emoji and text, and some contain only emoji.
How can I compare text in Python that includes emoji bytes (UTF-8) with text entered by the user?

python telegram bot
https://github.com/eternnoir/pyTelegramBotAPI

emoji taken from
http://apps.timwhitlock.info/emoji/tables/unicode

here is snippet of my code

Code: Select all

# \xF0\x9F\x93\xB7 is the UTF-8 encoding of the camera emoji
caption = 'CAPTURE \xF0\x9F\x93\xB7'

# first comparison attempt
# message.text is the text entered by the user
if message.text == u'CAPTURE \xF0\x9F\x93\xB7':
    pass  # do something

# second comparison attempt
if message.text == 'CAPTURE \xF0\x9F\x93\xB7':
    pass  # do something
The comparison does not work; it gives a warning about the Unicode comparison:

### error message
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

scotty101
Posts: 3958
Joined: Fri Jun 08, 2012 6:03 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:17 pm

This code worked for me

Code: Select all

# \xF0\x9F\x93\xB7 is the camera emoji
message = 'CAPTURE \xF0\x9F\x93\xB7'

# first comparison attempt
if message == u'CAPTURE \xF0\x9F\x93\xB7':
    print(True)

# second comparison attempt
if message == 'CAPTURE \xF0\x9F\x93\xB7':
    print(True)
Are you using Python3?
Electronic and Computer Engineer
Pi Interests: Home Automation, IOT, Python and Tkinter

tony power
Posts: 33
Joined: Tue Mar 08, 2016 9:07 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:19 pm

scotty101 wrote:Are you using Python3?
I am using Python2.7

scotty101
Posts: 3958
Joined: Fri Jun 08, 2012 6:03 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:27 pm

That is probably your issue. Unicode support for Python 2 wasn't great.

Have a read of this excellent presentation
http://farmdev.com/talks/unicode/

Its mantra is: decode to Unicode early, use Unicode everywhere inside, and encode late, when strings leave your control.
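A minimal sketch of that mantra, assuming the bytes arrive UTF-8 encoded. The helper names `from_wire` and `to_wire` and the constant `CAMERA_BUTTON` are made up for illustration and are not part of pyTelegramBotAPI:

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers illustrating "decode early, encode late".

# Text (unicode) literal; the emoji is given by code point, not by UTF-8 bytes.
CAMERA_BUTTON = u'CAPTURE \U0001f4f7'


def from_wire(raw_bytes):
    """Decode incoming UTF-8 bytes to text as soon as they arrive."""
    return raw_bytes.decode('utf-8')


def to_wire(text):
    """Encode text to UTF-8 bytes only when it leaves the program."""
    return text.encode('utf-8')


incoming = b'CAPTURE \xF0\x9F\x93\xB7'  # bytes as they might arrive from outside
text = from_wire(incoming)              # decode early
is_capture = (text == CAMERA_BUTTON)    # compare text with text
outgoing = to_wire(text)                # encode late
```

Inside the program every comparison is then text against text, so the bytes-versus-unicode mismatch never comes up.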

scotty101
Posts: 3958
Joined: Fri Jun 08, 2012 6:03 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:30 pm

Can you post the result of

Code: Select all

type(message.text)
That will tell you whether message.text is a byte string (str) or a Unicode string (unicode).
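If it helps, here is a small sketch of the same check that behaves the same on Python 2 and 3. The function name `describe` is only for illustration:

```python
# type(u'') is unicode on Python 2 and str on Python 3;
# type(b'') is str on Python 2 and bytes on Python 3.
text_type = type(u'')
bytes_type = type(b'')


def describe(value):
    """Return 'text', 'bytes' or 'other' depending on the kind of string."""
    if isinstance(value, text_type):
        return 'text'
    if isinstance(value, bytes_type):
        return 'bytes'
    return 'other'


kind_of_literal = describe('CAPTURE \xF0\x9F\x93\xB7')
```

On Python 2 a plain `'...'` literal is reported as bytes, while on Python 3 it is text, which is why the same comparison can behave differently between the two.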

tony power
Posts: 33
Joined: Tue Mar 08, 2016 9:07 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 4:52 pm

scotty101 wrote:Can you post the result of type(message.text)

Code: Select all

<type 'unicode'>
According to its description, the library should work on Python 3 too. I reinstalled it from git instead of using pip, but then the button layout was messed up, the emoji were not shown, and the bot crashed.

tony power
Posts: 33
Joined: Tue Mar 08, 2016 9:07 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 6:53 pm

I got it working now

Code: Select all

# -*- coding: utf-8 -*-
# This code now works on Python 2.7
cmd = message.text
# Encode the unicode text to UTF-8 bytes so both sides are byte strings
if cmd.encode('utf-8') == 'CAPTURE \xF0\x9F\x93\xB7':
    pass  # do something

jojopi
Posts: 3268
Joined: Tue Oct 11, 2011 8:38 pm

Re: Comparing text variable including emoji utf-8 code?

Thu Aug 18, 2016 8:29 pm

tony power wrote:if (cmd.encode('utf-8') == 'CAPTURE \xF0\x9F\x93\xB7' ):
To make this also work in Python 3, the UTF-8 encoded literal should be a bytes literal, not a str:

Code: Select all

if (cmd.encode('utf-8') == b'CAPTURE \xF0\x9F\x93\xB7'):
Better still, compare two strings and specify the emoji by code point instead of by UTF-8 encoding:

Code: Select all

if (cmd == u'CAPTURE \U0001f4f7'):
You can also include Unicode characters in strings literally, or by name as in u'CAPTURE \N{camera}', but the name form does not appear to work for emoji in Python 2.
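A quick way to check that the three spellings refer to the same character, written as a sketch for Python 3 (as noted, the name lookup does not seem to work for emoji in Python 2):

```python
import unicodedata

camera = u'\U0001f4f7'              # emoji given by code point
utf8 = camera.encode('utf-8')       # the four UTF-8 bytes used earlier in the thread
name = unicodedata.name(camera)     # official Unicode character name
by_name = u'\N{CAMERA}'             # the same character, given by name

print(utf8 == b'\xF0\x9F\x93\xB7')  # True
print(name)                         # CAMERA
print(by_name == camera)            # True
```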
