Discussion:
Removing space characters ... char(160)? ... char(194)?
Amer Neely
2007-02-13 08:52:32 UTC
Hi all.
I'm trying to weed out garbage that comes from copying and pasting stuff
from a web page.
Some of the data has spaces, but a *different* kind of space: a
char(160) kind, I think. I figured this out by copying the space and running:
select ascii(' ');
... where the space was pasted in.
update tmp_AAPT_OnlineAnalyser_ChargeTypeSummary set Service_Number =
replace( Service_Number, char(160), '' );
Query OK, 0 rows affected (0.00 sec)
Rows matched: 313 Changed: 0 Warnings: 0
So it's not finding char(160) in Service_Number. If I try another way to check:
select ascii( right( Service_Number, 1 ) ) from
tmp_AAPT_OnlineAnalyser_ChargeTypeSummary;
... gives me a big set of results, all 194 (i.e. char(194)). But when I run:
select char(160), char(194);
+-----------+-----------+
| char(160) | char(194) |
+-----------+-----------+
| <A0> | <C2> |
+-----------+-----------+
... and both the <A0> and <C2> results are in reverse video. The <A0>
*looks* like the stuff I'm getting at the end of fields when I just do a
select from the table in the MySQL command-line client, e.g. the first row:
0298437600<A0>
(<A0> is reversed).
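A minimal Python sketch of one reading of these numbers, assuming Service_Number holds UTF-8 text (an assumption; the column's character set is never stated in the thread). NO-BREAK SPACE is code point 160, but UTF-8 stores it as two bytes, 194 (0xC2) then 160 (0xA0). If ascii() looks only at the first byte of a character, that would fit it reporting 194, and it would fit a replace of the single byte char(160) not matching the two-byte character. SAMPLE and the helper names are hypothetical.

```python
# Hypothetical sample value modelled on the field shown above,
# ending in U+00A0 (NO-BREAK SPACE, code point 160).
SAMPLE = "0298437600\u00a0"


def last_char_utf8_bytes(value: str) -> list[int]:
    """Return the UTF-8 byte values of the final character of value."""
    return list(value[-1].encode("utf-8"))


def strip_nbsp(value: str) -> str:
    """Remove every NO-BREAK SPACE character (the character, not one byte)."""
    return value.replace("\u00a0", "")


if __name__ == "__main__":
    print(ord(SAMPLE[-1]))               # 160: the code point
    print(last_char_utf8_bytes(SAMPLE))  # [194, 160]: its UTF-8 bytes
    print(repr(strip_nbsp(SAMPLE)))      # '0298437600'
```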
Lastly, maybe I shouldn't add this, but when I construct the space
character in Perl:
my $space_character = chr(160);
When I do: perl -e "print chr(160);"
I get: á

This is also with Win2K and ActiveState.
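One plausible reading of the á, assuming the Win2K console is using the common US OEM code page 437 (not confirmed in the thread): the single byte 160 that Perl prints is á in that code page, while in Windows-1252 and Latin-1 the same byte is a NO-BREAK SPACE. The helper name is hypothetical.

```python
def show_byte(value: int, codepage: str) -> str:
    """Decode a single byte under the given code page."""
    return bytes([value]).decode(codepage)


if __name__ == "__main__":
    # Assumption: the console uses OEM code page 437.
    print(hex(ord(show_byte(160, "cp437"))))   # 0xe1, i.e. á
    # The same byte in the Windows "ANSI" code page and in Latin-1.
    print(hex(ord(show_byte(160, "cp1252"))))  # 0xa0, NO-BREAK SPACE
    print(hex(ord(show_byte(160, "latin-1")))) # 0xa0, NO-BREAK SPACE
```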

I've been following several threads on character sets and collation as
well. I have a database that contains accented data (Canadian French)
that doesn't render correctly in a browser window. I'm going to try
converting it and its tables to utf8 (Unicode), and then make sure the
character set for the HTML is also utf8.
my $sql = "update tmp_AAPT_OnlineAnalyser_ChargeTypeSummary set
Service_Number = replace( Service_Number, '" . $space_character . "', '' )";
it works! But the *exact* same Perl code running on a Linux client fails
(or at least, it doesn't update the field). It defies logic.
Who knows what's going on?
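One possible explanation for the Windows/Linux difference, offered as a hypothesis rather than a diagnosis: if chr(160) reaches the server as a single byte, what that byte means depends on the character set the server assumes for the connection. The Python sketch below shows only the decoding side; the idea that one client talks latin1 and the other utf8 is an assumption the thread does not establish, and the helper name is hypothetical.

```python
def interpret(raw: bytes, charset: str) -> str:
    """Decode raw bytes under an assumed character set.

    Invalid sequences become U+FFFD (REPLACEMENT CHARACTER).
    """
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    # Assumption: Perl's chr(160) is sent as this single byte.
    lone_byte = bytes([160])
    # Under Latin-1 the byte is the NO-BREAK SPACE character.
    print(interpret(lone_byte, "latin-1") == "\u00a0")  # True
    # Under UTF-8 a lone 0xA0 is not a valid character at all.
    print(interpret(lone_byte, "utf-8") == "\ufffd")    # True
    # The UTF-8 form of NO-BREAK SPACE needs both bytes.
    print(interpret(b"\xc2\xa0", "utf-8") == "\u00a0")  # True
```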
--
Amer Neely
w: www.softouch.on.ca/
b: www.softouch.on.ca/blog/
Perl | MySQL programming for all data entry forms.
"We make web sites work!"
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-***@m.gmane.org
Jerry Schwartz
2007-02-13 15:18:41 UTC
The character set Windows uses by default (Windows-1252 in US and Western
European locales) is not the same as UTF-8. That causes
problems when you feed Windows text into an interface that is expecting
UTF-8. I know it drives me crazy.

If you pull up a web page that is in French, and check the page encoding in
your browser, you can try changing it from UTF-8 to a Windows encoding
(such as Windows-1252) or vice versa.
You should see that the accented characters change, so you'll have an
example in front of you.

The browser will typically render the page according to the character set
specified in the HTML header (I think); otherwise it makes a best guess or
uses its default. As far as the browser is concerned, this only affects how
the page is rendered, but it also affects copy and paste. If you copy from a
page that is rendered in the Windows character set, and paste it into an
interface (even another browser window) that is UTF-8, then you'll get
unexpected (garbage) characters.
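A small Python illustration of that kind of garbage: text encoded one way and read back another. The sample strings and the helper name are illustrative only. The second check shows what the thread's NO-BREAK SPACE becomes when its UTF-8 bytes are read as Windows-1252: a stray "Â" in front of it.

```python
def misread(text: str, wrote: str = "utf-8", read: str = "cp1252") -> str:
    """Encode text with one character set and decode the bytes with another."""
    return text.encode(wrote).decode(read, errors="replace")


if __name__ == "__main__":
    # UTF-8 bytes read as Windows-1252: "café" shows up as "cafÃ©".
    print(misread("café") == "caf\u00c3\u00a9")          # True
    # NO-BREAK SPACE (C2 A0) read as Windows-1252: "Â" plus NO-BREAK SPACE.
    print(misread("\u00a0") == "\u00c2\u00a0")           # True
    # The reverse: a Windows-1252 é is not valid UTF-8 on its own.
    print(misread("café", "cp1252", "utf-8") == "caf\ufffd")  # True
```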

The same thing applies with editors. Although even Notepad allows saving a
file as UTF-8, I don't know what that accomplishes because it doesn't
actually do any character translation.

To make matters worse, a console window uses (by default) yet another
character set: the OEM code page (e.g. 437 in the US), not the ANSI one.

In any case, what I have been doing with my applications is to translate the
incoming text from Windows to UTF-8. First, though, I check to see if the
text is already UTF-8 by doing a dummy translation from UTF-8 to UTF-8; if
the results are unchanged, then I know that particular text was already
UTF-8 and that it shouldn't be remapped.
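That check can be sketched in Python as follows. The "dummy translation" here is a UTF-8 decode with replacement followed by re-encoding: valid UTF-8 comes back byte for byte, anything else does not. Treating non-UTF-8 input as Windows-1252 is an assumption for this sketch, and the function names are hypothetical.

```python
def is_utf8(raw: bytes) -> bool:
    """Dummy UTF-8 -> UTF-8 translation: unchanged means already UTF-8.

    Invalid sequences are replaced with U+FFFD, so the round trip alters
    the bytes exactly when the input was not valid UTF-8.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8") == raw


def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8, remapping from Windows-1252 only if needed."""
    if is_utf8(raw):
        return raw  # already UTF-8: do not remap it a second time
    # Assumption: any non-UTF-8 input came from a Windows-1252 source.
    return raw.decode("cp1252", errors="replace").encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"caf\xe9"))      # b'caf\xc3\xa9'
    print(to_utf8(b"caf\xc3\xa9"))  # b'caf\xc3\xa9' (left alone)
```

Plain ASCII passes through untouched because it is already valid UTF-8. The check is a heuristic, though: some Windows-1252 byte sequences (for example the bytes for "Ã©") happen to be valid UTF-8 and would be left as they are.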

You will also run into this problem if you copy and paste from a PDF, I
suspect.

This whole thing gives me a headache. I hope someone else who really
understands this stuff will respond, so we can both learn.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
-----Original Message-----
Sent: Tuesday, February 13, 2007 3:53 AM
Subject: Re: Removing space characters ... char(160)? ... char(194)?