Sanitizing WordPress UTF-8 – or how to get rid of mixed Latin1 and UTF-8 in MySQL exports

My attention was drawn to some weird characters in my WordPress blog, such as Ã¼ or Ã¶ appearing in place of ü and ö. So I had a look into my MySQL database and saw that it was still on latin1. While trying to clean this up I came across an explanation of the problem, but none of the how-tos worked for me. Others have already looked into it, for example fischerlander or Haidong’s Blog.

The problem is more complex than it looks at first. WordPress and phpBB store a post in the database exactly as they receive it from the browser. Because you can force your browser to a specific character set (iso-8859-1) instead of auto-detection, and thereby ignore what the server needs, the result is these awkward entries. On top of that, my users had used Windows-1250, UTF-16 and iso-8859-1 mixed together.

To start solving this problem, I first wrote this little Perl script to generate the awkward characters:

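#!/usr/bin/perl

use strict;
use warnings;
use Encode qw/encode/;

# Every code point of the Basic Multilingual Plane, without the
# surrogate range 0xD800-0xDFFF, which does not hold characters
my @list = grep { $_ < 0xD800 || $_ > 0xDFFF } ( 0x00 .. 0xFFFF );
#  my $letter = "ä";

# Write raw bytes: every column carries its own encoding
open my $csv, '>:raw', 'lat_new.csv' or die "Cannot open lat_new.csv: $!";
foreach my $code (@list) {
    my $letter = chr($code);
    my $utf8   = encode("UTF-8", $letter);
    # Column 1: the character as a single Latin-1 byte where one exists
    my $plain = $code < 0x100 ? $letter : $utf8;
    print {$csv} $plain, ";;";
    # Column 2: its UTF-8 bytes, which look garbled when read as Latin-1
    print {$csv} $utf8, ";;";
    # Column 3: its UTF-16 bytes (Encode prepends a byte order mark)
    print {$csv} encode("UTF-16", $letter), "\n";
}
close $csv;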

Since this produces a huge list and 99% of it is useless, I tried to identify only those characters from the various encodings that cause my particular mess. The result is the following short list of garbled Latin-1 sequences, each followed by the UTF-8 character it stands for. If you have Chinese or Russian characters, you will very likely end up with a slightly different list :-)

ö;ö;ß;ß;ü;ü;ö;ö;ü;ü;–;–;ö;ö;ü;ü; ; ;¡;¡;¢
;¢;£;£;¤;¤;Â¥;¥;¦;¦;§;§;¨;¨;©;©;ª;ª;«;«;¬;¬;­;­;
®;®;¯;¯;°;°;±;±;²;²;³;³;´;´;µ;µ;¶;¶;·;·;¸;¸;¹;¹;º;º;»;»;
¼;¼;½;½;¾;¾;¿;¿;À;À;Á;Á;Â;Â;Ã;Ã;Ä;Ä;Ã…;Å;Æ;
Æ;Ç;Ç;È;È;É;
É;Ê;Ê;Ë;Ë;ÃŒ;Ì;Í;Í;ÃŽ;Î;Ï;Ï;Ð;Ð;Ñ;Ñ;Ã’;Ò;Ó;Ó;Ô;Ô;Õ;Õ;Ö;Ö;×;×;Ø;Ø;Ù;Ù;Ú;Ú;Û;Û;Ü;Ü;Ý;Ý;Þ;Þ;ß;ß;à ;à;á;á;â;â;
ã;ã;ä;ä;Ã¥;å;æ;æ;ç;ç;è;è;é;é;ê;ê;ë;ë;ì;ì;í;í;î;î;ï;ï;ð;
ð;ñ;ñ;ò;ò;ó;ó;ô;ô;õ;õ;ö;ö;÷;÷;ø;ø;ù;ù;ú;ú;û;û;
ü;ü;ý;ý;þ;þ;ÿ;ÿ;
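
Pairs like the ones in this list do not have to be collected by hand. The following is only a minimal sketch, not the script used for this post: it assumes the damage comes from UTF-8 bytes being read as ISO-8859-1, and the names mojibake and %fix are made up for the illustration. Data that went through Windows-1252 or UTF-16 would need a different table.

```perl
use strict;
use warnings;
use Encode qw/encode decode/;

# Garbled form of a character: its UTF-8 bytes misread as ISO-8859-1.
# (Assumption: ISO-8859-1 is the wrong encoding that was applied.)
sub mojibake {
    my ($char) = @_;
    return decode('iso-8859-1', encode('UTF-8', $char));
}

# "garbled => intended" pairs for U+00A0 .. U+00FF
my %fix = map { ( mojibake(chr $_) => chr $_ ) } 0xA0 .. 0xFF;

binmode STDOUT, ':encoding(UTF-8)';
foreach my $code (0xE4, 0xF6, 0xFC) {
    # Prints Ã¤;ä then Ã¶;ö then Ã¼;ü
    print mojibake(chr $code), ";", chr($code), "\n";
}
```

Every key in %fix is two characters long here, because each character in that range takes exactly two bytes in UTF-8.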

I used a little Perl script to sanitize my MySQL export and dump it into a new file. To be on the safe side, I created a new database in utf8.

#!/usr/bin/perl

use strict;
use warnings;

# Read the list from above: "wrong;right;wrong;right;..."
open my $csv, '<', 'my_csv.csv' or die "Cannot open my_csv.csv: $!";
my $li = do { local $/; <$csv> };
close $csv;
$li =~ s/\n//g;

# Keys are the awkward sequences, values the intended characters
my %lat_utf = split( /;/, $li );
#print %lat_utf;

open my $in, '<', 'mydatabase.csv' or die "Cannot open mydatabase.csv: $!";
my @lines = <$in>;
close $in;

# Longest sequences first, so that a short key cannot break up a longer one
foreach my $key ( sort { length($b) <=> length($a) } keys %lat_utf ) {
    next if $key eq '';
    foreach my $lin (@lines) {
        # Show every line that is about to be changed
        print $lin if $lin =~ /\Q$key\E/;
        # \Q...\E keeps characters in the key from acting as regex syntax
        $lin =~ s/\Q$key\E/$lat_utf{$key}/g;
    }
}

open my $out, '>', 'wdrede_kjr-UTF-8_fixed_new.sql' or die "Cannot open output file: $!";
print {$out} @lines;
close $out;

In my particular case this results in a more or less clean WordPress blog. BUT I really think that WordPress and phpBB should check the browser settings and the code page in which the user entered the comment or content. Otherwise the conversion will not help for long…
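
Where the only damage is UTF-8 that was read as Latin-1, the replacement list can in principle be avoided by reversing the wrong decoding step. This is a different technique from the script above and only a sketch under that assumption; undo_double_encoding is a made-up name, and text that legitimately contains sequences such as "Ã¶" would be changed as well.

```perl
use strict;
use warnings;
use Encode qw/encode decode/;

# Undo one round of "UTF-8 read as ISO-8859-1" on a character string.
# Returns the input unchanged if it does not look double-encoded.
sub undo_double_encoding {
    my ($text) = @_;
    # Characters above U+00FF cannot come from misread Latin-1 bytes
    return $text if $text =~ /[^\x00-\xFF]/;
    my $bytes = encode('iso-8859-1', $text);
    # FB_CROAK makes decode die on bytes that are not valid UTF-8
    my $fixed = eval { decode('UTF-8', $bytes, Encode::FB_CROAK()) };
    return defined $fixed ? $fixed : $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print undo_double_encoding("B\x{C3}\x{BC}cher"), "\n";   # prints Bücher
print undo_double_encoding("B\x{FC}cher"), "\n";         # already fine: Bücher
```

Applied line by line to a decoded export, this leaves correct lines alone and repairs the double-encoded ones, which is what a mixed dump needs.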

Posted in Allgemein, Joes Tests

Number systems: when 1 is greater than 1

As I mentioned before, you can use a number system (http://en.wikipedia.org/wiki/Field_%28mathematics%29) other than standard decimal. The Fibonacci system, as I call it, uses the Fibonacci sequence as its place values, and the number at each place is also the maximum digit allowed there. An example:

Fibonacci-generated places    1      1      2      3
Valid range at given place    0–1    0–1    0–2    0–max of place
Fibonacci number              1      0      2      0
Decimal 5                     1      0      4      0
Fibonacci number              1      0      1      1
Decimal 6                     1      0      2      3

The rule for converting a Fibonacci number to decimal is: sum over all places of (place value * digit).
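
The rule can be written down in a few lines of Perl. This is just a sketch: fib_to_decimal and @places are names chosen for the illustration, and the digits are passed lowest place first, as in the table.

```perl
use strict;
use warnings;

# Place values of the first four places, lowest place first
my @places = (1, 1, 2, 3);

# Sum over all places of (place value * digit)
sub fib_to_decimal {
    my @digits = @_;
    die "too many digits\n" if @digits > @places;
    my $sum = 0;
    for my $i (0 .. $#digits) {
        # A digit may run from 0 up to the value of its place
        die "digit $digits[$i] not allowed at place $i\n"
            if $digits[$i] < 0 || $digits[$i] > $places[$i];
        $sum += $places[$i] * $digits[$i];
    }
    return $sum;
}

print fib_to_decimal(1, 0, 2, 0), "\n";   # 5
print fib_to_decimal(1, 0, 1, 1), "\n";   # 6
print fib_to_decimal(1, 0), "\n";         # 1
print fib_to_decimal(0, 1), "\n";         # 1
```

The last two calls already show that two different digit strings can give the same decimal value.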

As you can see, the Fibonacci number 10 looks greater than the Fibonacci number 1, but because the first two places both have the value 1, both convert to decimal 1. Read naively, that means 1 > 1. Looks a bit awkward. How can one apple be bigger than one apple? Maybe it simply is bigger :-)

This system produces every decimal number more than once. Some of the people I talked to about it found this really hard to understand.

But I think we are all used to our time and date system, which is indeed a number system based not on a single sequence but on a mixed set of bases (which kids really struggle to learn):

Unit    fractions of a second    seconds    minutes    hours    days      years
Base    10 exp -n                60         60         24       365.25    10 exp n
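
Counting in such a mixed set of bases works like counting in decimal, except that the base changes from place to place. A small sketch for the 60/60/24 part (split_seconds is a name made up for this example):

```perl
use strict;
use warnings;

# Split a number of seconds into (days, hours, minutes, seconds)
sub split_seconds {
    my ($total) = @_;
    my @parts;
    # Bases of the places seconds, minutes, hours, lowest place first
    for my $base (60, 60, 24) {
        push @parts, $total % $base;
        $total = int($total / $base);
    }
    my ($seconds, $minutes, $hours) = @parts;
    return ($total, $hours, $minutes, $seconds);
}

print join(" ", split_seconds(100000)), "\n";   # 1 3 46 40
```

So 100000 seconds are 1 day, 3 hours, 46 minutes and 40 seconds.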

Not a step closer to my prime problem, but it is still fun :-D

Posted in Allgemein, Deep Thoughts, Joe Writes, Joes Tests

Prime numbers: Using other number systems

Basically, our normal number system is based on a very simple sequence:

F(n) = F(n-1) + 1 with a fixed base (binary, decimal, hexadecimal)

But it occurred to me that this kind of number system is not well suited to revealing the nature behind prime numbers. As many have found, a pattern for the primes cannot easily be found in it.

We are used to our clocks. They have a number system based on a 10/60/60/24 scheme. Calculating in this kind of system is quite a bit harder than what we are used to. The idea for getting a better grip on primes is now to change our usual kind of number system:

For example, the Fibonacci sequence, starting from a base of one number,
creates: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, where every number marks a change of base.

Decimal   Fibonacci places
          1  1  2  3  5
1         1
2         1  1
2         0  0  1
3         1  0  1
4         1  1  1
4         0  0  2
5         1  0  2
6         1  1  2
3         0  0  0  1

And so on. As you can see, this system does not deliver the uniqueness that a normal number system does. The question I would like to look at is: how does a system like the one above react to prime numbers? I have the feeling that primes have some unique characteristics in this kind of number system. I am not a full-time mathematician, but I am really keen to discuss this here or via email: jrspam-prime@web-d….de, because it looks to me as if nobody has ever thought of changing the number system for the sake of the prime number problem. A follow-up will come…
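
One way to start looking at this is to count how many different digit strings each decimal number has. The following is only a sketch for playing around and says nothing yet about primes: it assumes the five places from the table with digits from 0 up to the place value, and count_representations and is_prime are names invented for this example.

```perl
use strict;
use warnings;

# Place values as in the table, lowest place first
my @places = (1, 1, 2, 3, 5);

# Count the digit combinations from place $i onward whose weighted sum is $n
sub count_representations {
    my ($n, $i) = @_;
    $i = 0 unless defined $i;
    if ($i > $#places) {
        return $n == 0 ? 1 : 0;
    }
    my $count = 0;
    for my $digit (0 .. $places[$i]) {
        my $rest = $n - $digit * $places[$i];
        last if $rest < 0;
        $count += count_representations($rest, $i + 1);
    }
    return $count;
}

# Simple trial division, good enough for small numbers
sub is_prime {
    my ($n) = @_;
    return 0 if $n < 2;
    for (my $d = 2; $d * $d <= $n; $d++) {
        return 0 if $n % $d == 0;
    }
    return 1;
}

for my $n (1 .. 12) {
    printf "%2d has %2d representations%s\n",
        $n, count_representations($n), is_prime($n) ? "  (prime)" : "";
}
```

With more places in @places the same loop covers larger numbers, at the cost of a quickly growing number of combinations.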

Posted in Allgemein, Deep Thoughts, Joes Tests, Just Thoughts

iPhone 3.1.2 and PwnageTool – Network: Not active

Actually, I’m not sure at all, but this is what I’ve seen so far:

Today I came back from my vacation trip and thought: all right, let’s upgrade to firmware 3.1.2 and have a look at it. Once I saw that it was going quite well, I applied the new PwnageTool and everything went fine, except that 2-3 hours later I noticed I had no carrier signal. I had a look at my wife’s iPhone, and there everything was fine. The next step was to go back to the regular firmware, and behold: the iPhone was an iPhone again and not just a very expensive version of an iPod touch. As I’m not really sure whether this behaviour:

Network : Not active

in Settings comes from my own ignorance or from a new ability to detect whether an iPhone is pwned… I will wait, as I have no time for playing around and the phone has to work. The rest is more or less nice but not necessary.

Once more in short (this summary was originally written in German): today I treated the iPhone with PwnageTool again. Unfortunately I had no phone service afterwards. In Settings it said:

Network : Not active

Whether that comes from the jailbreak I can only guess. In any case, it worked again after resetting to the normal firmware.

Posted in Allgemein, Joes Tests

XML & XSL for use in Websites – starcraft2.com

Today I was on starcraft2.com and looked into the page source (just out of curiosity), and I was astonished at how broadly they use XML and XSL to let the browser render the page. The effect is amazing… I like the way you can tell the browser to include, for example, the navigation and just supply the content yourself. How this is indexed by the search engines I will have to find out soon. But it is really good style that Blizzard shows here…

Update:

It seems that Google indexes the pages of starcraft2.com, which leads to an interesting conclusion:

The search robot uses an engine that interprets the loaded pages and builds the index from the interpreted HTML. Which, by the way, means you don’t know whether Google has some kind of testing mechanism that checks SEO-built pages against what a user will actually see. It’s still interesting. I will test this soon…

Posted in Allgemein, Joes Daylies, Just Thoughts

Ula’s car is back

As I wrote yesterday: who would want that old clunker anyway?
http://www.focus.de/politik/deutschland/alicante-diebe-wollen-schmidts-auto-nicht-mehr_aid_421482.html

Posted in Allgemein, Thoughts about News

Oil into the summer slump: Twitter, Ula, Autobahn

So now I’m writing a filler post too: it is summer and there is a summer news slump. The most exciting thing being discussed up and down on Twitter right now is Michael Jackson’s death. Iran has vanished into digital 2.0 oblivion, maybe also because no news is currently getting out. That Ula had her official car stolen is so very summer-slump that I don’t know what else to do but report on it. Above all I wonder why anyone would steal it. That clunker is surely completely unsellable… On that note, where was the much-scolded driver anyway? Wasn’t it actually this fool who let the clunker get stolen? I’m just saying: chauffeur service…

Speaking of chauffeur service: the motorways on the way to work are empty, and the analysts have downgraded Deutsche Bank over risk despite a profit in the billions. So the stock market is going crazy, as always, and the consumer climate index is rising. At least ARD has updated its online presence (I like it). So we are doing well and enjoying our well-deserved holidays?

Facebook captured me with Mafia Wars, and has almost spat me out again. I update Twitter and my Facebook status with the jailbroken iPhone / qtweet (a nice app that updates both Facebook and Twitter).

Work is marked by the absence of colleagues (see empty motorway). The inevitable stress that the end of the summer holidays announces can already be felt.

The emptiness is filled by a few eBay auctions of mine and the inevitable clearing out of the “eBay shelf”, which has been collecting dust instead of money since before the last move. Realizing that the expensive UMTS card from back then fetches 10 EUR at best today is sobering, but never mind.

At least share prices are rising again. Sitting out the crisis has not paid off yet, but my virtual loss gets smaller every day. Searching for a house or a plot of land in my area is practically impossible, and you have to be a millionaire (see rising share prices) to get anything decent. So that drags on as well.

At least the cheap oil and the empty motorway mean that I can usually occupy the empty left lane at 160-180 with my Bimmer (American for BMW), unless yet another Audi driver is blocking the left lane (*wink, right, Makkus?*).

At least it is convertible weather. Unfortunately I don’t have one. But maybe I’ll treat myself to a motorcycle substitute: the Piaggio MP3 LT can be ridden with a car licence… I think that’s awesome and I definitely want to take one for a test ride.

Right then, I’ll listen to a bit more Last.fm now and enjoy the quiet, beautiful German holiday evening.

Posted in Allgemein, Joes Daylies, Thoughts about News