Hopefully this was the last time I had such trouble with encodings. I think I finally understand what caused all the trouble and how to avoid it in the future.
- Run dpkg-reconfigure locales and generate the locales you need (on my machine entries 75-78 in its list, notably de_DE.utf-8)
- export LANG=de_DE.utf-8
- “file” does tell ISO-8859 and UTF-8 files apart correctly (otherwise use hexdumps and the encoding man pages!)
- Read files in Java with the encoding set explicitly! (use iconv to convert unusual input files into something Java understands)
- Write files in Java with the encoding set explicitly, yes, force it! Between reading and writing you have full control over the conversion of encodings
- Yes, there are java.io classes that let you set the encoding (InputStreamReader and OutputStreamWriter)
- Configure the database driver correctly: jdbc.encoding=utf-8 (it does not matter that the database itself doesn’t know UTF-8)
- Use the same encoding wherever you can!!!
- Configure your shell and terminal program to display the encoding correctly (see the locale setup above)
- Remember: UTF-8 uses multibyte characters; that is what the strange multibyte option in your terminal program is for
- Don’t forget to configure log4j to write UTF-8 encoded files (it really helps with debugging)
- Set your Eclipse (or NetBeans or whatever) to UTF-8 encoding
- Configure your Ant build file so that your source code is read as UTF-8 in case you have UTF-8 strings in your classes (the javac task has an encoding attribute for this)
- Don’t forget to state the encoding in the XML declaration of your XML files.
- If you don’t like UTF-8, substitute your favorite encoding.
- MOST IMPORTANT: use the same encoding everywhere for your stuff. If something fails later, you know it’s not your stuff.
- Now and then, check your stuff on a different machine without any locales set, just so you know you managed to make your Java program independent of your local settings!
- Aehm: without thorough tests you are doomed! And listen when your tests scream ERROR! Believe me when I say: ISO is not enough!
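If “file” is not at hand, a similar check can be done from Java. This is a minimal sketch, not how “file” works internally: the hypothetical class `Utf8Check` decodes a file's bytes with a strict UTF-8 decoder that reports malformed input instead of replacing it. Keep in mind that plain ASCII also passes, and a file that fails is probably, but not certainly, ISO-8859-1.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: is this byte sequence valid UTF-8?
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT is already the default for a fresh decoder; it is set
            // explicitly to make the intent visible: fail, never substitute U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (isValidUtf8(bytes)
                ? ": valid UTF-8"
                : ": not UTF-8 (ISO-8859-1?)"));
    }
}
```

After compiling, `java Utf8Check somefile.txt` prints which case applies.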
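Reading and writing with an explicit encoding can be sketched with InputStreamReader and OutputStreamWriter. The class `Recode` and its `recode` method are hypothetical names; the example converts an ISO-8859-1 file to UTF-8, roughly what `iconv -f ISO-8859-1 -t UTF-8` does, and never relies on the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copy a text file from one encoding to another.
final class Recode {

    static void recode(String in, Charset from, String out, Charset to) throws IOException {
        // Both sides name their charset explicitly, so LANG and
        // file.encoding have no say in the result.
        try (Reader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(in), from));
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(out), to))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                // Here the text is plain Java chars (UTF-16): between reading
                // and writing you are in full control of the conversion.
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Recode input-iso.txt output-utf8.txt
        recode(args[0], StandardCharsets.ISO_8859_1, args[1], StandardCharsets.UTF_8);
    }
}
```

One caveat: the InputStreamReader constructor that takes a Charset silently replaces malformed input with U+FFFD. That cannot happen with ISO-8859-1 as the source, since every byte is a valid character there, but for UTF-8 input you may prefer the constructor that takes a CharsetDecoder configured with CodingErrorAction.REPORT, so bad input fails loudly.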
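The multibyte point is easy to see in numbers. A small sketch (the class `ByteLengths` is a hypothetical name) that counts how many bytes the same word takes in ISO-8859-1 and in UTF-8. The umlauts are written as \u escapes, so the example compiles correctly whatever encoding your editor or javac assumes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same text, different byte counts.
final class ByteLengths {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // German "greetings", spelled with escapes so the source file stays ASCII
        String word = "Gr\u00fc\u00dfe";
        System.out.println(latin1Length(word)); // 5: one byte per character
        System.out.println(utf8Length(word));   // 7: u-umlaut and sharp s take two bytes each

        // Show the raw UTF-8 bytes of u-umlaut
        for (byte b : "\u00fc".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xff); // prints C3 BC
        }
        System.out.println();
    }
}
```

A terminal set to ISO-8859-1 shows those two bytes C3 BC as two separate characters (Ã¼), the classic symptom of a UTF-8 file viewed with the wrong setting.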
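For XML files written from Java, the standard javax.xml.transform API writes the encoding into the XML declaration for you. A minimal sketch with a hypothetical class `XmlOut` and a made-up `greeting` element: setting OutputKeys.ENCODING selects both the byte encoding and the value that appears as `encoding="UTF-8"` in the declaration, so the two cannot disagree.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serialize a tiny document with a declared encoding.
final class XmlOut {

    static byte[] write(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent(text);
        doc.appendChild(root);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Chooses the output bytes AND the encoding written into <?xml ...?>
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Target a byte stream so the transformer does the encoding itself
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new String(write("Gr\u00fc\u00dfe"), StandardCharsets.UTF_8));
    }
}
```

Writing to an OutputStream rather than a Writer matters here: with a Writer, the transformer still prints the declared encoding, but the actual bytes depend on how that Writer was created.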
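A test in the spirit of the last two points could look like this. The hypothetical class `IsoIsNotEnough` checks that text containing a euro sign (not part of ISO-8859-1) survives a UTF-8 round trip, while ISO-8859-1 silently turns it into '?'. Because every charset is named explicitly, it should behave the same on a machine with no locale set, for example when run with LANG=C.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical test: prove that ISO-8859-1 loses characters UTF-8 keeps.
final class IsoIsNotEnough {

    // True if every character of s is representable in the charset
    static boolean fits(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Encode and decode again; getBytes replaces unmappable chars with '?'
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String price = "5 \u20ac"; // euro sign, U+20AC
        check(fits(price, StandardCharsets.UTF_8), "UTF-8 holds the euro sign");
        check(!fits(price, StandardCharsets.ISO_8859_1), "ISO-8859-1 has no euro sign");
        check(!roundTrips(price, StandardCharsets.ISO_8859_1), "ISO-8859-1 turns it into '?'");
        check(roundTrips(price, StandardCharsets.UTF_8), "UTF-8 round trip is lossless");
        System.out.println("all encoding checks passed");
    }

    // Scream loudly, as a good test should
    private static void check(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError("ERROR! " + what);
        }
    }
}
```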
That’s what I learned from all my trouble with encodings. Any questions? I think I can answer at least a few, as far as they concern Java and shell configuration 😉