The 29 7-bit ASCII characters that are not legal in XML 1.0

base 10  base 16  base 2     symbol  description
0        00       0000 0000  NUL     Null character
1        01       0000 0001  SOH     Start of Heading
2        02       0000 0010  STX     Start of Text
3        03       0000 0011  ETX     End of Text
4        04       0000 0100  EOT     End of Transmission
5        05       0000 0101  ENQ     Enquiry
6        06       0000 0110  ACK     Acknowledge
7        07       0000 0111  BEL     Bell, Alert
8        08       0000 1000  BS      Backspace
11       0B       0000 1011  VT      Vertical Tabulation
12       0C       0000 1100  FF      Form Feed
14       0E       0000 1110  SO      Shift Out
15       0F       0000 1111  SI      Shift In
16       10       0001 0000  DLE     Data Link Escape
17       11       0001 0001  DC1     Device Control One (XON)
18       12       0001 0010  DC2     Device Control Two
19       13       0001 0011  DC3     Device Control Three (XOFF)
20       14       0001 0100  DC4     Device Control Four
21       15       0001 0101  NAK     Negative Acknowledge
22       16       0001 0110  SYN     Synchronous Idle
23       17       0001 0111  ETB     End of Transmission Block
24       18       0001 1000  CAN     Cancel
25       19       0001 1001  EM      End of Medium
26       1A       0001 1010  SUB     Substitute
27       1B       0001 1011  ESC     Escape
28       1C       0001 1100  FS      File Separator
29       1D       0001 1101  GS      Group Separator
30       1E       0001 1110  RS      Record Separator
31       1F       0001 1111  US      Unit Separator
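
The list above can be recomputed from the Char production of the XML 1.0 specification, which admits only tab (#x9), line feed (#xA), carriage return (#xD), and the ranges #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF. The following is a minimal Python sketch of that check; the names is_xml10_char, illegal, and format_row are made up for illustration.

```python
def is_xml10_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Every 7-bit ASCII code point that the Char production rejects.
illegal = [cp for cp in range(128) if not is_xml10_char(cp)]

def format_row(cp):
    """Format one code point as decimal, hex, and binary (two nibbles)."""
    return f"{cp:>3}  {cp:02X}  {(cp >> 4):04b} {(cp & 0xF):04b}"

if __name__ == "__main__":
    for cp in illegal:
        print(format_row(cp))
```

Running it prints one row per illegal code point, which can be compared against the table by eye.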

Regular expressions to match these characters

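Not all of these have been tested. The Emacs and grep patterns are approximate: they also match tab, carriage return, and DEL, which are legal in XML.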

PCRE, Java, or oXygen:
[\u0000-\u0008\u000B\u000C\u000E-\u001F]
W3C:
[#x00-#x08#x0B#x0C#x0E-#x1F]
Emacs:
[^␤[:print:]]     ; where ‘␤’ is typed CTL-J
grep or egrep:
[^[:print:]]
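
As a rough illustration of putting such a pattern to use, here is a small Python sketch that finds or strips the 29 characters from the table. It uses \x escapes, which Python's re module accepts inside a character class. The names ILLEGAL_XML_CHARS, strip_illegal_xml_chars, and find_illegal_xml_chars are hypothetical, chosen only for this example.

```python
import re

# Code points 0-8, 11, 12, and 14-31: the 29 characters from the table.
# Tab (9), line feed (10), and carriage return (13) are deliberately left out.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")

def strip_illegal_xml_chars(text):
    """Return text with every matching control character removed."""
    return ILLEGAL_XML_CHARS.sub("", text)

def find_illegal_xml_chars(text):
    """Return a list of (offset, code point) pairs, one per match."""
    return [(m.start(), ord(m.group()))
            for m in ILLEGAL_XML_CHARS.finditer(text)]
```

Whether to strip the characters silently or report their offsets depends on the data; reporting is usually safer when the input is supposed to be clean already.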