search - How to find special characters in Linux Vim
2014-07
I want to find special characters in a text file. The file is UTF-8 encoded and is known to contain
Chinese characters,
"-",
"^A" (Control-A, a control character),
numbers,
letters, and
some other characters. These other characters are the ones I want to find.
I'm using Vim on Linux.
I used
/[^^A0-9a-zA-Z-]
to find them, but this also matches the Chinese characters. How do I filter out the Chinese characters and show only the other special characters in the file?
The Unicode code point range for CJK Unified Ideographs is 0x4E00-0x9FFF; you'd have to exclude that range of characters from your [...] collection (probably using the \%uNNNN regular expression atom).
Unfortunately, Vim currently cannot search for ranges larger than 256 characters, so you'd have to combine multiple collections ([...]\|[...]\|[...]\|...), or choose a different approach.
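One "different approach" is to do the filtering outside Vim. The following is a minimal Python sketch, assuming the file is valid UTF-8 and that "expected" means exactly the set from the question: ASCII digits and letters, "-", Control-A (U+0001), plus the CJK Unified Ideographs block U+4E00 to U+9FFF. The names find_special_chars and is_expected are hypothetical. Note that Chinese punctuation such as the fullwidth comma (U+FF0C) lies outside that block, so this sketch reports it as special.

```python
# Characters the question expects in the file (assumption: exactly the set
# from the original /[^^A0-9a-zA-Z-] search, with ^A meaning Control-A).
ALLOWED_ASCII = frozenset(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "-\x01"
)

# CJK Unified Ideographs block, U+4E00..U+9FFF inclusive.
CJK_START, CJK_END = 0x4E00, 0x9FFF


def is_expected(ch: str) -> bool:
    """Return True for characters the question treats as 'normal'."""
    return ch in ALLOWED_ASCII or CJK_START <= ord(ch) <= CJK_END


def find_special_chars(text: str) -> list:
    """Return (line, column, char) for every unexpected character.

    Lines and columns are 1-based; columns count characters, not bytes.
    The text is split on newline characters only, so other control
    characters (carriage returns, for example) are still reported.
    """
    hits = []
    for lnum, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_expected(ch):
                hits.append((lnum, col, ch))
    return hits
```

To scan a real file, read it with open(path, encoding="utf-8") and pass its contents to find_special_chars; each hit gives the position you can jump to in Vim.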
I'm using Vim macros to do a lot of work for me.
There's a lot of text in columns, and I want the macro to move between columns by pressing the w key ("move to the beginning of the next word").
For example:
DataSourceName string ""
DetailFields []string
DynamicControlBorder boolean empty may be void
EscapeProcessing boolean True
FetchDirection long 1000
FetchSize long 12
Filter string ""
GroupBy string ""
HavingClause string ""
However, Vim only treats letters this way: whenever it encounters a "[" or a '"', it treats that as the start of another word, which breaks the macro because there now appears to be an additional column.
Is there a setting I can change so that Vim treats these special characters just like letters and skips over them as part of the word?
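Why "[" starts a new word: for the w motion, Vim splits a line into runs of keyword characters (those listed in 'iskeyword') and runs of other non-blank characters, with blanks separating them. The Python sketch below imitates that rule for ASCII text only; the function name vim_words and its keyword argument are hypothetical, and Vim's extra rules for multibyte characters and empty lines are deliberately ignored.

```python
import string

# Default keyword characters for ASCII text: letters, digits, underscore
# (roughly what iskeyword=@,48-57,_ gives for character codes 0-127).
DEFAULT_KEYWORD = frozenset(string.ascii_letters + string.digits + "_")


def vim_words(line: str, keyword=DEFAULT_KEYWORD) -> list:
    """Split a line the way Vim's w motion sees words (simplified sketch).

    A word is a maximal run of keyword characters, or a maximal run of
    non-blank, non-keyword characters. Spaces and tabs separate words.
    """
    words = []
    current = ""
    current_kind = None  # "kw", "other", or None for blanks
    for ch in line:
        if ch in " \t":
            kind = None  # a blank ends the current word
        elif ch in keyword:
            kind = "kw"
        else:
            kind = "other"
        # A change of character class closes the word being built.
        if kind != current_kind and current:
            words.append(current)
            current = ""
        if kind is not None:
            current += ch
        current_kind = kind
    if current:
        words.append(current)
    return words
```

With the default set, vim_words("DetailFields []string") yields three words ("DetailFields", "[]", "string"), which is the extra column the macro trips over; passing DEFAULT_KEYWORD | {"[", "]"} keeps "[]string" together as one word.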
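You can change the definition of a word in Vim by using
:set iskeyword=<specification>
For this file, :set iskeyword+=91,93,34 adds "[" (character code 91), "]" (93) and '"' (34). Remember to change it back when you have finished with the special usage; :set iskeyword& restores the default value.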
:set iskeyword?
shows the current value. On my system it responds with
iskeyword=@,48-57,_,192-255
which covers the letters a-z and A-Z (@), the digits 0 to 9 (character codes 48-57), the underscore, and character codes 192-255, which in Latin-1 are mostly accented letters.
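To check which characters a specification like this covers, the small Python sketch below expands it into a set of character codes. It handles only a subset of the syntax described under :help 'isfname' (assumptions: no literal commas, and "@" is taken to mean the ASCII letters, whereas Vim actually uses isalpha() for codes 0-255). The names parse_iskeyword and _code are hypothetical.

```python
import string


def _code(token: str) -> int:
    """A part is either a decimal character code or a single literal character."""
    return int(token) if token.isdigit() else ord(token)


def parse_iskeyword(spec: str) -> set:
    """Expand a simplified 'iskeyword' value into a set of character codes."""
    codes = set()
    for part in spec.split(","):
        if not part:
            continue  # empty parts are skipped; literal commas are unsupported
        exclude = part.startswith("^") and len(part) > 1
        if exclude:
            part = part[1:]
        if part == "@":
            # Assumption: "@" = ASCII letters; Vim really uses isalpha().
            chosen = {ord(c) for c in string.ascii_letters}
        elif part == "@-@":
            chosen = {ord("@")}
        elif "-" in part[1:]:
            # A range such as "48-57" or "a-z". Split at the first "-" after
            # position 0, so a lone "-" is still read as a single character.
            i = part.index("-", 1)
            lo, hi = _code(part[:i]), _code(part[i + 1:])
            chosen = set(range(lo, hi + 1))
        else:
            chosen = {_code(part)}
        if exclude:
            codes -= chosen
        else:
            codes |= chosen
    return codes
```

Expanding the default value shows that 91 ("[") is absent; appending 91,93,34 to the specification adds "[", "]" and '"' to the set.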