linux - Permanently set `find` regextype

2014-07
  • blahdiblah

    GNU find supports several regex flavors via the -regextype option, but as far as I can tell I have to type out -regextype <whatever> every single time, which is onerous.

    If one wants a different flavor of regex, chances are that one always wants it rather than switching the type used with every command. The ideal would be to set the preferred regex flavor via a preference file (~/.find_profile) or environment variable ($FIND_OPTS), but I haven't found any indication that this is possible.

    How can I set a regextype permanently?

  • Answers
  • blahdiblah

    I haven't done a lot of testing of this, but it seems to work. It takes apart the argument list to find and concatenates the arguments back into another argument list but inserts -regextype posix-awk in front of any -iregex or -regex arguments it finds.

    Manipulating the argument list in the shell this way sometimes fails to handle certain quoting constructs properly, but it should work fine in most cases.

    Just put this function in your ~/.bashrc or the rc file of whatever shell you run.

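    find ()
    {
        # Rebuild the argument list, putting -regextype posix-awk in front
        # of every -regex or -iregex.
        args=
        set -f    # no globbing while the unquoted $* and $args are expanded
        for arg in $*
        do
            case $arg in
                -iregex|-regex)
                    args="$args -regextype posix-awk $arg"
                    ;;
                *)
                    args="$args $arg"
                    ;;
            esac
        done
        command find $args
        status=$?
        set +f
        return $status    # report find's exit status, not that of set +f
    }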
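
    The quoting caveat comes from flattening the arguments into one string. A minimal sketch that avoids it, assuming a POSIX-style shell, rotates the positional parameters instead, so an argument containing spaces or glob characters reaches find unchanged. The helper variable _find_n is a name made up for this sketch, and posix-awk is only the flavor used above. One known gap: any argument that is literally -regex or -iregex gets the prefix, even when it is the operand of another test.

```shell
# Sketch: a find wrapper that keeps every argument intact.
# posix-awk is the flavor used above; substitute the regextype you prefer.
find() {
    _find_n=$#    # number of original arguments still to process
    while [ "$_find_n" -gt 0 ]; do
        case $1 in
            -regex|-iregex)
                # Append the option pair, then the regex test itself.
                set -- "$@" -regextype posix-awk "$1"
                ;;
            *)
                # Anything else is appended unchanged.
                set -- "$@" "$1"
                ;;
        esac
        shift                       # drop the original copy from the front
        _find_n=$((_find_n - 1))
    done
    command find "$@"
}
```

    Because the list is never re-split, a call such as find . -name 'my file*' -regex '.*\.(c|h)' passes the quoted pattern through as a single argument.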
    
  • blahdiblah

    Using an alias for find allows for setting regextype automatically, but limits the other available syntax:

    alias find='find -regextype <whatever>'
    

    The problem is that many optional arguments to find now can't be used, because -regextype must be part of the find command's [expression] block:

    find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]
    

    With the above alias, using the symbolic-link, debug, or optimization options fails with find: unknown predicate '-<whatever>', and specifying a path fails with find: paths must precede expression: <wherever>. All of these arguments are optional, so find still works, but it is more limited.

  • blahdiblah

    Another option is a wrapper script around find, à la this Perl script made explicitly for setting -regextype.

    The drawback of this solution is that it's more involved, and somewhat fragile. The linked script doesn't handle -D, for example.
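
    For illustration only, here is a rough shell sketch of the same wrapper idea. It is not the linked script. It assumes the synopsis shown earlier: it skips -H, -L, -P, -D debugopts and -Olevel, then any paths, and inserts -regextype just before the first expression token. The function name find_re, the FIND_REGEXTYPE variable and the posix-extended default are all assumptions made up for this sketch.

```shell
# Sketch: insert -regextype at the start of find's expression.
# find_re and FIND_REGEXTYPE are hypothetical names, not part of findutils.
find_re() {
    _fr_type=${FIND_REGEXTYPE:-posix-extended}
    _fr_phase=opts    # opts -> paths -> expr; optarg = operand of -D
    _fr_n=$#
    while [ "$_fr_n" -gt 0 ]; do
        case $_fr_phase:$1 in
            expr:*) ;;                               # already inside the expression
            optarg:*) _fr_phase=opts ;;              # debug options consumed by -D
            opts:-D) _fr_phase=optarg ;;
            opts:-H|opts:-L|opts:-P|opts:-O*) ;;     # leading options pass through
            opts:--) _fr_phase=paths ;;              # end-of-options marker
            opts:-*|paths:-*|opts:'('|paths:'('|opts:'!'|paths:'!')
                # First expression token: put the regextype in front of it.
                set -- "$@" -regextype "$_fr_type"
                _fr_phase=expr
                ;;
            *) _fr_phase=paths ;;                    # a starting point
        esac
        set -- "$@" "$1"    # append the current argument unchanged
        shift               # and drop its original copy from the front
        _fr_n=$((_fr_n - 1))
    done
    command find "$@"
}
```

    With this sketch, find_re -L /srv -regex '.*\.(php|inc)' would run find -L /srv -regextype posix-extended -regex '.*\.(php|inc)'. If the command line has no expression at all, nothing is inserted.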


  • Related Question

    command line - How to Combine find and grep for a complex search? ( GNU/linux, find, grep )
  • Petruza

    I'm trying to do a text search in some files that share a similar directory structure, but are not in the same directory tree, in GNU/Linux.

    I have a web server with many sites that share the same tree structure (Code Igniter MVC PHP framework), so I want to search in a specific directory down the tree for each site, example:

    /srv/www/*/htdocs/system/application/

    Where * is the site name. From each of those application directories, I want to search the whole tree, down to its leaves, for any *.php file that contains some text pattern, let's say "debug(", no regular expression needed.

    I know how to use find and grep but I'm not good at combining them.

    How would I do this?
    Thanks in advance!


  • Related Answers
  • nagul

    Try

    find /srv/www/*/htdocs/system/application/ -name "*.php" -exec grep "debug(" {} \; -print
    

    This should recursively search the folders under application for files with .php extension and pass them to grep.

    An optimization on this would be to execute:

    find /srv/www/*/htdocs/system/application/ -name "*.php" -print0 | xargs -0 grep -H "debug("
    

    This uses xargs to pass the .php files output by find as arguments to as few grep invocations as possible, typically a single one, e.g. grep "debug(" file1 file2 file3. The -print0 option of find and the -0 option of xargs ensure that spaces in file and directory names are handled correctly. The -H option passed to grep ensures that the filename is printed in all situations. (By default, grep prints the filename only when multiple arguments are passed in.)

    From man xargs:

       -0     Input items are terminated by a null character instead of by whitespace, and the quotes and  backslash
              are  not  special  (every  character  is  taken literally).  Disables the end of file string, which is
              treated like any other argument.  Useful when input items might contain white space, quote  marks,  or
              backslashes.  The GNU find -print0 option produces input suitable for this mode.
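
    A similar batching effect is available without xargs through find's own -exec ... {} + form, which appends as many filenames as fit to each grep invocation. The sketch below wraps it in a helper. The name search_php is made up for illustration, and -F is used because the question asks for a fixed string rather than a regular expression.

```shell
# Sketch: search *.php files under the given directories for a fixed string.
# search_php is a hypothetical helper name.
search_php() {
    search_pattern=$1
    shift    # the remaining arguments are the directories to search
    # -F: fixed-string match, -H: always print the filename,
    # --: keeps a pattern starting with "-" from being read as an option.
    find "$@" -type f -name '*.php' -exec grep -F -H -- "$search_pattern" {} +
}

# Usage (paths taken from the question):
#   search_php 'debug(' /srv/www/*/htdocs/system/application/
```

    Since find passes the filenames to grep directly as arguments, names containing spaces need no special handling.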
    
  • Daniel Andersson

    find is not even needed for this example; one can use grep directly (at least GNU grep):

    grep -RH --include='*.php' "debug(" /srv/www/*/htdocs/system/application/
    

    and we are down to a single process fork.