Unofficial empeg BBS

#367832 - 11/11/2016 23:07 Regex help!
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
I'm not seeing a precise answer on StackOverflow, or in any of my Google results...

I'm trying to write a regex to stuff into a GIT BLAME command. It goes like this:

I know:
- The FILENAME I want to search.
- The FUNCTIONNAME I want to retrieve from that file.

I want GIT BLAME on every line of that function, giving me the email address of every person who worked on every line of that function, from its declaration at the top, down through every line of the function down to its closing curlybrace.

The syntax would be:
GIT BLAME -e -L*RegexToGetFUNCTIONNAME* FILENAME

I have half an example which is based on this blog entry and it goes like this:

Code:
git blame -e -L/^myFunctionNameIamLookingFor/,/^}/ C:\folder\folder\folder\folder\filename.cs


This *works*, but it has an Achilles' heel: it stops at the first closing curlybrace it finds. Sometimes that's fine; it returns the entire function and everything is hunky dory. But if the interior of the function contains some curly braces (for instance if there is a loop or an IF statement inside the function), then the regex halts halfway through the function as soon as it hits a closing curlybrace.

I have seen some items on StackOverflow which talk about using regex nesting to get the correct levels within nested braces, but what I'm missing is the glue to get the entire line of the function name itself, everything up to its first curly brace, and then everything up to the final curly brace that closes that particular function.

Regex is still pretty baffling to me, alas. This seems like a pretty advanced use of it.

Anyone good with Regex know how to handle this situation?
_________________________
Tony Fabris

#367833 - 11/11/2016 23:15 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
By the way, I could avoid the regex entirely if I could just get the syntax right for feeding the function name directly to GIT BLAME.

Supposedly, according to the docs on git blame, I can just say -L :functionname or perhaps -L :functionname:filename and it'll work. It doesn't.
_________________________
Tony Fabris

#367834 - 12/11/2016 01:52 Re: Regex help! [Re: tfabris]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14503
Loc: Canada
Originally Posted By: tfabris
It stops at the first closing curlybrace it finds.


Does it really? The caret (^) in front of the curly brace means "find a closing curly brace that is the first character of a line".

So unless the function has other non-indented closing curly braces, it should work as intended.

Doesn't it?
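The caret's effect is easy to check outside git. This is a sketch using Python's `re` with `re.MULTILINE`, which for a simple line-anchored pattern like `^}` should behave like a POSIX regex compiled with `REG_NEWLINE`; the C-like sample is made up for illustration.

```python
import re

# Made-up sample: the inner closing brace is indented, the function's own is not.
SOURCE = """int foo(int x)
{
    if (x) {
        return 1;
    }
    return 0;
}
"""


def closing_brace_lines(text: str, anchored: bool) -> list:
    """Return 1-based line numbers where '}' (or '^}' if anchored) matches."""
    pattern = re.compile(r"^}" if anchored else r"}", re.MULTILINE)
    # A match's line number is the count of newlines before it, plus one.
    return [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]


print(closing_brace_lines(SOURCE, anchored=False))  # [5, 7]
print(closing_brace_lines(SOURCE, anchored=True))   # [7]
```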

#367835 - 12/11/2016 01:57 Re: Regex help! [Re: tfabris]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14503
Loc: Canada
I don't have a git repo handy here at the moment to test with, but if that indeed isn't working, then try this instead:

Replace the ^} expression with ^[}]

#367836 - 12/11/2016 01:58 Re: Regex help! [Re: mlord]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Every curly brace I encounter in this procedure, whether it's the one I want or the ones I don't, will be indented by some amount. The functions I'm looking for will be, at the very least, indented one or two levels already.

In my tests running the actual Git command, it didn't seem to care about the indent level. It found my function name just fine, and then gave me everything up to the next closing curlybrace, no matter where it lay.

In any case, I'd like the final solution to be agnostic to indentation levels. Although our coding style practices are decent here, I can't guarantee formatting.

https://twitter.com/udellgames/status/788690145822306304
_________________________
Tony Fabris

#367837 - 12/11/2016 01:59 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Thanks, I will try your suggestion on Monday.
_________________________
Tony Fabris

#367838 - 12/11/2016 11:57 Re: Regex help! [Re: tfabris]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4181
Loc: Cambridge, England
You can't match a C/C++/C# function body with a regex, because a regex can't count the brace pairs. Your best bet (if you can't get -L:funcname to work -- perhaps your git is too old?) might be to try and end the blame by matching the start of the next function instead. Is there a mandatory doc-comment or something you could look for?
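The brace-pair point in miniature: a pattern that stops at the first `}` cuts the body short, while a simple depth counter finds the matching brace. A Python sketch; the one-line C snippet is invented, and the counter naively treats braces inside strings and comments as real ones.

```python
import re

BODY = "void f() { if (a) { b(); } c(); }"


def first_brace_regex(text: str) -> str:
    """What a 'stop at the first }' pattern captures: from '{' to the first '}'."""
    m = re.search(r"\{[^}]*\}", text)
    return m.group(0) if m else ""


def balanced_body(text: str) -> str:
    """Track nesting depth; return from the first '{' to its matching '}'."""
    start = text.index("{")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    raise ValueError("unbalanced braces")


print(first_brace_regex(BODY))  # { if (a) { b(); }
print(balanced_body(BODY))      # { if (a) { b(); } c(); }
```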

Peter

#367839 - 12/11/2016 13:26 Re: Regex help! [Re: peter]
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5685
Loc: London, UK
Originally Posted By: peter
a regex can't count the brace pairs


Absolutely true, but if git's using PCRE, those aren't regexes in the technical sense, and here's some Perl that uses a "regex" to match the brace pairs: http://stackoverflow.com/a/133771/8446
_________________________
-- roger

#367840 - 14/11/2016 19:45 Re: Regex help! [Re: mlord]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: mlord
I don't have a git repo handy here at the moment to test with, but if that indeed isn't working, then try this instead:

Replace the ^} expression with ^[}]


This also didn't work: Regex returned everything up to and including the first closing curlybrace it finds within the interior of the function.
_________________________
Tony Fabris

#367841 - 14/11/2016 19:58 Re: Regex help! [Re: Roger]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: Roger
Originally Posted By: peter
a regex can't count the brace pairs


Absolutely true, but if git's using PCRE, those aren't regexes in the technical sense, and here's some Perl that uses a "regex" to match the brace pairs: http://stackoverflow.com/a/133771/8446

Git's use of PCRE is spotty. git-grep uses it (with the -P switch), but git-blame has no equivalent, and does not use PCRE. The simple test is to use an embedded modifier:
Code:
$ git grep '(?i)pauseexec' ../../omanalyze.cpp
$ git grep -P '(?i)pauseexec' ../../omanalyze.cpp
../../omanalyze.cpp:int pauseExec(int value)
../../omanalyze.cpp:    exit(pauseExec(1));
../../omanalyze.cpp:    exit(pauseExec(1));
../../omanalyze.cpp:            return pauseExec(0);
../../omanalyze.cpp:    return pauseExec ( 0 );
$ git blame -L '/pauseExec/,/}/' ../../omanalyze.cpp
8a714b87 (canuck 2015-05-15 16:10:54 +0000 37) int pauseExec(int value)
8a714b87 (canuck 2015-05-15 16:10:54 +0000 38) {
8a714b87 (canuck 2015-05-15 16:10:54 +0000 39)     if (pauseAtExit) {
8a714b87 (canuck 2015-05-15 16:10:54 +0000 40)         printf("hit return to exit...");
8a714b87 (canuck 2015-05-15 16:10:54 +0000 41)         char buf[ 128 ];
8a714b87 (canuck 2015-05-15 16:10:54 +0000 42)         if (fgets(buf, sizeof(buf) - 1, stdin));
8a714b87 (canuck 2015-05-15 16:10:54 +0000 43)     }
$ git blame -L '/(?i)pauseexec/,/}/' ../../omanalyze.cpp
fatal: -L parameter '(?i)pauseexec' starting at line 1: No match


Part of the problem that Tony's facing is that -L takes 2 parameters -- the start of the range, and the end of the range. Those are independent regex, so he can't use back references to match indent (/^(\s*)funcName/,/^\1}/ won't work), nor would any fancy PCRE, even if it were available.
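To make the limitation concrete: if start and end lived in one multi-line expression, a back-reference could pin the closing brace to the definition's indentation. The sketch below does that with Python's `re`, purely to illustrate what two independent `-L` patterns cannot express; the class and method names are made up, and braces in strings or comments will still fool it.

```python
import re

SOURCE = """class Foo
{
    static bool Limit()
    {
        if (x) {
            y();
        }
    }

    void Other()
    {
    }
}
"""


def indent_matched_range(text: str, name: str):
    """1-based (first, last) lines from the line mentioning name( to the next
    '}' at exactly the same indentation, or None if there is no such block."""
    pattern = re.compile(
        r"^([ \t]*)(?![ \t])"                    # group 1: the full indentation
        r"[^\n]*?\b" + re.escape(name) + r"\("   # the line mentions name(
        r"[^\n]*\n"                              # rest of that line
        r"(?:.*\n)*?"                            # as few whole lines as possible
        r"\1\}",                                 # '}' at the same indentation
        re.MULTILINE,
    )
    m = pattern.search(text)
    if m is None:
        return None
    first = text.count("\n", 0, m.start()) + 1
    last = text.count("\n", 0, m.end()) + 1
    return first, last


print(indent_matched_range(SOURCE, "Limit"))  # (3, 8)
print(indent_matched_range(SOURCE, "Other"))  # (10, 12)
```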

On first look, the -L :<funcname> option seems like it's the answer:
Code:
$ git blame -L :'pauseExec' ../../omanalyze.cpp
8a714b87 (canuck 2015-05-15 16:10:54 +0000 37) int pauseExec(int value)
8a714b87 (canuck 2015-05-15 16:10:54 +0000 38) {
8a714b87 (canuck 2015-05-15 16:10:54 +0000 39)     if (pauseAtExit) {
8a714b87 (canuck 2015-05-15 16:10:54 +0000 40)         printf("hit return to exit...");
8a714b87 (canuck 2015-05-15 16:10:54 +0000 41)         char buf[ 128 ];
8a714b87 (canuck 2015-05-15 16:10:54 +0000 42)         if (fgets(buf, sizeof(buf) - 1, stdin));
8a714b87 (canuck 2015-05-15 16:10:54 +0000 43)     }
[...]
8a714b87 (canuck 2015-05-15 16:10:54 +0000 46) }
8a714b87 (canuck 2015-05-15 16:10:54 +0000 47) 


However, -L :<funcname> doesn't understand indented definitions. For example, suppose the following code:
Code:
class fooImpl
{
    [...]

    static bool limitMemoryFootprint()
    {
        switch(m_platform) {
        // [...]
        }
    }

    [...]
}


Then looking for the limitMemoryFootprint() method fails:
Code:
$ git blame -L :'limitMemoryFootprint' test.cc
fatal: -L parameter 'limitMemoryFootprint' starting at line 1: no match


Remove the indentation, and it works just fine.

I'd say that Tony's best bet is to figure out the start and end line numbers of the function outside of git, and use those rather than regex parameters to the -L option (caution: line-noise ahead):
Code:
$ perl -ne'if(/pauseExec/){$a=$.;tr/{/{/||next}if($a){$b+=tr/{//;$b-=tr/}//;$b||exit print "$a,$.\n"}' ../../omanalyze.cpp 
37,46
$ git blame -L 37,46 ../../omanalyze.cpp
8a714b87 (canuck 2015-05-15 16:10:54 +0000 37) int pauseExec(int value)
8a714b87 (canuck 2015-05-15 16:10:54 +0000 38) {
8a714b87 (canuck 2015-05-15 16:10:54 +0000 39)     if (pauseAtExit) {
[...]
8a714b87 (canuck 2015-05-15 16:10:54 +0000 46) }

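Here is roughly the same brace-counting idea as that one-liner, spelled out in Python as a sketch. Unlike the Perl, it anchors on the first line that mentions the name (so an earlier call or prototype would fool it), and like the Perl it also counts braces inside strings and comments. The sample lines are a trimmed, invented version of the function above.

```python
def function_line_range(lines, name):
    """Return 1-based (start, end) for the brace-balanced block that begins at the
    first line mentioning `name`, or None if no such block is found."""
    start = None
    depth = 0
    seen_open = False
    for lineno, line in enumerate(lines, start=1):
        if start is None:
            if name not in line:
                continue
            start = lineno
        depth += line.count("{") - line.count("}")
        seen_open = seen_open or "{" in line
        if seen_open and depth == 0:
            return start, lineno
    return None


def blame_range_args(path, line_range):
    """argv for 'git blame -e -L start,end path'."""
    start, end = line_range
    return ["git", "blame", "-e", "-L", f"{start},{end}", path]


SAMPLE = [
    "int pauseExec(int value)",
    "{",
    "    if (pauseAtExit) {",
    "        return 1;",
    "    }",
    "    return 0;",
    "}",
    "",
    "int main() { return pauseExec(0); }",
]

rng = function_line_range(SAMPLE, "pauseExec")
print(rng)  # (1, 7)
print(blame_range_args("omanalyze.cpp", rng))
```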


#367842 - 14/11/2016 19:58 Re: Regex help! [Re: peter]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: peter
(if you can't get -L:funcname to work -- perhaps your git is too old?)

>git --version
git version 2.10.0.windows.1
I'm pretty sure that's within the last few weeks at least.

Can anyone help me figure out the precise syntax by which this feature is supposed to work? Because that's what I really want: Git should be able to figure out for me the contents of the function if I name the function for it. Their own docs say so, I just can't get it to work.

If I try to follow the instructions here.... https://git-scm.com/docs/git-blame
It says:
Quote:
If “:<funcname>” is given in place of <start> and <end>, it is a regular expression that denotes the range from the first funcname line that matches <funcname>, up to the next funcname line. “:<funcname>” searches from the end of the previous -L range, if any, otherwise from the start of file. “^:<funcname>” searches from the start of file.

That looks like exactly what I want. So I try it:
Code:
git blame -e -L:JustTheNameOfMyFunction C:\folder\folder\folder\filename.cs


No joy, it gives me the following error:
Code:
fatal: -L parameter 'JustTheNameOfMyFunction' starting at line 1: no match

Which is not true. There is a perfectly good function of that name inside the file.

Googling for the correct syntax, some of the responses say that it should be -L:Funcname:filename instead of -L:Funcname filename. But that DEFINITELY does not work: when I try it, Git doesn't even try to run; it just spits its usage statement back at me as if I hadn't supplied the correct parameters.
_________________________
Tony Fabris

#367843 - 14/11/2016 20:02 Re: Regex help! [Re: canuckInOR]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: canuckInOR

However, -L :<funcname> doesn't understand indented definitions. (...) Remove the indentation, and it works just fine.



AHA!!!!! That must be the reason. Of course all of my function definitions are indented because they are all in classes. So Git is just PLAIN OLD STUPID with regard to this then?
_________________________
Tony Fabris

#367844 - 14/11/2016 20:04 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
So I try it:
Code:
git blame -e -L:JustTheNameOfMyFunction C:\folder\folder\folder\filename.cs


No joy, it gives me the following error:
Code:
fatal: -L parameter 'JustTheNameOfMyFunction' starting at line 1: no match

Which is not true. There is a perfectly good function of that name inside the file.

It's indented, isn't it?

#367845 - 14/11/2016 20:05 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
One option I saw in StackOverflow somewhere was to make the "end" of the regex be something that searched for common keywords that are used normally to denote the beginning of the *next* function and key off of that. For instance, the next time it hits "public" or "private" or "static" or somesuch. Can you think of situations in which that might fail?
_________________________
Tony Fabris

#367846 - 14/11/2016 20:05 Re: Regex help! [Re: canuckInOR]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: canuckInOR
It's indented, isn't it?


Indeed. Our posts were crossing since we're posting to the thread at the same time. Sorry about that.
_________________________
Tony Fabris

#367847 - 14/11/2016 20:25 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
So Git is just PLAIN OLD STUPID with regard to this then?
Yeah, and a patch doesn't look trivial, either. From a cursory look at the source, it may be leaning on the xdiff library for this functionality.

#367848 - 14/11/2016 21:02 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
I'm not sure that I want to go to the extra trouble of trying to determine the line numbers ahead of time. For this particular application I'm OK with a compromise that gets me most of the way there.

Most of the code files it'll scan will be unit/integration tests files which will look something like this:

Code:
/// <summary>
/// Func description
/// </summary>
/// <param name = "param"> param descriptions... </param>
[Test]
[TestCase("test case params")]
[TestCase("test case params")]
[Some Decorator]
[Some other decorator]
public void MethodName(param param)
{
   (source code)
}

/// <summary>
/// Func description
/// </summary>
/// <param name = "param"> param descriptions... </param>
[Test]
[TestCase("test case params")]
[TestCase("test case params")]
[Some Decorator]
[Some other decorator]
public void MethodName(param param)
{
   (source code)
}


What I'd like to do is try a regex which will look for the first "MethodName", then go searching for the next instance of one of the keywords that usually precedes the next MethodName (I'm thinking something along the lines of public|private|protected|internal|sealed or something), then from that point rolling back to the last closing curly brace it found before hitting that keyword. I'm currently having trouble working out that exact syntax in such a way that I can get the git blame command to accept it, mostly because I'm not good at regex yet.
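The "roll back to the last closing curly brace" step is easy to do outside git, handing git plain line numbers afterwards. A Python sketch of that step, assuming the keyword list above marks the next declaration and that the method name first appears on its own declaration line; the test names in the sample are invented.

```python
import re

# Keyword list from the post: a line starting with one of these is treated as
# the start of the next declaration.
NEXT_DECL = re.compile(r"^\s*(?:public|private|protected|internal|sealed)\b")


def method_range(lines, method_name):
    """1-based (start, end): from the first line containing method_name to the
    last line starting with '}' before the next declaration keyword. If no
    keyword follows, the range runs to the end of the file."""
    start = next((i for i, text in enumerate(lines) if method_name in text), None)
    if start is None:
        return None
    nxt = next(
        (i for i in range(start + 1, len(lines)) if NEXT_DECL.match(lines[i])), None
    )
    if nxt is None:
        return start + 1, len(lines)
    # Walk back over doc comments and [attributes] to the nearest closing brace.
    for j in range(nxt - 1, start, -1):
        if lines[j].lstrip().startswith("}"):
            return start + 1, j + 1
    return start + 1, nxt  # no brace found: stop on the line before the keyword


SAMPLE = """/// <summary>
/// Func description
/// </summary>
[Test]
public void FirstTest(int x)
{
   if (x > 0) {
      Assert.Pass();
   }
}

/// <summary>
/// Func description
/// </summary>
[Test]
public void SecondTest(int x)
{
}""".splitlines()

print(method_range(SAMPLE, "FirstTest"))   # (5, 10)
print(method_range(SAMPLE, "SecondTest"))  # (16, 18)
```

The resulting pair can then go straight into `git blame -e -L 5,10 <file>`, which sidesteps both the regex limits and the last-function case.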
_________________________
Tony Fabris

#367849 - 14/11/2016 21:49 Re: Regex help! [Re: canuckInOR]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: canuckInOR
Originally Posted By: tfabris
So Git is just PLAIN OLD STUPID with regard to this then?
Yeah, and a patch doesn't look trivial, either. From a cursory look at the source, it may be leaning on the xdiff library for this functionality.

This is more interesting than my real work, so...

I was wrong about it leaning on xdiff -- at least for my test. But it is indeed a non-trivial patch.

The three functions in question are all in line-range.c: match_funcname, find_funcname_matching_regexp, and parse_range_funcname.

We start out in parse_range_funcname, and move through the lines of code until we find a regex match for the function name by calling find_funcname_matching_regexp. That finds the start of the line, and then calls match_funcname. All that does is check that the line starts with an alpha, underscore, or $ character. Then we start looking for the next line that starts with an alpha, underscore, or $ character -- the next line that we find that matches is the end of the range.

So not only does this code fail to work for indented code, it fails to work for non-indented code, as well. The problem is that git, since it doesn't know the language, can't distinguish between the function being called in the code, and the definition of the function.
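The default check described above is simple enough to mimic. This is a rough Python emulation written from that description, not from git's source: a line counts as a "funcname line" only if it starts with a letter, `_` or `$`, and the range runs to the line before the next such line (or to end of file). The sample lines are made up.

```python
import re

FUNCNAME_LINE = re.compile(r"[A-Za-z_$]")  # first character must be one of these


def funcname_range(lines, pattern):
    """Emulate '-L :<pattern>' as described: 1-based (start, end) or None."""
    regex = re.compile(pattern)
    start = next(
        (i for i, text in enumerate(lines)
         if FUNCNAME_LINE.match(text) and regex.search(text)),
        None,
    )
    if start is None:
        return None  # where git reports "no match"
    for j in range(start + 1, len(lines)):
        if FUNCNAME_LINE.match(lines[j]):
            return start + 1, j  # stop on the line before the next funcname line
    return start + 1, len(lines)


FLAT = ["int pauseExec(int value)", "{", "    return 0;", "}", "", "int main()", "{", "}"]
INDENTED = ["class fooImpl", "{", "    static bool limitMemoryFootprint()", "    {", "    }", "}"]

print(funcname_range(FLAT, "pauseExec"))                 # (1, 5)
print(funcname_range(INDENTED, "limitMemoryFootprint"))  # None
```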

Let's assume we implement a patch to skip leading indent, and match the indent level. What if the code looks like this:
Code:
class foo {
    int some_other_func() {
        // ...
        if (funcname()) {
            // ...
        }
        return 0;
    }

    int funcname() {
        // ...
    }
};


So now we say "aha, I found funcname, and it has a leading indent of 8 spaces," and the resulting range will give you:
Code:
        if (funcname()) {
            // ...
        }


Clearly not what we want, either. So git is using the notion that a function signature must start at the beginning of the line, or it's not the function but a function call. And I can't think of any other reliable alternative.

So... ultimately, if you want something better than git's rudimentary assumption, you need to actually parse the language itself -- and that's clearly not git's job.

#367850 - 14/11/2016 21:49 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Quote:
Clearly not what we want, either. So git is using the notion that a function signature must start at the beginning of the line, or it's not the function but a function call. And I can't think of any other reliable alternative. So... ultimately, if you want something better than git's rudimentary assumption, you need to actually parse the language itself -- and that's clearly not git's job.


I see the inherent problem there. So, agreed, doing -L<functionname> is out of the running.


Quote:
git-blame has no equivalent, and does not use PCRE.


If it's not using PCRE, then what *is* it using?

I can't get any of the online regex testers to work using commands that do work in git blame. For instance, if I go here: https://regex101.com/

When I try to enter this regex (which *WORKS* in git blame though it halts at the first curlybrace it finds):
Code:
/^MyFunctionNameToLocate/,/^}/


All the online regex testers reject that syntax. When I try all the other options at that site:
- PCRE
- Javascript
- Python
- Golang
... none of them work with that syntax.

Whereas if I enter a syntax which works in those online testers, then those syntaxes are rejected by (or fail to find the correct results in) the GIT BLAME command.
_________________________
Tony Fabris

#367851 - 14/11/2016 22:04 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
I'm not sure that I want to go to the extra trouble of trying to determine the line numbers ahead of time. For this particular application I'm OK with a compromise that gets me most of the way there.

Most of the code files it'll scan will be unit/integration tests files which will look something like this:

Code:
/// <summary>
/// Func description
/// </summary>
/// <param name = "param"> param descriptions... </param>
[Test]
[TestCase("test case params")]
[TestCase("test case params")]
[Some Decorator]
[Some other decorator]
public void MethodName(param param)
{
   (source code)
}

/// <summary>
/// Func description
/// </summary>
/// <param name = "param"> param descriptions... </param>
[Test]
[TestCase("test case params")]
[TestCase("test case params")]
[Some Decorator]
[Some other decorator]
public void MethodName(param param)
{
   (source code)
}


What I'd like to do is try a regex which will look for the first "MethodName", then go searching for the next instance of one of the keywords that usually precedes the next MethodName (I'm thinking something along the lines of public|private|protected|internal|sealed or something), then from that point rolling back to the last closing curly brace it found before hitting that keyword. I'm currently having trouble working out that exact syntax in such a way that I can get the git blame command to accept it, mostly because I'm not good at regex yet.

This is fairly well formed. Maybe something like:
Code:
git blame -L '/methodName/,/<summary>/' file.cc

or
Code:
git blame -L '/methodName/,/\[Test]/' file.cc # backwhack or you get a character class...

or
Code:
git blame -L '/methodName/,/\(public\|private\|protected\|internal\|sealed\)/' file.cc # backwhack or you get literal (, |, ) characters

#367852 - 14/11/2016 22:09 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
Quote:
git-blame has no equivalent, and does not use PCRE.


If it's not using PCRE, then what *is* it using?

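POSIX basic regex (git compiles the -L patterns with regcomp() using REG_NEWLINE but not REG_EXTENDED, which is why the grouping and alternation in my earlier example needed backslashes).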

#367853 - 14/11/2016 22:12 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris

When I try to enter this regex (which *WORKS* in git blame though it halts at the first curlybrace it finds):
Code:
/^MyFunctionNameToLocate/,/^}/

Oh, and that's not one regex -- that's two different regex, separated by a comma:
Code:
/^MyFunctionNameToLocate/
and
Code:
/^}/
.

#367854 - 15/11/2016 00:16 Re: Regex help! [Re: canuckInOR]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Thanks so much for your excellent examples! You clearly understand regex deeply, and are therefore my new best friend.

Originally Posted By: canuckInOR
This is fairly well formed. Maybe something like:


I forgot to say that I can't guarantee that format. For instance, some of the functions might not be [Test] functions (they might be helper functions), or they might not have the <summary>. So your first two examples won't work.

Your third example is the exact thing I wanted to try. I had tried some variants of it before, but I couldn't get the syntax correct. Your syntax was correct except for one trick:
Code:
'private\' is not recognized as an internal or external command, operable program or batch file.

Simple enough fix: Surrounding your query in double quotes instead of single quotes worked as you intended (double quotes are required at the DOS command line to make it ignore the | symbol as a pipe-to-another-command syntax).

So when I did that, it worked! It showed the entire function. But it has another issue to solve:

In addition to the function I want to scan, it also returns the introductory lines of the next function (the HTML comments and the test decorators). I want it to *not* include those lines, I want it to stop at the closing curlybrace just BEFORE those lines. I don't know how to do the regex syntax to pull off that trick. Though I think it should theoretically be doable because I don't think it requires counting and keeping track of curly brace counts. Do you know how to do that?

Something else about your example: the function I'm scanning might, in some cases, be the very last function in the file. In that case, your example gets the error message:
Code:
fatal: -L parameter '\(public\|private\|protected\|internal\|sealed\)' starting at line 638: No match

What is the syntax to work around that so that it gets me to the end of the file even if it can't find the (public|private|etc.)?

I also thought of a way to make it more robust. The functions I'm scanning should always be an Nunit test, and by definition, nunit tests must be declared "public void", so I could include that before the name of the function in the first part of the regex so that I can be certain I'm finding the actual test function.
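One way to sidestep cmd.exe quoting altogether, assuming the lookup ends up scripted anyway: pass git's arguments as a list with no shell involved, so `|` (a pipe to cmd.exe) and `^` (cmd.exe's escape character) reach git untouched. A Python sketch; the start pattern uses the "public void" prefix suggested above, and the method and file names are placeholders.

```python
import subprocess

KEYWORDS = r"\(public\|private\|protected\|internal\|sealed\)"  # alternation as above


def blame_argv(path, method_name, end_regex=KEYWORDS):
    """argv for: git blame -e -L /public void <name>/,/<end_regex>/ <path>.
    Assumes neither pattern contains an unescaped '/', which would end it early."""
    spec = f"/public void {method_name}/,/{end_regex}/"
    return ["git", "blame", "-e", "-L", spec, path]


def run_blame(argv):
    # No shell=True: the argument list goes to the git process directly, so no
    # shell gets a chance to reinterpret the pipes or carets.
    # check=True raises CalledProcessError if git exits non-zero (e.g. "No match").
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


print(blame_argv("filename.cs", "MethodName")[4])
# /public void MethodName/,/\(public\|private\|protected\|internal\|sealed\)/
```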
_________________________
Tony Fabris

#367855 - 15/11/2016 00:23 Re: Regex help! [Re: peter]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: peter
might be to try and end the blame by matching the start of the next function instead. Is there a mandatory doc-comment or something you could look for?


I also want to acknowledge, giving credit where due, that Peter was the first to mention this idea in this discussion thread. At that point I had already seen examples on StackOverflow pointing out the idea of looking for things like (Private|Public|etc.) to find the start of the next function, but their syntax wasn't working for me, probably because they were all written in PCRE or something.
_________________________
Tony Fabris

#367856 - 15/11/2016 12:26 Re: Regex help! [Re: tfabris]
Roger
carpal tunnel

Registered: 18/01/2000
Posts: 5685
Loc: London, UK
Originally Posted By: tfabris
I want GIT BLAME on every line of that function, giving me the email address of every person who worked on every line of that function, from its declaration at the top, down through every line of the function down to its closing curlybrace.


Taking a step back: why? You're looking for a technical solution to a social problem. There's a reason why cvs annotate became svn (and then git) blame...

On the other hand, you might be interested in Your code as a crime scene.
_________________________
-- roger

#367860 - 15/11/2016 17:53 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
Thanks so much for your excellent examples! You clearly understand regex deeply, and are therefore my new best friend.

Heh. I lurked on comp.lang.perl.misc for a decade.

Quote:
But it has another issue to solve:

In addition to the function I want to scan, it also returns the introductory lines of the next function (the HTML comments and the test decorators). I want it to *not* include those lines, I want it to stop at the closing curlybrace just BEFORE those lines. I don't know how to do the regex syntax to pull off that trick. [...] Do you know how to do that?

I'm not certain this is possible with git/POSIX expressions. What you need is to say "match } followed by a newline followed by anything (including newlines) that's not a }, followed by something in this set of keywords." That will get you the last } character immediately preceding one of your special keywords.

In PCRE, that would be:
Code:
/}\n[^}]*?(?:keyword|list)/ms

The problem is that git compiles the regex with the REG_NEWLINE flag, so '.' and other match-any-character operators (such as [^}]) will not match a newline. Making matters worse, POSIX regex (even the extended regex) does not define any escape characters, so you can't even explicitly say "match a newline" aside from using the ^ and $ operators (which technically don't match newlines, but match the empty boundary immediately following or preceding the newline). The net effect is that you can't do any matches that cross line boundaries.
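By contrast, an engine that lets `[^}]` cross newlines can run that pattern as written. A Python sketch (Python's `re`, not git) using the keyword list from earlier in the thread on an invented two-method sample; it reports the line of the closing brace, or None when no keyword follows, which is the last-function case again.

```python
import re

# The PCRE-style pattern above. In Python's re a negated class like [^}] also
# matches '\n', so the lazy run can cover doc comments and attributes.
LAST_BRACE = re.compile(r"\}\n[^}]*?(?:public|private|protected|internal|sealed)")

SOURCE = """public void FirstTest(int x)
{
   if (x > 0) {
      Assert.Pass();
   }
}

/// <summary>
[Test]
public void SecondTest(int x)
{
}"""


def closing_brace_line(text, start_line):
    """1-based line of the '}' found by LAST_BRACE at or after start_line, or None."""
    offset = 0
    for _ in range(start_line - 1):
        offset = text.index("\n", offset) + 1  # advance to the start of the next line
    m = LAST_BRACE.search(text, offset)
    return None if m is None else text.count("\n", 0, m.start()) + 1


print(closing_brace_line(SOURCE, 1))   # 6
print(closing_brace_line(SOURCE, 10))  # None
```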

Quote:
Something else about your example: the function I'm scanning might, in some cases, be the very last function in the file. In that case, your example gets the error message:
Code:
fatal: -L parameter '\(public\|private\|protected\|internal\|sealed\)' starting at line 638: No match

What is the syntax to work around that so that it gets me to the end of the file even if it can't find the (public|private|etc.)?

In this case, you have to supply two -L arguments. The first is the one you just made, the other is:
Code:
-L '/MethodName/,'

So you're not specifying an end to the range.
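If combining two `-L` options in one command turns out to be all-or-nothing, a script can instead try the keyword-terminated range first and fall back to the open-ended `/MethodName/,` form. A Python sketch; `runner` is a hypothetical injection point so the fallback logic can be exercised without a real repository.

```python
import subprocess


def blame_to_keyword_or_eof(path, start_regex, end_regex, runner=subprocess.run):
    """Try -L /start/,/end/ first; if git exits non-zero (for example because the
    end pattern has no match), retry with -L /start/, which runs to end of file."""
    last_error = ""
    for spec in (f"/{start_regex}/,/{end_regex}/", f"/{start_regex}/,"):
        result = runner(
            ["git", "blame", "-e", "-L", spec, path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout
        last_error = result.stderr
    raise RuntimeError(last_error)
```

Each attempt is a separate git invocation, so a missing end match costs one extra run rather than failing the whole lookup.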

Quote:
I also thought of a way to make it more robust. The functions I'm scanning should always be an Nunit test, and by definition, nunit tests must be declared "public void", so I could include that before the name of the function in the first part of the regex so that I can be certain I'm finding the actual test function.

That's a good addition.

#367861 - 15/11/2016 18:21 Re: Regex help! [Re: Roger]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: Roger
Taking a step back: why? You're looking for a technical solution to a social problem.


Good question. I'm not trying to solve a problem here, I'm merely trying to automate an existing step in the normal day to day business.

We have thousands of unit tests and also thousands of integration tests, written by dozens of different devs. If a unit test fails, no problem, we instantly know which change caused the failure, and the build system automatically emails them.

Integration test failures are trickier: I already have the changelist data which preceded the failure, but sometimes the changes in that changelist are totally unrelated to the integration test failure. When that happens, my next step is to look up who the test author(s) are, so I can open up a discussion about the test failure. The test author is the most likely person to be able to locate the source of the issue the fastest, and would also be the most logical person to make any corrections. In some cases, the integration tests are intermittently flaky, and in that situation I would assign a task to the test's main author to fix it so that it's not flaky any more.

I'm just trying to automate the step of looking up who the test author(s) are. It normally takes me a few minutes each time. According to the chart, I can waste about six days trying to automate this.
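For the author lookup itself, `git blame --line-porcelain` prints a full header for every line, including an `author-mail <...>` field, which is easier to tally than the human-readable `-e` output. A Python sketch that counts blamed lines per e-mail address; the sample is a trimmed, invented excerpt (real output carries more header fields and full commit hashes).

```python
from collections import Counter


def lines_per_author(porcelain: str) -> Counter:
    """Count blamed lines per author e-mail in 'git blame --line-porcelain' output.
    Source lines in that format start with a TAB, so code that happens to contain
    'author-mail ' is not miscounted."""
    counts = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author-mail "):
            counts[line[len("author-mail "):].strip("<>")] += 1
    return counts


SAMPLE = "\n".join([
    "8a714b87 37 37 2",
    "author canuck",
    "author-mail <canuck@example.com>",
    "\tint pauseExec(int value)",
    "8a714b87 38 38",
    "author canuck",
    "author-mail <canuck@example.com>",
    "\t{",
    "0123abcd 39 39 1",
    "author tony",
    "author-mail <tony@example.com>",
    "\t    if (pauseAtExit) {",
])

print(lines_per_author(SAMPLE).most_common())
# [('canuck@example.com', 2), ('tony@example.com', 1)]
```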
_________________________
Tony Fabris

#367862 - 15/11/2016 22:34 Re: Regex help! [Re: canuckInOR]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
Originally Posted By: canuckInOR
In this case, you have to supply two -L arguments. The first is the one you just made, the other is:


I'm assuming you mean that a single "Git Blame" command will contain two -L parameters.... That didn't seem to work when I tried it, alas. Still got a "no match", as if it only took the first of the two parameters.

If I do it by itself, it works, but, that's two separate blame commands. I'm hoping to do this with a single command.

Isn't there a regex marker for "end of file" so that I can search for (keyword|keyword|keyword|eof)?
_________________________
Tony Fabris

#367863 - 15/11/2016 22:50 Re: Regex help! [Re: tfabris]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31607
Loc: Seattle, WA
I've been trying \z but it's not working.

When I put it in the selection list as just "\z" it fails because it just stops at the very first letter z in the file. If I put it in as "\\z" it gets "no match found".
_________________________
Tony Fabris

#367864 - 15/11/2016 23:26 Re: Regex help! [Re: tfabris]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
Originally Posted By: tfabris
Originally Posted By: canuckInOR
In this case, you have to supply two -L arguments. The first is the one you just made, the other is:


I'm assuming you mean that a single "Git Blame" command will contain two -L parameters.... That didn't seem to work when I tried it, alas. Still got a "no match", as if it only took the first of the two parameters.

Huh... so it does. That seems counter to the documentation, which specifies -L can be used multiple times. It seems to be an all-or-nothing approach. Either both match, or the whole thing fails.

Quote:
Isn't there a regex marker for "end of file" so that I can search for (keyword|keyword|keyword|eof)?

In other regex languages there are, but not in POSIX.
