1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-08 01:36:08 +02:00
Commit Graph

119 Commits

Author SHA1 Message Date
Ed Maste d1b1384d61 userdiff: remove empty subexpression from elixir regex
The regex failed to compile on FreeBSD.

Also add /* -- */ mark to separate the two regex entries given to
the PATTERNS() macro, to make it consistent with patterns for other
content types.

Signed-off-by: Ed Maste <emaste@FreeBSD.org>
Reviewed-by: Jeff King <peff@peff.net>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13 12:20:48 -08:00
Junio C Hamano 9502b616f1 Merge branch 'jh/userdiff-python-async'
The userdiff machinery has been taught that "async def" is another
way to begin a "function" in Python.

* jh/userdiff-python-async:
  userdiff: support Python async functions
2019-12-05 12:52:44 -08:00
Josh Holland 077a1fda82 userdiff: support Python async functions
Python's async functions (declared with "async def" rather than "def")
were not being displayed in hunk headers. This commit teaches git about
the async function syntax, and adds tests for the Python userdiff regex.

Signed-off-by: Josh Holland <anowlcalledjosh@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-20 16:31:43 +09:00
Łukasz Niemier a807200f67 userdiff: add Elixir to supported userdiff languages
Adds support for xfuncref in Elixir[1] language which is Ruby-like
language that runs on Erlang[3] Virtual Machine (BEAM).

[1]: https://elixir-lang.org
[2]: https://www.erlang.org

Signed-off-by: Łukasz Niemier <lukasz@niemier.pl>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10 15:26:26 +09:00
Stephen Boyd 8da56a4848 userdiff: fix some corner cases in dts regex
While reviewing some dts diffs recently I noticed that the hunk header
logic was failing to find the containing node. This is because the regex
doesn't consider properties that may span multiple lines, i.e.

	property = <something>,
		   <something_else>;

and it got hung up on comments inside nodes that look like the root node
because they start with '/*'. Add tests for these cases and update the
regex to find them. Maybe detecting the root node is too complicated but
forcing it to be a backslash with any amount of whitespace up to an open
bracket seemed OK. I tried to detect that a comment is in-between the
two parts but I wasn't happy so I just dropped it.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-21 17:44:12 +09:00
Stephen Boyd 3c81760bc6 userdiff: add a builtin pattern for dts files
The Linux kernel receives many patches to the devicetree files each
release. The hunk header for those patches typically show nothing,
making it difficult to figure out what node is being modified without
applying the patch or opening the file and seeking to the context. Let's
add a builtin 'dts' pattern to git so that users can get better diff
output on dts files when they use the diff=dts driver.

The regex has been constructed based on the spec at devicetree.org[1]
and with some help from Johannes Sixt.

[1] https://github.com/devicetree-org/devicetree-specification/releases/latest

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-21 15:09:34 -07:00
Junio C Hamano a41dad4330 Merge branch 'ml/userdiff-rust'
The pattern "git diff/grep" use to extract funcname and words
boundary for Rust has been added.

* ml/userdiff-rust:
  userdiff: two simplifications of patterns for rust
  userdiff: add built-in pattern for rust
2019-06-21 11:24:08 -07:00
Johannes Sixt 33be7b38c2 userdiff: two simplifications of patterns for rust
- Do not enforce (but assume) syntactic correctness of language
  constructs that go into hunk headers: we only want to ensure that
  the keywords actually are words and not just the initial part of
  some identifier.

- In the word regex, match numbers only when they begin with a digit,
  but then be liberal in what follows, assuming that the text that is
  matched is syntactially correct.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-30 09:53:26 -07:00
Boxuan Li 2731a7840a userdiff: fix grammar and style issues
Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-29 09:50:31 -07:00
Boxuan Li 91bf382fcf userdiff: add Octave
Octave pattern is almost the same as matlab, except
that '%%%' and '##' can also be used to begin code sections,
in addition to '%%' that is understood by both. Octave
pattern is merged into Matlab pattern. Test cases for
the hunk header patterns of matlab and octave under
t/t4018 are added.

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19 10:45:28 +09:00
Marc-André Lureau d74e78602e userdiff: add built-in pattern for rust
This adds xfuncname and word_regex patterns for Rust, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.

The word_regex pattern finds identifiers, integers, floats and
operators, according to the Rust Reference Book.

Cc: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-17 12:54:02 +09:00
Junio C Hamano 3434569fc2 Merge branch 'nd/style-opening-brace'
Code clean-up.

* nd/style-opening-brace:
  style: the opening '{' of a function is in a separate line
2019-01-18 13:49:52 -08:00
Nguyễn Thái Ngọc Duy 3b3357626e style: the opening '{' of a function is in a separate line
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-10 15:41:09 +09:00
Nguyễn Thái Ngọc Duy bd7ad45b64 notes-cache.c: remove the_repository references
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:06 +09:00
Junio C Hamano 11877b9ebe Merge branch 'nd/the-index'
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".

* nd/the-index: (23 commits)
  revision.c: reduce implicit dependency the_repository
  revision.c: remove implicit dependency on the_index
  ws.c: remove implicit dependency on the_index
  tree-diff.c: remove implicit dependency on the_index
  submodule.c: remove implicit dependency on the_index
  line-range.c: remove implicit dependency on the_index
  userdiff.c: remove implicit dependency on the_index
  rerere.c: remove implicit dependency on the_index
  sha1-file.c: remove implicit dependency on the_index
  patch-ids.c: remove implicit dependency on the_index
  merge.c: remove implicit dependency on the_index
  merge-blobs.c: remove implicit dependency on the_index
  ll-merge.c: remove implicit dependency on the_index
  diff-lib.c: remove implicit dependency on the_index
  read-cache.c: remove implicit dependency on the_index
  diff.c: remove implicit dependency on the_index
  grep.c: remove implicit dependency on the_index
  diff.c: remove the_index dependency in textconv() functions
  blame.c: rename "repo" argument to "r"
  combine-diff.c: remove implicit dependency on the_index
  ...
2018-10-19 13:34:02 +09:00
Nguyễn Thái Ngọc Duy acd00ea049 userdiff.c: remove implicit dependency on the_index
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:50:58 -07:00
Torsten Bögershausen d64324cb60 Make git_check_attr() a void function
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:15:34 -07:00
Nguyễn Thái Ngọc Duy 7a400a2c02 attr: remove an implicit dependency on the_index
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.

There is one ugly temporary workaround added in attr.c that needs some
more explanation.

Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.

The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Kana Natsuno 1ab631647e userdiff: support new keywords in PHP hunk header
Recent version of PHP supports interface, trait, abstract class and
final class.  This patch fixes the PHP hunk header regexp to support
all of these keywords.

Signed-off-by: Kana Natsuno <dev@whileimautomaton.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 14:59:28 -07:00
Junio C Hamano e215e89791 Merge branch 'tl/userdiff-csharp-async'
Update funcname pattern used for C# to recognize "async" keyword.

* tl/userdiff-csharp-async:
  userdiff.c: add C# async keyword in diff pattern
2018-03-15 15:00:47 -07:00
Thomas Levesque a12cec99f8 userdiff.c: add C# async keyword in diff pattern
Currently C# async methods are not shown in diff hunk headers. I just
added the async keyword to the csharp method pattern so that they are
properly detected.

Signed-off-by: Thomas Levesque <thomas.levesque@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-08 11:03:32 -08:00
Alban Gruin 1dbf0c0ad6 userdiff: add built-in pattern for golang
This adds xfuncname and word_regex patterns for golang, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.

The xfuncname regex finds functions, structs and interfaces.  Although
the Go language prohibits the opening brace from being on its own
line, the regex does not makes it mandatory, to be able to match
`func` statements like this:

    func foo(bar int,
    	baz int) {
    }

This is covered by the test case t4018/golang-long-func.

The word_regex pattern finds identifiers, integers, floats, complex
numbers and operators, according to the go specification.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-01 13:36:49 -08:00
Junio C Hamano 376a1da839 Merge branch 'ik/userdiff-html-h-element-fix'
The built-in pattern to detect the "function header" for HTML did
not match <H1>..<H6> elements without any attributes, which has
been fixed.

* ik/userdiff-html-h-element-fix:
  userdiff: fix HTML hunk header regexp
2017-09-28 14:47:54 +09:00
Ilya Kantor 9c03caca2c userdiff: fix HTML hunk header regexp
Current HTML header regexp doesn't match headers without attributes.

So it fails to match <h1>...</h1>, while <h1 class="smth">...</h1> matches.

Make attributes optional to fix this.  The regexp is still far from
perfect, but now it at least handles the common case.

Signed-off-by: Ilya Kantor <iliakan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-24 10:13:28 +09:00
Rene Scharfe 460c7eb2bf userdiff: release strbuf after use in userdiff_get_textconv()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:28 +09:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano 2aef63d31c attr: convert git_check_attrs() callers to use the new API
The remaining callers are all simple "I have N attributes I am
interested in.  I'll ask about them with various paths one by one".

After this step, no caller to git_check_attrs() remains.  After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:52 -08:00
Junio C Hamano 7bd18054d2 attr: rename function and struct related to checking attributes
The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.

In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:52 -08:00
William Duclot 0719f3eecd userdiff: add built-in pattern for CSS
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
    1. reject lines ending with a colon/semicolon (properties)
    2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-03 14:45:56 -07:00
Zoë Blade 69f9c87d46 userdiff: add support for Fountain documents
Add support for Fountain, a plain text screenplay format.  Git
facilitates not just programming specifically, but creative writing
in general, so it makes sense to also support other plain text
documents besides source code.

In the structure of a screenplay specifically, scenes are roughly
analogous to functions, in the sense that it makes your job easier
if you can see which ones were changed in a given range of patches.

More information about the Fountain format can be found on its
official website, at http://fountain.io .

Signed-off-by: Zoë Blade <zoe@bytenoise.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-23 14:44:51 -07:00
Дилян Палаузов 5d308512ff do not include the same header twice
A few files include the same header file directly more than once.

As all these headers protect themselves against repeated inclusion
by the "#ifndef FOO_H / #define FOO_H / ... / #endif" idiom, leave
only the first inclusion and remove the later inclusion as a no-op
clean-up.

Signed-off-by: Дилян Палаузов <git-dpa@aegee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-13 13:16:12 -08:00
Johannes Sixt 8a2e8da367 userdiff: have 'cpp' hunk header pattern catch more C++ anchor points
The hunk header pattern 'cpp' is intended for C and C++ source code, but
it is actually not particularly useful for the latter, and even misses
some use-cases for the former.

The parts of the pattern have the following flaws:

- The first part matches an identifier followed immediately by a colon
  and arbitrary text and is intended to reject goto labels and C++
  access specifiers (public, private, protected). But this pattern also
  rejects C++ constructs, which look like this:

    MyClass::MyClass()
    MyClass::~MyClass()
    MyClass::Item MyClass::Find(...

- The second part matches an identifier followed by a list of qualified
  names (i.e. identifiers separated by the C++ scope operator '::')
  separated by space or '*' followed by an opening parenthesis (with
  space between the tokens). It matches function declarations like

    struct item* get_head(...
    int Outer::Inner::Func(...

  Since the pattern requires at least two identifiers, GNU-style
  function definitions are ignored:

    void
    func(...

  Moreover, since the pattern does not allow punctuation other than '*',
  the following C++ constructs are not recognized:

  . template definitions:
      template<class T> int func(T arg)

  . functions returning references:
      const string& get_message()

  . functions returning templated types:
      vector<int> foo()

  . operator definitions:
      Value operator+(Value l, Value r)

- The third part of the pattern finally matches compound definitions.
  But it forgets about unions and namespaces, and also skips single-line
  definitions

    struct random_iterator_tag {};

  because no semicolon can occur on the line.

Change the first pattern to require a colon at the end of the line
(except for trailing space and comments), so that it does not reject
constructor or destructor definitions.

Notice that all interesting anchor points begin with an identifier or
keyword. But since there is a large variety of syntactical constructs
after the first "word", the simplest is to require only this word and
accept everything else. Therefore, this boils down to a line that begins
with a letter or underscore (optionally preceded by the C++ scope
operator '::' to accept functions returning a type anchored at the
global namespace). Replace the second and third part by a single pattern
that picks such a line.

This has the following desirable consequence:

- All constructs mentioned above are recognized.

and the following likely desirable consequences:

- Definitions of global variables and typedefs are recognized:

    int num_entries = 0;
    extern const char* help_text;
    typedef basic_string<wchar_t> wstring;

- Commonly used marco-ized boilerplate code is recognized:

    BEGIN_MESSAGE_MAP(CCanvas,CWnd)
    Q_DECLARE_METATYPE(MyStruct)
    PATTERNS("tex",...)

  (The last one is from this very patch.)

but also the following possibly undesirable consequence:

- When a label is not on a line by itself (except for a comment) it is
  no longer rejected, but can appear as a hunk header if it occurs at
  the beginning of a line:

    next:;

IMO, the benefits of the change outweigh the (possible) regressions by a
large margin.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-21 15:03:32 -07:00
Johannes Sixt abf8f98602 userdiff: support unsigned and long long suffixes of integer constants
Do not split constants such as 123U, 456ll, 789UL at the first U or
second L.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-21 14:48:07 -07:00
Johannes Sixt 407e07f2a6 userdiff: support C++ ->* and .* operators in the word regexp
The character sequences ->* and .* are valid C++ operators. Keep them
together in --word-diff mode.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-21 14:47:50 -07:00
Adrian Johnson 39a87a29ce userdiff: update Ada patterns
- Allow extra space in "is new" and "is separate"
- Fix bug in word regex for numbers

Signed-off-by: Adrian Johnson <ajohnson@redneon.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-05 10:45:51 -08:00
Jeff King 0a5987fe5e userdiff: drop parse_driver function
When we parse userdiff config, we generally assume that

  diff.name.key

will affect the "key" value of the "name" driver. However,
without confirming that the key is a valid userdiff key, we
may accidentally conflict with the ancient "diff.color.*"
namespace. The current code is careful not to even create a
driver struct if we do not see a key that is known by the
diff-driver code.

However, this carefulness is unnecessary; the default driver
with no keys set behaves exactly the same as having no
driver at all. We can simply set up the driver struct as
soon as we see we have a config key that looks like a
driver. This makes the code a bit more readable.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23 08:41:51 -08:00
Jeff King d731f0ade1 convert some config callbacks to parse_config_key
These callers can drop some inline pointer arithmetic and
magic offset constants, making them more readable and less
error-prone (those constants had to match the lengths of
strings, but there is no automatic verification of that
fact).

The "ep" pointer (presumably for "end pointer"), which
points to the final key segment of the config variable, is
given the more standard name "key" to describe its function
rather than its derivation.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23 08:41:50 -08:00
Adrian Johnson e90d065e64 Add userdiff patterns for Ada
Add Ada xfuncname and wordRegex patterns to the list of builtin
patterns.

Signed-off-by: Adrian Johnson <ajohnson@redneon.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-16 21:54:47 -07:00
Jeff King 6680a0874f drop odd return value semantics from userdiff_config
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.

This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.

Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.

We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.

There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 10:44:54 -08:00
Junio C Hamano 357ba5cf0d Merge branch 'tr/userdiff-c-returns-pointer'
* tr/userdiff-c-returns-pointer:
  userdiff: allow * between cpp funcname words
2011-12-13 22:57:19 -08:00
Thomas Rast 37e7793d47 userdiff: allow * between cpp funcname words
The cpp pattern, used for C and C++, would not match the start of a
declaration such as

  static char *prepare_index(int argc,

because it did not allow for * anywhere between the various words that
constitute the modifiers, type and function name.  Fix it.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-06 13:16:37 -08:00
Gustaf Hendeby 53b10a1405 Add built-in diff patterns for MATLAB code
MATLAB is often used in industry and academia for scientific
computations motivating it being included as a built-in pattern.

Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15 16:11:52 -08:00
Michael Haggerty d932f4eb9f Rename git_checkattr() to git_check_attr()
Suggested by: Junio Hamano <gitster@pobox.com>

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-04 15:53:21 -07:00
Junio C Hamano dbae1a1336 Merge branch 'jk/combine-diff-binary-etc'
* jk/combine-diff-binary-etc:
  combine-diff: respect textconv attributes
  refactor get_textconv to not require diff_filespec
  combine-diff: handle binary files as binary
  combine-diff: calculate mode_differs earlier
  combine-diff: split header printing into its own function
2011-06-29 17:03:10 -07:00
Jeff King 3813e69031 refactor get_textconv to not require diff_filespec
This function actually does two things:

  1. Load the userdiff driver for the filespec.

  2. Decide whether the driver has a textconv component, and
     initialize the textconv cache if applicable.

Only part (1) requires the filespec object, and some callers
may not have a filespec at all. So let's split them it into
two functions, and put part (2) with the userdiff code,
which is a better fit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-23 15:46:02 -07:00
Jonathan Nieder f143d9c695 userdiff/perl: tighten BEGIN/END block pattern to reject here-doc delimiters
A naive method of treating BEGIN/END blocks with a brace on the second
line as diff/grep funcname context involves also matching unrelated
lines that consist of all-caps letters:

	sub foo {
		print <<'EOF'
	text goes here
	...
	EOF
		... rest of foo ...
	}

That's not so great, because it means that "git diff" and "git grep
--show-function" would write "=EOF" or "@@ EOF" as context instead of
a more useful reminder like "@@ sub foo {".

To avoid this, tighten the pattern to only match the special block
names that perl accepts (namely BEGIN, END, INIT, CHECK, UNITCHECK,
AUTOLOAD, and DESTROY).  The list is taken from perl's toke.c.

Suggested-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-23 11:39:13 -07:00
Jonathan Nieder ea2ca4497b userdiff/perl: catch sub with brace on second line
Accept

	sub foo
	{
	}

as an alternative to a more common style that introduces perl
functions with a brace on the first line (and likewise for BEGIN/END
blocks).  The new regex is a little hairy to avoid matching

	# forward declaration
	sub foo;

while continuing to match "sub foo($;@) {" and

	sub foo { # This routine is interesting;
		# in fact, the lines below explain how...

While at it, pay attention to Perl 5.14's "package foo {" syntax as an
alternative to the traditional "package foo;".

Requested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-21 22:29:32 -07:00
Jonathan Nieder 12f0967a8a userdiff/perl: match full line of POD headers
The builtin perl userdiff driver is not greedy enough about catching
POD header lines.  Capture the whole line, so instead of just
declaring that we are in some "@@ =head1" section, diff/grep output
can explain that the enclosing section is about "@@ =head1 OPTIONS".

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-21 22:29:32 -07:00
Jonathan Nieder f12c66b9bb userdiff/perl: anchor "sub" and "package" patterns on the left
The userdiff funcname mechanism has no concept of nested scopes ---
instead, "git diff" and "git grep --show-function" simply label the
diff header with the most recent matching line.  Unfortunately that
means text following a subroutine in a POD section:

	=head1 DESCRIPTION

	You might use this facility like so:

		sub example {
			foo;
		}

	Now, having said that, let's say more about the facility.
	Blah blah blah ... etc etc.

gets the subroutine name instead of the POD header in its diff/grep
funcname header, making it harder to get oriented when reading a
diff without enough context.

The fix is simple: anchor the funcname syntax to the left margin so
nested subroutines and packages like this won't get picked up.  (The
builtin C++ funcname pattern already does the same thing.)  This means
the userdiff driver will misparse the idiom

	{
		my $static;
		sub foo {
			... use $static ...
		}
	}

but I think that's worth it; we can revisit this later if the userdiff
mechanism learns to keep track of the beginning and end of nested
scopes.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-21 22:29:31 -07:00
Junio C Hamano 1bb4abeff7 Merge branch 'tr/diff-words-test'
* tr/diff-words-test:
  t4034 (diff --word-diff): add a minimum Perl drier test vector
  t4034 (diff --word-diff): style suggestions
  userdiff: simplify word-diff safeguard
  t4034: bulk verify builtin word regex sanity
2011-02-09 16:41:17 -08:00
Junio C Hamano 342953a686 Merge branch 'as/userdiff-pascal'
* as/userdiff-pascal:
  userdiff: match Pascal class methods
2011-01-24 10:54:12 -08:00
Jonathan Nieder 664d44ee7f userdiff: simplify word-diff safeguard
git's diff-words support has a detail that can be a little dangerous:
any text not matched by a given language's tokenization pattern is
treated as whitespace and changes in such text would go unnoticed.
Therefore each of the built-in regexes allows a special token type
consisting of a single non-whitespace character [^[:space:]].

To make sure UTF-8 sequences remain human readable, the builtin
regexes also have a special token type for runs of bytes with the high
bit set.  In English, non-ASCII characters are usually isolated so
this is analogous to the [^[:space:]] pattern, except it matches a
single _multibyte_ character despite use of the C locale.

Unfortunately it is easy to make typos or forget entirely to include
these catch-all token types when adding support for new languages (see
v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes,
2010-12-18).  Avoid this by including them automatically within the
PATTERNS and IPATTERN macros.

While at it, change the UTF-8 sequence token type to match exactly one
non-ASCII multi-byte character, rather than an arbitrary run of them.

Suggested-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 08:59:39 -08:00
Alexey Shumkin ad5b6942d5 userdiff: match Pascal class methods
Class declarations were already covered by the second pattern, but class
methods have the 'class' keyword in front too. Account for it.

Signed-off-by: Alexey Shumkin <zapped@mail.ru>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-11 11:03:48 -08:00
Junio C Hamano a25e47377d userdiff/perl: catch BEGIN/END/... and POD as headers
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-27 09:19:38 -08:00
Jonathan Nieder 71a5d4bc0e diff: funcname and word patterns for perl
The default function name discovery already works quite well for Perl
code... with the exception of here-documents (or rather their ending).

 sub foo {
	print <<END
 here-document
 END
	return 1;
 }

The default funcname pattern treats the unindented END line as a
function declaration and puts it in the @@ line of diff and "grep
--show-function" output.

With a little knowledge of perl syntax, we can do better.  You can
try it out by adding "*.perl diff=perl" to the gitattributes file.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-27 08:47:21 -08:00
Thomas Rast b34f69f991 userdiff: fix typo in ruby and python word regexes
Both had an unclosed ] that ruined the safeguard against not matching
a non-space char.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-18 22:58:40 -08:00
Brandon Casey 909a5494f8 userdiff.c: add builtin fortran regex patterns
This adds fortran xfuncname and wordRegex patterns to the list of builtin
patterns.  The intention is for the patterns to be appropriate for all
versions of fortran including 77, 90, 95.  The patterns can be enabled by
adding the diff=fortran attribute to the .gitattributes file for the
desired file glob.

This also adds a new macro named IPATTERN which is just like the PATTERNS
macro except it sets the REG_ICASE flag so that case will be ignored.

The test code in t4018 and the docs were updated as appropriate.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-10 09:38:29 -07:00
Petr Onderka b221207db9 Userdiff patterns for C#
Add userdiff patterns for C#. This code is an improved version of
code by Adam Petaccia from 21 June 2009 mail to the list.

Signed-off-by: Petr Onderka <gsvick@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-16 18:28:27 -07:00
Junio C Hamano 9559910cac Merge branch 'bs/userdiff-php'
* bs/userdiff-php:
  diff: Support visibility modifiers in the PHP hunk header regexp
2010-06-13 11:22:05 -07:00
Björn Steinbrink 6d2f208c3d diff: Support visibility modifiers in the PHP hunk header regexp
Starting with PHP5, class methods can have a visibility modifier, which
caused the methods not to be matched by the existing regexp, so extend
the regexp to match those modifiers. And while we're at it, allow the
"static" modifier as well.

Since the "static" modifier can appear either before or after the
visibility modifier, let's just allow any number of modifiers to appear
in any order, as that simplifies the regexp and shouldn't cause any
false positives.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-27 07:12:07 -07:00
Jeff King d9bae1a178 diff: cache textconv output
Running a textconv filter can take a long time. It's
particularly bad for a large file which needs to be spooled
to disk, but even for small files, the fork+exec overhead
can add up for something like "git log -p".

This patch uses the notes-cache mechanism to keep a fast
cache of textconv output. Caches are stored in
refs/notes/textconv/$x, where $x is the userdiff driver
defined in gitattributes.

Caching is enabled only if diff.$x.cachetextconv is true.

In my test repo, on a commit with 45 jpg and avi files
changed and a textconv to show their exif tags:

  [before]
  $ time git show >/dev/null
  real    0m13.724s
  user    0m12.057s
  sys     0m1.624s

  [after, first run]
  $ git config diff.mfo.cachetextconv true
  $ time git show >/dev/null
  real    0m14.252s
  user    0m12.197s
  sys     0m1.800s

  [after, subsequent runs]
  $ time git show >/dev/null
  real    0m0.352s
  user    0m0.148s
  sys     0m0.200s

So for a slight (3.8%) cost on the first run, we achieve an
almost 40x speed up on subsequent runs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 00:05:31 -07:00
Junio C Hamano 7fb0eaa289 git_attr(): fix function signature
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.

Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 20:39:59 -08:00
Paolo Bonzini 959e2e64a5 avoid exponential regex match for java and objc function names
In the old regex

^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus
causing an exponential number of backtracks.  Ironically it also causes
the regex not to work as intended; for example "catch" can match the
underlined part of the regex, the first repetition matching "c" and
the second matching "atch".

The replacement regex avoids this problem, because it makes sure that
at least a space/tab is eaten on each repetition.  In other words,
a suffix of a repetition can never be a prefix of the next repetition.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-18 09:52:10 -07:00
Boyd Stephen Smith Jr ae3b970ac3 Change the spelling of "wordregex".
Use "wordRegex" for configuration variable names.  Use "word_regex" for C
language tokens.

Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-21 23:52:16 -08:00
Thomas Rast 80c49c3de2 color-words: make regex configurable via attributes
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute.  The user can then set the
driver on a file in .gitattributes.  If a regex is given on the
command line, it overrides the driver's setting.

We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 10:44:21 -08:00
Jeff King c7534ef4a1 userdiff: require explicitly allowing textconv
Diffs that have been produced with textconv almost certainly
cannot be applied, so we want to be careful not to generate
them in things like format-patch.

This introduces a new diff options, ALLOW_TEXTCONV, which
controls this behavior. It is off by default, but is
explicitly turned on for the "log" family of commands, as
well as the "diff" porcelain (but not diff-* plumbing).

Because both text conversion and external diffing are
controlled by these diff options, we can get rid of the
"plumbing versus porcelain" distinction when reading the
config. This was an attempt to control the same thing, but
suffered from being too coarse-grained.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-26 14:09:48 -07:00
Jeff King 9cb92c390c diff: add filter for converting binary to text
When diffing binary files, it is sometimes nice to see the
differences of a canonical text form rather than either a
binary patch or simply "binary files differ."

Until now, the only option for doing this was to define an
external diff command to perform the diff. This was a lot of
work, since the external command needed to take care of
doing the diff itself (including mode changes), and lost the
benefit of git's colorization and other options.

This patch adds a text conversion option, which converts a
file to its canonical format before performing the diff.
This is less flexible than an arbitrary external diff, but
is much less work to set up. For example:

  $ echo '*.jpg diff=exif' >>.gitattributes
  $ git config diff.exif.textconv exiftool
  $ git config diff.exif.binary false

allows one to see jpg diffs represented by the text output
of exiftool.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:55 -07:00
Jeff King 122aa6f9c0 diff: introduce diff.<driver>.binary
The "diff" gitattribute is somewhat overloaded right now. It
can say one of three things:

  1. this file is definitely binary, or definitely not
     (i.e., diff or !diff)
  2. this file should use an external diff engine (i.e.,
     diff=foo, diff.foo.command = custom-script)
  3. this file should use particular funcname patterns
     (i.e., diff=foo, diff.foo.(x?)funcname = some-regex)

Most of the time, there is no conflict between these uses,
since using one implies that the other is irrelevant (e.g.,
an external diff engine will decide for itself whether the
file is binary).

However, there is at least one conflicting situation: there
is no way to say "use the regular rules to determine whether
this file is binary, but if we do diff it textually, use
this funcname pattern." That is, currently setting diff=foo
indicates that the file is definitely text.

This patch introduces a "binary" config option for a diff
driver, so that one can explicitly set diff.foo.binary. We
default this value to "don't know". That is, setting a diff
attribute to "foo" and using "diff.foo.funcname" will have
no effect on the binaryness of a file. To get the current
behavior, one can set diff.foo.binary to true.

This patch also has one additional advantage: it cleans up
the interface to the userdiff code a bit. Before, calling
code had to know more about whether attributes were false,
true, or unset to determine binaryness. Now that binaryness
is a property of a driver, we can represent these situations
just by passing back a driver struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:26 -07:00
Jeff King be58e70dba diff: unify external diff and funcname parsing code
Both sets of code assume that one specifies a diff profile
as a gitattribute via the "diff=foo" attribute. They then
pull information about that profile from the config as
diff.foo.*.

The code for each is currently completely separate from the
other, which has several disadvantages:

  - there is duplication as we maintain code to create and
    search the separate lists of external drivers and
    funcname patterns

  - it is difficult to add new profile options, since it is
    unclear where they should go

  - the code is difficult to follow, as we rely on the
    "check if this file is binary" code to find the funcname
    pattern as a side effect. This is the first step in
    refactoring the binary-checking code.

This patch factors out these diff profiles into "userdiff"
drivers. A file with "diff=foo" uses the "foo" driver, which
is specified by a single struct.

Note that one major difference between the two pieces of
code is that the funcname patterns are always loaded,
whereas external drivers are loaded only for the "git diff"
porcelain; the new code takes care to retain that situation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:21 -07:00