1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-25 14:46:09 +02:00
Commit Graph

214 Commits

Author SHA1 Message Date
Jeff Hostetler ee4512ed48 trace2: create new combined trace facility
Create a new unified tracing facility for git.  The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.

In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.

Trace2 defines 3 output targets.  These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT".  These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).

* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
  summary data.

* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
  It extends the output with columns for the command process, thread,
  repo, absolute and relative elapsed times.  It reports events for
  child process start/stop, thread start/stop, and per-thread function
  nesting.

* GIT_TR2_EVENT is a new structured format. It writes event data as a
  series of JSON records.

Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 15:27:59 -08:00
Jeff King cbdb8d1439 remote-curl: tighten "version 2" check for smart-http
In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2". We check that with
starts_with(), but really that should be the only thing in that packet.
A response of "version 20" should not match.

Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06 12:20:22 -08:00
Jeff King 8ee3e120cd remote-curl: refactor smart-http discovery
After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
     response. We also require the response to look vaguely like a
     pkt-line starting with "#". If one of those does not match, we fall
     back to dumb-http.

     But according to our http protocol spec[1]:

       Dumb servers MUST NOT return a return type starting with
       `application/x-git-`.

     If we see the expected content-type, we should consider it
     smart-http. At that point we can parse the pkt-line for real, and
     complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
     spec says[2]:

       When using the http:// or https:// transport a client makes a
       "smart" info/refs request as described in `http-protocol.txt`[...]

     and the http spec is clear that for a smart-http response[3]:

       The Content-Type MUST be `application/x-$servicename-advertisement`.

     So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06 12:20:19 -08:00
Junio C Hamano 5f8b86db94 Merge branch 'jt/fetch-v2-sideband'
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.

* jt/fetch-v2-sideband:
  tests: define GIT_TEST_SIDEBAND_ALL
  {fetch,upload}-pack: sideband v2 fetch response
  sideband: reverse its dependency on pkt-line
  pkt-line: introduce struct packet_writer
  pack-protocol.txt: accept error packets in any context
  Use packet_reader instead of packet_read_line
2019-02-05 14:26:11 -08:00
Junio C Hamano 99c0bdd09d Merge branch 'ms/http-no-more-failonerror'
Debugging help for http transport.

* ms/http-no-more-failonerror:
  test: test GIT_CURL_VERBOSE=1 shows an error
  remote-curl: unset CURLOPT_FAILONERROR
  remote-curl: define struct for CURLOPT_WRITEFUNCTION
  http: enable keep_error for HTTP requests
  http: support file handles for HTTP_KEEP_ERROR
2019-01-29 12:47:55 -08:00
Masaya Suzuki b79bdd8c12 remote-curl: unset CURLOPT_FAILONERROR
By not setting CURLOPT_FAILONERROR, curl parses the HTTP response
headers even if the response is an error. This makes GIT_CURL_VERBOSE to
show the HTTP headers, which is useful for debugging.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 15:00:56 -08:00
Masaya Suzuki cf2fb92b00 remote-curl: define struct for CURLOPT_WRITEFUNCTION
In order to pass more values for rpc_in, define a struct and pass it as
an additional value.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 15:00:56 -08:00
Masaya Suzuki e6cf87b12d http: enable keep_error for HTTP requests
curl stops parsing a response when it sees a bad HTTP status code and it
has CURLOPT_FAILONERROR set. This prevents GIT_CURL_VERBOSE to show HTTP
headers on error.

keep_error is an option to receive the HTTP response body for those
error responses. By enabling this option, curl will process the HTTP
response headers, and they're shown if GIT_CURL_VERBOSE is set.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 15:00:56 -08:00
Masaya Suzuki 2d103c31c2 pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:30 -08:00
Masaya Suzuki 01f9ec64c8 Use packet_reader instead of packet_read_line
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:27 -08:00
Nguyễn Thái Ngọc Duy 3b3357626e style: the opening '{' of a function is in a separate line
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-10 15:41:09 +09:00
Torsten Bögershausen cb8010bbff remote-curl.c: xcurl_off_t is not portable (on 32 bit platfoms)
When  setting
DEVELOPER = 1
DEVOPTS = extra-all

"gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with
"comparison is always false due to limited range of data type"
"[-Werror=type-limits]"

It turns out that the function xcurl_off_t() has 2 flavours:

- It gives a warning 32 bit systems, like Linux
- It takes the signed ssize_t as a paramter, but the only caller is using
  a size_t (which is typically unsigned these days)

The original motivation of this function is to make sure that sizes > 2GiB
are handled correctly. The curl documentation says:
"For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit
 wide signed integral data type"
On a 32 bit system "size_t" can be promoted into a 64 bit signed value
without loss of data, and therefore we may see the
"comparison is always false" warning.
On a 64 bit system it may happen, at least in theory, that size_t is > 2^63,
and then the promotion from an unsigned "size_t" into a signed "curl_off_t"
may be a problem.

One solution to suppress a possible compiler warning could be to remove
the function xcurl_off_t().

However, to be on the very safe side, we keep it and improve it:

- The len parameter is changed from ssize_t to size_t
- A temporally variable "size" is used, promoted int uintmax_t and the compared
  with "maximum_signed_value_of_type(curl_off_t)".
  Thanks to Junio C Hamano for this hint.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 17:03:10 +09:00
Junio C Hamano ae109a9789 Merge branch 'en/double-semicolon-fix'
Code clean-up.

* en/double-semicolon-fix:
  Remove superfluous trailing semicolons
2018-09-24 10:30:47 -07:00
Elijah Newren c3b9bc94b9 Remove superfluous trailing semicolons
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-05 10:21:24 -07:00
Junio C Hamano 13bf260ac7 Merge branch 'js/typofixes'
Comment update.

* js/typofixes:
  remote-curl: remove spurious period
  git-compat-util.h: fix typo
2018-08-20 11:33:50 -07:00
Johannes Schindelin a8132410ee remote-curl: remove spurious period
We should not interrupt. sentences in the middle.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-08 09:07:18 -07:00
Brandon Williams eaf6a1b6e9 remote-curl: accept compressed responses with protocol v2
Configure curl to accept compressed responses when using protocol v2 by
setting `CURLOPT_ENCODING` to "", which indicates that curl should send
an "Accept-Encoding" header with all supported compression encodings.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-23 10:24:13 +09:00
Brandon Williams 1a53e692af remote-curl: accept all encodings supported by curl
Configure curl to accept all encodings which curl supports instead of
only accepting gzip responses.

This fixes an issue when using an installation of curl which is built
without the "zlib" feature. Since aa90b9697 (Enable info/refs gzip
decompression in HTTP client, 2012-09-19) we end up requesting "gzip"
encoding anyway despite libcurl not being able to decode it.  Worse,
instead of getting a clear error message indicating so, we end up
falling back to "dumb" http, producing a confusing and difficult to
debug result.

Since curl doesn't do any checking to verify that it supports the a
requested encoding, instead set the curl option `CURLOPT_ENCODING` with
an empty string indicating that curl should send an "Accept-Encoding"
header containing only the encodings supported by curl.

Reported-by: Anton Golubev <anton.golubev@gmail.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-23 10:24:12 +09:00
Junio C Hamano 96f29521a3 Merge branch 'ma/http-walker-no-partial'
"git http-fetch" (deprecated) had an optional and experimental
"feature" to fetch only commits and/or trees, which nobody used.
This has been removed.

* ma/http-walker-no-partial:
  walker: drop fields of `struct walker` which are always 1
  http-fetch: make `-a` standard behaviour
2018-05-08 15:59:35 +09:00
Junio C Hamano 9bfa0f9be3 Merge branch 'bw/protocol-v2'
The beginning of the next-gen transfer protocol.

* bw/protocol-v2: (35 commits)
  remote-curl: don't request v2 when pushing
  remote-curl: implement stateless-connect command
  http: eliminate "# service" line when using protocol v2
  http: don't always add Git-Protocol header
  http: allow providing extra headers for http requests
  remote-curl: store the protocol version the server responded with
  remote-curl: create copy of the service name
  pkt-line: add packet_buf_write_len function
  transport-helper: introduce stateless-connect
  transport-helper: refactor process_connect_service
  transport-helper: remove name parameter
  connect: don't request v2 when pushing
  connect: refactor git_connect to only get the protocol version once
  fetch-pack: support shallow requests
  fetch-pack: perform a fetch using v2
  upload-pack: introduce fetch server command
  push: pass ref prefixes when pushing
  fetch: pass ref prefixes when fetching
  ls-remote: pass ref prefixes when requesting a remote's refs
  transport: convert transport_get_remote_refs to take a list of ref prefixes
  ...
2018-05-08 15:59:16 +09:00
Martin Ågren 0b6b342954 walker: drop fields of `struct walker` which are always 1
After the previous commit, both users of `struct walker` set `get_tree`,
`get_history` and `get_all` to 1. Drop those fields and simplify the
walker implementation accordingly.

Let's hope that any out-of-tree users will not mind this change. They
should notice that the compilation fails as they try to set these
fields. (If they do not set them, note that `get_http_walker()` leaves
them undefined, so the behavior will have been undefined all the time.)

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-24 10:55:04 +09:00
Stefan Beller d807c4a01d exec_cmd: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Brandon Williams a4d78ce26b remote-curl: don't request v2 when pushing
In order to be able to ship protocol v2 with only supporting fetch, we
need clients to not issue a request to use protocol v2 when pushing
(since the client currently doesn't know how to push using protocol v2).
This allows a client to have protocol v2 configured in
`protocol.version` and take advantage of using v2 for fetch and falling
back to using v0 when pushing while v2 for push is being designed.

We could run into issues if we didn't fall back to protocol v2 when
pushing right now.  This is because currently a server will ignore a request to
use v2 when contacting the 'receive-pack' endpoint and fall back to
using v0, but when push v2 is rolled out to servers, the 'receive-pack'
endpoint will start responding using v2.  So we don't want to get into a
state where a client is requesting to push with v2 before they actually
know how to push using v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams 0f1dc53f45 remote-curl: implement stateless-connect command
Teach remote-curl the 'stateless-connect' command which is used to
establish a stateless connection with servers which support protocol
version 2.  This allows remote-curl to act as a proxy, allowing the git
client to communicate natively with a remote end, simply using
remote-curl as a pass through to convert requests to http.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams 237ffedd46 http: eliminate "# service" line when using protocol v2
When an http info/refs request is made, requesting that protocol v2 be
used, don't send a "# service" line since this line is not part of the
v2 spec.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams 884e586f9e http: don't always add Git-Protocol header
Instead of always sending the Git-Protocol header with the configured
version with every http request, explicitly send it when discovering
refs and then only send it on subsequent http requests if the server
understood the version requested.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams 49e85e9500 remote-curl: store the protocol version the server responded with
Store the protocol version the server responded with when performing
discovery.  This will be used in a future patch to either change the
'Git-Protocol' header sent in subsequent requests or to determine if a
client needs to fallback to using a different protocol version.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams f08a5d42ea remote-curl: create copy of the service name
Make a copy of the service name being requested instead of relying on
the buffer pointed to by the passed in 'const char *' to remain
unchanged.

Currently, all service names are string constants, but a subsequent
patch will introduce service names from external sources.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams 8f6982b4e1 protocol: introduce enum protocol_version value protocol_v2
Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 14:15:07 -07:00
Brandon Williams ad6ac1244f connect: discover protocol version outside of get_remote_heads
In order to prepare for the addition of protocol_v2 push the protocol
version discovery outside of 'get_remote_heads()'.  This will allow for
keeping the logic for processing the reference advertisement for
protocol_v1 and protocol_v0 separate from the logic for protocol_v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 14:15:06 -07:00
Junio C Hamano 69917e6439 Merge branch 'jk/push-options-via-transport-fix'
"git push" over http transport did not unquote the push-options
correctly.

* jk/push-options-via-transport-fix:
  remote-curl: unquote incoming push-options
  t5545: factor out http repository setup
2018-02-28 13:37:58 -08:00
Junio C Hamano 2fb346c06a Merge branch 'js/packet-read-line-check-null'
Some low level protocol codepath could crash when they get an
unexpected flush packet, which is now fixed.

* js/packet-read-line-check-null:
  always check for NULL return from packet_read_line()
  correct error messages for NULL packet_read_line()
2018-02-27 10:33:55 -08:00
Jeff King 90dce21eb0 remote-curl: unquote incoming push-options
The transport-helper protocol c-style quotes the value of
any options passed to the helper via the "option <key> <value>"
directive. However, remote-curl doesn't actually unquote the
push-option values, meaning that we will send the quoted
version to the other side (whereas git-over-ssh would send
the raw value).

The pack-protocol.txt documentation defines the push-options
as a series of VCHARs, which excludes most characters that
would need quoting. But:

  1. You can still see the bug with a valid push-option that
     starts with a double-quote (since that triggers
     quoting).

  2. We do currently handle any non-NUL characters correctly
     in git-over-ssh. So even though the spec does not say
     that we need to handle most quoted characters, it's
     nice if our behavior is consistent between protocols.

There are two new tests: the "direct" one shows that this
already works in the non-http case, and the http one covers
this bugfix.

Reported-by: Jon Simons <jon@jonsimons.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-20 11:27:06 -08:00
Jon Simons bb1356dc64 always check for NULL return from packet_read_line()
The packet_read_line() function will die if it sees any
protocol or socket errors. But it will return NULL for a
flush packet; some callers which are not expecting this may
dereference NULL if they get an unexpected flush. This would
involve the other side breaking protocol, but we should
flag the error rather than segfault.

Signed-off-by: Jon Simons <jon@jonsimons.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-08 12:37:40 -08:00
Jeff Hostetler acb0c57260 fetch: support filters
Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:51 -08:00
Jonathan Tan 88e2f9ed8e introduce fetch-object: fetch one promisor object
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano 4c01f67d91 Merge branch 'dt/http-postbuffer-can-be-large'
Allow the http.postbuffer configuration variable to be set to a
size that can be expressed in size_t, which can be larger than
ulong on some platforms.

* dt/http-postbuffer-can-be-large:
  http.postbuffer: allow full range of ssize_t values
2017-04-23 22:07:45 -07:00
Junio C Hamano b1081e4004 Merge branch 'bc/object-id'
Conversion from unsigned char [40] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-19 21:37:13 -07:00
David Turner 37ee680d9b http.postbuffer: allow full range of ssize_t values
Unfortunately, in order to push some large repos where a server does
not support chunked encoding, the http postbuffer must sometimes
exceed two gigabytes.  On a 64-bit system, this is OK: we just malloc
a larger buffer.

This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.

Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-13 18:24:32 -07:00
brian m. carlson 910650d2f8 Rename sha1_array to oid_array
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:56 -07:00
brian m. carlson ee3051bd23 sha1-array: convert internal storage for struct sha1_array to object_id
Make the internal storage for struct sha1_array use an array of struct
object_id internally.  Update the users of this struct which inspect its
internals.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:34 -07:00
Brandon Williams 511155db51 remote-curl: allow push options
Teach remote-curl to understand push options and to be able to convey
them across HTTP.

Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-22 15:41:21 -07:00
Junio C Hamano d984592043 Merge branch 'dt/smart-http-detect-server-going-away'
When the http server gives an incomplete response to a smart-http
rpc call, it could lead to client waiting for a full response that
will never come.  Teach the client side to notice this condition
and abort the transfer.

An improvement counterproposal has failed.
cf. <20161114194049.mktpsvgdhex2f4zv@sigill.intra.peff.net>

* dt/smart-http-detect-server-going-away:
  upload-pack: optionally allow fetching any sha1
  remote-curl: don't hang when a server dies before any output
2017-01-10 15:24:25 -08:00
Junio C Hamano 8a2882f23e Merge branch 'jk/http-walker-limit-redirect-2.9'
Transport with dumb http can be fooled into following foreign URLs
that the end user does not intend to, especially with the server
side redirects and http-alternates mechanism, which can lead to
security issues.  Tighten the redirection and make it more obvious
to the end user when it happens.

* jk/http-walker-limit-redirect-2.9:
  http: treat http-alternates like redirects
  http: make redirects more obvious
  remote-curl: rename shadowed options variable
  http: always update the base URL for redirects
  http: simplify update_url_from_redirect
2016-12-19 14:45:32 -08:00
Jeff King 50d3413740 http: make redirects more obvious
We instruct curl to always follow HTTP redirects. This is
convenient, but it creates opportunities for malicious
servers to create confusing situations. For instance,
imagine Alice is a git user with access to a private
repository on Bob's server. Mallory runs her own server and
wants to access objects from Bob's repository.

Mallory may try a few tricks that involve asking Alice to
clone from her, build on top, and then push the result:

  1. Mallory may simply redirect all fetch requests to Bob's
     server. Git will transparently follow those redirects
     and fetch Bob's history, which Alice may believe she
     got from Mallory. The subsequent push seems like it is
     just feeding Mallory back her own objects, but is
     actually leaking Bob's objects. There is nothing in
     git's output to indicate that Bob's repository was
     involved at all.

     The downside (for Mallory) of this attack is that Alice
     will have received Bob's entire repository, and is
     likely to notice that when building on top of it.

  2. If Mallory happens to know the sha1 of some object X in
     Bob's repository, she can instead build her own history
     that references that object. She then runs a dumb http
     server, and Alice's client will fetch each object
     individually. When it asks for X, Mallory redirects her
     to Bob's server. The end result is that Alice obtains
     objects from Bob, but they may be buried deep in
     history. Alice is less likely to notice.

Both of these attacks are fairly hard to pull off. There's a
social component in getting Mallory to convince Alice to
work with her. Alice may be prompted for credentials in
accessing Bob's repository (but not always, if she is using
a credential helper that caches). Attack (1) requires a
certain amount of obliviousness on Alice's part while making
a new commit. Attack (2) requires that Mallory knows a sha1
in Bob's repository, that Bob's server supports dumb http,
and that the object in question is loose on Bob's server.

But we can probably make things a bit more obvious without
any loss of functionality. This patch does two things to
that end.

First, when we encounter a whole-repo redirect during the
initial ref discovery, we now inform the user on stderr,
making attack (1) much more obvious.

Second, the decision to follow redirects is now
configurable. The truly paranoid can set the new
http.followRedirects to false to avoid any redirection
entirely. But for a more practical default, we will disallow
redirects only after the initial ref discovery. This is
enough to thwart attacks similar to (2), while still
allowing the common use of redirects at the repository
level. Since c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28) we re-root all further requests from
the redirect destination, which should generally mean that
no further redirection is necessary.

As an escape hatch, in case there really is a server that
needs to redirect individual requests, the user can set
http.followRedirects to "true" (and this can be done on a
per-server basis via http.*.followRedirects config).

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06 12:32:48 -08:00
Jeff King fcaa6e64b3 remote-curl: rename shadowed options variable
The discover_refs() function has a local "options" variable
to hold the http_get_options we pass to http_get_strbuf().
But this shadows the global "struct options" that holds our
program-level options, which cannot be accessed from this
function.

Let's give the local one a more descriptive name so we can
tell the two apart.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06 12:32:48 -08:00
David Turner 296b847c0d remote-curl: don't hang when a server dies before any output
In the event that a HTTP server closes the connection after giving a
200 but before giving any packets, we don't want to hang forever
waiting for a response that will never come.  Instead, we should die
immediately.

One case where this happens is when attempting to fetch a dangling
object by its object name.  In this case, the server dies before
sending any data.  Prior to this patch, fetch-pack would wait for
data from the server, and remote-curl would wait for fetch-pack,
causing a deadlock.

Despite this patch, there is other possible malformed input that could
cause the same deadlock (e.g. a half-finished pktline, or a pktline but
no trailing flush).  There are a few possible solutions to this:

1. Allowing remote-curl to tell fetch-pack about the EOF (so that
fetch-pack could know that no more data is coming until it says
something else).  This is tricky because an out-of-band signal would
be required, or the http response would have to be re-framed inside
another layer of pkt-line or something.

2. Make remote-curl understand some of the protocol.  It turns out
that in addition to understanding pkt-line, it would need to watch for
ack/nak.  This is somewhat fragile, as information about the protocol
would end up in two places.  Also, pkt-lines which are already at the
length limit would need special handling.

Both of these solutions would require a fair amount of work, whereas
this hack is easy and solves at least some of the problem.

Still to do: it would be good to give a better error message
than "fatal: The remote end hung up unexpectedly".

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 13:05:46 -08:00
Junio C Hamano a460ea4a3c Merge branch 'nd/shallow-deepen'
The existing "git fetch --depth=<n>" option was hard to use
correctly when making the history of an existing shallow clone
deeper.  A new option, "--deepen=<n>", has been added to make this
easier to use.  "git clone" also learned "--shallow-since=<date>"
and "--shallow-exclude=<tag>" options to make it easier to specify
"I am interested only in the recent N months worth of history" and
"Give me only the history since that version".

* nd/shallow-deepen: (27 commits)
  fetch, upload-pack: --deepen=N extends shallow boundary by N commits
  upload-pack: add get_reachable_list()
  upload-pack: split check_unreachable() in two, prep for get_reachable_list()
  t5500, t5539: tests for shallow depth excluding a ref
  clone: define shallow clone boundary with --shallow-exclude
  fetch: define shallow boundary with --shallow-exclude
  upload-pack: support define shallow boundary by excluding revisions
  refs: add expand_ref()
  t5500, t5539: tests for shallow depth since a specific date
  clone: define shallow clone boundary based on time with --shallow-since
  fetch: define shallow boundary with --shallow-since
  upload-pack: add deepen-since to cut shallow repos based on time
  shallow.c: implement a generic shallow boundary finder based on rev-list
  fetch-pack: use a separate flag for fetch in deepening mode
  fetch-pack.c: mark strings for translating
  fetch-pack: use a common function for verbose printing
  fetch-pack: use skip_prefix() instead of starts_with()
  upload-pack: move rev-list code out of check_non_tip()
  upload-pack: make check_non_tip() clean things up on error
  upload-pack: tighten number parsing at "deepen" lines
  ...
2016-10-10 14:03:50 -07:00
Junio C Hamano de61cebde7 Merge branch 'jk/common-main-2.8' into jk/common-main
* jk/common-main-2.8:
  mingw: declare main()'s argv as const
  common-main: call git_setup_gettext()
  common-main: call restore_sigpipe_to_default()
  common-main: call sanitize_stdfds()
  common-main: call git_extract_argv0_path()
  add an extra level of indirection to main()
2016-07-06 10:02:57 -07:00