commits
If a file was an exact multiple of 1024 bytes (the size of an internal
buffer) and was missing a final newline, the LineReaderAt implementation
would drop the last line, leading to an unexpected EOF error on apply.
In addition to fixing the bug, slightly change the behavior of
ReadLineAt to reflect how it is actually used:
1. Clarify that the return value n includes all lines instead of only
lines with a final newline. This was already true except in the
case of the bug fixed by this commit.
2. Only return io.EOF if fewer lines are read than requested. The
previous implementation also returned io.EOF if the last line was
missing a final newline, but this was confusing and didn't really
serve a purpose.
This is technically a breaking change for external implementations but
an implementation that exactly followed the "spec" was already broken in
certain edge cases.
Git does not quote file names containing spaces when generating header
lines. However, I misread the Git implementation and thought it used
spaces as a name terminator by default, implying that names with spaces
would be quoted.
Rewrite the the name parsing code to better match Git. Specifically,
when both names in a header are not quoted, test all splits to see if
any of them produce two equal names. Also account for spaces in file
names when parsing file metadata.
Finally, allow trailing spaces on file names, which seem to be allowed
by Git (although this sounds like a terrible idea to me.)
`git am` will accept patches which aren't fully RFC-compliant, as long
as they start with `From` and have a `Subject` line. Relax
ParsePatchHeader() so that it will accept such patches as well.
Specifically, it will tolerate patches of the form:
From: <foo@company.com>
In this case, the `Author` field will contain `foo@company.com
<foo@company.com>`. Duplicate this behavior.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
...when parsing emails, similar to `git am`.
Add a new field, `BodyAppendix` to PatchHeader.
Modify `scanMessageBody` to accept a boolean argument saying whether
to separate out the appendix or not. Do this by keeping two string
builders, and having it switch to the appendix builder when it finds a
`---` line.
Handling the newlines at the end as expected requires moving things
around a bit.
First, we were trimming space from the line once to decide whether the
line was empty, and then trimming space again if we determined it
wasn't empty. This only needs to be done once.
Then, do all the trimming (both of whitespace and the prefix) first,
before deciding what to do about the line.
Request BodyAppendix separately when parsing a mail, but not a commit
message.
Add some tests to verify that it works as expected.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
Primarily to get rid of [PATCH] at the front, but while we're here
just be generally compatible with `git am`:
* Remove `re` and variations
* Remove whitespace
* Remove anything in brackets
But only at the very beginning of the subject.
Store anything removed in this way in PatchHeader.SubjectPrefix.
Inspired by
https://github.com/git/git/blob/master/mailinfo.c:cleanup_subject()
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
Removed the distinction between parsed and raw dates and simply return
an error for unsupported date formats. Arbitrary date support was
probably a premature optimization and would be better supported by a
method to register custom date parsing functions.
Simple time.Time fields make the structure easier to use and mean that
a zero value always indicates the patch did not include a date.
A closer look at the Git source shows that the rfc2822 and default
formats do not zero-pad days, so we need to accept single-digit days
when parsing. The single-digit pattern will also accept the padded
format, so there's no loss of functionality.
Change the parsing test to use single-digit values to emphasize when
padding is required and when it is not.
When applying a text fragment that deletes all content in the file, make
sure there is no content left in the source after using all the lines in
the fragment. If there is is extra content, it's a conflict: something
added more lines to the file after generating the patch.
Git usually refers to the whole thing (title + body) as the "commit
message", so using different terms for the parts clarifies things. Also
add a new Message() function that returns the combined string.
This function parses patch headers (the preamble returned by the
existing Parse function) to extract information about the commit that
generated the patch. This is useful when patches are an interchange
format and this library is applying commits generated elsewhere.
Because of the variety of header formats, parsing is fairly lenient and
best-effort, although certain invalid input does cause errors.
This also extracts some test utilities from the apply tests for reuse.
Use golangci-lint to coordinate linting instead of installing tools
individually. The workflow installs the binary directly instead of using
one of the existing public actions.
copyFrom and copyLinesFrom did not increment the start offset after
reading, so the same lines were copied over and over again.
Also refactor the apply tests to use a common type now that there are
three variants that do almost the same thing for each test.
Applying a file now rejects future text fragment applies under the
assumption that all relevant fragments were part of the file.
Try to avoid confusion about whether variables refer to byte or line
counts/positions. It still isn't perfect, but I'm not sure how to
further clarify without being overly verbose.
This adds minimal new coverage, since the apply tests already exercise
this, but it's complicated enough that dedicated tests will be helpful.
This removes the distinction between "strict" and "fuzzy" application
by allowing future methods on Applier that control settings. It also
avoids state tracking in the text fragment apply signature by moving it
into the Applier type.
While in practice, an Applier will be used once and discarded, the
capability is provided to reset it for multiple uses.
This is no longer used for text application, so revert the parser back
to using StringReader.
This is functionally equivalent to the previous version (except for one
error case), but uses the new interface. I think the code is simpler
overall because it removes the line tracking.
This is a line-oriented parallel to io.ReaderAt, meant for text applies.
While the mapping isn't quite as clean as in the binary case, a text
apply still reads a fixed chunk of lines starting at a specific line
number and modifies them. This also allows a consistent interface for
strict and fuzzy applies.
The implementation wraps an io.ReaderAt and reads data in chunks,
indexing line boundaries as it goes. This is probably not the most
efficient way to implement this interface, but it works and allows file
application to take a consistent interface.
This better matches the actual contract of the delta patch, which reads
fixed size chunks of the source files at arbitrary positions.
These cover some of the possible errors with delta fragments. Many of
the same tests could be implemented slightly easier by testing the
helper functions, but the infrastructure is already set up to test the
full function.
This is made easier by the bin.go CLI, which can parse and encode binary
patch data. Once parsed, fragments were manipulated with truncate and
dd, encoded, and placed in the patch files.
This ensures a default size is used for large copy instructions.
Covers basic literal and delta application. Input files were created by
using dd and the 'conv=notrunc' option to modify sections of files
created from /dev/urandom. All file start with two null bytes to
convince Git they are binary.
Currently untested, but based on code I validated against some generated
patches. Tests comming in a future commit.
Conflict errors are now represented by an exported type that is
compatible with errors.Is instead of using the method defined on
ApplyError. This makes the tests slightly cleaner and should be more
idiomatic for clients.
These cover the errors that can happen with a single fragment and no
additional manipulation by the caller of ApplyStrict.
This ensures the line number is used and application isn't based only on
matching context lines.
This matches the type used for positions in text fragments.
io.EOF was not properly accounted for when dealing with patches that are
missing trailing newline characters.
These cover all of the cases (I think) where application should succeed.
This wraps the underlying error with optional position information and
provides a way to test if the error was due to a conflict. At the
moment, details about the conflict are not exposed outside of the
message string.
Text fragment application is implemented but untested and with several
TODOs, binary fragment application is unimplemented and will panic.
Applying a fragment requires the content to match the stored counts, so
there must be a way to check this. Parsed fragments should always be
valid, but manually created or modified fragments may be invalid.
When applying, we need to copy all data after the last fragment line.
This function provides a way to get back the possibly-buffered io.Reader
that backs the LineReader.
Document the method as part of the type so it is processed by godoc.
LineReader is a wrapper around StringReader that counts line numbers.
This is not currently used by the parser, but will be used by the apply
functions.
The tests were already split up in this way, so it makes sense to have
the parsing functions split as well.
These are fairly basic as the decoder is also exercised by the fragment
parsing tests, but they cover some errror cases that may not be covered
otherwise.
Git uses a unique Base85 encoding with different characters than the
ascii85 encoding implemented by Go, so add a custom decoding function.
Once decoded, use zlib instead of the raw DEFLATE algorithm to
decompress the data.
These issues were caught by some basic parsing tests which are added
here as well.
If a file was an exact multiple of 1024 bytes (the size of an internal
buffer) and was missing a final newline, the LineReaderAt implementation
would drop the last line, leading to an unexpected EOF error on apply.
In addition to fixing the bug, slightly change the behavior of
ReadLineAt to reflect how it is actually used:
1. Clarify that the return value n includes all lines instead of only
lines with a final newline. This was already true except in the
case of the bug fixed by this commit.
2. Only return io.EOF if fewer lines are read than requested. The
previous implementation also returned io.EOF if the last line was
missing a final newline, but this was confusing and didn't really
serve a purpose.
This is technically a breaking change for external implementations but
an implementation that exactly followed the "spec" was already broken in
certain edge cases.
Git does not quote file names containing spaces when generating header
lines. However, I misread the Git implementation and thought it used
spaces as a name terminator by default, implying that names with spaces
would be quoted.
Rewrite the the name parsing code to better match Git. Specifically,
when both names in a header are not quoted, test all splits to see if
any of them produce two equal names. Also account for spaces in file
names when parsing file metadata.
Finally, allow trailing spaces on file names, which seem to be allowed
by Git (although this sounds like a terrible idea to me.)
`git am` will accept patches which aren't fully RFC-compliant, as long
as they start with `From` and have a `Subject` line. Relax
ParsePatchHeader() so that it will accept such patches as well.
Specifically, it will tolerate patches of the form:
From: <foo@company.com>
In this case, the `Author` field will contain `foo@company.com
<foo@company.com>`. Duplicate this behavior.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
...when parsing emails, similar to `git am`.
Add a new field, `BodyAppendix` to PatchHeader.
Modify `scanMessageBody` to accept a boolean argument saying whether
to separate out the appendix or not. Do this by keeping two string
builders, and having it switch to the appendix builder when it finds a
`---` line.
Handling the newlines at the end as expected requires moving things
around a bit.
First, we were trimming space from the line once to decide whether the
line was empty, and then trimming space again if we determined it
wasn't empty. This only needs to be done once.
Then, do all the trimming (both of whitespace and the prefix) first,
before deciding what to do about the line.
Request BodyAppendix separately when parsing a mail, but not a commit
message.
Add some tests to verify that it works as expected.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
Primarily to get rid of [PATCH] at the front, but while we're here
just be generally compatible with `git am`:
* Remove `re` and variations
* Remove whitespace
* Remove anything in brackets
But only at the very beginning of the subject.
Store anything removed in this way in PatchHeader.SubjectPrefix.
Inspired by
https://github.com/git/git/blob/master/mailinfo.c:cleanup_subject()
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Co-authored-by: George Dunlap <george.dunlap@citrix.com>
Removed the distinction between parsed and raw dates and simply return
an error for unsupported date formats. Arbitrary date support was
probably a premature optimization and would be better supported by a
method to register custom date parsing functions.
Simple time.Time fields make the structure easier to use and mean that
a zero value always indicates the patch did not include a date.
A closer look at the Git source shows that the rfc2822 and default
formats do not zero-pad days, so we need to accept single-digit days
when parsing. The single-digit pattern will also accept the padded
format, so there's no loss of functionality.
Change the parsing test to use single-digit values to emphasize when
padding is required and when it is not.
This function parses patch headers (the preamble returned by the
existing Parse function) to extract information about the commit that
generated the patch. This is useful when patches are an interchange
format and this library is applying commits generated elsewhere.
Because of the variety of header formats, parsing is fairly lenient and
best-effort, although certain invalid input does cause errors.
This also extracts some test utilities from the apply tests for reuse.
This removes the distinction between "strict" and "fuzzy" application
by allowing future methods on Applier that control settings. It also
avoids state tracking in the text fragment apply signature by moving it
into the Applier type.
While in practice, an Applier will be used once and discarded, the
capability is provided to reset it for multiple uses.
This is a line-oriented parallel to io.ReaderAt, meant for text applies.
While the mapping isn't quite as clean as in the binary case, a text
apply still reads a fixed chunk of lines starting at a specific line
number and modifies them. This also allows a consistent interface for
strict and fuzzy applies.
The implementation wraps an io.ReaderAt and reads data in chunks,
indexing line boundaries as it goes. This is probably not the most
efficient way to implement this interface, but it works and allows file
application to take a consistent interface.
These cover some of the possible errors with delta fragments. Many of
the same tests could be implemented slightly easier by testing the
helper functions, but the infrastructure is already set up to test the
full function.
This is made easier by the bin.go CLI, which can parse and encode binary
patch data. Once parsed, fragments were manipulated with truncate and
dd, encoded, and placed in the patch files.
Git uses a unique Base85 encoding with different characters than the
ascii85 encoding implemented by Go, so add a custom decoding function.
Once decoded, use zlib instead of the raw DEFLATE algorithm to
decompress the data.
These issues were caught by some basic parsing tests which are added
here as well.