# Buffer internals {#buffer_internals}

A module called the *scoop* is used for buffering data going into
librsync. It accumulates data when the application does not supply it
in large enough chunks for librsync to make use of it.

The scoop object is a set of fields in the rs_job_t object.
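
As a rough illustration of those fields, the following C sketch declares a stand-alone struct rather than the real rs_job_t. The names scoop_buf and scoop_avail come from this page; scoop_alloc, the demo_ prefix, and the init helper are assumptions made for the example, not librsync's actual declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scoop-related part of rs_job_t. */
struct demo_scoop_fields {
    char   *scoop_buf;    /* start of the scoop allocation */
    size_t  scoop_alloc;  /* assumed name: bytes allocated at scoop_buf */
    size_t  scoop_avail;  /* bytes of valid data, counted from scoop_buf */
};

/* Allocate an empty scoop able to hold `size` bytes.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_scoop_init(struct demo_scoop_fields *s, size_t size)
{
    assert(s != NULL && size > 0);
    s->scoop_buf = malloc(size);
    if (s->scoop_buf == NULL)
        return -1;
    s->scoop_alloc = size;
    s->scoop_avail = 0;   /* nothing buffered yet */
    return 0;
}
```

The caller owns the allocation in this sketch and releases it with free(s.scoop_buf).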

Data from the read callback always goes into the scoop buffer.

The state functions call rs__scoop_read when they need some input
data. If the read callback blocks, it might take multiple attempts
before the scoop can be filled. Each time, the state function will
also need to block, and then be reawakened by the library.

Once the scoop has been sufficiently filled, it must be completely
consumed by the state function. This is easy if the state function
always requests one unit of work at a time: a block, a file header

All this means that the valid data is always located at the start of
the scoop, continuing for scoop_avail bytes. The library is never
allowed to consume only part of the data.

Once the state function has consumed the data, it should call
rs__scoop_reset(), which resets scoop_avail to 0.
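
The fill, consume, reset cycle described above can be sketched as follows. This is a simplified model under stated assumptions, not librsync's code: demo_scoop_read and demo_scoop_reset are hypothetical stand-ins for rs__scoop_read and rs__scoop_reset (whose real signatures are not given here), the scoop is a fixed 64-byte array, and input arrives as a pointer/length pair instead of through a read callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_result { DEMO_DONE, DEMO_BLOCKED };

struct demo_scoop {
    char   buf[64];  /* fixed-size stand-in for scoop_buf */
    size_t avail;    /* stand-in for scoop_avail */
};

/* Try to bring the scoop up to `need` bytes using whatever input is on
 * hand. Advances *in and *in_len past the bytes that were copied.
 * Returns DEMO_BLOCKED if the scoop is still short, so the caller must
 * come back later with more input. */
enum demo_result demo_scoop_read(struct demo_scoop *s, size_t need,
                                 const char **in, size_t *in_len)
{
    assert(need <= sizeof s->buf);
    if (s->avail < need) {
        size_t want = need - s->avail;
        size_t take = *in_len < want ? *in_len : want;
        if (take > 0) {
            /* New data is appended after the valid bytes already held. */
            memcpy(s->buf + s->avail, *in, take);
            s->avail += take;
            *in += take;
            *in_len -= take;
        }
    }
    return s->avail >= need ? DEMO_DONE : DEMO_BLOCKED;
}

/* Called once the state function has consumed everything in the scoop. */
void demo_scoop_reset(struct demo_scoop *s)
{
    s->avail = 0;
}
```

Because the consumer always takes the whole unit and then resets, valid data never needs to be shifted: it always starts at offset 0.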

The library can set up data to be written out by putting a
pointer/length pair for it in the output queue.
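
A minimal sketch of such a pointer/length pair is shown here. The field names outq_ptr and outq_bytes are the ones used on this page; the struct name, the const qualifier, and the demo_outq_set helper are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the output-queue part of rs_job_t. */
struct demo_outq_fields {
    const char *outq_ptr;    /* next byte still waiting to be written */
    size_t      outq_bytes;  /* number of bytes waiting at outq_ptr */
};

/* Queue `len` bytes at `data` for output. The data is not copied, so it
 * must stay valid until the queue has been written out. */
void demo_outq_set(struct demo_outq_fields *q, const char *data, size_t len)
{
    /* Assumption: any earlier output has already been written. */
    assert(q->outq_bytes == 0);
    q->outq_ptr = data;
    q->outq_bytes = len;
}
```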

The job infrastructure will make sure this is written out before the
next call into the state machine.

There is only one outq_ptr, so any given state function can only
produce one contiguous block of output.

The scoop buffer may be used by the output queue. This means that
data can traverse the library with no extra copies: one copy into the
scoop buffer, and one copy out. In this case outq_ptr points into
scoop_buf, and outq_bytes tells how much data needs to be written.

The state function calls rs__scoop_reset before returning when it is
finished with the data in the scoop. However, the outq may still
point into the scoop buffer if that data has not yet been copied
out. This means that there is data in the scoop beyond scoop_avail
that must still be retained.

This is safe because neither the scoop nor the state function will
get to run before the output queue has completely drained.
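
The sharing arrangement can be modelled in a few lines of C. This is a sketch under assumptions, not the library's implementation: struct demo_job is a hypothetical cut-down job with a fixed 32-byte scoop, and demo_emit_scoop, demo_drain, and demo_can_refill are invented helpers. It shows that after the reset the bytes are still physically in the scoop, and that refilling is held off until the output queue is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down job holding both the scoop and the output queue. */
struct demo_job {
    char        scoop_buf[32];
    size_t      scoop_avail;
    const char *outq_ptr;
    size_t      outq_bytes;
};

/* State-function step: pass the scooped bytes through as output without
 * copying them a second time, then reset the scoop. */
void demo_emit_scoop(struct demo_job *job)
{
    assert(job->outq_bytes == 0);        /* earlier output has drained */
    job->outq_ptr = job->scoop_buf;      /* the queue aliases the scoop */
    job->outq_bytes = job->scoop_avail;
    job->scoop_avail = 0;                /* reset; the bytes stay in place */
}

/* Copy as much queued output as fits into out[0..cap).
 * Returns the number of bytes copied. */
size_t demo_drain(struct demo_job *job, char *out, size_t cap)
{
    size_t n = job->outq_bytes < cap ? job->outq_bytes : cap;
    if (n > 0) {
        memcpy(out, job->outq_ptr, n);
        job->outq_ptr += n;
        job->outq_bytes -= n;
    }
    return n;
}

/* The scoop may be refilled, and the state function run again, only
 * once nothing queued still points into the scoop. */
int demo_can_refill(const struct demo_job *job)
{
    return job->outq_bytes == 0;
}
```

In this model the only two copies are the one that put the bytes into scoop_buf and the memcpy in demo_drain.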

How much readahead is required?

At the moment (??) our rollsum and MD4 routines require a full
contiguous block to calculate a checksum. This could be relaxed, at a
possible loss of efficiency.

So calculating block checksums requires one full block to be in

When applying a patch, we only need enough readahead to unpack the

When calculating a delta, we need a full block to calculate its
checksum, plus space for the missed data. We can accumulate any
amount of missed data before emitting it as a literal; the more we can
accumulate the more compact the encoding will be.
The contents of this structure are private.
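
The delta readahead trade-off above can be put into numbers with a small sketch. Both helpers are hypothetical, and the model is deliberately crude: it assumes missed data is emitted as one literal each time the miss space fills, and it ignores how literal commands are actually encoded.

```c
#include <assert.h>
#include <stddef.h>

/* Buffer needed while computing a delta: one full block for the
 * checksum, plus `miss_cap` bytes of space for missed data. */
size_t demo_delta_buffer(size_t block_len, size_t miss_cap)
{
    return block_len + miss_cap;
}

/* Number of literal commands needed to emit `missed` bytes when at most
 * `miss_cap` bytes can be accumulated before a literal must be flushed. */
size_t demo_literal_count(size_t missed, size_t miss_cap)
{
    assert(miss_cap > 0);
    return (missed + miss_cap - 1) / miss_cap;  /* ceiling division */
}
```

With 10000 missed bytes, for instance, 1024 bytes of miss space yields 10 literals while 4096 bytes yields 3, so a larger miss space means fewer commands in the output at the cost of a larger buffer.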