Jump to content
Sign in to follow this  
DarkClaw

BLOCK_DOWNLOAD_WINDOW

Recommended Posts

/** Size of the "block download window": how far ahead of our current height do we fetch?
* Larger windows tolerate larger download speed differences between peer, but increase the potential
* degree of disordering of blocks on disk (which make reindexing and pruning harder). We'll probably
* want to make this a per-peer adaptive value at some point. */

static const unsigned int BLOCK_DOWNLOAD_WINDOW = 1024; 

https://github.com/xaya/xaya/blob/2e25715a3a0e7abd8951313b410810d9f3fc4893/src/validation.h

Quote

If you are parsing the blk.dat files, be aware that blocks are not going to be in order. For example, you may encounter blocks in this order as you run through the file:


A B C E F D

This is because your bitcoin node will download blocks in parallel to download the blockchain as quickly as possible. Your node will download blocks further ahead of the current one as it goes, instead of waiting to receive each block in order.

The maximum distance ahead your node will fetch from (or the "maximum out-of-orderness") is controlled by BLOCK_DOWNLOAD_WINDOW in the bitcoin source code.

I have whats probably another crypto dev 101 question. Does this mean if I am parsing the blk*.dat files, I only need to look +/- 1024 blocks to find a block with the previous hash corresponding to the current hash? Ie, if no hash is found in that window then it is an orphan? Currently the biggest offset I see is 67 blocks...

Also, is it possible for me to somehow ensure I only download the blocks in order even if that means my chain is slightly (eg 5-10 minutes) out of date?

Edited by DarkClaw

Share this post


Link to post
Share on other sites

To be honest, I'm not an expert myself about the networking part of the core daemon, so my answer may not be 100% accurate.  But I agree with your understanding of the download window - it should mean that you will (at least typically) find a block with a matching "hash link" within 1024 blocks.  One exception is if there was a reorg longer than 1024 blocks, of course, but that is very unlikely.

I'm not aware of a way to force download in order, but you can of course adapt the block window and rebuild the client.  Please note that that can never ensure a fully linear download, though - even in practice, you will from time to time get at least short reorgs (mostly orphan blocks).  And they surely mess up the block order a bit, no matter what you do.

I don't know what other projects parsing the raw blk*.dat files do, but one potential approach is to do what the core daemon itself does:  Read everything in the order it is, build up an index (blockhash to where on disk it is) and an in-memory graph that represents the blockchain without data (just block hashes and pointers to previous blocks).  Then, you can find the longest chain in order from your graph and load the block data for it using the index.

Share this post


Link to post
Share on other sites

Thanks. That makes sense, I'll have to check out how the index is being built too. It seems like there should be a name for this sorting problem and some common algorithms people use...

Share this post


Link to post
Share on other sites

I don't think you need any special "sorting algorithm" for this.  Take a look at CBlockIndex in src/chain.h of the daemon's code.  Basically, you need to build a graph where each node holds:  Block hash, position on disk (which blk*.dat file and at which offset), chain work (which is what actually determines the "longest" chain) and the pointer to the previous node.  Also keep track of all the "tip" nodes that are not yet the previous one of another (e.g. in std::set).  That should be all you need.

Share this post


Link to post
Share on other sites

I'll check it out. How I was doing it is checking hash[idx] == prev_hash[idx + 1], then if it was false compare hash[idx] to all prev_hash[j] (j in 1:nBlock). If they are all false its an orphan, if not I would store the idx and j indices, eg I have a table like this (this is just partial, so they may not all pair up):

idx_hash  idx_prev_hash
17312     17314
17313        NA
17353        NA
17377        NA
17413        NA
17653        NA
17743        NA
17753        NA
17778        NA
18142        NA
18199        NA
18540        NA
18556        NA
18690     18727
18726     18728
18727     18729
18728     18730
18729     18731
18730     18732
18731     18733
18732     18734
18733     18735
18734     18736
18735     18737
18736     18738
18737     18739
18738     18740
18739     18741
18740     18742
18741     18743
18742     18744
18743     18745
18744     18746
18745     18747
18746     18748
18747     18749
18748     18750
18749     18751
18750     18752
18751     18753
18752     18754
18753     18755
18754     18756
18755     18757
18756     18758
18757     18691
18775        NA

Using the first row, I'd resort the blocks as:

1:idx_hash[1], idx_prev_hash[1], ((idx_hash[1] + 1):nBlocks)[-idx_prev_hash[1]]

And then I'd recalculate the table (which should be 2 rows shorter now) and keep doing that until there are only orphans left. This solution is not very efficient and incomplete since it will fail if there is an orphan chain (rather than just a single block), amongst other issues. It is more a prototype to help me wrap my head around the problem. But anyway you can see why I was thinking of "sorting algorithms", I'm sure in the end I will end up using the existing method.

Edit:

I can't write

hash[i]

in the main text, its being read as an italics tag. Not a big deal but itd be cool to have an "inline code" tag to prevent this. Maybe it exists already...

Edited by DarkClaw

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

×