This post is a short overview of what is required for a minimal, download-only BitTorrent implementation.
Earlier this year, I wrote a BitTorrent client as an excuse to practice concurrency and networking concepts. But the resources and documentation I found while researching the protocol felt scattered, so I’m distilling my understanding here as a starting point for others.
1. Parse the .torrent metainfo file
The .torrent file contains information about the torrent tracker and the files to be downloaded.
The data is encoded in a serialization format called bencoding.
Parsing bencoded data is not significantly harder than parsing JSON, and there is likely a bencoding library available for your language.
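To give a sense of how small the format is, here is a minimal decoder sketch. It assumes the four standard bencoded types (integers, byte strings, lists, dictionaries) and does only light validation; the names `bdecode` and `_decode` are made up for this example, and a real client would probably use an existing library instead.

```python
def bdecode(data: bytes):
    """Decode a single bencoded value that spans all of `data`."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Return (value, index just past the value), starting at offset i."""
    lead = data[i:i + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                      # list: l<item><item>...e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")
```

Keys and strings come back as `bytes`, not `str`, because fields such as the piece hashes are raw binary data.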
2. Connect to the tracker
To join the torrent's swarm, the client makes an HTTP GET request to the tracker's announce URL, identifying the torrent by its info hash. The bencoded response provides a list of available peers.
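As a rough sketch of this step, the code below builds an announce URL and decodes a peer list. It assumes the tracker supports the "compact" peer format (6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port); some trackers return a list of dictionaries instead. The function names, the default port 6881, and the example URL and peer ID in the usage note are placeholders, not anything the spec mandates.

```python
import hashlib
import socket
import struct
from urllib.parse import urlencode


def build_announce_url(announce: str, info_bytes: bytes, peer_id: bytes,
                       left: int, port: int = 6881) -> str:
    """Build the tracker GET URL.

    `info_bytes` is assumed to be the exact bencoded bytes of the
    metainfo file's `info` dictionary; its SHA-1 digest is the info hash.
    """
    params = {
        "info_hash": hashlib.sha1(info_bytes).digest(),
        "peer_id": peer_id,        # 20 bytes chosen by the client
        "port": port,              # port this client would listen on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,              # bytes still to download
        "compact": 1,              # ask for the 6-bytes-per-peer format
    }
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)


def parse_compact_peers(blob: bytes):
    """Turn a compact peer string into a list of (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[offset:offset + 4])
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((ip, port))
    return peers
```

For example, `build_announce_url("http://tracker.example/announce", info_bytes, b"-XX0001-123456789012", left=total_size)` yields a URL that can be fetched with any HTTP client; the response body is bencoded, and its `peers` value is what `parse_compact_peers` expects.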
3. Concurrent peer network connections
The client connects to peers over TCP sockets. To support multiple simultaneous connections, the client should be able to handle network operations asynchronously. There are two fundamental ways to do this in Python: (1) using threads, or (2) using an event loop built on select() (or a library like Twisted, which does so internally).
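To illustrate the second option, here is a minimal sketch of one step of a select()-based loop. It assumes the sockets are already connected; `read_available` is a hypothetical helper name, and a real client would also track writable sockets and keep a receive buffer per peer.

```python
import select
import socket


def read_available(socks, timeout=1.0, bufsize=4096):
    """Wait up to `timeout` seconds and read from every readable socket.

    Returns a dict mapping each ready socket to the bytes received.
    An empty bytes value means that peer closed the connection.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return {sock: sock.recv(bufsize) for sock in readable}
```

Calling this repeatedly from a single thread lets the client service many peers without blocking on any one of them: sockets with nothing to say are simply absent from the result.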
4. Peer protocol
The spec defines a number of messages that each peer must be prepared to send and receive. A minimal BitTorrent client may not need to implement all of these messages. In order to start downloading from a peer, a client needs to send a handshake, wait for a handshake response, send an ‘interested’ message, and wait for an ‘unchoke’ message. It can then start sending ‘request’ messages to request blocks. The peer will respond with ‘piece’ messages which contain the block data.
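The wire format for these messages is compact enough to sketch in a few functions. The message IDs and the 68-byte handshake layout below follow the spec; the function names are made up for this example, and `parse_message` assumes the caller accumulates received bytes in a buffer.

```python
import struct

PROTOCOL = b"BitTorrent protocol"

# Message IDs used in this minimal flow.
UNCHOKE = 1
INTERESTED = 2
REQUEST = 6
PIECE = 7


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<pstrlen><pstr><8 reserved bytes><info_hash><peer_id>, 68 bytes total."""
    return bytes([len(PROTOCOL)]) + PROTOCOL + bytes(8) + info_hash + peer_id


def build_message(msg_id: int, payload: bytes = b"") -> bytes:
    """<4-byte big-endian length><1-byte id><payload>."""
    return struct.pack("!IB", 1 + len(payload), msg_id) + payload


def build_request(index: int, begin: int, length: int) -> bytes:
    """Request `length` bytes starting at offset `begin` of piece `index`."""
    return build_message(REQUEST, struct.pack("!III", index, begin, length))


def parse_message(buf: bytes):
    """Split one message off the front of `buf`.

    Returns (msg_id, payload, remaining_bytes), or None if the buffer
    does not yet hold a complete message. A keep-alive (length 0) is
    reported with msg_id None.
    """
    if len(buf) < 4:
        return None
    (length,) = struct.unpack("!I", buf[:4])
    if len(buf) < 4 + length:
        return None
    if length == 0:
        return None, b"", buf[4:]
    return buf[4], buf[5:4 + length], buf[4 + length:]
```

After the handshake exchange, the client would send `build_message(INTERESTED)`, wait until `parse_message` yields `UNCHOKE`, and then send `build_request(...)` messages. Each `PIECE` payload starts with the piece index and the offset within the piece (4 bytes each), followed by the block data.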
5. Torrent strategy
The client must download all blocks of all pieces and assemble them into the complete output file set. If any peers disconnect or fail to provide a block, the client must request from another peer. A more ambitious client may also attempt to further optimize its download strategy to improve download times.
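A simple way to organize this bookkeeping is sketched below. It assumes the commonly used 16 KiB block size and relies on the SHA-1 piece hashes stored in the metainfo file; `BlockQueue` and the other names are hypothetical, and the peer identifiers can be any hashable value.

```python
import hashlib
from collections import deque

BLOCK_SIZE = 16384  # 16 KiB, the block size most clients request


def blocks_for_piece(index, piece_length, total_length, block_size=BLOCK_SIZE):
    """List the (piece_index, begin, length) requests covering one piece.

    The final piece of a torrent may be shorter than `piece_length`.
    """
    size = min(piece_length, total_length - index * piece_length)
    return [(index, begin, min(block_size, size - begin))
            for begin in range(0, size, block_size)]


def piece_is_valid(data: bytes, expected_sha1: bytes) -> bool:
    """Check an assembled piece against its 20-byte hash from the metainfo."""
    return hashlib.sha1(data).digest() == expected_sha1


class BlockQueue:
    """Tracks which blocks are still needed and which peer was asked for each."""

    def __init__(self, blocks):
        self.todo = deque(blocks)
        self.in_flight = {}              # block -> peer it was requested from

    def next_for(self, peer):
        """Hand out the next block to request from `peer`, or None if none left."""
        if not self.todo:
            return None
        block = self.todo.popleft()
        self.in_flight[block] = peer
        return block

    def received(self, block):
        """Mark a block as delivered."""
        self.in_flight.pop(block, None)

    def peer_lost(self, peer):
        """Put every block still owed by `peer` back in the queue."""
        lost = [b for b, p in self.in_flight.items() if p == peer]
        for block in lost:
            del self.in_flight[block]
            self.todo.append(block)
```

When a peer disconnects or times out, calling `peer_lost` makes its outstanding blocks available to the remaining peers. Smarter strategies, such as choosing which piece to fetch next, could be layered on top by changing the order in which blocks enter the queue.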
Further reading
I found the following blog posts to be very helpful when I was getting started:
How to Write a Bittorrent Client, part 1 and part 2 (Kristen Widman)
Pitfalls when creating a BitTorrent client (Erick Rivas)
The best advice I picked up from them is (1) to rely on the unofficial BitTorrent spec, and (2) to use Wireshark to inspect network traffic to clarify ambiguities in the spec and to validate your implementation.
There are now many extensions to the original BitTorrent protocol, so you should stick with .torrent files that do not use new or experimental features for your testing. I have had good luck with torrents from archive.org and bt.etree.org.
Good luck!