Getting Bitcoin historical data

I've found that gathering historical trade data is not as simple as it sounds. Many exchanges limit their historical data to only a few days, while others charge a fee for it. Considering we're here to make money and not lose it, that is not acceptable.

Luckily, I've found that the Kraken exchange does everything we want it to. Kraken is one of the more popular cryptocurrency exchanges, and its API lets us fetch the full trade history at per-trade granularity. This is much better than the OHLC (open-high-low-close) data that is usually available: from individual trades we can calculate the volume-weighted average price (VWAP), which represents price movement more accurately than OHLC data.
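
To make the VWAP idea concrete, here is a minimal sketch of how it could be computed from per-trade data. The function name `vwap_by_bucket`, the one-hour bucket size and the sample trades are all made up for illustration; they are not part of the original script.

```python
from collections import defaultdict


def vwap_by_bucket(trades, bucket_seconds=3600):
    """Return {bucket_start: VWAP} for (timestamp, price, volume) trades.

    Hypothetical helper: VWAP = sum(price * volume) / sum(volume),
    computed separately for each time bucket.
    """
    notional = defaultdict(float)  # running sum of price * volume per bucket
    volume = defaultdict(float)    # running sum of volume per bucket
    for timestamp, price, vol in trades:
        bucket = int(timestamp // bucket_seconds) * bucket_seconds
        notional[bucket] += price * vol
        volume[bucket] += vol
    # skip buckets with zero traded volume to avoid dividing by zero
    return {b: notional[b] / volume[b] for b in sorted(volume) if volume[b] > 0}


# Made-up trades, for illustration only: (timestamp, price, volume)
sample_trades = [
    (0, 100.0, 1.0),
    (10, 110.0, 3.0),
    (3600, 200.0, 2.0),
]
hourly_vwap = vwap_by_bucket(sample_trades)
```

Note how the larger trade at 110 pulls the first bucket's VWAP towards it, which a plain average of the two prices would not capture.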

Kraken does, however, have a call rate limit, but we can easily stay under it by taking our time gathering the data.

To start, we set the URL of Kraken's API and set the market variable to Bitcoin. We will also keep a checkpoint file, so we can continue where we left off:

    BASE_URL = ""
    CHECKPOINT_FILE = "last_fetched.pickle"

The API allows us to fetch only the trades that occurred after a trade with a specified trade ID. The function that requests the data is as follows:

import requests

def fetch_data(timestamp):
    # ask Kraken for the trades on MARKET that happened after `timestamp`
    response = requests.get(BASE_URL + "Trades?pair={}&since={}".format(MARKET, timestamp))
    return response.json()

The core of the script is a while loop which tries to fetch the data from Kraken, orders the columns as [timestamp, price, volume] and appends them to a CSV file. It will look something like this:

while True:
        # fetch the data
        print("Fetching from {}..".format(anytime(int(last_timestamp))))
        response = fetch_data(last_trade_id)

        # check for errors
        if len(response['error']) != 0:
            print("Response error:", response['error'])
            save_checkpoint(last_timestamp, last_trade_id)
        if len(response['result']) <= 1:

The data comes in the form of a list of lists, each entry containing information about a single trade.

[Image: Bitcoin historical data]

We can easily load that into a pandas DataFrame and get rid of the information we do not need.

            # get the data into a dataframe
            df = pd.DataFrame(response['result'][MARKET])

            # get rid of unnecessary columns
            df = df[df.columns[0:3]]

            # set column names
            df.columns = ["price", "volume", "timestamp"]

            # update variables for the checkpoint
            last_trade_id = response['result']['last']
            first_timestamp = df.timestamp.min()
            last_timestamp = df.timestamp.max()

            print("Fetched from: {} -- to: {}".format(anytime(first_timestamp), anytime(last_timestamp)))

Now we append the data to a .csv file and wait a bit.

            # save data to csv, columns ordered as [timestamp, price, volume]
            df[["timestamp", "price", "volume"]].to_csv("./kraken.csv", mode='a', header=False, index=False)

            # sleep to avoid breaking API call limit
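
The waiting step could be sketched roughly like this. The helper name `wait_for_next_call` and the two-second default interval are assumptions for illustration only; pick an interval that fits Kraken's documented limits for your account.

```python
import time


def wait_for_next_call(last_call, min_interval=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Sleep just long enough that API calls are at least `min_interval` apart.

    Hypothetical helper. `last_call` is the clock reading taken at the
    previous request; the return value is the reading to pass in next time.
    """
    elapsed = clock() - last_call
    remaining = max(0.0, min_interval - elapsed)
    if remaining > 0:
        sleep(remaining)
    return clock()
```

Inside the loop this would be used as `last_call = wait_for_next_call(last_call)` right before each `fetch_data` call. Compared to a fixed sleep, it only waits for whatever part of the interval has not already been spent processing the previous batch.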

We also need a function to save our progress to the checkpoint file:

def save_checkpoint(timestamp, trade_id):
    with open(CHECKPOINT_FILE, "wb") as f_out:
        pickle.dump([timestamp, trade_id], f_out)
    print("Checkpoint saved!")
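
The counterpart, reading the checkpoint back when the script starts, might look like the sketch below. `load_checkpoint` is a hypothetical name, and the default of `(0, 0)` assumes that starting from 0 means "fetch from the very beginning"; adjust it if the API expects something else.

```python
import os
import pickle

CHECKPOINT_FILE = "last_fetched.pickle"


def load_checkpoint(path=CHECKPOINT_FILE, default=(0, 0)):
    """Return (timestamp, trade_id) from the checkpoint file.

    Hypothetical helper: falls back to `default` when no checkpoint exists
    yet, i.e. on the very first run.
    """
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f_in:
        # the checkpoint stores the pair [timestamp, trade_id]
        timestamp, trade_id = pickle.load(f_in)
    return timestamp, trade_id
```

Calling `last_timestamp, last_trade_id = load_checkpoint()` before entering the loop would then resume from the last saved position.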

And that's it! You can find the complete code here.
It might take a while, but all the available data from the Kraken exchange will be downloaded and saved, along with the last trade ID (so we can update the dataset in the future). If the program crashes, the last trade ID will already be saved and we will be able to continue where we left off.