A dataset (sample) of CEX-DEX transactions on the Ethereum mainnet identified by a set of heuristics implemented on Dune Analytics, including the information of the block, the transaction, and the DEX trade(s) executed by the transaction.
The dataset is curated by Fei and Danning, and is part of the TLDR 2024 fellowship program. Special Thanks to Tivas for helping organize this repository.
The dataset is now provided with a 1-day sample. For the entire-range dataset and more details of our work, stay in tune for this repo and the TLDR Conference 2025.
date_range: 2023-08-08 to 2025-03-08
size: ~4 GB
for the entire time range.
structure: [8,668,130, 20]
(8,668,130 rows, 20 columns)
source: Dune
blockchain: Ethereum
variables:
block_number
[Integer]: Block numberblock_time
[DateTime]: UTC timestamp of the blocktx_hash
[String]: Transaction Hashtx_index
[Integer]: Index of the transaction within the blockfrom_addr
[String]: Address sending the transactionto_addr
[String]: Address receiving the transactionmev_bot_label
[String]: Label identifying the MEV bot (if applicable)base_fees
[Float]: Base fees paid for the transactionpriority_fees
[Float]: Tips or priority fees paid to the buildercb_transfer
[Float]: Coinbase transfer paid to the buildermev_value
[Float]: priorityfees + cbtransfervolume
[Float]: USD notional volume of the tradetoken_bought_amount
[Float]: Amount of token boughttoken_sold_amount
[Float]: Amount of token soldtoken_bought_contract
[String]: ERC-20 contract address of the token boughttoken_sold_contract
[String]: ERC-20 contract address of the token soldtoken_bought_symbol
[String]: Symbol of the token boughttoken_sold_symbol
[String]: Symbol of the token soldpair
[String]: Token trading pairmulti_trade
[Integer]: Flag indicating if the trade includes multiple swaps (0 or 1)other_notes: Transactions with identifiable MEV bot interactions are labeled. Dataset may include rows with <nil>
in MEV bot label.
The data pipeline of this project consists of two scripts, one to pull CEX-DEX transaction data from Dune and one to get CEX prices for tokens covered in the dataset during this time range.
The structure of the data from the Dune query output can be seen in the table above.
To pull the dataset from Dune, you can use the Dune API endpoints with the Pagination feature to divide the dataset into downloadable .csv files.
A python script dune_api.py
for pulling the CEX-DEX dataset using Pagination from Dune API endpoints is provided.
After pulling the CEX-DEX dataset from Dune, you can check the tokens' CEX prices at the block_time
or different markouts.
In our project, we use Binance Spot historical quote data from Tardis. For details about the Tardis data, check the Tardis documentation.
We here provide a Python script parsing_tardis_data.py
for parsing data of token price quoted in USDT on Binance Spot from Tardis. To run this script, add a valid TARDIS API key and data directory into lines 36 and 37.