dexanalytics
DEX Analytics
Back to TLDR Datasets

Analysis of CEX/DEX Arbitrage

Authors

Fei Wu

Formats

.csv

Resources

A dataset (sample) of CEX-DEX transactions on the Ethereum mainnet identified by a set of heuristics implemented on Dune Analytics, including the information of the block, the transaction, and the DEX trade(s) executed by the transaction.

The dataset is curated by Fei and Danning, and is part of the TLDR 2024 fellowship program. Special Thanks to Tivas for helping organize this repository.

About the CEX-DEX dataset

The dataset is now provided with a 1-day sample. For the entire-range dataset and more details of our work, stay in tune for this repo and the TLDR Conference 2025.

date_range: 2023-08-08 to 2025-03-08

size: ~4 GB for the entire time range.

structure: [8,668,130, 20] (8,668,130 rows, 20 columns)

source: Dune

blockchain: Ethereum

variables:

  • block_number [Integer]: Block number
  • block_time [DateTime]: UTC timestamp of the block
  • tx_hash [String]: Transaction Hash
  • tx_index [Integer]: Index of the transaction within the block
  • from_addr [String]: Address sending the transaction
  • to_addr [String]: Address receiving the transaction
  • mev_bot_label [String]: Label identifying the MEV bot (if applicable)
  • base_fees [Float]: Base fees paid for the transaction
  • priority_fees [Float]: Tips or priority fees paid to the builder
  • cb_transfer [Float]: Coinbase transfer paid to the builder
  • mev_value [Float]: priorityfees + cbtransfer
  • volume [Float]: USD notional volume of the trade
  • token_bought_amount [Float]: Amount of token bought
  • token_sold_amount [Float]: Amount of token sold
  • token_bought_contract [String]: ERC-20 contract address of the token bought
  • token_sold_contract [String]: ERC-20 contract address of the token sold
  • token_bought_symbol [String]: Symbol of the token bought
  • token_sold_symbol [String]: Symbol of the token sold
  • pair [String]: Token trading pair
  • multi_trade [Integer]: Flag indicating if the trade includes multiple swaps (0 or 1)

other_notes: Transactions with identifiable MEV bot interactions are labeled. Dataset may include rows with <nil> in MEV bot label.

Data Pipeline of our project

The data pipeline of this project consists of two scripts, one to pull CEX-DEX transaction data from Dune and one to get CEX prices for tokens covered in the dataset during this time range.

Pulling the CEX-DEX dataset using Dune API

The structure of the data from the Dune query output can be seen in the table above.

To pull the dataset from Dune, you can use the Dune API endpoints with the Pagination feature to divide the dataset into downloadable .csv files.

A python script dune_api.py for pulling the CEX-DEX dataset using Pagination from Dune API endpoints is provided.

Binance price from Tardis

After pulling the CEX-DEX dataset from Dune, you can check the tokens' CEX prices at the block_time or different markouts.

In our project, we use Binance Spot historical quote data from Tardis. For details about the Tardis data, check the Tardis documentation.

We here provide a Python script parsing_tardis_data.py for parsing data of token price quoted in USDT on Binance Spot from Tardis. To run this script, add a valid TARDIS API key and data directory into lines 36 and 37.

Want more flexible access to DEX data? Sign up for the Explorer waitlist and we'll reach out to you!