5-Second Bitcoin OHLCV Analysis: Our First Kaggle Notebook
Today, we created our first Kaggle notebook for analyzing historical Bitcoin data in OHLCV format at 5-second granularity.
We walk through our Python code to cover three points: how to analyze historical Bitcoin OHLCV data at 5-second granularity, how to verify data continuity, and how to effectively visualize the market using a dedicated Kaggle notebook.
- Loading historical Bitcoin OHLCV data
- Timestamp continuity check
- Bitcoin price visualization (Close)
- High-frequency OHLC graph
- SMA 50 and SMA 200 indicators
- OHLCV Correlation Matrix
- Good practices in Python, clean code...
Before we begin, you can find Bitcoin's historical OHLCV data at this address:
Kaggle - Historical Bitcoin Data OHLCV
And the link to the notebook:
Kaggle - Notebook
A brief preamble:
In Python, to work with historical data in CSV format, we use two libraries:
Pandas, for easily manipulating and analyzing data.
Matplotlib, for displaying the data graphically.
To go further, examples for these libraries are provided:
Pandas: find examples of its use here.
Matplotlib: examples can be found here.
Other resources on the internet explain the capabilities of these libraries much better than I can. While they produce excellent results, I'm more of a Power BI fan for data visualization. But today, it's Python!
So we'll start at the beginning,
1. Load historical Bitcoin OHLCV data every 5 seconds
The notebook begins by loading historical Bitcoin (BTC/USDT) data in semicolon-separated CSV format, with the classic OHLCV + QuoteVolume columns:
- Timestamp
- Open: Opening price for the period
- High: The highest price reached during the period
- Low: The lowest point reached during the period
- Close: The closing price
- Volume: The quantity of asset traded (crypto)
- QuoteVolume: The total quantity in USDT exchanged
def main():
    df = pd.read_csv(
        "BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp'])
    print(df.head())
The parse_dates parameter automatically converts the Timestamp column into datetime objects.
Our CSV file is now loaded.
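As a quick sanity check after loading, it is worth confirming that the Timestamp column was actually parsed as datetimes and that all seven columns arrived. A minimal sketch, using a small inline stand-in for the real CSV (the values below are made up for illustration):

```python
import io
import pandas as pd

# Synthetic stand-in for the real semicolon-separated CSV (values are invented)
csv_text = (
    "Timestamp;Open;High;Low;Close;Volume;QuoteVolume\n"
    "2025-01-01 00:00:00;93500;93510;93495;93505;0.8;74804\n"
    "2025-01-01 00:00:05;93505;93520;93500;93515;1.2;112218\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['Timestamp'])

# Timestamp should be a datetime64 column and all 7 columns should be present
print(df.dtypes)
print(df.shape)
```

If parse_dates were omitted, Timestamp would come back as a plain object (string) column and every later time-based operation would silently misbehave.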
2. Verification of timestamp continuity
It is important to verify data continuity when working with historical data. Without it, analyses, backtests, or quantitative models can be flawed.
In our Python code, we created a function using the pandas library to perform this check on the dataset:
def missingTimestamp(df):
    df = df.sort_values('Timestamp')
    expected_freq = pd.to_timedelta("5s")
    actual_gaps = df['Timestamp'].diff().value_counts()
    print()
    print("Most common timestamp gaps:")
    print(actual_gaps)
    if len(actual_gaps) == 1 and actual_gaps.index[0] == expected_freq:
        print("✅ Dataset is perfectly continuous at 5-second intervals.")
    else:
        print("❌ Irregular gaps detected.")
If everything goes well, the expected result should be the following: 
In our case:
- 100% of the intervals are 5 seconds
- No holes, no anomalies
- The historical data is perfectly clean
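If the check fails, the next question is where the holes are. A minimal sketch of locating them, shown here on a synthetic four-row series with one deliberate gap (the real dataset would take its place):

```python
import pandas as pd

# Synthetic 5-second series with one deliberate hole: 00:00:15 is missing
ts = pd.to_datetime([
    "2025-01-01 00:00:00", "2025-01-01 00:00:05",
    "2025-01-01 00:00:10", "2025-01-01 00:00:20",
])
df = pd.DataFrame({"Timestamp": ts}).sort_values("Timestamp")

# Rows whose gap to the previous row exceeds 5 seconds sit just after a hole
gap_mask = df["Timestamp"].diff() > pd.to_timedelta("5s")
holes = df.loc[gap_mask, "Timestamp"]
print(holes)
```

Each timestamp printed is the first row after a hole, which pinpoints exactly where data is missing.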
3. Bitcoin price visualization (Close)
We displayed the evolution of the "Close" price at 5-second intervals using Matplotlib in our function:
def visualizingPrice(df):
    plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)
    plt.title("Bitcoin Close Price (5-Second Intervals)")
    plt.xlabel("Time")
    plt.ylabel("Price (USDT)")
    plt.show()
Simply replace "Close" with "Open", "High" or "Low" in the first line of the function to display different data:
plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)
Visualizing the evolution of the "Close" price at 5-second intervals using Matplotlib:
This chart provides a first look at the market at high resolution (5s).
4. OHLC (pseudo-candlestick) visualization
Next, we represented the OHLC structure on the first 2000 lines using High-Low shading and a Close line:
def candlestick(df):
    sample = df.iloc[:2000].copy()
    plt.plot(sample['Timestamp'], sample['Close'], label='Close', linewidth=0.7)
    plt.fill_between(sample['Timestamp'], sample['Low'], sample['High'],
                     alpha=0.5, color='red', label='High-Low range')
    plt.title("Bitcoin OHLC Visualization - First 2000 Intervals")
    plt.xlabel("Time")
    plt.ylabel("Price")
    plt.legend()
    plt.show()
We get a nice graph:
This type of visualization is very useful for:
- identifying areas of high volatility
- identifying rapid price behavior
- visualizing the intra-structural evolution of the market
At this level of granularity, we obtain an almost "microstructural" view.
5. Moving averages: SMA 50 and SMA 200
We then add two of the most commonly used indicators in trading:
- SMA 50 for fast trends
- SMA 200 for long-term trends.
Key point:
The moving averages here are calculated directly on 5-second data, which is not standard practice. These indicators are generally applied to longer-duration candlesticks to obtain a less noisy and more representative view of actual market conditions (most experienced traders use the 50 and 200 SMAs over longer periods).
However, visualizing moving averages on 5-second data remains interesting for analyzing high-frequency price dynamics.
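For that more conventional view, the 5-second bars can first be aggregated into larger candles with pandas' resample before computing the SMA. A sketch on a synthetic, steadily rising price series standing in for the real data:

```python
import numpy as np
import pandas as pd

# Synthetic 5-second close prices over 10 minutes (a simple rising series)
idx = pd.date_range("2025-01-01", periods=120, freq="5s")
df = pd.DataFrame({"Timestamp": idx,
                   "Close": 93000 + np.arange(120, dtype=float)})

# Aggregate the 5-second bars into 1-minute closes, then apply the SMA
m1 = df.set_index("Timestamp")["Close"].resample("1min").last()
sma5 = m1.rolling(5).mean()  # a 5-period SMA on 1-minute candles
print(sma5.tail())
```

The same .resample(...) call with .first(), .max(), .min(), and .sum() would rebuild full 1-minute OHLCV candles from the 5-second data.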
Our Python function:
def movingAverage(df):
    sample = df.iloc[:2000].copy()
    sample['SMA50'] = sample['Close'].rolling(50).mean()
    sample['SMA200'] = sample['Close'].rolling(200).mean()
    plt.plot(sample['Timestamp'], sample['Close'], linewidth=0.7, label='Close')
    plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
    plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')
    plt.legend()
    plt.title("Close Price with Moving Averages")
    plt.show()
To compare more horizons, the same pattern extends to intermediate windows:
sample['SMA50'] = sample['Close'].rolling(50).mean()
sample['SMA100'] = sample['Close'].rolling(100).mean()
sample['SMA150'] = sample['Close'].rolling(150).mean()
sample['SMA200'] = sample['Close'].rolling(200).mean()
plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
plt.plot(sample['Timestamp'], sample['SMA100'], label='SMA 100')
plt.plot(sample['Timestamp'], sample['SMA150'], label='SMA 150')
plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')

The chart allows visualization of crossovers, trends, and consolidation phases.
This is also an excellent starting point for developing a simple backtest (SMA crossover). We will see how to create a crossover SMA backtest in a future article.
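Without preempting that article, the crossover points themselves can already be detected in a few lines. A minimal sketch on a synthetic V-shaped price series (not the real dataset), using a fast/slow pair of 5 and 15 periods for compactness:

```python
import numpy as np
import pandas as pd

# Synthetic close series: a decline then a rally, so the fast SMA crosses up
close = pd.Series(np.concatenate([np.linspace(100.0, 90.0, 30),
                                  np.linspace(90.0, 110.0, 30)]))
fast = close.rolling(5).mean()
slow = close.rolling(15).mean()

# Sign of the spread: +1 when the fast SMA is above the slow one, -1 below
position = np.sign(fast - slow)
# A non-zero change in that sign marks a crossover bar
crossovers = position.diff().fillna(0) != 0
print(close.index[crossovers].tolist())
```

On the real 5-second data, the same mask would flag every golden/death cross of the SMA 50 and SMA 200.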
6. OHLCV Correlation Matrix
Finally, we calculate the correlation matrix:
def corrMatrix(df):
    corr = df[['Open', 'High', 'Low', 'Close', 'Volume', 'QuoteVolume']].corr()
    print()
    print("Correlation matrix:")
    print(corr)

This analysis helps us understand:
- natural correlations between Open/Close
- the impact of volume on movements
- if structural patterns appear
- the relationship between volatility and volume
This is a first step towards a more in-depth quantitative analysis.
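The printed matrix can also be turned into a heatmap with Matplotlib alone. A sketch using a synthetic OHLCV frame as a stand-in for the real dataset (imshow plus labeled ticks, no extra libraries needed):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; not needed on Kaggle
import matplotlib.pyplot as plt

# Synthetic OHLCV frame standing in for the real dataset
rng = np.random.default_rng(0)
close = 93000 + rng.normal(0, 50, 500).cumsum()
df = pd.DataFrame({
    "Open": close + rng.normal(0, 5, 500),
    "High": close + 10.0,
    "Low": close - 10.0,
    "Close": close,
    "Volume": rng.random(500),
})
df["QuoteVolume"] = df["Volume"] * df["Close"]

corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
plt.title("OHLCV Correlation Matrix")
plt.tight_layout()
plt.show()
```

The color scale makes the near-perfect Open/High/Low/Close correlations and the weaker volume relationships visible at a glance.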
7. Let's not forget Python best practices
As with compiled applications, it is recommended to use a main() function:
def main():
    df = pd.read_csv(
        "/kaggle/input/bitcoin-historical-data-ohlcv-5-second-interval/BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp']
    )
    print(df.head())
    missingTimestamp(df)
    candlestick(df)
    movingAverage(df)
    corrMatrix(df)

if __name__ == "__main__":
    main()
And at the top of the notebook, the imports and global Matplotlib settings:
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 7)
plt.rcParams['axes.grid'] = True
Conclusion
This first Kaggle notebook demonstrates how easily high-frequency data can be used to:
- explore the market microstructure
- understand price dynamics
- develop strategies
- test models
- prepare quantitative analyses
Thanks to the 5-second OHLCV format, we have access to a level of detail rarely available publicly. Ideal for backtesting Grid or Scalping strategies on this price history.
Thank you for reading, back to your notebooks :)