
5-Second Bitcoin OHLCV Analysis: Our First Kaggle Notebook

Today, we created our first Kaggle notebook for analyzing historical Bitcoin data in OHLCV format at 5-second granularity.
We walk through our Python code to cover three points: how to analyze historical Bitcoin OHLCV data at 5-second granularity, how to verify data continuity, and how to effectively visualize the market using a dedicated Kaggle notebook.

  • Loading historical Bitcoin OHLCV data
  • Timestamp continuity check
  • Bitcoin price visualization (Close)
  • High-frequency OHLC graph
  • SMA 50 and SMA 200 indicators
  • OHLCV Correlation Matrix
  • Python best practices, clean code...


Before we begin, you can find Bitcoin's historical OHLCV data at this address:
Kaggle - Historical Bitcoin Data OHLCV
And the link to the notebook:
Kaggle - Notebook


A brief preamble:

In Python, to work with historical data in CSV format, we use two libraries:
Pandas, for easily manipulating and analyzing data.
Matplotlib, for displaying the data graphically.

To go further, usage examples for these libraries are provided:
Pandas: find examples of its use here.
Matplotlib examples can be found here.

Other resources on the internet explain the capabilities of these libraries much better than I can. While they produce excellent results, I'm more of a Power BI fan for data visualization. But today, it's Python!

So we'll start at the beginning,

1. Load historical Bitcoin OHLCV data every 5 seconds

The notebook begins by loading historical Bitcoin (BTC/USDT) data in semicolon-separated CSV format, with the classic OHLCV + QuoteVolume columns:

  • Timestamp
  • Open: Opening price for the period
  • High: The highest price reached during the period
  • Low: The lowest point reached during the period
  • Close: The closing price
  • Volume: The quantity of asset traded (crypto)
  • QuoteVolume: The total quantity in USDT exchanged
 def main():
    df = pd.read_csv(
        "BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp'])

    print(df.head())

The parse_dates parameter automatically converts the Timestamp column into datetime objects.
Our CSV file is now loaded.
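To check that the parsing worked as described, here is a minimal sketch using a hypothetical two-row sample in the same semicolon-separated layout (the values are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample mimicking the semicolon-separated layout
csv_text = (
    "Timestamp;Open;High;Low;Close;Volume;QuoteVolume\n"
    "2025-01-01 00:00:00;93500;93510;93495;93505;0.8;74804\n"
    "2025-01-01 00:00:05;93505;93520;93500;93515;1.1;102866\n"
)

df = pd.read_csv(StringIO(csv_text), sep=';', parse_dates=['Timestamp'])
print(df['Timestamp'].dtype)  # datetime64[ns]: parsing succeeded
```

If parse_dates were omitted, the Timestamp column would stay a plain object (string) column and time-based operations would fail.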

2. Verification of timestamp continuity

It is important to verify data continuity when working with historical data. Without it, analyses, backtests, or quantitative models can be flawed.

In our Python code, we created a function using the pandas library to facilitate this search from the dataset:

 def missingTimestamp(df):
    df = df.sort_values('Timestamp')
    expected_freq = pd.to_timedelta("5s")
    actual_gaps = df['Timestamp'].diff().value_counts()
    print()
    print("Most common timestamp gaps:")
    print(actual_gaps)
    if len(actual_gaps) == 1 and actual_gaps.index[0] == expected_freq:
        print("✅ Dataset is perfectly continuous at 5-second intervals.")
    else:
        print("⚠ Irregular gaps detected.")

If everything goes well, the expected result should be the following:

In our case:
👉 100% of the intervals are 5 seconds
👉 No holes, no anomalies
👉 The historical data is perfectly clean
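If irregular gaps were detected, the next step would be locating them. A minimal sketch, using a hypothetical series where one 5-second step (00:00:10) is missing:

```python
import pandas as pd

# Hypothetical timestamps with one missing 5-second step (00:00:10 absent)
ts = pd.to_datetime([
    "2025-01-01 00:00:00",
    "2025-01-01 00:00:05",
    "2025-01-01 00:00:15",
    "2025-01-01 00:00:20",
])
df = pd.DataFrame({'Timestamp': ts})

gaps = df['Timestamp'].diff()  # NaT on the first row, excluded by the filter
irregular = df.loc[gaps > pd.to_timedelta("5s"), 'Timestamp']
print(irregular)  # the row(s) arriving more than 5 s after the previous one
```

Each flagged timestamp is the first row after a hole, which tells you exactly where data is missing.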

3. Bitcoin price visualization (Close)

We displayed the evolution of the "Close" price at 5-second intervals using Matplotlib in our function:

 def visualizingPrice(df):
    plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)
    plt.title("Bitcoin Close Price (5-Second Intervals)")
    plt.xlabel("Time")
    plt.ylabel("Price (USDT)")
    plt.show()

Simply replace "Close" with "Open", "High" or "Low" in the first line of the function to display different data:

  plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)

Visualizing the evolution of the "Close" price at 5-second intervals using Matplotlib:

This chart provides a first look at the market at high resolution (5s).

4. OHLC (pseudo-candlestick) visualization

Next, we represented the OHLC structure on the first 2000 lines using High-Low shading and a Close line:

 def candlestick(df):
    sample = df.iloc[:2000].copy()

    plt.plot(sample['Timestamp'], sample['Close'], label='Close', linewidth=0.7)
    plt.fill_between(sample['Timestamp'], sample['Low'], sample['High'],
                     alpha=0.5, color='red', label='High-Low range')

    plt.title("Bitcoin OHLC Visualization - First 2000 Intervals")
    plt.xlabel("Time")
    plt.ylabel("Price")
    plt.legend()
    plt.show()
🔵 The blue curve represents the Close
🔴 The red zone represents the High-Low volatility

We get a nice graph:
This type of visualization is very useful for:
  • identifying areas of high volatility
  • spotting rapid price behavior
  • visualizing the fine-grained structure of the market

At this level of granularity, we obtain an almost "microstructural" view.
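One simple way to quantify the volatility the red zone shows is the per-interval High-Low spread. A minimal sketch, using a hypothetical mini-frame whose columns follow the dataset described above ('Range' is a name we introduce for illustration):

```python
import pandas as pd

# Hypothetical mini-frame; columns follow the dataset described above
df = pd.DataFrame({
    'High': [93510.0, 93540.0, 93520.0],
    'Low':  [93495.0, 93500.0, 93515.0],
})

# Per-interval High-Low spread: a simple microstructure volatility proxy
df['Range'] = df['High'] - df['Low']
print(df['Range'].tolist())  # [15.0, 40.0, 5.0]
```

Plotting this Range column over time highlights the same volatility bursts as the shaded chart, but as a single series that is easy to threshold or aggregate.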

5. Moving averages: SMA 50 and SMA 200

We then add two of the most commonly used indicators in trading:

  • SMA 50 for fast trends
  • SMA 200 for long-term trends.

👉 Key point:
The moving averages here are calculated directly on 5-second data, which is not standard practice. To obtain a less noisy and more representative view of actual market conditions (most experienced traders use the 50 and 200 SMAs over longer periods), these indicators are generally applied to longer-duration candlesticks.
However, visualizing moving averages on 5-second data remains interesting for analyzing high-frequency price dynamics.
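As a sketch of that standard practice, 5-second bars can be aggregated into longer candlesticks with pandas resample before computing the SMAs. The frame below is hypothetical (two minutes of constant toy values), just to show the aggregation rules:

```python
import pandas as pd

# Hypothetical two minutes of 5-second bars (24 rows of constant toy values)
idx = pd.date_range("2025-01-01", periods=24, freq="5s")
df = pd.DataFrame({'Open': 1.0, 'High': 2.0, 'Low': 0.5,
                   'Close': 1.5, 'Volume': 0.1}, index=idx)

# Aggregate 5-second bars into 1-minute candlesticks before computing SMAs
ohlc_1m = df.resample('1min').agg({'Open': 'first', 'High': 'max',
                                   'Low': 'min', 'Close': 'last',
                                   'Volume': 'sum'})
print(len(ohlc_1m))  # 2 one-minute candles
```

The aggregation rules mirror the OHLCV definitions: first Open, max High, min Low, last Close, summed Volume. SMAs computed on ohlc_1m['Close'] would then match the longer-period usage most traders expect.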

Our Python function:

 def movingAverage(df):
    sample = df.iloc[:2000].copy()
    sample['SMA50'] = sample['Close'].rolling(50).mean()
    sample['SMA200'] = sample['Close'].rolling(200).mean()
    plt.plot(sample['Timestamp'], sample['Close'], linewidth=0.7, label='Close')
    plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
    plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')
    plt.legend()
    plt.title("Close Price with Moving Averages")
    plt.show()

You can add and modify different moving averages yourself.
The sample variable holds the first 2000 rows, and rolling(...).mean() computes each moving average over its window.
The plt.plot calls display the computed series.

It is of course possible to add more:
  sample['SMA50'] = sample['Close'].rolling(50).mean()
  sample['SMA100'] = sample['Close'].rolling(100).mean()
  sample['SMA150'] = sample['Close'].rolling(150).mean()
  sample['SMA200'] = sample['Close'].rolling(200).mean()

By displaying them:
  plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
  plt.plot(sample['Timestamp'], sample['SMA100'], label='SMA 100')
  plt.plot(sample['Timestamp'], sample['SMA150'], label='SMA 150')
  plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')

In our example, we calculated only 2 moving averages on the first 2000 points:

The chart allows visualization of crossovers, trends, and consolidation phases.

This is also an excellent starting point for developing a simple backtest (SMA crossover). We will see how to create a crossover SMA backtest in a future article.
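As a minimal, hypothetical sketch of the crossover idea (not the backtest itself, which we leave for that future article), a False to True transition of "fast SMA above slow SMA" marks a bullish cross. The series and window lengths below are toy values:

```python
import pandas as pd

# Hypothetical close series where a short SMA crosses above a longer one
close = pd.Series([10, 10, 10, 10, 20, 30, 40, 50, 60, 70], dtype=float)
fast = close.rolling(2).mean()
slow = close.rolling(4).mean()

above = fast > slow                                     # NaN compares as False
crossovers = above & ~above.shift(1, fill_value=False)  # False -> True edges
print(crossovers[crossovers].index.tolist())  # [4]: the bullish cross
```

A backtest would then enter on each such cross and exit on the opposite transition; the same edge-detection pattern works for the SMA 50 / SMA 200 pair on the real data.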

6. OHLCV Correlation Matrix

Finally, we calculate the correlation matrix:

 def corrMatrix(df):
    corr = df[['Open', 'High', 'Low', 'Close', 'Volume', 'QuoteVolume']].corr()
    print()
    print("Correlation matrix:")
    print(corr)

This matrix measures how each column moves in relation to the others: the OHLC price columns on one side, and the volume columns (Volume, QuoteVolume) on the other:

This analysis helps us understand:

  • natural correlations between Open/Close
  • the impact of volume on price movements
  • whether structural patterns appear
  • the relationship between volatility and volume

This is a first step towards a more in-depth quantitative analysis.

7. Let's not forget Python best practices

As with compiled applications, it is recommended to use a main() function:

 def main():
    df = pd.read_csv(
        "/kaggle/input/bitcoin-historical-data-ohlcv-5-second-interval/BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp']
    )
    print(df.head())
    missingTimestamp(df)
    candlestick(df)
    movingAverage(df)
    corrMatrix(df)

 if __name__ == "__main__":
    main()
And, at the very beginning of our script, we import the libraries our program depends on:

 import pandas as pd
 import matplotlib.pyplot as plt

 plt.rcParams['figure.figsize'] = (14, 7)
 plt.rcParams['axes.grid'] = True

🔗 To go further
You can find the entire notebook directly on our Kaggle space or on our dedicated "source code" GitHub page.

Conclusion

This first Kaggle notebook demonstrates how easily high-frequency data can be used to:

  • explore the market microstructure
  • understand price dynamics
  • develop strategies
  • test models
  • prepare quantitative analyses

Thanks to the 5-second OHLCV format, we have access to a level of detail rarely available publicly. Ideal for backtesting Grid or Scalping strategies on this price history.

Thank you for reading, back to your notebooks :)
