
5-Second Bitcoin OHLCV Analysis: Our First Kaggle Notebook

Today, we created our first Kaggle notebook for analyzing historical Bitcoin data in OHLCV format at 5-second granularity.
We walk through our Python code to cover three points: how to analyze historical Bitcoin OHLCV data at 5-second granularity, how to verify data continuity, and how to effectively visualize the market using a dedicated Kaggle notebook.

  • Loading historical Bitcoin OHLCV data
  • Timestamp continuity check
  • Bitcoin price visualization (Close)
  • High-frequency OHLC graph
  • SMA 50 and SMA 200 indicators
  • OHLCV Correlation Matrix
  • Python best practices, clean code...


Before we begin, you can find Bitcoin's historical OHLCV data at this address:
Kaggle - Historical Bitcoin Data OHLCV
And the link to the notebook:
Kaggle - Notebook


A brief preamble:

In Python, to work with historical data in CSV format, we use two libraries:
Pandas, for easily manipulating and analyzing data.
Matplotlib, for displaying the data graphically.

To go further, usage examples for these libraries are provided:
Pandas: find examples of its use here.
Matplotlib examples can be found here.

Other resources on the internet explain the capabilities of these libraries much better than I can. While they produce excellent results, I'm more of a Power BI fan for data visualization. But today, it's Python!

So we'll start at the beginning,

1. Load historical Bitcoin OHLCV data every 5 seconds

The notebook begins by loading historical Bitcoin (BTC/USDT) data in semicolon-separated CSV format, with the classic OHLCV + QuoteVolume columns:

  • Timestamp
  • Open: Opening price for the period
  • High: The highest price reached during the period
  • Low: The lowest point reached during the period
  • Close: The closing price
  • Volume: The quantity of asset traded (crypto)
  • QuoteVolume: The total quantity in USDT exchanged
 def main():
    df = pd.read_csv(
        "BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp'])

    print(df.head())

The parse_dates parameter automatically converts the Timestamp column into datetime objects.
Our CSV file is now loaded.
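To check that the parsing worked as described, here is a minimal sketch using a hypothetical two-row sample in the same semicolon-separated layout (the values are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample mimicking the semicolon-separated layout
csv_text = (
    "Timestamp;Open;High;Low;Close;Volume;QuoteVolume\n"
    "2025-01-01 00:00:00;93500;93510;93495;93505;0.8;74804\n"
    "2025-01-01 00:00:05;93505;93520;93500;93515;1.1;102866\n"
)

df = pd.read_csv(StringIO(csv_text), sep=';', parse_dates=['Timestamp'])
print(df['Timestamp'].dtype)  # datetime64[ns]: parsing succeeded
```

If parse_dates were omitted, the Timestamp column would stay a plain object (string) column and time-based operations would fail.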

2. Verification of timestamp continuity

It is important to verify data continuity when working with historical data. Without it, analyses, backtests, or quantitative models can be flawed.

In our Python code, we created a function using the pandas library to facilitate this search from the dataset:

 def missingTimestamp(df):
    df = df.sort_values('Timestamp')
    expected_freq = pd.to_timedelta("5s")
    actual_gaps = df['Timestamp'].diff().value_counts()
    print()
    print("Most common timestamp gaps:")
    print(actual_gaps)
    if len(actual_gaps) == 1 and actual_gaps.index[0] == expected_freq:
        print("✅ Dataset is perfectly continuous at 5-second intervals.")
    else:
        print("⚠ Irregular gaps detected.")

If everything goes well, the expected result should be the following:

In our case:
👉 100% of the intervals are 5 seconds
👉 No holes, no anomalies
👉 The historical data is perfectly clean
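If irregular gaps were detected, the next step would be locating them. A minimal sketch, using a hypothetical series where one 5-second step (00:00:10) is missing:

```python
import pandas as pd

# Hypothetical timestamps with one missing 5-second step (00:00:10 absent)
ts = pd.to_datetime([
    "2025-01-01 00:00:00",
    "2025-01-01 00:00:05",
    "2025-01-01 00:00:15",
    "2025-01-01 00:00:20",
])
df = pd.DataFrame({'Timestamp': ts})

gaps = df['Timestamp'].diff()  # NaT on the first row, excluded by the filter
irregular = df.loc[gaps > pd.to_timedelta("5s"), 'Timestamp']
print(irregular)  # the row(s) arriving more than 5 s after the previous one
```

Each flagged timestamp is the first row after a hole, which tells you exactly where data is missing.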

3. Bitcoin price visualization (Close)

We displayed the evolution of the "Close" price at 5-second intervals using Matplotlib in our function:

 def visualizingPrice(df):
    plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)
    plt.title("Bitcoin Close Price (5-Second Intervals)")
    plt.xlabel("Time")
    plt.ylabel("Price (USDT)")
    plt.show()

Simply replace "Close" with "Open", "High" or "Low" in the first line of the function to display different data:

  plt.plot(df['Timestamp'], df['Close'], linewidth=0.7)

Visualizing the evolution of the "Close" price at 5-second intervals using Matplotlib:

This chart provides a first look at the market at high resolution (5s).

4. OHLC (pseudo-candlestick) visualization

Next, we represented the OHLC structure on the first 2000 lines using High-Low shading and a Close line:

 def candlestick(df):
    sample = df.iloc[:2000].copy()

    plt.plot(sample['Timestamp'], sample['Close'], label='Close', linewidth=0.7)
    plt.fill_between(sample['Timestamp'], sample['Low'], sample['High'],
                     alpha=0.5, color='red', label='High-Low range')

    plt.title("Bitcoin OHLC Visualization - First 2000 Intervals")
    plt.xlabel("Time")
    plt.ylabel("Price")
    plt.legend()
    plt.show()
🔵 The blue curve represents the Close
🔴 The red zone represents the High-Low volatility

We get a nice graph:
This type of visualization is very useful for:
  • identifying areas of high volatility
  • spotting rapid price behavior
  • visualizing the fine-grained structure of the market

At this level of granularity, we obtain an almost "microstructural" view.
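One simple way to quantify the volatility the red zone shows is the per-interval High-Low spread. A minimal sketch, using a hypothetical mini-frame whose columns follow the dataset described above ('Range' is a name we introduce for illustration):

```python
import pandas as pd

# Hypothetical mini-frame; columns follow the dataset described above
df = pd.DataFrame({
    'High': [93510.0, 93540.0, 93520.0],
    'Low':  [93495.0, 93500.0, 93515.0],
})

# Per-interval High-Low spread: a simple microstructure volatility proxy
df['Range'] = df['High'] - df['Low']
print(df['Range'].tolist())  # [15.0, 40.0, 5.0]
```

Plotting this Range column over time highlights the same volatility bursts as the shaded chart, but as a single series that is easy to threshold or aggregate.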

5. Moving averages: SMA 50 and SMA 200

We then add two of the most commonly used indicators in trading:

  • SMA 50 for fast trends
  • SMA 200 for long-term trends.

👉 Key point:
The moving averages here are calculated directly on 5-second data, which is not standard practice. To obtain a less noisy and more representative view of actual market conditions (most experienced traders use the 50 and 200 SMAs over longer periods), these indicators are generally applied to longer-duration candlesticks.
However, visualizing moving averages on 5-second data remains interesting for analyzing high-frequency price dynamics.
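As a sketch of that standard practice, 5-second bars can be aggregated into longer candlesticks with pandas resample before computing the SMAs. The frame below is hypothetical (two minutes of constant toy values), just to show the aggregation rules:

```python
import pandas as pd

# Hypothetical two minutes of 5-second bars (24 rows of constant toy values)
idx = pd.date_range("2025-01-01", periods=24, freq="5s")
df = pd.DataFrame({'Open': 1.0, 'High': 2.0, 'Low': 0.5,
                   'Close': 1.5, 'Volume': 0.1}, index=idx)

# Aggregate 5-second bars into 1-minute candlesticks before computing SMAs
ohlc_1m = df.resample('1min').agg({'Open': 'first', 'High': 'max',
                                   'Low': 'min', 'Close': 'last',
                                   'Volume': 'sum'})
print(len(ohlc_1m))  # 2 one-minute candles
```

The aggregation rules mirror the OHLCV definitions: first Open, max High, min Low, last Close, summed Volume. SMAs computed on ohlc_1m['Close'] would then match the longer-period usage most traders expect.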

Our Python function:

 def movingAverage(df):
    sample = df.iloc[:2000].copy()
    sample['SMA50'] = sample['Close'].rolling(50).mean()
    sample['SMA200'] = sample['Close'].rolling(200).mean()
    plt.plot(sample['Timestamp'], sample['Close'], linewidth=0.7, label='Close')
    plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
    plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')
    plt.legend()
    plt.title("Close Price with Moving Averages")
    plt.show()

You can add and modify different moving averages yourself.
The sample variable holds the first 2000 rows, and rolling(...).mean() computes each moving average over its window.
The plt.plot calls display the computed series.

It is of course possible to add more:
  sample['SMA50'] = sample['Close'].rolling(50).mean()
  sample['SMA100'] = sample['Close'].rolling(100).mean()
  sample['SMA150'] = sample['Close'].rolling(150).mean()
  sample['SMA200'] = sample['Close'].rolling(200).mean()

By displaying them:
  plt.plot(sample['Timestamp'], sample['SMA50'], label='SMA 50')
  plt.plot(sample['Timestamp'], sample['SMA100'], label='SMA 100')
  plt.plot(sample['Timestamp'], sample['SMA150'], label='SMA 150')
  plt.plot(sample['Timestamp'], sample['SMA200'], label='SMA 200')

In our example, we calculated only 2 moving averages on the first 2000 points:

The chart allows visualization of crossovers, trends, and consolidation phases.

This is also an excellent starting point for developing a simple backtest (SMA crossover). We will see how to create a crossover SMA backtest in a future article.
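As a minimal, hypothetical sketch of the crossover idea (not the backtest itself, which we leave for that future article), a False to True transition of "fast SMA above slow SMA" marks a bullish cross. The series and window lengths below are toy values:

```python
import pandas as pd

# Hypothetical close series where a short SMA crosses above a longer one
close = pd.Series([10, 10, 10, 10, 20, 30, 40, 50, 60, 70], dtype=float)
fast = close.rolling(2).mean()
slow = close.rolling(4).mean()

above = fast > slow                                     # NaN compares as False
crossovers = above & ~above.shift(1, fill_value=False)  # False -> True edges
print(crossovers[crossovers].index.tolist())  # [4]: the bullish cross
```

A backtest would then enter on each such cross and exit on the opposite transition; the same edge-detection pattern works for the SMA 50 / SMA 200 pair on the real data.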

6. OHLCV Correlation Matrix

Finally, we calculate the correlation matrix:

 def corrMatrix(df):
    corr = df[['Open', 'High', 'Low', 'Close', 'Volume', 'QuoteVolume']].corr()
    print()
    print("Correlation matrix:")
    print(corr)

This matrix measures how each column moves in relation to the others: the OHLC price columns on one side, and the volume columns (Volume, QuoteVolume) on the other:

This analysis helps us understand:

  • natural correlations between Open/Close
  • the impact of volume on price movements
  • whether structural patterns appear
  • the relationship between volatility and volume

This is a first step towards a more in-depth quantitative analysis.

7. Let's not forget Python best practices

As with compiled applications, it is recommended to use a main() function:

 def main():
    df = pd.read_csv(
        "/kaggle/input/bitcoin-historical-data-ohlcv-5-second-interval/BTCUSDT_2025_01_01-07_avg5s.csv",
        sep=';',
        parse_dates=['Timestamp']
    )
    print(df.head())
    missingTimestamp(df)
    candlestick(df)
    movingAverage(df)
    corrMatrix(df)

 if __name__ == "__main__":
    main()
And, at the very beginning of our script, we import the libraries our program depends on:

 import pandas as pd
 import matplotlib.pyplot as plt

 plt.rcParams['figure.figsize'] = (14, 7)
 plt.rcParams['axes.grid'] = True

🔗 To go further
You can find the entire notebook directly on our Kaggle space or on our dedicated "source code" GitHub page.

Conclusion

This first Kaggle notebook demonstrates how easily high-frequency data can be used to:

  • explore the market microstructure
  • understand price dynamics
  • develop strategies
  • test models
  • prepare quantitative analyses

Thanks to the 5-second OHLCV format, we have access to a level of detail rarely available publicly. Ideal for backtesting Grid or Scalping strategies on this price history.

Thank you for reading, back to your notebooks :)
