AI Mole

How to start downloading stock market data for finance with machine learning

In today's fast-paced financial landscape, machine learning is a promising approach to decision-making in the stock market. However, before diving into predictive analytics and algorithmic trading, one must first master the foundational step: acquiring reliable and comprehensive stock market data. In this introductory guide, we'll explore how to get started with fetching stock market data using Python, laying the groundwork for subsequent machine learning applications in finance.

First, create a Conda environment for downloading financial data from Yahoo Finance and processing it. You can follow these steps:

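  • Create a new Conda environment: Open a terminal or command prompt and create a new Conda environment using the following command:
conda create --name finance_env python=3.8

Replace finance_env with your desired environment name. If your shell has not been set up for Conda yet, run the following command once and then restart the terminal so that conda activate works:

conda init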

  • Activate the environment: Activate the newly created environment with the following command:
conda activate finance_env
  • Install necessary packages: You can install the required packages for working with financial data. Typically, you'll need packages like pandas, numpy, matplotlib, requests, and yfinance for fetching data from Yahoo Finance. You can install them using the following command:
conda install pandas numpy matplotlib requests

Then install yfinance using pip:

pip install yfinance
  • Verify installations: After installation, you can verify if everything is installed correctly by running Python in your terminal and importing the required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import yfinance as yf
  • Start working: Now you're all set to work with financial data. You can use yfinance to fetch stock prices and history from Yahoo Finance, and use other libraries for data processing, visualization, and analysis.

  • Remember to deactivate the environment when you're done working:

conda deactivate

And activate it again when you want to work:

conda activate finance_env

By following these steps, you'll have a Conda environment set up specifically for working with financial data from Yahoo Finance.
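As a complement to the import check above, the installed versions can be listed without importing each library. The following is a minimal sketch that uses only the standard library module importlib.metadata; the helper name installed_versions and the REQUIRED_PACKAGES list are illustrative choices, not part of any of the packages.

```python
from importlib import metadata

# Packages installed in the steps above (names as used by pip/conda).
REQUIRED_PACKAGES = ["pandas", "numpy", "matplotlib", "requests", "yfinance"]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated environment. A package reported as NOT INSTALLED usually means the install command was executed in a different environment.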

To download historical stock price data for a given ticker, you can use the yfinance library within a Python script. Here's a sample code snippet to download historical data for Stellantis (ticker symbol: STLA) for a specific time period:

import yfinance as yf

def download_stock_data(ticker, start_date, end_date):
    """
    Downloads historical stock price data for the given ticker symbol
    between the specified start and end dates.
    
    Args:
    - ticker (str): Ticker symbol of the stock (e.g., "STLA" for Stellantis).
    - start_date (str): Start date in YYYY-MM-DD format.
    - end_date (str): End date in YYYY-MM-DD format.
    
    Returns:
    - pandas.DataFrame: DataFrame containing historical stock price data.
    """
    # Download data
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    
    return stock_data

if __name__ == "__main__":
    # Define ticker symbol and date range
    ticker_symbol = "STLA"
    start_date = "2023-01-01"
    end_date = "2024-01-01"
    
    # Download data
    stock_data = download_stock_data(ticker_symbol, start_date, end_date)
    
    # Display first few rows of the downloaded data
    print(stock_data.head())
	

This script defines a function download_stock_data that takes the ticker symbol (ticker), start date (start_date), and end date (end_date) as inputs. It uses yfinance to download historical stock price data within the specified date range.

In the if __name__ == "__main__": block, you can specify the ticker symbol (ticker_symbol) and the start and end dates for the data you want to download. Running this script will download the historical stock price data for Stellantis between the specified dates and display the first few rows of the downloaded data.

Before running the script, make sure you have installed the required packages (including yfinance) in your Conda environment as described above.
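Because the dates are passed as plain strings, a typo or a swapped start and end date is easy to miss. The sketch below checks the two strings before any download is attempted. The helper parse_date_range is a hypothetical name introduced here for illustration. As far as the yfinance documentation describes it, the end date is exclusive, so the last returned row is expected to fall before end_date; treat that as something to verify against your installed version.

```python
from datetime import datetime


def parse_date_range(start_date, end_date):
    """Parse two YYYY-MM-DD strings and check that start comes before end.

    Raises ValueError if a string cannot be parsed with the YYYY-MM-DD
    pattern or if the start date is not earlier than the end date.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start >= end:
        raise ValueError(f"start_date {start} must be earlier than end_date {end}")
    return start, end


if __name__ == "__main__":
    start, end = parse_date_range("2023-01-01", "2024-01-01")
    print(f"Requesting {(end - start).days} calendar days of data")
```

One way to use it is to call parse_date_range at the top of download_stock_data, so that a bad date fails immediately with a clear message instead of producing an empty or unexpected result.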

You can easily convert the downloaded stock data, which is already in a Pandas DataFrame format, into a NumPy array if needed. Here's how you can do it:

import numpy as np
import pandas as pd
import yfinance as yf

def download_stock_data(ticker, start_date, end_date):
    """
    Downloads historical stock price data for the given ticker symbol
    between the specified start and end dates.
    
    Args:
    - ticker (str): Ticker symbol of the stock (e.g., "STLA" for Stellantis).
    - start_date (str): Start date in YYYY-MM-DD format.
    - end_date (str): End date in YYYY-MM-DD format.
    
    Returns:
    - pandas.DataFrame: DataFrame containing historical stock price data.
    """
    # Download data
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    
    return stock_data

if __name__ == "__main__":
    # Define ticker symbol and date range
    ticker_symbol = "STLA"
    start_date = "2023-01-01"
    end_date = "2024-01-01"
    
    # Download data
    stock_data = download_stock_data(ticker_symbol, start_date, end_date)
    
    # Convert DataFrame to NumPy array
    stock_data_numpy = stock_data.to_numpy()
    
    # Convert DataFrame to Pandas DataFrame (Optional, since it's already in this format)
    stock_data_pandas = pd.DataFrame(stock_data)
    
    # Display the first few rows of the NumPy array and Pandas DataFrame
    print("NumPy array:")
    print(stock_data_numpy[:5])  # Display first 5 rows
    print("\n")
    
    print("Pandas DataFrame:")
    print(stock_data_pandas.head())  # Display first few rows

In this code, after downloading the stock data into a Pandas DataFrame (stock_data), we convert it into a NumPy array using the to_numpy() method. Optionally, we also convert it back to a Pandas DataFrame (stock_data_pandas), although it's already in this format.

This way, you'll have the historical stock price data from Yahoo Finance available both as a NumPy array and a Pandas DataFrame, giving you flexibility in data processing.
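One thing to keep in mind is that to_numpy() returns only the values: the dates in the index and the column names are not part of the array. The sketch below illustrates this on a small hand-made DataFrame that stands in for downloaded data (the prices are made up for illustration), and shows one possible way to keep the labels alongside the array and compute simple daily returns.

```python
import numpy as np
import pandas as pd

# Small hand-made frame standing in for downloaded data (illustrative values only).
dates = pd.date_range("2023-01-02", periods=4, freq="B")
prices = pd.DataFrame(
    {"Open": [10.0, 10.5, 11.0, 10.8], "Close": [10.4, 11.0, 10.9, 11.2]},
    index=dates,
)

values = prices.to_numpy()            # 2-D array of numbers; labels are gone
column_names = list(prices.columns)   # keep these to know what each column means
row_dates = prices.index.to_numpy()   # one datetime64 value per row

# Pick the closing prices by position and compute day-over-day simple returns.
close = values[:, column_names.index("Close")]
daily_returns = np.diff(close) / close[:-1]

print(values.shape)
print(daily_returns)
```

Keeping column_names and row_dates next to the array makes it possible to map a model's output back to a specific date and price field later on.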

To visualize the historical stock price data in candlestick form, you can use the mplfinance library, which is specifically designed for financial data visualization, including candlestick charts. First, make sure you have mplfinance installed in your Conda environment:

pip install mplfinance

Then, you can modify the previous script to include visualization:

import yfinance as yf
import mplfinance as mpf

def download_stock_data(ticker, start_date, end_date):
    """
    Downloads historical stock price data for the given ticker symbol
    between the specified start and end dates.
    
    Args:
    - ticker (str): Ticker symbol of the stock (e.g., "STLA" for Stellantis).
    - start_date (str): Start date in YYYY-MM-DD format.
    - end_date (str): End date in YYYY-MM-DD format.
    
    Returns:
    - pandas.DataFrame: DataFrame containing historical stock price data.
    """
    # Download data
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    
    return stock_data

if __name__ == "__main__":
    # Define ticker symbol and date range
    ticker_symbol = "STLA"
    start_date = "2023-01-01"
    end_date = "2024-01-01"
    
    # Download data
    stock_data = download_stock_data(ticker_symbol, start_date, end_date)
    
    # Visualize data in candlestick form
    mpf.plot(stock_data, type='candle', style='charles', volume=True)

In this modified script, after downloading the stock data into a Pandas DataFrame (stock_data), we use mplfinance to plot the candlestick chart directly from the DataFrame using the mpf.plot() function. The type='candle' argument specifies that we want to visualize the data as candlestick charts, and style='charles' specifies a predefined style for the chart. volume=True adds a volume subplot to the chart, displaying trading volume along with the price data.

This script will display a candlestick chart with trading volume for the historical stock price data of Stellantis within the specified date range. Adjust the date range and ticker symbol as needed.
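mplfinance expects a DataFrame with a DatetimeIndex and columns named Open, High, Low and Close (plus Volume when volume=True). Depending on the installed yfinance version, the downloaded frame may carry two-level column labels (price field and ticker) even for a single ticker, in which case the plot call can fail. The sketch below is one possible way to normalize the frame first. The helper prepare_for_mplfinance is a hypothetical name, and the assumption that the first column level holds the price field names should be checked against your own data.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]


def prepare_for_mplfinance(stock_data):
    """Return a copy with flat OHLCV columns and a DatetimeIndex.

    Intended for single-ticker data. Raises ValueError if a required
    column is missing.
    """
    data = stock_data.copy()
    # Assumption: with two-level columns, level 0 holds the price field
    # names ("Open", "Close", ...) and level 1 holds the ticker symbol.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    missing = [name for name in REQUIRED_COLUMNS if name not in data.columns]
    if missing:
        raise ValueError(f"Missing columns for a candlestick chart: {missing}")
    data.index = pd.to_datetime(data.index)
    return data[REQUIRED_COLUMNS]
```

With this helper, the plotting line becomes mpf.plot(prepare_for_mplfinance(stock_data), type='candle', style='charles', volume=True). If the frame already has flat columns, the helper simply returns the five expected columns in a fixed order.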

Next time we'll see how it is possible to work with dividends and to download a batch of data for multiple tickers.