Algorithmic Alchemy: Turning Math into Market Magic with Python — Part 1
Can AI take a very basic mathematical principle and yield a 184% profit trading stocks? YES! Yes it can!
Most people approach the stock market with gut feelings and hot tips. I prefer Python and probability.
As an attorney turned programmer, I wanted to see if I could build a trading system based purely on mathematical models. Could I remove emotion from the equation and automate the process?
This article details my journey into algorithmic trading, from reading Investopedia to building a basic trading bot with Alpaca. It’s a practical exploration of how to code, backtest, and optimize a simple trading strategy — all while having some fun along the way.
Let’s crank this up to 11 and have some fun!
Let’s turn these theories into action! For this experiment, I’m using Alpaca’s free paper-trading platform. It’s a fantastic sandbox for testing trading strategies without risking real money. Plus, with access to stocks, commodities, currencies, and even crypto, the possibilities for experimentation are endless.
Let’s get coding! We’ll fetch historical price data from Alpaca using their REST API. (You’ll need an Alpaca account and paper trading API keys, which you can generate in the dashboard.) We’ll then convert this raw data into a usable pandas DataFrame. I’m assuming you’re familiar with making API calls in Python and working with DataFrames.
Here’s the code:
```python
import pandas as pd

def convert_to_dataframe(results: dict) -> pd.DataFrame:
    """Convert Alpaca's raw bars payload into a pandas DataFrame."""
    bars_df = pd.DataFrame(results.get('bars', []))
    if not bars_df.empty:
        # Map Alpaca's terse field names to readable column names
        column_mapping = {
            'c': 'close',
            'o': 'open',
            'h': 'high',
            'l': 'low',
            'v': 'volume',
            'n': 'trade_count',
            'vw': 'vwap',
            't': 'date'
        }
        bars_df = bars_df.rename(columns=column_mapping)
        bars_df['next_page_token'] = results.get('next_page_token')
        bars_df['symbol'] = results.get('symbol')
    return bars_df
```
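To see the shape of payload this function expects, here's a quick standalone run with an invented `results` dict (the field names follow the `column_mapping` above; the numbers themselves are made up for illustration):

```python
import pandas as pd

# Minimal copy of convert_to_dataframe so this sketch runs on its own.
def convert_to_dataframe(results: dict) -> pd.DataFrame:
    bars_df = pd.DataFrame(results.get('bars', []))
    if not bars_df.empty:
        bars_df = bars_df.rename(columns={
            'c': 'close', 'o': 'open', 'h': 'high', 'l': 'low',
            'v': 'volume', 'n': 'trade_count', 'vw': 'vwap', 't': 'date',
        })
        bars_df['next_page_token'] = results.get('next_page_token')
        bars_df['symbol'] = results.get('symbol')
    return bars_df

# Hand-made payload shaped like Alpaca's bars response (values invented).
sample = {
    'bars': [
        {'t': '2023-01-03T05:00:00Z', 'o': 130.28, 'h': 130.90,
         'l': 124.17, 'c': 125.07, 'v': 112117500, 'n': 1021000, 'vw': 125.73},
        {'t': '2023-01-04T05:00:00Z', 'o': 126.89, 'h': 128.66,
         'l': 125.08, 'c': 126.36, 'v': 89113600, 'n': 770000, 'vw': 126.65},
    ],
    'symbol': 'AAPL',
    'next_page_token': None,
}

df = convert_to_dataframe(sample)
print(df[['date', 'open', 'close', 'symbol']])
```

One row per bar, with the cryptic one-letter keys expanded into readable column names.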
We’ll build our trading strategy on a simplified model using basic probability.
Think of each day’s stock movement as a coin flip: either a gain (heads) or a loss (tails), with an initial 50/50 probability. Now, imagine flipping a coin multiple times and getting several heads in a row. Intuitively, you know that even with a fair coin, there’s no guarantee that a series of heads will simply keep coming up forever. The longer the streak of heads, the more likely that, at some point, tails will reappear.
Our strategy works in much the same way. Two gains in a row are the equivalent of two consecutive heads. For a fair coin, the probability of three heads in a row is 0.5³ = 12.5%, so under this model each successive gain makes a continued streak look less likely and a reversal look more likely. If we were to bet, the model says to bet on tails, the opposite outcome. Each successive gain decreases the modeled probability of another gain and increases the modeled probability of a loss, and vice versa.
This lets us define gain_threshold and loss_threshold parameters to buy low and sell high: we buy when consecutive losses have pushed gain_probability (the model's confidence in an upward reversal) above our gain_threshold, and we sell when consecutive gains have pushed loss_probability above our loss_threshold.
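Those streak probabilities are just powers of one half, which a couple of lines make concrete:

```python
# Probability of a fair coin landing the same way n times in a row: 0.5 ** n.
streaks = {n: 0.5 ** n for n in range(1, 6)}
for n, p in streaks.items():
    # The model's "bet on the reversal" confidence is the complement, 1 - p.
    print(f"{n} in a row: {p:.5f}  (reversal confidence: {1 - p:.5f})")
```

Three in a row is 12.5%, five in a row is about 3.1%, and the reversal confidence climbs toward 100% as the streak lengthens.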
First, the code, then the explanation:
```python
def get_gain_or_loss_probability(df: pd.DataFrame, gain_threshold: float,
                                 loss_threshold: float) -> pd.DataFrame:
    df = df.reset_index(drop=True)  # ensure positional .at[] access works
    bought = False
    gain_probability = 0.5
    loss_probability = 0.5
    df['gain_probability'] = 0.5
    df['loss_probability'] = 0.5
    df['indication'] = 'do not buy'
    for i in range(1, len(df)):
        current_close = df.iloc[i]['close']
        prior_close = df.iloc[i - 1]['close']
        if current_close > prior_close:
            # A gain: halve the probability of the gain streak continuing
            gain_probability *= 0.5
            loss_probability = 1 - gain_probability
        elif current_close < prior_close:
            # A loss: halve the probability of the loss streak continuing
            loss_probability *= 0.5
            gain_probability = 1 - loss_probability
        else:
            # Flat day: reset to the 50/50 baseline
            gain_probability = 0.5
            loss_probability = 0.5
        df.at[i, 'gain_probability'] = gain_probability
        df.at[i, 'loss_probability'] = loss_probability
        if not bought and gain_probability >= gain_threshold:
            df.at[i, 'indication'] = 'buy'
            bought = True
        elif bought and loss_probability >= loss_threshold:
            df.at[i, 'indication'] = 'sell'
            bought = False
    return df
```
- We start by assuming a 50/50 chance for either outcome.
- Then, for each row, we consider whether the day was a gain or a loss, then predict what the chance of a gain or loss will be for the following day.
- If today’s closing price is higher than yesterday’s, we count that as a gain, our ‘heads’ outcome, and gain_probability is halved.
- The reverse holds for the ‘tails’ outcome: each new loss halves the current loss_probability.
This probability shift is tracked in our model with two variables: gain_probability and loss_probability.
The key is to act on these shifting probabilities using thresholds. When consecutive losses drive loss_probability down, the complementary gain_probability climbs; once it crosses the gain_threshold, the model generates a ‘buy’ signal. If, for example, we set the gain_threshold to 96.5%, the algorithm waits for four consecutive losses from the 50/50 baseline (loss_probability = 0.5⁵ ≈ 3.1%, so gain_probability ≈ 96.9%) before generating that buy signal.
Conversely, once we’ve “bought” (bet on ‘heads’), we watch the loss_probability. If it reaches our loss_threshold (say, 95%, which a run of four consecutive gains from the baseline would cross), it triggers a “sell” signal and we cash in our bet before the odds shift too far against us.
These gain_threshold and loss_threshold values are our tuning knobs, allowing us to fine-tune when we enter and exit the market, hopefully avoiding buying too early or selling too late, at least according to the model.
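To watch the thresholds fire, here's the indicator logic run standalone on an invented price series: five straight losses, then five straight gains, with toy thresholds of 96% (a compact copy of the function above so it runs on its own):

```python
import pandas as pd

# Compact copy of the indicator logic above, so this sketch is self-contained.
def get_gain_or_loss_probability(df, gain_threshold, loss_threshold):
    df = df.reset_index(drop=True)
    bought = False
    gain_probability = loss_probability = 0.5
    df['gain_probability'] = 0.5
    df['loss_probability'] = 0.5
    df['indication'] = 'do not buy'
    for i in range(1, len(df)):
        if df.iloc[i]['close'] > df.iloc[i - 1]['close']:
            gain_probability *= 0.5            # gain streak continues
            loss_probability = 1 - gain_probability
        elif df.iloc[i]['close'] < df.iloc[i - 1]['close']:
            loss_probability *= 0.5            # loss streak continues
            gain_probability = 1 - loss_probability
        else:
            gain_probability = loss_probability = 0.5
        df.at[i, 'gain_probability'] = gain_probability
        df.at[i, 'loss_probability'] = loss_probability
        if not bought and gain_probability >= gain_threshold:
            df.at[i, 'indication'] = 'buy'
            bought = True
        elif bought and loss_probability >= loss_threshold:
            df.at[i, 'indication'] = 'sell'
            bought = False
    return df

# Toy series: five straight losses, then five straight gains (prices invented).
toy = pd.DataFrame({'close': [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10]})
toy = get_gain_or_loss_probability(toy, gain_threshold=0.96, loss_threshold=0.96)
print(toy[['close', 'gain_probability', 'loss_probability', 'indication']])
```

On the fourth loss, gain_probability reaches 0.96875 and a ‘buy’ fires near the bottom; on the fifth gain of the rebound, loss_probability crosses the threshold and a ‘sell’ fires near the top.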
The exact probability mathematics isn’t the point here. What matters is that we have a mathematical formula and a set of rules that indicate when to buy, sell, or do nothing.
With this in place, we can write a backtesting program that feeds our modeled data through the strategy and determines whether a particular asset would have yielded an overall profit or loss over a given time period.
First the code:
```python
def backtest(df: pd.DataFrame, initial_bank: float) -> tuple[pd.DataFrame, float]:
    required_cols = {'indication', 'open', 'close'}
    if not required_cols.issubset(df.columns):
        raise ValueError(f"DataFrame must contain columns: {required_cols}")
    if initial_bank < 0:
        raise ValueError("Initial bank cannot be negative.")
    df = df.reset_index(drop=True)  # ensure positional .at[] access works
    df['shares'] = 0
    df['cost/profit'] = 0.00
    df['bank'] = initial_bank
    shares_owned = 0
    bought = False
    bank = initial_bank
    for i in range(len(df)):
        indication = df.at[i, 'indication']
        close_price = df.at[i, 'close']
        if indication == 'buy' and not bought:
            # Buy as many whole shares as the bank allows at the open price
            affordable_shares = int(bank // df.at[i, 'open'])
            if affordable_shares > 0:
                cost = affordable_shares * df.at[i, 'open']
                bank -= cost
                shares_owned += affordable_shares
                bought = True
                df.at[i, 'cost/profit'] = -cost
            df.at[i, 'bank'] = bank
            df.at[i, 'shares'] = affordable_shares
        elif indication == 'sell' and bought:
            # Liquidate the whole position at the close price
            proceeds = shares_owned * close_price
            bank += proceeds
            df.at[i, 'bank'] = bank
            df.at[i, 'shares'] = -shares_owned  # record shares sold before zeroing
            df.at[i, 'cost/profit'] = proceeds
            shares_owned = 0
            bought = False
        elif indication == 'do not buy':
            df.at[i, 'bank'] = bank
            df.at[i, 'shares'] = shares_owned
            df.at[i, 'cost/profit'] = 0.00
        else:
            raise ValueError(
                f"Invalid indication {indication!r}: expected 'buy', 'sell', or 'do not buy'")
    if bought:
        # Mark any still-open position to market at the final close
        final_value = shares_owned * df.at[len(df) - 1, 'close']
        bank += final_value
        df.at[len(df) - 1, 'bank'] = bank
    total_gain_loss = bank - initial_bank
    return df, total_gain_loss
```
Here, we pass in the DataFrame, which now carries the indication alongside the pricing information, plus some play money as a “bank”.
When a “buy” indication is reached, we’ll purchase the maximum number of stocks that we can with our available bank, and reflect the deduction from the account and the number of shares that are owned. When a sell signal is reached, we’ll sell the shares at that day’s close price and deposit that money into our bank. The process repeats until the end date for our simulation is reached.
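As a sanity check, here's that buy-everything/sell-everything arithmetic on an invented four-day DataFrame (a compact copy of the loop above, without the per-row bookkeeping columns):

```python
import pandas as pd

# Compact copy of the backtest loop above, enough for a worked example.
def backtest(df, initial_bank):
    df = df.reset_index(drop=True)
    bank, shares_owned, bought = initial_bank, 0, False
    for i in range(len(df)):
        if df.at[i, 'indication'] == 'buy' and not bought:
            shares = int(bank // df.at[i, 'open'])   # whole shares only
            if shares > 0:
                bank -= shares * df.at[i, 'open']
                shares_owned, bought = shares, True
        elif df.at[i, 'indication'] == 'sell' and bought:
            bank += shares_owned * df.at[i, 'close']
            shares_owned, bought = 0, False
    if bought:  # mark any open position to market at the last close
        bank += shares_owned * df.at[len(df) - 1, 'close']
    return bank - initial_bank

# Invented four-day DataFrame: buy at an open of $10, sell at a close of $12.
toy = pd.DataFrame({
    'open':  [10.0, 10.0, 11.0, 12.0],
    'close': [10.0, 10.5, 11.5, 12.0],
    'indication': ['do not buy', 'buy', 'do not buy', 'sell'],
})
gain = backtest(toy, initial_bank=1000.00)
print(f"gain: ${gain:.2f}")   # 100 shares bought at $10, sold at $12
```

$1,000 buys 100 whole shares at $10; selling them at $12 banks $1,200, for a $200 (20%) gain.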
Now we’ve got the power!
With a few adjustments, we can print the overall gain or loss as a percentage and dollar amount.
Here’s a test case for that purpose:
```python
def setUp(self):
    results = Alpaca.get_historical('AAPL', '2023-01-01T00:00:00Z', '2025-01-01T00:00:00Z')
    df = Alpaca.convert_to_dataframe(results)
    df = TechAnalysis.get_loss_or_gain(df)
    self.df = Indicators.get_gain_or_loss_probability(df, 0.936698, 0.977329)

def test_backtest(self):
    bank = 1000.00
    df, total_gain_loss = Backtest.backtest(self.df, initial_bank=bank)
    percentage_gained = total_gain_loss / bank * 100
    print(f'Total gain/loss: ${total_gain_loss:.2f}')
    print(f'Percentage gained: {percentage_gained:.2f}%')
    self.assertIsNotNone(df)
```
With this foundation, we can layer on other models, tune the thresholds to find a sweet spot, diversify by purchasing many assets simultaneously, and so on.
Here’s a quick chart in Matplotlib to show the daily price changes and when the simulation bought or sold the asset (in this case, Apple) versus the probability of either a gain or a loss:
As you can see, the loss probability increases as the price goes up, and the gain probability increases as the price goes down. I don’t think I’ll trust this with my bank account, but still an incredibly powerful visualization and simulation!
Now, do you want to see something really cool?
Let’s install Hyperopt and run the simulation through hundreds of trials to find the gain_threshold and loss_threshold values that maximize the yield of our simulation.
What is Hyperopt?
Hyperopt is a Python library for hyperparameter optimization. Once we have a model selected, it searches a parameter space for the values of gain_threshold and loss_threshold that work best for our model and given data set.
```python
from hyperopt import STATUS_OK

def objective_function(gain_threshold, loss_threshold, df, initial_bank):
    df = Indicators.get_gain_or_loss_probability(df.copy(), gain_threshold, loss_threshold)
    _, total_gain_loss = Backtest.backtest(df, initial_bank)
    return total_gain_loss

def objective(params):
    gain_threshold = params['gain_threshold']
    loss_threshold = params['loss_threshold']
    historical_df = params['historical_df']
    initial_bank = params['initial_bank']
    total_gain_loss = objective_function(gain_threshold, loss_threshold, historical_df, initial_bank)
    # Hyperopt minimizes, so negate the gain to maximize profit
    return {'loss': -total_gain_loss, 'status': STATUS_OK}
```
And, the test case with the prior DataFrame:
```python
from hyperopt import fmin, hp, tpe

def test_objective(self):
    space = {
        'gain_threshold': hp.uniform('gain_threshold', 0.5, 1.0),
        'loss_threshold': hp.uniform('loss_threshold', 0.5, 1.0),
        'historical_df': self.df,
        'initial_bank': 1000.00
    }
    best = fmin(Optimization.objective, space, algo=tpe.suggest, max_evals=500)
    print("Best parameters:", best)
    self.assertIsNotNone(best)
```
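If you want the flavor of this search without the Hyperopt dependency, plain random sampling over the same [0.5, 1.0] square gets you surprisingly far. The toy_objective here is a stand-in I made up for illustration; in the real pipeline the score would be the backtest's total_gain_loss:

```python
import random

def random_search(objective, n_trials=500, seed=42):
    """Sample threshold pairs uniformly from [0.5, 1.0] and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float('-inf')
    for _ in range(n_trials):
        gain_t = rng.uniform(0.5, 1.0)
        loss_t = rng.uniform(0.5, 1.0)
        score = objective(gain_t, loss_t)
        if score > best_score:
            best_params, best_score = (gain_t, loss_t), score
    return best_params, best_score

# Stand-in objective: a smooth bump peaking at (0.9, 0.98). In the real
# pipeline this would run the indicator + backtest and return the profit.
def toy_objective(gain_t, loss_t):
    return -((gain_t - 0.9) ** 2 + (loss_t - 0.98) ** 2)

params, score = random_search(toy_objective)
print(params, score)
```

Hyperopt's TPE algorithm is smarter than this: it concentrates later samples where earlier ones scored well, so it typically needs far fewer trials to find the same neighborhood.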
According to this run, we should set our thresholds to 53% and 99.45%. I first ran AAPL at 94% and 98%, which yielded a gain of 33.39%. Now, let’s run the same two-year period at the thresholds Hyperopt recommends:
Not much of an improvement, but it’s the best that could be done for AAPL. TSLA (Tesla) was a different story. I first ran TSLA with 94% and 98% for a one-year period, which yielded:
Then, I ran it through Hyperopt, which set the values to 99.21% and 99.99%. This time the yield was…
Wow! I still wouldn’t trade my hard-earned money on this, but wow! With a few lines of code, we ran hundreds of simulations to find the best threshold values the search could uncover.
With this, we can run simulations across dozens of assets, find the threshold values that work best across all of them, and see what the yield would be.
Stay tuned for future articles. In those, we’ll explore expanding the simulations to include multiple assets across different asset classes (markets), incorporate new and additional mathematical models, run simulations across each and all, then aggregate the results into a final buy/sell matrix.
Finally, once I’m completely confident (and competent), I’ll write the code to complete the trading bot and transfer in the seed money to kick this whole thing off.