# FinLab Best Practices and Anti-Patterns This document contains critical coding patterns, anti-patterns, and best practices for developing FinLab strategies. **Following these guidelines prevents common errors, lookahead bias, and data pollution.** ## Table of Contents 1. [Code Patterns (DO THIS)](#code-patterns-do-this) 2. [Anti-Patterns (DON'T DO THIS)](#anti-patterns-dont-do-this) 3. [Preventing Future Data Pollution](#preventing-future-data-pollution) 4. [Stock Selection Patterns](#stock-selection-patterns) 5. [Backtesting Patterns](#backtesting-patterns) 6. [Error Handling](#error-handling) --- ## Code Patterns (DO THIS) ### ✅ Combine Conditions with Logical Operators **DO:** Use `&`, `|`, `~` to combine conditions into a single position DataFrame. ```python from finlab import data from finlab.backtest import sim factor1 = data.get("price:收盤價") factor2 = data.get("monthly_revenue:當月營收") factor3 = data.get("price_earning_ratio:本益比") cond1 = factor1.rank(axis=1, pct=True) > 0.5 cond2 = factor2.rank(axis=1, pct=True) > 0.5 cond_intersection = cond1 & cond2 position = factor3[cond_intersection].is_smallest(5) report = sim(position, resample="M") ``` **DON'T:** Create separate functions to generate positions (adds unnecessary complexity). ### ✅ Use `is_smallest()` or `is_largest()` for Stock Selection **DO:** Limit to top N < 50 stocks using these methods. ```python # Select top 10 stocks by lowest P/E pe = data.get("price_earning_ratio:本益比") position = pe.is_smallest(10) # Select top 15 stocks by highest momentum, where condition is met close = data.get("price:收盤價") momentum = close / close.shift(20) - 1 condition = close > close.average(60) position = momentum[condition].is_largest(15) ``` **Note:** The DataFrame used with `is_smallest()`/`is_largest()` must have **float dtype**, not bool. If you have a boolean condition, apply it as a filter first. ### ✅ Use Correct Technical Indicator Syntax **DO:** Call `data.indicator()` without passing OHLCV data. ```python # Correct - no OHLCV parameters rsi = data.indicator("RSI", timeperiod=14) # Correct - multiple return values macd, macd_signal, macd_hist = data.indicator( "MACD", fastperiod=12, slowperiod=26, signalperiod=9 ) # Correct - Bollinger Bands upperband, middleband, lowerband = data.indicator( "BBANDS", timeperiod=20, nbdevup=2.0, nbdevdn=2.0, matype=0 ) ``` **DON'T:** Pass close price or OHLCV data to indicators. ```python # ❌ WRONG - don't pass close rsi = data.indicator("RSI", close, timeperiod=14) # ERROR ``` ### ✅ Use `df.shift(1)` for Previous Values **DO:** Use `.shift()` to access historical data. ```python # Correct - get previous day's close prev_close = close.shift(1) # Correct - detect crossover sma20 = close.average(20) sma60 = close.average(60) golden_cross = (sma20 > sma60) & (sma20.shift() < sma60.shift()) ``` **DON'T:** Use `.iloc[-2]` or similar indexing (can cause lookahead bias). ```python # ❌ WRONG prev_close = close.iloc[-2] # DON'T USE THIS ``` ### ✅ Use `data.universe()` for Filtering **DO:** Use context manager or `set_universe()` to filter stocks by market/category. ```python from finlab import data # Method 1: Context manager (temporary scope) with data.universe(market='TSE_OTC', category=['水泥工業']): price = data.get('price:收盤價') # Method 2: Set globally data.set_universe(market='TSE_OTC', category='半導體', exclude_category='金融') price = data.get('price:收盤價') ``` See [data-reference.md](data-reference.md) for complete `data.universe()` usage. ### ✅ Assign `resample` to Prevent Overtrading **DO:** Always specify `resample` parameter in `sim()`. ```python # Monthly rebalancing sim(position, resample="M") # Weekly rebalancing sim(position, resample="W") # Use monthly revenue index rev = data.get('monthly_revenue:當月營收') sim(position, resample=rev.index) ``` **DON'T:** Omit `resample` (defaults to daily, causes excessive trading). --- ## Anti-Patterns (DON'T DO THIS) ### ❌ Don't Use `==` for Float Comparisons **Reason:** Floating point precision issues. ```python # ❌ BAD condition = (close == 100.0) # ✅ GOOD - use inequalities or np.isclose() import numpy as np condition = np.isclose(close, 100.0) # Or better: condition = (close > 99.9) & (close < 100.1) ``` ### ❌ Don't Use `reindex()` on FinLabDataFrame **Reason:** FinLabDataFrame already automatically aligns indices/columns. ```python # ❌ BAD - unnecessary reindexing df1 = data.get("price:收盤價") df2 = data.get("monthly_revenue:當月營收") df2_reindexed = df2.reindex(df1.index, method='ffill') # DON'T DO THIS # ✅ GOOD - automatic alignment position = df1 > df1.average(60) & (df2 > df2.shift(1)) ``` **Exception:** Only use `reindex()` for position DataFrame when changing to a specific resampling schedule: ```python # ✅ Allowed - reindex position to monthly revenue dates rev = data.get('monthly_revenue:當月營收') position_resampled = position.reindex(rev.index_str_to_date().index, method="ffill") ``` ### ❌ Don't Use For Loops **Reason:** FinLabDataFrame methods are vectorized and much faster. ```python # ❌ BAD - iterating over rows for date in close.index: for stock in close.columns: if close.loc[date, stock] > sma60.loc[date, stock]: position.loc[date, stock] = True # ✅ GOOD - vectorized operations position = close > sma60 ``` ### ❌ Don't Filter 注意股/處置股/全額交割股 Unless Asked **Reason:** These filters remove many stocks and should only be applied when explicitly requested. ```python # ❌ DON'T do this by default is_regular = ( data.get("etl:noticed_stock_filter") & data.get("etl:disposal_stock_filter") & data.get("etl:full_cash_delivery_stock_filter") ) position = position & is_regular # ✅ Only do this if user specifically asks to remove these stocks ``` ### ❌ Don't Pass OHLCV to Technical Indicators **Reason:** `data.indicator()` automatically uses correct price data. ```python # ❌ WRONG close = data.get("price:收盤價") rsi = data.indicator("RSI", close, timeperiod=14) # ERROR # ✅ CORRECT rsi = data.indicator("RSI", timeperiod=14) # Automatically uses close ``` ### ❌ Don't Use Boolean Indexing with Mismatched Indices **Reason:** When extracting `.iloc[-1]` from DataFrames with different columns, the resulting Series have different indices. Boolean indexing then fails with `IndexingError`. ```python # ❌ BAD - indices may not match selected = latest_pe[latest_combined] # IndexingError # ✅ GOOD - align indices first common = latest_combined.index.intersection(latest_pe.index) selected = latest_pe.loc[common][latest_combined.loc[common]] ``` --- ## Preventing Future Data Pollution **Critical:** Future data pollution (lookahead bias) occurs when you use information that wouldn't have been available at the time of decision-making. This silently corrupts backtests and makes them unrealistic. ### ✅ Leave `df.index` As-Is **DO:** Keep index intact, even if it contains strings like "2025Q1". ```python # ✅ GOOD - leave index as-is revenue = data.get("monthly_revenue:當月營收") # Index may contain strings like "2022-01", "2022-02", etc. # FinLabDataFrame aligns by shape in binary operations position = revenue > revenue.shift(1) ``` **DON'T:** Manually assign to `df.index`. ```python # ❌ FORBIDDEN - can corrupt shared data df.index = new_index # NEVER DO THIS ``` ### ✅ Use Only Approved Resampling Method **DO:** Use exactly this pattern for resampling (datetime index required, use `.last()` only). ```python # ✅ CORRECT resampling pattern df = df.index_str_to_date().resample('M').last() ``` **DON'T:** Use other aggregation methods like `.mean()`, `.first()`, `.ffill()`. ```python # ❌ WRONG df = df.resample('M').mean() # Can cause lookahead df = df.resample('M').ffill() # Can cause lookahead ``` ### ✅ Use Only Approved Reindexing Method **DO:** Use exactly `method='ffill'` for reindexing. ```python # ✅ CORRECT df = df.reindex(target_index, method='ffill') ``` **DON'T:** Use other methods like `'bfill'` or `None`. ```python # ❌ WRONG df = df.reindex(target_index, method='bfill') # Lookahead bias df = df.reindex(target_index) # Missing data ``` ### ✅ Use `verify_strategy()` to Auto-Detect Lookahead Bias `verify_strategy()` automatically tests your strategy for lookahead bias by truncating data at historical dates and comparing results against a full-data run. > **Note:** This is a diagnostic tool — it runs the full strategy multiple times and is slow. Only use it when the user explicitly asks to verify lookahead bias. Do NOT run it as part of routine strategy building. Requires finlab >= 1.5.8 (`pip install finlab --upgrade`). ```python from finlab.verify import verify_strategy from finlab import data from finlab.backtest import sim def my_strategy(): close = data.get('price:收盤價') pb = data.get('price_earning_ratio:股價淨值比') position = pb[close > close.average(60)].is_smallest(10) return sim(position, resample='M', upload=False) result = verify_strategy(my_strategy, n_tests=5) print(result.passed) # True = no bias detected print(result.summary_df) # Per-date test results ``` **Parameters:** - `strategy` (Callable, required): Zero-arg function returning a `Report` (output of `sim()`) - `n_tests` (int, default=5): Number of random truncation dates to test - `test_dates` (list[str], optional): Explicit dates (YYYY-MM-DD) to test in addition to random sample - `verbose` (bool, default=True): Print progress and summary **Returns:** `VerifyResult` with `.passed`, `.n_tests`, `.n_passed`, `.n_failed`, `.summary_df`, `.details` --- ## Stock Selection Patterns ### Pattern 1: Limit to Top X% of Indicator ```python # Select stocks in top 30% by momentum momentum = close / close.shift(60) - 1 top_momentum = momentum.rank(axis=1, pct=True) > 0.7 ``` ### Pattern 1b: Stable Percentile Ranking with `valid=` When using `fillna()` before ranking (e.g. to compute indicators like SLOPE), the filled values inflate the rank denominator and shift all percentiles. Use `valid=` to exclude them: ```python ratio = close / close.shift(5) # fillna(1) needed for SLOPE, but those cells shouldn't count in rank score = ratio.fillna(1).apply(lambda s: talib.LINEARREG_SLOPE(s, timeperiod=5)) pct = score.rank(axis=1, pct=True, valid=ratio.notna()) ``` ### Pattern 2: Limit to Top N Stocks ```python # Select top 10 stocks with lowest P/B ratio pb = data.get("price_earning_ratio:股價淨值比") position = pb.is_smallest(10) # Select top 15 stocks meeting a condition volume = data.get("price:成交股數") liquid_stocks = volume.average(20) > 1000*1000 position = pb[liquid_stocks].is_smallest(15) ``` ### Pattern 3: Entry/Exit with `hold_until()` ```python close = data.get("price:收盤價") pb = data.get("price_earning_ratio:股價淨值比") # Define entry and exit signals entries = close > close.average(20) exits = close < close.average(60) # Hold until exit, limit to 10 stocks, rank by negative P/B position = entries.hold_until( exits, nstocks_limit=10, rank=-pb # Negative for ascending order (low P/B preferred) ) ``` ### Pattern 4: Industry Ranking ```python # Select top 20% within each industry roe = data.get("fundamental_features:ROE稅後") industry_top = roe.industry_rank() > 0.8 ``` --- ## Backtesting Patterns ### Pattern 1: Basic Backtest ```python sim(position, resample="M") ``` ### Pattern 2: Backtest Within Date Range ```python sim(position.loc['2020':'2023'], resample="M") ``` ### Pattern 3: Optuna Parameter Optimization ```python import optuna from finlab.backtest import sim def run_strategy(params): """Strategy function that returns a report""" sma_short = close.average(params['short']) sma_long = close.average(params['long']) position = (sma_short > sma_long) report = sim(position, resample="M", upload=False) return report def objective(trial): params = { 'short': trial.suggest_int('short', 5, 30), 'long': trial.suggest_int('long', 40, 120) } report = run_strategy(params) return report.metrics.sharpe_ratio() # Optimize with n_trials <= 10 study = optuna.create_study(direction='maximize') study.optimize(objective, n_trials=10) print(f"Best params: {study.best_params}") ``` ### Pattern 4: Evaluate Strategy Condition Coverage ```python # Check how often the condition is True (on average across stocks) condition = close > close.average(60) coverage = condition.sum(axis=1).loc['2020':].mean() print(f"Average stocks meeting condition: {coverage:.1f}") ``` ### Pattern 5: Adjust Rebalance Frequency ```python # Weekly sim(position, resample="W") # Monthly sim(position, resample="M") # Quarterly sim(position, resample="Q") # Custom: use monthly revenue index rev = data.get('monthly_revenue:當月營收') sim(position, resample=rev.index) ``` ### Pattern 6: Adjust Rebalance Offset ```python # Rebalance 1 week after period start sim(position, resample="M", resample_offset="1W") # Rebalance 1 month after quarter start sim(position, resample="Q", resample_offset="1M") ``` --- ## Error Handling ### Error: `_ArrayMemoryError` **Solution:** Reset kernel and try again. ```python # Call this if you encounter _ArrayMemoryError resetKernel() ``` ### Error: `requests.exceptions.ConnectionError` **Solution:** Reset kernel and retry. ```python resetKernel() ``` ### Error: 用量超限 (Quota Exceeded) **常見訊息:** `quota exceeded`, `daily limit reached`, `用量已達上限` **解決方案:** 1. **等待重置** - UTC+8 早上 8 點會自動重置用量 2. **升級方案** - 升級可獲得更多資料用量,詳見 https://www.finlab.finance/payment 3. **減少用量** - 避免重複取得相同數據,將常用數據存入變數;使用 `data.universe()` 限制股票範圍 ### Debugging Tips 1. **Break down experiments into small steps** ```python # Step 1: Fetch data close = data.get("price:收盤價") print(close.head()) # Step 2: Create condition condition = close > close.average(60) print(condition.head()) # Step 3: Select stocks position = condition.is_largest(10) print(position.head()) ``` 2. **Inspect variable values** after each step to ensure correctness. 3. **Use print statements** to display intermediate DataFrames. --- ## Strategy Design Principles ### Principle 1: Be Systematic - **Good:** Clearly define hypothesis, experiment setup, and evaluation criteria - **Good:** Import optuna to systematically explore parameter space - **Bad:** Randomly changing parameters without a clear plan ### Principle 2: Start Simple - Begin with a baseline strategy - Add complexity incrementally - Test each addition separately ### Principle 3: Write Clear, Maintainable Code - Use descriptive variable names - Add comments where logic isn't self-evident - Don't over-comment obvious operations --- ## Complete Pattern Examples ### Example 1: Value + Momentum + Liquidity ```python from finlab import data from finlab.backtest import sim # Fetch data close = data.get("price:收盤價") pb = data.get("price_earning_ratio:股價淨值比") volume = data.get("price:成交股數") # Create factors value = pb.rank(axis=1, pct=True) < 0.3 # Low P/B momentum = close.rise(20) # Rising liquidity = volume.average(20) > 500*1000 # Liquid # Combine position = value & momentum & liquidity position = pb[position].is_smallest(10) # Backtest report = sim(position, resample="M", stop_loss=0.08, upload=False) print(f"Annual Return: {report.metrics.annual_return():.2%}") print(f"Sharpe Ratio: {report.metrics.sharpe_ratio():.2f}") print(f"Max Drawdown: {report.metrics.max_drawdown():.2%}") ``` ### Example 2: Monthly Revenue Growth ```python from finlab import data from finlab.backtest import sim # Fetch revenue data rev = data.get("monthly_revenue:當月營收") rev_growth = data.get("monthly_revenue:去年同月增減(%)") # Revenue momentum rev_ma3 = rev.average(3) rev_high = (rev_ma3 / rev_ma3.rolling(12).max()) == 1 # Sustained growth strong_growth = (rev_growth > 20).sustain(3) # Combine position = rev_high & strong_growth position = rev_growth[position].is_largest(10) # Reindex to monthly revenue dates position_resampled = position.reindex(rev.index_str_to_date().index, method="ffill") # Backtest report = sim(position_resampled, upload=False) ``` --- ## See Also - [SKILL.md](SKILL.md) - Overview and quick start - [dataframe-reference.md](dataframe-reference.md) - FinLabDataFrame methods - [backtesting-reference.md](backtesting-reference.md) - Complete `sim()` API - [factor-examples.md](factor-examples.md) - 60+ complete examples