# FinlabDataFrame Reference ## Overview FinlabDataFrame is a powerful extension of pandas DataFrame specifically designed for financial data analysis and backtesting. It provides enhanced functionality for trading strategy development, including automatic index/column alignment, moving averages, entry/exit signal detection, and industry-based ranking. ## Key Features - Automatic re-alignment of indices and columns during arithmetic and logical operations - Built-in methods for moving averages and technical calculations - Entry/exit signal detection for trading strategies - Industry-based grouping and ranking - Multi-factor and industry neutralization - Integration with backtesting workflows **Contents:** [Constructor](#constructor) | [Index Conversion](#index-conversion-methods) | [Moving Average & Comparison](#moving-average--comparison-methods) | [Selection](#selection-methods) | [Signal Detection](#signal-detection-methods) | [Industry & Category](#industry--category-methods) | [Neutralization](#neutralization-methods) | [Quantile](#quantile-methods) | [Auto Alignment](#automatic-index-alignment) | [Method Chaining](#method-chaining-patterns) --- ## Constructor ### FinlabDataFrame Converts a regular pandas DataFrame to a FinlabDataFrame with enhanced financial data processing capabilities. **Signature:** ```python FinlabDataFrame(df: pd.DataFrame) ``` **Parameters:** - `df` (pd.DataFrame, required): A pandas DataFrame to be converted to FinlabDataFrame **Returns:** - An instance of FinlabDataFrame with enhanced financial data processing capabilities **Example:** ```python from finlab.dataframe import FinlabDataFrame from finlab import data import pandas as pd # Convert existing pandas DataFrame to FinlabDataFrame regular_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}) df = FinlabDataFrame(regular_df) # FinlabDataFrame is also automatically returned by data.get() price_df = data.get('price:收盤價') # Returns a FinlabDataFrame ``` --- ## Index Conversion Methods ### index_str_to_date Converts string-formatted financial report indices (e.g., "2022-Q1", "2022-M01") to datetime format based on actual disclosure dates. Essential for aligning financial data with daily price data. **Signature:** ```python index_str_to_date() -> FinlabDataFrame ``` **Returns:** - FinlabDataFrame with datetime index based on actual disclosure dates **Example:** ```python from finlab import data # Financial statement data has string index like "2022-Q1" cash = data.get('financial_statement:現金及約當現金') print(cash.index[:3]) # ['2013-Q1', '2013-Q2', '2013-Q3'] # Convert to actual disclosure dates cash_dated = cash.index_str_to_date() print(cash_dated.index[:3]) # DatetimeIndex(['2013-05-15', '2013-08-14', ...]) ``` **Note:** This method uses actual financial statement disclosure dates from `etl:financial_statements_disclosure_dates`, not simple quarter-end dates. --- ### deadline Converts financial report indices to regulatory deadline dates (公告截止日). Unlike `index_str_to_date()` which uses actual disclosure dates, this uses the official filing deadlines. **Signature:** ```python deadline() -> FinlabDataFrame ``` **Returns:** - FinlabDataFrame with datetime index based on regulatory filing deadlines **Example:** ```python from finlab import data # Convert quarterly data to deadline dates cash = data.get('financial_statement:現金及約當現金') cash_deadline = cash.deadline() # Convert monthly revenue to deadline dates revenue = data.get('monthly_revenue:當月營收') revenue_deadline = revenue.deadline() ``` **Use Case:** Use `deadline()` when you want conservative signal timing that assumes data arrives at the latest possible date. Use `index_str_to_date()` when you want signal timing based on actual historical disclosure. --- ## Moving Average & Comparison Methods ### average Calculates a moving average over n periods. Returns NaN if more than half the values in the window are NaN. **Signature:** ```python average(n: int) -> FinlabDataFrame ``` **Parameters:** - `n` (int, required): Number of periods for the moving average **Returns:** - FinlabDataFrame representing the moving average **Example:** ```python from finlab import data close = data.get('price:收盤價') sma10 = close.average(10) sma60 = close.average(60) # Stock price above moving average cond = close > sma60 ``` --- ### rise Determines if values are rising compared to n periods before. Returns True if current value > value n periods ago. **Signature:** ```python rise(n: int = 1) -> FinlabDataFrame ``` **Parameters:** - `n` (int, optional, default=1): Number of periods to compare **Returns:** - Boolean FinlabDataFrame indicating rising trends **Example:** ```python from finlab import data close = data.get('price:收盤價') # Price higher than 10 days ago rising = close.rise(10) # Consecutive rising days consecutive_rise = close.rise().sustain(3) ``` --- ### fall Determines if values are falling compared to n periods before. Returns True if current value < value n periods ago. **Signature:** ```python fall(n: int = 1) -> FinlabDataFrame ``` **Parameters:** - `n` (int, optional, default=1): Number of periods to compare **Returns:** - Boolean FinlabDataFrame indicating falling trends **Example:** ```python from finlab import data close = data.get('price:收盤價') # Price lower than 10 days ago falling = close.fall(10) # Avoid stocks in downtrend avoid = close.fall(20) ``` --- ### sustain Checks whether a condition is sustained over a moving window of n days. Returns True if the sum of True values in the window meets or exceeds the threshold. **Signature:** ```python sustain(nwindow: int, nsatisfy: int = None) -> FinlabDataFrame ``` **Parameters:** - `nwindow` (int, required): Window length (in days) - `nsatisfy` (int, optional): Minimum number of True values required; defaults to `nwindow` if not provided **Returns:** - Boolean FinlabDataFrame **Example:** ```python from finlab import data close = data.get('price:收盤價') # Price rising for 3 consecutive days rising_3days = close.rise().sustain(3) # Price rising at least 4 out of 5 days rising_4of5 = close.rise().sustain(5, 4) ``` --- ## Selection Methods ### is_largest Returns a boolean DataFrame where True values represent the top n largest values for each date. Eliminates the need for row-by-row iteration with nlargest. **Signature:** ```python is_largest(n: int) -> FinlabDataFrame ``` **Parameters:** - `n` (int, required): Number of top values to select on each date **Returns:** - Boolean FinlabDataFrame with True for top n stocks on each date **Example:** ```python from finlab import data from finlab.backtest import sim # Select 10 stocks with highest ROA roa = data.get('fundamental_features:ROA稅後息前') top_roa = roa.is_largest(10) # Backtest holding top ROA stocks report = sim(top_roa, resample='Q') ``` --- ### is_smallest Returns a boolean DataFrame where True values represent the n smallest values for each date. **Signature:** ```python is_smallest(n: int) -> FinlabDataFrame ``` **Parameters:** - `n` (int, required): Number of smallest values to select on each date **Returns:** - Boolean FinlabDataFrame with True for bottom n stocks on each date **Example:** ```python from finlab import data from finlab.backtest import sim # Select 10 stocks with lowest P/B ratio pb = data.get('price_earning_ratio:股價淨值比') lowest_pb = pb.is_smallest(10) # Backtest value strategy report = sim(lowest_pb, resample='M') ``` --- ### rank Computes ranking across rows or columns. **Includes lookahead bias warning when ranking along the time axis.** **Signature:** ```python rank(*args, valid=None, **kwargs) -> FinlabDataFrame ``` **Parameters:** - Same parameters as `pandas.DataFrame.rank()` - `axis` (int or str, default=0): Axis to rank along. **Warning: axis=0 may cause lookahead bias** - `valid` (DataFrame or Series of bool, optional): Only cells where `valid` is True participate in ranking. Cells where `valid` is False/NaN are set to NaN before ranking, so they do not affect the `pct=True` denominator. Common usage: after `fillna()`, pass the original `notna()` mask to prevent newly-listed or history-insufficient stocks from polluting percentile rankings. **Returns:** - FinlabDataFrame with rankings **Example:** ```python from finlab import data pb = data.get('price_earning_ratio:股價淨值比') # SAFE: Cross-sectional ranking (rank stocks against each other per day) pb_rank = pb.rank(axis=1, pct=True) # Select stocks in bottom 30% of P/B each day cheap = pb_rank < 0.3 # Using valid= to exclude fillna'd stocks from rank denominator: close = data.get('price:收盤價') ratio = close / close.shift(5) # fillna(1) is needed for SLOPE computation, but those fake values # should not count in percentile ranking score = ratio.fillna(1).apply(some_func) score.rank(axis=1, pct=True, valid=ratio.notna()) # WARNING: Time-series ranking triggers LookaheadWarning # This ranks each stock's current value against its future values # pb.rank(axis=0) # Will emit warning - use rolling().rank() instead ``` **Warning:** Ranking along `axis=0` (time axis) uses future data and will emit a `LookaheadWarning`. Use `rolling().rank()` or `expanding().rank()` for safe time-series ranking. --- ## Signal Detection Methods ### is_entry Identifies entry signal points where the condition switches from False to True. **Signature:** ```python is_entry() -> FinlabDataFrame ``` **Returns:** - Boolean FinlabDataFrame indicating entry signals **Example:** ```python from finlab import data close = data.get('price:收盤價') # Condition: price in top 10 position = close.is_largest(10) # Find days when stock enters top 10 entry_signals = position.is_entry() ``` --- ### is_exit Identifies exit signal points where the condition switches from True to False. **Signature:** ```python is_exit() -> FinlabDataFrame ``` **Returns:** - Boolean FinlabDataFrame indicating exit signals **Example:** ```python from finlab import data close = data.get('price:收盤價') # Condition: price in top 10 position = close.is_largest(10) # Find days when stock exits top 10 exit_signals = position.is_exit() ``` --- ### exit_when Creates a position DataFrame that enters on entry signals and exits when either the original condition becomes False OR the specified exit condition becomes True. **Signature:** ```python exit_when(exit: pd.DataFrame) -> FinlabDataFrame ``` **Parameters:** - `exit` (pd.DataFrame, required): Additional exit condition DataFrame **Returns:** - Boolean FinlabDataFrame representing positions **Example:** ```python from finlab import data close = data.get('price:收盤價') volume = data.get('price:成交股數') # Entry: price breaks above 20-day high entry_cond = close > close.rolling(20).max().shift() # Additional exit: volume spike (possible distribution) volume_spike = volume > volume.average(20) * 3 # Position with additional exit condition position = entry_cond.exit_when(volume_spike) ``` --- ### hold_until Generates trading positions based on entry signals until exit signals occur. Supports stock rotation limits, stop-loss/take-profit, and ranking-based selection. **Signature:** ```python hold_until( exit: pd.DataFrame, nstocks_limit: int = None, stop_loss: float = -np.inf, take_profit: float = np.inf, trade_at: str = 'close', rank: pd.DataFrame = None ) -> FinlabDataFrame ``` **Parameters:** - `exit` (pd.DataFrame, required): Exit signal DataFrame - `nstocks_limit` (int, optional): Maximum number of stocks to hold simultaneously - `stop_loss` (float, optional, default=-np.inf): Stop loss threshold (e.g., 0.1 = exit if down 10%) - `take_profit` (float, optional, default=np.inf): Take profit threshold (e.g., 0.2 = exit if up 20%) - `trade_at` (str, optional, default='close'): Price reference for stop/take profit ('close' or 'open') - `rank` (pd.DataFrame, optional): Ranking DataFrame for prioritizing entries when limit is reached (higher = priority) **Returns:** - Boolean FinlabDataFrame with positions (True indicates holding) **Example:** ```python from finlab import data from finlab.backtest import sim close = data.get('price:收盤價') pb = data.get('price_earning_ratio:股價淨值比') # Entry: price above 20-day MA entries = close > close.average(20) # Exit: price below 60-day MA exits = close < close.average(60) # Hold max 10 stocks, prefer lower P/B (use negative for ascending) position = entries.hold_until( exits, nstocks_limit=10, stop_loss=0.1, # 10% stop loss take_profit=0.3, # 30% take profit rank=-pb # Lower P/B = higher priority ) report = sim(position) ``` --- ## Industry & Category Methods ### groupby_category Groups DataFrame columns by their industry category. Similar to `pandas.DataFrame.groupby()` but groups stocks by industry. **Signature:** ```python groupby_category() -> pd.core.groupby.DataFrameGroupBy ``` **Returns:** - A GroupBy object with groups defined by industry categories **Example:** ```python from finlab import data pb = data.get('price_earning_ratio:股價淨值比') # Average P/B by industry industry_pb = pb.groupby_category().mean() # Plot semiconductor industry P/B over time industry_pb['半導體'].plot(title='Semiconductor P/B Ratio') ``` --- ### industry_rank Calculates percentile ranking for stocks within their respective industries. Returns values from 0 (lowest in industry) to 1 (highest in industry). **Signature:** ```python industry_rank(categories: list = None) -> FinlabDataFrame ``` **Parameters:** - `categories` (list, optional): List of industry categories to consider. If None, uses all industries from `data.get('security_industry_themes')` **Returns:** - FinlabDataFrame with industry-relative ranking scores (0 to 1) **Example:** ```python from finlab import data pe = data.get('price_earning_ratio:本益比') # Rank P/E within each industry pe_industry_rank = pe.industry_rank() # Select stocks that are cheap relative to their industry cheap_in_industry = pe_industry_rank < 0.3 ``` --- ### entry_price Retrieves the adjusted price at entry signal points. **Signature:** ```python entry_price(trade_at: str = 'close') -> FinlabDataFrame ``` **Parameters:** - `trade_at` (str, optional, default='close'): Price type ('close' or 'open') **Returns:** - FinlabDataFrame with entry prices (forward-filled) **Example:** ```python from finlab import data close = data.get('price:收盤價') # Position signal position = close.is_largest(10) # Get entry prices for calculating returns entry_prices = position.entry_price() current_return = close / entry_prices - 1 ``` --- ## Neutralization Methods ### neutralize Performs cross-sectional regression to neutralize factors from the data. Returns residuals after regressing on specified neutralizer factors. **Signature:** ```python neutralize( neutralizers: Union[pd.DataFrame, list[pd.DataFrame], dict[str, pd.DataFrame]], add_const: bool = True ) -> FinlabDataFrame ``` **Parameters:** - `neutralizers` (DataFrame, list, or dict, required): Factor(s) to neutralize against - Single DataFrame: neutralize against one factor - List of DataFrames: neutralize against multiple factors - Dict of DataFrames: neutralize with named factors - `add_const` (bool, optional, default=True): Whether to include intercept in regression **Returns:** - FinlabDataFrame containing regression residuals (neutralized values) **Example:** ```python from finlab import data # Original factor pe = data.get('price_earning_ratio:本益比') # Neutralize against size (market cap) size = data.get('etl:market_value') pe_size_neutral = pe.neutralize(size) # Neutralize against multiple factors pb = data.get('price_earning_ratio:股價淨值比') pe_multi_neutral = pe.neutralize([size, pb]) # Neutralize with named factors (for clarity) pe_named_neutral = pe.neutralize({ 'size': size, 'size_squared': size ** 2, 'pb': pb }) ``` **Use Case:** Factor neutralization removes unwanted exposures. For example, if you want a value factor that isn't just picking small-cap stocks, neutralize against market cap. --- ### neutralize_industry Performs cross-sectional regression to neutralize industry effects. Each stock is regressed on industry dummy variables, returning industry-neutral residuals. **Signature:** ```python neutralize_industry( categories: pd.DataFrame = None, add_const: bool = True ) -> FinlabDataFrame ``` **Parameters:** - `categories` (pd.DataFrame, optional): DataFrame with 'stock_id' and 'category' columns. If None, uses `data.get('security_categories')` - `add_const` (bool, optional, default=True): Whether to include intercept (one dummy is dropped to avoid multicollinearity) **Returns:** - FinlabDataFrame containing industry-neutralized values **Example:** ```python from finlab import data # P/E ratio varies significantly by industry pe = data.get('price_earning_ratio:本益比') # Remove industry effects pe_industry_neutral = pe.neutralize_industry() # Now pe_industry_neutral represents deviation from industry average # Positive = expensive relative to industry peers # Negative = cheap relative to industry peers # Custom industry categories custom_cats = pd.DataFrame({ 'stock_id': ['2330', '2317', '1101', '2412'], 'category': ['半導體', '電子', '水泥', '電信'] }) pe_custom_neutral = pe.neutralize_industry(categories=custom_cats) ``` --- ## Quantile Methods ### quantile_row Computes the specified quantile across all stocks for each date. **Signature:** ```python quantile_row(c: float) -> pd.Series ``` **Parameters:** - `c` (float, required): Quantile value between 0 and 1 (e.g., 0.9 for 90th percentile) **Returns:** - pandas Series containing the quantile value per date **Example:** ```python from finlab import data close = data.get('price:收盤價') # 90th percentile price each day q90 = close.quantile_row(0.9) # Median price each day median = close.quantile_row(0.5) # Select stocks above 90th percentile expensive = close > close.quantile_row(0.9) ``` --- ## Automatic Index Alignment FinlabDataFrame automatically aligns indices and columns when performing operations between DataFrames with different frequencies or shapes. **Supported Operations:** - Arithmetic: `+`, `-`, `*`, `/`, `//`, `%`, `**` - Comparison: `>`, `>=`, `==`, `!=`, `<`, `<=` - Logical: `&`, `|`, `^` **Example:** ```python from finlab import data # Daily data close = data.get('price:收盤價') # Daily frequency # Quarterly data roa = data.get('fundamental_features:ROA稅後息前') # Quarterly frequency # Operations automatically align - no manual reindex needed cond1 = close > close.average(60) # Daily condition cond2 = roa > 0 # Quarterly condition # Intersection works across frequencies # Quarterly data is forward-filled to daily position = cond1 & cond2 ``` **Alignment Rules:** 1. Index: Union of both indices, forward-filled 2. Columns: Intersection of both column sets 3. String indices (e.g., "2022-Q1") are converted to datetime using disclosure dates --- ## Method Chaining Patterns FinlabDataFrame methods can be chained for concise strategy expression. **Example: Complete Strategy** ```python from finlab import data from finlab.backtest import sim close = data.get('price:收盤價') pb = data.get('price_earning_ratio:股價淨值比') roa = data.get('fundamental_features:ROA稅後息前') # Chain conditions and selection position = ( pb[ (close > close.average(60)) & # Above 60-day MA (roa > 0) & # Profitable (close.rise(5)) # Rising momentum ] .neutralize_industry() # Industry-neutral P/B .is_smallest(10) # Cheapest 10 stocks ) report = sim(position, resample='M') ``` --- ## Related References - [Backtesting Reference](backtesting-reference.md) - Learn how to backtest strategies using FinlabDataFrame - [Data Reference](data-reference.md) - Explore available data sources - [Factor Examples](factor-examples.md) - See practical examples of using FinlabDataFrame in strategies - [Best Practices](best-practices.md) - Avoid common pitfalls including lookahead bias