majority_sampling_ratio: <list[float]> (Optional)

Description

A list of majority sampling ratios for AutoML to explore. The majority_sampling_ratio parameter controls undersampling of the majority class in binary classification tasks.
It specifies how many majority-class examples to keep per minority-class example during training.

In other words:

For every example in the minority class, we sample majority_sampling_ratio examples from the majority class.
Each ratio in the list must be greater than 0.

Behavior

  • If the dataset’s actual majority-to-minority ratio is greater than the specified majority_sampling_ratio, undersampling is applied to reduce the imbalance.
  • If the dataset’s actual ratio is less than or equal to the specified ratio, the parameter has no effect (i.e., all data are used).
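
The two rules above can be sketched as a small Python function. This is an illustration only, not AutoML's implementation: the function name sampled_class_counts is hypothetical, and truncating a fractional result to a whole number of examples is an assumption.

```python
from typing import Tuple


def sampled_class_counts(
    n_majority: int, n_minority: int, majority_sampling_ratio: float
) -> Tuple[int, int]:
    """Return (majority_kept, minority_kept) after applying the ratio.

    Hypothetical helper that mirrors the documented behavior.
    """
    if majority_sampling_ratio <= 0:
        raise ValueError("majority_sampling_ratio must be greater than 0")
    if n_minority <= 0:
        raise ValueError("the minority class must have at least one example")

    actual_ratio = n_majority / n_minority
    if actual_ratio <= majority_sampling_ratio:
        # The dataset is already at or below the requested ratio: keep everything.
        return n_majority, n_minority

    # Keep majority_sampling_ratio majority examples per minority example.
    # Truncation to an integer is an assumption made for this sketch.
    majority_kept = int(n_minority * majority_sampling_ratio)
    return majority_kept, n_minority
```

For example, sampled_class_counts(10_000, 100, 20) returns (2000, 100), while sampled_class_counts(10_000, 100, 150) returns (10000, 100) because the requested ratio exceeds the actual one.
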

Example 1: Undersampling applied

Suppose your dataset has:
  • Majority-class examples: 10,000
  • Minority-class examples: 100
    → Actual ratio = 100:1
If you set:
majority_sampling_ratio = 20
Then for each minority-class example, we keep 20 majority-class examples.
Resulting sampled data:
  • Majority-class examples kept: 100 × 20 = 2,000
  • Minority-class examples: 100
    → Resulting ratio = 20:1
Undersampling is applied because the actual ratio (100) is greater than the desired ratio (20).

Example 2: No effect (ignored)

Using the same dataset (10,000 majority, 100 minority → 100:1 ratio), if you set:
majority_sampling_ratio = 150
Then the desired ratio (150:1) is larger than the dataset’s actual ratio (100:1).
Since the dataset is already less imbalanced than the target, no undersampling occurs.
All majority examples are kept, and this setting is ignored.
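
To make both examples concrete at the row level, here is a minimal Python sketch that selects which rows to keep from a list of binary labels. It assumes uniform random sampling without replacement from the majority class; the documentation does not state how AutoML selects rows, and the function name undersample_majority and its seed argument are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def undersample_majority(
    labels: Sequence[int], majority_sampling_ratio: float, seed: int = 0
) -> List[int]:
    """Return the sorted row indices kept after majority undersampling."""
    if majority_sampling_ratio <= 0:
        raise ValueError("majority_sampling_ratio must be greater than 0")

    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes (binary classification)")

    # most_common orders the classes from largest to smallest.
    (majority_label, n_majority), (_, n_minority) = counts.most_common(2)

    # Truncating to a whole number of rows is an assumption of this sketch.
    target = int(n_minority * majority_sampling_ratio)
    if target >= n_majority:
        # Requested ratio is not below the actual ratio: keep all rows.
        return list(range(len(labels)))

    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]

    # Uniform sampling without replacement (assumed, not confirmed by the docs).
    kept_majority = random.Random(seed).sample(majority_idx, target)
    return sorted(kept_majority + minority_idx)
```

With 10,000 majority labels and 100 minority labels, a ratio of 20 keeps 2,100 rows (2,000 majority plus all 100 minority), and a ratio of 150 keeps all 10,100 rows.
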

Summary table

Dataset Majority:Minority   majority_sampling_ratio   Action Taken                      Resulting Ratio
100:1                       20                        Undersample majority              20:1
100:1                       50                        Undersample majority              50:1
100:1                       100                       No change (equal ratio)           100:1
100:1                       120                       Ignored (ratio already smaller)   100:1

Note: If you specify a weight column via the SDK and also specify majority_sampling_ratio, majority_sampling_ratio is ignored.

Supported Task Types

  • Binary Classification

Default Values

run_mode   Default Value
FAST       None
NORMAL     None
BEST       None