Imbalance in scikit-learn

I'm using scikit-learn in my Python program in order to perform some machine-learning operations. The problem is that my data-set has severe imbalance issues. Is anyone familiar with a solution...

Dealing with the class imbalance in binary classification

Here's a brief description of my problem: I am working on a supervised learning task to train a binary classifier. I have a dataset with a large class imbalance: 8 negative instances...

Is it possible for a Precision-Recall curve or a ROC curve to be a horizontal line?

I am working on a binary classification task on imbalanced data. Since accuracy is not very meaningful in this case, I use scikit-learn to compute the Precision-Recall curve and ROC curve in...

Problems importing the imblearn Python package in an IPython notebook

I installed https://github.com/glemaitre/imbalanced-learn from Windows PowerShell using pip install, conda and GitHub. But when I try to import the package in an IPython notebook using: from...

Balanced Random Forest in scikit-learn (python)

I'm wondering if there is an implementation of the Balanced Random Forest (BRF) in recent versions of the scikit-learn package. BRF is used in the case of imbalanced data. It works like a normal RF,...

How can I set sub-sample size in Random Forest Classifier in Scikit-Learn? Especially for imbalanced data

Currently, I am using RandomForestClassifier in Sklearn for my imbalanced data. I am not very clear about how RF works in Sklearn exactly. My concerns are as follows: According to the...

Using imblearn for oversampling multi class data

I want to use the RandomOverSampler class from the imbalanced-learn module to oversample data with more than two classes. The following is my code with 3 classes: import numpy as np from...

Jupyter: No module named 'imblearn' after installation

I installed "imbalanced-learn" (version 0.3.1) via Anaconda Navigator. When I ran an example from the imbalanced-learn website using Jupyter (Python 3), I got a message regarding...

Do neural networks learn the distribution in the training dataset?

I am trying to train a Convolutional Neural Network on a dataset with imbalanced classes (20% class 1, 70% class 2, 10% class 3). I want the network to learn that class 1 and class 3 occur very...

Python imbalanced-learn: How does the BalancedBaggingClassifier 'ratio' argument work?

Introduction: I am working on a binary classification task with very imbalanced datasets (~1000 instances of class 1, ~10000000 instances of class 0) and am experimenting with the imbalanced-learn...

ModuleNotFoundError: No module named 'imblearn'

I tried running the following code: from imblearn import under_sampling, over_sampling from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=12, ratio = 1.0) x_SMOTE, y_SMOTE =...

param-grid passing parameters to an underlying function. lost in kw_args

I am lost here. I hope someone can shed some light. I have built a pipeline (an sklearn pipeline, or to be precise, an imbalanced-learn pipeline). The first step of the pipeline is a FunctionSampler...

Upsampling an imbalanced dataset's minority classes

I am using scikit-learn to classify my data; at the moment I am running a simple DecisionTree classifier. I have three classes with a severe imbalance problem. The classes are 0, 1 and 2. The minor...

Anaconda: ModuleNotFoundError: No module named 'conda'

Please note this error is different from the one that shows up (and has an answer) on Stack Overflow. It is definitely not a duplicate. I have seen this error before and have been able to fix it by modifying...

Random forest: balancing test set?

I am trying to run a Random Forest Classifier on an imbalanced dataset (~1:4). I am using the method from imblearn as follows: from imblearn.ensemble import...

No module named 'sklearn.neighbors._base'

I recently installed the imblearn package in Jupyter using !pip show imbalanced-learn, but I am not able to import it. from tensorflow.keras import backend from imblearn.over_sampling...

Imbalanced-learn: Import Error: cannot import name 'MultiOutputMixin'

I've re-installed the latest scikit-learn and imbalanced-learn. I've also checked all other libraries to make sure they are compatible with imbalanced-learn. I just want to run a simple...

Stratified cross-validation in PySpark

I am using the Apache Spark API in Python, PySpark (version 3.0.0), and would ideally like to perform cross-validation of my labelled data in a stratified manner since my data is highly...

ImportError: cannot import name '_deprecate_positional_args' from 'sklearn.utils.validation' to import imblearn

I am installing imbalanced-learn. It installed successfully, but on importing it I am getting this error: ImportError: cannot import name '_deprecate_positional_args' from...

How to use cross-validation with a Keras classifier

I was practicing Keras classification for imbalanced data. I followed the official example: https://keras.io/examples/structured_data/imbalanced_classification/ and used the scikit-learn API...

cross_val_score is returning a list of nan scores in scikit-learn

I am trying to handle an imbalanced multi-label dataset using cross-validation, but scikit-learn's cross_val_score is returning a list of nan values when I run the classifier. Here is the code: import pandas...

Python package for the SMOTEBoosting algorithm

I am looking for a Python package for the SMOTEBoosting algorithm, but I can only find SMOTE in imbalanced-learn. Can anybody help out here?

It seems that scikit-learn has not been built correctly

I have been using Jupyter Notebook for my machine learning project. Previously scikit-learn was working fine, but then I ran pip install imblearn and pip install -U imbalanced-learn, after...

Calculating micro F-1 score in Keras

I have a dataset with 15 imbalanced classes and am trying to do multilabel classification with Keras. I am trying to use the micro F-1 score as a metric. My model: # Create a VGG instance model_vgg =...

ValueError: could not broadcast input array from shape (3,96) into shape (184,96) while using SMOTENC from imbalanced-learn

I am trying to use SMOTENC from the imbalanced-learn library to oversample a dataframe that includes both categorical and numerical variables. There are 55 columns in total, of which 3 are...

How to use downsampling and configure the class weight parameter when using XGBoost for imbalanced classification?

I am working on a binary classification problem on a dataset with extreme class imbalance. To help the model learn the signals of the minority class, I downsampled the majority class such that the...

Sudden Tensorflow / Keras Google Colab dependency problems `AttributeError: module 'tensorflow._api.v1.compat.v2' has no attribute '__internal__'`

I have been running a machine learning model (Matterport's Mask R-CNN) in Google Colab for a couple of weeks. All of a sudden, today I am unable to run any of my notebooks due to, I think, some kind of...

How to install PyCaret in AWS Glue

How can I properly install PyCaret in AWS Glue? Methods I tried: --additional-python-modules and --python-modules-installer-option; Python library path; easy_install as described in...

A problem in using AIF360 metrics in my code

I am trying to run AI Fairness 360 metrics on scikit-learn (imbalanced-learn) algorithms, but I have a problem with my code. The problem is when I apply scikit-learn (imbalanced-learn) algorithms like...

Import error _euclidean_distances from sklearn.metrics.pairwise

I am working with Orange 3.30.1, trying to use the Python Script widget to add SMOTE to my data classification problem (the Orange team has refrained from implementing it, and suggests this way...