The California housing dataset contains housing prices and related features for locations across California, United States. It is commonly used for regression analysis and predictive modeling. Its features include the average number of rooms, the average number of bedrooms, population, and median income for multiple geographical areas in California.
The fetch_california_housing() function from the sklearn.datasets module downloads and loads the California housing dataset, a commonly used dataset for regression tasks in machine learning.
When fetch_california_housing() is invoked, it downloads the dataset automatically (if it hasn’t been downloaded before) and returns a dictionary-like object that contains the data, the target values, and some metadata.
Parameters
The fetch_california_housing function from scikit-learn’s datasets module accepts, among others, the following parameters:
- data_home (optional): Specifies the directory where the dataset is stored or should be downloaded. If not provided, scikit-learn will use a default location to store the data.
- download_if_missing (optional): A boolean that controls whether the dataset is downloaded if it is not already available locally. Its default value is True, so the dataset is downloaded automatically when needed.
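The two parameters above can be combined to check for a local copy without touching the network. The following is a minimal sketch, not an official recipe: the helper name load_from_cache_only is hypothetical, and it assumes that scikit-learn raises OSError when the data is missing and download_if_missing is False.

```python
import tempfile

from sklearn.datasets import fetch_california_housing


def load_from_cache_only(cache_dir):
    """Return the dataset if it is already cached in cache_dir, else None.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    try:
        # download_if_missing=False forbids any download attempt
        return fetch_california_housing(
            data_home=cache_dir, download_if_missing=False
        )
    except OSError:
        # Assumed behavior: raised when no cached copy exists in cache_dir
        return None


# A fresh temporary directory holds no cached copy, so nothing is loaded.
with tempfile.TemporaryDirectory() as empty_dir:
    result = load_from_cache_only(empty_dir)
    print("cached copy found:", result is not None)
```

Pointing cache_dir at a directory where the dataset was downloaded earlier should return the loaded object instead of None.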
Return Type
It returns a dictionary-like object with the following attributes:
- data: A NumPy array holding the feature matrix, of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.
- target: A NumPy array of shape (n_samples,) that contains the target value for each sample (housing prices in this case, given as the median house value in units of $100,000).
- feature_names: List of feature names, representing the names of the columns in the feature matrix.
- DESCR: A description of the dataset.
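To make these attributes concrete, here is a small sketch that inspects a dictionary-like object of this kind. The helper describe_bunch and the toy values are made up for illustration; a stand-in sklearn.utils.Bunch is used so the snippet runs without downloading anything.

```python
import numpy as np
from sklearn.utils import Bunch


def describe_bunch(bunch):
    """Summarize the shapes and names stored in a dataset Bunch.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    n_samples, n_features = bunch.data.shape
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "target_shape": bunch.target.shape,
        "feature_names": list(bunch.feature_names),
        # Attribute access and key access return the same object
        "same_via_key": bunch["data"] is bunch.data,
    }


# Stand-in object with made-up values, shaped like the real return value
toy = Bunch(
    data=np.zeros((5, 3)),
    target=np.zeros(5),
    feature_names=["a", "b", "c"],
    DESCR="toy dataset",
)
summary = describe_bunch(toy)
print(summary)
```

Passing the object returned by fetch_california_housing() to the same helper should report the real sample and feature counts.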
Explanation: California dataset
Here’s a code example showing how fetch_california_housing() is used in a machine learning program to load the California housing dataset from the sklearn.datasets module:
from sklearn.datasets import fetch_california_housing
# Load the dataset
data = fetch_california_housing()
# Access the features and target values
X = data['data']
y = data['target']
# Additional information about the dataset
feature_names = data['feature_names']
description = data['DESCR']
- Line 1: Import fetch_california_housing from sklearn.datasets.
- Line 3: Call the function to fetch the dataset.
- Lines 5-6: Access the features and target values of the California housing dataset.
- Lines 8-9: Read the feature names and the dataset description.
In the above example, X is the feature matrix (input data) and y is the target variable (housing prices in this case). This data can be used to train regression models or to perform other analysis tasks with scikit-learn or other ML frameworks in Python.
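As one possible next step, the sketch below fits a simple regression baseline on a feature matrix and target vector. The choice of LinearRegression, the 80/20 split, and the names fit_linear_baseline, X_demo, and y_demo are assumptions made for illustration. Synthetic data stands in for X and y so the snippet runs without a download.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_linear_baseline(X, y, seed=0):
    """Fit a linear regression and return (model, R^2 on held-out data)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # score() returns the coefficient of determination on the test split
    return model, model.score(X_test, y_test)


# Synthetic stand-in data: 200 samples, 8 features, exactly linear target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo @ np.arange(1.0, 9.0) + 0.5

model, r2 = fit_linear_baseline(X_demo, y_demo)
print(f"held-out R^2: {r2:.3f}")
```

Calling fit_linear_baseline(X, y) with the arrays loaded from the California housing dataset would give a first baseline to compare other regressors against.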