What is feature engineering?
Model features are the inputs that machine learning (ML) models use during training and inference to make predictions. The accuracy of an ML model depends on choosing the right set of features and composing them well. For example, in an ML application that recommends a music playlist, features could include song ratings, previously played songs, and song listening time. Creating features can take significant engineering effort. Feature engineering is the extraction and transformation of variables from raw data, such as price lists, product descriptions, and sales volumes, so that you can use the resulting features for training and prediction. The steps are data extraction and cleansing, followed by feature creation and storage.
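To make these steps concrete, here is a minimal sketch in plain Python based on the playlist example. The event layout, the field names, and the functions clean and build_song_features are all assumptions made for illustration. They do not come from any particular library or AWS service. The returned dictionary stands in for feature storage.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw listening events: (user_id, song_id, rating, seconds_listened).
# A rating of None marks a missing value.
RAW_EVENTS = [
    ("u1", "s1", 5, 210),
    ("u1", "s2", None, 35),
    ("u2", "s1", 4, 190),
    ("u2", "s3", 2, -1),  # negative duration: a corrupt record
    ("u1", "s1", 5, 205),
]


def clean(events):
    """Cleansing: drop records with an impossible listening time."""
    return [e for e in events if e[3] >= 0]


def build_song_features(events):
    """Feature creation: aggregate events into one feature row per song."""
    ratings = defaultdict(list)
    seconds = defaultdict(list)
    for _user, song, rating, secs in events:
        seconds[song].append(secs)
        if rating is not None:
            ratings[song].append(rating)

    features = {}
    for song, secs in seconds.items():
        features[song] = {
            "play_count": len(secs),
            "avg_listen_seconds": mean(secs),
            # None when no rating was recorded for the song
            "avg_rating": mean(ratings[song]) if ratings[song] else None,
        }
    return features


# "Storage" here is just a dictionary keyed by song ID.
song_features = build_song_features(clean(RAW_EVENTS))
```

In this sketch the corrupt record for s3 is removed during cleansing, so only s1 and s2 receive feature rows. A real pipeline would usually repair or flag such records according to domain rules rather than silently dropping them.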
What are the challenges of feature engineering?
Feature engineering is challenging because it combines data analysis, business domain knowledge, and some intuition. When creating features, it's tempting to go straight to the data that is already available, but it is often better to first work out which data you need by speaking with experts, brainstorming, and doing third-party research. If you skip this exercise, you could miss important predictor variables.
Data extraction
Feature creation
Feature storage
How can AWS help with feature engineering?
With Amazon SageMaker Data Wrangler, you can simplify the feature engineering process using a single visual interface. Using the SageMaker Data Wrangler data selection tool, you can choose the raw data that you want from various data sources and import it with a single click. SageMaker Data Wrangler contains over 300 built-in data transformations so that you can quickly normalize, transform, and combine features without having to write any code. After your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines and save them for reuse in the Amazon SageMaker Feature Store. SageMaker Feature Store is a purpose-built repository where you can store and access features, so it’s easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent.
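For readers who want to see what "normalize, transform, and combine" mean in practice, the following is a small plain-Python sketch of the three kinds of operation. It is not SageMaker Data Wrangler's API, and it is not how the service implements its built-in transformations. The function names, the sample values, and the combined popularity feature are all hypothetical.

```python
import math


def min_max_scale(values):
    """Normalize: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Transform: compress a long-tailed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical per-song features, one entry per song.
play_counts = [1, 10, 100]
avg_ratings = [2.0, 4.0, 5.0]

scaled_ratings = min_max_scale(avg_ratings)
log_plays = log_transform(play_counts)

# Combine: multiply the two derived columns into one engineered feature.
popularity = [r * p for r, p in zip(scaled_ratings, log_plays)]
```

A visual tool applies operations of this kind through configuration instead of hand-written code, which is the convenience the paragraph above describes.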