1. Example: Sentiment Analysis AI (Text Classification)
Goal: Train an ML model to classify movie reviews as positive or negative.
2. Step-by-Step Implementation
1. Install Required Libraries
```bash
pip install pandas scikit-learn nltk
```
2. Load and Prepare Data
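We’ll use the IMDb movie reviews dataset. The dataset is not built into scikit-learn, so download it first and unpack it into a folder with one subfolder per class (for example, neg and pos). load_files reads that layout and assigns integer labels by subfolder name in alphabetical order, so neg becomes 0 and pos becomes 1.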
```python
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# Load dataset (replace with your own data if needed)
# Or use a CSV with "text" and "label" columns
# encoding="utf-8" decodes each file to str; without it, load_files returns bytes
reviews = load_files("path_to_imdb_folder", encoding="utf-8")

X = reviews.data    # Text reviews
y = reviews.target  # Labels (0=negative, 1=positive)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
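The loading code mentions a CSV with "text" and "label" columns as an alternative. A minimal sketch of that route follows, assuming pandas is used to read the file. The inline CSV string and its rows are made up for illustration and stand in for a real file; the stratify argument is an optional addition, not part of the original split.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a real CSV file with "text" and "label" columns.
csv_data = io.StringIO(
    "text,label\n"
    "A wonderful and moving film,1\n"
    "Dull and far too long,0\n"
    "Great cast and a sharp script,1\n"
    "I walked out halfway through,0\n"
    "Easily the best film this year,1\n"
    "A complete waste of time,0\n"
    "Funny and heartfelt,1\n"
    "Boring from start to finish,0\n"
    "Beautifully shot and well acted,1\n"
    "The plot made no sense,0\n"
)
df = pd.read_csv(csv_data)

X = df["text"].tolist()   # Text reviews
y = df["label"].tolist()  # Labels (0=negative, 1=positive)

# stratify=y keeps the positive/negative ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```

With a real file, pass its path to pd.read_csv instead of the StringIO object; the rest of the tutorial works unchanged on the resulting lists.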
3. Preprocess Text Data
Convert text to numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
```python
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords")

# Remove stopwords and convert text to TF-IDF vectors
tfidf = TfidfVectorizer(stop_words=stopwords.words("english"))
X_train_tfidf = tfidf.fit_transform(X_train)  # Learn the vocabulary from training data only
X_test_tfidf = tfidf.transform(X_test)        # Reuse that vocabulary for the test set
```
4. Train a Machine Learning Model
We’ll use a Logistic Regression classifier (simple but effective for text).
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
```
5. Evaluate the Model
Compute the accuracy and print a per-class classification report.
```python
from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test_tfidf)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
Output:
```text
Accuracy: 0.89
              precision    recall  f1-score   support

           0       0.89      0.89      0.89      5000
           1       0.89      0.89      0.89      5000

    accuracy                           0.89     10000
```
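The precision and recall columns in the report come from counting four kinds of outcome: true and false positives, and true and false negatives. The sketch below works through that arithmetic on eight made-up labels (not the IMDb results) using scikit-learn's confusion_matrix.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_hat = [1, 1, 0, 0, 0, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_true, y_hat)
precision = tp / (tp + fp)  # Of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)     # Of the truly positive reviews, how many were found

print(cm)
print("Accuracy:", accuracy)
print("Precision (positive class):", precision)
print("Recall (positive class):", recall)
```

Here six of eight predictions are correct, with one false positive and one false negative, so accuracy, precision, and recall all come out at 0.75.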
6. Use the AI for Predictions
```python
new_reviews = [
    "This movie was fantastic! The acting was brilliant.",  # Positive
    "I hated every minute of it. Terrible plot.",  # Negative
]

# Vectorize with the already-fitted TF-IDF vocabulary, then predict
new_reviews_tfidf = tfidf.transform(new_reviews)
predictions = model.predict(new_reviews_tfidf)

for review, pred in zip(new_reviews, predictions):
    print(f"Review: '{review}' \nSentiment: {'Positive' if pred == 1 else 'Negative'}\n")
```
Output:
```text
Review: 'This movie was fantastic! The acting was brilliant.'
Sentiment: Positive

Review: 'I hated every minute of it. Terrible plot.'
Sentiment: Negative
```
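Predicting on new text requires applying the same vectorizer before calling the model. One way to avoid forgetting that step is to bundle both into a scikit-learn pipeline. This is an optional variation, not the tutorial's original structure, and the training phrases below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only.
train_texts = [
    "great film", "great acting", "loved it great",
    "terrible film", "terrible acting", "hated it terrible",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1=positive, 0=negative

# The pipeline vectorizes and classifies in a single object.
sentiment = make_pipeline(TfidfVectorizer(), LogisticRegression())
sentiment.fit(train_texts, train_labels)

# Raw strings go straight in; no separate transform() call is needed.
preds = sentiment.predict(["great film", "terrible film"])
for text, pred in zip(["great film", "terrible film"], preds):
    print(text, "->", "Positive" if pred == 1 else "Negative")
```

A pipeline can also be saved and loaded as one object, which keeps the vocabulary and the classifier from drifting out of sync.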
3. Key Concepts Used
- Supervised Learning: The model learns from labeled data (positive/negative reviews).
- Feature Extraction: Text is converted to numerical features (TF-IDF).
- Classification: Logistic Regression predicts discrete labels.
4. How to Improve This AI
- Deep Learning: Models such as LSTM or BERT often reach higher accuracy, at the cost of more compute and training time.
- More Data: Train on larger datasets (e.g., Twitter sentiment data).
- Hyperparameter Tuning: Optimize model parameters with GridSearchCV.
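As a rough sketch of the hyperparameter tuning idea, GridSearchCV can search over both vectorizer and classifier settings when they are combined in a Pipeline. The training phrases, the parameter grid, and the choice of cv=3 below are illustrative assumptions, not tuned recommendations; on real data the grid and fold count would be chosen to fit the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up reviews, for illustration only.
texts = [
    "great film with a brilliant cast",
    "loved the story and the acting",
    "a fantastic and moving picture",
    "brilliant script and great pacing",
    "wonderful performances all round",
    "I loved every minute of it",
    "terrible plot and awful acting",
    "hated the story and the pacing",
    "a boring and pointless picture",
    "awful script and terrible cast",
    "dreadful performances all round",
    "I hated every minute of it",
]
labels = [1] * 6 + [0] * 6  # 1=positive, 0=negative

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keys use the "<step name>__<parameter>" convention.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # Unigrams only, or unigrams plus bigrams
    "clf__C": [0.1, 1.0, 10.0],              # Inverse regularization strength
}

# Tries all 6 combinations with 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

After fitting, search.best_estimator_ holds a pipeline refitted on all the data with the best settings, so it can be used for predictions directly.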
5. Other Simple ML-AI Examples
Project | Algorithm | Library |
---|---|---|
Spam Detector | Naive Bayes | scikit-learn |
Handwritten Digit Recognizer | CNN | TensorFlow/Keras |
Stock Price Predictor | Linear Regression | PyTorch |
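The first row of the table follows almost the same recipe as the sentiment classifier. Below is a minimal sketch of a spam detector, assuming word counts as features and scikit-learn's MultinomialNB as the Naive Bayes variant; the messages are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up messages, for illustration only.
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim your free reward",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "can you send me the report",
]
is_spam = [1, 1, 1, 0, 0, 0]  # 1=spam, 0=not spam

# Count word occurrences, then fit a multinomial Naive Bayes model on the counts.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, is_spam)

new_messages = ["claim your free prize", "lunch meeting tomorrow"]
preds = spam_filter.predict(new_messages)
for message, pred in zip(new_messages, preds):
    print(message, "->", "Spam" if pred == 1 else "Not spam")
```

The structure mirrors the sentiment example: vectorize the text, fit a classifier, then predict on new strings. Only the feature type and the algorithm change.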