1. Example: Sentiment Analysis AI (Text Classification)
Goal: Train an ML model to classify movie reviews as positive or negative.
2. Step-by-Step Implementation
1. Install Required Libraries
```bash
pip install pandas scikit-learn nltk
```
2. Load and Prepare Data
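We’ll use the IMDb Large Movie Review Dataset. It is not bundled with scikit-learn, so download and extract it first; scikit-learn's load_files then reads it from a folder that has one subfolder per class (neg and pos in aclImdb/train).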
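```python
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# Load dataset (replace with your own data if needed; or use a CSV with "text" and "label" columns).
# load_files expects one subfolder per class. In the IMDb aclImdb/train folder these are
# "neg" and "pos"; "unsup" holds unlabeled reviews, so it is excluded via categories.
reviews = load_files("path_to_imdb_folder", categories=["neg", "pos"])
X = reviews.data    # Text reviews (raw bytes; TfidfVectorizer decodes them as UTF-8)
y = reviews.target  # Labels (0=negative, 1=positive, from the sorted folder names)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```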
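If your reviews live in a CSV rather than the folder layout load_files expects, pandas (installed in step 1) can produce the same X and y. A minimal sketch, assuming a file with a "text" column and a "label" column already encoded as 0/1; the inline sample rows are hypothetical and stand in for a real file path:

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample data standing in for a real file such as "reviews.csv"
csv_data = io.StringIO(
    "text,label\n"
    '"Loved it, a wonderful film",1\n'
    '"Boring and far too long",0\n'
    '"Great cast and a clever script",1\n'
    '"Awful dialogue, I walked out",0\n'
    '"A moving, beautiful story",1\n'
    '"Predictable and dull",0\n'
)
df = pd.read_csv(csv_data)  # with a real file: pd.read_csv("reviews.csv")

X = df["text"].tolist()     # list of review strings
y = df["label"].to_numpy()  # 0 = negative, 1 = positive

# stratify=y keeps the positive/negative ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```

The rest of the tutorial works unchanged, since TfidfVectorizer accepts a list of strings just as it accepts the byte strings from load_files.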
3. Preprocess Text Data
Convert text to numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
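```python
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
import nltk

nltk.download("stopwords")  # one-time download of NLTK's stopword lists

# Remove stopwords and convert text to TF-IDF vectors.
# Note: NLTK's English list includes negations such as "not" and "no", which can matter for sentiment.
tfidf = TfidfVectorizer(stop_words=stopwords.words("english"))
X_train_tfidf = tfidf.fit_transform(X_train)  # learn vocabulary and IDF weights from training data only
X_test_tfidf = tfidf.transform(X_test)        # reuse the same vocabulary and weights on the test set
```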
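To see what TfidfVectorizer actually computes, here is a tiny, hand-checkable corpus (hypothetical). With its default smooth_idf=True, scikit-learn uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t; it multiplies that by the raw term count and then L2-normalizes each row. The sketch turns normalization off (norm=None) so the raw weights can be checked against the formula:

```python
import math

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["good movie", "bad movie", "good good plot"]

# norm=None disables the default L2 row normalization so raw tf * idf values are visible
vec = TfidfVectorizer(norm=None)
weights = vec.fit_transform(docs)

n_docs = len(docs)
for term, col in sorted(vec.vocabulary_.items()):
    df = sum(1 for d in docs if term in d.split())  # document frequency of the term
    idf = math.log((1 + n_docs) / (1 + df)) + 1     # smoothed idf (smooth_idf=True)
    assert math.isclose(idf, vec.idf_[col])         # sanity check against the fitted idf_
    print(f"{term:>6}  df={df}  idf={idf:.3f}")

# In the third document "good" occurs twice, so its weight is 2 * idf("good")
print(weights[2, vec.vocabulary_["good"]])
```

Rare terms such as "bad" and "plot" get a higher idf than "good" and "movie", which appear in two of the three documents.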
4. Train a Machine Learning Model
We’ll use a Logistic Regression classifier (simple but effective for text).
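```python
from sklearn.linear_model import LogisticRegression

# max_iter is raised from the default (100) to give the solver room to converge on large TF-IDF matrices
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)
```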
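The vectorizer and classifier can also be chained with scikit-learn's Pipeline, so one fit/predict call runs both steps and the vectorizer is only ever fitted on training text. A sketch on a hypothetical six-review corpus (it uses TfidfVectorizer's defaults, without the NLTK stopword list, for brevity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; in the tutorial these would be X_train / y_train from step 2
texts = [
    "great acting and a brilliant story",
    "wonderful film, truly great",
    "brilliant and moving",
    "terrible plot and awful acting",
    "boring, awful, a waste of time",
    "terrible film, I hated it",
]
labels = [1, 1, 1, 0, 0, 0]

# fit() learns the TF-IDF vocabulary, transforms the text, then fits the classifier on the vectors
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Raw strings go straight in; the pipeline applies the fitted vectorizer before predicting
print(clf.predict(["a brilliant, great film", "awful and boring"]))
```

Because the pipeline behaves as a single estimator, it can be passed as-is to model-selection tools such as cross_val_score or GridSearchCV.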
5. Evaluate the Model
Check the accuracy and print a per-class classification report (precision, recall, F1-score).
```python
from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
Example output (exact figures will vary with the data and the split):
```text
Accuracy: 0.89
              precision    recall  f1-score   support

           0       0.89      0.89      0.89      5000
           1       0.89      0.89      0.89      5000

    accuracy                           0.89     10000
```
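The classification report summarizes precision and recall; a confusion matrix shows the raw counts behind them. A small sketch with hypothetical labels (not the IMDb results), computing positive-class precision and recall by hand from the counts:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (0 = negative, 1 = positive)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 1, 0, 1, 1]

# Rows are true classes, columns are predicted classes, both ordered [0, 1]
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(cm)

# Precision: share of predicted positives that are correct; recall: share of true positives found
print("precision(1) =", tp / (tp + fp))
print("recall(1)    =", tp / (tp + fn))
```

Here 3 of the 5 predicted positives are correct (precision 0.6) and 3 of the 4 true positives are found (recall 0.75), which are the values classification_report would list in the row for class 1.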
6. Use the AI for Predictions
```python
new_reviews = [
    "This movie was fantastic! The acting was brilliant.",  # Positive
    "I hated every minute of it. Terrible plot.",  # Negative
]
# Reuse the fitted vectorizer: call transform (not fit_transform) on new text
new_reviews_tfidf = tfidf.transform(new_reviews)
predictions = model.predict(new_reviews_tfidf)

for review, pred in zip(new_reviews, predictions):
    print(f"Review: '{review}'\nSentiment: {'Positive' if pred == 1 else 'Negative'}\n")
```
Output:
```text
Review: 'This movie was fantastic! The acting was brilliant.'
Sentiment: Positive

Review: 'I hated every minute of it. Terrible plot.'
Sentiment: Negative
```
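Logistic regression can also report how confident each prediction is through predict_proba. A sketch trained on a hypothetical four-review corpus so it runs on its own; in the tutorial you would reuse the fitted tfidf and model from the earlier steps instead:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data standing in for X_train / y_train
train_texts = [
    "fantastic film with brilliant acting",
    "a brilliant, moving story",
    "terrible plot and awful pacing",
    "I hated it, a boring mess",
]
train_labels = [1, 1, 0, 0]

tfidf = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(tfidf.fit_transform(train_texts), train_labels)

new_reviews = ["Brilliant acting, fantastic film.", "Awful and boring."]
# Each row holds [P(negative), P(positive)], in the order given by model.classes_
probs = model.predict_proba(tfidf.transform(new_reviews))

for review, (p_neg, p_pos) in zip(new_reviews, probs):
    # For binary logistic regression, p_pos > 0.5 gives the same label as model.predict
    label = "Positive" if p_pos > 0.5 else "Negative"
    print(f"{review!r}: {label} (P(positive) = {p_pos:.2f})")
```

Reviews whose probability sits near 0.5 are the ones the model is least sure about, which makes them useful candidates for manual review.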
3. Key Concepts Used
- Supervised Learning: The model learns from labeled data (positive/negative reviews).
- Feature Extraction: Text is converted to numerical features (TF-IDF).
- Classification: Logistic Regression predicts discrete labels.
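For the classification step, binary logistic regression turns a linear score into a probability with the sigmoid function and predicts the positive class when that score is positive. Written out, with x the TF-IDF vector of a review and w, b the learned weights and intercept (notation introduced here for illustration), and 1 = positive, 0 = negative as in step 2:

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}

\hat{y} =
\begin{cases}
1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \quad (\text{i.e. } P(y = 1 \mid \mathbf{x}) > 0.5) \\
0 & \text{otherwise}
\end{cases}
```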
4. How to Improve This AI
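- Deep Learning: Models such as LSTMs or BERT can capture word order and context, which a bag-of-words TF-IDF model ignores, and often achieve higher accuracy.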
- More Data: Train on larger datasets (e.g., Twitter sentiment data).
- Hyperparameter Tuning: Optimize model parameters (e.g., the regularization strength C or the TF-IDF n-gram range) with GridSearchCV.
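A sketch of GridSearchCV-based tuning over a TF-IDF + logistic regression pipeline, searching the regularization strength C and the n-gram range. The toy reviews and grid values are illustrative assumptions, not tuned recommendations:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data; substitute X_train / y_train from step 2
texts = [
    "great acting and a brilliant story", "wonderful film, truly great",
    "brilliant and moving", "a delight from start to finish",
    "terrible plot and awful acting", "boring, awful, a waste of time",
    "terrible film, I hated it", "dull and painfully slow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameter names follow the "<step name>__<parameter>" convention
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. unigrams + bigrams
    "clf__C": [0.1, 1.0, 10.0],              # inverse regularization strength
}

# cv=2 only because the toy set is tiny; the default is 5-fold cross-validation
search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.2f}")
```

After fitting, search.best_estimator_ is the pipeline refitted on all the training data with the best parameters, ready for predict.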
5. Other Simple ML-AI Examples
| Project | Algorithm | Library |
|---|---|---|
| Spam Detector | Naive Bayes | scikit-learn |
| Handwritten Digit Recognizer | CNN | TensorFlow/Keras |
| Stock Price Predictor | Linear Regression | PyTorch |
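As a pointer for the first row of the table, a spam detector can follow the same pattern as the sentiment model, with Multinomial Naive Bayes in place of logistic regression. A minimal sketch on hypothetical messages:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages
messages = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent: you won a lottery prize",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

# Multinomial Naive Bayes models word counts, so raw counts from CountVectorizer are a natural input
spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(messages, labels)

print(spam_clf.predict(["free prize inside", "lunch at the office?"]))
```

Words never seen in training (such as "inside") are simply ignored by the vectorizer, so the prediction rests on the known words.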

