Practical AI with ML

1. Example: Sentiment Analysis AI (Text Classification)

Goal: Train an ML model to classify movie reviews as positive or negative.

2. Step-by-Step Implementation

1. Install Required Libraries

bash

pip install pandas scikit-learn nltk

2. Load and Prepare Data

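We’ll use the IMDb movie reviews dataset. It is not bundled with scikit-learn, so download it first. load_files expects a folder containing one subfolder per class (for example neg/ and pos/), with one text file per review.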

python

from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# Load dataset (replace with your own data if needed)
reviews = load_files("path_to_imdb_folder")  # Or use a CSV with "text" and "label" columns
X = reviews.data  # Text reviews
y = reviews.target  # Labels follow alphabetical subfolder order (e.g. neg=0, pos=1)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
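If your reviews live in a CSV instead, the same split can be sketched with pandas. This is a minimal sketch, not the only way to do it: the column names "text" and "label" follow the comment above, and the in-memory CSV is a made-up stand-in for a real file such as a hypothetical reviews.csv.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up sample rows standing in for a real CSV file (hypothetical "reviews.csv").
# Replace io.StringIO(...) with the path to your own file.
csv_data = io.StringIO(
    "text,label\n"
    "Loved it,1\n"
    "Brilliant acting,1\n"
    "A wonderful story,1\n"
    "Great fun,1\n"
    "Fantastic cast,1\n"
    "Hated it,0\n"
    "Awful acting,0\n"
    "A boring story,0\n"
    "Terrible mess,0\n"
    "Dreadful cast,0\n"
)

df = pd.read_csv(csv_data)
X = df["text"].tolist()   # Text reviews
y = df["label"].tolist()  # Labels (0=negative, 1=positive)

# stratify=y keeps the positive/negative ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), "training reviews,", len(X_test), "test reviews")
```

Stratifying is optional, but it helps avoid a lopsided test set when the dataset is small or imbalanced.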

3. Preprocess Text Data

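Convert text to numerical features using TF-IDF (Term Frequency-Inverse Document Frequency), which gives a word more weight when it is frequent within a review but rare across the whole dataset.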

python

from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
import nltk

nltk.download("stopwords")

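# Remove English stopwords and convert text to TF-IDF vectors.
# Fit on the training set only so no test-set statistics leak into the features.
tfidf = TfidfVectorizer(stop_words=stopwords.words("english"))
X_train_tfidf = tfidf.fit_transform(X_train)  # Learn vocabulary and IDF weights, then transform
X_test_tfidf = tfidf.transform(X_test)        # Reuse the fitted vocabulary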

4. Train a Machine Learning Model

We’ll use a Logistic Regression classifier (simple but effective for text).

python

from sklearn.linear_model import LogisticRegression

# max_iter is raised from the default (100) to give the solver room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)

5. Evaluate the Model

Measure accuracy on the held-out test set and print a per-class classification report.

python

from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Example output (exact numbers depend on your data and split):

text

Accuracy: 0.89
              precision    recall  f1-score   support
           0       0.89      0.89      0.89      5000
           1       0.89      0.89      0.89      5000
    accuracy                           0.89     10000

6. Use the AI for Predictions

python

new_reviews = [
    "This movie was fantastic! The acting was brilliant.",  # Positive
    "I hated every minute of it. Terrible plot.",          # Negative
]

new_reviews_tfidf = tfidf.transform(new_reviews)
predictions = model.predict(new_reviews_tfidf)

for review, pred in zip(new_reviews, predictions):
    print(f"Review: '{review}' \nSentiment: {'Positive' if pred == 1 else 'Negative'}\n")

Output:

text

Review: 'This movie was fantastic! The acting was brilliant.' 
Sentiment: Positive

Review: 'I hated every minute of it. Terrible plot.' 
Sentiment: Negative
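Beyond a hard label, it can be useful to see how confident the model is. The following is a minimal sketch using predict_proba on a tiny made-up dataset; it wraps the vectorizer and classifier in a scikit-learn pipeline for brevity, which is a convenience choice here rather than something the steps above require.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy data: 1 = positive, 0 = negative
texts = [
    "great film", "great acting", "wonderful great story",
    "terrible film", "terrible acting", "boring terrible story",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies TF-IDF and then the classifier in one call
sentiment_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
sentiment_model.fit(texts, labels)

new_reviews = ["a great story", "terrible acting"]
probs = sentiment_model.predict_proba(new_reviews)  # Columns follow sentiment_model.classes_

for review, (p_neg, p_pos) in zip(new_reviews, probs):
    print(f"Review: '{review}' -> P(negative)={p_neg:.2f}, P(positive)={p_pos:.2f}")
```

Probabilities close to 0.5 indicate reviews the model is unsure about, which are often worth checking by hand.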

3. Key Concepts Used

  • Supervised Learning: The model learns from labeled data (positive/negative reviews).
  • Feature Extraction: Text is converted to numerical features (TF-IDF).
  • Classification: Logistic Regression predicts discrete labels.

4. How to Improve This AI

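  1. Deep Learning: Try a neural model (e.g., LSTM, BERT), which can capture word order and context that TF-IDF ignores, at the cost of more compute.
  2. More Data: Train on larger datasets (e.g., Twitter sentiment data).
  3. Hyperparameter Tuning: Search over settings such as the regularization strength C with GridSearchCV.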
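The tuning step can be sketched as follows. This is an illustrative setup, not a recommended grid: the toy data is made up, and the step names "tfidf" and "clf" as well as the parameter values are assumptions chosen for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy data: 6 positive and 6 negative reviews
texts = [
    "great film", "brilliant acting", "wonderful story",
    "loved the plot", "fantastic cast", "great fun",
    "terrible film", "awful acting", "boring story",
    "hated the plot", "dreadful cast", "terrible mess",
]
labels = [1] * 6 + [0] * 6

# Putting the vectorizer inside the pipeline means it is re-fitted on each
# training fold, so the validation folds never influence the vocabulary.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keys use the "<step name>__<parameter>" convention
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # Unigrams only vs. unigrams + bigrams
    "clf__C": [0.1, 1.0, 10.0],              # Inverse regularization strength
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 2))
```

On the full IMDb data, pass X_train and y_train to search.fit instead, and expect the search to take noticeably longer since every parameter combination is trained once per fold.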

5. Other Simple ML-AI Examples

Project                        Algorithm            Library
Spam Detector                  Naive Bayes          scikit-learn
Handwritten Digit Recognizer   CNN                  TensorFlow/Keras
Stock Price Predictor          Linear Regression    PyTorch