# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# 1. Data preparation
# Assume we have a dataset of movie reviews with their sentiment labels
data = {
    'review': [
        "I loved this movie! It was amazing.",
        "The acting was terrible and the plot was boring.",
        "A fantastic journey with great performances.",
        "I can't believe how bad this movie was.",
        "Absolutely one of the best films I've ever seen."
    ],
    'sentiment': ['positive', 'negative', 'positive', 'negative', 'positive']
}
df = pd.DataFrame(data)
# 2. Data preprocessing
X = df['review']  # Features: the movie review text
y = df['sentiment']  # Labels: the sentiment
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)