I. Course Description
For centuries, the humanities have operated through the close reading of cultural objects: reading to uncover layers of meaning that lead to deep comprehension. Such ‘close’ approaches are increasingly replaced by ‘distant’ reading methods that rely on computational modeling and corpus linguistics. These methods allow researchers to focus on units that are much smaller or much larger than the single case study, text, or author: words, topics, genres, themes, and so on.
Close and distant reading are especially relevant in the context of social media, which are marked by the spread of disinformation, a phenomenon captured in terms such as post-truth, filter bubbles, and clickbait. The visibility of online content often seems to be driven more by virality and controversy than by truthfulness and dialogue. How can we make sense of the large quantities of social data on online platforms in order to reveal the ideologies, biases, and controversies behind them?
In this course, we will engage with social media data in order to uncover such patterns of meaning-making. Using a variety of strategies for textual and data analysis (e.g. tf-idf, topic modeling, and word embeddings), students will learn to apply corpus linguistics methods and to reflect on them with a critical and explorative mindset. We will focus on the discursive ways in which facts and opinions are negotiated within online communities, and on the patterns and biases that appear in natural language.
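As a rough illustration of the first method named above, the sketch below computes tf-idf weights in plain Python. It assumes one common formulation (relative term frequency multiplied by the natural log of the inverse document frequency); libraries used in class may apply different smoothing or normalization. The function name `tf_idf` and the toy posts are hypothetical and invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy posts, already lowercased and split on whitespace.
posts = [
    "the vaccine news is fake".split(),
    "the election news is real".split(),
    "the match is tonight".split(),
]
scores = tf_idf(posts)

# Print the terms of the first post from most to least distinctive.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Words that occur in every post (such as "the") receive a weight of zero, while words confined to a single post are ranked highest, which is why tf-idf is often a first step toward finding what is distinctive about a text within a corpus.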
II. Learning Outcomes
-- Gain knowledge and understanding of several popular quantitative approaches to text analysis, including their epistemological potentials and pitfalls.
-- Apply textual and language analysis methods to contemporary datasets taken from social media.
-- Demonstrate an awareness of the norms and presuppositions built into quantitative methodological frameworks.
-- Apply quantitative methods from the Digital Humanities with a critical and explorative mindset.
III. Course Outline
Session 1: Introduction to Digital Humanities
Introduction to Jupyter Notebooks and class repositories; working through programming fundamentals in Python.
Session 2: Pandas & NLTK
Exploring basic operations on Pandas DataFrames for handling social data. Looking at NLTK’s Text class, which allows for an initial exploration of texts.
Session 3: Topic Modeling
Exploring topic modeling as one way to move beyond the author and examine discursive patterns in our data. Using topic modeling findings as a starting point for close reading.
Session 4: Word Embeddings
Introducing word embeddings through Word2Vec in Python. Critical discussion of the biases implicit in word embedding models.
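To make the idea of bias in embeddings concrete, the sketch below compares words by cosine similarity, the usual measure of closeness between word vectors. The two-dimensional vectors are hypothetical values invented so the arithmetic is easy to follow; they do not come from a trained Word2Vec model, and the helper `gender_gap` is an illustrative name, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for illustration only.
# Real embeddings typically have 100 or more dimensions.
vectors = {
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
    "nurse": [0.2, 0.8],
    "engineer": [0.8, 0.2],
}


def gender_gap(word):
    """Similarity to 'she' minus similarity to 'he' (illustrative probe)."""
    return (cosine_similarity(vectors[word], vectors["she"])
            - cosine_similarity(vectors[word], vectors["he"]))


for word in ("nurse", "engineer"):
    print(word, round(gender_gap(word), 3))
```

A positive gap means the word sits closer to "she" than to "he" in the vector space. With a model trained on real text, the same probe can be used to ask which associations the corpus has encoded, which is the starting point for the critical discussion in this session.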
Session 5: Regression Analysis
Regression analysis in Python.
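A minimal sketch of what such an analysis involves, assuming the simplest case of ordinary least squares with a single predictor. The function `fit_line` and the data points are hypothetical; in practice a statistics library would typically be used and would also report uncertainty estimates.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of hashtags in a post vs. number of replies.
hashtags = [1, 2, 3, 4]
replies = [3, 5, 7, 9]

slope, intercept = fit_line(hashtags, replies)
print(f"replies = {slope:.2f} * hashtags + {intercept:.2f}")
```

The slope estimates how much the outcome changes for each additional unit of the predictor. Whether that relationship means anything beyond the sample is an interpretive question that the fitted line alone cannot answer.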
Session 6: Naïve Bayes & Sentiment Analysis
Naïve Bayes classification and sentiment analysis using NLTK in Python.
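The sketch below shows the logic behind a Naïve Bayes sentiment classifier, written from scratch in plain Python rather than with NLTK, so that each step is visible. It assumes a multinomial model with add-one (Laplace) smoothing, which is not necessarily how NLTK's classifier works internally. The functions `train` and `classify` and the labeled examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        # Log prior: how common is this label in the training data?
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled posts, invented for illustration.
training_data = [
    ("love this great".split(), "pos"),
    ("great fun".split(), "pos"),
    ("hate this awful".split(), "neg"),
    ("awful boring".split(), "neg"),
]
model = train(training_data)
print(classify("great fun".split(), model))
print(classify("awful".split(), model))
```

The "naïve" part is the assumption that words contribute to the score independently of one another. That assumption keeps the model simple, but it also means negation and irony, both common on social media, are invisible to it.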