语料库研究基本方法.ppt
Basic Methods of Corpus Research
中国外语教育研究中心 (China Foreign Language Education Research Center), 梁茂成 (Liang Maocheng)

Outline
  The nature of corpus linguistics
  Some common terms
  Basic methods of corpus research

The nature of corpus linguistics

Rationalism and empiricism
  Rationalism: I think, therefore I am.
  Empiricism: My mind is a blank slate. Seeing is believing.

The Wax Argument: Descartes considers a piece of wax; his senses inform him that it has certain characteristics, such as shape, texture, size, color, smell, and so forth. When he brings the wax towards a flame, these characteristics change completely. However, it seems that it is still the same thing: it is still a piece of wax, even though the data of the senses inform him that all of its characteristics are different.
Therefore, in order to properly grasp the nature of the wax, he cannot use the senses. He must use his mind. Descartes concludes: “And so something which I thought I was seeing with my eyes is in fact grasped solely by the faculty of judgment which is in my mind.”

Empiricism: Empiricism emphasizes those aspects of scientific knowledge that are closely related to evidence, especially as discovered in experiments. It is a fundamental part of the scientific method that all hypotheses and theories must be tested against observations of the natural world, rather than resting solely on reasoning and intuition.

Science is considered to be methodologically empirical in nature.
Corpus linguistics is empirical in nature.

Types of data in language research
  Introspective data: rationalism
  Experimental data: empiricism
  Authentic data: empiricism

Corpus linguistics advocates authentic data.
We do not reject other types of data.

Even within the corpus linguistics camp:
  Corpus-driven: minimum theory-reliance.
    Exclusive reliance on corpus data for all theories.
  Corpus-based: reliance on corpus data for hypothesis-testing.
  Corpus-referenced/informed: occasionally resorting to corpus data for illustrations.

We firmly oppose any claim that disregards the facts of language.
No introspection can claim credence without verification through real language data (Teubert 2005).

Some common terms

Corpus
Corpus linguistics

Token, type, lemma
  Example: The little boy looked at the other boys.
  (8 word tokens; 7 types if “The” and “the” are counted as the same form; 6 lemmas, since “boy” and “boys” both belong to the lemma BOY.)

Collocation is defined as a sequence of words which co-occur more often than would be expected by chance.
  a big smoker
  a strong smoker
  a hard smoker
  a heavy smoker
  a furious smoker
  (Of these, only “a heavy smoker” is the established collocation.)

It is quite possible, in fact, to describe a woman as handsome. However, this implies that she is not beautiful at all in the traditional sense of female beauty, but rather that she is mature in age, has large features and a certain strength of character. Similarly, a man could be described as beautiful, but this would usually imply that he had feminine features.

Colligation is defined as a sequence of grammatical categories which co-occur more often than would be expected by chance.

Semantic prosody is instantiated when a word such as CAUSE co-occurs regularly with words that share a given meaning or meanings, and then acquires some of the meaning(s) of those words as a result. This acquired meaning is known as semantic prosody.
(Stewart 2010)

Basic methods of corpus research

Corpus-based approach: a hypothesis-testing approach.
Corpus-driven approach: with as “few preconceived ideas” as possible, “keeping the amount of theory-reliance to a minimum in order not to hinder the process of discovering new phenomena” (Römer 2005).

Both approaches almost always involve a comparison of some kind.

Sizes of corpora in comparison (Rayson 2003)
  Small vs. big
  Equal sizes

Types of comparison
  Across genres
  Across users
  Across different times
  Across (varieties of) language(s)

Corpus comparability

Linguistic features in corpus comparison
  Lexical
  Lexico-grammatical
  Syntactic
  Discoursal

Statistical tests in corpus comparison
  Simple:
    Relationship (correlation, etc.)
    Difference (chi-square, log-likelihood, etc.)
  Complicated: regression analysis, factor analysis, cluster analysis, correspondence analysis

Diagram labels: corpus; research questions; software; words, phrases, collocations, semantic prosody, colligation, sentence patterns, etc.

Thank you.
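
Supplementary sketch for the “Difference” tests named under statistical tests in corpus comparison. It shows the log-likelihood statistic in the two-corpus form widely used for comparing how often a word occurs in two corpora. It is not taken from the slides: the function name `log_likelihood` and the frequencies in the usage lines are invented for illustration, and only the formula itself is standard. Expected frequencies are computed from the pooled relative frequency, so corpora of unequal size (small vs. big) can be compared directly.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word observed in two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B
    size_a, size_b: total number of tokens in corpus A and corpus B
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("corpus sizes must be positive")

    total_freq = freq_a + freq_b
    total_size = size_a + size_b

    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size

    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        # A zero observed frequency contributes nothing to the sum.
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts, for illustration only:
# a word occurring 100 times in one 10,000-token corpus
# and 50 times in another 10,000-token corpus.
score = log_likelihood(100, 10_000, 50, 10_000)

# 3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom.
print(f"G2 = {score:.2f}; significant at p < 0.05: {score > 3.84}")
```

A score of zero means the word has the same relative frequency in both corpora; larger values indicate a larger difference. The statistic alone does not say which corpus overuses the word, so the two relative frequencies should be compared alongside it.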