Data: 2000questions.txt
The dataset and annotation scheme are described in the paper:
Mukhin, Mikhail and Braslavski, Pavel. What and How Do People Ask on Community Question Answering Services in Russian? In Proc. of the Dialog-2012 Conference on Computational Linguistics and Intelligent Technologies, 2012, pp. 137-150.
Each question is annotated with a line of tab-delimited values of the following format:
MailRuQuestionId ConvOrInf ExpectedAnswerType QuestionType [InfQuestionType] [RecommendationType|FactualQuestionType]
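A minimal Python sketch for reading the annotation file, assuming plain ASCII/UTF-8 text with one annotation per line, and assuming columns can be mapped by position (missing or empty optional trailing fields become None). If the actual file skips InfQuestionType while still giving a finer-grained type, positional mapping would mislabel that column, so check a few lines first. The names Annotation, parse_line and load are hypothetical helpers, not part of the dataset distribution.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    question_id: int
    conv_or_inf: str
    expected_answer_type: Optional[str]
    question_type: Optional[str]
    inf_question_type: Optional[str] = None
    fine_type: Optional[str] = None  # RecommendationType or FactualQuestionType


def parse_line(line: str) -> Annotation:
    """Parse one tab-delimited annotation line into an Annotation."""
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 fields, got {len(fields)}: {line!r}")
    # Pad to six columns so missing optional fields map to None.
    fields += [""] * (6 - len(fields))
    optional = [f or None for f in fields[2:6]]
    return Annotation(int(fields[0]), fields[1], *optional)


def load(path: str) -> List[Annotation]:
    """Read every non-blank line of the file (encoding assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as fh:
        return [parse_line(line) for line in fh if line.strip()]
```

For example, load("2000questions.txt") would return one Annotation per non-blank line; a malformed line raises ValueError instead of being silently skipped.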
From MailRuQuestionId you can compose a URL and access the actual question, e.g. 30001000 → http://otvet.mail.ru/question/30001000/
Question info is also available through the Mail.Ru API, e.g. http://otvet.mail.ru/api/v2/question?qid=30001000
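The two URL patterns above can be generated with a small helper. This sketch only builds strings and does not fetch anything; the patterns are copied from this README, so the live service may no longer answer at these addresses. The function names question_url and api_url are hypothetical.

```python
from urllib.parse import urlencode

# Patterns as given in this README.
QUESTION_URL = "http://otvet.mail.ru/question/{qid}/"
API_URL = "http://otvet.mail.ru/api/v2/question"


def question_url(qid: int) -> str:
    """Web page URL for a MailRuQuestionId."""
    return QUESTION_URL.format(qid=qid)


def api_url(qid: int) -> str:
    """Mail.Ru API URL for a MailRuQuestionId, with qid as a query parameter."""
    return f"{API_URL}?{urlencode({'qid': qid})}"
```

For instance, question_url(30001000) gives http://otvet.mail.ru/question/30001000/, matching the example above.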
Each question is primarily labeled as either conv(ersational) or inf(ormational). Inf questions are further labeled with regard to expected answer type and question type.
ExpectedAnswerType values: y(es/)n(o), single, mult(iple), descr(iption)
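A sketch for decoding and tallying ExpectedAnswerType labels among inf questions. It assumes the file stores the abbreviated forms literally as yn, single, mult and descr (the parenthesized letters above being the expansions); verify this against the data before relying on it. The mapping EXPECTED_ANSWER_TYPES and the function answer_type_counts are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed abbreviation -> full name, following the list above.
EXPECTED_ANSWER_TYPES = {
    "yn": "yes/no",
    "single": "single",
    "mult": "multiple",
    "descr": "description",
}


def answer_type_counts(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count expected answer types of inf questions.

    records: (ConvOrInf, ExpectedAnswerType) pairs; conv questions are skipped.
    """
    counts = Counter()
    for conv_or_inf, answer_type in records:
        if conv_or_inf != "inf":
            continue
        if answer_type not in EXPECTED_ANSWER_TYPES:
            raise ValueError(f"unknown ExpectedAnswerType: {answer_type!r}")
        counts[EXPECTED_ANSWER_TYPES[answer_type]] += 1
    return counts
```

Raising on an unknown label, rather than counting it under its raw name, makes a mismatch between the assumed abbreviations and the real file visible immediately.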
Question types are as follows; the two major classes, rec(ommendation) and fact(ual), are further labeled with finer-grained labels:
rec(ommendation), method, inf(ormation)search, loc(ation), fact(ual), attr(ibute), obj(ect), possib(ility), reason, time, purpose, fav(our), opin(ion), soc(ial), offer, inv(itation)
Please refer to the above-mentioned publication when using the data.
Please send questions and suggestions to pbras at yandex dot ru