Demo: a demo version of the tagger can be viewed here; it may occasionally be down due to VPS outages or code updates. Thanks to Mr. Khalili for the design.
Recognizing and extracting named entities, such as persons, locations and organizations, from natural language text is called Named Entity Recognition (NER). For example, in the sentence "[PER Murdoch] discusses future of [ORG News Corp]", understanding that the word "Murdoch" is a person and that the entity "News Corp" is the name of an organization is the task of an NER system.
Abstract:
In this paper we first explain different techniques and methods, and discuss the design challenges, misconceptions and feature extraction that underlie the development of an efficient and robust NER system. We then introduce Iris, a Farsi named entity recognizer based on statistical distribution techniques and machine learning methods. Iris uses techniques from natural language processing together with the syntax and structure of Farsi words and sentences.

1. Introduction

Natural language processing applications are characterized by making complex interdependent decisions that require large amounts of prior knowledge. In this paper we discuss one such application, Named Entity Recognition (NER). The sentence below shows the necessity of using prior knowledge and non-local decisions; in the absence of such methods it is difficult to understand that "Blinker" is a person.

[PER BLINKER] BAN LIFTED . [LOC LONDON] 1996-12-06 [MISC Dutch] forward [PER Reggie Blinker] had his indefinite suspension lifted by [ORG FIFA] on Friday and was set to make his [ORG Sheffield Wednesday] comeback against [ORG Liverpool] on Saturday. [PER Blinker] missed his club’s last two games after [ORG FIFA] slapped a worldwide ban on him for appearing to sign contracts for both [ORG Wednesday] and [ORG Udinese] while he was playing for [ORG Feyenoord].

It is also not obvious that the last mention of "Wednesday" is an organization. The task of an NER system is to identify, extract and label the entity names in a text.
Data is being generated at a fast pace, and with it the need to use it; but most of the data created is unstructured and unlabeled. The basic task of labeling this data, i.e. giving meaning to the text in a way that computers and other algorithms understand, is therefore very important. It is the basis of many other problems such as question answering, summarization systems, information retrieval, machine translation, semantic web search and bioinformatics.
Researchers currently use different methods for this task, which we will discuss. NER involves two subtasks: first, identifying proper names in text, and second, classifying these names into a set of predefined categories of interest.

1.1. Named entity types

Different tasks require different numbers of types, but the most common are PER for person names, ORG for organizations (companies, government organizations, committees, etc.) and LOC for locations (cities, countries, etc.).
The CoNLL1 shared task concentrated on four named entity types: persons (PER), locations (LOC), organizations (ORG) and miscellaneous entities (MISC) that do not belong to the previous three groups.
MUC2, in the MUC-7 dataset, which is a subset of the North American News Text Corpus, annotated entities including people, locations, organizations, monetary units, and so on.

1.2. NER tagging schemes

Before going further, note that there are two main tagging schemes used for NER: BIO and BILOU.

  • BIO: 'B' stands for 'beginning' and signifies the beginning of a named entity, 'I' stands for 'inside' and signifies that the word is inside a named entity, and 'O' stands for 'outside' and signifies that the word is a regular word outside of a named entity[1].
    Table 1-1. An example of named entity tags based on the BIO scheme.

NE tag | word of sentence
B-PER  | John
I-PER  | Kerry
O      | will
O      | fly
O      | to
B-LOC  | Paris
  • BILOU: it stands for Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside. In the evaluation section we will discuss the evaluation process based on each scheme and the problems that exist[1]. A small conversion sketch follows the table below.
    Table 1-2. An example of named entity tags based on the BILOU scheme.

NE tag | word of sentence
B-PER  | John
L-PER  | Kerry
O      | will
O      | fly
O      | to
U-LOC  | Paris
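The two schemes carry the same information and can be converted mechanically. The snippet below is a minimal sketch of a BIO-to-BILOU conversion for a single sentence, using the tag names from the tables above:

```python
# Convert one sentence's BIO tags to BILOU tags (minimal sketch).
def bio_to_bilou(tags):
    bilou = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            bilou.append("O")
        elif tag.startswith("B-"):
            # a B- chunk that is not continued is a unit-length (U-) chunk
            bilou.append(("B-" if nxt == "I-" + tag[2:] else "U-") + tag[2:])
        else:  # an I- tag
            # an I- token that is not continued is the last (L-) token of the chunk
            bilou.append(("I-" if nxt == tag else "L-") + tag[2:])
    return bilou

print(bio_to_bilou(["B-PER", "I-PER", "O", "O", "O", "B-LOC"]))
# ['B-PER', 'L-PER', 'O', 'O', 'O', 'U-LOC']
```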

1.3. Different NER approaches

References [2,3] introduce different approaches to NER, each with different uses:

1.3.1. Rule-based NER

The handmade rule-based approach focuses on extracting named entities using large sets of human-made rules. The tagger generally consists of a set of patterns using grammatical (e.g. part of speech), syntactic (e.g. word precedence) and orthographic features (e.g. capitalization, prefixes, etc.) in combination with dictionaries. These models produce better results for restricted domains and can detect complex entities that learning models have difficulty with. However, rule-based systems lack portability; even when the data changes only slightly, changing the rules is costly. These approaches are domain- and language-specific and do not easily adapt to new domains and languages[2].
FASTUS is a system based on handcrafted regular expressions which divides the task into three main steps: recognizing phrases, recognizing patterns and groups, and finally merging incidents, in which distinct structures that describe the same entity are identified and merged[4].

1.3.2. Machine learning-based NER

In these systems, which are the main focus of this article, the NER task is converted into a classification problem and statistical models are employed to solve it, as discussed later. In this type of approach, the system looks for patterns and relationships in the text and builds a model using statistical models and machine learning algorithms. Unlike the rule-based method, these approaches can easily be adapted to different domains and languages[2,3].
Two main flavours of machine learning are used for this task[2]:

  • Supervised learning: this involves a system that learns to classify from a given set of labeled examples; the term "supervised" is used because the people who marked the training examples are teaching the program the right distinctions. This approach requires a large amount of labeled training data. Many methods of this type, such as HMM3[5], CRF4[6] and the perceptron[7], have been proposed for NER and will be discussed later.

  • Semi-supervised learning: a more recent approach is to use both labeled and unlabeled data; it combines the benefits of supervised and unsupervised learning and offers some interesting techniques. Labeled data is hard to find and expensive and time-consuming to produce; in contrast, unlabeled data is cheap and easy to obtain in huge amounts. Recent results, especially on this task, have shown that one can reach better results simply by adding unlabeled data to a supervised system. Commonly used methods for semi-supervised learning include co-training, graph-based methods and so on[2].

1.3.3. Hybrid NER

This approach combines the rule-based and machine learning-based methods, building new methods from the strongest points of each. Although this type of method may achieve better results than the other approaches, the weaknesses of handcrafted rule-based NER remain[2].

2. Related works

In recent years, automatic named entity recognition and extraction systems have become a popular research area, and a considerable number of studies have addressed their development. Like many NLP6 tasks, NE5 recognition can be broken down into two stages: preprocessing and processing.
The focus of this paper is mostly on machine learning methods, so the features and methods discussed are mostly in that context.

2.1. Preprocessing

In an NLP problem we are dealing with a natural language in which some things are fixed and hardly change, so we can use them as features to solve our problem. As illustrated in the introduction, NER is a knowledge-intensive task, so in this section some important knowledge resources are discussed.

2.1.1. Gazetteers

An important question at the inception of NER was whether machine learning techniques are necessary at all, and whether a simple dictionary lookup would be sufficient for good performance. Indeed, the baseline for this task was just a dictionary lookup of entities that appear in the training data[8]. It turns out that while problems of coverage and ambiguity prevent a simple straightforward lookup, using gazetteers as a feature in machine learning-based approaches is critical for good performance[3].
Knowing this, one can extract gazetteers from the web and from large collections of unlabeled text.
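As a concrete illustration, a gazetteer can be used as a simple span-lookup feature during classification. The sketch below is a toy example; the gazetteer entries and feature names are hypothetical:

```python
# Toy gazetteer feature: mark every token of a span that appears in a
# gazetteer collected from the web or from unlabeled text (sketch only).
gazetteer = {
    ("News", "Corp"): "ORG",
    ("Sheffield", "Wednesday"): "ORG",
    ("London",): "LOC",
}

def gazetteer_features(tokens, max_len=3):
    feats = [[] for _ in tokens]
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            span = tuple(tokens[i:i + n])
            if span in gazetteer:
                for j in range(i, i + n):
                    feats[j].append("IN-GAZ-" + gazetteer[span])
    return feats

print(gazetteer_features("Murdoch discusses future of News Corp".split()))
# [[], [], [], [], ['IN-GAZ-ORG'], ['IN-GAZ-ORG']]
```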

2.1.2. Brown clustering

It has been shown that unlabeled text can be used to improve the performance of an NER system[3]. Brown clustering is a hierarchical word clustering algorithm[9]. The intuition behind it is that similar words appear in similar contexts; more precisely, similar words have similar distributions of words to their immediate left and right. For example, before the word "the" it is likely that you see "in" or "on" and many similar words. The algorithm takes sentences as input and, based on the words and their contextual similarities, outputs each word together with its cluster. For example, here is an input and output of the algorithm[10].
input:

the cat chased the mouse
the dog chased the cat
the mouse chased the dog

Table 2-1. Output of the Brown clustering algorithm

cluster (bit string) | word
0    | the
10   | chased
110  | dog
1110 | mouse
1111 | cat
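The real algorithm greedily merges the pair of clusters whose merge loses the least average mutual information; the toy sketch below only illustrates the underlying intuition, scoring words by the similarity of their immediate left/right context counts (it is not Brown clustering itself):

```python
# Toy illustration of the Brown-clustering intuition: words with similar
# left/right context distributions ("cat", "dog", "mouse") score high and
# would be merged first by a hierarchical clusterer.
from collections import Counter, defaultdict
from math import sqrt

sentences = ["the cat chased the mouse",
             "the dog chased the cat",
             "the mouse chased the dog"]

# count immediate left and right neighbours of every word
contexts = defaultdict(Counter)
for sent in sentences:
    words = sent.split()
    for i, w in enumerate(words):
        if i > 0:
            contexts[w]["L:" + words[i - 1]] += 1
        if i + 1 < len(words):
            contexts[w]["R:" + words[i + 1]] += 1

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    return dot / (sqrt(sum(v * v for v in a.values())) *
                  sqrt(sum(v * v for v in b.values())))

for w1, w2 in [("cat", "dog"), ("cat", "mouse"), ("cat", "chased")]:
    print(w1, w2, round(cosine(contexts[w1], contexts[w2]), 2))
# cat dog 1.0, cat mouse 1.0, cat chased 0.0
```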

2.2. Processing

In this section we study some local and non-local features.

2.2.1. Lexical information

Using character n-grams one can obtain a good improvement in results. An n-gram model predicts the next item in a sequence of items, in this case the characters of a word. An n-gram of size one is called a unigram and one of size two a bigram; these are the most common n-grams used in this task. For example, the 3-grams that can be generated from the character sequence "good morning" are "goo", "ood", "od ", "d m", " mo", "mor" and so forth[2].
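A minimal sketch of character n-gram extraction, reproducing the 3-gram example above:

```python
# Character n-grams of a string, e.g. the 3-grams of "good morning".
def char_ngrams(text, n):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("good morning", 3))
# ['goo', 'ood', 'od ', 'd m', ' mo', 'mor', 'orn', 'rni', 'nin', 'ing']
```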

2.2.2. Affix

An affix is an addition to the base form of a word that changes its meaning or creates a new word, e.g. suffix and prefix letters.
Using this feature one can once again identify similar words[2].

2.2.3. POS7 tags

Part of speech tags are assigned to a word based on its role in the sentence as well as its definition and context, i.e. its relationship with adjacent and related words in a sentence or phrase.
Table 2-2. An example of POS tagging[2,3].

POS tag | word
DET     | the
N_SING  | waiter
VBD     | cleared
N_PL    | plates
P       | from
DET     | the
N_SING  | table

2.2.4. Previous NE information

Previous NE information, also called UniChunk, can be used as a feature in many cases. The intuition is to find a similar situation that was recognized before in the training set and compare it to the current situation of word chunks; knowing a number of previous predictions, e.g. the last two named entity predictions, can help find that similar situation[2,11].

2.2.5. Some simple patterns

  • Capitalization of the word (in languages that have capitalization)[3].

  • Number normalization: normalizing dates and numbers, so that 12/8/2004 becomes Date and 1980 becomes DDDD, allows a degree of abstraction for years, phone numbers, etc.[3].

  • Word shapes: map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.[11]. A small sketch follows the table below.
    Table 2-3. An example of word shapes[11].

shape  | word
Xx-xxx | varicella-zoster
xXXX   | mRNA
XXXd   | CPA1
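The sketch below maps each character class to a single symbol; unlike some shapers (and the first row of Table 2-3) it does not collapse repeated symbols, but it reproduces the other shapes above:

```python
# Minimal word-shape mapper: upper-case -> X, lower-case -> x, digit -> d,
# everything else kept as-is (a sketch, not Stanford's word shaper).
def word_shape(word):
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)

for w in ["varicella-zoster", "mRNA", "CPA1"]:
    print(w, "->", word_shape(w))
# mRNA -> xXXX, CPA1 -> XXXd
```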

2.2.6. Classifier

The most common classifiers used in this task are sequence classifiers, in this case ME8 sequence models and CRFs, which predict labels from data[12].
In an ME sequence model there are words and the goal is to assign classes or tags to each of them. It is common to explore the search space more thoroughly: at a certain position one NE tag may look best, but later on, in another state, there may be reason to believe the former tags should have been chosen differently. In this section we study two such inference strategies.

  • Beam inference: in beam inference, at each position rather than just deciding on the most likely label, we keep several possibilities; we might keep the top K complete subsequences up to the current point, and at each state we consider each partial subsequence and extend it by one further position. It is very fast and easy to implement, and it turns out that at beam sizes of 3-5 it gives enough possibilities and works as well as exact inference in many cases. However, there is no guarantee that the globally best named entity tag sequence has been found; even worse, the best sequence might not even be checked, or in beam inference terms, might fall off the beam[12].

    Figure 2-1. Sequence inference based on a sequence model.

  • Viterbi inference: a form of dynamic programming which requires a small window of state influence (e.g. only the past two states are relevant). It guarantees that the globally best sequence is returned, but it is harder to implement and forces one into a fixed small window[12]. A minimal sketch is given after this list.

  • CRFs: conditional random fields are another kind of probabilistic sequence model. CRFs are undirected graphical models used to calculate the conditional probability of values on particular output nodes given values assigned to other input nodes[13]. Suppose the observed input data sequence

    o = (o_{1}, o_{2}, \ldots, o_{T})

    is a sequence of words in a document, and let S be a set of FSM states, each associated with a tag such as ORG, with the sequence of states (the values on the T output nodes)

    s = (s_{1}, s_{2}, \ldots, s_{T}).

    A CRF defines the conditional probability of a state sequence given an input sequence as

    P(s \mid o) = \frac{1}{Z_{o}} \exp\left(\sum_{t=1}^{T}\sum_{k}\lambda_{k} f_{k}(s_{t-1}, s_{t}, o, t)\right)

    where Z_{o} is a normalization factor over all state sequences, f_{k} is an arbitrary feature function over its arguments, and \lambda_{k} is a learned weight for each feature function[13]. We discuss CRFs in more detail in the implementation section.
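The following is a minimal Viterbi decoder over per-token label scores with a first-order transition table; it is a generic sketch (real NER taggers score emissions and transitions with learned CRF/MEMM weights):

```python
# Minimal first-order Viterbi decoding (sketch).
def viterbi(emission_scores, transition_scores, labels):
    """emission_scores: one dict per token, label -> log-score.
       transition_scores: dict (prev_label, label) -> log-score."""
    # best[i][y]: best score of any label sequence ending with y at position i
    best = [{y: emission_scores[0][y] for y in labels}]
    back = []
    for i in range(1, len(emission_scores)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(((p, best[i - 1][p] + transition_scores[(p, y)])
                               for p in labels), key=lambda t: t[1])
            best[i][y] = score + emission_scores[i][y]
            back[i - 1][y] = prev
    # follow back-pointers from the best final label
    y = max(best[-1], key=best[-1].get)
    path = [y]
    for i in range(len(back) - 1, -1, -1):
        y = back[i][y]
        path.append(y)
    return list(reversed(path))
```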

3. Evaluation

Evaluating the named entity task is a little tricky, firstly because there are different numbers of named entity tag sets and different schemes (e.g. the BIO and BILOU schemes discussed earlier), and secondly because evaluation is done per entity and not per token: the goal in this task is to find both the boundaries and the class of a named entity, so the unit of interest is the whole entity.

Figure 3-1. Example of per-entity NE evaluation, showing the unit of interest.

So in the contingency table the values will be at the level of entities.

3.1. Precision and Recall

Precision and recall are standard tools in IR9 evaluation. In this task, TP10 are the entities the system labeled correctly, FP11 are the entities the system labeled incorrectly, and FN12 are the entities that should have been labeled but were not. With that in mind, precision is:

Precision = \frac{TP}{TP+FP}

and Recall would be:

Recall = \frac{TP}{TP+FN}

Precision and recall are straightforward for tasks like IR and text categorization, where there is only one grain size, the document. These measures behave a bit differently for NER when there are boundary errors, which are very common. For example, in the sentence "First Bank of Chicago announced earnings ...", if the system labels "Bank of Chicago" as an organization instead of "First Bank of Chicago", it counts as both a FP and a FN, so selecting nothing would have been better.

3.2. F-score

The F-score is the harmonic mean of precision and recall (for β = 1); in general:

F_{\beta} = (1+\beta^{2}) \cdot \frac{precision \cdot recall}{(\beta^{2} \cdot precision) + recall}

Despite the challenges in these measurements, they are the most common evaluation measures for NER.
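A minimal sketch of entity-level evaluation with the measures above: an entity counts as a true positive only if both its boundary and its class match the gold annotation exactly, so a boundary error produces one FP and one FN, as in the "First Bank of Chicago" example:

```python
# Entity-level precision, recall and F-score (sketch).
def prf(gold, pred, beta=1.0):
    """gold, pred: sets of (start, end, label) entity spans."""
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f

gold = {(0, 4, "ORG")}   # "First Bank of Chicago"
pred = {(1, 4, "ORG")}   # "Bank of Chicago": boundary error -> 1 FP and 1 FN
print(prf(gold, pred))   # (0.0, 0.0, 0.0)
```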

4. Experiments

What follows is a brief description of the corpora used in the experiments and evaluations, the methods and features applied to each, and the evaluation of each trained model.

4.1. Corpus

The data used to train NER taggers conventionally consists of a large number of rows, each with several columns: the actual word, its part of speech tag and its NER tag, which sequentially form sentences.

  • The first corpus used in the experiments has this conventional format and consists of 208,165 words distributed over a total of 9 classes. The NER tagging scheme is BIO13 and the entity tags are PERS (person), ORG (organization), LOC (location) and MISC (miscellaneous).

    Figure 4-1. A number of rows from data set 'A' as an example.

    The data set is courtesy of Mr. Asgari.
    The problem with this data set is the ambiguity of entity names in the Persian language (not to be confused with named entity ambiguity, which is a different problem). For example, in the sentences

ایران با واشنگتن در دیداری .....
or
پوتین، آنکارا را تهدید کرد.

the word ایران here does not denote a location, and neither do واشنگتن or آنکارا, so they should not be tagged LOC; they should be tagged as organizations.
Another important issue is the lack of Ezafe construction annotation in the data set, which could be used for phrase boundaries and sequences[14].

وزارت (کسره) آموزش و پرورش(کسره) ایران (کسره) اسلامی ...

The next problem with this data set is the lack of normalization of Persian characters, e.g. جمهوری written with the Arabic Yeh and جمهوری written with the Persian Yeh are different strings; such problems appear with the characters ی, ک and ه (a small normalization sketch follows this list).

  • The second data set, courtesy of Mr. Abdous, is far better, richer and more carefully annotated than the first one; for ease of reference it will be called 'B'. It consists of 400 thousand rows, uses the same BIO tagging scheme as data set 'A', and has 13 classes covering 6 entity types: Facility, Organization, Person, Location, Event, Product.

    Figure 4-2. A number of rows from data set 'B' as an example.

    Unlike data set 'A', this one benefits from Ezafe annotations: as can be seen in Figure 4-2, there are more columns in the data set, and the third column gives more information about the part of speech tag, marking the Ezafe construction as 'GEN' in some of the rows. In addition, this data set is normalized and, because it covers more entity types, does not suffer from the ambiguity discussed above.
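As mentioned for data set 'A', Persian text often mixes Arabic and Persian code points for the same letter. The sketch below shows the kind of character normalization meant above; the exact mapping table used in practice is not known to me, so these replacements are only illustrative:

```python
# Illustrative Persian character normalization: map Arabic Yeh/Kaf code
# points to their Persian counterparts (a full mapping may also cover Heh
# variants, diacritics, digits, etc.).
NORMALIZATION_MAP = {
    "\u064A": "\u06CC",  # Arabic Yeh -> Farsi Yeh
    "\u0643": "\u06A9",  # Arabic Kaf -> Farsi Keheh
}

def normalize(text):
    return "".join(NORMALIZATION_MAP.get(ch, ch) for ch in text)
```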

4.2. Test 'A'

A number of tests have been done on each data set; only one per data set is discussed here. As reported in many papers, CRFs are one of the best tools for named entity recognition[15,16]. For both of these tests Stanford's CRF classifier was used, a powerful and well-engineered CRF implementation[16]. This classifier is part of Stanford CoreNLP, which provides a set of natural language analysis tools.
The features used in this experiment are as follows:

  • Class feature: includes a feature for the class (as a class marginal); in other words, it puts a prior on the classes equivalent to how often the feature appeared in the training data.

  • N-grams: makes features from letter n-grams, i.e. substrings of the word rather than word n-grams.

  • No mid n-grams: does not include character n-gram features for n-grams that contain neither the beginning nor the end of the word.

  • Previous sequences: uses class combination features over previous classes.

  • Maximum left and right context (window size) of 2 words.

  • Word shape: the Stanford NER feature factory provides several word shapers which return the shape of a word to be used as a feature.

  • Disjunctive words: includes features giving disjunctions of words anywhere within the left or right disjunction width, preserving direction but not position.

  • Quasi-Newton optimizer (L-BFGS): not a feature, but worth mentioning. This optimizer maintains a number of past guesses which are used to approximate the Hessian. Having more guesses makes the estimate more accurate and optimization faster, but the memory used during optimization is linear in the number of guesses.

    Figure 4-3. An example of the model trained on data set 'A'.

    As can be seen in Figure 4-3, this model tagged most of the entities correctly but failed to tag "فرهنگستان زبان و ادب فارسی ایران" as an organization; one reason could be the lack of Ezafe construction annotations in the data set, as discussed earlier. Training another model on this data set with more and better features would surely make it more precise. A sketch of a configuration covering the features listed above is shown below.
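The properties file below is a hedged sketch of how the features listed above map onto Stanford NER (CRFClassifier) configuration options; the property names follow the NERFeatureFactory documentation, but the file names and exact values used for the trained models are placeholders:

```
# Sketch of a Stanford CRFClassifier properties file (values illustrative).
trainFile = train-A.tsv
serializeTo = ner-model-A.ser.gz
map = word=0,answer=1

useClassFeature = true     # class feature / class marginal prior
useWord = true
useNGrams = true           # letter n-gram features
noMidNGrams = true         # only n-grams touching the start or end of the word
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true    # class combination features with previous classes
maxLeft = 2                # left window size
wordShape = chris2useLC    # word-shape features
useDisjunctive = true      # disjunctions of words in a left/right window
```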

4.3. Test 'B'

In this test, data set 'B' was used to train the model, and the added features are as follows:

  • POS tags: the part of speech information for each word, which can be very helpful alongside the other features.

  • Titles: matches a word against a list of name titles (آقا, خانم, etc.).

  • Word pairs: adds features for (pw, w, c) and (w, nw, c), in which w is the current word, pw the previous word, nw the next word and c the class. This kind of feature creates many additional features, which may not all be useful.
    In addition to these features, data set 'B' has more classes and rows.

    Figure 4-4. An example of the model trained on data set 'B'.

5. Performance Evaluation

Note: due to the highly time- and resource-consuming tasks of training the CRF on the data sets (1 day on average) and running Brown clustering (2 days on average), together with time limitations, it was not possible to add gazetteer and Brown clustering features to the CRF, which would be a major help to performance. The models in this phase will be used to gather a large number of gazetteer entries to be used alongside the Brown clustering results in the next phase.

Figure 5-1. A number of Brown clustering results on Wikipedia texts; as can be seen, cities share the same binary code, i.e. the same class.

For the same reasons it was not possible to do a k-fold cross-validation on the data sets; this will be done in the next phase.

  • Test 'A': to evaluate the performance of this model, 20 thousand rows of data set 'A' were taken, tagged by the model and tested against the gold tags, which led to an F1-score of 90.36%. Note that this is a token-based evaluation, not an entity-based one.
    Table 5-1. Precision, recall and F1-score of model 'A'.

F1-score | Precision | Recall
90.36    | 94.36     | 87.13
  • Test 'B': the same was done for model 'B', which led to an F1-score of 93.25%.
    Table 5-2. Precision, recall and F1-score of model 'B'.

F1-score | Precision | Recall
93.25    | 99.40     | 87.83

6. Future work

As stated earlier, in the next phase new classifiers will be trained with more features such as Brown clusters, gazetteers, and word stems and lemmas. In addition, the data sets will be split into 5 parts, each time using 4 of the 5 as the training set and the remaining one as the test set, repeating the process 5 times so that each of the 5 subsamples is used exactly once as the validation data (5-fold cross-validation).
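A minimal sketch of how such a 5-fold split over sentences could be set up (the actual split used in the next phase may be built differently, e.g. stratified or document-level):

```python
# Split sentences into 5 folds; each fold serves once as the held-out set.
import random

def five_fold(sentences, seed=0):
    idx = list(range(len(sentences)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        held_out_idx = set(folds[k])
        train = [sentences[i] for i in idx if i not in held_out_idx]
        held_out = [sentences[i] for i in folds[k]]
        yield train, held_out
```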

7. References

[1] http://stackoverflow.com/questions/17116446/what-do-the-bilou-tags-mean-in-named-entity-recognition. As available in November 2015.
[2] Alireza Mansouri, Lilly Suriani Affendey, and Ali Mamat. Named Entity Recognition Approaches. Journal of Computer Science. 2008. vol. 8. pages 339-344.
[3] Lev Ratinov and Dan Roth. Design Challenges and Misconceptions in Named Entity Recognition. Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL). June 2009. pages 147-155.
[4] Jerry R. Hobbs, Douglas Appelt, John Bear, David Israel, Megumi Kameyama, Mark Stickel, and Mabry Tyson. FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text. SRI International.
[5] GuoDong Zhou and Jian Su. Named Entity Recognition Using an HMM-Based Chunk Tagger. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002). July 2002. pages 473-480.
[6] Vijay Krishnan and Christopher D. Manning. An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL 2006). July 2006.
[7] Lars Buitinck and Maarten Marx. Two-Stage Named-Entity Recognition Using Averaged Perceptrons. 2012.
[8] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. 2003.
[9] Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. Class-Based n-gram Models of Natural Language. Computational Linguistics 18. 1992.
[10] Percy Liang. Semi-Supervised Learning for Natural Language. Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science. 2005. pages 1-86.
[11] Dan Jurafsky and Christopher Manning. Sequence Models for Named Entity Recognition. Coursera Stanford NLP course, video 9.3. As viewed in November 2015.
[12] Dan Jurafsky and Christopher Manning. Maximum Entropy Sequence Models. Coursera Stanford NLP course, video 9.4. As viewed in November 2015.
[13] Andrew McCallum. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Conference on Computational Natural Language Learning. 2003. pages 1-4.
[14] Alireza Nourian, Mohammad Sadegh Rasooli, Mohsen Imany, and Heshaam Faili. On the Importance of Ezafe Construction in Persian Parsing. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Association for Computational Linguistics. 2015. pages 877-882.
[15] Wenhui Liao and Sriharsha Veeramachaneni. A Simple Semi-supervised Algorithm for Named Entity Recognition. Proceedings of the NAACL HLT Workshop on Semi-supervised Learning for Natural Language Processing. 2009. pages 58-65.
[16] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Association for Computational Linguistics. 2005. pages 363-370.

Due to the privacy policies covering both data sets, neither the data sets nor the models can be uploaded. The code for the web demo is available on GitHub.


  1. the Conference on Natural Language Learning

  2. Message Understanding Conference

  3. hidden Markov model

  4. conditional random field

  5. named entity

  6. natural language processing

  7. part of speech

  8. maximum entropy

  9. information retrieval

  10. true positives

  11. false positives

  12. false negatives

  13. Beginning-Inside-Outside
