The preprint ‘Early insights into the Arabic Citation Index’ is now available.
The Arabic Citation Index (ARCI) was launched in 2020. This study gives an overview of the scientific literature available in this new database. Such an index will give more visibility to research published in Arabic, facilitating its contribution to local and international research efforts.
Using the metadata available in scientific publications, I analyse ARCI to characterize the scientific literature published in Arabic. The main objective is to provide a brief profile of ARCI.
First, I describe the data and the methods used in the analyses. As of October 2020, ARCI indexed 65,208 records covering the 2015-2019 period. The database is well structured, with 48 fields of information in each record, which allows multiple bibliometric analyses. For now, articles make up most of the content found in ARCI (98.8% of the database).
Second, I explore the distribution of the literature at various levels (research domains, countries, languages, open access). To analyse the disciplinary coverage in ARCI, I relied on the journal category and not on the topic covered in the individual publications. These categories or areas, which are defined at the journal level, are used as proxies for scientific fields. Results reveal a concentration of publications in the Arts & Humanities and Social Sciences fields, which represent 79% of ARCI’s total coverage. Since the Humanities traditionally rely on book chapters and books, it will be interesting to analyse the evolution of the coverage by document type. ARCI also offers its own research categories. Some of them stand out, such as Islamic Studies, Islamic Jurisprudence, Islamic Creed, Poetry and Hadith, which are well-studied fields in the Arab region.
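To make the journal-level approach concrete, here is a minimal sketch of how the share of records per category could be computed. It is not the code used in the preprint: the journal names, the `journal` field name and the category assignments are hypothetical, and each journal is assumed to carry a single category.

```python
from collections import Counter

# Hypothetical journal-level category assignments (not real ARCI data).
JOURNAL_CATEGORY = {
    "Journal A": "Arts & Humanities",
    "Journal B": "Social Sciences",
    "Journal C": "Technology",
}

# Toy records; the "journal" field name is an assumption for illustration.
records = [
    {"title": "Paper 1", "journal": "Journal A"},
    {"title": "Paper 2", "journal": "Journal A"},
    {"title": "Paper 3", "journal": "Journal B"},
    {"title": "Paper 4", "journal": "Journal C"},
]


def category_shares(records, journal_category):
    """Return the percentage of records per journal-level category."""
    # Each record inherits the category of the journal it was published in.
    counts = Counter(
        journal_category.get(rec["journal"], "Unclassified") for rec in records
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


shares = category_shares(records, JOURNAL_CATEGORY)
for cat, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{cat}: {pct:.1f}%")
```

Because the category is attached to the journal, every paper in a given journal counts towards the same field, whatever its individual topic. This is why these categories are only proxies for scientific fields.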
Most journals indexed in ARCI are currently published in Egypt, Algeria, Iraq, Jordan and Saudi Arabia. ARCI has a large share of papers published in Arabic (93% of the database). As ARCI aims to provide more exposure to journals published in the Arab League countries, it is no surprise to see Arabic as the dominant language in this database. However, English and French are also well represented: around 7% of publications in ARCI are published in languages other than Arabic. The presence of these other languages suggests that research published in ARCI journals may also address issues of shared interest with neighbouring regions such as Europe and Asia. Since ARCI is still new and under development, it will be interesting to track its evolution over time.
Around 24% of the content indexed in ARCI is openly accessible. This share is below the proportion of Open Access (OA) documents in the Web of Science Core Collection for the same period. The OA information available in ARCI is particularly useful for sharing scientific knowledge more widely and for helping research managers track the adoption of local OA mandates. The Global Open Access Portal (GOAP) presented a snapshot of the status of OA to scientific information worldwide. A low level of awareness of the potential of OA among researchers, publishers and policy makers tops the list of challenges. A lack of policy regulation, of research funders’ OA mandates and of resources to manage OA projects also contributes to the low OA penetration in the Arab world. Nevertheless, several projects and initiatives have been undertaken to promote OA in the Arab region.
Then, I use an unsupervised machine learning model, LDA (Latent Dirichlet Allocation), and the text mining algorithms of VOSviewer to uncover the main topics in ARCI. These methods are particularly useful to better understand the topical structure of ARCI. Titles, abstracts and author keywords were concatenated into a single string, which was then processed by the text mining algorithms of VOSviewer. I limited my study to words written in Roman script. All records have at least the title written in English. The clustering is useful in delineating the topics covered as well as in highlighting how they relate to each other. There is a clear heterogeneity in the topics covered in ARCI. Overall, the clusters found with VOSviewer seem to be closely related and show the broad coverage of ARCI. The titles, abstracts and keywords in Arabic were not included in the topic analysis. It might be interesting to characterize the literature in ARCI by focusing on the Arabic content as well.
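The preparation step described above (concatenating title, abstract and author keywords, then keeping only Roman-script words) can be sketched as follows. This is an illustrative approximation, not the pipeline used in the preprint or by VOSviewer: the field names `title`, `abstract` and `keywords`, the helper functions and the sample record are all hypothetical.

```python
import re
import unicodedata


def is_roman(token):
    """Return True if every character is a letter of the Latin (Roman) script."""
    # Unicode names of Latin letters start with "LATIN"; Arabic letters,
    # digits and underscores do not, so tokens containing them are rejected.
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in token)


def build_document(record):
    """Concatenate title, abstract and keywords, keeping Roman-script words only."""
    parts = [
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    ]
    # Normalize so that accented Latin letters are single code points.
    text = unicodedata.normalize("NFC", " ".join(parts))
    tokens = re.findall(r"\w+", text)
    return " ".join(t.lower() for t in tokens if is_roman(t))


# Hypothetical record mixing English and Arabic text.
record = {
    "title": "Modern Arabic poetry الشعر العربي الحديث",
    "abstract": "A study of citation patterns.",
    "keywords": ["poetry", "شعر"],
}

doc = build_document(record)
print(doc)
```

The resulting string is the kind of input a topic model or a term-mapping tool would receive. Dropping the Arabic tokens at this stage is what leaves the Arabic titles, abstracts and keywords out of the topic analysis.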
Finally, after discussing the results of this study, I suggest a few research opportunities: detailed mappings of ARCI to better understand its structure and impact, and tracking its development with dynamic topic models that follow how topics evolve over time, using the text available in English as well as in Arabic.
To read the full text: Early insights into the Arabic Citation Index