
I am getting too much indentation in the abstract, table captions, and figure captions. Does this have anything to do with the compiler? \noindent did not help, so what is the best way to fix this? I cannot remember when the problem was introduced; I had to change the compiler a few times to be able to write Arabic.

Here is the code:

% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.20 of 2017/10/04
%
\documentclass[runningheads]{llncs}

\usepackage[T1]{fontenc}

\usepackage{caption}

\usepackage{graphicx}
\graphicspath{{images/}}

\usepackage{hyperref}
\usepackage{color}
\renewcommand\UrlFont{\color{blue}\rmfamily}

\usepackage[nil,bidi=basic]{babel}
\babelprovide[import,language=Default,main]{english}
\babelprovide[import,language=Default]{arabic}
\babelfont{rm}{Noto Serif}
\babelfont[arabic]{rm}{Noto Naskh Arabic}
\babelfont{sf}{Noto Sans}
\babelfont[arabic]{sf}{Noto Sans Arabic}





\usepackage{comment}
\usepackage{tabularx}

\begin{document}

\title{Named Entities As A Way To Identify Homogeneous News Stories In Arabic And English In Cross-Language Information Retrieval}


\author{Hussam Hallak\inst{1}\orcidID{0000-0002-0754-9747} \and
Michael Nelson\inst{2}\orcidID{0000-0003-3749-8116}}

\institute{Old Dominion University, Norfolk VA 23529, USA\\
\email{\{hhallak,mln\}@cs.odu.edu}}

\maketitle  

\begin{abstract}
The remarkable increase in news outlets' presence on the internet has spawned a wide range of research topics; one of which is news similarity. While there has been a large amount of literature around news written in English, only a few addressed Arabic news.

\keywords{Arabic News Similarity  \and Arabic Named Entity Recognition and Classification \and Named Entity Normalization.}
\end{abstract}

\section{Introduction}
Information extraction from unstructured textual data using Named Entity Recognition (NER) techniques has been studied extensively over the past few decades due to its various Natural Language Processing (NLP) applications \cite{Balla2020ExplorationOA} such as paraphrase detection, document classification, text similarity, duplicate question identification, plagiarism detection, smart chat bots, and question answering \cite{Nadeau2007ASO}. Named entities (NEs) extracted from unstructured text using Named Entity Recognition and Classification (NERC) techniques hold valuable information about the text.

\begin{figure}
    \includegraphics[width=\textwidth]{2.png}
    \caption{ALP results (full credit given for all extracted NEs regardless of classification).}
    \label{fig:2}
\end{figure}

\begin{table}
\caption{Results from comparing ALP to our new approach (GTS)}\label{tab1}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
Tool &  P & R & F & TNR & A & BA \\
\hline
ALP (full credit) &  0.907 & \textbf{0.863} & 0.876 & 0.994 & 0.985 & \textbf{0.93}\\
GTS (full credit) &  \textbf{0.977} & 0.816 & \textbf{0.881} & \textbf{1.0} & \textbf{0.989} & 0.91\\
\hline
ALP (partial credit) &  0.842 & \textbf{0.851} & 0.835 & 0.991 & 0.981 & \textbf{0.923}\\
GTS (partial credit) & \textbf{0.925} & 0.805 & \textbf{0.849} & \textbf{0.996} & \textbf{0.985} & 0.903\\
\hline
\end{tabular}
\end{table}

Our approach scored higher than ALP in terms of Precision, F-measure, True Negative Rate, and Accuracy. ALP scored higher in terms of Recall and Balanced Accuracy.
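
As a reading aid for Table~\ref{tab1}, the column abbreviations can be written out under the standard confusion-matrix definitions. This is an assumption, since the definitions are not stated in this excerpt: TP, FP, TN, and FN denote the counts of true positives, false positives, true negatives, and false negatives, and $F$ is taken to be the balanced F-measure.
% Assumed standard definitions; the symbols TP, FP, TN, FN are not defined in the original text.
\[
P = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \qquad
R = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad
F = \frac{2PR}{P+R},
\]
\[
\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}, \qquad
A = \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{TN}+\mathrm{FN}}, \qquad
\mathrm{BA} = \frac{R+\mathrm{TNR}}{2}.
\]
For instance, the rounded full-credit ALP values give $(0.863+0.994)/2 \approx 0.93$, which agrees with the BA column.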

\begin{figure}
    \includegraphics[width=\textwidth]{8.png}
    \caption{ALP vs GTS (full credit for wrong classification)}
    \label{fig:8}
\end{figure}


\subsection{Observed Shortcomings}
Some of the observed cases where one or both tools, ALP and GTS, failed to extract or classify named entities are analyzed and briefly discussed to spark possible improvements. Although this comparison is not performed on a large dataset, it is meant to serve as a proof of concept.


\begin{figure}
    \includegraphics[width=\textwidth]{10.png}
    \caption{Khadija transliteration Variants}
    \label{fig:10}
\end{figure}



\begin{table}
\caption{Results from using Levenshtein Distance to match name transliteration variants}\label{tab2}
\begin{tabular}{|c|c|c|c|}
\hline
\textbf{First Transliteration}  &  \textbf{Second Transliteration} & \textbf{Score} & \textbf{Result} \\
\hline
Muhammed & Mohamed & 80 & True Positive\\
Muhammad & Mohammed & 75 & True Positive\\
Mohamad &  Muhammed & 67 & False Negative\\
Abdel aziz & Abdul aziz & 90 & True Positive\\
Raqia & Rakia & 80 & True Positive\\
Hiba & Heba & 75 & True Positive\\
Ahmed & Ahmad & 80 & True Positive\\
Noor & Nour & 75 & True Positive\\
Hamid & Hameed & 73 & False Negative\\
Hamid & Hamed & 80 & False Positive\\
Fatima & Fatema & 83 & True Positive\\
Hussam & Hosam & 73 & False Negative\\
Husam & Houssam & 83 & True Positive\\
Hussam & Housam & 83 & True Positive\\
Hussam & Houssam & 92 & True Positive\\
Widad & Noor & 0 & True Negative\\
Widad & Wael & 44 & True Negative\\
Widad & Waleed & 55 & True Negative\\
Abdul aziz & Abdul rahman & 64 & True Negative\\
Esam & Husam & 67 & True Negative\\
Wesam & Husam & 60 & True Negative\\
Wesam & Esam & 89 & False Positive\\
Hussam & Hassan & 67 & True Negative\\
Hasan & Hassan & 91 & False Positive\\
Eman & Emad & 75 & False Positive\\
Hussam Hallak & Hussam Masri & 64 & True Negative\\
Muhammed & Muhanned & 75 & False Positive\\
\hline
\end{tabular}
\end{table}
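
The scores in Table~\ref{tab2} range from 0 to 100, so they appear to be a normalized similarity rather than a raw edit distance. The scoring function is not stated here; as an assumption, the values are consistent with the ratio below, where $m$ is the number of characters the two strings share in order and $|a|$, $|b|$ are their lengths.
% Assumed scoring function; s, m, |a|, |b| are illustrative symbols, not taken from the original text.
\[
s(a,b) = 100 \cdot \frac{2m}{|a|+|b|}
\]
For example, ``Muhammed'' and ``Mohamed'' share $m=6$ characters with $|a|+|b|=15$, giving $s=80$; ``Hussam'' and ``Hosam'' share $m=4$ with $|a|+|b|=11$, giving $s\approx 73$. The labels in the table are also consistent with treating a score of 75 or higher as a match, although that cutoff is inferred from the rows rather than stated.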


Fig~\ref{fig:16} shows the results of comparing all algorithms in one direction.

\begin{figure}
    \includegraphics[width=\textwidth]{16.png}
    \caption{Matching English transliterations of Arabic names running algorithms in one direction}
    \label{fig:16}
\end{figure}


2. Using all algorithms to code names in both directions and compare the results:

a. Consider two names a match if their codes in both directions match.

Fig~\ref{fig:17} shows the results of comparing all algorithms in both directions (AND).

\begin{figure}
    \includegraphics[width=\textwidth]{17.png}
    \caption{Matching English transliterations of Arabic names running algorithms in both directions (codes must match in both directions)}
    \label{fig:17}
\end{figure}

b. Consider two names a match if their codes in either of the two directions match.

Fig~\ref{fig:18} shows the results of comparing all algorithms in both directions (OR).

\begin{figure}
    \includegraphics[width=\textwidth]{18.png}
    \caption{Matching English transliterations of Arabic names running algorithms in both directions (codes must match in one of the two directions)}
    \label{fig:18}
\end{figure}

We found that running Double Metaphone in one direction produced the best results when compared to the results from running other phonetic algorithms on our dataset.

\subsection{Matching English transliterations of Arabic names using Google Translate}
We tested Google Translate on the same dataset we constructed to examine phonetic algorithms. We found that Google Translate is an effective way of matching Arabic names written in English (transliterations). Table~\ref{tab7} shows the input (transliterations of Arabic names) and the output of Google Translate (the Arabic name).

\begin{table}
\caption{Examples of the Input and Output of Google Translate}\label{tab7}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Arabic Name} & \textbf{Transliteration/Input}  &  \textbf{Google Translate Output}\\
\hline
\foreignlanguage{arabic}{حسن} & Hasan & \foreignlanguage{arabic}{حسن}\\
\foreignlanguage{arabic}{حسني} & Housny & \foreignlanguage{arabic}{حسني}\\
\foreignlanguage{arabic}{حسين} & Housien & \foreignlanguage{arabic}{حسين}\\
\foreignlanguage{arabic}{حسان} & Hassan & \foreignlanguage{arabic}{حسان}\\
\foreignlanguage{arabic}{حسنيه} & Housneya &
\textcolor{red}{\foreignlanguage{arabic}{حسينيا}} \\
\foreignlanguage{arabic}{حسونه} & Hassouna &
\textcolor{red}{\foreignlanguage{arabic}{حسونة}} \\
\foreignlanguage{arabic}{حسام} & Hussam & \foreignlanguage{arabic}{حسام} \\
\hline
\end{tabular}
\end{table}

Furthermore, Google Translate was able to map transliteration variants to the same name. 

Table~\ref{tab8} shows the input (Faisal Mekdad's different transliterations) and the output of Google Translate (the Arabic name). 

\begin{table}
\caption{Examples of the Input and Output of Google Translate for Faisal Mekdad's different transliterations}\label{tab8}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Arabic Name} & \textbf{Transliteration/Input}  &  \textbf{Google Translate Output}\\
\hline
\foreignlanguage{arabic}{فيصل مقداد} & Faisal Mekdad & \foreignlanguage{arabic}{فيصل مقداد}\\
\foreignlanguage{arabic}{فيصل مقداد} & Faisal Mukdad & \foreignlanguage{arabic}{فيصل المقداد}\\
\foreignlanguage{arabic}{فيصل مقداد} & Faisal Mokdad & \foreignlanguage{arabic}{فيصل المقداد}\\
\foreignlanguage{arabic}{فيصل مقداد} & Faisal Miqdad & \foreignlanguage{arabic}{فيصل المقداد}\\
\foreignlanguage{arabic}{فيصل مقداد} & Faisal Maqdad & \foreignlanguage{arabic}{فيصل المقداد} \\
\hline
\end{tabular}
\end{table}

The first variation, ``Faisal Mekdad", is mapped to the same name as the other variations, but without the article ``The" prefixed to the last name.

\foreignlanguage{arabic}{الـ} is the Arabic equivalent of the English article ``The". Adding or omitting this article is common in Arabic names, especially in last names. We suspect that this is why Google Translate gives the option to edit the translation, as shown in Fig~\ref{fig:19}.

\begin{figure}
    \includegraphics[width=\textwidth]{19.png}
    \caption{Translation options by Google Translate}
    \label{fig:19}
\end{figure}

\subsubsection{Biblical Names}

Mapping Biblical names to their Arabic counterparts using string matching and/or phonetic algorithms is not easy because the English pronunciation of Biblical names differs from their Arabic pronunciation (Biblical names are not originally Arabic). The spelling and pronunciation of Biblical names in Arabic are often dictated by their Hebrew pronunciation (unless the names are of Arabic origin). For example, the Biblical name ``Michael" is \foreignlanguage{arabic}{ميخائيل} in Arabic and is pronounced ``mikhayiyl". Fig~\ref{fig:20} shows how Google Translate was able to map ``Michael" and its various transliterations to the same Arabic name \foreignlanguage{arabic}{ميخائيل}.

\begin{figure}
    \includegraphics[width=\textwidth]{20.png}
    \caption{Mapping variants of the name ``Michael" using Google Translate}
    \label{fig:20}
\end{figure}

\subsubsection{The tool, Merge Arabic Names (MAN)}
We developed a tool that takes a list of transliterations, uses the Google Translate API to map them to their corresponding Arabic names, and groups the transliterations by the Arabic name from which they were derived. MAN solves the problem of the presence or absence of the article ``The" \foreignlanguage{arabic}{الـ} in Arabic names, allowing all matching names to be merged.

\paragraph{Sample Input:}
List of transliterations: \\
Muhammed, Ahmad, Muhamed, Hamid, Muhamad, Husam, Mohamad, Mahmood, Hussam, Mahmud, Ahmed, Husam, Mohammed, Mohamed, Housam, Mahmod, Hameed, Muhammad, Houssam, Mohammad

\paragraph{Sample Output:}
List of transliterations grouped by the Arabic name they were derived from: \\
\foreignlanguage{arabic}{محمد}: Muhammed, Muhamed, Muhamad, Mohamad, Mohammed, Mohamed, Muhammad, Mohammad\\
\foreignlanguage{arabic}{أحمد}: Ahmad, Ahmed\\
\foreignlanguage{arabic}{حسام}: Husam, Hussam, Husam, Housam, Houssam\\
\foreignlanguage{arabic}{حميد}: Hamid, Hameed\\
\foreignlanguage{arabic}{محمود}: Mahmood, Mahmud, Mahmod\\

Fig~\ref{fig:21} shows the input and output of MAN.

\begin{figure}
    \includegraphics[width=\textwidth]{21.png}
    \caption{Mapping transliterations to names using MAN}
    \label{fig:21}
\end{figure}

\subsubsection{Evaluating Google Translate's ability to normalize transliteration variants of Arabic names}
Using MAN, we were able to test Google Translate on our dataset, the same dataset we used to test the ability of string matching and phonetic algorithms to normalize person names. Although Google Translate generated no false positives (Precision is 1.0), it generated enough false negatives to put it behind most phonetic algorithms in Recall. Since we compared all 250 names with each other, the number of true negatives is much higher than the other three cases combined (true positives, false positives, and false negatives). For such imbalanced cases, we rely on Balanced Accuracy rather than Accuracy. Nevertheless, Double Metaphone produced the same Accuracy, better Balanced Accuracy, and better F-measure.
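
To illustrate the imbalance, assume that each unordered pair of distinct names is compared exactly once (the exact comparison scheme is an assumption here). The number of comparisons is then
% Assumes unordered pairs without self-comparison; this counting scheme is not stated in the original text.
\[
\binom{250}{2} = \frac{250 \cdot 249}{2} = 31125 .
\]
Under the standard definitions, a matcher that labels every pair as a non-match would have Recall $0$ and True Negative Rate $1$, so its Balanced Accuracy would be $(0+1)/2 = 0.5$, while its Accuracy would stay close to $1$ whenever only a small share of the pairs are true matches.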

Fig~\ref{fig:22} shows the results of comparing Google Translate to the algorithms we tested for normalizing Arabic names transliteration variants.

\begin{figure}
    \includegraphics[width=\textwidth]{22.png}
    \caption{Matching English transliterations of Arabic names using string matching, phonetic algorithms, and Google Translate}
    \label{fig:22}
\end{figure}

\bibliographystyle{plain}
\bibliography{bibliography.bib}
\end{document}

Sam Hall