Sumit Gulwani
Results for "Romanic languages"
Showing 20 of ~3,336,057 results · from CrossRef, arXiv, DOAJ, Semantic Scholar
F. Jouault, I. Kurtev
Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, R. Priyadharshini et al.
Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.
Sangam Balchandar Reddy, Arun Kumar Das, Anjeneya Swami Kare et al.
A Roman dominating function of a graph $G=(V,E)$ is a labeling $f: V \rightarrow \{0,1,2\}$ such that for each vertex $u \in V$ with $f(u) = 0$, there exists a vertex $v \in N(u)$ with $f(v) = 2$. A Roman dominating function $f$ is a global Roman dominating function if it is a Roman dominating function for both $G$ and its complement $\overline{G}$. The weight of $f$ is the sum of $f(u)$ over all vertices $u \in V$. The objective of the Global Roman Domination problem is to compute a global Roman dominating function of minimum weight. In this paper, we study the algorithmic aspects of the Global Roman Domination problem on various graph classes and obtain the following results. 1. We prove that the Roman Domination and Global Roman Domination problems are not computationally equivalent by identifying graph classes on which one is linear-time solvable while the other is NP-complete. 2. We show that the Global Roman Domination problem is NP-complete on split graphs, thereby resolving an open question posed by Panda and Goyal [Discrete Applied Mathematics, 2023]. 3. We prove that the Global Roman Domination problem is NP-complete on chordal bipartite graphs, planar bipartite graphs with maximum degree five, and circle graphs. 4. On the positive side, we present a linear-time algorithm for the Global Roman Domination problem on cographs.
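The definitions in this abstract can be checked mechanically. Below is a minimal Python sketch (graphs as dicts of adjacency sets; the function names are ours for illustration, not the paper's):

```python
def is_roman_dominating(adj, f):
    """f maps vertices to {0,1,2}; every 0-vertex needs a neighbor labeled 2."""
    return all(f[u] != 0 or any(f[v] == 2 for v in adj[u]) for u in adj)

def is_global_roman_dominating(adj, f):
    """Check the Roman domination condition on both G and its complement."""
    verts = set(adj)
    comp = {u: verts - adj[u] - {u} for u in verts}
    return is_roman_dominating(adj, f) and is_roman_dominating(comp, f)

def weight(f):
    """Weight of a labeling: the sum of all labels."""
    return sum(f.values())
```

For the path $P_3$, the labeling $(1,2,1)$ is global Roman dominating with weight 4, while $(0,2,0)$ is Roman dominating on $P_3$ but fails on its complement, where the two endpoint vertices labeled 0 have no neighbor labeled 2.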
Kanchon Gharami, Quazi Sarwar Muhtaseem, Deepti Gupta et al.
The development of robust transliteration techniques to enhance the effectiveness of transforming Romanized scripts into native scripts is crucial for Natural Language Processing tasks, including sentiment analysis, speech recognition, information retrieval, and intelligent personal assistants. Despite significant advancements, state-of-the-art multilingual models still face challenges in handling Romanized script, where the Roman alphabet is adopted to represent the phonetic structure of diverse languages. Within the South Asian context, where the use of Romanized script for Indo-Aryan languages is widespread across social media and digital communication platforms, such usage continues to pose significant challenges for cutting-edge multilingual models. While a limited number of transliteration datasets and models are available for Indo-Aryan languages, they generally lack sufficient diversity in pronunciation and spelling variations, adequate code-mixed data for large language model (LLM) training, and low-resource adaptation. To address this research gap, we introduce a novel transliteration dataset for two popular Indo-Aryan languages, Hindi and Bengali, which are ranked as the 3rd and 7th most spoken languages worldwide. Our dataset comprises nearly 1.8 million Hindi and 1 million Bengali transliteration pairs. In addition, we pre-train a custom multilingual seq2seq LLM based on the Marian architecture using the developed dataset. Experimental results demonstrate significant improvements over existing relevant models in terms of BLEU and CER metrics.
L. C. Gilbert
This bachelor's thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages. The study analyzed solution rates across three difficulty levels, types of errors encountered, and code quality in terms of runtime and memory efficiency through a quantitative experiment. A total of 188 programming problems were selected from the LeetCode platform, and ChatGPT 4 was given three attempts to produce a correct solution with feedback. ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased. Notably, the model faced considerable challenges with hard problems across all languages. ChatGPT 4 demonstrated higher competence in widely used languages, likely due to a larger volume and higher quality of training data. The solution rates also revealed a preference for languages with low abstraction levels and static typing. For popular languages, the most frequent error was "Wrong Answer," whereas for less popular languages, compiler and runtime errors prevailed, suggesting frequent misunderstandings and confusion regarding the structural characteristics of these languages. The model exhibited above-average runtime efficiency in all programming languages, showing a tendency toward statically typed and low-abstraction languages. Memory efficiency results varied significantly, with above-average performance in 14 languages and below-average performance in five languages. A slight preference for low-abstraction languages and a leaning toward dynamically typed languages in terms of memory efficiency were observed. Future research should include a larger number of tasks, iterations, and less popular languages. Additionally, ChatGPT 4's abilities in code interpretation and summarization, debugging, and the development of complex, practical code could be analyzed further.
Anastasia Mavridou, Marie Farrell, Gricel Vázquez et al.
Integrating autonomous and adaptive behavior into software-intensive systems presents significant challenges for software development, as uncertainties in the environment or decision-making processes must be explicitly captured. These challenges are amplified in safety- and mission-critical systems, which must undergo rigorous scrutiny during design and development. Key among these challenges is the difficulty of specifying requirements that use probabilistic constructs to capture the uncertainty affecting these systems. To enable formal analysis, such requirements must be expressed in precise mathematical notations such as probabilistic logics. However, expecting developers to write requirements directly in complex formalisms is unrealistic and highly error-prone. We extend the structured natural language used by NASA's Formal Requirement Elicitation Tool (FRET) with support for the specification of unambiguous and correct probabilistic requirements, and develop a formal, compositional, and automated approach for translating these structured natural-language requirements into formulas in probabilistic temporal logic. To increase trust in our formalizations, we provide assurance that the generated formulas are well-formed and conform to the intended semantics through an automated validation framework and a formal proof. The extended FRET tool enables developers to specify probabilistic requirements in structured natural language, and to automatically translate them into probabilistic temporal logic, making the formal analysis of autonomous and adaptive systems more practical and less error-prone.
Lauren Aldoroty, Lei Hu, Rob Knop et al.
NASA's Nancy Grace Roman Space Telescope (Roman) will provide an opportunity to study dark energy with unprecedented precision using several techniques, including measurements of Type Ia Supernovae (SNe Ia). Here, we present `phrosty` (PHotometry for ROman with SFFT for tYpe Ia supernovae): a difference imaging pipeline for measuring the brightness of transient point sources in the sky, primarily SNe Ia, using Roman data. `phrosty` is written in Python. We implement a GPU-accelerated version of the Saccadic Fast Fourier Transform (SFFT) method for difference imaging.
Katarzyna Maniowska
How much does a comma weigh in a text? What is the value of adding or omitting a comma in translation? Can a different use of punctuation entail semantic differences? A parallel reading of the text of the Constitution of the Italian Republic and its Polish translation offers a starting point for further reflection on the meaning of this elusive element. The comparative analysis is preceded by a brief outline of the development of punctuation in Italian and Polish, as well as observations on the different uses of punctuation in the two languages. Since Italian punctuation is prosodic-syntactic in nature, while Polish opts for punctuation based on rigid syntactic rules, this mismatch between punctuation systems is also relevant from a translational point of view: a correct transposition of meaning is achieved partly through punctuation. The Constitution of the Italian Republic belongs to the category of binding texts and therefore demands the utmost interpretive rigor from the translator. We hypothesize that, despite the different criteria for comma use in the two languages, both the original and the translated text express the same meaning without major deviations of sense. Through examples drawn from the source text, we identify possible points of contact between the Italian and Polish punctuation systems, while also highlighting the irreconcilable differences that stem from the different conceptions of punctuation in the two languages.
Daniel S. Katz, Jeffrey C. Carver
This is a virtual dialog between Jeffrey C. Carver and Daniel S. Katz on how people learn programming languages. It is based on a talk Jeff gave at the first US-RSE Conference (US-RSE'23), which led Dan to think about human languages versus computer languages. Dan discussed this with Jeff at the conference, and the discussion continued asynchronously, with this column being a record of it.
Lance Calvin Lim Gamboa, Mark Lee
Bias studies on multilingual models confirm the presence of gender-related stereotypes in masked models processing languages with high NLP resources. We expand on this line of research by introducing Filipino CrowS-Pairs and Filipino WinoQueer: benchmarks that assess both sexist and anti-queer biases in pretrained language models (PLMs) handling texts in Filipino, a low-resource language from the Philippines. The benchmarks consist of 7,074 new challenge pairs resulting from our cultural adaptation of English bias evaluation datasets, a process that we document in detail to guide similar forthcoming efforts. We apply the Filipino benchmarks on masked and causal multilingual models, including those pretrained on Southeast Asian data, and find that they contain considerable amounts of bias. We also find that for multilingual models, the extent of bias learned for a particular language is influenced by how much pretraining data in that language a model was exposed to. Our benchmarks and insights can serve as a foundation for future work analyzing and mitigating bias in multilingual models.
Camila Flores Salvo, Matías Jaque Hidalgo
This article addresses the relationship between negation and the Spanish deontic modal periphrases deber + infinitive and tener que + infinitive, which differ in the scope relation they establish with negation: while deber takes wide scope, tener que preferentially exhibits narrow scope. Our main objective is to establish the specific properties that motivate the behavior of deber as a positive polarity item (PPI). Following Ramchand (2018), we propose that the notion of "obligation" derives from two distinct semantic operations: tener que expresses an exhaustive selection of alternatives, whereas deber expresses an exclusive selection, by which the most highly valued situation is selected given a deontically based preference ordering. The paper shows that one of the main consequences of this difference is the distribution of implicative readings in the preterite, which is obligatory for tener que.
Garrison Koch, Nathan Shank
Given a graph $G=(V,E)$, the domination number of a graph is the minimum size of a vertex set $V' \subseteq V$ such that every vertex in the graph is either in $V'$ or adjacent to a vertex in $V'$. A Roman Dominating function of $G$ is defined as $f:V \rightarrow \{0,1,2\}$ such that every vertex with label 0 in $G$ is adjacent to a vertex with label 2. The Roman Domination number of a graph is the minimum total weight over all possible Roman Dominating functions. We consider $k$-attack Roman Domination, focusing in particular on 2-attack Roman Domination. A Roman Dominating function of $G$ is a $k$-attack Roman Dominating function of $G$ if, for all $j \leq k$, any subset $S$ of $j$ vertices all with label 0 has at least $j$ vertices with label 2 in the open neighborhood of $S$. The $k$-attack Roman Domination number of $G$, denoted $\gamma_{kaR}(G)$, is the minimum total weight over all possible $k$-attack Roman Dominating functions. We find $\gamma_{2aR}(G)$ for particular graph classes, discuss properties of $k$-attack Roman Domination, and make several connections with other domination concepts.
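The $k$-attack condition quantifies over all 0-labeled subsets of size at most $k$, so for small $k$ it can be verified by direct enumeration. A brute-force Python sketch (graphs as dicts of adjacency sets; names are ours, not the paper's):

```python
from itertools import combinations

def is_k_attack_rdf(adj, f, k=2):
    """Check the k-attack Roman domination condition: every set S of
    j <= k vertices labeled 0 must see at least j distinct vertices
    labeled 2 in its open neighborhood N(S)."""
    zeros = [u for u in adj if f[u] == 0]
    for j in range(1, k + 1):
        for S in combinations(zeros, j):
            defenders = {v for u in S for v in adj[u] if f[v] == 2}
            if len(defenders) < j:
                return False
    return True
```

The star $K_{1,3}$ with its center labeled 2 and leaves labeled 0 is an ordinary Roman Dominating function, but fails the 2-attack condition: any pair of leaves sees only the single center vertex.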
Albert Conesa Bausà
The text compiles commentary on the Supreme Court rulings that affect language use, linguistic rights, and the legal regime of the language during the first half of 2023.
Enisa Romanic
Khang Nhut Lam, Feras Al Tarouti, Jugal Kalita
This paper examines approaches to generate lexical resources for endangered languages. Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT). Since our work relies on only one bilingual dictionary between an endangered language and an "intermediate helper" language, it is applicable to languages that lack many existing resources.
Manolis Fragkiadakis, Peter van der Putten
Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality where a user can sign a query sign in front of a webcam and retrieve a set of matching signs. By extracting different body joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants, on a 1200-sign lexicon. The results show that UMAP and DTW can predict a matching sign with 80% and 71% accuracy respectively at the top-20 retrieved signs using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants in the lexicon, the accuracy can be raised to 90% at the top-10 ranking. Our results suggest that our methodology can be used with no training in any sign language lexicon regardless of its size.
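DTW, one of the four distance metrics compared above, aligns two joint trajectories of different lengths before summing frame-to-frame distances. A minimal sketch of the classic algorithm (sequences of 2D joint coordinates; function names are ours for illustration, not from the study):

```python
def dtw(a, b, dist):
    """Classic dynamic-time-warping distance between sequences a and b.
    D[i][j] holds the best alignment cost of a[:i] against b[:j]."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: skip a frame in a, in b, or match both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def euclid(p, q):
    """Euclidean distance between two joint-coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
```

Because the warping path may repeat or skip frames, two performances of the same sign at different speeds can still score a low distance, which is what makes DTW attractive for comparing query signs against a lexicon.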
Doost Ali Mojdeh, Iman Masoumi, Lutz Volkmann
For a graph $G=(V,E)$, a restrained double Roman dominating function is a function $f:V\rightarrow\{0,1,2,3\}$ with the property that if $f(v)=0$, then the vertex $v$ must have at least two neighbors assigned 2 under $f$ or one neighbor $w$ with $f(w)=3$; if $f(v)=1$, then the vertex $v$ must have at least one neighbor $w$ with $f(w)\geq 2$; and, at the same time, the subgraph $G[V_0]$ induced by the vertices with label zero has no isolated vertex. The weight of a restrained double Roman dominating function $f$ is the sum $f(V)=\sum_{v\in V}f(v)$, and the minimum weight of a restrained double Roman dominating function on $G$ is the restrained double Roman domination number of $G$. We initiate the study of restrained double Roman domination by proving that the problem of computing this parameter is NP-hard. We then present an upper bound on the restrained double Roman domination number of a connected graph $G$ in terms of the order of $G$ and characterize the graphs attaining this bound. We study restrained double Roman domination versus restrained Roman domination. Finally, we characterize all trees $T$ attaining the exhibited bound.
H Muhammad Shakeel, Rashid Khan, Muhammad Waheed
Computers are now essential in daily life and are useful in many fields, such as search engines, text processing, short messaging services, voice chat, and text recognition. Over the years, many tools and techniques have been developed to support writing in a language's script. Many Asian languages, such as Arabic, Urdu, Persian, Chinese, and Korean, are commonly written in Roman alphabets, which are the most widely used means of transliterating languages with non-Latin scripts. Several keyboard layouts already exist for entering Urdu characters, but most Urdu speakers prefer to use Roman-Urdu in different applications because they are unfamiliar with the Urdu-language keyboard. The objective of this work is to improve context-based transliteration of Roman-Urdu to Urdu script. In this paper, we propose an algorithm that effectively solves the transliteration issues. The algorithm works as follows: convert the encoded Roman words into words in the standard Urdu script and match them against a lexicon. If a match is found, the word is displayed in the text editor; if more than one match is found in the lexicon, the highest-frequency word is displayed. If no match is found, the first encoded and converted instance is displayed and set as the default, and the ambiguous word is then adjusted according to its context. The results demonstrate the efficiency and significance of this algorithm compared with other models and algorithms for context-based transliteration of Roman-Urdu to Urdu.
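The convert-then-match-lexicon steps described in the abstract can be sketched as a short pipeline. This is our illustrative reading, not the paper's implementation: `convert` (mapping one Roman token to candidate Urdu spellings) and `lexicon` (mapping Urdu words to corpus frequencies) are assumed interfaces:

```python
def transliterate(words, convert, lexicon):
    """For each Roman-Urdu token: generate candidate Urdu spellings,
    keep those found in the lexicon, and pick the highest-frequency one;
    if no candidate matches, fall back to the first conversion."""
    out = []
    for w in words:
        candidates = convert(w)
        matches = [c for c in candidates if c in lexicon]
        if matches:
            out.append(max(matches, key=lambda c: lexicon[c]))
        else:
            out.append(candidates[0])
    return out
```

The frequency tie-break mirrors the abstract's rule that the highest-frequency word is displayed when the lexicon yields multiple matches; the final context-based disambiguation step is not modeled here.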
Somnath Banerjee, Maulindu Sarkar, Nancy Agrawal et al.
Hate speech is considered one of the major issues currently plaguing online social media. Repeated exposure to hate speech has been shown to have physiological effects on targeted users; hate speech in all its forms should therefore be addressed on these platforms to safeguard users' well-being. In this paper, we explore several Transformer-based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021, including mBERT, XLMR-large, and XLMR-base, competing under the team name "Super Mario". Our models placed 2nd on the Code-Mixed dataset (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006), and 12th in the English two-class category (Macro F1: 0.6447).
Page 9 of 166803