ScriptGuesser

For: Users, Modders, Developers

Documentation – Overview

This page provides documentation of certain features for different audiences. The target audience of each article is listed in the header of its card.

Users – These articles give a high-level overview of features encountered during casual gameplay. These will help you make the most of this website.
Modders – These articles offer high-level documentation for features which can be modded by modifying the JSON files on this site. If you want to learn how to add scripts or change other high-level functionality of this site by forking or cloning the GitHub repository, these articles are for you.
Developers – These articles offer low-level documentation for JavaScript source code and functions used on this site. If you want to modify core functionality of this site by forking or cloning the GitHub repository, these articles are for you. Since these articles will likely take the most time to write, I will add them from time to time when I am free.

For: Modders, Developers

Scripts.json

Scripts.json is a JSON file, located at /static/json/scripts.json, that contains all scripts available on the site. If you want to add, remove or modify script data, you can do so by editing this file. The file is a single JSON object in which each script and its properties are described as key-value pairs: each script object is stored in the root object under its ID as the key. Each script object has the following properties:

Key	Value	Type	Example
id	The same ID that is used as the script's key in the root object. The ID is the name of the script in all lowercase. If the script is a direct variant of another script, the name of the parent script is added as a prefix, separated by an underscore.	String	"lao" "cyrillic_russian"
code	The ISO 639 code of a language that uses the script. This code is passed to Google Translate to retrieve a pseudo sentence in the target script.	String	"lo"
label	The stylized name of the script, which is displayed wherever the script name appears. If the script is a direct variant of another script, the stylized name of the parent script is used, followed by the name of the child script in parentheses.	String	"Lao" "Cyrillic (Russian)"
keywords	A list of keywords in all lowercase by which the script can be found. User input is validated against this list to check whether it is correct. These are primarily endonyms and exonyms. Keywords should not compensate for user typos.	Array[String, ...]	["lao", "laotian", "phasa lao", "akson lao"]
countries	A list of ISO 3166-1 alpha-2 (ISO2 for short) codes of the countries in which the script is used. The script should either be an official script in the country or be used by a significant part of the country. Use in small communes or border areas is ignored. For countries without an official ISO2 code, a widely used alternative two-letter code should be used instead, such as "XK" for Kosovo.	Array[String, ...]	["LA"]
regions	[CURRENTLY NOT IN USE] A list of subregions of countries in which the script is used. This might be used in the future to increase precision. The format for those subregions will be specified once they are in use. For now, this property is just a placeholder whose value should be an empty array.	Array[String, ...]	[]
example_text	A sample text in the script that showcases its appearance and characters. Depending on the script, I translated one of several quotes from series or public figures in order to keep the text length approximately constant. If possible, the sample text should include the characteristic features or characters of the script. For very similar scripts (or even languages), the same original text (for example, the same quote) should be used so the user can spot the differences.	String	"ໃນທີ່ສຸດ, ການໂຕ້ຖຽງວ່າທ່ານບໍ່ສົນໃຈສິດທິໃນຄວາມເປັນສ່ວນຕົວເພາະວ່າທ່ານບໍ່ມີຫຍັງປິດບັງແມ່ນບໍ່ແຕກຕ່າງຈາກການເວົ້າວ່າທ່ານບໍ່ສົນໃຈກັບຄໍາເວົ້າທີ່ບໍ່ເສຍຄ່າເພາະວ່າທ່ານບໍ່ມີຫຍັງທີ່ຈະເວົ້າ."
description	A description of the script shown on the Learning page. I tried to include some basic information or interesting facts about each script that I learned from experience or research. However, there is no fixed template or set of requirements for this property.	String	"The Lao script was adapted from the Khmer script and is thus an Indic script used to write the Lao language and other languages spoken in Laos. It developed alongside the Thai script, both sharing similarities."
tip	One or more tips for recognizing the script. These can take any form deemed useful, such as appearance, mnemonics, specific characters or similar. The tips come from experience, research and comparison, or sometimes from the PlonkIt guide.	String	"Just like Thai, the Lao script is curvy with many small circles in the letters. It tends to have less straight lines and sharp edges than Thai and looks rounder and simpler in general."
confusables	A list of scripts that this script can be confused with. It contains arrays of size 2 that have the script ID at position 0 and the Confusability Level at position 1. For more information about Confusability Levels, see their documentation.	Array[Array[String, Integer], ...]	[["thai", 4], ["khmer", 3]]
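Since scripts.json is edited by hand, a small check before committing can catch entries that break the rules in the table above. The following Python sketch is hypothetical and not part of the ScriptGuesser source: the function names validate_script, validate_file and is_correct_guess are made up for illustration. The matching rule in is_correct_guess (trim and lowercase the input, then require an exact keyword match) is an assumption based on the note that keywords should not compensate for typos; the site's actual validation may differ.

```python
import json
import re

# Allowed range of Confusability Levels (0 to 8), per the documentation.
MIN_LEVEL, MAX_LEVEL = 0, 8

# Properties documented as plain strings in the property table.
STRING_FIELDS = ("code", "label", "example_text", "description", "tip")


def validate_script(key: str, script: dict) -> list[str]:
    """Return the problems found in one scripts.json entry (empty if none)."""
    problems = []
    if script.get("id") != key:
        problems.append(f"{key}: 'id' must equal the root key")
    if key != key.lower():
        problems.append(f"{key}: the ID must be all lowercase")
    for field in STRING_FIELDS:
        if not isinstance(script.get(field), str):
            problems.append(f"{key}: '{field}' must be a string")
    keywords = script.get("keywords", [])
    if not all(isinstance(k, str) and k == k.lower() for k in keywords):
        problems.append(f"{key}: keywords must be lowercase strings")
    for country in script.get("countries", []):
        # Only checks the shape (two uppercase letters), so substitutes such
        # as "XK" pass; it does not verify the code against the ISO list.
        if not isinstance(country, str) or not re.fullmatch(r"[A-Z]{2}", country):
            problems.append(f"{key}: {country!r} is not a two-letter country code")
    if script.get("regions") != []:
        problems.append(f"{key}: 'regions' is unused and should be []")
    for pair in script.get("confusables", []):
        ok = (isinstance(pair, list) and len(pair) == 2
              and isinstance(pair[0], str) and isinstance(pair[1], int)
              and MIN_LEVEL <= pair[1] <= MAX_LEVEL)
        if not ok:
            problems.append(f"{key}: invalid confusables entry {pair!r}")
    return problems


def validate_file(path: str) -> list[str]:
    """Validate every entry and check that confusables point to existing IDs."""
    with open(path, encoding="utf-8") as f:
        scripts = json.load(f)
    problems = []
    for key, script in scripts.items():
        problems.extend(validate_script(key, script))
        for pair in script.get("confusables", []):
            if (isinstance(pair, list) and len(pair) == 2
                    and isinstance(pair[0], str) and pair[0] not in scripts):
                problems.append(f"{key}: unknown confusable ID {pair[0]!r}")
    return problems


def is_correct_guess(guess: str, script: dict) -> bool:
    """Assumed matching rule: trim, lowercase, then exact keyword match."""
    return guess.strip().lower() in script["keywords"]
```

Assuming the file sits at static/json/scripts.json relative to the repository root, calling validate_file("static/json/scripts.json") should return an empty list when every entry follows the table.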

For: Users, Modders

Confusability Levels

The confusability level is defined for each pair of confusable scripts in the confusables property of each script in the scripts.json file. The levels range from 0 to 8 as defined below:

Level	Definition	Usage	Example
0	Not Confusable	The scripts are usually not confusable; however, they may still occasionally be confused by someone unaware of either script or due to a lack of attention. This level is not currently in use, as scripts that are not confusable are simply left out, but it may be used in the future.	The Latin and Japanese scripts are not considered confusable as there are no shared characters or even significant similarities.
1	Very Low	There may be some confusability between the two scripts, such as some similar or identical characters, but the scripts are generally still very easy to differentiate, even by someone who does not know the details of the scripts, as long as they are aware that both scripts exist.	The Latin and Greek scripts are considered confusable to a Very Low degree as there are some shared or similar characters like a/α or B/Β, but the scripts as a whole do not look very similar and are easily recognizable side-by-side.
2	Low	The scripts have a similar look, share a moderate number of characters or have a shared defining feature (such as the horizontal line in many Devanagari scripts), which can create confusion when learning the scripts. When compared side-by-side, the scripts are easy to differentiate without needing to look for details.	The Hindi and Gujarati scripts are considered confusable to a Low degree as the characters look similar in shape, but Gujarati lacks the obvious horizontal line characteristic of Hindi, which makes recognition easy despite the similar shapes.
3	Moderate	The scripts share a defining feature or are very similar in shape or general look. When compared side-by-side, the scripts may be difficult to differentiate for someone who is not aware of specific details of the scripts. However, there is at least one defining detail that appears in most characters and can be found in almost all texts of decent size, enabling a knowledgeable reader to reliably differentiate the scripts.	The Telugu and Kannada scripts are considered confusable to a Moderate degree as their characters look very similar and their defining feature is less obvious. They can, however, be reliably differentiated by the checkmark-like diacritics in Telugu or the hook-like diacritics in Kannada if the reader knows what to look out for.
4	High	The scripts share a large number of identical or nearly identical characters, so a reader who does not know what to look out for is almost incapable of reliably differentiating them. There are still about 2 to 5 script-specific characters that appear in most sentences and by which the two scripts can be recognized. While some (especially very short) sentences may not include any of these characters, making them almost impossible to tell apart, this is generally not the norm.	The Russian and Ukrainian Cyrillic scripts share the majority of their letters; however, the Ukrainian letters Ґ, Є, І and Ї do not appear in the Russian script, and at least one of them is found in almost every sentence, so a reader aware of this can reliably differentiate the scripts as long as they pay attention.
5	Very High	Most characters are shared between both scripts. There are only about one or two unambiguous characters, which appear at a decent rate and thus show up in many (but not necessarily all) sentences. Certain character combinations, such as the same letter twice in a row, may be a crucial way of differentiating the two scripts. The scripts generally cannot be reliably differentiated by a reader unaware of the specific details.	The Bengali and Assamese scripts are considered confusable to a Very High extent as the only characters present in Assamese but not Bengali are ৰ and ৱ, which also have similar-looking counterparts in Bengali. These characters, especially ৰ, still appear often, allowing reliable recognition.
6	Extremely High	As with the Very High level, there are only about one or two unambiguous characters, the difference being that these characters appear rarely, even in texts of moderate size. Alternatively, certain character combinations may be the only reliable way to differentiate the two scripts. While being able to read the script lowers the confusability considerably, it is still not the only way to differentiate the scripts reliably.	The Hindi and Marathi Devanagari scripts are considered confusable to an Extremely High extent as the scripts are identical for the most part, except for the character ळ, which can sometimes be found in Marathi but is fairly rare and very similar to क, which also appears in Hindi. This makes reliable recognition increasingly difficult as the text gets shorter.
7	Unintelligible Unless Spoken	The scripts are generally considered unintelligible unless the reader can read one (or likely both) of the scripts and can reliably recognize the language that uses it, either by certain words, word combinations or general sound. At this point, differentiating the scripts is less about the scripts themselves and more about the languages behind them.
8	Unintelligible	The scripts are generally considered unintelligible. At this point, it is worth considering whether there is even a valid reason not to treat them as the same script. As with level 0, this level is not currently in use, but it may serve some purpose in the future.	No scripts on this site are currently considered Unintelligible, and I cannot think of any examples outside of this site either.
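When adding or reviewing confusables entries, it can help to print them with their level names and to flag levels that are defined but not currently in use (0 and 8). The Python sketch below is hypothetical and not part of the site's code; describe_confusables and check_levels are illustrative names. It also assumes, which this documentation does not state, that when two scripts list each other as confusable, both entries should carry the same level.

```python
# Level names as listed in the Confusability Levels table.
LEVEL_NAMES = {
    0: "Not Confusable",
    1: "Very Low",
    2: "Low",
    3: "Moderate",
    4: "High",
    5: "Very High",
    6: "Extremely High",
    7: "Unintelligible Unless Spoken",
    8: "Unintelligible",
}

# Levels 0 and 8 are defined but, per the table, not currently in use.
UNUSED_LEVELS = {0, 8}


def describe_confusables(script: dict) -> list[str]:
    """Format a script's confusables as 'id: Name (level)', most confusable first."""
    ranked = sorted(script.get("confusables", []),
                    key=lambda pair: pair[1], reverse=True)
    return [f"{other}: {LEVEL_NAMES[level]} ({level})" for other, level in ranked]


def check_levels(scripts: dict) -> list[str]:
    """Flag unused levels and pairs whose two entries disagree on the level.

    Assumption (not stated in the documentation): if both scripts list each
    other, they should use the same level.
    """
    problems = []
    for script_id, script in scripts.items():
        for other, level in script.get("confusables", []):
            if level in UNUSED_LEVELS:
                problems.append(f"{script_id} -> {other}: level {level} is currently unused")
            # Map the other script's confusables to {id: level} for a reverse lookup.
            back = dict(scripts.get(other, {}).get("confusables", []))
            # Compare each mutually listed pair only once, from the smaller ID.
            if script_id < other and script_id in back and back[script_id] != level:
                problems.append(f"{script_id} -> {other}: level {level} vs {back[script_id]}")
    return problems
```

Given the parsed contents of scripts.json, check_levels should return an empty list when no unused levels appear and every mutually listed pair agrees on its level.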
