OmegaT merge segments

3/1/2023

General Principle: A “segment” of the text constitutes what can be understood as one complete thought. In principle, it should be the most basic phrase-level unit of text that can be correctly understood without relying on any grammatical modifiers that may happen to precede or follow the segment. If the rules are too rigidly defined, there will inevitably be scenarios where too many of the TM segments are too long to be of any use; this is particularly the case with classical Tibetan texts, which notoriously use frequent run-on phrases. However, if we can agree on some general principles and guidelines, our TM resources will be much more useful, both for translators retrieving their own past translations for consistency and recall, and for archiving and sharing TMs between translators.

The following is a set of recommended standards for segmenting translation memories according to what can loosely be understood as the “sentence” or “complete thought” found in the source Tibetan. These standards are summarized in a cheatsheet at the end. The examples here all use English as the target language, although, since these standards focus on the Tibetan grammar, it is hoped that this methodology may be adapted for any target language. After surveying many different scenarios, source genres, and translation styles, we have determined that it is necessary to leave these rules somewhat flexible. When creating the segmentation, some degree of subjectivity is expected on the part of the TM editor in defining the length, start, and end of each segment.

Instructions for Aligning TMs from Pre-Segmented Text Files Using InterText:

To segment the texts we are using a convenient open source application called InterText. You may download InterText here; it is very lightweight and can be run locally on Windows, Mac OS, or Linux. I have made the following screencast that should show you everything you need to know. Additionally, there is an online PDF guide for using InterText. The guide contains a lot of documentation that is not necessary to read through; to simply work from the app, you should just need to read part II, chapters 7–9 here.

For 84000’s TM project, we will provide you with the two .txt files that you will be aligning in InterText. The two texts have been prepared with a script (for anyone applying this methodology to another TM project, documentation for these scripts may be found here). As mentioned in the tutorial, when you upload the .txt files, use the following dialog settings: for the English, set the “Paragraphs” to be separated by “line breaks” and the “Sentences” to be “automatically segment text using profile: default”. For the Tibetan, set both “Paragraphs” and “Sentences” to be separated by “line breaks”.

Once you have become familiar with the interface, please read through and use the following guidelines while you are editing the alignment of the texts. The guidelines cover these topics: Placing Segment Breaks According to Inflected Verbs; Segmenting from the Perspective of the Tibetan’s Own Grammar; Pre-segmentation Performed by the Pybo-Script; Where to Merge (Correcting Breaks Made by the Script); Where to Break (Making Additional Breaks Missed by the Script); Changing the Sentence Order in the English; and Words or Phrases Omitted or Added within the English Translation.
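The line-break convention used for the pre-segmented files, and the idea of merging a break that the script placed incorrectly, can be pictured with a minimal Python sketch. This is only an illustration and is not part of InterText or of the project's preparation scripts: the function names `split_segments` and `merge_segments`, the placeholder strings, and the choice of separator are all assumptions made for the example.

```python
def split_segments(text: str) -> list[str]:
    """Treat every non-empty line as one segment (the "line breaks" setting)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_segments(segments: list[str], index: int, sep: str = " ") -> list[str]:
    """Return a new list where segment `index` is joined with the segment after it.

    `sep` is an assumption: the right separator depends on the source text.
    """
    if not 0 <= index < len(segments) - 1:
        raise IndexError("index must point at a segment that has a successor")
    merged = segments[index] + sep + segments[index + 1]
    return segments[:index] + [merged] + segments[index + 2:]


# Placeholder content standing in for a pre-segmented source file and its translation.
source_text = "source segment 1\nsource segment 2\nsource segment 3\n"
target_text = "First complete thought.\nSecond complete thought.\n"

source = split_segments(source_text)
target = split_segments(target_text)

# Suppose the break between source segments 2 and 3 splits one complete thought:
source = merge_segments(source, 1)

# After the merge, both sides have the same number of segments and can be paired.
pairs = list(zip(source, target))
```

In InterText itself the merge is done through the interface; the sketch only shows what the operation does to the list of segments, namely that two adjacent segments become one and everything after them shifts up by one position.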
