/***********************************************************************/
/***********************************************************************/
/*                                                                     */
/*                    Tibetan Transcript Translator                    */
/*                                                                     */
/***********************************************************************/
/***********************************************************************/

/*

(C) Dec 1996 / Jan, Feb 1997 by Beat Steiner, Brunngasse 12,
3011 Bern, Switzerland
e-mail: Beat.Steiner@gseved.admin.ch

Version 0.1 alpha prerelease


LEGAL STUFF
-----------
Refer to GNU General Public License
(feel free to use and distribute this program at your own risk
but at no cost, keeping copyright notices intact).


DESCRIPTION
-----------
This program is intended to convert between different transcript
standards of the Tibetan language, or even to import a transcript
into a document processing system.

At present, it is able to convert transcripts to LaTeX, based on
Jeff Sparkes' and / or Sam Sirlin's fonts and ideas. Since I ran into
trouble reverse engineering Sparkes' code, I have rewritten the
complete transcript translator, only picking some ideas from his code.

The main additional features are fully automatic ligature generation,
openness towards different transcript standards and inline LaTeX
command support.

The main disadvantages are:
** Sparkes' and Sirlin's language mix handling is not supported yet.
** Vowels on tsa, tsha, dza (resp. TZA, TSA, DZA) do not look nice.

WARNING: the ttd file format is still subject to change. If you create
your own ttd files now, you risk spending a lot of work on updating
them.


INSTALLING
----------
Install Sam Sirlin's fonts from the CTAN archive
tex/language/tibetan/sirlin
to the appropriate directory or set the TeX font path. Do the same
with the tfm files. Look at the README file for help on this.
If you don't have metafont, use Jeff Sparkes' pk and tfm files.

This is a single-file program which can be compiled without
complication by

    gcc ttt.c -o ttt
or
    cc ttt.c -o ttt

or whatever ANSI compliant C compiler you have.


STARTING THIS PROGRAM
---------------------
In order to translate ACIP to LaTeX, use

    ttt acip.ttd latex.ttd input.tib output.tex

where
    acip.ttd    is the input transcript definition file
    latex.ttd   is the output transcript definition file
    input.tib   is your ACIP input text
    output.tex  is the output file to be processed by latex

On Linux, you will enjoy the feature of magic line support:
In your input.tib file, add the following two lines at the VERY TOP
(replacing the respective filenames):

    #!/bin/sh
    ttt acip.ttd latex.ttd $0 output.tex; latex output; exit

Due to the magic line at the top, (ba)sh will think that your text
input.tib is a shell script and will execute the 2nd line.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This program will ignore the FIRST AND THE SECOND
line if the FIRST line starts with #!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Do not put anything, not even comments, above these lines.
$0 is automatically replaced by the file name (e.g. input.tib).
Then turn your input.tib file into an executable by

    chmod 755 input.tib

Now you can start this program by entering

    input.tib

at the shell command prompt.

DOS users, please write an appropriate *.BAT file (without $0).
Windows users may set the command line options as program properties.


RETURN VALUES
-------------
0: Normal program completion.
1: x___error occurred (unknown input key, missing output ligature
   and such)
2: Could not open a file or bad number of command line arguments
3: Static memory allocation overrun


IMPLEMENTATION NOTICE
---------------------
This kind of implementation was motivated by the need to import texts
typed at Sera Monastery into the ClearLook document processing system
and to solve other conversion problems, like the differences between
ACIP and Wylie transcripts, plus the need to import both of them into
LaTeX, ClearLook, Lout and other programs.

Therefore, a very flexible solution based on Tibetan Transcript
Definition (ttd) files has been chosen. They map the different
transcribed letters, control sequences and ligatures (btags pa)
to common keys.

example: ACIP --> LaTeX conversion for the letter 'tsa'

the acip.ttd file says:
    c___tsa    TZ
which associates the key 'tsa' to the value 'TZ'

the tex.ttd file says:
    c___tsa    \tibetan\char16
which associates the key 'tsa' to the TeX font character.

The magic characters before the transcript key have the following
meaning:

1st char:
    c: normal consonant
    p: potential prefix (negative bias for ligature generation)
    b: potential btagspa (simplified lower part of ligature and ha,
       positive bias for ligature generation)
    .: punctuation (terminates syllable and suppresses tsheg)
    x: program control
2nd char and 3rd char:
    reserved for generic ligature generation
    (join styles, character substitution)
4th char:
    _: just to increase readability of ttd file

Meaning of x___ keys:

General controls:
    x___preamble     Will be at top of output without being asked for
    x___postamble    Will be at bottom of output without being asked for
    x___error        Error message included in the output file
Generic ligatures:
    x___inner_box{   Encapsulation of individual characters (no vowels)
    x___inner_box}
    x___outer_box{   Encapsulation of whole ligature without vowel
    x___outer_box}
Mixed language handling (subject to change, not fully implemented yet):
    x___transp_word  Input will be put out without modification
                     until the next space
    x___transp_char  Unmodified character passing
    x___transp_reg{  Unmodified region passing   start
    x___transp_reg}                              end
    x___transp_reg~                              toggle

WARNING: The values of the input ttd and the keys of the output ttd
file MUST be unique, otherwise the result is undefined.

The input stream is scanned for the longest matching value in the
input ttd file, giving the common key. A lookup in the output ttd file
generates the output stream.

The ligature generation is a special issue, documented at the rather
complicated if construct in the main function.


To Do:
------
** Better char comparison in ttread
** Write documentation
** Allow comments in ttd files
** Warn on ambiguous transcript (gyin, gyas, ...)
** Full support of Sirlin and Sparkes
** Mixed language support
   Two flags allow mixing Tibetan, other languages and control
   sequences:
   'active' says whether or not the lookup and ligature generation
   take place.
   If not, 'transparent' says whether the input stream is passed
   unmodified to the output stream or simply ignored (crunched).
** Clean escape char handling for binary output
** More intelligence for generic ligatures
** Font: Character fragments for generic ligatures
** Vertically stretched (50% or 75%) chars for generic ligatures
** Mirrored gigu


ALMOST DONE
-----------
** Clean up and update comments and program style
** Read filenames from command line
   (inflexible, no default handling, magic lines for Linux only)
** Vowel support for generic ligatures
   (still a LaTeX problem to solve)
** (Only) two vowels on a character supported


PERFORMANCE
-----------
The actual lookup process is performed by a binary search, although
a hash table lookup is expected to be about 5 times faster. This is
simply because I am not that familiar with C++ and g++.
Furthermore, bsearch is better documented and available on more
compilers.

This program needs less than 1 second to produce 6 a4paper output
pages on my 5x86 machine. So there are more important problems
to solve now.
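
A minimal sketch of the sorted-table lookup described above, assuming
a cut-down table. The names demo_entry, demo_cmp, demo_prepare,
demo_lookup and DEMO_KEY_VAL_LENGTH are hypothetical and only serve as
illustration; they are not part of this program. Unlike ttlookup, the
sketch searches with a probe entry and takes the number of entries as
a parameter.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_KEY_VAL_LENGTH 16   // hypothetical size, for illustration only

typedef struct
{
  char key[DEMO_KEY_VAL_LENGTH];
  char val[DEMO_KEY_VAL_LENGTH];
} demo_entry;

// Compare two entries by key; shared by qsort and bsearch.
int demo_cmp(const void *first, const void *second)
{
  return strncmp(((const demo_entry *)first)->key,
                 ((const demo_entry *)second)->key,
                 DEMO_KEY_VAL_LENGTH);
}

// Sort the table once so that a binary search can be used afterwards.
void demo_prepare(demo_entry table[], size_t count)
{
  qsort(table, count, sizeof(demo_entry), demo_cmp);
}

// Return the value stored for key, or NULL if the key is unknown.
const char *demo_lookup(const char *key, const demo_entry table[],
                        size_t count)
{
  demo_entry probe;
  const demo_entry *found;

  if (key == NULL) return NULL;

  // Build a probe entry so that both comparator arguments are entries.
  strncpy(probe.key, key, DEMO_KEY_VAL_LENGTH - 1);
  probe.key[DEMO_KEY_VAL_LENGTH - 1] = '\0';
  probe.val[0] = '\0';

  found = bsearch(&probe, table, count, sizeof(demo_entry), demo_cmp);
  return found ? found->val : NULL;
}
```

With a table that maps c___tsa to TZ, calling demo_prepare once and
then demo_lookup("c___tsa", table, count) yields "TZ"; an unknown key
yields NULL. Sorting once and searching many times is the trade-off
made here; a hash table would be an alternative.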
*/

/***********************************************************************/
/*                 May this program contribute to the                  */
/*                   Dharma practice of many people                    */
/***********************************************************************/


/******************************/
/******************************/
/*                            */
/*          Includes          */
/*                            */
/******************************/
/******************************/

#include <stdio.h>
#include <stdlib.h> /* contains qsort and bsearch */
#include <string.h>


/******************************/
/******************************/
/*                            */
/* Static Memory Allocations  */
/*                            */
/******************************/
/******************************/

const int MAX_SYL_SIZE_LINE = __LINE__ + 1;
#define MAX_SYL_SIZE 50
/* maximum characters per syllable (ligature complexity)
   very cheap and should be enough */

const int MAX_KEY_VAL_LENGTH_LINE = __LINE__ + 1;
#define MAX_KEY_VAL_LENGTH 44
/* maximum ttd key/value length
   critical for output preamble
   hint: try to include the preamble */

const int TTSIZE_LINE = __LINE__ + 1;
#define TTSIZE 300
/* transcript table size
   critical for predefined ligatures
   keep in mind that you add an appropriate command margin
   to the number of characters in your font.
*/

const int INPUT_LINEBUFFER_LENGTH_LINE = __LINE__ + 1;
#define INPUT_LINEBUFFER_LENGTH 180
/* Maximum line length of input text file */


/******************************/
/******************************/
/*                            */
/*        Global stuff        */
/*                            */
/******************************/
/******************************/

typedef struct
{
  char key[MAX_KEY_VAL_LENGTH];
  char val[MAX_KEY_VAL_LENGTH];
} ttentry;

char ESCAPE_CHAR='\xff';
int KEY_LEN[MAX_KEY_VAL_LENGTH];
int X_ERROR_OCCURRED = 0;
static FILE *OUTTEXT;


/******************************/
/******************************/
/*                            */
/*         Functions          */
/*                            */
/******************************/
/******************************/

void trace(char *message)
{
  /*
  if (message == NULL)
  {
    fprintf(stderr, "");
    fflush(stderr);
  }else{
    fprintf(stderr, "<%s>", message);
    fflush(stderr);
  }
  */
}

/******************************/

/* clean interface for strncmp */
int ttcmp(const void *first, const void *second)
{
  return strncmp(((ttentry*)first)->key,
                 ((ttentry*)second)->key,
                 MAX_KEY_VAL_LENGTH);
}

/******************************/

void ttprepare(ttentry ttable[])
{
  qsort(&ttable[0], TTSIZE, sizeof(ttentry), ttcmp);
}

/******************************/

char *ttlookup(const char *key, const ttentry ttable[])
{
  ttentry *found=NULL;

  if (key)
  {
    found=(ttentry*)bsearch(key, ttable, TTSIZE, sizeof(ttentry), ttcmp);
    if (found)
    {
      return found->val;
    }else{
      return NULL;
    }
  }
  return NULL; /* no key given */
}

/******************************/

void ttread(const char *name, int swap, ttentry ttable[])
{
  /* This procedure reads the transcript definition files.
     The first column MUST be strictly left aligned.
     'name' is the filename
     'swap' is a flag which swaps the key/value reading order:
       file contents for swap=0: <key> <value>
       file contents for swap=1: <value> <key>
     'ttable' is the transcript table where the file is read into.
     'TTSIZE' is allocated statically and needed to avoid an
     index overflow */

  int c = ' ';            /* not EOF; an int, so EOF stays distinct */
  char *cp = NULL;
  char *cp_start = NULL;  /* Needed for ptr range check */
  FILE *ttd = NULL;
  int idx = 0;

  /* Nested function: a GNU C extension (gcc) */
  void CheckCP(void)
  {
    if ((cp - cp_start) >= (MAX_KEY_VAL_LENGTH - 1))
    {
      fprintf(stderr, "Insufficient static memory allocation. ");
      fprintf(stderr, "Please increase MAX_KEY_VAL_LENGTH in ");
      fprintf(stderr, "line %d of %s and compile again.\n",
              MAX_KEY_VAL_LENGTH_LINE, __FILE__);
      exit(3);
    }
  }

  /* Clear array storing input ttd key length flags
     for boosting performance in charparse() */
  for (idx = 0; idx < MAX_KEY_VAL_LENGTH; idx++)
  {
    KEY_LEN[idx] = 0;
  }
  idx = 0;

  /* Clear transcript table struct arrays */
  for (idx = 0; idx < TTSIZE; idx++)
  {
    *ttable[idx].key = '\0';
    *ttable[idx].val = '\0';
  }
  idx = 0;

  if (!(ttd = fopen(name, "r") ))
  {
    fprintf(stderr, "%s %s\n", "could not open ttd file ", name);
    exit(2);
  }

  /* ev. better with strpbrk() */
  while (c != EOF && idx < TTSIZE)
  {
    /* Read ttd keys/values */
    if (swap) cp = ttable[idx].val;
    else cp = ttable[idx].key;
    cp_start = cp;
    c = fgetc(ttd);
    while ( c != ' ' && c != '\t' && c != '\n' && c != EOF)
    {
      *cp++ = c; CheckCP();
      c=fgetc(ttd);
    }
    *cp++ = '\0'; CheckCP();

    /* Eliminate separator garbage */
    while ( c == ' ' || c == '\t' )
    {
      c = fgetc(ttd);
    }

    /* Read ttd values/keys */
    if (swap) cp = ttable[idx].key;
    else cp = ttable[idx].val;
    cp_start = cp;
    while ( c != '\t' && c != '\n' && c != EOF )
    /* ev. add comment char and spaces */
    {
      *cp++ = c; CheckCP();
      c = fgetc(ttd);
    }
    *cp++ = '\0'; CheckCP();

    /* Eliminate trailing garbage */
    while ( c != '\n' && c != EOF )
    {
      c = fgetc(ttd);
    }

    /* mark existing key lengths */
    KEY_LEN[strlen(ttable[idx].key)] = !0;
    idx++;
  }

  if (idx >= TTSIZE)
  {
    fprintf(stderr, "Insufficient memory allocation. Please increase");
    fprintf(stderr, " TTSIZE in line %d of %s and compile again.\n",
            TTSIZE_LINE, __FILE__);
    exit(3);
  }
}

/******************************/

void prepare_flags(ttentry inptrans[], ttentry outtrans[],
                   int *active, int *transparent)
{
  char *charptr;

  /* Set activity flag according to INPUT ttd */
  charptr=ttlookup("x___init_active", inptrans);
  if (charptr)
  {
    if ( strcasecmp("yes", charptr) == 0 )
    {
      *active = 1;
    }else{
      *active = 0;
    }
  }

  /* Set transparency flag according to OUTPUT ttd */
  charptr=ttlookup("x___init_transprent", outtrans);
  if (charptr)
  {
    if ( strcasecmp("yes", charptr) == 0 )
    {
      *transparent = 1;
    }else{
      *transparent = 0;
    }
  }

  /* Set escape character
     In this release, escapechar is only accepted if it is
     the first char in the output ttd */
  charptr=ttlookup("x___escape_char", outtrans);
  if (charptr)
  {
    ESCAPE_CHAR = *charptr;
  }
}

/******************************/

char *charparse(const char *input, int *parsestep,
                const ttentry *inptrans)
{
  char yigcpy[MAX_KEY_VAL_LENGTH+1];
  char *charptr=NULL;
  int parselen;
  int yiglen;

  strncpy(yigcpy, input, MAX_KEY_VAL_LENGTH);
  parselen=strlen(input);
  if (parselen > MAX_KEY_VAL_LENGTH) parselen=MAX_KEY_VAL_LENGTH;
  yiglen=parselen;

  /* Try the longest candidate first, then shorten it step by step */
  while (yiglen > 0 && charptr == NULL)
  {
    yigcpy[yiglen] = '\0';
    if (KEY_LEN[yiglen]) charptr = ttlookup(yigcpy, inptrans);
    yiglen--;
  }
  *parsestep=yiglen+1;
  return charptr;
}

/******************************/

void yigprint(char *yig, ttentry *outtrans)
{
  char *charptr=NULL;

  if (yig != NULL)
  {
    charptr=ttlookup(yig, outtrans);
    if (charptr)
    {
      /* DIRTY: detects escape char only if it is the first one.
      */
      if (*charptr != ESCAPE_CHAR)
      {
        if (*charptr != '\0') fprintf(OUTTEXT, "%s", charptr);
        /* fflush(OUTTEXT); */
      }else{
        charptr++; /* skip the escape char itself */
        fprintf(OUTTEXT, "%c", (char)strtol(charptr, NULL, 0));
        /* fflush(OUTTEXT); */
      }
    }else{ /* no output ttd match */
      X_ERROR_OCCURRED = !0;
      fprintf(OUTTEXT, "%s", ttlookup("x___error", outtrans));
      fflush(OUTTEXT);
      trace(yig);
    }
  }else{
    /* trace("gugus"); */
  }
}

/******************************/

int vovel_left(char *syl[MAX_SYL_SIZE], int start, int syllength)
{
  int i;
  int result = 0;

  for (i=start; i= (INPUT_LINEBUFFER_LENGTH-1))
    {
      fprintf(stderr, "Input file contains too long lines. ");
      fprintf(stderr, "Please keep them shorter than %d ",
              INPUT_LINEBUFFER_LENGTH);
      fprintf(stderr, "characters or increase ");
      fprintf(stderr, "INPUT_LINEBUFFER_LENGTH in line %d of %s",
              INPUT_LINEBUFFER_LENGTH_LINE, __FILE__);
      fprintf(stderr, " and compile again.\n");
      exit(3);
    }

    if (skipline)
    {
      *linebuf = '\0';
      skipline = 0;
    }

    /* Filter out magic lines for Linux */
    parseptr = linebuf + 1;
    if (firstline && (*linebuf == '#') && (*parseptr == '!'))
    {
      *linebuf = '\0';
      skipline = !0;
    }
    firstline = 0;

    parseptr = linebuf;
    /* parse until end of line */
    while (*parseptr != '\0' && *parseptr != '\n' && *parseptr != '\13')
    {
      yig = charparse(parseptr, &parsestep, inptrans);
      if (yig != NULL)
      {
        if (*yig == 'x') trace(yig);
        if (strncmp(yig, "x___active{", MAX_KEY_VAL_LENGTH) == 0)
        {
          parseptr += parsestep;
          active = !0;
          if (ttlookup("x___active{", outtrans))
          {
            yigprint("x___active{", outtrans);
          }
        }
        if (strncmp(yig, "x___active}", MAX_KEY_VAL_LENGTH) == 0)
        {
          parseptr += parsestep;
          active = 0;
          if (ttlookup("x___active}", outtrans))
          {
            yigprint("x___active}", outtrans);
          }
        }
        if (strncmp(yig, "x___active~", MAX_KEY_VAL_LENGTH) == 0)
        {
          parseptr += parsestep;
          trace("active toggle");
          active = !active;
          if (active)
          {
            if (ttlookup("x___active{", outtrans))
            {
              yigprint("x___active{", outtrans);
            }
          }else{
            if (ttlookup("x___active}", outtrans))
            {
              yigprint("x___active}", outtrans);
            }
          }
        }
      }

      if (active)
      {
        /*** read one syllable ***/
        sylindex = 0;
        do{
          yig = charparse(parseptr, &parsestep, inptrans);
          if (yig == NULL)
          {
            sylend = !0;
            if ( *parseptr == ' ' || *parseptr == '\x0D'
              || *parseptr == '\0' || *parseptr == '\n'
              || *parseptr == '\t' )
            {
            }else{
              X_ERROR_OCCURRED = !0;
              yigprint("x___error", outtrans);
              /* fprintf(OUTTEXT, "Input not defined"); */
            }
            parseptr++;
          }else{
            if (*yig != '.' && *yig != 'x')
            {
              sylend = 0;
            }else{
              pending_tsheg = 0;
              sylend = !0;
              if (strncmp(yig, ".___shad", MAX_KEY_VAL_LENGTH) == 0)
              {
                pending_shad_space = !0;
              }
              if (strncmp(yig, ".___tshegshad", MAX_KEY_VAL_LENGTH) == 0)
              {
                pending_shad_space = !0;
              }
              if (strncmp(yig, "x___transp_line", MAX_KEY_VAL_LENGTH) == 0)
              {
                while(!( *parseptr == '\0' || *parseptr == '\n'))
                {
                  fprintf(OUTTEXT, "%c", *parseptr++);
                }
              }
              if (strncmp(yig, "x___transp_word", MAX_KEY_VAL_LENGTH) == 0)
              {
                while(!(*parseptr == ' ' || *parseptr == '\x0D'
                     || *parseptr == '\0' || *parseptr == '\t'
                     || *parseptr == '\n'))
                {
                  fprintf(OUTTEXT, "%c", *parseptr++);
                }
              }
              if (strncmp(yig, "x___transp_char", MAX_KEY_VAL_LENGTH) == 0)
              {
                fprintf(OUTTEXT, "%c", *parseptr++);
              }
            }
            if ( *yig == 'v' || *yig == 'c'
              || *yig == 'p' || *yig == 'b')
            {
              pending_tsheg = !0;
            }
            syllable[sylindex++] = yig;
            if (parsestep > 0)
            {
              parseptr += parsestep;
            }else{
              X_ERROR_OCCURRED = !0;
              yigprint("x___error", outtrans);
              fprintf(OUTTEXT, "sylr 2");
              parseptr++;
            }
          }
        } while (!sylend);
        /*** trace("syllable read"); ***/

        /******************************/
        /*                            */
        /*     Syllable Analysis      */
        /*                            */
        /******************************/

        /* does syllable contain a prefix ? */
        if (sylindex >= 3) /* at least 2 chars + 1 (inherent) vowel */
        {
          if (*syllable[0] == 'p'
              /* potential prefix marked as such in ttd */
              && *syllable[1] != 'v'
              /* vowel? -> pot. prefix is main letter */
              /* is pot. prefix part of a ligature?
              */
              && !(strncmp(syllable[0], "p___'a", MAX_KEY_VAL_LENGTH) != 0
                   /* 'a is NEVER upper part of a ligature */
                   /* ( I at least hope so ) */
                   && *syllable[1] == 'b'
                   /* ya, ra, la, wa subjoined to pot. prefix */
                   && *syllable[2] == 'v'
                   /* ligature complete -> no prefix
                      examples: bya, gya ....
                      negative examples: brgyad ... */
                   /* WARNING: gyin, gyas ... give wrong result
                      (ambiguous transcript). Work around this using
                      the dummy vowel (usually '-').
                      Examples: g-yin, g-yas, bu -ddha, ... */
                  )
             )
          {
            yigprint(syllable[0], outtrans); /* output prefix */
            for (i = 0; i < sylindex; i++)
            { /* shift away prefix */
              syllable[i] = syllable[i + 1];
            }
            sylindex--;
          }
        } /* min length for prefix */

        /******************************/
        /*                            */
        /*    Ligature Generation     */
        /*                            */
        /******************************/

        i = 0;
        while (vovel_left(syllable, i, sylindex))
        {
          if (*syllable[i] == 'v'
              && strncmp(syllable[i], "v___dummy", MAX_KEY_VAL_LENGTH) != 0)
          {
            strncpy(lig_key, "c___a", MAX_KEY_VAL_LENGTH);
          }
          else if(*syllable[i + 1] == 'v')
          {
            strncpy(lig_key, syllable[i++], MAX_KEY_VAL_LENGTH);
          }
          else
          {
            geni = i;
            strncpy(lig_key, "l__", MAX_KEY_VAL_LENGTH);
            while (i < sylindex && *syllable[i] != 'v')
            /* compose ligature key */
            {
              charptr = syllable[i];
              charptr += 3; /* cut off type */
              /* example p___ba + b___ya => l___ba_ya */
              strncat(lig_key, charptr, MAX_KEY_VAL_LENGTH);
              i++;
            }
          }

          strncpy(inner_vovel_key_buf, syllable[i++], MAX_KEY_VAL_LENGTH);
          if (i < sylindex)
          {
            if (*syllable[i] == 'v')
            {
              strncpy( outer_vovel_key_buf, syllable[i++],
                       MAX_KEY_VAL_LENGTH);
              strncat( outer_vovel_key_buf, "{", MAX_KEY_VAL_LENGTH);
              yigprint(outer_vovel_key_buf, outtrans);
            }else{
              *outer_vovel_key_buf = '\0';
            }
          }else{
            *outer_vovel_key_buf = '\0';
          }
          strncat( inner_vovel_key_buf, "{", MAX_KEY_VAL_LENGTH);
          yigprint(inner_vovel_key_buf, outtrans);

          if (ttlookup(lig_key, outtrans))
          {
            /* Predefined ligature */
            yigprint(lig_key, outtrans);
          }else{
            /* Automatic ligature generation */
            yigprint("x___outer_box{", outtrans);
            while (*syllable[geni] != 'v')
            {
              yigprint("x___inner_box{", outtrans);
              /* Look for predefined two-letter ligatures
                 within generic ligature */
              charptr = NULL;
              if (*syllable[geni+1] != 'v')
              {
                strncpy(lig_key, "l__", MAX_KEY_VAL_LENGTH);
                charptr = syllable[geni];
                charptr += 3;
                strncat(lig_key, charptr, MAX_KEY_VAL_LENGTH);
                charptr = syllable[geni+1];
                charptr += 3;
                strncat(lig_key, charptr, MAX_KEY_VAL_LENGTH);
                charptr = ttlookup(lig_key, outtrans);
              }
              if (charptr)
              {
                yigprint(lig_key, outtrans);
                geni += 2;
              }else{
                /* Single-letter generic ligature part */
                yigprint(syllable[geni++], outtrans);
              }
              yigprint("x___inner_box}", outtrans);
            }
            yigprint("x___outer_box}", outtrans);
          }

          inner_vovel_key_buf[strlen(inner_vovel_key_buf) - 1] = '}';
          yigprint(inner_vovel_key_buf, outtrans);
          if (*outer_vovel_key_buf != '\0')
          {
            outer_vovel_key_buf[strlen(outer_vovel_key_buf) - 1] = '}';
            yigprint(outer_vovel_key_buf, outtrans);
          }
        } /* while vovel left */

        while (i < sylindex) /* output suffix */
        {
          yigprint(syllable[i], outtrans);
          i++;
        }

        if (pending_tsheg)
        {
          yigprint(".___tsheg", outtrans);
          pending_tsheg = 0;
        }
        if (pending_shad_space
            && (*parseptr == ' ' || *parseptr == '\t'
             || *parseptr == '\x0D' || *parseptr == '\0'
             || *parseptr == '\n'))
        {
          yigprint(".___shad_space", outtrans);
        }
        pending_shad_space = 0; /* MUST be after '}' */
      }else{ /* not active */
//      if (strncmp(parseptr, "%%", 2) == 0)
//      {
//        active = !0;
        fprintf(OUTTEXT, "%c", *parseptr++);
      }
    } /* while parseptr */
    fprintf(OUTTEXT, "\n");
  } /* while fgets */

  yigprint("x___postamble", outtrans);
  fprintf(OUTTEXT, "\n");

  if (X_ERROR_OCCURRED)
  {
    exit(1);
  }else{
    exit(0);
  }
}