INTERNATIONAL STANDARD ISO 24614-2
First edition 2011-09-01

Language resource management - Word segmentation of written texts - Part 2: Word segmentation for Chinese, Japanese and Korean

Gestion des ressources langagières - Segmentation des mots dans les textes écrits - Partie 2: Segmentation des mots pour le chinois, le japonais et le coréen

Reference number ISO 24614-2:2011(E)
© ISO 2011

COPYRIGHT PROTECTED DOCUMENT

© ISO 2011
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail [email protected]
Web www.iso.org
Published in Switzerland

Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Overview
4.1 Introduction
4.2 Markup convention
4.3 Review of the concept of word segmentation unit
4.4 Features common to Chinese, Japanese and Korean
5 General rules for identifying WSUs in Chinese, Japanese and Korean
5.1 Words
5.2 Derivationally formed words
5.3 Word compounds
5.4 Phrasal compounds
5.5 Idioms
5.6 Fixed expressions
5.7 Abbreviations
5.8 Transliterated loanwords
5.9 Strings of foreign or special characters
5.10 Components of a WSU
6 Specific rules for identifying WSUs in Chinese
6.1 Lexical items followed by the suffix 儿 (r)
6.2 Lexical items
6.2.1 Nouns
6.2.2 Verbs
6.2.3 Adjectives
6.2.4 Pronouns
6.2.5 Numerals
6.2.6 Measure words
6.2.7 Adverbs
6.2.8 Prepositions
6.2.9 Conjunctions
6.2.10 Auxiliary words
6.2.11 Modal words
6.2.12 Exclamations
6.2.13 Imitative words
7 Specific rules for identifying WSUs in Japanese text
7.1 Bunsetsus
7.2 Lexical items
7.2.1 General rule
7.2.2 Nouns
7.2.3 Verbs
7.2.4 Adjectives
7.2.5 Adnouns
7.2.6
7.2.7 Conjunctions
7.2.8 Exclamations
