N-Gram Database

By admin, 9 June, 2017

This is the dataset Google generated from the books it scanned: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

The 1-gram dataset gives the frequency of individual words, for example:

circumvallate   1978   313    215   85
circumvallate   1979   183    147   77

The first line tells us that in 1978, the word "circumvallate" (which means "surround with a rampart or other fortification", in case you were wondering) occurred 313 times overall, on 215 distinct pages and in 85 distinct books from our sample.
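As a rough illustration, the sample lines above can be read with a few lines of Python. This is only a sketch: it assumes the five-field layout shown in the example (word, year, occurrences, pages, books), and the names `NgramRecord` and `parse_line` are made up for this post. The files currently served from the dataset page may use a different number of columns, so check a real file before relying on this.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One row of the sample shown above (hypothetical type name)."""
    ngram: str
    year: int
    match_count: int   # total occurrences in that year
    page_count: int    # distinct pages containing the n-gram
    volume_count: int  # distinct books containing the n-gram


def parse_line(line: str) -> NgramRecord:
    """Parse one whitespace-separated row, assuming the five-field layout."""
    # Split from the right so that an n-gram containing spaces stays whole.
    parts = line.strip().rsplit(None, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    ngram, year, matches, pages, volumes = parts
    return NgramRecord(ngram, int(year), int(matches), int(pages), int(volumes))


sample = [
    "circumvallate   1978   313    215   85",
    "circumvallate   1979   183    147   77",
]
records = [parse_line(row) for row in sample]
total_matches = sum(r.match_count for r in records)
print(records[0])
print("total occurrences:", total_matches)
```

Summing `match_count` over the years gives a word's overall frequency in the sample, which is the usual starting point for building a word-frequency table from the 1-gram files.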

For an introduction to N-grams, see: http://blog.sina.com.cn/s/blog_4b2ddd15010151th.html
