Commit b845dcc

Blogs for Indic NLP and a Sanskrit model
Added a section of blogs and tutorials for Indic languages and added an ALBERT model trained on Sanskrit
1 parent 552bed1 commit b845dcc

1 file changed (+6, -0 lines)

README.md

Lines changed: 6 additions & 0 deletions
@@ -403,13 +403,19 @@ NLP as API with higher level functionality such as NER, Topic tagging and so on
 - [IIT Bombay NLP Resources](http://www.cfilt.iitb.ac.in/Sentiment_Analysis_Resources.html) Sentiwordnet, Movie and Tourism parallel labelled corpora, polarity labelled sense annotated corpus, Marathi polarity labelled corpus.
 - [TDIL-IC aggregates a lot of useful resources and provides access to otherwise gated datasets](https://tdil-dc.in/index.php?option=com_catalogue&task=viewTools&id=83&lang=en)

+### Blogs and Tutorials
+
+- [Training ALBERT on Sanskrit from scratch](https://parmarsuraj99.github.io/suraj-parmar/jupyter/nlp/huggingface/2020/05/02/SanskritALBERT.html)🤗
+
+
 ### Language Models and Word Embeddings

 - [Hindi2Vec](https://nirantk.com/hindi2vec/) and [nlp-for-hindi](https://github.com/goru001/nlp-for-hindi) ULMFIT style language model
 - [IIT Patna Bilingual Word Embeddings Hi-En](https://www.iitp.ac.in/~ai-nlp-ml/resources.html)
 - [Fasttext word embeddings in a whole bunch of languages, trained on Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
 - [Hindi and Bengali Word2Vec](https://github.com/Kyubyong/wordvectors)
 - [Hindi and Urdu Elmo Model](https://github.com/HIT-SCIR/ELMoForManyLangs)
+- [Sanskrit Albert](https://huggingface.co/surajp/albert-base-sanskrit) Trained on Sanskrit Wikipedia and OSCAR corpus

 ### Libraries and Tooling
