Text Mining
1. Course Information
- Instructor: 崔万云 (Cui Wanyun)
- cui.wanyun@sufe.edu.cn
- Office hours: Friday 13:30-15:00; please make an appointment by email in advance
- Office: Room 306, School of Information Management
- TA: 闫森 (Yan Sen)
- Kiiiiii1@163.com
- Please send course assignments to the TA's email
- Coursework uses the Kaggle website
2. References
- Natural language processing
- Speech and Language Processing, Daniel Jurafsky and James H. Martin
- Oxford Deep NLP 2017 course: https://github.com/oxford-cs-deepnlp-2017/lectures
- Deep learning
- Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- https://www.deeplearningbook.org/
- PyTorch
- Official tutorials
- Chinese documentation
- Neuroscience and philosophy
- Learning How to Learn: https://www.coursera.org/learn/ruhe-xuexi/home/welcome
- The Book of Why: The New Science of Cause and Effect, Judea Pearl
- Thinking, Fast and Slow, Daniel Kahneman
3. Information Extraction
Hi Dan, we've now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. Chris
- Event
- Time
- Place
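As a toy illustration (not the method taught in the course), hand-written regular expressions can fill these three fields for this particular email; real IE systems learn such extractors from data rather than hard-coding them:

```python
import re

email = ("Hi Dan, we've now scheduled the curriculum meeting. "
         "It will be in Gates 159 tomorrow from 10:00-11:30. Chris")

# Hand-crafted patterns, written only for this one message.
event = re.search(r"scheduled the ([\w ]+?)\.", email).group(1)
place = re.search(r"in ([A-Z][a-z]+ \d+)", email).group(1)
time  = re.search(r"\d{1,2}:\d{2}-\d{1,2}:\d{2}", email).group(0)

print(event, "|", place, "|", time)  # curriculum meeting | Gates 159 | 10:00-11:30
```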
4. Information Extraction & Sentiment Analysis
First use information extraction to pull the aspects mentioned in a Taobao review out as tags, e.g., service, shipping, ...; then use sentiment analysis to judge whether the review rates each aspect as good or bad (a toy pipeline is sketched below).
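A minimal sketch of that two-step pipeline, assuming a made-up aspect-keyword table and sentiment lexicon (neither is a resource from the course):

```python
# Step 1: extract aspect tags; step 2: judge sentiment.
# ASPECTS, POSITIVE, and NEGATIVE are illustrative toy resources.
ASPECTS = {"service": {"service", "staff"}, "shipping": {"shipping", "delivery"}}
POSITIVE = {"fast", "friendly", "great"}
NEGATIVE = {"slow", "rude", "broken"}

def analyze(review):
    words = set(review.lower().replace(",", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)  # whole-sentence polarity
    return {aspect: ("good" if score > 0 else "bad")
            for aspect, keys in ASPECTS.items() if words & keys}

print(analyze("shipping was fast and the service staff were friendly"))
# {'service': 'good', 'shipping': 'good'}
```

A real system would score the words around each aspect mention rather than applying one sentence-level polarity to every aspect.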
5. Google Translate
6. Language Technology
mostly solved (though phishing emails are still hard to tell apart)
- Spam detection
- Part-of-speech (POS) tagging
- Named entity recognition (NER): finding entities such as people, places, and organizations
making good progress
- Sentiment analysis
- Word sense disambiguation
- Parsing
- Machine translation (MT)
- Information extraction (IE)
- Question answering (QA): single-turn
still really hard
- Paraphrase: deciding whether two sentences mean the same thing
- Summarization
- Dialog: multi-turn conversation
- Coreference resolution: working out which entity a pronoun refers to
- "Jim comforts Kevin because he is sympathetic/crying": "he" is Jim if the reason is sympathy, Kevin if he is crying
7. Why else is natural language understanding difficult?
- non-standard English
- segmentation issues
- idioms
- neologisms
- world knowledge (which self-supervised learning helps capture)
- tricky entity names
8. Sentence representation
- Bag-of-words model: unordered, e.g., {Jim, comforts, Kevin, ...} = {comforts, Kevin, Jim, ...} (a sketch of this and of n-grams follows this list)
- N-gram model
- 2-gram: Jim-comforts, comforts-Kevin, Kevin-because
- 3-gram: Jim-comforts-Kevin, comforts-Kevin-because
- Embedding: representations based on neural networks
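A short, self-contained sketch of the first two representations on the running example sentence:

```python
from collections import Counter

tokens = "Jim comforts Kevin because he is sympathetic".split()

# Bag of words: order is thrown away, only counts remain.
bow = Counter(tokens)

# N-grams: windows of n consecutive tokens preserve local order.
def ngrams(toks, n):
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

print(bow)
print(ngrams(tokens, 2))  # [('Jim', 'comforts'), ('comforts', 'Kevin'), ...]
print(ngrams(tokens, 3))  # [('Jim', 'comforts', 'Kevin'), ...]
```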
9. Skepticism and Progress
SQuAD 1.1 Leaderboard
Optimization: neural networks + attention (Bengio) + self-supervised learning (Google)
A 2020 comparison of the NLP state of the art in China and abroad
10. Skills you’ll need
- Simple linear algebra (vectors, matrices)
- Basic probability theory
- Python programming
- Neural networks
- AND PyTorch!
11. Outline
Part I - Neural Networks are our friends
Model = function + params
ŷ = wx + b
w, b: params
ŷ: output (the model's prediction)
x: input
Input: fixed, comes from the data
Parameters: need to be estimated
y: the true data that ŷ is compared against (a PyTorch sketch follows)
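In PyTorch (which the course uses), a minimal sketch of this split between the fixed function form and the parameters to be estimated might look like:

```python
import torch

w = torch.randn(1, requires_grad=True)  # parameter: to be estimated
b = torch.zeros(1, requires_grad=True)  # parameter: to be estimated

def model(x):           # the function form, fixed by us
    return w * x + b    # ŷ: the model's prediction

x = torch.tensor([1.0, 2.0, 3.0])  # input: fixed, comes from the data
print(model(x))
```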
12. Loss/Cost Functions are our friends
L(model) → ℝ: the model's loss on the training data
Equivalently, the loss of the parameters on the training data: since model = function + params and the function form is fixed, only the params are unknown
L(params) → ℝ
Feed in a model and get a loss; feed in a set of params and get a loss (a sketch follows below).
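A minimal sketch, assuming mean squared error and toy data generated by y = 2x + 1 (both are illustrative choices, not the course's):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])   # training inputs
y = torch.tensor([3.0, 5.0, 7.0])   # true data, here y = 2x + 1

def L(w, b):                         # L(params) -> R
    y_hat = w * x + b                # the model's predictions
    return ((y_hat - y) ** 2).mean() # mean squared error, a single real number

print(L(torch.tensor(2.0), torch.tensor(1.0)))  # tensor(0.): perfect params
print(L(torch.tensor(0.0), torch.tensor(0.0)))  # large loss: bad params
```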
13. Few-Shot Learning
The usual machine learning task: a human specifies the model, and the computer solves for its parameters
Model search task: the computer finds a suitable model as well as that model's parameters (a toy version is sketched below)
- Neural architecture search (NAS)
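A caricature of model search, assuming the candidate set is just three hidden-layer widths and the training loop is a plain Adam fit (purely illustrative, far simpler than real neural architecture search):

```python
import torch
import torch.nn as nn

x = torch.randn(100, 4)              # toy data
y = x.sum(dim=1, keepdim=True)

def train_and_score(hidden):
    # The human no longer fixes the architecture; the computer tries one.
    net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=0.01)
    for _ in range(200):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

best = min([2, 8, 32], key=train_and_score)  # search over candidate models
print("best hidden size:", best)
```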