Posts by Tags

Attention

GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

(Ali Modarressi*, Mohsen Fayyaz*, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar)

NAACL 2022

  • We expand the scope of analysis from the attention block in Transformers to the whole encoder (see the sketch after this entry).
  • Our method significantly improves over existing techniques for quantifying global token attributions.
  • We qualitatively demonstrate that the attributions obtained by our method are plausibly interpretable.

    read more · read paper
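
Below is a minimal sketch of the rollout-style aggregation this line of work builds on: per-layer token-to-token maps are multiplied through the encoder stack to obtain global attributions. Plain head-averaged attention weights stand in here for GlobEnc's norm-based decomposition of the whole encoder layer (residuals, LayerNorm, value transformations), and the model name is an illustrative assumption rather than the paper's exact setup.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Sketch only: raw attention weights stand in for GlobEnc's norm-based
    # per-layer attribution maps; bert-base-uncased is an assumed example model.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tok("Attribution beyond the attention block.", return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions      # tuple of (1, heads, seq, seq)

    rollout = None
    for layer_att in attentions:
        att = layer_att.mean(dim=1)[0]                # average heads -> (seq, seq)
        att = att + torch.eye(att.size(0))            # account for the residual path
        att = att / att.sum(dim=-1, keepdim=True)     # re-normalize rows
        rollout = att if rollout is None else att @ rollout   # propagate through layers

    # rollout[i, j]: global attribution of output token i to input token j
    print(rollout.shape)

Propagating the per-layer maps through the stack, rather than reading off a single attention block, is what makes the resulting attribution global across the encoder.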

BERToids

A Layer-wise Probing on BERToids’ Representations

(Mohsen Fayyaz*, Ehsan Aghazadeh*, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar)

EMNLP 2021 (BlackboxNLP)

In this work, we extend probing studies to ELECTRA and XLNet, showing that variations in pre-training objectives can result in different behaviors in encoding linguistic information. We show that:

  • Weight-mixing results in edge probing do not lead to reliable conclusions in layer-wise cross-model analysis, and MDL probing is more informative in this setup (see the sketch after this entry).
  • XLNet accumulates linguistic knowledge in earlier layers than BERT does, whereas ELECTRA concentrates it in the final layers.
  • ELECTRA undergoes only a slight change during fine-tuning, whereas XLNet experiences significant adjustments.

    read more · read paper · poster
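
As a rough illustration of the MDL probing mentioned in the first bullet, here is a hedged sketch of online (prequential) coding over one frozen layer: a probe is refit on growing prefixes of the data and charged, in bits, for encoding each next block. The logistic-regression probe, the fraction schedule, and the precomputed features X and labels y are illustrative assumptions, not the paper's exact configuration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    def online_codelength(X, y, fractions=(0.001, 0.002, 0.004, 0.008, 0.016,
                                            0.032, 0.0625, 0.125, 0.25, 0.5, 1.0)):
        """Prequential codelength (bits) of labels y given frozen layer features X.

        Sketch only: assumes every class appears in the first training block."""
        n = len(y)
        n_classes = len(np.unique(y))
        cuts = [max(n_classes, int(f * n)) for f in fractions]
        # The first block is transmitted with a uniform code over the classes.
        codelength = cuts[0] * np.log2(n_classes)
        for start, end in zip(cuts[:-1], cuts[1:]):
            if end <= start:                  # skip degenerate blocks on tiny datasets
                continue
            probe = LogisticRegression(max_iter=1000).fit(X[:start], y[:start])
            probs = probe.predict_proba(X[start:end])
            # Cost of the next block under the probe trained so far (nats -> bits).
            codelength += log_loss(y[start:end], probs,
                                   labels=probe.classes_, normalize=False) / np.log(2)
        return codelength                     # lower = property more easily extractable

Running a probe like this per layer and per model yields the layer-wise codelength curves used to compare where BERT, ELECTRA, and XLNet make a linguistic property most accessible.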

Natural Language Processing

GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

(Ali Modarressi*, Mohsen Fayyaz*, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar)

NAACL 2022

  • We expand the scope of analysis from the attention block in Transformers to the whole encoder.
  • Our method significantly improves over existing techniques for quantifying global token attributions.
  • We qualitatively demonstrate that the attributions obtained by our method are plausibly interpretable.

    read more · read paper

A Layer-wise Probing on BERToids’ Representations

(Mohsen Fayyaz*, Ehsan Aghazadeh*, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar)

EMNLP 2021 (BlackboxNLP)

In this work, we extend probing studies to ELECTRA and XLNet, showing that variations in pre-training objectives can result in different behaviors in encoding linguistic information. We show that:

  • Weight-mixing results in edge probing do not lead to reliable conclusions in layer-wise cross-model analysis, and MDL probing is more informative in this setup.
  • XLNet accumulates linguistic knowledge in earlier layers than BERT does, whereas ELECTRA concentrates it in the final layers.
  • ELECTRA undergoes only a slight change during fine-tuning, whereas XLNet experiences significant adjustments.

    read more · read paper · poster

Probing

A Layer-wise Probing on BERToids’ Representations

(Mohsen Fayyaz*, Ehsan Aghazadeh*, Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar)

EMNLP 2021 (BlackboxNLP)

In this work, we extend probing studies to ELECTRA and XLNet, showing that variations in pre-training objectives can result in different behaviors in encoding linguistic information. We show that:

  • Weight-mixing results in edge probing do not lead to reliable conclusions in layer-wise cross-model analysis, and MDL probing is more informative in this setup.
  • XLNet accumulates linguistic knowledge in earlier layers than BERT does, whereas ELECTRA concentrates it in the final layers.
  • ELECTRA undergoes only a slight change during fine-tuning, whereas XLNet experiences significant adjustments.

    read more · read paper · poster

Transformers

GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

(Ali Modarressi*, Mohsen Fayyaz*, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar)

NAACL 2022

  • We expand the scope of analysis from the attention block in Transformers to the whole encoder.
  • Our method significantly improves over existing techniques for quantifying global token attributions.
  • We qualitatively demonstrate that the attributions obtained by our method are plausibly interpretable.

    read more · read paper