I’m interested in NLP, especially in the parametric knowledge of LLMs: understanding it (benchmark evaluation, membership inference attacks), unlocking it (in-context learning, SFT, alignment learning, reasoning), and expanding it (continual learning) to meet human needs.
I’m currently starting research on VLM jailbreaking and LLM coding abilities.
Check out my teammates’ and my PaperReview posts (e.g., Agent Survey, recent research papers) and BlogPost here!
CV | LinkedIn | HF (Open Source)
Education
• Master of Science, Graduate School of Artificial Intelligence, POSTECH (2023.02 – 2025.02)
• Bachelor's degree in Hotel & Tourism Management, Sejong University (2017 – 2023) (GPA: 4.37/4.5, Summa Cum Laude)
(International student, School of Hotel and Tourism Management, The Hong Kong Polytechnic University, 2018.09 – 2018.12)
Work Experience
• ONOUT, LLM Research Engineer (Freelance, 2024.07 – 2024.12)
• LG AI Research @ EXAONE Lab, LLM Research Intern (2025.03 – present)
Publications
• International
◦ DongGeon Lee*, Joonwon Jang*, Jihae Jeong, Hwanjo Yu. Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study. (arXiv preprint)
◦ Joonwon Jang, Jaehee Kim, Wonbin Kweon, Hwanjo Yu. Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria. (ACL 2025 Findings)
◦ Seonghyeon Lee, HeeJae Chon, Joonwon Jang, Dongha Lee, Hwanjo Yu. How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code. (arXiv preprint)
◦ WooJoo Kim, Joonwon Jang, Jinyi Yu, Yunsu Jeon, and Hwanjo Yu. EPR: An Expert Behavior-enhanced Paper Ranking Framework for the Automotive Industry. (EMNLP 2024 Workshop)
◦ Seonghyeon Lee, Suyeon Kim, Joonwon Jang, Heejae Chon, Dongha Lee, Hwanjo Yu. Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation. (EMNLP 2024 Findings)
◦ Joonwon Jang, Sanghwan Jang, Wonbin Kweon, Minjin Jeon, and Hwanjo Yu. Rectifying Demonstration Shortcut in In-Context Learning. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics. (NAACL 2024 Main)
◦ Jaeyoung Lee, Joonwon Jang, and Misuk Kim. Hierarchical Graph Convolutional Network Approach for Detecting Low-Quality Documents. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING 2024)
◦ Eunbi Choi, Yongrae Jo, Joel Jang, Joonwon Jang, and Minjoon Seo. Fixed Input Parameterization for Efficient Prompting. Findings of the Association for Computational Linguistics. (ACL 2023 Findings)
◦ Joonwon Jang, Sung Il Kwag, and Young Dae Ko. Eco-friendly Platooning Operation Algorithm of the Autonomous Vehicles. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations. (JITS, 2023)
◦ Joonwon Jang and Misuk Kim. Headline Token-based Discriminative Learning for Subheading Generation in News Article. Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. (EACL 2023 Findings)
◦ Joonwon Jang, Minju Kim, Yoonsik Cho, and Misuk Kim. Detecting Incongruent News Headlines with Auxiliary Textual Information. Expert Systems with Applications. (ESWA, 2022) (IF 6.954)
• Domestic
◦ 장준원 & 김미숙 (2022). Extraction-based Framework for News Subheading Generation (in Korean).
◦ 장준원, 조하현, 이재영, & 김미숙 (2021). Development of a Hierarchical Deep Learning Model for Detecting Fake News with Mismatched Headlines and Bodies, and Construction of a Fake News Dataset (in Korean). Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Conference.
Projects
• Continual Learning of a Large Language Model toward a Specific Domain (2024.09 – 2024.12, Onout)
Domain-Specific LLM (Pre-Training, Distributed Training)
◦ Data crawling (general corpus & domain corpus)
◦ Data preprocessing (deduplication & cleaning with Spark: PySpark - How to preprocess Large Scale Data with Python); see the deduplication sketch after this project
◦ Continual training of the LLM with distributed training: How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel & DeepSpeed - Sharding Optimizer, Gradients, Parameters, and Reducing Activations for Efficient Training; see the FSDP sketch after this project
◦ (Planned) Token expansion using the domain-specific corpus
◦ Evaluation on domain-specific tasks (e.g., knowledge probing, generation)
▪ Doubled the in-domain KMMLU score
▪ Generalized to unseen formats
▪ Preserved general knowledge after DAPT (e.g., KMMLU, HAE-RAE, …)
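Below is a minimal sketch of the Spark-based deduplication step referenced above. It assumes a JSONL corpus with a `text` field; the column names, normalization rules, and paths are illustrative, not the exact Onout pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("corpus-dedup").getOrCreate()

# assumed input layout: JSONL files with a "text" column
corpus = spark.read.json("corpus/*.jsonl")

# normalize whitespace/case so trivially different copies hash identically
normalized = corpus.withColumn(
    "norm_text", F.lower(F.regexp_replace(F.col("text"), r"\s+", " "))
)

# exact dedup: keep one row per content hash
deduped = (
    normalized
    .withColumn("doc_hash", F.sha2(F.col("norm_text"), 256))
    .dropDuplicates(["doc_hash"])
    .drop("norm_text", "doc_hash")
)

deduped.write.mode("overwrite").parquet("corpus_dedup/")
```

Near-duplicate removal would follow the same pattern, e.g. with Spark ML's MinHashLSH over shingled documents.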
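And a minimal sketch of the distributed continual pre-training loop with PyTorch FSDP (the linked post also covers DeepSpeed, which shards optimizer state, gradients, and parameters in a similar spirit). The base checkpoint, hyperparameters, and batching here are placeholders.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, AutoTokenizer

# launch with: torchrun --nproc_per_node=<gpus> train.py
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model_name = "EleutherAI/polyglot-ko-1.3b"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)
model = FSDP(model.cuda(local_rank))          # shard params and grads across ranks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(batch_texts):
    batch = tokenizer(batch_texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=1024).to(f"cuda:{local_rank}")
    # causal-LM objective; ignore the loss on padding positions
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```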
• Constructing an AI Ranking Model for Promising Technologies Selection (2023.08 – 2024.08, Hyundai)
◦ Crawling and preprocessing data with the Scopus API
◦ Post-training LMs via citation networks (using the SPECTER framework); see the sketch after this project
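A hedged sketch of the SPECTER-style post-training idea: papers connected in the citation network should embed closer than unrelated papers, enforced with a triplet margin loss over [CLS] embeddings. The checkpoint name and example texts are assumptions, not the exact Hyundai pipeline.

```python
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

encoder_name = "allenai/specter"        # assumed starting checkpoint
tok = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)

def embed(titles_abstracts):
    batch = tok(titles_abstracts, padding=True, truncation=True,
                max_length=512, return_tensors="pt")
    # use the [CLS] token as the paper embedding, as in SPECTER
    return encoder(**batch).last_hidden_state[:, 0]

# one triplet drawn from the citation graph (dummy texts)
query   = embed(["Query paper title. Abstract ..."])
cited   = embed(["A paper the query cites. Abstract ..."])   # positive
uncited = embed(["An unrelated paper. Abstract ..."])        # negative

# pull cited papers closer to the query than uncited ones
loss = F.triplet_margin_loss(query, cited, uncited, margin=1.0)
loss.backward()
```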
• Fashion advertisement generation with a quantized LLM (2023.06 – 2023.09, Onout)
Supervised Fine-Tuning & Instruction Tuning & Quantization
◦ Applying QLoRA to Polyglot-Ko (5.8B, 12.8B); see the sketch after this project
◦ Prompt design for data augmentation (GPT API) / instruction tuning
◦ Instruction tuning on a human-labeled dataset / Self-Instruct-style generation with ChatGPT-pro & the GPT-4 API
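A minimal QLoRA sketch in the spirit of this project, using the Hugging Face transformers/peft/bitsandbytes stack; the checkpoint, LoRA rank, and target modules are illustrative defaults for the GPT-NeoX-style Polyglot-Ko models, not the exact production configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/polyglot-ko-5.8b"   # the 12.8B model follows the same recipe

# 4-bit NF4 quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# low-rank adapters on the fused attention projection (GPT-NeoX naming)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable
```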
• Prompt Injection in a chatbot system (2022.07 – 2022.12)
Long Context Handling & Chatbot
◦ Parameterizing long prompts (e.g., previous-session dialogues or personas in the MSC dataset) into a non-prompt student model for efficient inference (Mentor: Eunbi Choi); see the distillation sketch after this project
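A rough sketch of the prompt-parameterization objective: the student, which never sees the long prompt, is trained to match the next-token distribution of a teacher that does. Model names, example texts, and the token alignment are simplified assumptions, not the exact method.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# any causal-LM pair works for the sketch; in practice both would be the same base model
teacher = AutoModelForCausalLM.from_pretrained("gpt2")   # sees the long prompt
student = AutoModelForCausalLM.from_pretrained("gpt2")   # learns to behave as if it had
tok = AutoTokenizer.from_pretrained("gpt2")

long_prompt = "Persona and previous-session dialogue: ..."   # e.g., MSC history
query       = "User: How was your weekend?\nAssistant:"
response    = " It was great, I went hiking."

def logits_over_response(model, prefix, response):
    prefix_ids   = tok(prefix, return_tensors="pt").input_ids
    response_ids = tok(response, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, response_ids], dim=1)
    out = model(ids).logits
    start = prefix_ids.size(1) - 1
    # logits that predict each response token (next-token shift)
    return out[:, start:start + response_ids.size(1), :]

with torch.no_grad():
    t_logits = logits_over_response(teacher, long_prompt + query, response)
s_logits = logits_over_response(student, query, response)

# distillation: match the student's distribution to the prompted teacher's
kl = F.kl_div(F.log_softmax(s_logits, dim=-1),
              F.softmax(t_logits, dim=-1), reduction="batchmean")
kl.backward()
```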
• YoYak (Long Sequence Summarization Framework for Korean) (2021.09 – 2021.12)
Long Context Handling & Summarization
◦ Domain-agnostic TAPT (Task-Adaptive Pre-Training) of a Longformer-extended KoBART model with the Pegasus objective; see the sketch after this project
◦ Performance comparison with vanilla KoBART
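A small sketch of the Pegasus-style gap-sentence generation (GSG) objective applied to a KoBART-style seq2seq model. The checkpoint name and the fixed gap index are assumptions (Pegasus selects principal sentences by importance), and the Longformer attention conversion for long inputs is omitted.

```python
from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration

# assumed checkpoint; any BART-style Korean model whose tokenizer defines <mask> works
tok = PreTrainedTokenizerFast.from_pretrained("gogamza/kobart-base-v2")
model = BartForConditionalGeneration.from_pretrained("gogamza/kobart-base-v2")

def gsg_example(sentences, gap_idx):
    """Pegasus-style GSG: mask a whole sentence and regenerate it."""
    source = " ".join(tok.mask_token if i == gap_idx else s
                      for i, s in enumerate(sentences))
    target = sentences[gap_idx]
    return source, target

sentences = ["문장 1.", "문장 2.", "문장 3."]          # dummy document
src, tgt = gsg_example(sentences, gap_idx=1)

batch = tok(src, text_target=tgt, return_tensors="pt")
loss = model(**batch).loss    # seq2seq cross-entropy on the gap sentence
loss.backward()
```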
• Incongruent News Detection (2020.09 – 2022.06, KOCCA)
◦ Generating a dataset for detecting incongruent news
◦ Developing a method to detect incongruent news using auxiliary textual information
◦ Developing a hierarchical graph convolutional model for incongruent news headline detection; see the sketch after this project
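A bare-bones sketch of a single graph-convolution layer of the kind such a hierarchical model builds on, with sentence nodes pooled into a document representation. The graph construction, hierarchy, and classifier head are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a = adj + torch.eye(adj.size(0))            # add self-loops
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return torch.relu(self.lin(a_norm @ h))

# dummy sentence graph: 8 sentence embeddings with a random symmetric adjacency
sent_h = torch.randn(8, 128)
adj = (torch.rand(8, 8) > 0.5).float()
adj = ((adj + adj.T) > 0).float()

gcn = GCNLayer(128, 64)
sent_repr = gcn(sent_h, adj)            # updated sentence representations
doc_repr = sent_repr.mean(dim=0)        # pooled document representation

clf = nn.Linear(64, 1)
incongruence_score = torch.sigmoid(clf(doc_repr))
```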
• Monetary Policy Board Announcement Analysis (2020.09 – 2020.10)
Financial Documents
◦ EDA on monetary policy statements and financial currency transcripts
◦ EDA with TF-IDF/FastText embeddings, LDA, and ML models (RF, SVM); see the sketch after this project
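A compact scikit-learn sketch of this EDA/classification setup (TF-IDF + Random Forest, plus LDA topics); the documents and labels are dummies, and the FastText-embedding branch is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier

docs = ["금리 동결 결정 ...", "물가 상승 압력 확대 ...", "경기 둔화 우려 ..."]   # dummy statements
labels = [0, 1, 1]                                                            # dummy stance labels

# TF-IDF features + Random Forest classifier
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(docs)
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)

# topic structure with LDA on raw term counts
counts = CountVectorizer(max_features=5000).fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)
print(lda.transform(counts))   # per-document topic distributions
```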
• Optimization of autonomous vehicle platooning (2019.05 – 2020.12)
◦ Optimizing the operation of electric autonomous vehicle platoons with a hybrid genetic algorithm; see the sketch after this project
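A toy genetic-algorithm loop in the spirit of this optimization; the fitness function (a dummy speed-smoothness proxy), encoding, and GA parameters are invented for illustration, and the hybrid local-search component is omitted.

```python
import random

def fitness(schedule):
    """Dummy energy-cost proxy: penalize speed deviations between platoon segments."""
    return -sum(abs(schedule[i] - schedule[i + 1]) for i in range(len(schedule) - 1))

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(s, rate=0.1):
    return [g + random.uniform(-1, 1) if random.random() < rate else g for g in s]

# each individual is a platoon speed schedule (km/h) over 10 road segments
population = [[random.uniform(60, 100) for _ in range(10)] for _ in range(50)]

for _ in range(200):                          # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                 # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

best = max(population, key=fitness)
```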
Activities
• Wemajor (2018.03 – 2019.12)
◦ Introducing university majors to middle and high school students (volunteering)
Awards & Scholarships
• 2024 POSTECH Best Paper Award (Excellence Award)
• 2024 Hyundai MOBIS AI (Infotainment System) Industry-Academic Cooperation Contest, 3rd prize
• Academic Excellence Scholarship (Spring 2017, Fall 2017, Spring 2018, Fall 2021, Spring 2022)
• Sejong University Coding Challenge - Python (Fall 2021, 4th prize)
• Korean Institute of Information Scientists and Engineers Undergraduate Student Paper Award (2021, 3rd prize)
Patents
• Apparatus for Generating a Dataset for Fake News Detection and Execution Method Thereof (가짜 뉴스 탐지용 데이터셋 생성 장치 및 이의 실행 방법), Application No. 10-2022-0025684 (2022)
Paper Review and Additional Study
• Linear Algebra
• Probability & Statistics
• Data Structure
• CS224N (2020.09 – 2020.12)
Skills
• Programming Languages – Python, C, R
• Data Mining – NumPy, pandas, scikit-learn, MySQL, SAS, Spark
• Machine Learning – TensorFlow, Keras, PyTorch, PyTorch Lightning, Hugging Face
• Web Crawling – requests, BeautifulSoup, FastAPI
• Others – Git & GitHub, Flask, Gephi, Docker
Related Course Work
• Advanced Machine Learning (A+)
• Linear Algebra and Programming (A+)
• Introduction to Open Source (A+)
• Data Problem & Solution and Practice (A+)
• C Programming and Lab (A+)
• Python Programming (A0)
• Computer Structure (A+)
• Database (A+)