I’m interested in NLP, especially in the parametric knowledge of LLMs: understanding it (benchmark evaluation, membership inference attacks), unlocking it (in-context learning, SFT, alignment learning, reasoning), and expanding it (continual learning) to meet human needs.
I’m currently starting research on VLM jailbreaking and LLM coding abilities.
Check out my teammates’ and my PaperReview posts (e.g., Agent Survey, recent research papers) and BlogPost here!
CV | LinkedIn | HF (Open Source)
Education
• Master of Science, Graduate School of Artificial Intelligence, POSTECH (2023.02 – 2025.02)
• Bachelor's degree in Hotel & Tourism Management, Sejong University (2017 – 2023) (GPA: 4.37/4.5, Summa Cum Laude)
(International student, School of Hotel and Tourism Management, The Hong Kong Polytechnic University, 2018.09 – 2018.12)
Work Experience
• ONOUT, LLM Research Engineer (Freelance, 2024.07 – 2024.12)
• LG AI Research @ EXAONE Lab, LLM Research Intern (2025.03 – present)
Publications
• International
◦ DongGeon Lee*, Joonwon Jang*, Jihae Jeong, Hwanjo Yu. Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study. (arXiv preprint)
◦ Joonwon Jang, Jaehee Kim, Wonbin Kweon, Hwanjo Yu. Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria. (ACL 2025 Findings)
◦ Seonghyeon Lee, HeeJae Chon, Joonwon Jang, Dongha Lee, Hwanjo Yu. How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code. (arXiv preprint)
◦ WooJoo Kim, Joonwon Jang, Jinyi Yu, Yunsu Jeon, and Hwanjo Yu. EPR: An Expert Behavior-enhanced Paper Ranking Framework for the Automotive Industry. (EMNLP 2024 Workshop)
◦ Seonghyeon Lee, Suyeon Kim, Joonwon Jang, Heejae Chon, Dongha Lee, Hwanjo Yu. Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation. (EMNLP 2024 Findings)
◦ Joonwon Jang, Sanghwan Jang, Wonbin Kweon, Minjin Jeon, and Hwanjo Yu. Rectifying Demonstration Shortcut in In-Context Learning. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics. (NAACL 2024 Main)
◦ Jaeyoung Lee, Joonwon Jang, and Misuk Kim. Hierarchical Graph Convolutional Network Approach for Detecting Low-Quality Documents. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING 2024)
◦ Eunbi Choi, Yongrae Jo, Joel Jang, Joonwon Jang, and Minjoon Seo. Fixed Input Parameterization for Efficient Prompting. Findings of the Association for Computational Linguistics. (ACL 2023 Findings)
◦ Joonwon Jang, Sung Il Kwag, and Young Dae Ko. Eco-friendly Platooning Operation Algorithm of the Autonomous Vehicles. Journal of Intelligent Transportation Systems: Technology, Planning, and Operations. (JITS, 2023)
◦ Joonwon Jang and Misuk Kim. Headline Token-based Discriminative Learning for Subheading Generation in News Article. Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. (EACL 2023 Findings)
◦ Joonwon Jang, Minju Kim, Yoonsik Cho, and Misuk Kim. Detecting Incongruent News Headlines with Auxiliary Textual Information. Expert Systems with Applications. (ESWA, 2022) (IF 6.954)
• Domestic
◦ 장준원 & 김미숙 (2022). Extraction-based Framework for News Subheading Generation (in Korean).
◦ 장준원, 조하현, 이재영, & 김미숙 (2021). Development of a Hierarchical Deep Learning Model for Detecting Fake News with Mismatched Headlines and Bodies, and Construction of a Fake News Dataset (in Korean). Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Conference.
Projects
• Continual Learning of a Large Language Model toward a Specific Domain (2024.09 – 2024.12, Onout)
Domain-Specific LLM (Pre-Training, Distributed Training)
◦ Data crawling (general corpus & domain corpus)
◦ Data preprocessing (deduplication & cleaning with Spark: PySpark - How to preprocess Large Scale Data with Python); see the deduplication sketch after this project
◦ Continual training of the LLM with distributed training: How to Train LLM? - From Data Parallel To Fully Sharded Data Parallel & DeepSpeed - Sharding Optimizer, Gradients, Parameters, and Reducing Activations for Efficient Training; see the FSDP sketch after this project
◦ (Planned) Token expansion using the domain-specific corpus
◦ Evaluation on domain-specific tasks (e.g., knowledge probing, generation)
▪ Doubled the in-domain KMMLU score
▪ Generalized to unseen formats
▪ Preserved general knowledge after DAPT (e.g., KMMLU, HAE-RAE, …)
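Below is a minimal sketch of the Spark-based deduplication step referenced above. It assumes a JSONL corpus with a `text` field; the column names, normalization rules, and paths are illustrative, not the exact Onout pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("corpus-dedup").getOrCreate()

# assumed input layout: JSONL files with a "text" column
corpus = spark.read.json("corpus/*.jsonl")

# normalize whitespace/case so trivially different copies hash identically
normalized = corpus.withColumn(
    "norm_text", F.lower(F.regexp_replace(F.col("text"), r"\s+", " "))
)

# exact dedup: keep one row per content hash
deduped = (
    normalized
    .withColumn("doc_hash", F.sha2(F.col("norm_text"), 256))
    .dropDuplicates(["doc_hash"])
    .drop("norm_text", "doc_hash")
)

deduped.write.mode("overwrite").parquet("corpus_dedup/")
```

Near-duplicate removal would follow the same pattern, e.g. with Spark ML's MinHashLSH over shingled documents.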
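And a minimal sketch of the distributed continual pre-training loop with PyTorch FSDP (the linked post also covers DeepSpeed, which shards optimizer state, gradients, and parameters in a similar spirit). The base checkpoint, hyperparameters, and batching here are placeholders.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, AutoTokenizer

# launch with: torchrun --nproc_per_node=<gpus> train.py
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model_name = "EleutherAI/polyglot-ko-1.3b"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)
model = FSDP(model.cuda(local_rank))          # shard params and grads across ranks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(batch_texts):
    batch = tokenizer(batch_texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=1024).to(f"cuda:{local_rank}")
    # causal-LM objective; ignore the loss on padding positions
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```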
• Constructing an AI Ranking Model for Promising Technologies Selection (2023.08 – 2024.08, Hyundai)
◦ Crawling and preprocessing data with the Scopus API
◦ Post-training LMs via citation networks (using the SPECTER framework); see the sketch after this project
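A hedged sketch of the SPECTER-style post-training idea: papers connected in the citation network should embed closer than unrelated papers, enforced with a triplet margin loss over [CLS] embeddings. The checkpoint name and example texts are assumptions, not the exact Hyundai pipeline.

```python
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

encoder_name = "allenai/specter"        # assumed starting checkpoint
tok = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)

def embed(titles_abstracts):
    batch = tok(titles_abstracts, padding=True, truncation=True,
                max_length=512, return_tensors="pt")
    # use the [CLS] token as the paper embedding, as in SPECTER
    return encoder(**batch).last_hidden_state[:, 0]

# one triplet drawn from the citation graph (dummy texts)
query   = embed(["Query paper title. Abstract ..."])
cited   = embed(["A paper the query cites. Abstract ..."])   # positive
uncited = embed(["An unrelated paper. Abstract ..."])        # negative

# pull cited papers closer to the query than uncited ones
loss = F.triplet_margin_loss(query, cited, uncited, margin=1.0)
loss.backward()
```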
• Fashion advertisement generation with a quantized LLM (2023.06 – 2023.09, Onout)
Supervised Fine-Tuning & Instruction Tuning & Quantization
◦ Applying QLoRA to Polyglot-Ko (5.8B, 12.8B); see the sketch after this project
◦ Prompt design for data augmentation (GPT API) / instruction tuning
◦ Instruction tuning on a human-labeled dataset / Self-Instruct-style generation with ChatGPT-pro & the GPT-4 API
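A minimal QLoRA sketch in the spirit of this project, using the Hugging Face transformers/peft/bitsandbytes stack; the checkpoint, LoRA rank, and target modules are illustrative defaults for the GPT-NeoX-style Polyglot-Ko models, not the exact production configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/polyglot-ko-5.8b"   # the 12.8B model follows the same recipe

# 4-bit NF4 quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# low-rank adapters on the fused attention projection (GPT-NeoX naming)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable
```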
• Prompt Injection in a chatbot system (2022.07 – 2022.12)
Long Context Handling & Chatbot
◦ Parameterizing long prompts (e.g., previous-session dialogues or personas in the MSC dataset) into a non-prompt student model for efficient inference (Mentor: Eunbi Choi); see the distillation sketch after this project
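A rough sketch of the prompt-parameterization objective: the student, which never sees the long prompt, is trained to match the next-token distribution of a teacher that does. Model names, example texts, and the token alignment are simplified assumptions, not the exact method.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# any causal-LM pair works for the sketch; in practice both would be the same base model
teacher = AutoModelForCausalLM.from_pretrained("gpt2")   # sees the long prompt
student = AutoModelForCausalLM.from_pretrained("gpt2")   # learns to behave as if it had
tok = AutoTokenizer.from_pretrained("gpt2")

long_prompt = "Persona and previous-session dialogue: ..."   # e.g., MSC history
query       = "User: How was your weekend?\nAssistant:"
response    = " It was great, I went hiking."

def logits_over_response(model, prefix, response):
    prefix_ids   = tok(prefix, return_tensors="pt").input_ids
    response_ids = tok(response, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, response_ids], dim=1)
    out = model(ids).logits
    start = prefix_ids.size(1) - 1
    # logits that predict each response token (next-token shift)
    return out[:, start:start + response_ids.size(1), :]

with torch.no_grad():
    t_logits = logits_over_response(teacher, long_prompt + query, response)
s_logits = logits_over_response(student, query, response)

# distillation: match the student's distribution to the prompted teacher's
kl = F.kl_div(F.log_softmax(s_logits, dim=-1),
              F.softmax(t_logits, dim=-1), reduction="batchmean")
kl.backward()
```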
• YoYak (Long Sequence Summarization Framework for Korean) (2021.09 – 2021.12)
Long Context Handling & Summarization
◦ Domain-agnostic TAPT (Task-Adaptive Pre-Training) of a Longformer-extended KoBART model with the Pegasus objective; see the sketch after this project
◦ Performance comparison with vanilla KoBART
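A small sketch of the Pegasus-style gap-sentence generation (GSG) objective applied to a KoBART-style seq2seq model. The checkpoint name and the fixed gap index are assumptions (Pegasus selects principal sentences by importance), and the Longformer attention conversion for long inputs is omitted.

```python
from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration

# assumed checkpoint; any BART-style Korean model whose tokenizer defines <mask> works
tok = PreTrainedTokenizerFast.from_pretrained("gogamza/kobart-base-v2")
model = BartForConditionalGeneration.from_pretrained("gogamza/kobart-base-v2")

def gsg_example(sentences, gap_idx):
    """Pegasus-style GSG: mask a whole sentence and regenerate it."""
    source = " ".join(tok.mask_token if i == gap_idx else s
                      for i, s in enumerate(sentences))
    target = sentences[gap_idx]
    return source, target

sentences = ["문장 1.", "문장 2.", "문장 3."]          # dummy document
src, tgt = gsg_example(sentences, gap_idx=1)

batch = tok(src, text_target=tgt, return_tensors="pt")
loss = model(**batch).loss    # seq2seq cross-entropy on the gap sentence
loss.backward()
```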
• Incongruent News Detection (2020.09 – 2022.06, KOCCA)
◦ Generating a dataset for detecting incongruent news
◦ Developing a method to detect incongruent news using auxiliary textual information
◦ Developing a hierarchical graph convolutional model for incongruent news headline detection; see the sketch after this project
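A bare-bones sketch of a single graph-convolution layer of the kind such a hierarchical model builds on, with sentence nodes pooled into a document representation. The graph construction, hierarchy, and classifier head are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a = adj + torch.eye(adj.size(0))            # add self-loops
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return torch.relu(self.lin(a_norm @ h))

# dummy sentence graph: 8 sentence embeddings with a random symmetric adjacency
sent_h = torch.randn(8, 128)
adj = (torch.rand(8, 8) > 0.5).float()
adj = ((adj + adj.T) > 0).float()

gcn = GCNLayer(128, 64)
sent_repr = gcn(sent_h, adj)            # updated sentence representations
doc_repr = sent_repr.mean(dim=0)        # pooled document representation

clf = nn.Linear(64, 1)
incongruence_score = torch.sigmoid(clf(doc_repr))
```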
• Monetary Policy Board Announcement Analysis (2020.09 – 2020.10)
Financial Documents
◦ EDA on monetary policy statements and financial currency transcripts
◦ EDA with TF-IDF/FastText embeddings, LDA, and ML models (RF, SVM); see the sketch after this project
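A compact scikit-learn sketch of this EDA/classification setup (TF-IDF + Random Forest, plus LDA topics); the documents and labels are dummies, and the FastText-embedding branch is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier

docs = ["금리 동결 결정 ...", "물가 상승 압력 확대 ...", "경기 둔화 우려 ..."]   # dummy statements
labels = [0, 1, 1]                                                            # dummy stance labels

# TF-IDF features + Random Forest classifier
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(docs)
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)

# topic structure with LDA on raw term counts
counts = CountVectorizer(max_features=5000).fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)
print(lda.transform(counts))   # per-document topic distributions
```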
• Optimization of autonomous vehicle platooning (2019.05 – 2020.12)
◦ Optimizing the operation of electric autonomous vehicle platoons with a hybrid genetic algorithm; see the sketch after this project
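A toy genetic-algorithm loop in the spirit of this optimization; the fitness function (a dummy speed-smoothness proxy), encoding, and GA parameters are invented for illustration, and the hybrid local-search component is omitted.

```python
import random

def fitness(schedule):
    """Dummy energy-cost proxy: penalize speed deviations between platoon segments."""
    return -sum(abs(schedule[i] - schedule[i + 1]) for i in range(len(schedule) - 1))

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(s, rate=0.1):
    return [g + random.uniform(-1, 1) if random.random() < rate else g for g in s]

# each individual is a platoon speed schedule (km/h) over 10 road segments
population = [[random.uniform(60, 100) for _ in range(10)] for _ in range(50)]

for _ in range(200):                          # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                 # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

best = max(population, key=fitness)
```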
Activities
• Wemajor (2018.03 – 2019.12)
◦ Introducing university majors to middle and high school students (volunteering)
Awards & Scholarships
• 2024 POSTECH Best Paper Award (Excellence Award)
• 2024 Hyundai MOBIS AI (Infotainment System) Industry-Academic Cooperation Contest, 3rd prize
• Academic Excellence Scholarship (Spring 2017, Fall 2017, Spring 2018, Fall 2021, Spring 2022)
• Sejong University Coding Challenge - Python (Fall 2021, 4th prize)
• Korean Institute of Information Scientists and Engineers Undergraduate Student Paper Award (2021, 3rd prize)
Patents
• Apparatus for Generating a Dataset for Fake News Detection and Execution Method Thereof (가짜 뉴스 탐지용 데이터셋 생성 장치 및 이의 실행 방법), Application No. 10-2022-0025684 (2022)
Paper Review and Additional Study
• Linear Algebra
• Probability & Statistics
• Data Structure
• CS224N (2020.09 – 2020.12)
Skills
• Programming Languages – Python, C, R
• Data Mining – NumPy, pandas, scikit-learn, MySQL, SAS, Spark
• Machine Learning – TensorFlow, Keras, PyTorch, PyTorch Lightning, Hugging Face
• Web Crawling – requests, BeautifulSoup, FastAPI
• Others – Git & GitHub, Flask, Gephi, Docker
Related Course Work
• Advanced Machine Learning (A+)
• Linear Algebra and Programming (A+)
• Introduction to Open Source (A+)
• Data Problem & Solution and Practice (A+)
• C Programming and Lab (A+)
• Python Programming (A0)
• Computer Structure (A+)
• Database (A+)